- Linux-kselftest-mirror - lists.linaro.org

[RFC PATCH 0/7] mm/damon: remove DAMON debugfs interface

by SeongJae Park

DAMON debugfs interface was the only user interface of DAMON at the beginning[1]. However, it turned out that the interface would not be good enough for long-term flexibility and stability. In Feb 2022[2], we therefore introduced the DAMON sysfs interface as an alternative user interface that aims at long-term flexibility and stability. With its introduction, the DAMON debugfs interface was announced to be deprecated in the near future. In Feb 2023[3], we announced the official deprecation of the DAMON debugfs interface. In Jan 2024[4], we further made the deprecation difficult to ignore. As of this writing (2024-10-14), no problems or concerns about the deprecation have been reported. Apparently users have already moved to the alternative, or have made good plans for the change.

Remove the DAMON debugfs interface code from the tree. Given the past timeline and the absence of reported problems or concerns, it is safe enough to do so. That said, we will not drop the RFC tag of this patch series at least until the end of this year, so that this serves as the real last call for users.
[1] https://lore.kernel.org/20210716081449.22187-1-sj38.park@gmail.com
[2] https://lore.kernel.org/20220228081314.5770-1-sj@kernel.org
[3] https://lore.kernel.org/20230209192009.7885-1-sj@kernel.org
[4] https://lore.kernel.org/20240130013549.89538-1-sj@kernel.org

SeongJae Park (7):
Docs/admin-guide/mm/damon/usage: remove DAMON debugfs interface documentation
Docs/mm/damon/design: update for removal of DAMON debugfs interface
selftests/damon/config: remove configs for DAMON debugfs interface selftests
selftests/damon: remove tests for DAMON debugfs interface
kunit: configs: remove configs for DAMON debugfs interface tests
mm/damon: remove DAMON debugfs interface kunit tests
mm/damon: remove DAMON debugfs interface

Documentation/admin-guide/mm/damon/usage.rst | 309 -----
Documentation/mm/damon/design.rst | 23 +-
mm/damon/Kconfig | 30 -
mm/damon/Makefile | 1 -
mm/damon/dbgfs.c | 1148 -----------------
mm/damon/tests/.kunitconfig | 7 -
mm/damon/tests/dbgfs-kunit.h | 173 ---
tools/testing/kunit/configs/all_tests.config | 3 -
tools/testing/selftests/damon/.gitignore | 3 -
tools/testing/selftests/damon/Makefile | 11 +-
tools/testing/selftests/damon/config | 1 -
.../testing/selftests/damon/debugfs_attrs.sh | 17 -
.../debugfs_duplicate_context_creation.sh | 27 -
.../selftests/damon/debugfs_empty_targets.sh | 21 -
.../damon/debugfs_huge_count_read_write.sh | 22 -
.../damon/debugfs_rm_non_contexts.sh | 19 -
.../selftests/damon/debugfs_schemes.sh | 19 -
.../selftests/damon/debugfs_target_ids.sh | 19 -
.../damon/debugfs_target_ids_pid_leak.c | 68 -
.../damon/debugfs_target_ids_pid_leak.sh | 22 -
...fs_target_ids_read_before_terminate_race.c | 80 --
...s_target_ids_read_before_terminate_race.sh | 14 -
.../selftests/damon/huge_count_read_write.c | 48 -
23 files changed, 11 insertions(+), 2074 deletions(-)
delete mode 100644 mm/damon/dbgfs.c
delete mode 100644 mm/damon/tests/dbgfs-kunit.h
delete mode 100755 tools/testing/selftests/damon/debugfs_attrs.sh
delete mode 100755 tools/testing/selftests/damon/debugfs_duplicate_context_creation.sh
delete mode 100755 tools/testing/selftests/damon/debugfs_empty_targets.sh
delete mode 100755 tools/testing/selftests/damon/debugfs_huge_count_read_write.sh
delete mode 100755 tools/testing/selftests/damon/debugfs_rm_non_contexts.sh
delete mode 100755 tools/testing/selftests/damon/debugfs_schemes.sh
delete mode 100755 tools/testing/selftests/damon/debugfs_target_ids.sh
delete mode 100644 tools/testing/selftests/damon/debugfs_target_ids_pid_leak.c
delete mode 100755 tools/testing/selftests/damon/debugfs_target_ids_pid_leak.sh
delete mode 100644 tools/testing/selftests/damon/debugfs_target_ids_read_before_terminate_race.c
delete mode 100755 tools/testing/selftests/damon/debugfs_target_ids_read_before_terminate_race.sh
delete mode 100644 tools/testing/selftests/damon/huge_count_read_write.c

base-commit: 5ef943709a1b88304aa6e8cb8683a25bf81874f0
--
2.39.5

[PATCH net-next v04 0/3] net: af_packet: allow joining a fanout when link is down

by Gur Stavi

A PACKET socket can retain its fanout membership through link down and up, and can leave a fanout while closed regardless of link state. However, a socket was forbidden from joining a fanout while it was not RUNNING.

This scenario was identified while studying DPDK pmd_af_packet_drv. Since sockets are only created during initialization, there is no reason to fail the initialization if a single link is temporarily down.

This patch allows a PACKET socket to join a fanout while not RUNNING.

Selftest psock_fanout is extended to test this "fanout while link down" scenario. Selftest psock_fanout is also extended to test fanout create/join by a socket that did not bind or specify a protocol, which carries an implicit bind.

This is the only test that was performed.

Changes:

V04:
* Minimized code change.
* Removed test of ifindex. A socket that went through the bind "unlisted" race can join a fanout.

V03: https://lore.kernel.org/netdev/cover.1728555449.git.gur.stavi@huawei.com
* psock_fanout: add test for joining fanout with unbound socket.
* Test that socket can receive packets before adding it to a fanout match. This kind of replaces the RUNNING test that was removed.
* Initialize po->ifindex in packet_create: to -1 if no protocol is specified, and explicitly to 0 if a protocol is specified.
* Refactor relevant code in fanout_add within bind_lock as a sequence of if {} else if {}, in order to reduce indentation of nested if statements and provide specific error codes.

V02: https://lore.kernel.org/netdev/cover.1728382839.git.gur.stavi@huawei.com
* psock_fanout: use explicit loopback up/down instead of toggle.
* psock_fanout: don't try to restore loopback state on failure.
* Rephrase commit message about "leaving a fanout".

V01: https://lore.kernel.org/netdev/cover.1728303615.git.gur.stavi@huawei.com/

Gur Stavi (3):
af_packet: allow fanout_add when socket is not RUNNING
selftests: net/psock_fanout: socket joins fanout when link is down
selftests: net/psock_fanout: unbound socket fanout

net/packet/af_packet.c | 9 +--
tools/testing/selftests/net/psock_fanout.c | 78 +++++++++++++++++++++-
2 files changed, 80 insertions(+), 7 deletions(-)

base-commit: c531f2269a53db5cf64b24baf785ccbcda52970f
--
2.45.2
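For readers unfamiliar with fanout, the following is a minimal sketch (not part of this series) of how a userspace program joins a fanout group with setsockopt(PACKET_FANOUT). It assumes the packet(7) encoding of the option value: group id in the low 16 bits, mode (optionally OR-ed with flags such as PACKET_FANOUT_FLAG_ROLLOVER) in the high 16 bits. The helper names fanout_arg() and join_fanout() are hypothetical. The socket is opened with ETH_P_ALL, so it carries the implicit bind mentioned above; before this series, the join was refused while the bound device was not RUNNING.

```c
#include <errno.h>
#include <stdint.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <sys/socket.h>
#include <linux/if_ether.h>
#include <linux/if_packet.h>

/* Pack a fanout group id and mode into the PACKET_FANOUT option value:
 * group id in the low 16 bits, mode (plus any flags) in the high 16 bits. */
int fanout_arg(uint16_t group_id, uint16_t mode)
{
	return (int)((uint32_t)group_id | ((uint32_t)mode << 16));
}

/* Open a PACKET socket (implicitly bound by the ETH_P_ALL protocol) and
 * join fanout group `group_id` in hash mode.
 * Returns the socket fd, or -errno on failure. */
int join_fanout(uint16_t group_id)
{
	int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
	int arg;

	if (fd < 0)
		return -errno;	/* e.g. -EPERM without CAP_NET_RAW */

	arg = fanout_arg(group_id, PACKET_FANOUT_HASH);
	if (setsockopt(fd, SOL_PACKET, PACKET_FANOUT, &arg, sizeof(arg)) < 0) {
		int err = errno;

		close(fd);
		return -err;
	}
	return fd;
}
```

Creating the socket needs CAP_NET_RAW, so join_fanout() returns a negative errno when run unprivileged; fanout_arg() is pure and can be checked anywhere.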

[PATCH net-next v1 0/3] Threads support in proc connector

by Anjali Kulkarni

Recently we committed a fix to allow processes to receive notifications for non-zero exits via the process connector module (commit a4c9a56e6a2c). However, when a thread calls pthread_exit(&exit_status), the kernel is not aware of the exit status passed to pthread_exit(). The status is delivered by the child thread to the parent process only if the parent is waiting in pthread_join(). Hence, for a thread exiting abnormally, the kernel cannot send notifications to any listening processes. The exception is a thread that receives a signal it does not handle and dies along with its process as a result, e.g. SIGSEGV or SIGKILL. In this case, the kernel is aware of the non-zero exit and sends a notification for it.

For our use case, we cannot have the parent wait in pthread_join(), one of the main reasons being that we do not want to track normal pthread_exit() calls, which could be very numerous. We only want to be notified of abnormal exits. Hence, threads are created with pthread_attr_t set to PTHREAD_CREATE_DETACHED.

To fix this problem, we add a new type PROC_CN_MCAST_NOTIFY to the proc connector API, which allows a thread to send its exit status to the kernel, either when it needs to call pthread_exit() with a non-zero value to indicate an error, or from a signal handler before pthread_exit().
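The limitation described above follows from how pthread_exit() works: its value only reaches a thread that calls pthread_join(). The sketch below uses only standard POSIX threads calls (the helper names run_joinable() and run_detached() are hypothetical) to show the status being recovered from a joinable thread and silently discarded for a detached one, which is the gap the new PROC_CN_MCAST_NOTIFY type is meant to close.

```c
#include <pthread.h>
#include <stdint.h>

/* Exit with the status we were given; only a joiner can observe it. */
static void *worker(void *arg)
{
	pthread_exit(arg);
}

/* Run worker() as a joinable thread and return the status it passed to
 * pthread_exit(), or -1 if the thread could not be created or joined. */
long run_joinable(long status)
{
	pthread_t tid;
	void *ret;

	if (pthread_create(&tid, NULL, worker, (void *)(intptr_t)status) != 0)
		return -1;
	if (pthread_join(tid, &ret) != 0)
		return -1;
	return (long)(intptr_t)ret;
}

/* Run worker() detached, as in the use case above. Nobody can join it,
 * so the status passed to pthread_exit() is never seen by anyone.
 * Returns the pthread_create() result (0 on success). */
int run_detached(long status)
{
	pthread_attr_t attr;
	pthread_t tid;
	int err;

	if (pthread_attr_init(&attr) != 0)
		return -1;
	pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
	err = pthread_create(&tid, &attr, worker, (void *)(intptr_t)status);
	pthread_attr_destroy(&attr);
	return err;
}
```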
v->v1 changes:
- Handled comment by Simon Horman to remove unused err in cn_proc.c
- Handled comment by Simon Horman to make adata and key_display static in cn_hash_test.c

Anjali Kulkarni (3):
connector/cn_proc: Add hash table for threads
connector/cn_proc: Kunit tests for threads hash table
connector/cn_proc: Selftest for threads

drivers/connector/Makefile | 2 +-
drivers/connector/cn_hash.c | 240 ++++++++++++++++++
drivers/connector/cn_proc.c | 58 ++++-
drivers/connector/connector.c | 96 ++++++-
include/linux/connector.h | 47 ++++
include/linux/sched.h | 2 +-
include/uapi/linux/cn_proc.h | 4 +-
lib/Kconfig.debug | 17 ++
lib/Makefile | 1 +
lib/cn_hash_test.c | 167 ++++++++++++
lib/cn_hash_test.h | 12 +
tools/testing/selftests/connector/Makefile | 23 +-
.../testing/selftests/connector/proc_filter.c | 5 +
tools/testing/selftests/connector/thread.c | 90 +++++++
.../selftests/connector/thread_filter.c | 93 +++++++
15 files changed, 847 insertions(+), 10 deletions(-)
create mode 100644 drivers/connector/cn_hash.c
create mode 100644 lib/cn_hash_test.c
create mode 100644 lib/cn_hash_test.h
create mode 100644 tools/testing/selftests/connector/thread.c
create mode 100644 tools/testing/selftests/connector/thread_filter.c
--
2.46.0

[PATCH v2 0/6] Make set_dev_pasid op supporting domain replacement

by Yi Liu

This splits the preparation work for the iommu core and the Intel iommu driver out of the iommufd pasid attach/replace series [1].

To support domain replacement, the definition of the set_dev_pasid op needs to be enhanced. Meanwhile, the existing set_dev_pasid callbacks should be extended to suit the new definition.

This series first prepares the Intel iommu set_dev_pasid op for the new definition, adds the missing set_dev_pasid support for nested domains, makes the ARM SMMUv3 set_dev_pasid op suit the new definition, and finally enhances the definition of the set_dev_pasid op. The AMD set_dev_pasid callback is extended to fail if the caller tries to do domain replacement, to meet the new definition of the set_dev_pasid op. The AMD iommu driver will support it later, per Vasant [2].

[1] https://lore.kernel.org/linux-iommu/20240412081516.31168-1-yi.l.liu@intel.c…
[2] https://lore.kernel.org/linux-iommu/fa9c4fc3-9365-465e-8926-b4d2d6361b9c@am…

v2:
- Make ARM SMMUv3 set_dev_pasid op support domain replacement (Jason)
- Drop patch 03 of v1 (Kevin)
- Multiple tweaks in VT-d driver (Kevin)

v1: https://lore.kernel.org/linux-iommu/20240628085538.47049-1-yi.l.liu@intel.c…

Regards,
Yi Liu

Jason Gunthorpe (1):
iommu/arm-smmu-v3: Make smmuv3 set_dev_pasid() op support replace

Lu Baolu (1):
iommu/vt-d: Add set_dev_pasid callback for nested domain

Yi Liu (4):
iommu: Pass old domain to set_dev_pasid op
iommu/vt-d: Move intel_drain_pasid_prq() into intel_pasid_tear_down_entry()
iommu/vt-d: Make intel_iommu_set_dev_pasid() to handle domain replacement
iommu: Make set_dev_pasid op support domain replacement

drivers/iommu/amd/amd_iommu.h | 3 +-
drivers/iommu/amd/pasid.c | 6 +-
.../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 5 +-
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 8 +-
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 2 +-
drivers/iommu/intel/iommu.c | 122 ++++++++++++------
drivers/iommu/intel/iommu.h | 3 +
drivers/iommu/intel/nested.c | 1 +
drivers/iommu/intel/pasid.c | 13 +-
drivers/iommu/intel/pasid.h | 8 +-
drivers/iommu/intel/svm.c | 6 +-
drivers/iommu/iommu.c | 3 +-
include/linux/iommu.h | 5 +-
13 files changed, 129 insertions(+), 56 deletions(-)
--
2.34.1
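The following toy userspace model, written only to illustrate the idea and not taken from the kernel, shows why passing the old domain to the callback matters for replacement. All names (struct toy_domain, toy_set_dev_pasid() and so on) are hypothetical, and two behaviours are assumptions of this sketch rather than statements about the series: that a failed replace leaves the old domain attached, and the specific error codes returned.

```c
#include <errno.h>
#include <stddef.h>

/* Toy model of a per-PASID attachment slot; NOT kernel code. */
struct toy_domain {
	int id;
	int fail_attach;	/* simulate a failure while programming */
};

struct toy_pasid_slot {
	struct toy_domain *domain;	/* currently attached domain, or NULL */
};

/* Replacement-capable callback: it is told which domain (if any) is
 * attached, so it can switch in one step. Assumed semantics: on failure
 * the slot is untouched, i.e. `old` stays attached. */
int toy_set_dev_pasid(struct toy_pasid_slot *slot,
		      struct toy_domain *new, struct toy_domain *old)
{
	if (slot->domain != old)
		return -EINVAL;	/* caller's view of the old domain is stale */
	if (new->fail_attach)
		return -EIO;	/* keep the old configuration */
	slot->domain = new;
	return 0;
}

/* A driver that cannot replace yet, like the AMD callback described above:
 * it refuses whenever another domain is already attached. */
int toy_set_dev_pasid_no_replace(struct toy_pasid_slot *slot,
				 struct toy_domain *new,
				 struct toy_domain *old)
{
	if (old)
		return -EBUSY;	/* error code chosen for this sketch */
	return toy_set_dev_pasid(slot, new, old);
}
```

Because the old domain is passed in, a driver can move directly from one domain to another without a detach step in between, which is what replacement requires.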

[PATCH] selftests: clone3: Use the capget and capset syscall directly

by zhouyuhang

From: zhouyuhang <zhouyuhang(a)kylinos.cn>

The libcap commit aca076443591 ("Make cap_t operations thread safe.") added a __u8 mutex at the beginning of struct _cap_struct. This changes the offsets of the members in the structure, which breaks the assumption made in the "struct libcap" definition in clone3_cap_checkpoint_restore.c. So use the capget and capset syscalls directly and remove the libcap library dependency, as commit 663af70aabb7 ("bpf: selftests: Add helpers to directly use the capget and capset syscall") does.

Signed-off-by: zhouyuhang <zhouyuhang(a)kylinos.cn>
---
tools/testing/selftests/clone3/Makefile | 1 -
.../clone3/clone3_cap_checkpoint_restore.c | 60 +++++++++----------
2 files changed, 28 insertions(+), 33 deletions(-)

diff --git a/tools/testing/selftests/clone3/Makefile b/tools/testing/selftests/clone3/Makefile
index 84832c369a2e..59d26e8da8d2 100644
--- a/tools/testing/selftests/clone3/Makefile
+++ b/tools/testing/selftests/clone3/Makefile
@@ -1,6 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0
 CFLAGS += -g -std=gnu99 $(KHDR_INCLUDES)
-LDLIBS += -lcap
 
 TEST_GEN_PROGS := clone3 clone3_clear_sighand clone3_set_tid \
 	clone3_cap_checkpoint_restore
diff --git a/tools/testing/selftests/clone3/clone3_cap_checkpoint_restore.c b/tools/testing/selftests/clone3/clone3_cap_checkpoint_restore.c
index 3c196fa86c99..111912e2aead 100644
--- a/tools/testing/selftests/clone3/clone3_cap_checkpoint_restore.c
+++ b/tools/testing/selftests/clone3/clone3_cap_checkpoint_restore.c
@@ -15,7 +15,7 @@
 #include <stdio.h>
 #include <stdlib.h>
 #include <stdbool.h>
-#include <sys/capability.h>
+#include <linux/capability.h>
 #include <sys/prctl.h>
 #include <sys/syscall.h>
 #include <sys/types.h>
@@ -27,6 +27,13 @@
 #include "../kselftest_harness.h"
 #include "clone3_selftests.h"
 
+#ifndef CAP_CHECKPOINT_RESTORE
+#define CAP_CHECKPOINT_RESTORE 40
+#endif
+
+int capget(cap_user_header_t header, cap_user_data_t data);
+int capset(cap_user_header_t header, const cap_user_data_t data);
+
 static void child_exit(int ret)
 {
 	fflush(stdout);
@@ -87,47 +94,36 @@ static int test_clone3_set_tid(struct __test_metadata *_metadata,
 	return ret;
 }
 
-struct libcap {
-	struct __user_cap_header_struct hdr;
-	struct __user_cap_data_struct data[2];
-};
-
 static int set_capability(void)
 {
-	cap_value_t cap_values[] = { CAP_SETUID, CAP_SETGID };
-	struct libcap *cap;
-	int ret = -1;
-	cap_t caps;
-
-	caps = cap_get_proc();
-	if (!caps) {
-		perror("cap_get_proc");
+	struct __user_cap_data_struct data[2];
+	struct __user_cap_header_struct hdr = {
+		.version = _LINUX_CAPABILITY_VERSION_3,
+	};
+	__u32 cap0 = 1 << CAP_SETUID | 1 << CAP_SETGID;
+	__u32 cap1 = 1 << (CAP_CHECKPOINT_RESTORE - 32);
+	int ret;
+
+	ret = capget(&hdr, data);
+	if (ret) {
+		perror("capget");
 		return -1;
 	}
 
 	/* Drop all capabilities */
-	if (cap_clear(caps)) {
-		perror("cap_clear");
-		goto out;
-	}
+	memset(&data, 0, sizeof(data));
 
-	cap_set_flag(caps, CAP_EFFECTIVE, 2, cap_values, CAP_SET);
-	cap_set_flag(caps, CAP_PERMITTED, 2, cap_values, CAP_SET);
+	data[0].effective |= cap0;
+	data[0].permitted |= cap0;
 
-	cap = (struct libcap *) caps;
+	data[1].effective |= cap1;
+	data[1].permitted |= cap1;
 
-	/* 40 -> CAP_CHECKPOINT_RESTORE */
-	cap->data[1].effective |= 1 << (40 - 32);
-	cap->data[1].permitted |= 1 << (40 - 32);
-
-	if (cap_set_proc(caps)) {
-		perror("cap_set_proc");
-		goto out;
+	ret = capset(&hdr, data);
+	if (ret) {
+		perror("capset");
+		return -1;
 	}
 
-	ret = 0;
-out:
-	if (cap_free(caps))
-		perror("cap_free");
 	return ret;
 }
--
2.25.1
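The core of the change is indexing the two 32-bit words of a _LINUX_CAPABILITY_VERSION_3 capability set by hand, as in data[1] and 1 << (CAP_CHECKPOINT_RESTORE - 32) above. The sketch below generalises that word/bit arithmetic and reads the calling thread's capabilities through syscall(SYS_capget, ...), so it needs neither libcap nor a capget() prototype. The helper names cap_word(), cap_bit() and has_effective_cap() are hypothetical.

```c
#define _GNU_SOURCE
#include <linux/capability.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Index of the 32-bit word holding capability `cap` in a v3 data array. */
unsigned int cap_word(unsigned int cap)
{
	return cap / 32;
}

/* Bit for capability `cap` within its word. */
__u32 cap_bit(unsigned int cap)
{
	return (__u32)1 << (cap % 32);
}

/* Read this thread's capability sets with the raw capget syscall.
 * Returns 1 if `cap` is in the effective set, 0 if not, -1 on error. */
int has_effective_cap(unsigned int cap)
{
	struct __user_cap_header_struct hdr = {
		.version = _LINUX_CAPABILITY_VERSION_3,
		.pid = 0,	/* 0 means the calling thread */
	};
	struct __user_cap_data_struct data[_LINUX_CAPABILITY_U32S_3];

	if (cap_word(cap) >= _LINUX_CAPABILITY_U32S_3)
		return -1;
	if (syscall(SYS_capget, &hdr, data) != 0)
		return -1;
	return (data[cap_word(cap)].effective & cap_bit(cap)) != 0;
}
```

With cap = 40 (CAP_CHECKPOINT_RESTORE), cap_word() selects data[1] and cap_bit() yields 1 << 8, matching the expression in the patch.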

[PATCH v13 00/40] arm64/gcs: Provide support for GCS in userspace

by Mark Brown

The arm64 Guarded Control Stack (GCS) feature provides support for hardware-protected stacks of return addresses, intended to provide hardening against return-oriented programming (ROP) attacks and to make it easier to gather call stacks for applications such as profiling.

When GCS is active, a secondary stack called the Guarded Control Stack is maintained, protected with a memory attribute which means that it can only be written with specific GCS operations. The current GCS pointer cannot be directly written to by userspace. When a BL is executed, the value stored in LR is also pushed onto the GCS, and when a RET is executed, the top of the GCS is popped and compared to LR, with a fault being raised if the values do not match. GCS operations may only be performed on GCS pages; a data abort is generated if they are not.

The combination of hardware enforcement and the lack of extra instructions in the function entry and exit paths should result in something which has less overhead and is more difficult to attack than a purely software implementation like clang's shadow stacks.

This series implements support for use of GCS by userspace, along with support for use of GCS within KVM guests. It does not enable use of GCS by either EL1 or EL2; this will be implemented separately. Executables are started without GCS and must use a prctl() to enable it. It is expected that this will be done very early in application execution by the dynamic linker or other startup code. For dynamic linking, this will be done by checking that everything in the executable is marked as GCS compatible.

x86 has an equivalent feature called shadow stacks; this series depends on the x86 patches for generic memory management support for the new guarded/shadow stack page type and shares APIs as much as possible.
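The BL/RET check described above can be illustrated with a small software model. This is only an analogy: real GCS keeps the stack in specially protected pages and performs the push and compare in hardware, while the hypothetical shadow_push() and shadow_pop_check() below simply record and compare return addresses in an ordinary array.

```c
#include <stddef.h>
#include <stdint.h>

#define SHADOW_DEPTH 64

/* Toy model of a shadow stack of return addresses. */
struct shadow_stack {
	uintptr_t slots[SHADOW_DEPTH];
	size_t top;
};

/* On a call (BL), the return address written to LR is also pushed here.
 * Returns 0, or -1 on overflow. */
int shadow_push(struct shadow_stack *ss, uintptr_t lr)
{
	if (ss->top == SHADOW_DEPTH)
		return -1;
	ss->slots[ss->top++] = lr;
	return 0;
}

/* On a return (RET), pop the saved address and compare it with LR.
 * Returns 0 on a match, -1 on mismatch or underflow: the cases in which
 * GCS would raise a fault. */
int shadow_pop_check(struct shadow_stack *ss, uintptr_t lr)
{
	if (ss->top == 0)
		return -1;
	return ss->slots[--ss->top] == lr ? 0 : -1;
}
```

A ROP attack that overwrites the return address on the normal stack produces an LR that no longer matches the saved copy, so the check fails.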
As there has been extensive discussion with the wider community around the ABI for shadow stacks, I have, as far as practical, kept implementation decisions close to those for x86, anticipating that review would lead to similar conclusions in the absence of strong reasoning for divergence.

The main divergence I am conscious of is that x86 allows shadow stack to be enabled and disabled repeatedly, freeing the shadow stack for the thread whenever disabled, while this implementation keeps the GCS allocated after disable but refuses to reenable it. This is to avoid races with things actively walking the GCS during a disable. We do anticipate that some systems will wish to disable GCS at runtime, but are not aware of any demand for subsequently reenabling it.

x86 uses an arch_prctl() to manage enable and disable. Since only x86 and S/390 use arch_prctl(), a generic prctl() was proposed[1] as part of a patch set for the equivalent RISC-V Zicfiss feature, which I initially adopted fairly directly but which, following review feedback, has been revised quite a bit.

We currently maintain the x86 pattern of implicitly allocating a shadow stack for threads started with shadow stack enabled. There has been some discussion of removing this support and requiring the use of clone3() with explicit allocation of shadow stacks instead. I have no strong feelings either way: implicit allocation is not really consistent with anything else we do and creates the potential for errors around thread exit, but on the other hand it is existing ABI on x86 and minimises the changes needed in userspace code.

glibc and bionic changes using this ABI have been implemented and tested. Headless Android systems have been validated, Ross Burton has used this code to bring up a Yocto system with GCS enabled as standard, and a test implementation of V8 support has also been done.

uprobes are not currently supported; missing emulation was identified late in review.
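The divergence from x86 can be summarised as a small state machine: disabling is always allowed and keeps the GCS allocated, but enabling again after a disable is refused. The model below is a hypothetical illustration of that policy only; it does not use the real prctl() interface, and the -EBUSY error code is an assumption of the sketch.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy per-thread GCS state; names are hypothetical. */
struct gcs_state {
	bool allocated;	/* a GCS has been allocated for the thread */
	bool enabled;	/* GCS currently enforced */
};

/* First enable allocates the GCS; enabling after a disable is refused. */
int gcs_enable(struct gcs_state *s)
{
	if (s->enabled)
		return 0;
	if (s->allocated)
		return -EBUSY;	/* re-enable after disable: not supported */
	s->allocated = true;
	s->enabled = true;
	return 0;
}

/* Disable stops enforcement but keeps the GCS allocated, so anything
 * concurrently walking it never sees the memory freed underneath it. */
int gcs_disable(struct gcs_state *s)
{
	s->enabled = false;
	return 0;
}
```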
There is an open issue with support for CRIU: on x86 this required the ability to set the GCS mode via ptrace. This series supports configuring mode bits other than enable/disable via ptrace, but it still needs to be confirmed whether this is sufficient.

It is likely that we could relax some of the barriers added here with some more targeted placements; this is left for further study.

There is an in-progress series adding clone3() support for shadow stacks:

https://lore.kernel.org/r/20240819-clone3-shadow-stack-v9-0-962d74f99464@ke…

Previous versions of this series depended on that; this dependency has been removed in order to make merging easier.

[1] https://lore.kernel.org/lkml/20240403234054.2020347-1-debug@rivosinc.com/

Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v13:
- Rebase onto v6.12-rc1.
- Allocate VM_HIGH_ARCH_6 since protection keys used all the existing bits.
- Implement mm_release() and free transparently allocated GCSs there.
- Use bit 32 of AT_HWCAP for GCS due to AT_HWCAP2 being filled.
- Since we now only set GCSCRE0_EL1 on change, ensure that it is initialised with GCSPR_EL0 accessible to EL0.
- Fix OOM handling on thread copy.
- Link to v12: https://lore.kernel.org/r/20240829-arm64-gcs-v12-0-42fec947436a@kernel.org

Changes in v12:
- Clarify and simplify the signal handling code so we work with the register state.
- When checking for write aborts to shadow stack pages, ensure the fault is a data abort.
- Depend on !UPROBES.
- Comment cleanups.
- Link to v11: https://lore.kernel.org/r/20240822-arm64-gcs-v11-0-41b81947ecb5@kernel.org

Changes in v11:
- Remove the dependency on the addition of clone3() support for shadow stacks, rebasing onto v6.11-rc3.
- Make ID_AA64PFR1_EL1.GCS writeable in KVM.
- Hide GCS registers when GCS is not enabled for KVM guests.
- Require HCRX_EL2.GCSEn if booting at EL1.
- Require that GCSCR_EL1 and GCSCRE0_EL1 be initialised regardless of whether we boot at EL2 or EL1.
- Remove some stray use of bit 63 in signal cap tokens.
- Warn if we see a GCS with VM_SHARED.
- Remove redundant check for VM_WRITE in fault handling.
- Cleanups and clarifications in the ABI document.
- Clean up and improve documentation of some sync placement.
- Only set the EL0 GCS mode if it's actually changed.
- Various minor fixes and tweaks.
- Link to v10: https://lore.kernel.org/r/20240801-arm64-gcs-v10-0-699e2bd2190b@kernel.org

Changes in v10:
- Fix issues with THP.
- Tighten up requirements for initialising GCSCR*.
- Only generate GCS signal frames for threads using GCS.
- Only context switch EL1 GCS registers if S1PIE is enabled.
- Move context switch of GCSCRE0_EL1 to EL0 context switch.
- Make GCS registers unconditionally visible to userspace.
- Use FHU infrastructure.
- Don't change writability of ID_AA64PFR1_EL1 for KVM.
- Remove unused arguments from alloc_gcs().
- Typo fixes.
- Link to v9: https://lore.kernel.org/r/20240625-arm64-gcs-v9-0-0f634469b8f0@kernel.org

Changes in v9:
- Rebase onto v6.10-rc3.
- Restructure and clarify memory management fault handling.
- Fix up basic-gcs for the latest clone3() changes.
- Convert to newly merged KVM ID register based feature configuration.
- Fixes for NV traps.
- Link to v8: https://lore.kernel.org/r/20240203-arm64-gcs-v8-0-c9fec77673ef@kernel.org

Changes in v8:
- Invalidate signal cap token on stack when consuming.
- Typo and other trivial fixes.
- Don't try to use process_vm_write() on GCS; it intentionally does not work.
- Fix leak of thread GCSs.
- Rebase onto latest clone3() series.
- Link to v7: https://lore.kernel.org/r/20231122-arm64-gcs-v7-0-201c483bd775@kernel.org

Changes in v7:
- Rebase onto v6.7-rc2 via the clone3() patch series.
- Change the token used to cap the stack during signal handling to be compatible with GCSPOPM.
- Fix flags for new page types.
- Fold in support for clone3().
- Replace copy_to_user_gcs() with put_user_gcs().
- Link to v6: https://lore.kernel.org/r/20231009-arm64-gcs-v6-0-78e55deaa4dd@kernel.org

Changes in v6:
- Rebase onto v6.6-rc3.
- Add some more gcsb_dsync() barriers following spec clarifications.
- Due to ongoing discussion around clone()/clone3() I've not updated anything there; the behaviour is the same as on previous versions.
- Link to v5: https://lore.kernel.org/r/20230822-arm64-gcs-v5-0-9ef181dd6324@kernel.org

Changes in v5:
- Don't map any permissions for user GCSs; we always use EL0 accessors or use a separate mapping of the page.
- Reduce the standard size of the GCS to RLIMIT_STACK/2.
- Enforce a PAGE_SIZE alignment requirement on map_shadow_stack().
- Clarifications and fixes to documentation.
- More tests.
- Link to v4: https://lore.kernel.org/r/20230807-arm64-gcs-v4-0-68cfa37f9069@kernel.org

Changes in v4:
- Implement flags for map_shadow_stack() allowing the cap and end of stack marker to be enabled independently or not at all.
- Relax size and alignment requirements for map_shadow_stack().
- Add more blurb explaining the advantages of hardware enforcement.
- Link to v3: https://lore.kernel.org/r/20230731-arm64-gcs-v3-0-cddf9f980d98@kernel.org

Changes in v3:
- Rebase onto v6.5-rc4.
- Add a GCS barrier on context switch.
- Add a GCS stress test.
- Link to v2: https://lore.kernel.org/r/20230724-arm64-gcs-v2-0-dc2c1d44c2eb@kernel.org

Changes in v2:
- Rebase onto v6.5-rc3.
- Rework prctl() interface to allow each bit to be locked independently.
- map_shadow_stack() now places the cap token based on the size requested by the caller, not the actual space allocated.
- Mode changes other than enable via ptrace are now supported.
- Expand test coverage.
- Various smaller fixes and adjustments.
- Link to v1: https://lore.kernel.org/r/20230716-arm64-gcs-v1-0-bf567f93bba6@kernel.org

---
Mark Brown (40):
mm: Introduce ARCH_HAS_USER_SHADOW_STACK
mm: Define VM_HIGH_ARCH_6
arm64/mm: Restructure arch_validate_flags() for extensibility
prctl: arch-agnostic prctl for shadow stack
mman: Add map_shadow_stack() flags
arm64: Document boot requirements for Guarded Control Stacks
arm64/gcs: Document the ABI for Guarded Control Stacks
arm64/sysreg: Add definitions for architected GCS caps
arm64/gcs: Add manual encodings of GCS instructions
arm64/gcs: Provide put_user_gcs()
arm64/gcs: Provide basic EL2 setup to allow GCS usage at EL0 and EL1
arm64/cpufeature: Runtime detection of Guarded Control Stack (GCS)
arm64/mm: Allocate PIE slots for EL0 guarded control stack
mm: Define VM_SHADOW_STACK for arm64 when we support GCS
arm64/mm: Map pages for guarded control stack
KVM: arm64: Manage GCS access and registers for guests
arm64/idreg: Add overrride for GCS
arm64/hwcap: Add hwcap for GCS
arm64/traps: Handle GCS exceptions
arm64/mm: Handle GCS data aborts
arm64/gcs: Context switch GCS state for EL0
arm64/gcs: Ensure that new threads have a GCS
arm64/gcs: Implement shadow stack prctl() interface
arm64/mm: Implement map_shadow_stack()
arm64/signal: Set up and restore the GCS context for signal handlers
arm64/signal: Expose GCS state in signal frames
arm64/ptrace: Expose GCS via ptrace and core files
arm64: Add Kconfig for Guarded Control Stack (GCS)
kselftest/arm64: Verify the GCS hwcap
kselftest/arm64: Add GCS as a detected feature in the signal tests
kselftest/arm64: Add framework support for GCS to signal handling tests
kselftest/arm64: Allow signals tests to specify an expected si_code
kselftest/arm64: Always run signals tests with GCS enabled
kselftest/arm64: Add very basic GCS test program
kselftest/arm64: Add a GCS test program built with the system libc
kselftest/arm64: Add test coverage for GCS mode locking
kselftest/arm64: Add GCS signal tests
kselftest/arm64: Add a GCS stress test
kselftest/arm64: Enable GCS for the FP stress tests
KVM: selftests: arm64: Add GCS registers to get-reg-list

Documentation/admin-guide/kernel-parameters.txt | 3 +
Documentation/arch/arm64/booting.rst | 32 +
Documentation/arch/arm64/elf_hwcaps.rst | 4 +
Documentation/arch/arm64/gcs.rst | 230 +++++++
Documentation/arch/arm64/index.rst | 1 +
Documentation/filesystems/proc.rst | 2 +-
arch/arm64/Kconfig | 21 +
arch/arm64/include/asm/cpufeature.h | 6 +
arch/arm64/include/asm/el2_setup.h | 30 +
arch/arm64/include/asm/esr.h | 28 +-
arch/arm64/include/asm/exception.h | 2 +
arch/arm64/include/asm/gcs.h | 107 +++
arch/arm64/include/asm/hwcap.h | 1 +
arch/arm64/include/asm/kvm_host.h | 12 +
arch/arm64/include/asm/mman.h | 23 +-
arch/arm64/include/asm/mmu_context.h | 9 +
arch/arm64/include/asm/pgtable-prot.h | 14 +-
arch/arm64/include/asm/processor.h | 7 +
arch/arm64/include/asm/sysreg.h | 20 +
arch/arm64/include/asm/uaccess.h | 40 ++
arch/arm64/include/asm/vncr_mapping.h | 2 +
arch/arm64/include/uapi/asm/hwcap.h | 3 +-
arch/arm64/include/uapi/asm/ptrace.h | 8 +
arch/arm64/include/uapi/asm/sigcontext.h | 9 +
arch/arm64/kernel/cpufeature.c | 23 +
arch/arm64/kernel/cpuinfo.c | 1 +
arch/arm64/kernel/entry-common.c | 23 +
arch/arm64/kernel/pi/idreg-override.c | 2 +
arch/arm64/kernel/process.c | 94 +++
arch/arm64/kernel/ptrace.c | 62 +-
arch/arm64/kernel/signal.c | 227 ++++++-
arch/arm64/kernel/traps.c | 11 +
arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 31 +
arch/arm64/kvm/sys_regs.c | 27 +-
arch/arm64/mm/Makefile | 1 +
arch/arm64/mm/fault.c | 40 ++
arch/arm64/mm/gcs.c | 254 +++++++
arch/arm64/mm/mmap.c | 9 +-
arch/arm64/tools/cpucaps | 1 +
arch/x86/Kconfig | 1 +
arch/x86/include/uapi/asm/mman.h | 3 -
fs/proc/task_mmu.c | 2 +-
include/linux/mm.h | 18 +-
include/uapi/asm-generic/mman.h | 4 +
include/uapi/linux/elf.h | 1 +
include/uapi/linux/prctl.h | 22 +
kernel/sys.c | 30 +
mm/Kconfig | 6 +
tools/testing/selftests/arm64/Makefile | 2 +-
tools/testing/selftests/arm64/abi/hwcap.c | 19 +
tools/testing/selftests/arm64/fp/assembler.h | 15 +
tools/testing/selftests/arm64/fp/fpsimd-test.S | 2 +
tools/testing/selftests/arm64/fp/sve-test.S | 2 +
tools/testing/selftests/arm64/fp/za-test.S | 2 +
tools/testing/selftests/arm64/fp/zt-test.S | 2 +
tools/testing/selftests/arm64/gcs/.gitignore | 5 +
tools/testing/selftests/arm64/gcs/Makefile | 24 +
tools/testing/selftests/arm64/gcs/asm-offsets.h | 0
tools/testing/selftests/arm64/gcs/basic-gcs.c | 357 ++++++++++
tools/testing/selftests/arm64/gcs/gcs-locking.c | 200 ++++++
.../selftests/arm64/gcs/gcs-stress-thread.S | 311 +++++++++
tools/testing/selftests/arm64/gcs/gcs-stress.c | 530 +++++++++++++++
tools/testing/selftests/arm64/gcs/gcs-util.h | 100 +++
tools/testing/selftests/arm64/gcs/libc-gcs.c | 728 +++++++++++++++++++++
tools/testing/selftests/arm64/signal/.gitignore | 1 +
.../testing/selftests/arm64/signal/test_signals.c | 17 +-
.../testing/selftests/arm64/signal/test_signals.h | 6 +
.../selftests/arm64/signal/test_signals_utils.c | 32 +-
.../selftests/arm64/signal/test_signals_utils.h | 39 ++
.../arm64/signal/testcases/gcs_exception_fault.c | 62 ++
.../selftests/arm64/signal/testcases/gcs_frame.c | 88 +++
.../arm64/signal/testcases/gcs_write_fault.c | 67 ++
.../selftests/arm64/signal/testcases/testcases.c | 7 +
.../selftests/arm64/signal/testcases/testcases.h | 1 +
tools/testing/selftests/kvm/aarch64/get-reg-list.c | 28 +
75 files changed, 4120 insertions(+), 34 deletions(-)
---
base-commit: 9852d85ec9d492ebef56dc5f229416c925758edc
change-id: 20230303-arm64-gcs-e311ab0d8729

Best regards,
--
Mark Brown <broonie(a)kernel.org>

[PATCH net-next 00/10] selftests: net: Introduce deferred commands

by Petr Machata

Recently, a defer helper was added to Python selftests. The idea is to keep cleanup commands close to their dirtying counterparts, thereby making it more transparent what is cleaning up what, making it harder to miss a cleanup, and making the whole cleanup business exception safe. All these benefits are applicable to bash as well; exception safety can be interpreted in terms of safety vs. a SIGINT.

This patchset therefore introduces a framework of several helpers that serve to schedule cleanups in bash selftests.

- Patch #1 has more details about the primitives being introduced.
- Patch #2 adds a fallback cleanup() function to lib.sh, because ideally selftests wouldn't need to introduce a dedicated cleanup function at all.
- Patch #3 adds a parameter to stop_traffic(), which makes it possible to start other background processes after the traffic is started without confusing the cleanup.
- Patches #4 to #10 convert a number of selftests.

The goal was to convert all tests that use start_traffic / stop_traffic to the defer framework. Leftover traffic generators are a particularly painful sort of missed cleanup. Normal unfinished cleanups can usually be cleaned up simply by rerunning the test and interrupting it early to let the cleanups run again / in full. This does not work with stop_traffic, because it is only issued at the end of the test case that starts the traffic. At the same time, leftover traffic generators influence follow-up test runs, and are hard to notice.

The tests were, however, converted wholesale, not just their traffic bits. Thus they form a proof of concept of the defer framework.
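The defer idea is language-independent: each setup step registers its undo immediately, and the registered cleanups run newest-first. A minimal C sketch of that LIFO scheme follows. It assumes the usual defer semantics rather than reproducing defer.sh, and it omits the priority track and scopes mentioned in the changelog; defer(), defer_run() and log_cleanup() are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

#define DEFER_MAX 32

typedef void (*cleanup_fn)(void *arg);

/* LIFO stack of registered cleanups. */
struct defer_stack {
	struct {
		cleanup_fn fn;
		void *arg;
	} items[DEFER_MAX];
	size_t n;
};

/* Register a cleanup right after the step it undoes. Returns 0, or -1
 * if the stack is full. */
int defer(struct defer_stack *ds, cleanup_fn fn, void *arg)
{
	if (ds->n == DEFER_MAX)
		return -1;
	ds->items[ds->n].fn = fn;
	ds->items[ds->n].arg = arg;
	ds->n++;
	return 0;
}

/* Run and pop every registered cleanup, newest first. */
void defer_run(struct defer_stack *ds)
{
	while (ds->n > 0) {
		ds->n--;
		ds->items[ds->n].fn(ds->items[ds->n].arg);
	}
}

/* Example cleanup: append the character pointed to by `arg` to a log,
 * so the order in which cleanups ran can be inspected. */
char cleanup_log[DEFER_MAX + 1];

void log_cleanup(void *arg)
{
	size_t len = strlen(cleanup_log);

	if (len < DEFER_MAX) {
		cleanup_log[len] = *(const char *)arg;
		cleanup_log[len + 1] = '\0';
	}
}
```

Registering cleanups a, b and c and then calling defer_run() runs them as c, b, a, so later setup steps are undone before the earlier steps they depend on.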
v1 (from the RFC):
- Patch #1:
  - Added the priority defer track
  - Dropped defer_scoped_fn, added in_defer_scope
  - Extracted to a separate independent module
- Patch #2:
  - Moved this bit to a separate patch
- Patch #3:
  - New patch
- Patch #4 (RED):
  - Squashed the individual RED-related patches into one
  - Converted the SW datapath RED selftest as well
- Patch #5 (TBF):
  - Fully converted the selftest, not just stop_traffic
- Patches #6, #7, #8, #9, #10:
  - New patch

Petr Machata (10):
selftests: net: lib: Introduce deferred commands
selftests: forwarding: Add a fallback cleanup()
selftests: forwarding: lib: Allow passing PID to stop_traffic()
selftests: RED: Use defer for test cleanup
selftests: TBF: Use defer for test cleanup
selftests: ETS: Use defer for test cleanup
selftests: mlxsw: qos_mc_aware: Use defer for test cleanup
selftests: mlxsw: qos_ets_strict: Use defer for test cleanup
selftests: mlxsw: qos_max_descriptors: Use defer for test cleanup
selftests: mlxsw: devlink_trap_police: Use defer for test cleanup

.../drivers/net/mlxsw/devlink_trap_policer.sh | 85 ++++-----
.../drivers/net/mlxsw/qos_ets_strict.sh | 167 ++++++++---------
.../drivers/net/mlxsw/qos_max_descriptors.sh | 118 +++++-------
.../drivers/net/mlxsw/qos_mc_aware.sh | 146 +++++++--------
.../selftests/drivers/net/mlxsw/sch_ets.sh | 26 ++-
.../drivers/net/mlxsw/sch_red_core.sh | 171 +++++++++---------
.../drivers/net/mlxsw/sch_red_ets.sh | 24 +--
.../drivers/net/mlxsw/sch_red_root.sh | 18 +-
tools/testing/selftests/net/forwarding/lib.sh | 13 +-
.../selftests/net/forwarding/sch_ets.sh | 7 +-
.../selftests/net/forwarding/sch_ets_core.sh | 81 +++------
.../selftests/net/forwarding/sch_ets_tests.sh | 14 +-
.../selftests/net/forwarding/sch_red.sh | 103 ++++-------
.../selftests/net/forwarding/sch_tbf_core.sh | 91 +++-------
.../net/forwarding/sch_tbf_etsprio.sh | 7 +-
.../selftests/net/forwarding/sch_tbf_root.sh | 3 +-
tools/testing/selftests/net/lib.sh | 3 +
tools/testing/selftests/net/lib/Makefile | 2 +-
tools/testing/selftests/net/lib/sh/defer.sh | 115 ++++++++++++
19 files changed, 587 insertions(+), 607 deletions(-)
create mode 100644 tools/testing/selftests/net/lib/sh/defer.sh
--
2.45.0

1 year, 2 months


[RFC PATCH 0/4] implement lightweight guard pages

by Lorenzo Stoakes

Userland library functions such as allocators and threading implementations often require regions of memory to act as 'guard pages' - mappings which, when accessed, result in a fatal signal being sent to the accessing process. These are currently implemented via PROT_NONE mmap() mappings, which provide the required semantics but incur the overhead of a VMA for each such region. With a great many processes and threads, this can rapidly add up to a significant memory penalty. It also prevents VMA merges that might otherwise be permitted. This series takes a different approach - an idea suggested by Vlastimil Babka (and before him David Hildenbrand and Jann Horn - perhaps more - the provenance becomes a little tricky to ascertain after this - please forgive any omissions!) - rather than locating the guard pages at the VMA layer, it places them in the page tables mapping the required ranges. Early testing of the prototype version of this code suggests a fivefold speed-up in memory mapping invocations (in conjunction with use of process_madvise()) and a 13% reduction in VMAs on an entirely idle Android system with unoptimised code. We expect these gains could increase significantly with optimisation and on a loaded system with a larger number of guard pages, but in any case these numbers are encouraging. Rather than having separate VMAs specifying which parts of a range are guard pages, we instead have a single VMA spanning the entire range of memory a user is permitted to access, including the ranges which are to be 'guarded'. After mapping this, a user can specify which parts of the range should result in a fatal signal when accessed. By restricting the ability to specify guard pages to memory mapped by existing VMAs, we can rely on the page table mappings being torn down when the memory is ultimately unmapped, and everything works simply as if the memory were not faulted in, from the point of view of the containing VMAs.
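To make the VMA cost described above concrete, here is a minimal userspace sketch (not part of the series) that uses only the existing mmap()/mprotect() approach: it maps a range read/write, turns every other page into a PROT_NONE guard, and counts how many VMAs listed in /proc/self/maps now cover the range. The helper names are hypothetical, and the sketch assumes a Linux system with /proc mounted.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Count the VMAs in /proc/self/maps that intersect [start, start + len).
 * Returns -1 if /proc/self/maps cannot be opened.
 */
static int count_vmas_in_range(void *start, size_t len)
{
	unsigned long lo = (unsigned long)start;
	unsigned long hi = lo + len;
	unsigned long vm_start, vm_end;
	int count = 0;
	FILE *f = fopen("/proc/self/maps", "r");

	if (!f)
		return -1;
	/* Each line starts "start-end perms ..."; skip the rest of the line. */
	while (fscanf(f, "%lx-%lx%*[^\n]", &vm_start, &vm_end) == 2) {
		if (vm_start < hi && vm_end > lo)
			count++;
	}
	fclose(f);
	return count;
}

/*
 * Map npages read/write pages and make every odd-numbered page a
 * PROT_NONE guard, the way a userspace allocator might do today.
 * Returns the mapping, or NULL on failure.
 */
static void *map_with_prot_none_guards(size_t npages)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	size_t len = npages * page;
	char *base = mmap(NULL, len, PROT_READ | PROT_WRITE,
			  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (base == MAP_FAILED)
		return NULL;
	for (size_t i = 1; i < npages; i += 2) {
		/* Each guard splits the mapping into more VMAs. */
		if (mprotect(base + i * page, page, PROT_NONE)) {
			munmap(base, len);
			return NULL;
		}
	}
	return base;
}
```

With 8 pages and 4 guards, the range ends up split into 8 VMAs; under the page-table approach described above, the same layout would remain covered by a single VMA.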
This mechanism in effect poisons memory ranges similarly to hardware memory poisoning, only it is an entirely software-controlled form of poisoning. Any poisoned region of memory can also be 'unpoisoned', that is, have its poison markers removed. The mechanism is implemented via madvise() behaviours: MADV_GUARD_POISON, which poisons ranges, and MADV_GUARD_UNPOISON, which clears this poisoning. Poisoning can be performed across multiple VMAs, and any existing mappings will be cleared (that is, zapped) before the poisoned page table mappings are installed. There is no concept of 'nested' poisoning: after the first poisoning, further attempts to poison a range have no effect. Importantly, unpoisoning has no effect on non-poisoned memory, so a user can safely unpoison a range of memory and clear only the poison page table mappings, leaving the rest intact. The page table entries are specified using existing logic: PTE markers, which are used for the userfaultfd UFFDIO_POISON mechanism. Unfortunately PTE_MARKER_POISONED is not suited to the guard page mechanism as it results in VM_FAULT_HWPOISON semantics in the fault handler, so we add our own specific PTE_MARKER_GUARD and adapt existing logic to handle it. We also extend the generic page walk mechanism to allow for installation of PTEs (carefully restricted to memory management logic only, to prevent unwanted abuse). We ensure that zapping performed by, for instance, MADV_DONTNEED does not remove guard poison markers, nor does forking (except when VM_WIPEONFORK is specified for a VMA, which implies a total removal of memory characteristics). It's important to note that the guard page implementation is emphatically NOT a security feature, so a user can remove the poisoning if they wish. We simply implement it in such a way as to provide the least surprising behaviour.
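The marker semantics above can be summarised with a toy model. The following is a hypothetical userspace simulation, not kernel code and not the madvise() API: each array slot stands for one page, and the functions mimic the described effects of MADV_GUARD_POISON, MADV_GUARD_UNPOISON and an MADV_DONTNEED-style zap.

```c
#include <assert.h>
#include <stddef.h>

/* Toy per-page state: unpopulated, faulted in, or carrying a guard marker. */
enum page_state { PAGE_NONE, PAGE_POPULATED, PAGE_GUARD };

/* Poison: zap whatever is there and install a guard marker. Idempotent. */
static void model_guard_poison(enum page_state *pages, size_t start, size_t n)
{
	for (size_t i = start; i < start + n; i++)
		pages[i] = PAGE_GUARD;
}

/* Unpoison: clear guard markers only; other pages are left intact. */
static void model_guard_unpoison(enum page_state *pages, size_t start, size_t n)
{
	for (size_t i = start; i < start + n; i++)
		if (pages[i] == PAGE_GUARD)
			pages[i] = PAGE_NONE;
}

/* MADV_DONTNEED-style zap: drops populated pages but keeps guard markers. */
static void model_dontneed(enum page_state *pages, size_t start, size_t n)
{
	for (size_t i = start; i < start + n; i++)
		if (pages[i] == PAGE_POPULATED)
			pages[i] = PAGE_NONE;
}

/* Touch a page: fault it in, or return -1 where the kernel would signal. */
static int model_touch(enum page_state *pages, size_t i)
{
	if (pages[i] == PAGE_GUARD)
		return -1;
	pages[i] = PAGE_POPULATED;
	return 0;
}
```

In this model a single unpoison clears a range that was poisoned twice, reflecting the absence of nested poisoning, while a populated page inside the unpoisoned range stays populated.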
An extensive set of self-tests is provided, which ensures behaviour is as expected and additionally self-documents the expected behaviour of poisoned ranges. Suggested-by: Vlastimil Babka <vbabka(a)suse.cz> Suggested-by: Jann Horn <jannh(a)google.com> Suggested-by: David Hildenbrand <david(a)redhat.com> Lorenzo Stoakes (4): mm: pagewalk: add the ability to install PTEs mm: add PTE_MARKER_GUARD PTE marker mm: madvise: implement lightweight guard page mechanism selftests/mm: add self tests for guard page feature arch/alpha/include/uapi/asm/mman.h | 3 + arch/mips/include/uapi/asm/mman.h | 3 + arch/parisc/include/uapi/asm/mman.h | 3 + arch/xtensa/include/uapi/asm/mman.h | 3 + include/linux/mm_inline.h | 2 +- include/linux/pagewalk.h | 18 +- include/linux/swapops.h | 26 +- include/uapi/asm-generic/mman-common.h | 3 + mm/hugetlb.c | 3 + mm/internal.h | 6 + mm/madvise.c | 158 +++ mm/memory.c | 18 +- mm/mprotect.c | 3 +- mm/mseal.c | 1 + mm/pagewalk.c | 174 ++-- tools/testing/selftests/mm/.gitignore | 1 + tools/testing/selftests/mm/Makefile | 1 + tools/testing/selftests/mm/guard-pages.c | 1168 ++++++++++++++++++++++ 18 files changed, 1525 insertions(+), 69 deletions(-) create mode 100644 tools/testing/selftests/mm/guard-pages.c -- 2.46.2

1 year, 2 months


[PATCH -next] selftests/cgroup: Fix compile error in test_cpu.c

by Xiu Jianfeng

From: Xiu Jianfeng <xiujianfeng(a)huawei.com> When compiling the cgroup selftests with the following command: make -C tools/testing/selftests/cgroup/ the compiler complains as below: test_cpu.c: In function ‘test_cpucg_nice’: test_cpu.c:284:39: error: incompatible type for argument 2 of ‘hog_cpus_timed’ 284 | hog_cpus_timed(cpucg, param); | ^~~~~ | | | struct cpu_hog_func_param test_cpu.c:132:53: note: expected ‘void *’ but argument is of type ‘struct cpu_hog_func_param’ 132 | static int hog_cpus_timed(const char *cgroup, void *arg) | ~~~~~~^~~ Fix it by passing the address of param to hog_cpus_timed(). Fixes: 2e82c0d4562a ("cgroup/rstat: Selftests for niced CPU statistics") Signed-off-by: Xiu Jianfeng <xiujianfeng(a)huawei.com> --- tools/testing/selftests/cgroup/test_cpu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/cgroup/test_cpu.c b/tools/testing/selftests/cgroup/test_cpu.c index 201ce14cb422..a2b50af8e9ee 100644 --- a/tools/testing/selftests/cgroup/test_cpu.c +++ b/tools/testing/selftests/cgroup/test_cpu.c @@ -281,7 +281,7 @@ static int test_cpucg_nice(const char *root) /* Try to keep niced CPU usage as constrained to hog_cpu as possible */ nice(1); - hog_cpus_timed(cpucg, param); + hog_cpus_timed(cpucg, &param); exit(0); } else { waitpid(pid, &status, 0); -- 2.34.1
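For readers less familiar with the idiom behind this fix, a minimal sketch: any object pointer converts implicitly to void *, but a struct value does not, so the caller has to pass &param and the callee converts the pointer back to the concrete type. The names below (struct hog_param, fake_hog_cpus_timed(), run_in_cgroup()) are hypothetical stand-ins, not the actual cgroup selftest helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct cpu_hog_func_param. */
struct hog_param {
	int nprocs;
	long runtime_ms;
};

/*
 * Callback with the same shape as hog_cpus_timed(): it receives a generic
 * void *arg and converts it back to the concrete parameter type.
 */
static int fake_hog_cpus_timed(const char *cgroup, void *arg)
{
	const struct hog_param *param = arg;

	if (!cgroup || !param)
		return -1;
	return param->nprocs;
}

/* Hypothetical generic runner that only knows about void *. */
static int run_in_cgroup(const char *cgroup,
			 int (*fn)(const char *, void *), void *arg)
{
	return fn(cgroup, arg);
}
```

Calling fake_hog_cpus_timed("test", param) with the struct itself would be rejected at compile time, just like the error quoted above; passing &param works both directly and through the generic runner.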

1 year, 2 months


[PATCH v2 net-next] selftests: tc-testing: Fixed Typo error

by Karan Sanghavi

This commit combines two fixes for typographical errors in the "name" fields of the JSON objects with IDs "4319" and "4341" in the tc-testing selftest files tc-tests/filters/cgroup.json and tc-tests/filters/flow.json. v2: - Combine two earlier patches into one - Links to v1 of each patch [1] https://lore.kernel.org/all/Zqp9asVA-q_OzDP-@Emma/ [2] https://lore.kernel.org/all/Zqp92oXa9joXk4T9@Emma/ Signed-off-by: Karan Sanghavi <karansanghvi98(a)gmail.com> --- tools/testing/selftests/tc-testing/tc-tests/filters/cgroup.json | 2 +- tools/testing/selftests/tc-testing/tc-tests/filters/flow.json | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/tc-testing/tc-tests/filters/cgroup.json b/tools/testing/selftests/tc-testing/tc-tests/filters/cgroup.json index 03723cf84..6897ff5ad 100644 --- a/tools/testing/selftests/tc-testing/tc-tests/filters/cgroup.json +++ b/tools/testing/selftests/tc-testing/tc-tests/filters/cgroup.json @@ -1189,7 +1189,7 @@ }, { "id": "4319", - "name": "Replace cgroup filter with diffferent match", + "name": "Replace cgroup filter with different match", "category": [ "filter", "cgroup" diff --git a/tools/testing/selftests/tc-testing/tc-tests/filters/flow.json b/tools/testing/selftests/tc-testing/tc-tests/filters/flow.json index 58189327f..996448afe 100644 --- a/tools/testing/selftests/tc-testing/tc-tests/filters/flow.json +++ b/tools/testing/selftests/tc-testing/tc-tests/filters/flow.json @@ -507,7 +507,7 @@ }, { "id": "4341", - "name": "Add flow filter with muliple ops", + "name": "Add flow filter with multiple ops", "category": [ "filter", "flow" -- 2.43.0

1 year, 2 months


[PATCH] selftests/intel_pstate: fix operand expected

by Alessandro Zanni

This fixes the following errors, seen when running kselftest with the "intel_pstate" target: - ./run.sh: line 90: / 1000: syntax error: operand expected (error token is "/ 1000") - ./run.sh: line 92: / 1000: syntax error: operand expected (error token is "/ 1000") The errors occur when the value read from cpupower is empty: "$_min_freq" then expands to nothing, leaving "$(( / 1000))". Referring to the variable by name inside $(( )) avoids this, because bash evaluates a null or unset variable referenced that way as 0. The errors were found by running the tests manually with the command: make kselftest TARGETS=intel_pstate Signed-off-by: Alessandro Zanni <alessandro.zanni87(a)gmail.com> --- tools/testing/selftests/intel_pstate/run.sh | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/intel_pstate/run.sh b/tools/testing/selftests/intel_pstate/run.sh index e7008f614ad7..39130a359535 100755 --- a/tools/testing/selftests/intel_pstate/run.sh +++ b/tools/testing/selftests/intel_pstate/run.sh @@ -87,9 +87,11 @@ mkt_freq=${_mkt_freq}0 # Get the ranges from cpupower _min_freq=$(cpupower frequency-info -l | tail -1 | awk ' { print $1 } ') -min_freq=$(($_min_freq / 1000)) +min_freq=$((_min_freq / 1000)) +echo "min_freq:" +echo $min_freq _max_freq=$(cpupower frequency-info -l | tail -1 | awk ' { print $2 } ') -max_freq=$(($_max_freq / 1000)) +max_freq=$((_max_freq / 1000)) [ $EVALUATE_ONLY -eq 0 ] && for freq in `seq $max_freq -100 $min_freq` -- 2.43.0

1 year, 2 months


[PATCH] selftests:timers: remove local CLOCKID defines

by Shuah Khan

The timers tests define CLOCKIDs locally. Remove all local CLOCKID defines except CLOCK_HWSPECIFIC and use the defines from the time.h header file instead. CLOCK_HWSPECIFIC and CLOCK_SGI_CYCLE are the same, and CLOCK_SGI_CYCLE is deprecated. Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org> --- tools/testing/selftests/timers/adjtick.c | 2 -- .../selftests/timers/alarmtimer-suspend.c | 15 --------------- .../selftests/timers/inconsistency-check.c | 19 ++++--------------- tools/testing/selftests/timers/nanosleep.c | 18 ++++-------------- tools/testing/selftests/timers/nsleep-lat.c | 19 ++++--------------- tools/testing/selftests/timers/raw_skew.c | 2 -- .../testing/selftests/timers/set-timer-lat.c | 16 +++------------- 7 files changed, 15 insertions(+), 76 deletions(-) diff --git a/tools/testing/selftests/timers/adjtick.c b/tools/testing/selftests/timers/adjtick.c index cb9a30f54662..777d9494b683 100644 --- a/tools/testing/selftests/timers/adjtick.c +++ b/tools/testing/selftests/timers/adjtick.c @@ -26,8 +26,6 @@ #include "../kselftest.h" -#define CLOCK_MONOTONIC_RAW 4 - #define MILLION 1000000 long systick; diff --git a/tools/testing/selftests/timers/alarmtimer-suspend.c b/tools/testing/selftests/timers/alarmtimer-suspend.c index 62da2a3f949e..2da382df5eaa 100644 --- a/tools/testing/selftests/timers/alarmtimer-suspend.c +++ b/tools/testing/selftests/timers/alarmtimer-suspend.c @@ -31,21 +31,6 @@ #include <include/vdso/time64.h> #include "../kselftest.h" -#define CLOCK_REALTIME 0 -#define CLOCK_MONOTONIC 1 -#define CLOCK_PROCESS_CPUTIME_ID 2 -#define CLOCK_THREAD_CPUTIME_ID 3 -#define CLOCK_MONOTONIC_RAW 4 -#define CLOCK_REALTIME_COARSE 5 -#define CLOCK_MONOTONIC_COARSE 6 -#define CLOCK_BOOTTIME 7 -#define CLOCK_REALTIME_ALARM 8 -#define CLOCK_BOOTTIME_ALARM 9 -#define CLOCK_HWSPECIFIC 10 -#define CLOCK_TAI 11 -#define NR_CLOCKIDS 12 - - #define UNREASONABLE_LAT (NSEC_PER_SEC * 5) /* hopefully we resume in 5 secs */ #define SUSPEND_SECS 15 diff --git
a/tools/testing/selftests/timers/inconsistency-check.c b/tools/testing/selftests/timers/inconsistency-check.c index 75650cf0503f..9d1573769d55 100644 --- a/tools/testing/selftests/timers/inconsistency-check.c +++ b/tools/testing/selftests/timers/inconsistency-check.c @@ -31,21 +31,10 @@ #include <include/vdso/time64.h> #include "../kselftest.h" -#define CALLS_PER_LOOP 64 - -#define CLOCK_REALTIME 0 -#define CLOCK_MONOTONIC 1 -#define CLOCK_PROCESS_CPUTIME_ID 2 -#define CLOCK_THREAD_CPUTIME_ID 3 -#define CLOCK_MONOTONIC_RAW 4 -#define CLOCK_REALTIME_COARSE 5 -#define CLOCK_MONOTONIC_COARSE 6 -#define CLOCK_BOOTTIME 7 -#define CLOCK_REALTIME_ALARM 8 -#define CLOCK_BOOTTIME_ALARM 9 +/* CLOCK_HWSPECIFIC == CLOCK_SGI_CYCLE (Deprecated) */ #define CLOCK_HWSPECIFIC 10 -#define CLOCK_TAI 11 -#define NR_CLOCKIDS 12 + +#define CALLS_PER_LOOP 64 char *clockstring(int clockid) { @@ -152,7 +141,7 @@ int main(int argc, char *argv[]) { int clockid, opt; int userclock = CLOCK_REALTIME; - int maxclocks = NR_CLOCKIDS; + int maxclocks = CLOCK_TAI + 1; int runtime = 10; struct timespec ts; diff --git a/tools/testing/selftests/timers/nanosleep.c b/tools/testing/selftests/timers/nanosleep.c index 9a354e38a569..252c6308c569 100644 --- a/tools/testing/selftests/timers/nanosleep.c +++ b/tools/testing/selftests/timers/nanosleep.c @@ -30,19 +30,8 @@ #include <include/vdso/time64.h> #include "../kselftest.h" -#define CLOCK_REALTIME 0 -#define CLOCK_MONOTONIC 1 -#define CLOCK_PROCESS_CPUTIME_ID 2 -#define CLOCK_THREAD_CPUTIME_ID 3 -#define CLOCK_MONOTONIC_RAW 4 -#define CLOCK_REALTIME_COARSE 5 -#define CLOCK_MONOTONIC_COARSE 6 -#define CLOCK_BOOTTIME 7 -#define CLOCK_REALTIME_ALARM 8 -#define CLOCK_BOOTTIME_ALARM 9 +/* CLOCK_HWSPECIFIC == CLOCK_SGI_CYCLE (Deprecated) */ #define CLOCK_HWSPECIFIC 10 -#define CLOCK_TAI 11 -#define NR_CLOCKIDS 12 #define UNSUPPORTED 0xf00f @@ -131,11 +120,12 @@ int main(int argc, char **argv) { long long length; int clockid, ret; + int max_clocks = CLOCK_TAI + 1; 
ksft_print_header(); - ksft_set_plan(NR_CLOCKIDS); + ksft_set_plan(max_clocks); - for (clockid = CLOCK_REALTIME; clockid < NR_CLOCKIDS; clockid++) { + for (clockid = CLOCK_REALTIME; clockid < max_clocks; clockid++) { /* Skip cputime clockids since nanosleep won't increment cputime */ if (clockid == CLOCK_PROCESS_CPUTIME_ID || diff --git a/tools/testing/selftests/timers/nsleep-lat.c b/tools/testing/selftests/timers/nsleep-lat.c index f6a99490b291..de23dc0c9f97 100644 --- a/tools/testing/selftests/timers/nsleep-lat.c +++ b/tools/testing/selftests/timers/nsleep-lat.c @@ -29,20 +29,8 @@ #define UNRESONABLE_LATENCY 40000000 /* 40ms in nanosecs */ - -#define CLOCK_REALTIME 0 -#define CLOCK_MONOTONIC 1 -#define CLOCK_PROCESS_CPUTIME_ID 2 -#define CLOCK_THREAD_CPUTIME_ID 3 -#define CLOCK_MONOTONIC_RAW 4 -#define CLOCK_REALTIME_COARSE 5 -#define CLOCK_MONOTONIC_COARSE 6 -#define CLOCK_BOOTTIME 7 -#define CLOCK_REALTIME_ALARM 8 -#define CLOCK_BOOTTIME_ALARM 9 +/* CLOCK_HWSPECIFIC == CLOCK_SGI_CYCLE (Deprecated) */ #define CLOCK_HWSPECIFIC 10 -#define CLOCK_TAI 11 -#define NR_CLOCKIDS 12 #define UNSUPPORTED 0xf00f @@ -144,11 +132,12 @@ int main(int argc, char **argv) { long long length; int clockid, ret; + int max_clocks = CLOCK_TAI + 1; ksft_print_header(); - ksft_set_plan(NR_CLOCKIDS - CLOCK_REALTIME - SKIPPED_CLOCK_COUNT); + ksft_set_plan(max_clocks - CLOCK_REALTIME - SKIPPED_CLOCK_COUNT); - for (clockid = CLOCK_REALTIME; clockid < NR_CLOCKIDS; clockid++) { + for (clockid = CLOCK_REALTIME; clockid < max_clocks; clockid++) { /* Skip cputime clockids since nanosleep won't increment cputime */ if (clockid == CLOCK_PROCESS_CPUTIME_ID || diff --git a/tools/testing/selftests/timers/raw_skew.c b/tools/testing/selftests/timers/raw_skew.c index ea50e4efc422..957f7cd29cb1 100644 --- a/tools/testing/selftests/timers/raw_skew.c +++ b/tools/testing/selftests/timers/raw_skew.c @@ -28,8 +28,6 @@ #include <include/vdso/time64.h> #include "../kselftest.h" -#define CLOCK_MONOTONIC_RAW 4 - 
#define shift_right(x, s) ({ \ __typeof__(x) __x = (x); \ __typeof__(s) __s = (s); \ diff --git a/tools/testing/selftests/timers/set-timer-lat.c b/tools/testing/selftests/timers/set-timer-lat.c index 5365e9ae61c3..4574f8f04542 100644 --- a/tools/testing/selftests/timers/set-timer-lat.c +++ b/tools/testing/selftests/timers/set-timer-lat.c @@ -31,19 +31,8 @@ #include <include/vdso/time64.h> #include "../kselftest.h" -#define CLOCK_REALTIME 0 -#define CLOCK_MONOTONIC 1 -#define CLOCK_PROCESS_CPUTIME_ID 2 -#define CLOCK_THREAD_CPUTIME_ID 3 -#define CLOCK_MONOTONIC_RAW 4 -#define CLOCK_REALTIME_COARSE 5 -#define CLOCK_MONOTONIC_COARSE 6 -#define CLOCK_BOOTTIME 7 -#define CLOCK_REALTIME_ALARM 8 -#define CLOCK_BOOTTIME_ALARM 9 +/* CLOCK_HWSPECIFIC == CLOCK_SGI_CYCLE (Deprecated) */ #define CLOCK_HWSPECIFIC 10 -#define CLOCK_TAI 11 -#define NR_CLOCKIDS 12 #define UNRESONABLE_LATENCY 40000000 /* 40ms in nanosecs */ @@ -253,6 +242,7 @@ int main(void) struct sigaction act; int signum = SIGRTMAX; int ret = 0; + int max_clocks = CLOCK_TAI + 1; /* Set up signal handler: */ sigfillset(&act.sa_mask); @@ -261,7 +251,7 @@ int main(void) sigaction(signum, &act, NULL); printf("Setting timers for every %i seconds\n", TIMER_SECS); - for (clock_id = 0; clock_id < NR_CLOCKIDS; clock_id++) { + for (clock_id = 0; clock_id < max_clocks; clock_id++) { if ((clock_id == CLOCK_PROCESS_CPUTIME_ID) || (clock_id == CLOCK_THREAD_CPUTIME_ID) || -- 2.40.1
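As a sketch of what the patch relies on, the loop below takes its bound from CLOCK_TAI + 1 (as the patched tests do with max_clocks) and keeps only CLOCK_HWSPECIFIC as a local define. It assumes a glibc or musl <time.h> that provides CLOCK_TAI; count_supported_clocks() is a hypothetical helper, and which clock IDs clock_getres() accepts varies by kernel and configuration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <time.h>

/* Kept locally, as in the patch: <time.h> does not provide this name. */
#ifndef CLOCK_HWSPECIFIC
#define CLOCK_HWSPECIFIC 10 /* same value as the deprecated CLOCK_SGI_CYCLE */
#endif

/*
 * Walk every clock ID from CLOCK_REALTIME up to and including CLOCK_TAI,
 * using the <time.h> defines instead of local copies, and count how many
 * of them clock_getres() accepts on this system.
 */
static int count_supported_clocks(void)
{
	int max_clocks = CLOCK_TAI + 1;
	int supported = 0;
	struct timespec res;

	for (int clockid = CLOCK_REALTIME; clockid < max_clocks; clockid++) {
		if (clockid == CLOCK_HWSPECIFIC)
			continue; /* deprecated CLOCK_SGI_CYCLE slot */
		if (clock_getres(clockid, &res) == 0)
			supported++;
	}
	return supported;
}
```

Deriving the bound from CLOCK_TAI keeps the loop in sync with the system header, so there is no local NR_CLOCKIDS to drift out of date.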

1 year, 2 months


[PATCH] selftests: timers: Remove unneeded semicolon

by Chen Ni

Remove unnecessary semicolons reported by Coccinelle/coccicheck and the semantic patch at scripts/coccinelle/misc/semicolon.cocci. Signed-off-by: Chen Ni <nichen(a)iscas.ac.cn> --- tools/testing/selftests/timers/set-timer-lat.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/timers/set-timer-lat.c b/tools/testing/selftests/timers/set-timer-lat.c index 5365e9ae61c3..7a1a2382538c 100644 --- a/tools/testing/selftests/timers/set-timer-lat.c +++ b/tools/testing/selftests/timers/set-timer-lat.c @@ -79,7 +79,7 @@ char *clockstring(int clockid) return "CLOCK_BOOTTIME_ALARM"; case CLOCK_TAI: return "CLOCK_TAI"; - }; + } return "UNKNOWN_CLOCKID"; } -- 2.25.1

1 year, 2 months


[RFC PATCH] selftest/tcp-ao: Add filter tests

by Leo Stone

Add tests that check if getsockopt(TCP_AO_GET_KEYS) returns the right keys when using different filters. Sample output: > # ok 114 filter keys: by sndid, rcvid, address > # ok 115 filter keys: by sndid, rcvid > # ok 116 filter keys: by is_current > # ok 117 filter keys: by is_rnext Signed-off-by: Leo Stone <leocstone(a)gmail.com> --- This patch is meant to address the TODO in setsockopt-closed.c: > /* > * TODO: check getsockopt(TCP_AO_GET_KEYS) with different filters > * returning proper nr & keys; > */ Is this a reasonable way to do these tests? If so, what cases should I add? --- .../selftests/net/tcp_ao/setsockopt-closed.c | 158 +++++++++++++++++- 1 file changed, 157 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/net/tcp_ao/setsockopt-closed.c b/tools/testing/selftests/net/tcp_ao/setsockopt-closed.c index 084db4ecdff6..4c8aa06eef5a 100644 --- a/tools/testing/selftests/net/tcp_ao/setsockopt-closed.c +++ b/tools/testing/selftests/net/tcp_ao/setsockopt-closed.c @@ -6,6 +6,8 @@ static union tcp_addr tcp_md5_client; +#define FILTER_TEST_NKEYS 16 + static int test_port = 7788; static void make_listen(int sk) { @@ -813,12 +815,166 @@ static void duplicate_tests(void) setsockopt_checked(sk, TCP_AO_ADD_KEY, &ao, EEXIST, "duplicate: SendID differs"); } + +static void fetch_all_keys(int sk, struct tcp_ao_getsockopt *keys) +{ + socklen_t optlen = sizeof(struct tcp_ao_getsockopt); + + memset(keys, 0, sizeof(struct tcp_ao_getsockopt) * FILTER_TEST_NKEYS); + keys[0].get_all = 1; + keys[0].nkeys = FILTER_TEST_NKEYS; + if (getsockopt(sk, IPPROTO_TCP, TCP_AO_GET_KEYS, &keys[0], &optlen)) + test_error("getsockopt"); +} + +static int prepare_test_keys(struct tcp_ao_getsockopt *keys) +{ + struct tcp_ao_add test_ao[FILTER_TEST_NKEYS]; + u8 rcvid = 100, sndid = 100; + const char *test_password = "Test password number "; + char test_password_scratch[64] = {}; + int sk = socket(test_family, SOCK_STREAM, IPPROTO_TCP); + + if (sk < 0) + test_error("socket()"); + + for 
(int i = 0; i < FILTER_TEST_NKEYS; i++) { + snprintf(test_password_scratch, 64, "%s %d", test_password, i); + test_prepare_key(&test_ao[i], DEFAULT_TEST_ALGO, this_ip_dest, false, false, + DEFAULT_TEST_PREFIX, 0, sndid++, rcvid++, 0, 0, + strlen(test_password_scratch), test_password_scratch); + } + test_ao[0].set_current = 1; + test_ao[1].set_rnext = 1; + /* One key with a different addr and overlapping sndid, rcvid */ + tcp_addr_to_sockaddr_in(&test_ao[2].addr, &this_ip_addr, 0); + test_ao[2].sndid = 100; + test_ao[2].rcvid = 100; + + /* Add keys in a random order */ + for (int i = 0; i < FILTER_TEST_NKEYS; i++) { + int randidx = rand() % (FILTER_TEST_NKEYS - i); + + if (setsockopt(sk, IPPROTO_TCP, TCP_AO_ADD_KEY, &test_ao[randidx], + sizeof(struct tcp_ao_add))) + test_error("setsockopt()"); + memcpy(&test_ao[randidx], &test_ao[FILTER_TEST_NKEYS - 1 - i], + sizeof(struct tcp_ao_add)); + } + + fetch_all_keys(sk, keys); + + return sk; +} + +/* Assumes passwords are unique */ +static int compare_mkts(struct tcp_ao_getsockopt *expected, int nexpected, + struct tcp_ao_getsockopt *actual, int nactual) +{ + int matches = 0; + + for (int i = 0; i < nexpected; i++) { + for (int j = 0; j < nactual; j++) { + if (memcmp(expected[i].key, actual[j].key, TCP_AO_MAXKEYLEN) == 0) + matches++; + } + } + return nexpected - matches; +} + +static void filter_keys_checked(int sk, struct tcp_ao_getsockopt *filter, + struct tcp_ao_getsockopt *expected, + unsigned int nexpected, const char *tst) +{ + struct tcp_ao_getsockopt all_keys[FILTER_TEST_NKEYS] = {}; + struct tcp_ao_getsockopt filtered_keys[FILTER_TEST_NKEYS] = {}; + socklen_t len = sizeof(struct tcp_ao_getsockopt); + + fetch_all_keys(sk, all_keys); + memcpy(&filtered_keys[0], filter, sizeof(struct tcp_ao_getsockopt)); + filtered_keys[0].nkeys = FILTER_TEST_NKEYS; + if (getsockopt(sk, IPPROTO_TCP, TCP_AO_GET_KEYS, filtered_keys, &len)) + test_error("getsockopt"); + if (filtered_keys[0].nkeys != nexpected) + test_error("wrong nr of 
keys, expected %u got %u", nexpected, + filtered_keys[0].nkeys); + if (compare_mkts(expected, nexpected, filtered_keys, filtered_keys[0].nkeys)) + test_error("got wrong keys back"); + test_ok("filter keys: %s", tst); + + close(sk); + memset(filter, 0, sizeof(struct tcp_ao_getsockopt)); +} + +static void filter_tests(void) +{ + struct tcp_ao_getsockopt original_keys[FILTER_TEST_NKEYS]; + struct tcp_ao_getsockopt expected_keys[FILTER_TEST_NKEYS]; + struct tcp_ao_getsockopt filter = {}; + int sk, f, nmatches; + + f = 2; + sk = prepare_test_keys(original_keys); + filter.rcvid = original_keys[f].rcvid; + filter.sndid = original_keys[f].sndid; + memcpy(&filter.addr, &original_keys[f].addr, sizeof(original_keys[f].addr)); + filter.prefix = original_keys[f].prefix; + filter_keys_checked(sk, &filter, &original_keys[f], 1, "by sndid, rcvid, address"); + + f = -1; + sk = prepare_test_keys(original_keys); + for (int i = 0; i < original_keys[0].nkeys; i++) { + if (original_keys[i].is_current) { + f = i; + break; + } + } + if (f < 0) + test_error("No current key after adding one"); + filter.is_current = 1; + filter_keys_checked(sk, &filter, &original_keys[f], 1, "by is_current"); + + f = -1; + sk = prepare_test_keys(original_keys); + for (int i = 0; i < original_keys[0].nkeys; i++) { + if (original_keys[i].is_rnext) { + f = i; + break; + } + } + if (f < 0) + test_error("No rnext key after adding one"); + filter.is_rnext = 1; + filter_keys_checked(sk, &filter, &original_keys[f], 1, "by is_rnext"); + + f = -1; + nmatches = 0; + sk = prepare_test_keys(original_keys); + for (int i = 0; i < original_keys[0].nkeys; i++) { + if (original_keys[i].sndid == 100) { + f = i; + memcpy(&expected_keys[nmatches], &original_keys[i], + sizeof(struct tcp_ao_getsockopt)); + nmatches++; + } + } + if (f < 0) + test_error("No key for sndid 100"); + if (nmatches != 2) + test_error("Should have 2 keys with sndid 100"); + filter.rcvid = original_keys[f].rcvid; + filter.sndid = original_keys[f].sndid; + 
filter.addr.ss_family = test_family; + filter_keys_checked(sk, &filter, expected_keys, nmatches, "by sndid, rcvid"); +} + static void *client_fn(void *arg) { if (inet_pton(TEST_FAMILY, __TEST_CLIENT_IP(2), &tcp_md5_client) != 1) test_error("Can't convert ip address"); extend_tests(); einval_tests(); + filter_tests(); duplicate_tests(); /* * TODO: check getsockopt(TCP_AO_GET_KEYS) with different filters @@ -830,6 +986,6 @@ static void *client_fn(void *arg) int main(int argc, char *argv[]) { - test_init(121, client_fn, NULL); + test_init(125, client_fn, NULL); return 0; } -- 2.43.0
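The "Add keys in a random order" loop in prepare_test_keys() is a selection-without-replacement shuffle, equivalent in effect to a Fisher-Yates shuffle: pick a random index among the n - i entries not yet used, consume that entry, then copy the last unused entry into its slot. A standalone sketch of the same technique follows, with a hypothetical random_order() helper operating on indices instead of struct tcp_ao_add.

```c
#include <assert.h>
#include <stdlib.h>

#define NKEYS 16 /* mirrors FILTER_TEST_NKEYS in the patch */

/*
 * Write the indices 0..n-1 into order[] in random order, the way
 * prepare_test_keys() adds its keys: the still-unused entries are kept
 * packed at the front of pool[], and each consumed slot is refilled
 * with the last unused entry.
 */
static void random_order(int *order, int n)
{
	int pool[NKEYS];

	assert(n <= NKEYS);
	for (int i = 0; i < n; i++)
		pool[i] = i;
	for (int i = 0; i < n; i++) {
		int randidx = rand() % (n - i);

		order[i] = pool[randidx];        /* "add" this entry */
		pool[randidx] = pool[n - 1 - i]; /* refill the slot */
	}
}
```

As in the patch, rand() % (n - i) carries a slight modulo bias, which is harmless for randomising test order.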

1 year, 2 months


[PATCH 6.6 206/213] selftests/rseq: Fix mm_cid test failure

by Greg Kroah-Hartman

6.6-stable review patch. If anyone has any objections, please let me know. ------------------ From: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com> commit a0cc649353bb726d4aa0db60dce467432197b746 upstream. Adapt the rseq.c/rseq.h code to follow GNU C library changes introduced by: glibc commit 2e456ccf0c34 ("Linux: Make __rseq_size useful for feature detection (bug 31965)") Without this fix, rseq selftests for mm_cid fail: ./run_param_test.sh Default parameters Running test spinlock Running compare-twice test spinlock Running mm_cid test spinlock Error: cpu id getter unavailable Fixes: 18c2355838e7 ("selftests/rseq: Implement rseq mm_cid field support") Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com> Cc: Peter Zijlstra <peterz(a)infradead.org> CC: Boqun Feng <boqun.feng(a)gmail.com> CC: "Paul E. McKenney" <paulmck(a)kernel.org> Cc: Shuah Khan <skhan(a)linuxfoundation.org> CC: Carlos O'Donell <carlos(a)redhat.com> CC: Florian Weimer <fweimer(a)redhat.com> CC: linux-kselftest(a)vger.kernel.org CC: stable(a)vger.kernel.org Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- tools/testing/selftests/rseq/rseq.c | 110 ++++++++++++++++++++++++------------ tools/testing/selftests/rseq/rseq.h | 10 --- 2 files changed, 77 insertions(+), 43 deletions(-) --- a/tools/testing/selftests/rseq/rseq.c +++ b/tools/testing/selftests/rseq/rseq.c @@ -60,12 +60,6 @@ unsigned int rseq_size = -1U; /* Flags used during rseq registration. */ unsigned int rseq_flags; -/* - * rseq feature size supported by the kernel. 0 if the registration was - * unsuccessful. - */ -unsigned int rseq_feature_size = -1U; - static int rseq_ownership; static int rseq_reg_success; /* At least one rseq registration has succeded. */ @@ -111,6 +105,43 @@ int rseq_available(void) } } +/* The rseq areas need to be at least 32 bytes. 
*/ +static +unsigned int get_rseq_min_alloc_size(void) +{ + unsigned int alloc_size = rseq_size; + + if (alloc_size < ORIG_RSEQ_ALLOC_SIZE) + alloc_size = ORIG_RSEQ_ALLOC_SIZE; + return alloc_size; +} + +/* + * Return the feature size supported by the kernel. + * + * Depending on the value returned by getauxval(AT_RSEQ_FEATURE_SIZE): + * + * 0: Return ORIG_RSEQ_FEATURE_SIZE (20) + * > 0: Return the value from getauxval(AT_RSEQ_FEATURE_SIZE). + * + * It should never return a value below ORIG_RSEQ_FEATURE_SIZE. + */ +static +unsigned int get_rseq_kernel_feature_size(void) +{ + unsigned long auxv_rseq_feature_size, auxv_rseq_align; + + auxv_rseq_align = getauxval(AT_RSEQ_ALIGN); + assert(!auxv_rseq_align || auxv_rseq_align <= RSEQ_THREAD_AREA_ALLOC_SIZE); + + auxv_rseq_feature_size = getauxval(AT_RSEQ_FEATURE_SIZE); + assert(!auxv_rseq_feature_size || auxv_rseq_feature_size <= RSEQ_THREAD_AREA_ALLOC_SIZE); + if (auxv_rseq_feature_size) + return auxv_rseq_feature_size; + else + return ORIG_RSEQ_FEATURE_SIZE; +} + int rseq_register_current_thread(void) { int rc; @@ -119,7 +150,7 @@ int rseq_register_current_thread(void) /* Treat libc's ownership as a successful registration. */ return 0; } - rc = sys_rseq(&__rseq_abi, rseq_size, 0, RSEQ_SIG); + rc = sys_rseq(&__rseq_abi, get_rseq_min_alloc_size(), 0, RSEQ_SIG); if (rc) { if (RSEQ_READ_ONCE(rseq_reg_success)) { /* Incoherent success/failure within process. */ @@ -140,28 +171,12 @@ int rseq_unregister_current_thread(void) /* Treat libc's ownership as a successful unregistration. 
*/ return 0; } - rc = sys_rseq(&__rseq_abi, rseq_size, RSEQ_ABI_FLAG_UNREGISTER, RSEQ_SIG); + rc = sys_rseq(&__rseq_abi, get_rseq_min_alloc_size(), RSEQ_ABI_FLAG_UNREGISTER, RSEQ_SIG); if (rc) return -1; return 0; } -static -unsigned int get_rseq_feature_size(void) -{ - unsigned long auxv_rseq_feature_size, auxv_rseq_align; - - auxv_rseq_align = getauxval(AT_RSEQ_ALIGN); - assert(!auxv_rseq_align || auxv_rseq_align <= RSEQ_THREAD_AREA_ALLOC_SIZE); - - auxv_rseq_feature_size = getauxval(AT_RSEQ_FEATURE_SIZE); - assert(!auxv_rseq_feature_size || auxv_rseq_feature_size <= RSEQ_THREAD_AREA_ALLOC_SIZE); - if (auxv_rseq_feature_size) - return auxv_rseq_feature_size; - else - return ORIG_RSEQ_FEATURE_SIZE; -} - static __attribute__((constructor)) void rseq_init(void) { @@ -178,28 +193,54 @@ void rseq_init(void) } if (libc_rseq_size_p && libc_rseq_offset_p && libc_rseq_flags_p && *libc_rseq_size_p != 0) { + unsigned int libc_rseq_size; + /* rseq registration owned by glibc */ rseq_offset = *libc_rseq_offset_p; - rseq_size = *libc_rseq_size_p; + libc_rseq_size = *libc_rseq_size_p; rseq_flags = *libc_rseq_flags_p; - rseq_feature_size = get_rseq_feature_size(); - if (rseq_feature_size > rseq_size) - rseq_feature_size = rseq_size; + + /* + * Previous versions of glibc expose the value + * 32 even though the kernel only supported 20 + * bytes initially. Therefore treat 32 as a + * special-case. glibc 2.40 exposes a 20 bytes + * __rseq_size without using getauxval(3) to + * query the supported size, while still allocating a 32 + * bytes area. Also treat 20 as a special-case. 
+ * + * Special-cases are handled by using the following + * value as active feature set size: + * + * rseq_size = min(32, get_rseq_kernel_feature_size()) + */ + switch (libc_rseq_size) { + case ORIG_RSEQ_FEATURE_SIZE: + fallthrough; + case ORIG_RSEQ_ALLOC_SIZE: + { + unsigned int rseq_kernel_feature_size = get_rseq_kernel_feature_size(); + + if (rseq_kernel_feature_size < ORIG_RSEQ_ALLOC_SIZE) + rseq_size = rseq_kernel_feature_size; + else + rseq_size = ORIG_RSEQ_ALLOC_SIZE; + break; + } + default: + /* Otherwise just use the __rseq_size from libc as rseq_size. */ + rseq_size = libc_rseq_size; + break; + } return; } rseq_ownership = 1; if (!rseq_available()) { rseq_size = 0; - rseq_feature_size = 0; return; } rseq_offset = (void *)&__rseq_abi - rseq_thread_pointer(); rseq_flags = 0; - rseq_feature_size = get_rseq_feature_size(); - if (rseq_feature_size == ORIG_RSEQ_FEATURE_SIZE) - rseq_size = ORIG_RSEQ_ALLOC_SIZE; - else - rseq_size = RSEQ_THREAD_AREA_ALLOC_SIZE; } static __attribute__((destructor)) @@ -209,7 +250,6 @@ void rseq_exit(void) return; rseq_offset = 0; rseq_size = -1U; - rseq_feature_size = -1U; rseq_ownership = 0; } --- a/tools/testing/selftests/rseq/rseq.h +++ b/tools/testing/selftests/rseq/rseq.h @@ -68,12 +68,6 @@ extern unsigned int rseq_size; /* Flags used during rseq registration. */ extern unsigned int rseq_flags; -/* - * rseq feature size supported by the kernel. 0 if the registration was - * unsuccessful. 
- */ -extern unsigned int rseq_feature_size; - enum rseq_mo { RSEQ_MO_RELAXED = 0, RSEQ_MO_CONSUME = 1, /* Unused */ @@ -193,7 +187,7 @@ static inline uint32_t rseq_current_cpu( static inline bool rseq_node_id_available(void) { - return (int) rseq_feature_size >= rseq_offsetofend(struct rseq_abi, node_id); + return (int) rseq_size >= rseq_offsetofend(struct rseq_abi, node_id); } /* @@ -207,7 +201,7 @@ static inline uint32_t rseq_current_node static inline bool rseq_mm_cid_available(void) { - return (int) rseq_feature_size >= rseq_offsetofend(struct rseq_abi, mm_cid); + return (int) rseq_size >= rseq_offsetofend(struct rseq_abi, mm_cid); } static inline uint32_t rseq_current_mm_cid(void)
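To see why the mm_cid test failed and how the patch fixes it, the size selection in the patched rseq_init() can be pulled out into a pure function. This sketch assumes the struct rseq layout of current kernels (node_id and mm_cid following flags) and uses hypothetical names (struct rseq_layout, pick_rseq_size(), mm_cid_available()); the kernel feature size passed in is the already-resolved value, i.e. 20 when getauxval(AT_RSEQ_FEATURE_SIZE) returns 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values named in the patch: original feature size and allocation size. */
#define ORIG_RSEQ_FEATURE_SIZE 20
#define ORIG_RSEQ_ALLOC_SIZE   32

/* Local mirror of the struct rseq field layout (an assumption here). */
struct rseq_layout {
	uint32_t cpu_id_start;
	uint32_t cpu_id;
	uint64_t rseq_cs;
	uint32_t flags;
	uint32_t node_id;
	uint32_t mm_cid;
};

#define offsetofend(type, member) \
	(offsetof(type, member) + sizeof(((type *)0)->member))

/* Active feature-set size, following the switch in the patched rseq_init(). */
static unsigned int pick_rseq_size(unsigned int libc_rseq_size,
				   unsigned int kernel_feature_size)
{
	switch (libc_rseq_size) {
	case ORIG_RSEQ_FEATURE_SIZE:	/* glibc 2.40 exposes 20 */
	case ORIG_RSEQ_ALLOC_SIZE:	/* older glibc exposes 32 */
		return kernel_feature_size < ORIG_RSEQ_ALLOC_SIZE ?
		       kernel_feature_size : ORIG_RSEQ_ALLOC_SIZE;
	default:
		return libc_rseq_size;
	}
}

/* Same test as rseq_mm_cid_available(), applied to a given size. */
static int mm_cid_available(unsigned int rseq_size)
{
	return (int)rseq_size >= (int)offsetofend(struct rseq_layout, mm_cid);
}
```

With glibc 2.40 exposing __rseq_size = 20 and a kernel whose feature size covers mm_cid (28 bytes with this layout), the old code clamped rseq_feature_size to 20, so the mm_cid check failed, consistent with the "cpu id getter unavailable" error; the patched selection yields 28.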

1 year, 2 months

[PATCH 6.11 205/214] selftests/rseq: Fix mm_cid test failure

by Greg Kroah-Hartman

6.11-stable review patch. If anyone has any objections, please let me know. ------------------ From: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com> commit a0cc649353bb726d4aa0db60dce467432197b746 upstream. Adapt the rseq.c/rseq.h code to follow GNU C library changes introduced by: glibc commit 2e456ccf0c34 ("Linux: Make __rseq_size useful for feature detection (bug 31965)") Without this fix, rseq selftests for mm_cid fail: ./run_param_test.sh Default parameters Running test spinlock Running compare-twice test spinlock Running mm_cid test spinlock Error: cpu id getter unavailable Fixes: 18c2355838e7 ("selftests/rseq: Implement rseq mm_cid field support") Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com> Cc: Peter Zijlstra <peterz(a)infradead.org> CC: Boqun Feng <boqun.feng(a)gmail.com> CC: "Paul E. McKenney" <paulmck(a)kernel.org> Cc: Shuah Khan <skhan(a)linuxfoundation.org> CC: Carlos O'Donell <carlos(a)redhat.com> CC: Florian Weimer <fweimer(a)redhat.com> CC: linux-kselftest(a)vger.kernel.org CC: stable(a)vger.kernel.org Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- tools/testing/selftests/rseq/rseq.c | 110 ++++++++++++++++++++++++------------ tools/testing/selftests/rseq/rseq.h | 10 --- 2 files changed, 77 insertions(+), 43 deletions(-) --- a/tools/testing/selftests/rseq/rseq.c +++ b/tools/testing/selftests/rseq/rseq.c @@ -60,12 +60,6 @@ unsigned int rseq_size = -1U; /* Flags used during rseq registration. */ unsigned int rseq_flags; -/* - * rseq feature size supported by the kernel. 0 if the registration was - * unsuccessful. - */ -unsigned int rseq_feature_size = -1U; - static int rseq_ownership; static int rseq_reg_success; /* At least one rseq registration has succeded. */ @@ -111,6 +105,43 @@ int rseq_available(void) } } +/* The rseq areas need to be at least 32 bytes. 
*/ +static +unsigned int get_rseq_min_alloc_size(void) +{ + unsigned int alloc_size = rseq_size; + + if (alloc_size < ORIG_RSEQ_ALLOC_SIZE) + alloc_size = ORIG_RSEQ_ALLOC_SIZE; + return alloc_size; +} + +/* + * Return the feature size supported by the kernel. + * + * Depending on the value returned by getauxval(AT_RSEQ_FEATURE_SIZE): + * + * 0: Return ORIG_RSEQ_FEATURE_SIZE (20) + * > 0: Return the value from getauxval(AT_RSEQ_FEATURE_SIZE). + * + * It should never return a value below ORIG_RSEQ_FEATURE_SIZE. + */ +static +unsigned int get_rseq_kernel_feature_size(void) +{ + unsigned long auxv_rseq_feature_size, auxv_rseq_align; + + auxv_rseq_align = getauxval(AT_RSEQ_ALIGN); + assert(!auxv_rseq_align || auxv_rseq_align <= RSEQ_THREAD_AREA_ALLOC_SIZE); + + auxv_rseq_feature_size = getauxval(AT_RSEQ_FEATURE_SIZE); + assert(!auxv_rseq_feature_size || auxv_rseq_feature_size <= RSEQ_THREAD_AREA_ALLOC_SIZE); + if (auxv_rseq_feature_size) + return auxv_rseq_feature_size; + else + return ORIG_RSEQ_FEATURE_SIZE; +} + int rseq_register_current_thread(void) { int rc; @@ -119,7 +150,7 @@ int rseq_register_current_thread(void) /* Treat libc's ownership as a successful registration. */ return 0; } - rc = sys_rseq(&__rseq_abi, rseq_size, 0, RSEQ_SIG); + rc = sys_rseq(&__rseq_abi, get_rseq_min_alloc_size(), 0, RSEQ_SIG); if (rc) { if (RSEQ_READ_ONCE(rseq_reg_success)) { /* Incoherent success/failure within process. */ @@ -140,28 +171,12 @@ int rseq_unregister_current_thread(void) /* Treat libc's ownership as a successful unregistration. 
*/ return 0; } - rc = sys_rseq(&__rseq_abi, rseq_size, RSEQ_ABI_FLAG_UNREGISTER, RSEQ_SIG); + rc = sys_rseq(&__rseq_abi, get_rseq_min_alloc_size(), RSEQ_ABI_FLAG_UNREGISTER, RSEQ_SIG); if (rc) return -1; return 0; } -static -unsigned int get_rseq_feature_size(void) -{ - unsigned long auxv_rseq_feature_size, auxv_rseq_align; - - auxv_rseq_align = getauxval(AT_RSEQ_ALIGN); - assert(!auxv_rseq_align || auxv_rseq_align <= RSEQ_THREAD_AREA_ALLOC_SIZE); - - auxv_rseq_feature_size = getauxval(AT_RSEQ_FEATURE_SIZE); - assert(!auxv_rseq_feature_size || auxv_rseq_feature_size <= RSEQ_THREAD_AREA_ALLOC_SIZE); - if (auxv_rseq_feature_size) - return auxv_rseq_feature_size; - else - return ORIG_RSEQ_FEATURE_SIZE; -} - static __attribute__((constructor)) void rseq_init(void) { @@ -178,28 +193,54 @@ void rseq_init(void) } if (libc_rseq_size_p && libc_rseq_offset_p && libc_rseq_flags_p && *libc_rseq_size_p != 0) { + unsigned int libc_rseq_size; + /* rseq registration owned by glibc */ rseq_offset = *libc_rseq_offset_p; - rseq_size = *libc_rseq_size_p; + libc_rseq_size = *libc_rseq_size_p; rseq_flags = *libc_rseq_flags_p; - rseq_feature_size = get_rseq_feature_size(); - if (rseq_feature_size > rseq_size) - rseq_feature_size = rseq_size; + + /* + * Previous versions of glibc expose the value + * 32 even though the kernel only supported 20 + * bytes initially. Therefore treat 32 as a + * special-case. glibc 2.40 exposes a 20 bytes + * __rseq_size without using getauxval(3) to + * query the supported size, while still allocating a 32 + * bytes area. Also treat 20 as a special-case. 
+ * + * Special-cases are handled by using the following + * value as active feature set size: + * + * rseq_size = min(32, get_rseq_kernel_feature_size()) + */ + switch (libc_rseq_size) { + case ORIG_RSEQ_FEATURE_SIZE: + fallthrough; + case ORIG_RSEQ_ALLOC_SIZE: + { + unsigned int rseq_kernel_feature_size = get_rseq_kernel_feature_size(); + + if (rseq_kernel_feature_size < ORIG_RSEQ_ALLOC_SIZE) + rseq_size = rseq_kernel_feature_size; + else + rseq_size = ORIG_RSEQ_ALLOC_SIZE; + break; + } + default: + /* Otherwise just use the __rseq_size from libc as rseq_size. */ + rseq_size = libc_rseq_size; + break; + } return; } rseq_ownership = 1; if (!rseq_available()) { rseq_size = 0; - rseq_feature_size = 0; return; } rseq_offset = (void *)&__rseq_abi - rseq_thread_pointer(); rseq_flags = 0; - rseq_feature_size = get_rseq_feature_size(); - if (rseq_feature_size == ORIG_RSEQ_FEATURE_SIZE) - rseq_size = ORIG_RSEQ_ALLOC_SIZE; - else - rseq_size = RSEQ_THREAD_AREA_ALLOC_SIZE; } static __attribute__((destructor)) @@ -209,7 +250,6 @@ void rseq_exit(void) return; rseq_offset = 0; rseq_size = -1U; - rseq_feature_size = -1U; rseq_ownership = 0; } --- a/tools/testing/selftests/rseq/rseq.h +++ b/tools/testing/selftests/rseq/rseq.h @@ -68,12 +68,6 @@ extern unsigned int rseq_size; /* Flags used during rseq registration. */ extern unsigned int rseq_flags; -/* - * rseq feature size supported by the kernel. 0 if the registration was - * unsuccessful. 
- */ -extern unsigned int rseq_feature_size; - enum rseq_mo { RSEQ_MO_RELAXED = 0, RSEQ_MO_CONSUME = 1, /* Unused */ @@ -193,7 +187,7 @@ static inline uint32_t rseq_current_cpu( static inline bool rseq_node_id_available(void) { - return (int) rseq_feature_size >= rseq_offsetofend(struct rseq_abi, node_id); + return (int) rseq_size >= rseq_offsetofend(struct rseq_abi, node_id); } /* @@ -207,7 +201,7 @@ static inline uint32_t rseq_current_node static inline bool rseq_mm_cid_available(void) { - return (int) rseq_feature_size >= rseq_offsetofend(struct rseq_abi, mm_cid); + return (int) rseq_size >= rseq_offsetofend(struct rseq_abi, mm_cid); } static inline uint32_t rseq_current_mm_cid(void)
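The size selection that rseq_init() now performs when glibc owns the registration can be summarized as a pure function. This is a minimal sketch, not the selftest code. `pick_rseq_size()`, `mm_cid_available()` and `offsetofend_sketch()` are hypothetical names. The auxv value is passed in as a parameter rather than read with `getauxval(AT_RSEQ_FEATURE_SIZE)`. `struct rseq_abi_sketch` is an assumed mirror of the leading fields of the uapi `struct rseq`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ORIG_RSEQ_FEATURE_SIZE 20 /* original kernel feature size */
#define ORIG_RSEQ_ALLOC_SIZE   32 /* original allocation size */

/* Assumed mirror of the leading fields of the uapi struct rseq. */
struct rseq_abi_sketch {
	uint32_t cpu_id_start;
	uint32_t cpu_id;
	uint64_t rseq_cs;
	uint32_t flags;
	uint32_t node_id;
	uint32_t mm_cid;
};

#define offsetofend_sketch(type, member) \
	(offsetof(type, member) + sizeof(((type *)0)->member))

/*
 * Hypothetical helper: choose the active feature size from glibc's
 * __rseq_size and the value getauxval(AT_RSEQ_FEATURE_SIZE) would
 * return (0 when the kernel does not report one).
 */
static unsigned int pick_rseq_size(unsigned int libc_rseq_size,
				   unsigned long auxv_feature_size)
{
	unsigned int kernel_size = auxv_feature_size ?
		(unsigned int)auxv_feature_size : ORIG_RSEQ_FEATURE_SIZE;

	/* As the patch comment states: never below the original 20 bytes. */
	assert(kernel_size >= ORIG_RSEQ_FEATURE_SIZE);

	switch (libc_rseq_size) {
	case ORIG_RSEQ_FEATURE_SIZE: /* glibc 2.40 */
	case ORIG_RSEQ_ALLOC_SIZE:   /* earlier glibc versions */
		/* Special cases: min(32, kernel feature size). */
		return kernel_size < ORIG_RSEQ_ALLOC_SIZE ?
			kernel_size : ORIG_RSEQ_ALLOC_SIZE;
	default:
		/* Otherwise use libc's __rseq_size as-is. */
		return libc_rseq_size;
	}
}

/* Same shape as rseq_mm_cid_available(): does the active size cover mm_cid? */
static int mm_cid_available(unsigned int rseq_size)
{
	return rseq_size >= offsetofend_sketch(struct rseq_abi_sketch, mm_cid);
}
```

Take glibc 2.40, which reports an `__rseq_size` of 20, running on a kernel that reports a feature size of, say, 28. The removed code capped the feature size at 20, so the mm_cid check failed. The sketch yields 28 instead, which reaches the end of `mm_cid` at offset 24.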

1 year, 2 months

[PATCH bpf-next v1 0/3] Improve .BTF_ids patching and alignment

by Tony Ambardar

Hello all, This patch series offers improvements to the way .BTF_ids section data is created and later patched by resolve_btfids. Patch #1 simplifies the byte-order translation in resolve_btfids while making it more resilient to future .BTF_ids encoding updates. Patch #2 makes sure all BTF ID data is 4-byte aligned, and not only the .BTF_ids used for vmlinux. Patch #3 syncs the above changes in btf_ids.h to tools/include, obviating a previous alignment fix in selftests/bpf. Feedback and suggestions are welcome! Best regards, Tony Tony Ambardar (3): tools/resolve_btfids: Simplify handling cross-endian compilation bpf: btf: Ensure natural alignment of .BTF_ids section tools/bpf, selftests/bpf : Sync btf_ids.h to tools include/linux/btf_ids.h | 1 + tools/bpf/resolve_btfids/main.c | 60 +++++--------- tools/include/linux/btf_ids.h | 80 +++++++++++++++++-- .../selftests/bpf/prog_tests/resolve_btfids.c | 6 -- 4 files changed, 97 insertions(+), 50 deletions(-) -- 2.34.1
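The following sketch is a generic illustration of what cross-endian handling of 32-bit ID data involves. It is not the resolve_btfids code from patch #1. It byte-swaps a run of IDs in place when host and target byte order differ, and it asserts the natural 4-byte alignment that patch #2 ensures for BTF ID data. Both helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reverse the byte order of one 32-bit value. */
static uint32_t swap32_sketch(uint32_t v)
{
	return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
	       ((v << 8) & 0x00ff0000u) | (v << 24);
}

/*
 * Hypothetical helper: convert n 32-bit IDs in place between host and
 * target byte order. Nothing is done when both orders match.
 */
static void btf_ids_to_target_order(uint32_t *ids, size_t n, int cross_endian)
{
	/* The data is expected to be naturally (4-byte) aligned. */
	assert(((uintptr_t)ids & 3u) == 0);

	if (!cross_endian)
		return;
	for (size_t i = 0; i < n; i++)
		ids[i] = swap32_sketch(ids[i]);
}
```

Swapping whole 32-bit words only works if every field in the section is a 32-bit value at a 4-byte boundary. That alignment requirement is the link between the two patches.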

1 year, 2 months

kselftest/fixes kselftest-seccomp: 2 runs, 1 regressions (linux_kselftest-fixes-6.12-rc3)

by kernelci.org bot

kselftest/fixes kselftest-seccomp: 2 runs, 1 regressions (linux_kselftest-fixes-6.12-rc3)

Regressions Summary
-------------------

platform                     | arch  | lab         | compiler | defconfig           | regressions
-----------------------------+-------+-------------+----------+---------------------+------------
meson-gxl-s905x-libretech-cc | arm64 | lab-broonie | gcc-12   | defconfig+kselftest | 1

  Details:  https://kernelci.org/test/job/kselftest/branch/fixes/kernel/linux_kselftest…

  Test:     kselftest-seccomp
  Tree:     kselftest
  Branch:   fixes
  Describe: linux_kselftest-fixes-6.12-rc3
  URL:      https://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest.git
  SHA:      4ee5ca9a29384fcf3f18232fdf8474166dea8dca

Test Regressions
----------------

platform                     | arch  | lab         | compiler | defconfig           | regressions
-----------------------------+-------+-------------+----------+---------------------+------------
meson-gxl-s905x-libretech-cc | arm64 | lab-broonie | gcc-12   | defconfig+kselftest | 1

  Details:     https://kernelci.org/test/plan/id/670d07f5cd937325b5c86857

  Results:     0 PASS, 1 FAIL, 0 SKIP
  Full config: defconfig+kselftest
  Compiler:    gcc-12 (aarch64-linux-gnu-gcc (Debian 12.2.0-14) 12.2.0)

  Plain log:   https://storage.kernelci.org//kselftest/fixes/linux_kselftest-fixes-6.12-rc…
  HTML log:    https://storage.kernelci.org//kselftest/fixes/linux_kselftest-fixes-6.12-rc…
  Rootfs:      http://storage.kernelci.org/images/rootfs/debian/bookworm-kselftest/2024031…

  * kselftest-seccomp.login: https://kernelci.org/test/case/id/670d07f5cd937325b5c86858
        failing since 5 days (last pass: v6.12-rc1-5-g45a8897db67d4, first fail: linux_kselftest-fixes-6.12-rc2-4-g34d5b600172b)

1 year, 2 months

kselftest/fixes kselftest-lib: 2 runs, 1 regressions (linux_kselftest-fixes-6.12-rc3)

by kernelci.org bot

kselftest/fixes kselftest-lib: 2 runs, 1 regressions (linux_kselftest-fixes-6.12-rc3)

Regressions Summary
-------------------

platform                     | arch  | lab         | compiler | defconfig           | regressions
-----------------------------+-------+-------------+----------+---------------------+------------
meson-gxl-s905x-libretech-cc | arm64 | lab-broonie | gcc-12   | defconfig+kselftest | 1

  Details:  https://kernelci.org/test/job/kselftest/branch/fixes/kernel/linux_kselftest…

  Test:     kselftest-lib
  Tree:     kselftest
  Branch:   fixes
  Describe: linux_kselftest-fixes-6.12-rc3
  URL:      https://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest.git
  SHA:      4ee5ca9a29384fcf3f18232fdf8474166dea8dca

Test Regressions
----------------

platform                     | arch  | lab         | compiler | defconfig           | regressions
-----------------------------+-------+-------------+----------+---------------------+------------
meson-gxl-s905x-libretech-cc | arm64 | lab-broonie | gcc-12   | defconfig+kselftest | 1

  Details:     https://kernelci.org/test/plan/id/670d06ca62e90ff6e7c86855

  Results:     0 PASS, 1 FAIL, 0 SKIP
  Full config: defconfig+kselftest
  Compiler:    gcc-12 (aarch64-linux-gnu-gcc (Debian 12.2.0-14) 12.2.0)

  Plain log:   https://storage.kernelci.org//kselftest/fixes/linux_kselftest-fixes-6.12-rc…
  HTML log:    https://storage.kernelci.org//kselftest/fixes/linux_kselftest-fixes-6.12-rc…
  Rootfs:      http://storage.kernelci.org/images/rootfs/debian/bookworm-kselftest/2024031…

  * kselftest-lib.login: https://kernelci.org/test/case/id/670d06ca62e90ff6e7c86856
        failing since 5 days (last pass: v6.12-rc1-5-g45a8897db67d4, first fail: linux_kselftest-fixes-6.12-rc2-4-g34d5b600172b)

1 year, 2 months

kselftest/fixes kselftest-cpufreq: 3 runs, 3 regressions (linux_kselftest-fixes-6.12-rc3)

by kernelci.org bot

kselftest/fixes kselftest-cpufreq: 3 runs, 3 regressions (linux_kselftest-fixes-6.12-rc3)

Regressions Summary
-------------------

platform                     | arch  | lab         | compiler | defconfig           | regressions
-----------------------------+-------+-------------+----------+---------------------+------------
meson-gxl-s905x-libretech-cc | arm64 | lab-broonie | gcc-12   | defconfig+kselftest | 1
sun50i-a64-pine64-plus       | arm64 | lab-broonie | gcc-12   | defconfig+kselftest | 1
sun50i-h5-lib...ch-all-h3-cc | arm64 | lab-broonie | gcc-12   | defconfig+kselftest | 1

  Details:  https://kernelci.org/test/job/kselftest/branch/fixes/kernel/linux_kselftest…

  Test:     kselftest-cpufreq
  Tree:     kselftest
  Branch:   fixes
  Describe: linux_kselftest-fixes-6.12-rc3
  URL:      https://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest.git
  SHA:      4ee5ca9a29384fcf3f18232fdf8474166dea8dca

Test Regressions
----------------

platform                     | arch  | lab         | compiler | defconfig           | regressions
-----------------------------+-------+-------------+----------+---------------------+------------
meson-gxl-s905x-libretech-cc | arm64 | lab-broonie | gcc-12   | defconfig+kselftest | 1

  Details:     https://kernelci.org/test/plan/id/670d07df5ce1577dbec86858

  Results:     0 PASS, 1 FAIL, 0 SKIP
  Full config: defconfig+kselftest
  Compiler:    gcc-12 (aarch64-linux-gnu-gcc (Debian 12.2.0-14) 12.2.0)

  Plain log:   https://storage.kernelci.org//kselftest/fixes/linux_kselftest-fixes-6.12-rc…
  HTML log:    https://storage.kernelci.org//kselftest/fixes/linux_kselftest-fixes-6.12-rc…
  Rootfs:      http://storage.kernelci.org/images/rootfs/debian/bookworm-kselftest/2024031…

  * kselftest-cpufreq.login: https://kernelci.org/test/case/id/670d07df5ce1577dbec86859
        failing since 5 days (last pass: v6.12-rc1-5-g45a8897db67d4, first fail: linux_kselftest-fixes-6.12-rc2-4-g34d5b600172b)

platform                     | arch  | lab         | compiler | defconfig           | regressions
-----------------------------+-------+-------------+----------+---------------------+------------
sun50i-a64-pine64-plus       | arm64 | lab-broonie | gcc-12   | defconfig+kselftest | 1

  Details:     https://kernelci.org/test/plan/id/670d09462706bb6cd8c8685d

  Results:     0 PASS, 1 FAIL, 0 SKIP
  Full config: defconfig+kselftest
  Compiler:    gcc-12 (aarch64-linux-gnu-gcc (Debian 12.2.0-14) 12.2.0)

  Plain log:   https://storage.kernelci.org//kselftest/fixes/linux_kselftest-fixes-6.12-rc…
  HTML log:    https://storage.kernelci.org//kselftest/fixes/linux_kselftest-fixes-6.12-rc…
  Rootfs:      http://storage.kernelci.org/images/rootfs/debian/bookworm-kselftest/2024031…

  * kselftest-cpufreq.login: https://kernelci.org/test/case/id/670d09472706bb6cd8c8685e
        failing since 5 days (last pass: v6.12-rc1-5-g45a8897db67d4, first fail: linux_kselftest-fixes-6.12-rc2-4-g34d5b600172b)

platform                     | arch  | lab         | compiler | defconfig           | regressions
-----------------------------+-------+-------------+----------+---------------------+------------
sun50i-h5-lib...ch-all-h3-cc | arm64 | lab-broonie | gcc-12   | defconfig+kselftest | 1

  Details:     https://kernelci.org/test/plan/id/670d068aaf8e516253c8685f

  Results:     0 PASS, 1 FAIL, 0 SKIP
  Full config: defconfig+kselftest
  Compiler:    gcc-12 (aarch64-linux-gnu-gcc (Debian 12.2.0-14) 12.2.0)

  Plain log:   https://storage.kernelci.org//kselftest/fixes/linux_kselftest-fixes-6.12-rc…
  HTML log:    https://storage.kernelci.org//kselftest/fixes/linux_kselftest-fixes-6.12-rc…
  Rootfs:      http://storage.kernelci.org/images/rootfs/debian/bookworm-kselftest/2024031…

  * kselftest-cpufreq.login: https://kernelci.org/test/case/id/670d068aaf8e516253c86860
        failing since 5 days (last pass: v6.12-rc1-5-g45a8897db67d4, first fail: linux_kselftest-fixes-6.12-rc2-4-g34d5b600172b)

1 year, 2 months

kselftest/fixes build: 7 builds: 2 failed, 5 passed, 1 warning (linux_kselftest-fixes-6.12-rc3)

by kernelci.org bot

kselftest/fixes build: 7 builds: 2 failed, 5 passed, 1 warning (linux_kselftest-fixes-6.12-rc3)

Full Build Summary: https://kernelci.org/build/kselftest/branch/fixes/kernel/linux_kselftest-fi…

Tree: kselftest
Branch: fixes
Git Describe: linux_kselftest-fixes-6.12-rc3
Git Commit: 4ee5ca9a29384fcf3f18232fdf8474166dea8dca
Git URL: https://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest.git
Built: 4 unique architectures

Build Failures Detected:

arm64:
    defconfig+kselftest+arm64-chromebook: (clang-16) FAIL
    defconfig+kselftest+arm64-chromebook: (gcc-12) FAIL

Warnings Detected:

arm64:

arm:

i386:

x86_64:
    x86_64_defconfig+kselftest (clang-16): 1 warning

Warnings summary:

    1    vmlinux.o: warning: objtool: set_ftrace_ops_ro+0x23: relocation to !ENDBR: .text+0x14fd19

================================================================================

Detailed per-defconfig build reports:

--------------------------------------------------------------------------------
defconfig+kselftest (arm64, gcc-12) — PASS, 0 errors, 0 warnings, 0 section mismatches

--------------------------------------------------------------------------------
defconfig+kselftest+arm64-chromebook (arm64, gcc-12) — FAIL, 0 errors, 0 warnings, 0 section mismatches

--------------------------------------------------------------------------------
defconfig+kselftest+arm64-chromebook (arm64, clang-16) — FAIL, 0 errors, 0 warnings, 0 section mismatches

--------------------------------------------------------------------------------
i386_defconfig+kselftest (i386, gcc-12) — PASS, 0 errors, 0 warnings, 0 section mismatches

--------------------------------------------------------------------------------
multi_v7_defconfig+kselftest (arm, gcc-12) — PASS, 0 errors, 0 warnings, 0 section mismatches

--------------------------------------------------------------------------------
x86_64_defconfig+kselftest (x86_64, gcc-12) — PASS, 0 errors, 0 warnings, 0 section mismatches

--------------------------------------------------------------------------------
x86_64_defconfig+kselftest (x86_64, clang-16) — PASS, 0 errors, 1 warning, 0 section mismatches

Warnings:
    vmlinux.o: warning: objtool: set_ftrace_ops_ro+0x23: relocation to !ENDBR: .text+0x14fd19

---
For more info write to <info(a)kernelci.org>

1 year, 2 months

[PATCH v1 0/1] update mseal.rst

by jeffxu＠chromium.org

From: Jeff Xu <jeffxu(a)chromium.org> Pedro Falcato's optimization [1] for checking sealed VMAs, which replaces the can_modify_mm() function with an in-loop check, necessitates an update to the mseal.rst documentation to reflect this change. Furthermore, the document has received offline comments regarding the code sample and suggestions for sentence clarification to enhance reader comprehension. [1] https://lore.kernel.org/linux-mm/20240817-mseal-depessimize-v3-0-d8d2e037df… Jeff Xu (1): mseal: update mseal.rst Documentation/userspace-api/mseal.rst | 290 ++++++++++++-------------- 1 file changed, 136 insertions(+), 154 deletions(-) -- 2.46.1.824.gd892dcdcdd-goog

1 year, 2 months

[PATCH V12 00/14] perf/core: Add ability for an event to "pause" or "resume" AUX area tracing

by Adrian Hunter

Hi

Note for V12:

There was a small conflict between the Intel PT changes in "KVM: x86: Fix Intel PT Host/Guest mode when host tracing" and the changes in this patch set, so I have put the patch sets together, along with the outstanding fix "perf/x86/intel/pt: Fix buffer full but size is 0 case".

Cover letter for KVM changes (patches 2 to 4):

There is a long-standing problem whereby running Intel PT on host and guest in Host/Guest mode causes VM-Entry failure.

The motivation for this patch set is to provide a fix for stable kernels prior to the advent of the "Mediated Passthrough vPMU" patch set:

https://lore.kernel.org/kvm/20240801045907.4010984-1-mizhang@google.com/

which would render a large part of the fix unnecessary but is likely not suitable for backport to stable due to its size and complexity.

Ideally, this patch set would be applied before "Mediated Passthrough vPMU".

Note that the fix does not conflict with "Mediated Passthrough vPMU"; it is just that "Mediated Passthrough vPMU" will make the code to stop and restart Intel PT unnecessary.

Note for V11:

Moving aux_paused into a union within struct hw_perf_event caused a regression because aux_paused was being written unconditionally even though it is valid only for AUX (e.g. Intel PT) PMUs. That is fixed in V11.

Hardware traces, such as instruction traces, can produce a vast amount of trace data, so being able to restrict tracing to more specific circumstances can be useful. The ability to pause or resume tracing when another event happens can do that.

These patches add such a facility and show how it would work for Intel Processor Trace.

Maintainers of other AUX area tracing implementations are requested to consider whether this is something they might employ and, if so, whether the ABI would work for them.

Thank you to James Clark (ARM) for evaluating the API for CoreSight. Suzuki K Poulose (ARM) also responded positively to the RFC.

Changes to perf tools are now (since V4) fleshed out.
Please note, Intel® Architecture Instruction Set Extensions and Future Features Programming Reference March 2024 319433-052, currently: https://cdrdv2.intel.com/v1/dl/getContent/671368 introduces hardware pause / resume for Intel PT in a feature named Intel PT Trigger Tracing. For that more fields in perf_event_attr will be necessary. The main differences are: - it can be applied not just to overflows, but optionally to every event - a packet is emitted into the trace, optionally with IP information - no PMI - works with PMC and DR (breakpoint) events only Here are the proposed additions to perf_event_attr, please comment: diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h index 0c557f0a17b3..05dcc43f11bb 100644 --- a/tools/include/uapi/linux/perf_event.h +++ b/tools/include/uapi/linux/perf_event.h @@ -369,6 +369,22 @@ enum perf_event_read_format { PERF_FORMAT_MAX = 1U << 5, /* non-ABI */ }; +enum { + PERF_AUX_ACTION_START_PAUSED = 1U << 0, + PERF_AUX_ACTION_PAUSE = 1U << 1, + PERF_AUX_ACTION_RESUME = 1U << 2, + PERF_AUX_ACTION_EMIT = 1U << 3, + PERF_AUX_ACTION_NR = 0x1f << 4, + PERF_AUX_ACTION_NO_IP = 1U << 9, + PERF_AUX_ACTION_PAUSE_ON_EVT = 1U << 10, + PERF_AUX_ACTION_RESUME_ON_EVT = 1U << 11, + PERF_AUX_ACTION_EMIT_ON_EVT = 1U << 12, + PERF_AUX_ACTION_NR_ON_EVT = 0x1f << 13, + PERF_AUX_ACTION_NO_IP_ON_EVT = 1U << 18, + PERF_AUX_ACTION_MASK = ~PERF_AUX_ACTION_START_PAUSED, + PERF_AUX_PAUSE_RESUME_MASK = PERF_AUX_ACTION_PAUSE | PERF_AUX_ACTION_RESUME, +}; + #define PERF_ATTR_SIZE_VER0 64 /* sizeof first published struct */ #define PERF_ATTR_SIZE_VER1 72 /* add: config2 */ #define PERF_ATTR_SIZE_VER2 80 /* add: branch_sample_type */ @@ -515,10 +531,19 @@ struct perf_event_attr { union { __u32 aux_action; struct { - __u32 aux_start_paused : 1, /* start AUX area tracing paused */ - aux_pause : 1, /* on overflow, pause AUX area tracing */ - aux_resume : 1, /* on overflow, resume AUX area tracing */ - __reserved_3 : 29; + __u32 
aux_start_paused : 1, /* start AUX area tracing paused */ + aux_pause : 1, /* on overflow, pause AUX area tracing */ + aux_resume : 1, /* on overflow, resume AUX area tracing */ + aux_emit : 1, /* generate AUX records instead of events */ + aux_nr : 5, /* AUX area tracing reference number */ + aux_no_ip : 1, /* suppress IP in AUX records */ + /* Following apply to event occurrence not overflows */ + aux_pause_on_evt : 1, /* on event, pause AUX area tracing */ + aux_resume_on_evt : 1, /* on event, resume AUX area tracing */ + aux_emit_on_evt : 1, /* generate AUX records instead of events */ + aux_nr_on_evt : 5, /* AUX area tracing reference number */ + aux_no_ip_on_evt : 1, /* suppress IP in AUX records */ + __reserved_3 : 13; }; }; Changes in V12: Add previously sent patch "perf/x86/intel/pt: Fix buffer full but size is 0 case" Add previously sent patch set "KVM: x86: Fix Intel PT Host/Guest mode when host tracing" Rebase on current tip plus patch set "KVM: x86: Fix Intel PT Host/Guest mode when host tracing" Changes in V11: perf/core: Add aux_pause, aux_resume, aux_start_paused Make assignment to event->hw.aux_paused conditional on (pmu->capabilities & PERF_PMU_CAP_AUX_PAUSE). perf/x86/intel: Do not enable large PEBS for events with aux actions or aux sampling Remove definition of has_aux_action() because it has already been added as an inline function. perf/x86/intel/pt: Fix sampling synchronization perf tools: Enable evsel__is_aux_event() to work for ARM/ARM64 perf tools: Enable evsel__is_aux_event() to work for S390_CPUMSF Dropped because they have already been applied Changes in V10: perf/core: Add aux_pause, aux_resume, aux_start_paused Move aux_paused into a union within struct hw_perf_event. Additional comment wrt PERF_EF_PAUSE/PERF_EF_RESUME. Factor out has_aux_action() as an inline function. Use scoped_guard for irqsave. Move calls of perf_event_aux_pause() from __perf_event_output() to __perf_event_overflow(). 
Changes in V9: perf/x86/intel/pt: Fix sampling synchronization New patch perf/core: Add aux_pause, aux_resume, aux_start_paused Move aux_paused to struct hw_perf_event perf/x86/intel/pt: Add support for pause / resume Add more comments and barriers for resume_allowed and pause_allowed Always use WRITE_ONCE with resume_allowed Changes in V8: perf tools: Parse aux-action Fix clang warning: util/auxtrace.c:821:7: error: missing field 'aux_action' initializer [-Werror,-Wmissing-field-initializers] 821 | {NULL}, | ^ Changes in V7: Add Andi's Reviewed-by for patches 2-12 Re-base Changes in V6: perf/core: Add aux_pause, aux_resume, aux_start_paused Removed READ/WRITE_ONCE from __perf_event_aux_pause() Expanded comment about guarding against NMI Changes in V5: perf/core: Add aux_pause, aux_resume, aux_start_paused Added James' Ack perf/x86/intel: Do not enable large PEBS for events with aux actions or aux sampling New patch perf tools Added Ian's Ack Changes in V4: perf/core: Add aux_pause, aux_resume, aux_start_paused Rename aux_output_cfg -> aux_action Reorder aux_action bits from: aux_pause, aux_resume, aux_start_paused to: aux_start_paused, aux_pause, aux_resume Fix aux_action bits __u64 -> __u32 coresight: Have a stab at support for pause / resume Dropped perf tools All new patches Changes in RFC V3: coresight: Have a stab at support for pause / resume 'mode' -> 'flags' so it at least compiles Changes in RFC V2: Use ->stop() / ->start() instead of ->pause_resume() Move aux_start_paused bit into aux_output_cfg Tighten up when Intel PT pause / resume is allowed Add an example of how it might work for CoreSight Adrian Hunter (14): perf/x86/intel/pt: Fix buffer full but size is 0 case KVM: x86: Fix Intel PT IA32_RTIT_CTL MSR validation KVM: x86: Fix Intel PT Host/Guest mode when host tracing also KVM: selftests: Add guest Intel PT test perf/core: Add aux_pause, aux_resume, aux_start_paused perf/x86/intel/pt: Add support for pause / resume perf/x86/intel: Do not enable 
large PEBS for events with aux actions or aux sampling perf tools: Add aux_start_paused, aux_pause and aux_resume perf tools: Add aux-action config term perf tools: Parse aux-action perf tools: Add missing_features for aux_start_paused, aux_pause, aux_resume perf intel-pt: Improve man page format perf intel-pt: Add documentation for pause / resume perf intel-pt: Add a test for pause / resume arch/x86/events/intel/core.c | 4 +- arch/x86/events/intel/pt.c | 209 +++++++- arch/x86/events/intel/pt.h | 16 + arch/x86/include/asm/intel_pt.h | 4 + arch/x86/kvm/vmx/vmx.c | 26 +- arch/x86/kvm/vmx/vmx.h | 1 - include/linux/perf_event.h | 28 + include/uapi/linux/perf_event.h | 11 +- kernel/events/core.c | 72 ++- kernel/events/internal.h | 1 + tools/include/uapi/linux/perf_event.h | 11 +- tools/perf/Documentation/perf-intel-pt.txt | 596 +++++++++++++-------- tools/perf/Documentation/perf-record.txt | 4 + tools/perf/builtin-record.c | 4 +- tools/perf/tests/shell/test_intel_pt.sh | 28 + tools/perf/util/auxtrace.c | 67 ++- tools/perf/util/auxtrace.h | 6 +- tools/perf/util/evsel.c | 13 +- tools/perf/util/evsel.h | 1 + tools/perf/util/evsel_config.h | 1 + tools/perf/util/parse-events.c | 10 + tools/perf/util/parse-events.h | 1 + tools/perf/util/parse-events.l | 1 + tools/perf/util/perf_event_attr_fprintf.c | 3 + tools/perf/util/pmu.c | 1 + tools/testing/selftests/kvm/Makefile | 1 + .../selftests/kvm/include/x86_64/processor.h | 1 + tools/testing/selftests/kvm/x86_64/intel_pt.c | 381 +++++++++++++ 28 files changed, 1238 insertions(+), 264 deletions(-) create mode 100644 tools/testing/selftests/kvm/x86_64/intel_pt.c Regards Adrian
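The proposed Trigger Tracing fields pack several sub-fields into the single 32-bit `aux_action` word. Below is a minimal sketch of encoding and decoding that word with explicit masks. The macro names are local copies that mirror a subset of the proposed `PERF_AUX_ACTION_*` values quoted above. They are not part of any released uapi header, and the helper functions are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of a subset of the proposed PERF_AUX_ACTION_* bits. */
#define AUX_ACTION_START_PAUSED (1u << 0)
#define AUX_ACTION_PAUSE        (1u << 1)
#define AUX_ACTION_RESUME       (1u << 2)
#define AUX_ACTION_NR           (0x1fu << 4)  /* 5-bit field, bits 4..8 */
#define AUX_ACTION_NO_IP        (1u << 9)
#define AUX_ACTION_NR_ON_EVT    (0x1fu << 13) /* 5-bit field, bits 13..17 */
#define AUX_PAUSE_RESUME_MASK   (AUX_ACTION_PAUSE | AUX_ACTION_RESUME)

/* Hypothetical encoder: pause on overflow, with reference number nr. */
static uint32_t aux_action_pause(unsigned int nr)
{
	assert(nr <= 0x1f); /* must fit the 5-bit field */
	return AUX_ACTION_PAUSE | ((uint32_t)nr << 4);
}

/* Decode the overflow-time reference number (bits 4..8). */
static unsigned int aux_action_nr(uint32_t action)
{
	return (action & AUX_ACTION_NR) >> 4;
}

/* Decode the per-event reference number (bits 13..17). */
static unsigned int aux_action_nr_on_evt(uint32_t action)
{
	return (action & AUX_ACTION_NR_ON_EVT) >> 13;
}
```

The sketch uses masks rather than the bitfield view because C leaves the ordering of bitfields within a storage unit implementation-defined. The enum in the proposal fixes the bit positions explicitly.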

1 year, 2 months

[PATCH net-next 0/3] Threads support in proc connector

by Anjali Kulkarni

Recently we committed a fix (commit a4c9a56e6a2c) to allow processes to receive notifications for non-zero exits via the process connector module.

However, when a thread calls pthread_exit(&exit_status), the kernel is not aware of the exit status passed to pthread_exit(). That status is delivered by the child thread to the parent process only if the parent is waiting in pthread_join(). Hence, for a thread exiting abnormally, the kernel cannot send notifications to any listening processes.

The exception is when the thread is sent a signal that it does not handle and dies along with its process as a result, e.g. SIGSEGV or SIGKILL. In this case, the kernel is aware of the non-zero exit and sends a notification for it.

For our use case, we cannot have the parent wait in pthread_join(). One of the main reasons is that we do not want to track normal pthread_exit() calls, of which there could be a very large number. We only want to be notified of abnormal exits. Hence, threads are created with pthread_attr_t set to PTHREAD_CREATE_DETACHED.

To fix this problem, we add a new type, PROC_CN_MCAST_NOTIFY, to the proc connector API. It allows a thread to send its exit status to the kernel, either when it needs to call pthread_exit() with a non-zero value to indicate an error, or from a signal handler before calling pthread_exit().
Anjali Kulkarni (3): connector/cn_proc: Add hash table for threads connector/cn_proc: Kunit tests for threads hash table connector/cn_proc: Selftest for threads drivers/connector/Makefile | 2 +- drivers/connector/cn_hash.c | 240 ++++++++++++++++++ drivers/connector/cn_proc.c | 59 ++++- drivers/connector/connector.c | 96 ++++++- include/linux/connector.h | 47 ++++ include/linux/sched.h | 2 +- include/uapi/linux/cn_proc.h | 4 +- lib/Kconfig.debug | 17 ++ lib/Makefile | 1 + lib/cn_hash_test.c | 167 ++++++++++++ lib/cn_hash_test.h | 12 + tools/testing/selftests/connector/Makefile | 23 +- .../testing/selftests/connector/proc_filter.c | 5 + tools/testing/selftests/connector/thread.c | 90 +++++++ .../selftests/connector/thread_filter.c | 93 +++++++ 15 files changed, 848 insertions(+), 10 deletions(-) create mode 100644 drivers/connector/cn_hash.c create mode 100644 lib/cn_hash_test.c create mode 100644 lib/cn_hash_test.h create mode 100644 tools/testing/selftests/connector/thread.c create mode 100644 tools/testing/selftests/connector/thread_filter.c -- 2.46.0
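The limitation described in this cover letter can be seen with plain POSIX threads. The sketch below shows that a value passed to `pthread_exit()` reaches only a thread that joins it. A detached thread, as used in the cover letter's use case, hands its status to nobody. The function names are hypothetical, and the sketch does not use the proposed PROC_CN_MCAST_NOTIFY interface.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Thread body: hands its argument to pthread_exit() as the exit status. */
static void *worker(void *arg)
{
	pthread_exit(arg);
}

/* Joinable thread: the parent collects the status in pthread_join(). */
static int exit_status_via_join(int status)
{
	pthread_t t;
	void *ret = NULL;

	if (pthread_create(&t, NULL, worker, (void *)(intptr_t)status) != 0)
		return -1;
	if (pthread_join(t, &ret) != 0)
		return -1;
	return (int)(intptr_t)ret;
}

/*
 * Detached thread: it cannot be joined, so the value it passes to
 * pthread_exit() is never delivered anywhere.
 * Returns 1 if the thread was created in the detached state.
 */
static int start_detached(int status)
{
	pthread_attr_t attr;
	pthread_t t;
	int state = -1;
	int rc;

	pthread_attr_init(&attr);
	pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
	pthread_attr_getdetachstate(&attr, &state);
	rc = pthread_create(&t, &attr, worker, (void *)(intptr_t)status);
	pthread_attr_destroy(&attr);
	return rc == 0 && state == PTHREAD_CREATE_DETACHED;
}
```

In both cases the status never passes through the kernel. This is why the series proposes an explicit notification path for the detached case.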

1 year, 2 months

[PATCH kdevops] defconfig: add linux-modules-kpd defconfig symlink

by Luis Chamberlain

We now have two kdevops proofs of concept with kernel-patches-daemon (KPD) [0], one for Linux kernel module testing [1] and the other for radix tree testing (xarray, maple tree) [2]. These trees contain only the .github/workflows/* files required to trigger a GitHub self-hosted runner to run kdevops, since evaluation shows that GitHub-hosted runners will not work or scale for Linux kernel testing [3].

The way this works with KPD is that KPD has an app in the linux-kdevops organization which takes patch series posted to your subsystem's patchwork (you can have dedicated filters on a mailing list for only specific files if you don't have a dedicated mailing list), creates a git tree branch using your configured KPD main development tree source, and pushes it out to the respective test tree on GitHub for you. For example, for Linux modules development it pushes out a branch with a delta onto the linux-modules-kpd tree [4], and also merges into it the latest kdevops-ci-modules [1] work, which is where the GitHub runner work gets developed. For the radix tree we do not yet have a patchwork instance defined, but we *could*; KPD would then push out a branch into the linux-radix-tree-kpd [5] tree with the GitHub actions defined in its respective kdevops-ci-radix-tree [2] tree.

What these PoCs show is that, given the way kdevops has designed selftests testing, the GitHub actions runner needs to differ in only *one* single line of code to test either of these two Linux kernel subsystems: the defconfig used. To *share* the *same* Linux kernel GitHub actions runner code between the Linux kernel module tests and the radix tree tests, all we need to do is use the git tree onto which a delta was pushed as the source for the defconfig.
So all we have to do now is just add a symlink of the respective development test tree onto its corresponding defconfig. Add the respective defconfig then for linux-modules-kpd by symlinking it to the seltests-kmod-cli defconfig. This will let us later share *one* github development action runner code for self-hosted runners for *all* Linux kernel sefltests we define in *one* development tree which KPD could leverage. Now that we have locked down the linux-kdevops github organization to only allow respective developers to be able to trigger pushes or PRs, this also allows us to add dedicated self-hosted runners per target test development repository so we can scale our testing as we need with security in mind. The only thing left to do here now, is to evaluate if we want an allow check for who's patches we want to enable automatic testing for through KPD. [0] https://github.com/facebookincubator/kernel-patches-daemon [1] https://github.com/linux-kdevops/kdevops-ci-modules [2] https://github.com/linux-kdevops/kdevops-ci-radix-tree [3] https://lore.kernel.org/kdevops/CAB=NE6VKWSkv1JZ_Z2LKq4o7+JBkKc6u8Wa1zxxBnG… [4] https://github.com/linux-kdevops/linux-modules-kpd [5] https://github.com/linux-kdevops/linux-radix-tree-kpd Signed-off-by: Luis Chamberlain <mcgrof(a)kernel.org> --- defconfigs/linux-modules-kpd | 1 + 1 file changed, 1 insertion(+) create mode 120000 defconfigs/linux-modules-kpd diff --git a/defconfigs/linux-modules-kpd b/defconfigs/linux-modules-kpd new file mode 120000 index 000000000000..e61fd7f687b0 --- /dev/null +++ b/defconfigs/linux-modules-kpd @@ -0,0 +1 @@ +seltests-kmod-cli \ No newline at end of file -- 2.43.0
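The selection mechanism needs no per-tree logic in the runner: git records the new file with mode 120000, i.e. as a symlink whose content is the target name, so anything that opens `defconfigs/linux-modules-kpd` transparently reads `seltests-kmod-cli`. The C sketch below illustrates that lookup under stated assumptions. It is not kdevops code, the helper names (`defconfig_path`, `defconfig_link_target`, `defconfig_first_line`) are hypothetical, and it assumes a runner builds the defconfig path from the name of the tree KPD pushed to.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>   /* used by the usage checks */
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>   /* mkdtemp() in the usage checks */
#include <string.h>
#include <unistd.h>

/* Hypothetical: build "<dir>/<tree>", the path a runner would use when the
 * defconfig is named after the tree KPD pushed the delta onto. */
static int defconfig_path(const char *dir, const char *tree,
                          char *path, size_t len)
{
    int n = snprintf(path, len, "%s/%s", dir, tree);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Copy the symlink target of <dir>/<tree> into buf (NUL-terminated).
 * Returns 0 on success, -1 if the entry is missing or not a symlink. */
static int defconfig_link_target(const char *dir, const char *tree,
                                 char *buf, size_t len)
{
    char path[PATH_MAX];
    ssize_t n;

    if (len == 0 || defconfig_path(dir, tree, path, sizeof(path)))
        return -1;
    n = readlink(path, buf, len - 1);   /* readlink() does not NUL-terminate */
    if (n < 0)
        return -1;
    buf[n] = '\0';
    return 0;
}

/* Read the first line of the defconfig selected by tree name. fopen()
 * follows the symlink, so the caller never needs the real file name. */
static int defconfig_first_line(const char *dir, const char *tree,
                                char *buf, size_t len)
{
    char path[PATH_MAX];
    FILE *f;
    int ok;

    if (len == 0 || len > INT_MAX ||
        defconfig_path(dir, tree, path, sizeof(path)))
        return -1;
    f = fopen(path, "r");
    if (!f)
        return -1;
    ok = fgets(buf, (int)len, f) != NULL;
    fclose(f);
    if (!ok)
        return -1;
    buf[strcspn(buf, "\n")] = '\0';   /* strip the trailing newline */
    return 0;
}
```

Because the symlink target is relative, it resolves inside the same `defconfigs/` directory, which matches how the patch stores `seltests-kmod-cli`.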

[PATCH net-next v21 11/14] mm: page_frag: add testing for the newly added prepare API

by Yunsheng Lin

Add tests for the newly added prepare API, covering both the aligned and non-aligned variants; the probe API is tested along with the prepare API. CC: Alexander Duyck <alexander.duyck(a)gmail.com> Signed-off-by: Yunsheng Lin <linyunsheng(a)huawei.com> --- .../selftests/mm/page_frag/page_frag_test.c | 76 +++++++++++++++++-- tools/testing/selftests/mm/run_vmtests.sh | 4 + tools/testing/selftests/mm/test_page_frag.sh | 27 +++++++ 3 files changed, 102 insertions(+), 5 deletions(-) diff --git a/tools/testing/selftests/mm/page_frag/page_frag_test.c b/tools/testing/selftests/mm/page_frag/page_frag_test.c index e806c1866e36..1e47e9ad66f0 100644 --- a/tools/testing/selftests/mm/page_frag/page_frag_test.c +++ b/tools/testing/selftests/mm/page_frag/page_frag_test.c @@ -32,6 +32,10 @@ static bool test_align; module_param(test_align, bool, 0); MODULE_PARM_DESC(test_align, "use align API for testing"); +static bool test_prepare; +module_param(test_prepare, bool, 0); +MODULE_PARM_DESC(test_prepare, "use prepare API for testing"); + static int test_alloc_len = 2048; module_param(test_alloc_len, int, 0); MODULE_PARM_DESC(test_alloc_len, "alloc len for testing"); @@ -74,6 +78,21 @@ static int page_frag_pop_thread(void *arg) return 0; } +static void frag_frag_test_commit(struct page_frag_cache *nc, + struct page_frag *prepare_pfrag, + struct page_frag *probe_pfrag, + unsigned int used_sz) +{ + if (prepare_pfrag->page != probe_pfrag->page || + prepare_pfrag->offset != probe_pfrag->offset || + prepare_pfrag->size != probe_pfrag->size) { + force_exit = true; + WARN_ONCE(true, TEST_FAILED_PREFIX "wrong probed info\n"); + } + + page_frag_commit(nc, prepare_pfrag, used_sz); +} + static int page_frag_push_thread(void *arg) { struct ptr_ring *ring = arg; @@ -86,15 +105,61 @@ static int page_frag_push_thread(void *arg) int ret; if (test_align) { - va = page_frag_alloc_align(&test_nc, test_alloc_len, - GFP_KERNEL, SMP_CACHE_BYTES); + if (test_prepare) { + struct page_frag prepare_frag, probe_frag; +
void *probe_va; + + va = page_frag_alloc_refill_prepare_align(&test_nc, + test_alloc_len, + &prepare_frag, + GFP_KERNEL, + SMP_CACHE_BYTES); + + probe_va = __page_frag_alloc_refill_probe_align(&test_nc, + test_alloc_len, + &probe_frag, + -SMP_CACHE_BYTES); + if (va != probe_va) { + force_exit = true; + WARN_ONCE(true, TEST_FAILED_PREFIX "wrong va\n"); + } + + if (likely(va)) + frag_frag_test_commit(&test_nc, &prepare_frag, + &probe_frag, test_alloc_len); + } else { + va = page_frag_alloc_align(&test_nc, + test_alloc_len, + GFP_KERNEL, + SMP_CACHE_BYTES); + } if ((unsigned long)va & (SMP_CACHE_BYTES - 1)) { force_exit = true; WARN_ONCE(true, TEST_FAILED_PREFIX "unaligned va returned\n"); } } else { - va = page_frag_alloc(&test_nc, test_alloc_len, GFP_KERNEL); + if (test_prepare) { + struct page_frag prepare_frag, probe_frag; + void *probe_va; + + va = page_frag_alloc_refill_prepare(&test_nc, test_alloc_len, + &prepare_frag, GFP_KERNEL); + + probe_va = page_frag_alloc_refill_probe(&test_nc, test_alloc_len, + &probe_frag); + + if (va != probe_va) { + force_exit = true; + WARN_ONCE(true, TEST_FAILED_PREFIX "wrong va\n"); + } + + if (likely(va)) + frag_frag_test_commit(&test_nc, &prepare_frag, + &probe_frag, test_alloc_len); + } else { + va = page_frag_alloc(&test_nc, test_alloc_len, GFP_KERNEL); + } } if (!va) @@ -176,8 +241,9 @@ static int __init page_frag_test_init(void) } duration = (u64)ktime_us_delta(ktime_get(), start); - pr_info("%d of iterations for %s testing took: %lluus\n", nr_test, - test_align ? "aligned" : "non-aligned", duration); + pr_info("%d of iterations for %s %s API testing took: %lluus\n", nr_test, + test_align ? "aligned" : "non-aligned", + test_prepare ? 
"prepare" : "alloc", duration); out: ptr_ring_cleanup(&ptr_ring, NULL); diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh index 2c5394584af4..f6ff9080a6f2 100755 --- a/tools/testing/selftests/mm/run_vmtests.sh +++ b/tools/testing/selftests/mm/run_vmtests.sh @@ -464,6 +464,10 @@ CATEGORY="page_frag" run_test ./test_page_frag.sh aligned CATEGORY="page_frag" run_test ./test_page_frag.sh nonaligned +CATEGORY="page_frag" run_test ./test_page_frag.sh aligned_prepare + +CATEGORY="page_frag" run_test ./test_page_frag.sh nonaligned_prepare + echo "SUMMARY: PASS=${count_pass} SKIP=${count_skip} FAIL=${count_fail}" | tap_prefix echo "1..${count_total}" | tap_output diff --git a/tools/testing/selftests/mm/test_page_frag.sh b/tools/testing/selftests/mm/test_page_frag.sh index f55b105084cf..1c757fd11844 100755 --- a/tools/testing/selftests/mm/test_page_frag.sh +++ b/tools/testing/selftests/mm/test_page_frag.sh @@ -43,6 +43,8 @@ check_test_failed_prefix() { SMOKE_PARAM="test_push_cpu=$TEST_CPU_0 test_pop_cpu=$TEST_CPU_1" NONALIGNED_PARAM="$SMOKE_PARAM test_alloc_len=75 nr_test=$NR_TEST" ALIGNED_PARAM="$NONALIGNED_PARAM test_align=1" +NONALIGNED_PREPARE_PARAM="$NONALIGNED_PARAM test_prepare=1" +ALIGNED_PREPARE_PARAM="$ALIGNED_PARAM test_prepare=1" check_test_requirements() { @@ -77,6 +79,20 @@ run_aligned_check() insmod $DRIVER $ALIGNED_PARAM > /dev/null 2>&1 } +run_nonaligned_prepare_check() +{ + echo "Run performance tests to evaluate how fast nonaligned prepare API is." + + insmod $DRIVER $NONALIGNED_PREPARE_PARAM > /dev/null 2>&1 +} + +run_aligned_prepare_check() +{ + echo "Run performance tests to evaluate how fast aligned prepare API is." + + insmod $DRIVER $ALIGNED_PREPARE_PARAM > /dev/null 2>&1 +} + run_smoke_check() { echo "Run smoke test." 
@@ -87,6 +103,7 @@ run_smoke_check() usage() { echo -n "Usage: $0 [ aligned ] | [ nonaligned ] | | [ smoke ] | " + echo "[ aligned_prepare ] | [ nonaligned_prepare ] | " echo "manual parameters" echo echo "Valid tests and parameters:" @@ -107,6 +124,12 @@ usage() echo "# Performance testing for aligned alloc API" echo "$0 aligned" echo + echo "# Performance testing for nonaligned prepare API" + echo "$0 nonaligned_prepare" + echo + echo "# Performance testing for aligned prepare API" + echo "$0 aligned_prepare" + echo exit 0 } @@ -158,6 +181,10 @@ function run_test() run_nonaligned_check elif [[ "$1" = "aligned" ]]; then run_aligned_check + elif [[ "$1" = "nonaligned_prepare" ]]; then + run_nonaligned_prepare_check + elif [[ "$1" = "aligned_prepare" ]]; then + run_aligned_prepare_check else run_manual_check $@ fi -- 2.33.0

[PATCH net-next v21 04/14] mm: page_frag: avoid caller accessing 'page_frag_cache' directly

by Yunsheng Lin

Use the appropriate page_frag API instead of having callers access 'page_frag_cache' directly. CC: Alexander Duyck <alexander.duyck(a)gmail.com> Signed-off-by: Yunsheng Lin <linyunsheng(a)huawei.com> Reviewed-by: Alexander Duyck <alexanderduyck(a)fb.com> Acked-by: Chuck Lever <chuck.lever(a)oracle.com> --- drivers/vhost/net.c | 2 +- include/linux/page_frag_cache.h | 10 ++++++++++ net/core/skbuff.c | 6 +++--- net/rxrpc/conn_object.c | 4 +--- net/rxrpc/local_object.c | 4 +--- net/sunrpc/svcsock.c | 6 ++---- tools/testing/selftests/mm/page_frag/page_frag_test.c | 2 +- 7 files changed, 19 insertions(+), 15 deletions(-) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index f16279351db5..9ad37c012189 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -1325,7 +1325,7 @@ static int vhost_net_open(struct inode *inode, struct file *f) vqs[VHOST_NET_VQ_RX]); f->private_data = n; - n->pf_cache.va = NULL; + page_frag_cache_init(&n->pf_cache); return 0; } diff --git a/include/linux/page_frag_cache.h b/include/linux/page_frag_cache.h index 67ac8626ed9b..0a52f7a179c8 100644 --- a/include/linux/page_frag_cache.h +++ b/include/linux/page_frag_cache.h @@ -7,6 +7,16 @@ #include <linux/mm_types_task.h> #include <linux/types.h> +static inline void page_frag_cache_init(struct page_frag_cache *nc) +{ + nc->va = NULL; +} + +static inline bool page_frag_cache_is_pfmemalloc(struct page_frag_cache *nc) +{ + return !!nc->pfmemalloc; +} + void page_frag_cache_drain(struct page_frag_cache *nc); void __page_frag_cache_drain(struct page *page, unsigned int count); void *__page_frag_alloc_align(struct page_frag_cache *nc, unsigned int fragsz, diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 00afeb90c23a..6841e61a6bd0 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -753,14 +753,14 @@ struct sk_buff *__netdev_alloc_skb(struct net_device *dev, unsigned int len, if (in_hardirq() || irqs_disabled()) { nc = this_cpu_ptr(&netdev_alloc_cache); data = page_frag_alloc(nc, len,
gfp_mask); - pfmemalloc = nc->pfmemalloc; + pfmemalloc = page_frag_cache_is_pfmemalloc(nc); } else { local_bh_disable(); local_lock_nested_bh(&napi_alloc_cache.bh_lock); nc = this_cpu_ptr(&napi_alloc_cache.page); data = page_frag_alloc(nc, len, gfp_mask); - pfmemalloc = nc->pfmemalloc; + pfmemalloc = page_frag_cache_is_pfmemalloc(nc); local_unlock_nested_bh(&napi_alloc_cache.bh_lock); local_bh_enable(); @@ -850,7 +850,7 @@ struct sk_buff *napi_alloc_skb(struct napi_struct *napi, unsigned int len) len = SKB_HEAD_ALIGN(len); data = page_frag_alloc(&nc->page, len, gfp_mask); - pfmemalloc = nc->page.pfmemalloc; + pfmemalloc = page_frag_cache_is_pfmemalloc(&nc->page); } local_unlock_nested_bh(&napi_alloc_cache.bh_lock); diff --git a/net/rxrpc/conn_object.c b/net/rxrpc/conn_object.c index 1539d315afe7..694c4df7a1a3 100644 --- a/net/rxrpc/conn_object.c +++ b/net/rxrpc/conn_object.c @@ -337,9 +337,7 @@ static void rxrpc_clean_up_connection(struct work_struct *work) */ rxrpc_purge_queue(&conn->rx_queue); - if (conn->tx_data_alloc.va) - __page_frag_cache_drain(virt_to_page(conn->tx_data_alloc.va), - conn->tx_data_alloc.pagecnt_bias); + page_frag_cache_drain(&conn->tx_data_alloc); call_rcu(&conn->rcu, rxrpc_rcu_free_connection); } diff --git a/net/rxrpc/local_object.c b/net/rxrpc/local_object.c index f9623ace2201..2792d2304605 100644 --- a/net/rxrpc/local_object.c +++ b/net/rxrpc/local_object.c @@ -452,9 +452,7 @@ void rxrpc_destroy_local(struct rxrpc_local *local) #endif rxrpc_purge_queue(&local->rx_queue); rxrpc_purge_client_connections(local); - if (local->tx_alloc.va) - __page_frag_cache_drain(virt_to_page(local->tx_alloc.va), - local->tx_alloc.pagecnt_bias); + page_frag_cache_drain(&local->tx_alloc); } /* diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c index 825ec5357691..b785425c3315 100644 --- a/net/sunrpc/svcsock.c +++ b/net/sunrpc/svcsock.c @@ -1608,7 +1608,6 @@ static void svc_tcp_sock_detach(struct svc_xprt *xprt) static void svc_sock_free(struct svc_xprt 
*xprt) { struct svc_sock *svsk = container_of(xprt, struct svc_sock, sk_xprt); - struct page_frag_cache *pfc = &svsk->sk_frag_cache; struct socket *sock = svsk->sk_sock; trace_svcsock_free(svsk, sock); @@ -1618,8 +1617,7 @@ static void svc_sock_free(struct svc_xprt *xprt) sockfd_put(sock); else sock_release(sock); - if (pfc->va) - __page_frag_cache_drain(virt_to_head_page(pfc->va), - pfc->pagecnt_bias); + + page_frag_cache_drain(&svsk->sk_frag_cache); kfree(svsk); } diff --git a/tools/testing/selftests/mm/page_frag/page_frag_test.c b/tools/testing/selftests/mm/page_frag/page_frag_test.c index 13c44133e009..e806c1866e36 100644 --- a/tools/testing/selftests/mm/page_frag/page_frag_test.c +++ b/tools/testing/selftests/mm/page_frag/page_frag_test.c @@ -126,7 +126,7 @@ static int __init page_frag_test_init(void) u64 duration; int ret; - test_nc.va = NULL; + page_frag_cache_init(&test_nc); atomic_set(&nthreads, 2); init_completion(&wait); -- 2.33.0
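The point of the two new helpers, and of routing rxrpc and svcsock through page_frag_cache_drain(), is that callers stop depending on the cache's internal fields (va, pfmemalloc, pagecnt_bias). The userspace mock below illustrates the pattern. The two inline helpers have the same bodies as in the diff, but struct page_frag_cache is trimmed down, a model-only `page` field stands in for virt_to_head_page(nc->va), and the `model_*` drain functions are simplified, hypothetical stand-ins for __page_frag_cache_drain() and page_frag_cache_drain().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for struct page: only the reference count matters here. */
struct model_page {
    unsigned int refcount;
    bool freed;
};

/* Trimmed-down page_frag_cache; 'page' is a model-only field standing in
 * for virt_to_head_page(nc->va). */
struct page_frag_cache {
    void *va;
    struct model_page *page;
    unsigned int pagecnt_bias;
    bool pfmemalloc;
};

/* Same bodies as the helpers added to include/linux/page_frag_cache.h. */
static inline void page_frag_cache_init(struct page_frag_cache *nc)
{
    nc->va = NULL;
}

static inline bool page_frag_cache_is_pfmemalloc(struct page_frag_cache *nc)
{
    return !!nc->pfmemalloc;
}

/* Model of __page_frag_cache_drain(): drop 'count' references at once. */
static void model_drain_page(struct model_page *page, unsigned int count)
{
    assert(page->refcount != 0 && page->refcount >= count);
    page->refcount -= count;
    if (page->refcount == 0)
        page->freed = true;      /* free_unref_page() in the kernel */
}

/* Model of page_frag_cache_drain(): the NULL check lives here, so callers
 * no longer open-code "if (va) __page_frag_cache_drain(...)". */
static void model_cache_drain(struct page_frag_cache *nc)
{
    if (!nc->va)
        return;
    model_drain_page(nc->page, nc->pagecnt_bias);
    nc->va = NULL;               /* a second drain is now a no-op */
}
```

Because the drain helper clears `va`, calling it on an initialized-but-unused cache or twice on the same cache is harmless, which is what lets the rxrpc and svcsock call sites drop their open-coded checks.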

[PATCH net-next v21 02/14] mm: move the page fragment allocator from page_alloc into its own file

by Yunsheng Lin

Inspired by [1], move the page fragment allocator from page_alloc into its own C file and header file, as we are about to make more changes to it so that it can replace another page_frag implementation in sock.c. As this patchset is going to replace 'struct page_frag' with 'struct page_frag_cache' in sched.h, including page_frag_cache.h in sched.h causes a compiler error due to the interdependence between mm_types.h and mm.h for asm-offsets.c, see [2]. Avoid the compiler error by moving 'struct page_frag_cache' to mm_types_task.h as suggested by Alexander, see [3]. 1. https://lore.kernel.org/all/20230411160902.4134381-3-dhowells@redhat.com/ 2. https://lore.kernel.org/all/15623dac-9358-4597-b3ee-3694a5956920@gmail.com/ 3. https://lore.kernel.org/all/CAKgT0UdH1yD=LSCXFJ=YM_aiA4OomD-2wXykO42bizaWMt… CC: David Howells <dhowells(a)redhat.com> CC: Alexander Duyck <alexander.duyck(a)gmail.com> Signed-off-by: Yunsheng Lin <linyunsheng(a)huawei.com> Acked-by: Andrew Morton <akpm(a)linux-foundation.org> Reviewed-by: Alexander Duyck <alexanderduyck(a)fb.com> --- include/linux/gfp.h | 22 --- include/linux/mm_types.h | 18 --- include/linux/mm_types_task.h | 18 +++ include/linux/page_frag_cache.h | 31 ++++ include/linux/skbuff.h | 1 + mm/Makefile | 1 + mm/page_alloc.c | 136 ---------------- mm/page_frag_cache.c | 145 ++++++++++++++++++ .../selftests/mm/page_frag/page_frag_test.c | 2 +- 9 files changed, 197 insertions(+), 177 deletions(-) create mode 100644 include/linux/page_frag_cache.h create mode 100644 mm/page_frag_cache.c diff --git a/include/linux/gfp.h b/include/linux/gfp.h index a951de920e20..a0a6d25f883f 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -371,28 +371,6 @@ __meminit void *alloc_pages_exact_nid_noprof(int nid, size_t size, gfp_t gfp_mas extern void __free_pages(struct page *page, unsigned int order); extern void free_pages(unsigned long addr, unsigned int order); -struct page_frag_cache; -void page_frag_cache_drain(struct page_frag_cache *nc); -extern void
__page_frag_cache_drain(struct page *page, unsigned int count); -void *__page_frag_alloc_align(struct page_frag_cache *nc, unsigned int fragsz, - gfp_t gfp_mask, unsigned int align_mask); - -static inline void *page_frag_alloc_align(struct page_frag_cache *nc, - unsigned int fragsz, gfp_t gfp_mask, - unsigned int align) -{ - WARN_ON_ONCE(!is_power_of_2(align)); - return __page_frag_alloc_align(nc, fragsz, gfp_mask, -align); -} - -static inline void *page_frag_alloc(struct page_frag_cache *nc, - unsigned int fragsz, gfp_t gfp_mask) -{ - return __page_frag_alloc_align(nc, fragsz, gfp_mask, ~0u); -} - -extern void page_frag_free(void *addr); - #define __free_page(page) __free_pages((page), 0) #define free_page(addr) free_pages((addr), 0) diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 6e3bdf8e38bc..92314ef2d978 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -521,9 +521,6 @@ static_assert(sizeof(struct ptdesc) <= sizeof(struct page)); */ #define STRUCT_PAGE_MAX_SHIFT (order_base_2(sizeof(struct page))) -#define PAGE_FRAG_CACHE_MAX_SIZE __ALIGN_MASK(32768, ~PAGE_MASK) -#define PAGE_FRAG_CACHE_MAX_ORDER get_order(PAGE_FRAG_CACHE_MAX_SIZE) - /* * page_private can be used on tail pages. However, PagePrivate is only * checked by the VM on the head page. So page_private on the tail pages @@ -542,21 +539,6 @@ static inline void *folio_get_private(struct folio *folio) return folio->private; } -struct page_frag_cache { - void * va; -#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE) - __u16 offset; - __u16 size; -#else - __u32 offset; -#endif - /* we maintain a pagecount bias, so that we dont dirty cache line - * containing page->_refcount every time we allocate a fragment. 
- */ - unsigned int pagecnt_bias; - bool pfmemalloc; -}; - typedef unsigned long vm_flags_t; /* diff --git a/include/linux/mm_types_task.h b/include/linux/mm_types_task.h index bff5706b76e1..0ac6daebdd5c 100644 --- a/include/linux/mm_types_task.h +++ b/include/linux/mm_types_task.h @@ -8,6 +8,7 @@ * (These are defined separately to decouple sched.h from mm_types.h as much as possible.) */ +#include <linux/align.h> #include <linux/types.h> #include <asm/page.h> @@ -43,6 +44,23 @@ struct page_frag { #endif }; +#define PAGE_FRAG_CACHE_MAX_SIZE __ALIGN_MASK(32768, ~PAGE_MASK) +#define PAGE_FRAG_CACHE_MAX_ORDER get_order(PAGE_FRAG_CACHE_MAX_SIZE) +struct page_frag_cache { + void *va; +#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE) + __u16 offset; + __u16 size; +#else + __u32 offset; +#endif + /* we maintain a pagecount bias, so that we dont dirty cache line + * containing page->_refcount every time we allocate a fragment. + */ + unsigned int pagecnt_bias; + bool pfmemalloc; +}; + /* Track pages that require TLB flushes */ struct tlbflush_unmap_batch { #ifdef CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH diff --git a/include/linux/page_frag_cache.h b/include/linux/page_frag_cache.h new file mode 100644 index 000000000000..67ac8626ed9b --- /dev/null +++ b/include/linux/page_frag_cache.h @@ -0,0 +1,31 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#ifndef _LINUX_PAGE_FRAG_CACHE_H +#define _LINUX_PAGE_FRAG_CACHE_H + +#include <linux/log2.h> +#include <linux/mm_types_task.h> +#include <linux/types.h> + +void page_frag_cache_drain(struct page_frag_cache *nc); +void __page_frag_cache_drain(struct page *page, unsigned int count); +void *__page_frag_alloc_align(struct page_frag_cache *nc, unsigned int fragsz, + gfp_t gfp_mask, unsigned int align_mask); + +static inline void *page_frag_alloc_align(struct page_frag_cache *nc, + unsigned int fragsz, gfp_t gfp_mask, + unsigned int align) +{ + WARN_ON_ONCE(!is_power_of_2(align)); + return __page_frag_alloc_align(nc, fragsz, gfp_mask, -align); 
+} + +static inline void *page_frag_alloc(struct page_frag_cache *nc, + unsigned int fragsz, gfp_t gfp_mask) +{ + return __page_frag_alloc_align(nc, fragsz, gfp_mask, ~0u); +} + +void page_frag_free(void *addr); + +#endif diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 39f1d16f3628..560e2b49f98b 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -31,6 +31,7 @@ #include <linux/in6.h> #include <linux/if_packet.h> #include <linux/llist.h> +#include <linux/page_frag_cache.h> #include <net/flow.h> #if IS_ENABLED(CONFIG_NF_CONNTRACK) #include <linux/netfilter/nf_conntrack_common.h> diff --git a/mm/Makefile b/mm/Makefile index d5639b036166..dba52bb0da8a 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -65,6 +65,7 @@ page-alloc-$(CONFIG_SHUFFLE_PAGE_ALLOCATOR) += shuffle.o memory-hotplug-$(CONFIG_MEMORY_HOTPLUG) += memory_hotplug.o obj-y += page-alloc.o +obj-y += page_frag_cache.o obj-y += init-mm.o obj-y += memblock.o obj-y += $(memory-hotplug-y) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 8afab64814dc..6ca2abce857b 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -4836,142 +4836,6 @@ void free_pages(unsigned long addr, unsigned int order) EXPORT_SYMBOL(free_pages); -/* - * Page Fragment: - * An arbitrary-length arbitrary-offset area of memory which resides - * within a 0 or higher order page. Multiple fragments within that page - * are individually refcounted, in the page's reference counter. - * - * The page_frag functions below provide a simple allocation framework for - * page fragments. This is used by the network stack and network device - * drivers to provide a backing region of memory for use as either an - * sk_buff->head, or to be used in the "frags" portion of skb_shared_info. 
- */ -static struct page *__page_frag_cache_refill(struct page_frag_cache *nc, - gfp_t gfp_mask) -{ - struct page *page = NULL; - gfp_t gfp = gfp_mask; - -#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE) - gfp_mask = (gfp_mask & ~__GFP_DIRECT_RECLAIM) | __GFP_COMP | - __GFP_NOWARN | __GFP_NORETRY | __GFP_NOMEMALLOC; - page = alloc_pages_node(NUMA_NO_NODE, gfp_mask, - PAGE_FRAG_CACHE_MAX_ORDER); - nc->size = page ? PAGE_FRAG_CACHE_MAX_SIZE : PAGE_SIZE; -#endif - if (unlikely(!page)) - page = alloc_pages_node(NUMA_NO_NODE, gfp, 0); - - nc->va = page ? page_address(page) : NULL; - - return page; -} - -void page_frag_cache_drain(struct page_frag_cache *nc) -{ - if (!nc->va) - return; - - __page_frag_cache_drain(virt_to_head_page(nc->va), nc->pagecnt_bias); - nc->va = NULL; -} -EXPORT_SYMBOL(page_frag_cache_drain); - -void __page_frag_cache_drain(struct page *page, unsigned int count) -{ - VM_BUG_ON_PAGE(page_ref_count(page) == 0, page); - - if (page_ref_sub_and_test(page, count)) - free_unref_page(page, compound_order(page)); -} -EXPORT_SYMBOL(__page_frag_cache_drain); - -void *__page_frag_alloc_align(struct page_frag_cache *nc, - unsigned int fragsz, gfp_t gfp_mask, - unsigned int align_mask) -{ - unsigned int size = PAGE_SIZE; - struct page *page; - int offset; - - if (unlikely(!nc->va)) { -refill: - page = __page_frag_cache_refill(nc, gfp_mask); - if (!page) - return NULL; - -#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE) - /* if size can vary use size else just use PAGE_SIZE */ - size = nc->size; -#endif - /* Even if we own the page, we do not use atomic_set(). - * This would break get_page_unless_zero() users. 
- */ - page_ref_add(page, PAGE_FRAG_CACHE_MAX_SIZE); - - /* reset page count bias and offset to start of new frag */ - nc->pfmemalloc = page_is_pfmemalloc(page); - nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1; - nc->offset = size; - } - - offset = nc->offset - fragsz; - if (unlikely(offset < 0)) { - page = virt_to_page(nc->va); - - if (!page_ref_sub_and_test(page, nc->pagecnt_bias)) - goto refill; - - if (unlikely(nc->pfmemalloc)) { - free_unref_page(page, compound_order(page)); - goto refill; - } - -#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE) - /* if size can vary use size else just use PAGE_SIZE */ - size = nc->size; -#endif - /* OK, page count is 0, we can safely set it */ - set_page_count(page, PAGE_FRAG_CACHE_MAX_SIZE + 1); - - /* reset page count bias and offset to start of new frag */ - nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1; - offset = size - fragsz; - if (unlikely(offset < 0)) { - /* - * The caller is trying to allocate a fragment - * with fragsz > PAGE_SIZE but the cache isn't big - * enough to satisfy the request, this may - * happen in low memory conditions. - * We don't release the cache page because - * it could make memory pressure worse - * so we simply return NULL here. - */ - return NULL; - } - } - - nc->pagecnt_bias--; - offset &= align_mask; - nc->offset = offset; - - return nc->va + offset; -} -EXPORT_SYMBOL(__page_frag_alloc_align); - -/* - * Frees a page fragment allocated out of either a compound or order 0 page. 
- */ -void page_frag_free(void *addr) -{ - struct page *page = virt_to_head_page(addr); - - if (unlikely(put_page_testzero(page))) - free_unref_page(page, compound_order(page)); -} -EXPORT_SYMBOL(page_frag_free); - static void *make_alloc_exact(unsigned long addr, unsigned int order, size_t size) { diff --git a/mm/page_frag_cache.c b/mm/page_frag_cache.c new file mode 100644 index 000000000000..609a485cd02a --- /dev/null +++ b/mm/page_frag_cache.c @@ -0,0 +1,145 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* Page fragment allocator + * + * Page Fragment: + * An arbitrary-length arbitrary-offset area of memory which resides within a + * 0 or higher order page. Multiple fragments within that page are + * individually refcounted, in the page's reference counter. + * + * The page_frag functions provide a simple allocation framework for page + * fragments. This is used by the network stack and network device drivers to + * provide a backing region of memory for use as either an sk_buff->head, or to + * be used in the "frags" portion of skb_shared_info. + */ + +#include <linux/export.h> +#include <linux/gfp_types.h> +#include <linux/init.h> +#include <linux/mm.h> +#include <linux/page_frag_cache.h> +#include "internal.h" + +static struct page *__page_frag_cache_refill(struct page_frag_cache *nc, + gfp_t gfp_mask) +{ + struct page *page = NULL; + gfp_t gfp = gfp_mask; + +#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE) + gfp_mask = (gfp_mask & ~__GFP_DIRECT_RECLAIM) | __GFP_COMP | + __GFP_NOWARN | __GFP_NORETRY | __GFP_NOMEMALLOC; + page = alloc_pages_node(NUMA_NO_NODE, gfp_mask, + PAGE_FRAG_CACHE_MAX_ORDER); + nc->size = page ? PAGE_FRAG_CACHE_MAX_SIZE : PAGE_SIZE; +#endif + if (unlikely(!page)) + page = alloc_pages_node(NUMA_NO_NODE, gfp, 0); + + nc->va = page ? 
page_address(page) : NULL; + + return page; +} + +void page_frag_cache_drain(struct page_frag_cache *nc) +{ + if (!nc->va) + return; + + __page_frag_cache_drain(virt_to_head_page(nc->va), nc->pagecnt_bias); + nc->va = NULL; +} +EXPORT_SYMBOL(page_frag_cache_drain); + +void __page_frag_cache_drain(struct page *page, unsigned int count) +{ + VM_BUG_ON_PAGE(page_ref_count(page) == 0, page); + + if (page_ref_sub_and_test(page, count)) + free_unref_page(page, compound_order(page)); +} +EXPORT_SYMBOL(__page_frag_cache_drain); + +void *__page_frag_alloc_align(struct page_frag_cache *nc, + unsigned int fragsz, gfp_t gfp_mask, + unsigned int align_mask) +{ + unsigned int size = PAGE_SIZE; + struct page *page; + int offset; + + if (unlikely(!nc->va)) { +refill: + page = __page_frag_cache_refill(nc, gfp_mask); + if (!page) + return NULL; + +#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE) + /* if size can vary use size else just use PAGE_SIZE */ + size = nc->size; +#endif + /* Even if we own the page, we do not use atomic_set(). + * This would break get_page_unless_zero() users. 
+ */ + page_ref_add(page, PAGE_FRAG_CACHE_MAX_SIZE); + + /* reset page count bias and offset to start of new frag */ + nc->pfmemalloc = page_is_pfmemalloc(page); + nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1; + nc->offset = size; + } + + offset = nc->offset - fragsz; + if (unlikely(offset < 0)) { + page = virt_to_page(nc->va); + + if (!page_ref_sub_and_test(page, nc->pagecnt_bias)) + goto refill; + + if (unlikely(nc->pfmemalloc)) { + free_unref_page(page, compound_order(page)); + goto refill; + } + +#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE) + /* if size can vary use size else just use PAGE_SIZE */ + size = nc->size; +#endif + /* OK, page count is 0, we can safely set it */ + set_page_count(page, PAGE_FRAG_CACHE_MAX_SIZE + 1); + + /* reset page count bias and offset to start of new frag */ + nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1; + offset = size - fragsz; + if (unlikely(offset < 0)) { + /* + * The caller is trying to allocate a fragment + * with fragsz > PAGE_SIZE but the cache isn't big + * enough to satisfy the request, this may + * happen in low memory conditions. + * We don't release the cache page because + * it could make memory pressure worse + * so we simply return NULL here. + */ + return NULL; + } + } + + nc->pagecnt_bias--; + offset &= align_mask; + nc->offset = offset; + + return nc->va + offset; +} +EXPORT_SYMBOL(__page_frag_alloc_align); + +/* + * Frees a page fragment allocated out of either a compound or order 0 page. 
+ */ +void page_frag_free(void *addr) +{ + struct page *page = virt_to_head_page(addr); + + if (unlikely(put_page_testzero(page))) + free_unref_page(page, compound_order(page)); +} +EXPORT_SYMBOL(page_frag_free); diff --git a/tools/testing/selftests/mm/page_frag/page_frag_test.c b/tools/testing/selftests/mm/page_frag/page_frag_test.c index 912d97b99107..13c44133e009 100644 --- a/tools/testing/selftests/mm/page_frag/page_frag_test.c +++ b/tools/testing/selftests/mm/page_frag/page_frag_test.c @@ -6,12 +6,12 @@ * Copyright (C) 2024 Yunsheng Lin <linyunsheng(a)huawei.com> */ -#include <linux/mm.h> #include <linux/module.h> #include <linux/cpumask.h> #include <linux/completion.h> #include <linux/ptr_ring.h> #include <linux/kthread.h> +#include <linux/page_frag_cache.h> #define TEST_FAILED_PREFIX "page_frag_test failed: " -- 2.33.0
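The fast path of __page_frag_alloc_align() moved above is compact, so a userspace model of its arithmetic may help when reviewing it. The model reproduces three details visible in the code: fragments are carved downward from the end of the page, alignment is applied by masking the offset with `align_mask` (page_frag_alloc_align() passes `-align`, page_frag_alloc() passes `~0u`), and `pagecnt_bias` lets the cache take a large batch of page references up front instead of touching `page->_refcount` on every allocation. It is a sketch, not kernel code: it models a single 4096-byte page, omits the refill, pfmemalloc and PAGE_FRAG_CACHE_MAX_SIZE large-page paths, and the `model_*` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_PAGE_SIZE 4096
#define MODEL_BIAS_MAX  32768u  /* plays the role of PAGE_FRAG_CACHE_MAX_SIZE */

struct model_frag_cache {
    int offset;                 /* start of the last fragment handed out */
    unsigned int pagecnt_bias;  /* page references still owned by the cache */
    unsigned int page_refcount; /* stands in for page->_refcount */
};

/* Refill: the page arrives with one reference; page_ref_add() adds
 * MODEL_BIAS_MAX more, and the cache owns all of them via pagecnt_bias. */
static void model_refill(struct model_frag_cache *nc)
{
    nc->page_refcount = 1 + MODEL_BIAS_MAX;
    nc->pagecnt_bias = MODEL_BIAS_MAX + 1;
    nc->offset = MODEL_PAGE_SIZE;
}

/* Carve fragsz bytes downward from the current offset, then round the start
 * down with align_mask (~0u: no alignment; -align: power-of-two alignment).
 * Returns the fragment offset, or -1 where the kernel would recycle/refill. */
static int model_alloc(struct model_frag_cache *nc, int fragsz,
                       unsigned int align_mask)
{
    int offset = nc->offset - fragsz;

    if (offset < 0)
        return -1;
    nc->pagecnt_bias--;         /* one cache-owned reference moves to the caller */
    offset = (int)((unsigned int)offset & align_mask);
    nc->offset = offset;
    return offset;
}

/* page_frag_free(): each freed fragment drops one page reference. */
static void model_free(struct model_frag_cache *nc)
{
    assert(nc->page_refcount > 0);
    nc->page_refcount--;
}

/* On exhaustion the kernel does page_ref_sub_and_test(page, pagecnt_bias):
 * the count reaches zero only if every fragment handed out was freed. */
static bool model_page_reusable(struct model_frag_cache *nc)
{
    assert(nc->page_refcount >= nc->pagecnt_bias);
    nc->page_refcount -= nc->pagecnt_bias;
    return nc->page_refcount == 0;
}
```

For two consecutive 75-byte requests, the unaligned path yields offsets 4021 and then 3946, while a 64-byte alignment (`-64u`) rounds 4021 down to 3968 and the next request down to 3840. After n allocations and f frees, subtracting the remaining bias leaves n - f references, so the page is reused only once all fragments are back.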

[PATCH net-next v21 01/14] mm: page_frag: add a test module for page_frag

by Yunsheng Lin

The test works as follows: a kthread bound to a specified CPU allocates fragments from a page_frag_cache instance and pushes them into a ptr_ring instance, while another kthread bound to a specified CPU pops the fragments from the ptr_ring and frees them. CC: Alexander Duyck <alexander.duyck(a)gmail.com> Signed-off-by: Yunsheng Lin <linyunsheng(a)huawei.com> Reviewed-by: Alexander Duyck <alexanderduyck(a)fb.com> --- tools/testing/selftests/mm/Makefile | 3 + tools/testing/selftests/mm/page_frag/Makefile | 18 ++ .../selftests/mm/page_frag/page_frag_test.c | 198 ++++++++++++++++++ tools/testing/selftests/mm/run_vmtests.sh | 8 + tools/testing/selftests/mm/test_page_frag.sh | 175 ++++++++++++++++ 5 files changed, 402 insertions(+) create mode 100644 tools/testing/selftests/mm/page_frag/Makefile create mode 100644 tools/testing/selftests/mm/page_frag/page_frag_test.c create mode 100755 tools/testing/selftests/mm/test_page_frag.sh diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile index 02e1204971b0..acec529baaca 100644 --- a/tools/testing/selftests/mm/Makefile +++ b/tools/testing/selftests/mm/Makefile @@ -36,6 +36,8 @@ MAKEFLAGS += --no-builtin-rules CFLAGS = -Wall -I $(top_srcdir) $(EXTRA_CFLAGS) $(KHDR_INCLUDES) $(TOOLS_INCLUDES) LDLIBS = -lrt -lpthread -lm +TEST_GEN_MODS_DIR := page_frag + TEST_GEN_FILES = cow TEST_GEN_FILES += compaction_test TEST_GEN_FILES += gup_longterm @@ -126,6 +128,7 @@ TEST_FILES += test_hmm.sh TEST_FILES += va_high_addr_switch.sh TEST_FILES += charge_reserved_hugetlb.sh TEST_FILES += hugetlb_reparenting_test.sh +TEST_FILES += test_page_frag.sh # required by charge_reserved_hugetlb.sh TEST_FILES += write_hugetlb_memory.sh diff --git a/tools/testing/selftests/mm/page_frag/Makefile b/tools/testing/selftests/mm/page_frag/Makefile new file mode 100644 index 000000000000..58dda74d50a3 --- /dev/null +++ b/tools/testing/selftests/mm/page_frag/Makefile @@ -0,0 +1,18 @@ +PAGE_FRAG_TEST_DIR := $(realpath
$(dir $(abspath $(lastword $(MAKEFILE_LIST))))) +KDIR ?= $(abspath $(PAGE_FRAG_TEST_DIR)/../../../../..) + +ifeq ($(V),1) +Q = +else +Q = @ +endif + +MODULES = page_frag_test.ko + +obj-m += page_frag_test.o + +all: + +$(Q)make -C $(KDIR) M=$(PAGE_FRAG_TEST_DIR) modules + +clean: + +$(Q)make -C $(KDIR) M=$(PAGE_FRAG_TEST_DIR) clean diff --git a/tools/testing/selftests/mm/page_frag/page_frag_test.c b/tools/testing/selftests/mm/page_frag/page_frag_test.c new file mode 100644 index 000000000000..912d97b99107 --- /dev/null +++ b/tools/testing/selftests/mm/page_frag/page_frag_test.c @@ -0,0 +1,198 @@ +// SPDX-License-Identifier: GPL-2.0 + +/* + * Test module for page_frag cache + * + * Copyright (C) 2024 Yunsheng Lin <linyunsheng(a)huawei.com> + */ + +#include <linux/mm.h> +#include <linux/module.h> +#include <linux/cpumask.h> +#include <linux/completion.h> +#include <linux/ptr_ring.h> +#include <linux/kthread.h> + +#define TEST_FAILED_PREFIX "page_frag_test failed: " + +static struct ptr_ring ptr_ring; +static int nr_objs = 512; +static atomic_t nthreads; +static struct completion wait; +static struct page_frag_cache test_nc; +static int test_popped; +static int test_pushed; +static bool force_exit; + +static int nr_test = 2000000; +module_param(nr_test, int, 0); +MODULE_PARM_DESC(nr_test, "number of iterations to test"); + +static bool test_align; +module_param(test_align, bool, 0); +MODULE_PARM_DESC(test_align, "use align API for testing"); + +static int test_alloc_len = 2048; +module_param(test_alloc_len, int, 0); +MODULE_PARM_DESC(test_alloc_len, "alloc len for testing"); + +static int test_push_cpu; +module_param(test_push_cpu, int, 0); +MODULE_PARM_DESC(test_push_cpu, "test cpu for pushing fragment"); + +static int test_pop_cpu; +module_param(test_pop_cpu, int, 0); +MODULE_PARM_DESC(test_pop_cpu, "test cpu for popping fragment"); + +static int page_frag_pop_thread(void *arg) +{ + struct ptr_ring *ring = arg; + + pr_info("page_frag pop test thread begins on cpu 
%d\n", + smp_processor_id()); + + while (test_popped < nr_test) { + void *obj = __ptr_ring_consume(ring); + + if (obj) { + test_popped++; + page_frag_free(obj); + } else { + if (force_exit) + break; + + cond_resched(); + } + } + + if (atomic_dec_and_test(&nthreads)) + complete(&wait); + + pr_info("page_frag pop test thread exits on cpu %d\n", + smp_processor_id()); + + return 0; +} + +static int page_frag_push_thread(void *arg) +{ + struct ptr_ring *ring = arg; + + pr_info("page_frag push test thread begins on cpu %d\n", + smp_processor_id()); + + while (test_pushed < nr_test && !force_exit) { + void *va; + int ret; + + if (test_align) { + va = page_frag_alloc_align(&test_nc, test_alloc_len, + GFP_KERNEL, SMP_CACHE_BYTES); + + if ((unsigned long)va & (SMP_CACHE_BYTES - 1)) { + force_exit = true; + WARN_ONCE(true, TEST_FAILED_PREFIX "unaligned va returned\n"); + } + } else { + va = page_frag_alloc(&test_nc, test_alloc_len, GFP_KERNEL); + } + + if (!va) + continue; + + ret = __ptr_ring_produce(ring, va); + if (ret) { + page_frag_free(va); + cond_resched(); + } else { + test_pushed++; + } + } + + pr_info("page_frag push test thread exits on cpu %d\n", + smp_processor_id()); + + if (atomic_dec_and_test(&nthreads)) + complete(&wait); + + return 0; +} + +static int __init page_frag_test_init(void) +{ + struct task_struct *tsk_push, *tsk_pop; + int last_pushed = 0, last_popped = 0; + ktime_t start; + u64 duration; + int ret; + + test_nc.va = NULL; + atomic_set(&nthreads, 2); + init_completion(&wait); + + if (test_alloc_len > PAGE_SIZE || test_alloc_len <= 0 || + !cpu_active(test_push_cpu) || !cpu_active(test_pop_cpu)) + return -EINVAL; + + ret = ptr_ring_init(&ptr_ring, nr_objs, GFP_KERNEL); + if (ret) + return ret; + + tsk_push = kthread_create_on_cpu(page_frag_push_thread, &ptr_ring, + test_push_cpu, "page_frag_push"); + if (IS_ERR(tsk_push)) + return PTR_ERR(tsk_push); + + tsk_pop = kthread_create_on_cpu(page_frag_pop_thread, &ptr_ring, + test_pop_cpu, 
"page_frag_pop"); + if (IS_ERR(tsk_pop)) { + kthread_stop(tsk_push); + return PTR_ERR(tsk_pop); + } + + start = ktime_get(); + wake_up_process(tsk_push); + wake_up_process(tsk_pop); + + pr_info("waiting for test to complete\n"); + + while (!wait_for_completion_timeout(&wait, msecs_to_jiffies(10000))) { + /* exit if there is no progress for push or pop size */ + if (last_pushed == test_pushed || last_popped == test_popped) { + WARN_ONCE(true, TEST_FAILED_PREFIX "no progress\n"); + force_exit = true; + continue; + } + + last_pushed = test_pushed; + last_popped = test_popped; + pr_info("page_frag_test progress: pushed = %d, popped = %d\n", + test_pushed, test_popped); + } + + if (force_exit) { + pr_err(TEST_FAILED_PREFIX "exit with error\n"); + goto out; + } + + duration = (u64)ktime_us_delta(ktime_get(), start); + pr_info("%d of iterations for %s testing took: %lluus\n", nr_test, + test_align ? "aligned" : "non-aligned", duration); + +out: + ptr_ring_cleanup(&ptr_ring, NULL); + page_frag_cache_drain(&test_nc); + + return -EAGAIN; +} + +static void __exit page_frag_test_exit(void) +{ +} + +module_init(page_frag_test_init); +module_exit(page_frag_test_exit); + +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Yunsheng Lin <linyunsheng(a)huawei.com>"); +MODULE_DESCRIPTION("Test module for page_frag"); diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh index c5797ad1d37b..2c5394584af4 100755 --- a/tools/testing/selftests/mm/run_vmtests.sh +++ b/tools/testing/selftests/mm/run_vmtests.sh @@ -75,6 +75,8 @@ separated by spaces: read-only VMAs - mdwe test prctl(PR_SET_MDWE, ...) 
+- page_frag + test handling of page fragment allocation and freeing example: ./run_vmtests.sh -t "hmm mmap ksm" EOF @@ -456,6 +458,12 @@ CATEGORY="mkdirty" run_test ./mkdirty CATEGORY="mdwe" run_test ./mdwe_test +CATEGORY="page_frag" run_test ./test_page_frag.sh smoke + +CATEGORY="page_frag" run_test ./test_page_frag.sh aligned + +CATEGORY="page_frag" run_test ./test_page_frag.sh nonaligned + echo "SUMMARY: PASS=${count_pass} SKIP=${count_skip} FAIL=${count_fail}" | tap_prefix echo "1..${count_total}" | tap_output diff --git a/tools/testing/selftests/mm/test_page_frag.sh b/tools/testing/selftests/mm/test_page_frag.sh new file mode 100755 index 000000000000..f55b105084cf --- /dev/null +++ b/tools/testing/selftests/mm/test_page_frag.sh @@ -0,0 +1,175 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 +# +# Copyright (C) 2024 Yunsheng Lin <linyunsheng(a)huawei.com> +# Copyright (C) 2018 Uladzislau Rezki (Sony) <urezki(a)gmail.com> +# +# This is a test script for the kernel test driver to test the +# correctness and performance of page_frag's implementation. +# Therefore it is just a kernel module loader. You can specify +# and pass different parameters in order to: +# a) analyse performance of page fragment allocations; +# b) stressing and stability check of page_frag subsystem. + +DRIVER="./page_frag/page_frag_test.ko" +CPU_LIST=$(grep -m 2 processor /proc/cpuinfo | cut -d ' ' -f 2) +TEST_CPU_0=$(echo $CPU_LIST | awk '{print $1}') + +if [ $(echo $CPU_LIST | wc -w) -gt 1 ]; then + TEST_CPU_1=$(echo $CPU_LIST | awk '{print $2}') + NR_TEST=100000000 +else + TEST_CPU_1=$TEST_CPU_0 + NR_TEST=1000000 +fi + +# 1 if fails +exitcode=1 + +# Kselftest framework requirement - SKIP code is 4. +ksft_skip=4 + +check_test_failed_prefix() { + if dmesg | grep -q 'page_frag_test failed:';then + echo "page_frag_test failed, please check dmesg" + exit $exitcode + fi +} + +# +# Static templates for testing of page_frag APIs. 
+# Also it is possible to pass any supported parameters manually. +# +SMOKE_PARAM="test_push_cpu=$TEST_CPU_0 test_pop_cpu=$TEST_CPU_1" +NONALIGNED_PARAM="$SMOKE_PARAM test_alloc_len=75 nr_test=$NR_TEST" +ALIGNED_PARAM="$NONALIGNED_PARAM test_align=1" + +check_test_requirements() +{ + uid=$(id -u) + if [ $uid -ne 0 ]; then + echo "$0: Must be run as root" + exit $ksft_skip + fi + + if ! which insmod > /dev/null 2>&1; then + echo "$0: You need insmod installed" + exit $ksft_skip + fi + + if [ ! -f $DRIVER ]; then + echo "$0: You need to compile page_frag_test module" + exit $ksft_skip + fi +} + +run_nonaligned_check() +{ + echo "Run performance tests to evaluate how fast nonaligned alloc API is." + + insmod $DRIVER $NONALIGNED_PARAM > /dev/null 2>&1 +} + +run_aligned_check() +{ + echo "Run performance tests to evaluate how fast aligned alloc API is." + + insmod $DRIVER $ALIGNED_PARAM > /dev/null 2>&1 +} + +run_smoke_check() +{ + echo "Run smoke test." + + insmod $DRIVER $SMOKE_PARAM > /dev/null 2>&1 +} + +usage() +{ + echo -n "Usage: $0 [ aligned ] | [ nonaligned ] | | [ smoke ] | " + echo "manual parameters" + echo + echo "Valid tests and parameters:" + echo + modinfo $DRIVER + echo + echo "Example usage:" + echo + echo "# Shows help message" + echo "$0" + echo + echo "# Smoke testing" + echo "$0 smoke" + echo + echo "# Performance testing for nonaligned alloc API" + echo "$0 nonaligned" + echo + echo "# Performance testing for aligned alloc API" + echo "$0 aligned" + echo + exit 0 +} + +function validate_passed_args() +{ + VALID_ARGS=`modinfo $DRIVER | awk '/parm:/ {print $2}' | sed 's/:.*//'` + + # + # Something has been passed, check it. 
+ # + for passed_arg in $@; do + key=${passed_arg//=*/} + valid=0 + + for valid_arg in $VALID_ARGS; do + if [[ $key = $valid_arg ]]; then + valid=1 + break + fi + done + + if [[ $valid -ne 1 ]]; then + echo "Error: key is not correct: ${key}" + exit $exitcode + fi + done +} + +function run_manual_check() +{ + # + # Validate passed parameters. If there is wrong one, + # the script exists and does not execute further. + # + validate_passed_args $@ + + echo "Run the test with following parameters: $@" + insmod $DRIVER $@ > /dev/null 2>&1 +} + +function run_test() +{ + if [ $# -eq 0 ]; then + usage + else + if [[ "$1" = "smoke" ]]; then + run_smoke_check + elif [[ "$1" = "nonaligned" ]]; then + run_nonaligned_check + elif [[ "$1" = "aligned" ]]; then + run_aligned_check + else + run_manual_check $@ + fi + fi + + check_test_failed_prefix + + echo "Done." + echo "Check the kernel ring buffer to see the summary." +} + +check_test_requirements +run_test $@ + +exit 0 -- 2.33.0
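The push/pop scheme this test module uses (one thread allocating fragments and pushing them into a ptr_ring, another popping and freeing them) can be sketched in userspace. This is a minimal sketch under stated assumptions, not the module's code: a hypothetical single-producer/single-consumer ring built on C11 atomics stands in for ptr_ring, malloc()/free() stand in for page_frag_alloc()/page_frag_free(), and the threads are not pinned to CPUs the way kthread_create_on_cpu() pins the kthreads. All names (spsc_ring, run_push_pop, and so on) are hypothetical.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

#define RING_SIZE 512 /* mirrors nr_objs in the kernel module */

/* Hypothetical single-producer/single-consumer ring standing in for ptr_ring. */
struct spsc_ring {
	void *slots[RING_SIZE];
	atomic_size_t head; /* total objects produced so far */
	atomic_size_t tail; /* total objects consumed so far */
};

static bool ring_produce(struct spsc_ring *r, void *obj)
{
	size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
	size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

	if (head - tail == RING_SIZE)
		return false; /* ring is full */
	r->slots[head % RING_SIZE] = obj;
	atomic_store_explicit(&r->head, head + 1, memory_order_release); /* publish slot */
	return true;
}

static void *ring_consume(struct spsc_ring *r)
{
	size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
	size_t head = atomic_load_explicit(&r->head, memory_order_acquire);
	void *obj;

	if (tail == head)
		return NULL; /* ring is empty */
	obj = r->slots[tail % RING_SIZE];
	atomic_store_explicit(&r->tail, tail + 1, memory_order_release); /* free slot */
	return obj;
}

struct push_pop_ctx {
	struct spsc_ring ring;
	int nr_test;
	size_t alloc_len;
	int pushed; /* touched only by the push thread */
	int popped; /* touched only by the pop side */
};

static void *push_thread(void *arg)
{
	struct push_pop_ctx *ctx = arg;

	while (ctx->pushed < ctx->nr_test) {
		void *va = malloc(ctx->alloc_len); /* stand-in for page_frag_alloc() */
		if (!va)
			continue;
		if (ring_produce(&ctx->ring, va)) {
			ctx->pushed++;
		} else {
			free(va); /* ring full: drop the fragment and yield */
			sched_yield();
		}
	}
	return NULL;
}

static void pop_loop(struct push_pop_ctx *ctx)
{
	while (ctx->popped < ctx->nr_test) {
		void *obj = ring_consume(&ctx->ring);
		if (obj) {
			free(obj); /* stand-in for page_frag_free() */
			ctx->popped++;
		} else {
			sched_yield();
		}
	}
}

/* Pushes from a new thread, pops in the caller; returns the pop count or -1. */
static int run_push_pop(int nr_test, size_t alloc_len)
{
	struct push_pop_ctx *ctx = calloc(1, sizeof(*ctx));
	pthread_t pusher;
	int popped;

	if (!ctx)
		return -1;
	ctx->nr_test = nr_test;
	ctx->alloc_len = alloc_len;
	atomic_init(&ctx->ring.head, 0);
	atomic_init(&ctx->ring.tail, 0);
	if (pthread_create(&pusher, NULL, push_thread, ctx) != 0) {
		free(ctx);
		return -1;
	}
	pop_loop(ctx);
	pthread_join(pusher, NULL);
	popped = ctx->popped;
	free(ctx);
	return popped;
}
```

For example, run_push_pop(10000, 75) pushes and frees 10000 objects of 75 bytes, the length the script's nonaligned mode passes as test_alloc_len. As in the module, a full ring is handled by freeing the freshly allocated object and yielding rather than blocking, so neither side can deadlock waiting on the other.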

[PATCH v3] lib/crc16_kunit.c: add KUnit tests for crc16

by Vinicius Peixoto

Add KUnit tests for the kernel's implementation of the standard CRC-16 algorithm (<linux/crc16.h>). The test data consists of 100 randomly generated test cases, validated against a naive CRC-16 implementation. The test follows roughly the same logic as lib/crc32test.c, but without the performance measurements.

Signed-off-by: Vinicius Peixoto <vpeixoto(a)lkcamp.dev>
Co-developed-by: Enzo Bertoloti <ebertoloti(a)lkcamp.dev>
Signed-off-by: Enzo Bertoloti <ebertoloti(a)lkcamp.dev>
Co-developed-by: Fabricio Gasperin <fgasperin(a)lkcamp.dev>
Signed-off-by: Fabricio Gasperin <fgasperin(a)lkcamp.dev>
Suggested-by: David Laight <David.Laight(a)ACULAB.COM>
---
This patch was developed during a hackathon organized by LKCAMP [1], with the objective of writing KUnit tests, both to introduce people to the kernel development process and to learn about different subsystems (with the positive side effect of improving the kernel test coverage, of course). We noticed there were tests for CRC32 in lib/crc32test.c and thought it would be nice to have something similar for CRC16, since it seems to be widely used in network drivers (as well as in some ext4 code). We would really appreciate any feedback/suggestions on how to improve this. Thanks!
:-)

Changes in v2 (suggested by David Laight):
- Use the PRNG from include/linux/prandom.h to generate pseudorandom data/test cases instead of having them hardcoded as large static arrays
- Add a naive CRC16 implementation used to validate the kernel's implementation (instead of having the test case results be hard-coded)
- Link to v1: https://lore.kernel.org/linux-kselftest/20240922232643.535329-1-vpeixoto@lk…

Changes in v3:
- Fix compilation warnings about function documentation
- Link to v2: https://lore.kernel.org/r/20241003-crc16-kunit-v2-1-5fe74b113e1e@lkcamp.dev

[1] https://lkcamp.dev/about

---
lib/Kconfig.debug | 9 ++++ lib/Makefile | 1 + lib/crc16_kunit.c | 155 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 165 insertions(+) diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index 7315f643817ae1021f1e4b3dd27b424f49e3f761..f9617e3054948ce43090f524dc67650e9549cee8 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -2850,6 +2850,15 @@ config USERCOPY_KUNIT_TEST on the copy_to/from_user infrastructure, making sure basic user/kernel boundary testing is working. +config CRC16_KUNIT_TEST + tristate "KUnit tests for CRC16" + depends on KUNIT + default KUNIT_ALL_TESTS + select CRC16 + help + Enable this option to run unit tests for the kernel's CRC16 + implementation (<linux/crc16.h>).
+ config TEST_UDELAY tristate "udelay test driver" help diff --git a/lib/Makefile b/lib/Makefile index 773adf88af41665b2419202e5427e0513c6becae..1faed6414a85fd366b4966a00e8ba231d7546e14 100644 --- a/lib/Makefile +++ b/lib/Makefile @@ -389,6 +389,7 @@ CFLAGS_fortify_kunit.o += $(DISABLE_STRUCTLEAK_PLUGIN) obj-$(CONFIG_FORTIFY_KUNIT_TEST) += fortify_kunit.o obj-$(CONFIG_SIPHASH_KUNIT_TEST) += siphash_kunit.o obj-$(CONFIG_USERCOPY_KUNIT_TEST) += usercopy_kunit.o +obj-$(CONFIG_CRC16_KUNIT_TEST) += crc16_kunit.o obj-$(CONFIG_GENERIC_LIB_DEVMEM_IS_ALLOWED) += devmem_is_allowed.o diff --git a/lib/crc16_kunit.c b/lib/crc16_kunit.c new file mode 100644 index 0000000000000000000000000000000000000000..0918c98a96d26f4e795e3eb92923db7c549ac01f --- /dev/null +++ b/lib/crc16_kunit.c @@ -0,0 +1,155 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * KUnits tests for CRC16. + * + * Copyright (C) 2024, LKCAMP + * Author: Vinicius Peixoto <vpeixoto(a)lkcamp.dev> + * Author: Fabricio Gasperin <fgasperin(a)lkcamp.dev> + * Author: Enzo Bertoloti <ebertoloti(a)lkcamp.dev> + */ +#include <kunit/test.h> +#include <linux/crc16.h> +#include <linux/prandom.h> + +#define CRC16_KUNIT_DATA_SIZE 4096 +#define CRC16_KUNIT_TEST_SIZE 100 +#define CRC16_KUNIT_SEED 0x12345678 + +/** + * struct crc16_test - CRC16 test data + * @crc: initial input value to CRC16 + * @start: Start index within the data buffer + * @length: Length of the data + */ +static struct crc16_test { + u16 crc; + u16 start; + u16 length; +} tests[CRC16_KUNIT_TEST_SIZE]; + +u8 data[CRC16_KUNIT_DATA_SIZE]; + + +/* Naive implementation of CRC16 for validation purposes */ +static inline u16 _crc16_naive_byte(u16 crc, u8 data) +{ + u8 i = 0; + + crc ^= (u16) data; + for (i = 0; i < 8; i++) { + if (crc & 0x01) + crc = (crc >> 1) ^ 0xa001; + else + crc = crc >> 1; + } + + return crc; +} + + +static inline u16 _crc16_naive(u16 crc, u8 *buffer, size_t len) +{ + while (len--) + crc = _crc16_naive_byte(crc, *buffer++); + return crc; +} + + +/* 
Small helper for generating pseudorandom 16-bit data */ +static inline u16 _rand16(void) +{ + static u32 rand = CRC16_KUNIT_SEED; + + rand = next_pseudo_random32(rand); + return rand & 0xFFFF; +} + + +static int crc16_init_test_data(struct kunit_suite *suite) +{ + size_t i; + + /* Fill the data buffer with random bytes */ + for (i = 0; i < CRC16_KUNIT_DATA_SIZE; i++) + data[i] = _rand16() & 0xFF; + + /* Generate random test data while ensuring the random + * start + length values won't overflow the 4096-byte + * buffer (0x7FF * 2 = 0xFFE < 0x1000) + */ + for (size_t i = 0; i < CRC16_KUNIT_TEST_SIZE; i++) { + tests[i].crc = _rand16(); + tests[i].start = _rand16() & 0x7FF; + tests[i].length = _rand16() & 0x7FF; + } + + return 0; +} + +static void crc16_test_empty(struct kunit *test) +{ + u16 crc; + + /* The result for empty data should be the same as the + * initial crc + */ + crc = crc16(0x00, data, 0); + KUNIT_EXPECT_EQ(test, crc, 0); + crc = crc16(0xFF, data, 0); + KUNIT_EXPECT_EQ(test, crc, 0xFF); +} + +static void crc16_test_correctness(struct kunit *test) +{ + size_t i; + u16 crc, crc_naive; + + for (i = 0; i < CRC16_KUNIT_TEST_SIZE; i++) { + /* Compare results with the naive crc16 implementation */ + crc = crc16(tests[i].crc, data + tests[i].start, + tests[i].length); + crc_naive = _crc16_naive(tests[i].crc, data + tests[i].start, + tests[i].length); + KUNIT_EXPECT_EQ(test, crc, crc_naive); + } +} + + +static void crc16_test_combine(struct kunit *test) +{ + size_t i, j; + u16 crc, crc_naive; + + /* Make sure that combining two consecutive crc16 calculations + * yields the same result as calculating the crc16 for the whole thing + */ + for (i = 0; i < CRC16_KUNIT_TEST_SIZE; i++) { + crc_naive = crc16(tests[i].crc, data + tests[i].start, tests[i].length); + for (j = 0; j < tests[i].length; j++) { + crc = crc16(tests[i].crc, data + tests[i].start, j); + crc = crc16(crc, data + tests[i].start + j, tests[i].length - j); + KUNIT_EXPECT_EQ(test, crc, crc_naive); + } 
+ } +} + + +static struct kunit_case crc16_test_cases[] = { + KUNIT_CASE(crc16_test_empty), + KUNIT_CASE(crc16_test_combine), + KUNIT_CASE(crc16_test_correctness), + {}, +}; + +static struct kunit_suite crc16_test_suite = { + .name = "crc16", + .test_cases = crc16_test_cases, + .suite_init = crc16_init_test_data, +}; +kunit_test_suite(crc16_test_suite); + +MODULE_AUTHOR("Fabricio Gasperin <fgasperin(a)lkcamp.dev>"); +MODULE_AUTHOR("Vinicius Peixoto <vpeixoto(a)lkcamp.dev>"); +MODULE_AUTHOR("Enzo Bertoloti <ebertoloti(a)lkcamp.dev>"); +MODULE_DESCRIPTION("Unit tests for crc16"); +MODULE_LICENSE("GPL"); --- base-commit: 9852d85ec9d492ebef56dc5f229416c925758edc change-id: 20241003-crc16-kunit-127a4dc2b72c Best regards, -- Vinicius Peixoto <vpeixoto(a)lkcamp.dev>
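The property crc16_test_combine checks (feeding a buffer to the CRC in two consecutive calls gives the same result as one call over the whole buffer) holds because the running CRC value is the only state carried from byte to byte. A standalone sketch of the same bit-at-a-time algorithm as the patch's _crc16_naive_byte() (reflected polynomial 0xA001, caller-supplied initial value, no final XOR) makes this easy to check; the function names below are hypothetical. With an initial value of 0 these are the CRC-16/ARC parameters, whose published check value for the ASCII string "123456789" is 0xBB3D.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Bit-at-a-time CRC-16 with the reflected polynomial 0xA001, as in the
 * patch's _crc16_naive_byte(). The caller passes the initial/running value.
 */
static uint16_t crc16_bitwise(uint16_t crc, const uint8_t *buf, size_t len)
{
	while (len--) {
		crc ^= *buf++;
		for (int bit = 0; bit < 8; bit++)
			crc = (crc & 1) ? (uint16_t)((crc >> 1) ^ 0xA001)
					: (uint16_t)(crc >> 1);
	}
	return crc;
}

/* Returns 1 if seeding with CRC(prefix) reproduces CRC(whole) at every split. */
static int crc16_chaining_holds(uint16_t init, const uint8_t *buf, size_t len)
{
	uint16_t whole = crc16_bitwise(init, buf, len);

	for (size_t split = 0; split <= len; split++) {
		uint16_t part = crc16_bitwise(init, buf, split);

		part = crc16_bitwise(part, buf + split, len - split);
		if (part != whole)
			return 0;
	}
	return 1;
}

/* CRC of a NUL-terminated string, for quick checks such as "123456789". */
static uint16_t crc16_str(uint16_t init, const char *s)
{
	return crc16_bitwise(init, (const uint8_t *)s, strlen(s));
}
```

Because chaining only requires passing the previous result back in as the crc argument, the kernel's table-driven crc16() is expected to satisfy the same identity, which is what the patch's combine test asserts; the empty-buffer case likewise returns the initial value unchanged, matching crc16_test_empty.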

[PATCH v4 00/10] iommufd support pasid attach/replace

by Yi Liu

PASID (Process Address Space ID) is a PCIe extension for tagging the DMA transactions issued by a physical device, and most modern IOMMU hardware supports PASID-granular address translation. A PASID-capable device can therefore be attached to multiple hwpts (a.k.a. domains), with each attachment tagged by a PASID.

This series is based on the preparation series [1] [2]. It first adds a missing iommu API to replace the domain for a PASID. Based on the iommu PASID attach/replace/detach APIs, it then adds iommufd APIs that let device drivers attach/replace/detach a PASID to/from a hwpt at userspace's request, and adds selftests to validate the iommufd APIs.

One part is still missing from this series: enforcing that a domain which will be used by a PASID is allocated with a special flag [3]. This is a special requirement from AMD. Since it is still under discussion on the mailing list [4], it is noted here. Once it is finalized, this series needs to enforce the domain flag check to ensure that AMD PASID support is not broken from day 1. The complete code can be found at the link below [5].

Heads up! The existing iommufd selftest is broken. There is a fix [6] for it, but it has not been upstreamed yet. If you want to run the iommufd selftest, please apply that fix. Sorry for the inconvenience.

[1] https://lore.kernel.org/linux-iommu/20240912130427.10119-1-yi.l.liu@intel.c…
[2] https://lore.kernel.org/linux-iommu/20240912130653.11028-1-yi.l.liu@intel.c…
[3] https://lore.kernel.org/linux-iommu/20240822124433.GD3468552@ziepe.ca/
[4] https://lore.kernel.org/linux-iommu/20240911101911.6269-3-vasant.hegde@amd.…
[5] https://github.com/yiliu1765/iommufd/tree/iommufd_pasid
[6] https://lore.kernel.org/linux-iommu/20240111073213.180020-1-baolu.lu@linux.…

Change log:

v4:
 - Replace remove_dev_pasid() by supporting set_dev_pasid() for the blocking domain (Kevin)
   - This is done by the preparation series "Support attaching PASID to the blocked_domain"
 - Misc tweaks following the merging of the iommufd iopf series. Three new patches are added:
   - iommufd: Always pass iommu_attach_handle to iommu core
   - iommufd: Move the iommufd_handle helpers to iommufd_private.h
   - iommufd: Refactor __fault_domain_replace_dev() to be a wrapper of iommu_replace_group_handle()
 - Rename patch 03 of v3 to "iommufd: Support pasid attach/replace"
 - Add a test case for attaching/replacing an iopf-capable hwpt to a PASID

v3: https://lore.kernel.org/kvm/20240628090557.50898-1-yi.l.liu@intel.com/
 - Split the set_dev_pasid op enhancements for domain replacement into a separate series "Make set_dev_pasid op supporting domain replacement" [1]. The following changes are made in that separate series:
   *) The set_dev_pasid() callback should keep the old config if it fails to attach to a domain. This simplifies the caller a lot, as the caller no longer needs to explicitly attach back to the old domain. It also avoids some corner cases in which the core may do a duplicated domain attachment, as described in the link below (Jason)
      https://lore.kernel.org/linux-iommu/BN9PR11MB52768C98314A95AFCD2FA6478C0F2@…
   *) Drop patch 10 of v2, as it's a bug fix and can be submitted separately (Kevin)
   *) Rebase on top of Baolu's domain_alloc_paging refactor series (Jason)
 - Drop the attach_data, which includes attach_fn and pasid; instead, pass the pasid through the device attach path. (Jason)
 - Add a pasid-num-bits property to the mock dev to make the pasid selftest work (Kevin)

v2: https://lore.kernel.org/linux-iommu/20240412081516.31168-1-yi.l.liu@intel.c…
 - Domain replacement for a pasid should be handled in the set_dev_pasid() callbacks, instead of calling remove_dev_pasid and then set_dev_pasid afterward in the iommu layer (Jason)
 - Make xarray operations more self-contained in iommufd pasid attach/replace/detach (Jason)
 - Tweak dev_iommu_get_max_pasids() to allow the iommu driver to populate max_pasids. This makes it simpler for the iommufd selftest to meet the max_pasids check in iommu_attach_device_pasid() (Jason)

v1: https://lore.kernel.org/kvm/20231127063428.127436-1-yi.l.liu@intel.com/#r
 - Implement iommu_replace_device_pasid() to fall back to the original domain if the replacement fails (Kevin)
 - Add a check in do_attach() for the corresponding attach_fn based on the pasid value.

rfc: https://lore.kernel.org/linux-iommu/20230926092651.17041-1-yi.l.liu@intel.c…

Regards,
	Yi Liu

Yi Liu (10):
  iommu: Introduce a replace API for device pasid
  iommufd: Refactor __fault_domain_replace_dev() to be a wrapper of iommu_replace_group_handle()
  iommufd: Move the iommufd_handle helpers to iommufd_private.h
  iommufd: Always pass iommu_attach_handle to iommu core
  iommufd: Pass pasid through the device attach/replace path
  iommufd: Support pasid attach/replace
  iommufd/selftest: Add set_dev_pasid and remove_dev_pasid in mock iommu
  iommufd/selftest: Add a helper to get test device
  iommufd/selftest: Add test ops to test pasid attach/detach
  iommufd/selftest: Add coverage for iommufd pasid attach/detach

 drivers/iommu/iommu-priv.h | 4 +
 drivers/iommu/iommu.c | 90 +++++-
 drivers/iommu/iommufd/Makefile | 1 +
 drivers/iommu/iommufd/device.c | 46 ++--
 drivers/iommu/iommufd/fault.c | 90 ++----
 drivers/iommu/iommufd/hw_pagetable.c | 5 +-
 drivers/iommu/iommufd/iommufd_private.h | 129 ++++++++-
 drivers/iommu/iommufd/iommufd_test.h | 30 ++
 drivers/iommu/iommufd/pasid.c | 157 +++++++++++
 drivers/iommu/iommufd/selftest.c | 208 +++++++++++++-
 include/linux/iommufd.h | 7 +
 tools/testing/selftests/iommu/iommufd.c | 256 ++++++++++++++++++
 .../selftests/iommu/iommufd_fail_nth.c | 29 +-
 tools/testing/selftests/iommu/iommufd_utils.h | 78 ++++++
 14 files changed, 1005 insertions(+), 125 deletions(-)
 create mode 100644 drivers/iommu/iommufd/pasid.c

--
2.34.1
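The replace semantics described in the change log (a set_dev_pasid()-style callback that leaves the old configuration untouched when attaching the new domain fails, so the caller needs no explicit rollback) can be illustrated with a toy model. This is a hypothetical sketch, not the iommu or iommufd API: toy_dev, toy_hwpt and the toy_pasid_* functions are invented for illustration, a fixed-size array stands in for a per-device PASID table, and the specific error codes (-EBUSY for attaching to an occupied PASID, -EINVAL for replacing an unattached one) are assumptions.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_PASIDS 16 /* hypothetical; a real device reports its own limit */

/* Hypothetical stand-in for a hwpt/domain; fail_attach simulates a driver error. */
struct toy_hwpt {
	int id;
	bool fail_attach;
};

/* One slot per PASID; NULL means no hwpt is attached on that PASID. */
struct toy_dev {
	struct toy_hwpt *pasid_hwpt[TOY_MAX_PASIDS];
};

/* Stand-in for a driver callback that either succeeds or changes nothing. */
static int toy_set_dev_pasid(struct toy_hwpt *hwpt)
{
	return hwpt->fail_attach ? -EINVAL : 0;
}

/* Attach: only allowed on a PASID that has no hwpt yet. */
static int toy_pasid_attach(struct toy_dev *dev, unsigned int pasid,
			    struct toy_hwpt *hwpt)
{
	int ret;

	if (pasid >= TOY_MAX_PASIDS)
		return -EINVAL;
	if (dev->pasid_hwpt[pasid])
		return -EBUSY;
	ret = toy_set_dev_pasid(hwpt);
	if (ret)
		return ret;
	dev->pasid_hwpt[pasid] = hwpt;
	return 0;
}

/* Replace: swap the hwpt on an attached PASID; on failure the old one stays. */
static int toy_pasid_replace(struct toy_dev *dev, unsigned int pasid,
			     struct toy_hwpt *hwpt)
{
	int ret;

	if (pasid >= TOY_MAX_PASIDS || !dev->pasid_hwpt[pasid])
		return -EINVAL;
	ret = toy_set_dev_pasid(hwpt);
	if (ret)
		return ret; /* nothing to undo: the old mapping was never touched */
	dev->pasid_hwpt[pasid] = hwpt;
	return 0;
}

/* Detach: clear the PASID slot. */
static void toy_pasid_detach(struct toy_dev *dev, unsigned int pasid)
{
	if (pasid < TOY_MAX_PASIDS)
		dev->pasid_hwpt[pasid] = NULL;
}
```

In this model a failed toy_pasid_replace() leaves the PASID on its previous hwpt, which is the property that lets the caller skip re-attaching the old domain by hand.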

[GIT PULL] Kselftest fixes update for Linux 6.12-rc3

by Shuah Khan

Hi Linus,

Please pull this kselftest fixes update for Linux 6.12-rc3.

This kselftest update for Linux 6.12-rc3 consists of several fixes for build errors, run-time errors, and error reporting:

-- ftrace: regression test for a kernel crash when running function graph tracing and then enabling the function profiler.
-- rseq: fix for an mm_cid test failure.
-- vDSO:
   - fixes to reporting skip and other error conditions.
   - changes to unconditionally build the chacha and getrandom tests on all architectures, to make it easier for them to run in CIs.
   - fix for a build error by explicitly including sched.h to bring in the CLONE_NEWTIME define.

diff is attached.

Note: I had to fix a commit message on the rseq patch at the last minute, right before generating the pull request. The last 2 patches have been in my tree for longer than just a few hours. :)

thanks,
-- Shuah

----------------------------------------------------------------
The following changes since commit c66be905cda24fb782b91053b196bd2e966f95b7:

  selftests: breakpoints: use remaining time to check if suspend succeed (2024-10-02 14:37:30 -0600)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest tags/linux_kselftest-fixes-6.12-rc3

for you to fetch changes up to 4ee5ca9a29384fcf3f18232fdf8474166dea8dca:

  ftrace/selftest: Test combination of function_graph tracer and function profiler (2024-10-11 15:05:16 -0600)

----------------------------------------------------------------
linux_kselftest-fixes-6.12-rc3

This kselftest update for Linux 6.12-rc3 consists of several fixes for build errors, run-time errors, and error reporting:

-- ftrace: regression test for a kernel crash when running function graph tracing and then enabling the function profiler.
-- rseq: fix for an mm_cid test failure.
-- vDSO:
   - fixes to reporting skip and other error conditions.
   - changes to unconditionally build the chacha and getrandom tests on all architectures, to make it easier for them to run in CIs.
   - fix for a build error by explicitly including sched.h to bring in the CLONE_NEWTIME define.

----------------------------------------------------------------
Jason A. Donenfeld (3):
      selftests: vDSO: unconditionally build chacha test
      selftests: vDSO: unconditionally build getrandom test
      selftests: vDSO: improve getrandom and chacha error messages

Mathieu Desnoyers (1):
      selftests/rseq: Fix mm_cid test failure

Steven Rostedt (1):
      ftrace/selftest: Test combination of function_graph tracer and function profiler

Yu Liao (1):
      selftests: vDSO: Explicitly include sched.h

 tools/arch/arm64/vdso | 1 -
 tools/arch/loongarch/vdso | 1 -
 tools/arch/powerpc/vdso | 1 -
 tools/arch/s390/vdso | 1 -
 tools/arch/x86/vdso | 1 -
 .../ftrace/test.d/ftrace/fgraph-profiler.tc | 31 ++++++
 tools/testing/selftests/rseq/rseq.c | 110 ++++++++++++++-------
 tools/testing/selftests/rseq/rseq.h | 10 +-
 tools/testing/selftests/vDSO/Makefile | 6 +-
 tools/testing/selftests/vDSO/vdso_test_chacha.c | 36 ++++---
 tools/testing/selftests/vDSO/vdso_test_getrandom.c | 76 +++++++-------
 tools/testing/selftests/vDSO/vgetrandom-chacha.S | 18 ++++
 12 files changed, 183 insertions(+), 109 deletions(-)
 delete mode 120000 tools/arch/arm64/vdso
 delete mode 120000 tools/arch/loongarch/vdso
 delete mode 120000 tools/arch/powerpc/vdso
 delete mode 120000 tools/arch/s390/vdso
 delete mode 120000 tools/arch/x86/vdso
 create mode 100644 tools/testing/selftests/ftrace/test.d/ftrace/fgraph-profiler.tc
 create mode 100644 tools/testing/selftests/vDSO/vgetrandom-chacha.S
----------------------------------------------------------------

[PATCH v2] selftests: net/rds: add module not found

by Alessandro Zanni

This fixes a "module not found" error when running kselftest with the "net/rds" target. The error was found by running the tests manually with the command:

  make kselftest TARGETS="net/rds"

The patch also imports the ip() function explicitly from the utils module.

Signed-off-by: Alessandro Zanni <alessandro.zanni87(a)gmail.com>
---
Notes:
    v2: modified the way the parent path is added
        added test to reproduce the error

tools/testing/selftests/net/rds/test.py | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/net/rds/test.py b/tools/testing/selftests/net/rds/test.py index e6bb109bcead..4a7178d11193 100755 --- a/tools/testing/selftests/net/rds/test.py +++ b/tools/testing/selftests/net/rds/test.py @@ -14,8 +14,11 @@ import sys import atexit from pwd import getpwuid from os import stat -from lib.py import ip +# Allow utils module to be imported from different directory +this_dir = os.path.dirname(os.path.realpath(__file__)) +sys.path.append(os.path.join(this_dir, "../")) +from lib.py.utils import ip libc = ctypes.cdll.LoadLibrary('libc.so.6') setns = libc.setns -- 2.43.0

[PATCH v2] selftests: drivers: net: fix name not defined

by Alessandro Zanni

This fixes the following error when running kselftest with the "drivers/net" target:

  File "tools/testing/selftests/net/lib/py/nsim.py", line 64, in __init__
    if e.errno == errno.ENOSPC:
  NameError: name 'errno' is not defined

The error was found by running the tests manually with the command:

  make kselftest TARGETS="drivers/net"

The errno module provides the standard system error symbols, so import it.

Reviewed-by: Petr Machata <petrm(a)nvidia.com>
Signed-off-by: Alessandro Zanni <alessandro.zanni87(a)gmail.com>
---
Notes:
    v2: added how to run the test

tools/testing/selftests/net/lib/py/nsim.py | 1 + 1 file changed, 1 insertion(+) diff --git a/tools/testing/selftests/net/lib/py/nsim.py b/tools/testing/selftests/net/lib/py/nsim.py index f571a8b3139b..1a8cbe9acc48 100644 --- a/tools/testing/selftests/net/lib/py/nsim.py +++ b/tools/testing/selftests/net/lib/py/nsim.py @@ -1,5 +1,6 @@ # SPDX-License-Identifier: GPL-2.0 +import errno import json import os import random -- 2.43.0

[PATCH v3 0/2] selftests/futex: Create test for robust list

by André Almeida

This patchset creates a selftest for the robust list interface, to track regressions and ensure that the interface keeps working as expected.

In this version I removed the kselftest_harness include, but I expanded the current futex selftest API a little with basic ASSERT_ macros to make the test easier to write and read. In the future, hopefully we can move all futex selftests to the kselftest_harness API anyway.

Changes from v2:
- Create ASSERT_ macros for futex selftests
- Dropped kselftest_harness include, using just futex test API
- This is the expected output:

    TAP version 13
    1..6
    ok 1 test_robustness
    ok 2 test_set_robust_list_invalid_size
    ok 3 test_get_robust_list_self
    ok 4 test_get_robust_list_child
    ok 5 test_set_list_op_pending
    ok 6 test_robust_list_multiple_elements
    # Totals: pass:6 fail:0 xfail:0 xpass:0 skip:0 error:0

https://lore.kernel.org/lkml/20240903134033.816500-1-andrealmeid@igalia.com

André Almeida (2):
  selftests/futex: Add ASSERT_ macros
  selftests/futex: Create test for robust list

 .../selftests/futex/functional/.gitignore | 1 +
 .../selftests/futex/functional/Makefile | 3 +-
 .../selftests/futex/functional/robust_list.c | 512 ++++++++++++++++++
 .../testing/selftests/futex/include/logging.h | 28 +
 4 files changed, 543 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/futex/functional/robust_list.c

--
2.46.0
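The robust list interface this selftest exercises consists of two system calls, set_robust_list(2) and get_robust_list(2), which register and query the per-thread list head the kernel walks when a thread exits. The sketch below is not taken from the patch: it registers an empty list head for the calling thread, reads it back, restores whatever head the C library had registered, and checks that a wrong length is rejected with EINVAL, similar in spirit to the test_get_robust_list_self and test_set_robust_list_invalid_size cases listed above. The helper names are hypothetical, and the calls go through syscall(2) because glibc provides no wrappers for them.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* An empty robust list: the head's next pointer points back at the head. */
static struct robust_list_head demo_head = {
	.list = { .next = &demo_head.list },
	.futex_offset = 0,
	.list_op_pending = NULL,
};

/* Registers demo_head, reads it back, then restores the previous head. */
static int robust_list_roundtrip(void)
{
	struct robust_list_head *old_head, *got_head;
	size_t old_len, got_len;
	int ok;

	/* pid 0 means the calling thread. */
	if (syscall(SYS_get_robust_list, 0, &old_head, &old_len) != 0)
		return -1;
	if (syscall(SYS_set_robust_list, &demo_head, sizeof(demo_head)) != 0)
		return -1;

	ok = syscall(SYS_get_robust_list, 0, &got_head, &got_len) == 0 &&
	     got_head == &demo_head && got_len == sizeof(demo_head);

	/* Put back the head the C library registered (possibly NULL). */
	syscall(SYS_set_robust_list, old_head, old_len);
	return ok ? 0 : -1;
}

/* Returns the errno from registering with a wrong length (expected: EINVAL). */
static int robust_list_bad_len_errno(void)
{
	if (syscall(SYS_set_robust_list, &demo_head, sizeof(demo_head) + 1) == 0)
		return 0;
	return errno;
}
```

Restoring the original head matters because the C library may rely on its own registration for robust mutexes; the length check happens before anything is stored, so the bad-length call leaves the current registration untouched.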

[PATCH] selftests/ftrace: Fix check of return value in fgraph-retval.tc test

by Steven Rostedt

From: Steven Rostedt <rostedt(a)goodmis.org>

The addition of recording both the function name and return address to the function graph tracer updated the selftest to check for "=-5" instead of "= -5". But this causes the test to fail on certain configs, as "= -5" can still be printed if function addresses are not enabled (older kernels).

Check for both "=-5" and "= -5" as success values.

Fixes: 21e92806d39c6 ("function_graph: Support recording and printing the function return address")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
Shuah, this update is only for changes in my tree, so you do not need to add it.

tools/testing/selftests/ftrace/test.d/ftrace/fgraph-retval.tc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/ftrace/test.d/ftrace/fgraph-retval.tc b/tools/testing/selftests/ftrace/test.d/ftrace/fgraph-retval.tc index e8e46378b88d..4307d4eef417 100644 --- a/tools/testing/selftests/ftrace/test.d/ftrace/fgraph-retval.tc +++ b/tools/testing/selftests/ftrace/test.d/ftrace/fgraph-retval.tc @@ -29,7 +29,7 @@ set -e : "Test printing the error code in signed decimal format" echo 0 > options/funcgraph-retval-hex -count=`cat trace | grep 'proc_reg_write' | grep '=-5' | wc -l` +count=`cat trace | grep 'proc_reg_write' | grep -e '=-5 ' -e '= -5 ' | wc -l` if [ $count -eq 0 ]; then fail "Return value can not be printed in signed decimal format" fi -- 2.45.2

[PATCH net-next v25 00/13] Device Memory TCP

by Mina Almasry

v25: https://patchwork.kernel.org/project/netdevbpf/list/?series=885396&state=* === Major changes: - Moved devmem.h and mp_dmabuf_devmem.h to internal header files. - Changed the page_pool_params to take in a queue_idx rather than a struct netdev_rx_queue. - Added WARN_ON_ONCE around __skb_checksum readability check and added check to skb_checksum_help(). Other more minor feedback addressed as well. v24: https://patchwork.kernel.org/project/netdevbpf/list/?series=884556&state=* ==== No major changes. Mostly addressing issues in the error paths of dmabuf binding, and code cleanups/improvements from reviewers: Changes: - Fix failing ynl regen error. - Error path fixes & extack error messages in dmabuf binding. - Code cleanup in introspection. - gitignore ynl.d generated file. Full devmem TCP changes including the full GVE driver implementation is here: https://github.com/mina/linux/commits/tcpdevmem-v24/ v23: https://patchwork.kernel.org/project/netdevbpf/list/?series=882978&state=* ==== Fixing relatively minor issues called out in v22. (thanks again!) Mostly code cleanups, extack error messages, and minor reworks. Nothing major really changed, so the exact changes per commit is called in the commit messages. Full devmem TCP changes including the full GVE driver implementation is here: https://github.com/mina/linux/commits/tcpdevmem-v23/ v22: https://patchwork.kernel.org/project/netdevbpf/list/?series=881158&state=* ==== v22 aims to resolve the pending issue pointed to in v21, which is the interaction with xdp. In this series I rebase on top of the minor refactor which refactors propagating xdp configuration to slave devices: https://patchwork.kernel.org/project/netdevbpf/list/?series=881994&state=* I then disable setting xdp on devices using memory providers, and propagating xdp configuration to devices using memory providers. 
Full devmem TCP changes including the full GVE driver implementation are here: https://github.com/mina/linux/commits/tcpdevmem-v22/

v21: https://patchwork.kernel.org/project/netdevbpf/list/?series=880735&state=*
====

v20 addressed some comments and resolved a test failure, but introduced an unfortunate build error with a config edge case I wasn't testing. v21 simply resolves that error.

Major Changes:
- Resolve build error with CONFIG_PAGE_POOL=n && CONFIG_NET=y

Full devmem TCP changes including the full GVE driver implementation are here: https://github.com/mina/linux/commits/tcpdevmem-v21/

v20: https://patchwork.kernel.org/project/netdevbpf/list/?series=879373&state=*
====

v20 aims to resolve a couple of bug reports against v19, and addresses some review comments around the page_pool_check_memory_provider mechanism.

Major changes:
- Test edge cases such as header split disabled in selftest.
- Change `offset = 0` back to `offset = offset - start` to resolve issue found in RX path by Taehee (thanks!)
- Address a few comments around page_pool_check_memory_provider() from Pavel & Jakub.
- Removed some unnecessary includes across various patches in the series.
- Removed unnecessary EXPORT_SYMBOL(page_pool_mem_providers) (Jakub).
- Fix regression caused by incorrect dev_get_max_mp_channel check, along with rename (Jakub).

Full devmem TCP changes including the full GVE driver implementation are here: https://github.com/mina/linux/commits/tcpdevmem-v20/

v19: https://patchwork.kernel.org/project/netdevbpf/list/?series=876852&state=*
====

v18 got a thorough review (thanks!), and this iteration addresses the feedback.

Major changes:
- Prevent deactivating mp bound queues.
- Prevent installing xdp on mp bound netdevs, or installing mps on xdp installed netdevs.
- Fix corner cases in netlink API vis-a-vis missing attributes.
- Iron out the unreadable netmem driver support story. To be honest, the conversation with Jakub & Pavel got a bit confusing for me.
  I've implemented an approach in this set that makes sense to me, and AFAICT, addresses the requirements. It may be good as-is, or it may be a conversation starter/continuer. To be honest IMO there are many ways to skin this cat and I don't see an extremely strong reason to go for one approach over another. Here is one approach you may like.
- Don't reset niov dma_addr on allocation & free.
- Add some tests to the selftest that catch some of the issues around missing netlink attributes or deactivating mp-bound queues.

Full devmem TCP changes including the full GVE driver implementation are here: https://github.com/mina/linux/commits/tcpdevmem-v19/

v18: https://patchwork.kernel.org/project/netdevbpf/list/?series=874848&state=*
====

v17 got minor feedback: (a) to beef up the description on patch 1 and (b) to remove the leading underscores in the header definition. I applied (a). (b) seems to be against current conventions so I did not apply it before further discussion.

Full devmem TCP changes including the full GVE driver implementation are here: https://github.com/mina/linux/commits/tcpdevmem-v17/

v17: https://patchwork.kernel.org/project/netdevbpf/list/?series=869900&state=*
====

v16 also got a very thorough review and some testing (thanks again!). This version addresses all the concerns reported on v15, in terms of feedback and issues reported.

Major changes:
- Use ASSERT_RTNL.
- Moved around some of the page_pool helpers definitions so I can hide some netmem helpers in private files as Jakub suggested.
- Don't make every net_iov hold a ref on the binding as Jakub suggested.
- Fix issue reported by Taehee where we access queues after they have been freed.

Full devmem TCP changes including the full GVE driver implementation are here: https://github.com/mina/linux/commits/tcpdevmem-v17/

v16: https://patchwork.kernel.org/project/netdevbpf/list/?series=866353&state=*
====

v15 got a thorough review and some testing, and this version addresses almost all the feedback.
I left out some more minor comments where the authors said they could be done later.

Major changes:
- Addition of dma-buf introspection to page-pool-get and queue-get.
- Fixes to selftests suggested by Taehee.
- Fixes to documentation suggested by Donald.
- A couple of suggestions and fixes to TCP patches by Eric and David.
- Fixes to number assignments suggested by Arnd.
- Use rtnl_lock()ing to guard against queue reconfiguration while the page_pool initialization is happening. (Jakub).
- Fixes to a few warnings reproduced by Taehee.
- Fixes to dma-buf binding suggested by Taehee and Jakub.
- Fixes to netlink UAPI suggested by Jakub.
- Applied a number of Reviewed-bys and Acked-bys (including ones I lost from v13+).

Full devmem TCP changes including the full GVE driver implementation are here: https://github.com/mina/linux/commits/tcpdevmem-v16/

One caveat: Taehee reproduced a KASAN warning and reported it here: https://lore.kernel.org/netdev/CAMArcTUdCxOBYGF3vpbq=eBvqZfnc44KBaQTN7H-wqd…

I estimate the issue to be minor and easily fixable: https://lore.kernel.org/netdev/CAHS8izNgaqC--GGE2xd85QB=utUnOHmioCsDd1TNxJW…

I hope to be able to follow up with a fix to net tree as net-next closes imminently, but if this iteration doesn't make it in, I will repost with a fix squashed after net-next reopens, no problem.

v15: https://patchwork.kernel.org/project/netdevbpf/list/?series=865481&state=*
====

No material changes in this version, only a fix to linking against libynl.a from the last version. Per Jakub's instructions I've pulled one of his patches into this series, and now use the new libynl.a correctly, I hope.

As usual, the full devmem TCP changes including the full GVE driver implementation are here: https://github.com/mina/linux/commits/tcpdevmem-v15/

v14: https://patchwork.kernel.org/project/netdevbpf/list/?series=865135&archive=…
====

No material changes in this version. Only rebase and re-verification on top of net-next.
v13, I think, raced with commit ebad6d0334793 ("net/ipv4: Use nested-BH locking for ipv4_tcp_sk.") being merged to net-next, which caused a patchwork failure to apply. This series should apply cleanly on commit c4532232fa2a4 ("selftests: net: remove unneeded IP_GRE config").

I did not wait the customary 24hr as Jakub said it's OK to repost as soon as I build test the rebased version: https://lore.kernel.org/netdev/20240625075926.146d769d@kernel.org/

v13: https://patchwork.kernel.org/project/netdevbpf/list/?series=861406&archive=…
====

Major changes:
--------------
This iteration addresses Pavel's review comments, applies his reviewed-by's, and seeks to fix the patchwork build error (sorry!).

As usual, the full devmem TCP changes including the full GVE driver implementation are here: https://github.com/mina/linux/commits/tcpdevmem-v13/

v12: https://patchwork.kernel.org/project/netdevbpf/list/?series=859747&state=*
====

Major changes:
--------------
This iteration only addresses one minor comment from Pavel with regards to the trace printing of netmem, and the patchwork build error introduced in v11 because I missed doing an allmodconfig build, sorry.

Other than that v11, AFAICT, received no feedback. There is one discussion about the specifics of plugging io uring memory through the page pool, but it is not relevant to the content of this particular patchset, AFAICT.

As usual, the full devmem TCP changes including the full GVE driver implementation are here: https://github.com/mina/linux/commits/tcpdevmem-v12/

v11: https://patchwork.kernel.org/project/netdevbpf/list/?series=857457&state=*
====

Major Changes:
--------------
v11 addresses feedback received in v10. The major change is the removal of the memory provider ops as requested by Christoph. We still accomplish the same thing, but utilizing direct function calls with if statements rather than generic ops.

Additionally address sparse warnings, bugs and review comments from folks that reviewed.
As usual, the full devmem TCP changes including the full GVE driver implementation are here: https://github.com/mina/linux/commits/tcpdevmem-v11/

Detailed changelog:
-------------------
- Fixes in netdev_rx_queue_restart() from Pavel & David.
- Remove commit e650e8c3a36f5 ("net: page_pool: create hooks for custom page providers") from the series to address Christoph's feedback and rebased other patches on the series on this change.
- Fixed build errors with CONFIG_DMA_SHARED_BUFFER && !CONFIG_GENERIC_ALLOCATOR build.
- Fixed sparse warnings pointed out by Paolo.
- Drop unnecessary gro_pull_from_frag0 checks.
- Added Bagas reviewed-by to docs.

v10: https://patchwork.kernel.org/project/netdevbpf/list/?series=852422&state=*
====

Major Changes:
--------------
v9 was sent right before the merge window closed (sorry!). v10 is almost a re-send of the series now that the merge window re-opened. Only rebased to latest net-next and addressed some minor iterative comments received on v9.

As usual, the full devmem TCP changes including the full GVE driver implementation are here: https://github.com/mina/linux/commits/tcpdevmem-v10/

Detailed changelog:
-------------------
- Fixed tokens leaking in DONTNEED setsockopt (Nikolay).
- Moved net_iov_dma_addr() to devmem.c and made it a devmem specific helper (David).
- Rename hook alloc_pages to alloc_netmems as alloc_pages is now preprocessor macro defined and causes a build error.

v9:
===

Major Changes:
--------------
GVE queue API has been merged. Submitting this version as non-RFC after rebasing on top of the merged API, and dropped the out of tree queue API I was carrying on github. Addressed the little feedback v8 has received.

Detailed changelog:
------------------
- Added new patch from David Wei to this series for netdev_rx_queue_restart()
- Fixed sparse error.
- Removed CONFIG_ checks in netmem_is_net_iov()
- Flipped skb->readable to skb->unreadable
- Minor fixes to selftests & docs.
RFC v8:
=======

Major Changes:
--------------
- Fixed build error generated by patch-by-patch build.
- Applied docs suggestions from Randy.

RFC v7:
=======

Major Changes:
--------------
This revision largely rebases on top of net-next and addresses the feedback RFCv6 received from folks, namely Jakub, Yunsheng, Arnd, David, & Pavel.

The series remains in RFC because the queue-API ndos defined in this series are not yet implemented. I have a GVE implementation I carry out of tree for my testing. An upstreamable GVE implementation is in the works. Aside from that, in my estimation all the patches are ready for review/merge. Please do take a look.

As usual the full devmem TCP changes including the full GVE driver implementation are here: https://github.com/mina/linux/commits/tcpdevmem-v7/

Detailed changelog:
- Use admin-perm in netlink API.
- Addressed feedback from Jakub with regards to netlink API implementation.
- Renamed devmem.c functions to something more appropriate for that file.
- Improve the performance seen through the page_pool benchmark.
- Fix the value definition of all the SO_DEVMEM_* uapi.
- Various fixes to documentation.

Perf - page-pool benchmark:
---------------------------
Improved performance of bench_page_pool_simple.ko tests compared to v6: https://pastebin.com/raw/v5dYRg8L

net-next base: 8 cycle fast path.
RFC v6: 10 cycle fast path.
RFC v7: 9 cycle fast path.
RFC v7 with CONFIG_DMA_SHARED_BUFFER disabled: 8 cycle fast path, same as baseline.

Perf - Devmem TCP benchmark:
---------------------
Perf is about the same regardless of the changes in v7, namely the removal of the static_branch_unlikely to improve the page_pool benchmark performance:

189/200gbps bi-directional throughput with RX devmem TCP and regular TCP TX i.e. ~95% line rate.

RFC v6:
=======

Major Changes:
--------------
This revision largely rebases on top of net-next and addresses the little feedback RFCv5 received.
The series remains in RFC because the queue-API ndos defined in this series are not yet implemented. I have a GVE implementation I carry out of tree for my testing. An upstreamable GVE implementation is in the works. Aside from that, in my estimation all the patches are ready for review/merge. Please do take a look.

As usual the full devmem TCP changes including the full GVE driver implementation are here: https://github.com/mina/linux/commits/tcpdevmem-v6/

This version also comes with some performance data recorded in the cover letter (see below changelog).

Detailed changelog:
- Rebased on top of the merged netmem_ref changes.
- Converted skb->dmabuf to skb->readable (Pavel). Pavel's original suggestion was to remove the skb->dmabuf flag entirely, but when I looked into it closely, I found the issue that if we remove the flag we have to dereference the shinfo(skb) pointer to obtain the first frag to tell whether an skb is readable or not. This can cause a performance regression if it dirties the cache line when the shinfo(skb) was not really needed. Instead, I converted the skb->dmabuf flag into a generic skb->readable flag which can be re-used by io_uring 0-copy RX.
- Squashed a few locking optimizations from Eric Dumazet in the RX path and the DEVMEM_DONTNEED setsockopt.
- Expanded the tests a bit. Added validation for invalid scenarios and added some more coverage.

Perf - page-pool benchmark:
---------------------------
bench_page_pool_simple.ko tests with and without these changes: https://pastebin.com/raw/ncHDwAbn

AFAIK the number that really matters in the perf tests is the 'tasklet_page_pool01_fast_path Per elem'. This one measures at about 8 cycles without the changes but there is some 1 cycle noise in some results.

With the patches this regresses to 9 cycles, but there is 1 cycle noise occasionally when running this test repeatedly.

Lastly I tried disabling the static_branch_unlikely() in the netmem_is_net_iov() check.
To my surprise, disabling the static_branch_unlikely() check reduces the fast path back to 8 cycles, but the 1 cycle noise remains.

Perf - Devmem TCP benchmark:
---------------------
189/200gbps bi-directional throughput with RX devmem TCP and regular TCP TX i.e. ~95% line rate.

Major changes in RFC v5:
========================
1. Rebased on top of 'Abstract page from net stack' series and used the new netmem type to refer to LSB set pointers instead of re-using struct page.
2. Downgraded this series back to RFC and called it RFC v5. This is because this series is now dependent on 'Abstract page from net stack'[1] and the queue API. Both are removed from the series to reduce the patch # and those bits are fairly independent or pre-requisite work.
3. Reworked the page_pool devmem support to use netmem and for some more unified handling.
4. Reworked the reference counting of net_iov (renamed from page_pool_iov) to use pp_ref_count for refcounting.

The full changes including the dependent series and GVE page pool support are here: https://github.com/mina/linux/commits/tcpdevmem-rfcv5/

[1] https://patchwork.kernel.org/project/netdevbpf/list/?series=810774

Major changes in v1:
====================
1. Implemented MVP queue API ndos to remove the userspace-visible driver reset.
2. Fixed issues in the napi_pp_put_page() devmem frag unref path.
3. Removed RFC tag.

Many smaller addressed comments across all the patches (patches have individual change log).

Full tree including the rest of the GVE driver changes: https://github.com/mina/linux/commits/tcpdevmem-v1

Changes in RFC v3:
==================
1. Pulled in the memory-provider dependency from Jakub's RFC[1] to make the series reviewable and mergeable.
2. Implemented multi-rx-queue binding which was a todo in v2.
3. Fix to cmsg handling.

The sticking point in RFC v2[2] was the device reset required to refill the device rx-queues after the dmabuf bind/unbind.
The solution suggested, as I understand it, is a subset of the per-queue management ops Jakub suggested or similar: https://lore.kernel.org/netdev/20230815171638.4c057dcd@kernel.org/

This is not addressed in this revision, because:
1. This point was discussed at netconf & netdev and there is openness to using the current approach of requiring a device reset.
2. Implementing individual queue resetting seems to be difficult for my test bed with GVE. My prototype to test this ran into issues with the rx-queues not coming back up properly if reset individually. At the moment I'm unsure if it's a mistake in the POC or a genuine issue in the virtualization stack behind GVE, which currently doesn't test individual rx-queue restart.
3. Our use cases are not bothered by requiring a device reset to refill the buffer queues, and we'd like to support NICs that run into this limitation with resetting individual queues.

My thought is that drivers that have trouble with per-queue configs can use the support in this series, while drivers that support new netdev ops to reset individual queues can automatically reset the queue as part of the dma-buf bind/unbind.

The same approach with device resets is presented again for consideration with other sticking points addressed.

This proposal includes the rx devmem path only proposed for merge. For a snapshot of my entire tree which includes the GVE POC page pool support & device memory support: https://github.com/torvalds/linux/compare/master...mina:linux:tcpdevmem-v3

[1] https://lore.kernel.org/netdev/f8270765-a27b-6ccf-33ea-cda097168d79@redhat.…
[2] https://lore.kernel.org/netdev/CAHS8izOVJGJH5WF68OsRWFKJid1_huzzUK+hpKbLcL4…

Changes in RFC v2:
==================
The sticking point in RFC v1[1] was the dma-buf pages approach we used to deliver the device memory to the TCP stack.
RFC v2 is a proof-of-concept that attempts to resolve this by implementing scatterlist support in the networking stack, such that we can import the dma-buf scatterlist directly. This is the approach proposed at a high level here[2].

Detailed changes:
1. Replaced dma-buf pages approach with importing scatterlist into the page pool.
2. Replace the dma-buf pages centric API with a netlink API.
3. Removed the TX path implementation - there is no issue with implementing the TX path with scatterlist approach, but leaving out the TX path makes it easier to review.
4. Functionality is tested with this proposal, but I have not conducted perf testing yet. I'm not sure there are regressions, but I removed perf claims from the cover letter until they can be re-confirmed.
5. Added Signed-off-by: contributors to the implementation.
6. Fixed some bugs with the RX path since RFC v1.

Any feedback welcome, but specifically the biggest pending questions needing feedback IMO are:
1. Feedback on the scatterlist-based approach in general.
2. Netlink API (Patch 1 & 2).
3. Approach to handle all the drivers that expect to receive pages from the page pool (Patch 6).

[1] https://lore.kernel.org/netdev/dfe4bae7-13a0-3c5d-d671-f61b375cb0b4@gmail.c…
[2] https://lore.kernel.org/netdev/CAHS8izPm6XRS54LdCDZVd0C75tA1zHSu6jLVO8nzTLX…

==================

* TL;DR:

Device memory TCP (devmem TCP) is a proposal for transferring data to and/or from device memory efficiently, without bouncing the data to a host memory buffer.

* Problem:

A large amount of data transfers have device memory as the source and/or destination. Accelerators drastically increased the volume of such transfers. Some examples include:
- ML accelerators transferring large amounts of training data from storage into GPU/TPU memory. In some cases ML training setup time can be as long as 50% of TPU compute time; improving data transfer throughput & efficiency can help improve GPU/TPU utilization.
- Distributed training, where ML accelerators, such as GPUs on different hosts, exchange data among them.
- Distributed raw block storage applications transfer large amounts of data with remote SSDs; much of this data does not require host processing.

Today, the majority of the Device-to-Device data transfers across the network are implemented as the following low level operations: Device-to-Host copy, Host-to-Host network transfer, and Host-to-Device copy.

The implementation is suboptimal, especially for bulk data transfers, and can put significant strains on system resources, such as host memory bandwidth, PCIe bandwidth, etc. One important reason behind the current state is the kernel’s lack of semantics to express device to network transfers.

* Proposal:

In this patch series we attempt to optimize this use case by implementing socket APIs that enable the user to:
1. send device memory across the network directly, and
2. receive incoming network packets directly into device memory.

Packet _payloads_ go directly from the NIC to device memory for receive and from device memory to NIC for transmit. Packet _headers_ go to/from host memory and are processed by the TCP/IP stack normally. The NIC _must_ support header split to achieve this.

Advantages:
- Alleviate host memory bandwidth pressure, compared to existing network-transfer + device-copy semantics.
- Alleviate PCIe BW pressure, by limiting data transfer to the lowest level of the PCIe tree, compared to traditional path which sends data through the root complex.

* Patch overview:

** Part 1: netlink API

Gives the user the ability to bind dma-buf to an RX queue.

** Part 2: scatterlist support

Currently the standard for device memory sharing is DMABUF, which doesn't generate struct pages. On the other hand, the networking stack (skbs, drivers, and page pool) operates on pages. We have 2 options:
1. Generate struct pages for dmabuf device memory, or,
2. Modify the networking stack to process scatterlist.
Approach #1 was attempted in RFC v1. RFC v2 implements approach #2.

** part 3: page pool support

We piggy back on page pool memory providers proposal: https://github.com/kuba-moo/linux/tree/pp-providers

It allows the page pool to define a memory provider that provides the page allocation and freeing. It helps abstract most of the device memory TCP changes from the driver.

** part 4: support for unreadable skb frags

Page pool iovs are not accessible by the host; we implement changes throughout the networking stack to correctly handle skbs with unreadable frags.

** Part 5: recvmsg() APIs

We define user APIs for the user to send and receive device memory.

Not included with this series is the GVE devmem TCP support, just to simplify the review. Code available here if desired: https://github.com/mina/linux/tree/tcpdevmem

This series is built on top of net-next with Jakub's pp-providers changes cherry-picked.

* NIC dependencies:

1. (strict) Devmem TCP requires the NIC to support header split, i.e. the capability to split incoming packets into a header + payload and to put each into a separate buffer. Devmem TCP works by using device memory for the packet payload, and host memory for the packet headers.

2. (optional) Devmem TCP works better with flow steering support & RSS support, i.e. the NIC's ability to steer flows into certain rx queues. This allows the sysadmin to enable devmem TCP on a subset of the rx queues, and steer devmem TCP traffic onto these queues and non devmem TCP elsewhere.

The NIC I have access to with these properties is the GVE with DQO support running in Google Cloud, but any NIC that supports these features would suffice. I may be able to help reviewers bring up devmem TCP on their NICs.

* Testing:

The series includes a udmabuf kselftest that shows a simple use case of devmem TCP and validates the entire data path end to end without a dependency on a specific dmabuf provider.
** Test Setup

Kernel: net-next with this series and memory provider API cherry-picked locally.

Hardware: Google Cloud A3 VMs.

NIC: GVE with header split & RSS & flow steering support.

Cc: Pavel Begunkov <asml.silence(a)gmail.com>
Cc: David Wei <dw(a)davidwei.uk>
Cc: Jason Gunthorpe <jgg(a)ziepe.ca>
Cc: Yunsheng Lin <linyunsheng(a)huawei.com>
Cc: Shailend Chand <shailend(a)google.com>
Cc: Harshitha Ramamurthy <hramamurthy(a)google.com>
Cc: Shakeel Butt <shakeel.butt(a)linux.dev>
Cc: Jeroen de Borst <jeroendb(a)google.com>
Cc: Praveen Kaligineedi <pkaligineedi(a)google.com>
Cc: Bagas Sanjaya <bagasdotme(a)gmail.com>
Cc: Steven Rostedt <rostedt(a)goodmis.org>
Cc: Christoph Hellwig <hch(a)infradead.org>
Cc: Nikolay Aleksandrov <razor(a)blackwall.org>
Cc: Taehee Yoo <ap420073(a)gmail.com>
Cc: Donald Hunter <donald.hunter(a)gmail.com>

Mina Almasry (13):
  netdev: add netdev_rx_queue_restart()
  net: netdev netlink api to bind dma-buf to a net device
  netdev: support binding dma-buf to netdevice
  netdev: netdevice devmem allocator
  page_pool: devmem support
  memory-provider: dmabuf devmem memory provider
  net: support non paged skb frags
  net: add support for skbs with unreadable frags
  tcp: RX path for devmem TCP
  net: add SO_DEVMEM_DONTNEED setsockopt to release RX frags
  net: add devmem TCP documentation
  selftests: add ncdevmem, netcat for devmem TCP
  netdev: add dmabuf introspection

 Documentation/netlink/specs/netdev.yaml | 61 +++
 Documentation/networking/devmem.rst | 269 +++++++++++
 Documentation/networking/index.rst | 1 +
 arch/alpha/include/uapi/asm/socket.h | 6 +
 arch/mips/include/uapi/asm/socket.h | 6 +
 arch/parisc/include/uapi/asm/socket.h | 6 +
 arch/sparc/include/uapi/asm/socket.h | 6 +
 include/linux/netdevice.h | 2 +
 include/linux/skbuff.h | 61 ++-
 include/linux/skbuff_ref.h | 9 +-
 include/linux/socket.h | 1 +
 include/net/netdev_rx_queue.h | 5 +
 include/net/netmem.h | 132 +++++-
 include/net/page_pool/helpers.h | 39 +-
 include/net/page_pool/types.h | 23 +-
 include/net/sock.h | 2 +
 include/net/tcp.h | 3 +-
 include/trace/events/page_pool.h | 12 +-
 include/uapi/asm-generic/socket.h | 6 +
 include/uapi/linux/netdev.h | 13 +
 include/uapi/linux/uio.h | 17 +
 net/Kconfig | 5 +
 net/core/Makefile | 2 +
 net/core/datagram.c | 6 +
 net/core/dev.c | 33 +-
 net/core/devmem.c | 389 ++++++++++++++++
 net/core/devmem.h | 180 ++++++++
 net/core/gro.c | 3 +-
 net/core/mp_dmabuf_devmem.h | 44 ++
 net/core/netdev-genl-gen.c | 23 +
 net/core/netdev-genl-gen.h | 6 +
 net/core/netdev-genl.c | 139 +++++-
 net/core/netdev_rx_queue.c | 81 ++++
 net/core/netmem_priv.h | 31 ++
 net/core/page_pool.c | 120 +++--
 net/core/page_pool_priv.h | 46 ++
 net/core/page_pool_user.c | 32 +-
 net/core/skbuff.c | 77 +++-
 net/core/sock.c | 68 +++
 net/ethtool/common.c | 8 +
 net/ipv4/esp4.c | 3 +-
 net/ipv4/tcp.c | 263 ++++++++++-
 net/ipv4/tcp_input.c | 13 +-
 net/ipv4/tcp_ipv4.c | 16 +
 net/ipv4/tcp_minisocks.c | 2 +
 net/ipv4/tcp_output.c | 5 +-
 net/ipv6/esp6.c | 3 +-
 net/packet/af_packet.c | 4 +-
 net/xdp/xsk_buff_pool.c | 5 +
 tools/include/uapi/linux/netdev.h | 13 +
 tools/net/ynl/lib/.gitignore | 1 +
 tools/testing/selftests/net/.gitignore | 1 +
 tools/testing/selftests/net/Makefile | 9 +
 tools/testing/selftests/net/ncdevmem.c | 570 ++++++++++++++++++++++++
 54 files changed, 2757 insertions(+), 124 deletions(-)
 create mode 100644 Documentation/networking/devmem.rst
 create mode 100644 net/core/devmem.c
 create mode 100644 net/core/devmem.h
 create mode 100644 net/core/mp_dmabuf_devmem.h
 create mode 100644 net/core/netdev_rx_queue.c
 create mode 100644 net/core/netmem_priv.h
 create mode 100644 tools/testing/selftests/net/ncdevmem.c

--
2.46.0.469.g59c65b2a67-goog
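The header-split requirement listed under the NIC dependencies can be pictured with a toy Python model: the header bytes stay in host memory where the stack can parse them, while the payload is written into a separate device buffer and the host keeps only an (offset, length) reference, loosely analogous to an unreadable frag. Everything here (DeviceBuffer, RxSegment, header_split) is hypothetical and does not correspond to kernel structures or to the devmem TCP user API.

```python
from dataclasses import dataclass


@dataclass
class DeviceBuffer:
    """Toy stand-in for a dma-buf region that the host never reads directly."""
    data: bytearray
    used: int = 0

    def place(self, payload: bytes) -> tuple:
        """Append payload to "device memory" and return its (offset, length)."""
        off = self.used
        if off + len(payload) > len(self.data):
            raise MemoryError("device buffer full")
        self.data[off:off + len(payload)] = payload
        self.used += len(payload)
        return off, len(payload)


@dataclass
class RxSegment:
    header: bytes  # host memory: parsed normally by the TCP/IP stack
    frag: tuple    # (offset, length) into device memory: opaque to the host


def header_split(frame: bytes, header_len: int, dev: DeviceBuffer) -> RxSegment:
    """Model a header-split NIC: header to host memory, payload to device memory."""
    header, payload = frame[:header_len], frame[header_len:]
    return RxSegment(header=header, frag=dev.place(payload))


dev = DeviceBuffer(bytearray(64))
seg = header_split(b"HDR:payload-bytes", 4, dev)
print(seg.header, seg.frag)  # b'HDR:' (0, 13)
```

In this model the host-side record never holds payload bytes, which is the property that lets the real stack skip copying data through host memory.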

[PATCH net-next v02 0/2] net: af_packet: allow joining a fanout when link is down

by Gur Stavi

PACKET socket can retain its fanout membership through link down and up and leave a fanout while closed regardless of link state. However, a socket was forbidden from joining a fanout while it was not RUNNING.

This patch allows a PACKET socket to join a fanout while not RUNNING.

Selftest psock_fanout is extended to test this scenario. This is the only test that was performed.

This scenario was identified while studying DPDK pmd_af_packet_drv. Since sockets are only created during initialization, there is no reason to fail the initialization if a single link is temporarily down.

I hope this is not considered as breaking user space and that applications are not designed to expect this failure.

Changes:
V02:
  * psock_fanout: use explicit loopback up/down instead of toggle.
  * psock_fanout: don't try to restore loopback state on failure.
  * Rephrase commit message about "leaving a fanout".
V01: https://lore.kernel.org/netdev/cover.1728303615.git.gur.stavi@huawei.com/

Gur Stavi (2):
  af_packet: allow fanout_add when socket is not RUNNING
  selftests: net/psock_fanout: socket joins fanout when link is down

 net/packet/af_packet.c | 10 +++---
 tools/testing/selftests/net/psock_fanout.c | 42 ++++++++++++++++++++--
 2 files changed, 44 insertions(+), 8 deletions(-)

base-commit: f95b4725e796b12e5f347a0d161e1d3843142aa8
--
2.45.2
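For reference, joining a fanout group from user space is a single setsockopt(SOL_PACKET, PACKET_FANOUT, ...) call on a bound AF_PACKET socket; per the cover letter, this call used to fail while the socket was not RUNNING (for example, with the link down). The sketch below is not the psock_fanout selftest code. It assumes the constant values from the Linux UAPI headers (SOL_PACKET = 263, PACKET_FANOUT = 18, PACKET_FANOUT_HASH = 0, ETH_P_ALL = 0x0003) and the usual argument encoding (group id in the low 16 bits, mode in the high 16 bits). join_fanout needs CAP_NET_RAW to run.

```python
import socket

# Assumed values from <linux/socket.h>, <linux/if_packet.h>, <linux/if_ether.h>;
# defined here because Python's socket module may not export them.
SOL_PACKET = 263
PACKET_FANOUT = 18
PACKET_FANOUT_HASH = 0
ETH_P_ALL = 0x0003


def fanout_arg(group_id: int, mode: int = PACKET_FANOUT_HASH) -> int:
    """Encode the PACKET_FANOUT argument: group id in bits 0-15, mode in bits 16-31."""
    if not 0 <= group_id <= 0xFFFF:
        raise ValueError("fanout group id must fit in 16 bits")
    return group_id | (mode << 16)


def join_fanout(ifname: str, group_id: int) -> socket.socket:
    """Open an AF_PACKET socket bound to `ifname` and add it to fanout group `group_id`."""
    # socket() takes the protocol in network byte order.
    s = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, socket.htons(ETH_P_ALL))
    try:
        # Python converts the protocol in the bind address to network order itself.
        s.bind((ifname, ETH_P_ALL))
        s.setsockopt(SOL_PACKET, PACKET_FANOUT, fanout_arg(group_id))
    except OSError:
        s.close()
        raise
    return s
```

With the patch applied, calling join_fanout("lo", 1) after `ip link set lo down` would be expected to succeed instead of raising OSError.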

[PATCH net-next 2/2] selftests: drv-net: rss_ctx: add rss ctx busy testcase

by Daniel Zahka

It should be invalid to delete an rss context while it is being referenced from an ntuple filter. ethtool core should prevent this from happening. This patch adds a testcase to verify this behavior.

Signed-off-by: Daniel Zahka <daniel.zahka(a)gmail.com>
---
 .../selftests/drivers/net/hw/rss_ctx.py | 32 +++++++++++++++++--
 1 file changed, 30 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/drivers/net/hw/rss_ctx.py b/tools/testing/selftests/drivers/net/hw/rss_ctx.py
index 9d7adb3cf33b..29995586993c 100755
--- a/tools/testing/selftests/drivers/net/hw/rss_ctx.py
+++ b/tools/testing/selftests/drivers/net/hw/rss_ctx.py
@@ -6,7 +6,7 @@ import random
 from lib.py import ksft_run, ksft_pr, ksft_exit, ksft_eq, ksft_ne, ksft_ge, ksft_lt
 from lib.py import NetDrvEpEnv
 from lib.py import EthtoolFamily, NetdevFamily
-from lib.py import KsftSkipEx
+from lib.py import KsftSkipEx, KsftFailEx
 from lib.py import rand_port
 from lib.py import ethtool, ip, defer, GenerateTraffic, CmdExitFailure
 
@@ -606,6 +606,33 @@ def test_rss_context_overlap2(cfg):
     test_rss_context_overlap(cfg, True)
 
 
+def test_delete_rss_context_busy(cfg):
+    """
+    Test that deletion returns -EBUSY when an rss context is being used
+    by an ntuple filter.
+    """
+
+    require_ntuple(cfg)
+
+    # create additional rss context
+    ctx_id = ethtool_create(cfg, "-X", "context new")
+    ctx_deleter = defer(ethtool, f"-X {cfg.ifname} context {ctx_id} delete")
+
+    # utilize context from ntuple filter
+    port = rand_port()
+    flow = f"flow-type tcp{cfg.addr_ipver} dst-port {port} context {ctx_id}"
+    ntuple_id = ethtool_create(cfg, "-N", flow)
+    defer(ethtool, f"-N {cfg.ifname} delete {ntuple_id}")
+
+    # attempt to delete in-use context
+    try:
+        ctx_deleter.exec_only()
+        ctx_deleter.cancel()
+        raise KsftFailEx(f"deleted context {ctx_id} used by rule {ntuple_id}")
+    except CmdExitFailure:
+        pass
+
+
 def main() -> None:
     with NetDrvEpEnv(__file__, nsim_test=False) as cfg:
         cfg.ethnl = EthtoolFamily()
@@ -616,7 +643,8 @@ def main() -> None:
                   test_rss_context, test_rss_context4, test_rss_context32,
                   test_rss_context_dump, test_rss_context_queue_reconfigure,
                   test_rss_context_overlap, test_rss_context_overlap2,
-                  test_rss_context_out_of_order, test_rss_context4_create_with_cfg],
+                  test_rss_context_out_of_order, test_rss_context4_create_with_cfg,
+                  test_delete_rss_context_busy],
                  args=(cfg, ))
     ksft_exit()
 
--
2.43.5
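The invariant the new test checks, that ethtool core refuses to delete an RSS context still referenced by an ntuple rule, can be modelled in a few lines of Python. This is a hypothetical toy model for illustration only: the class and method names are invented and do not correspond to ethtool or kernel code. The errno.EBUSY choice follows the test's docstring.

```python
import errno


class RssContextTable:
    """Toy model of RSS contexts that ntuple rules may reference (hypothetical)."""

    def __init__(self):
        self._next_ctx = 1
        self.contexts = set()
        self.rules = {}  # rule id -> context id

    def create_context(self) -> int:
        ctx = self._next_ctx
        self._next_ctx += 1
        self.contexts.add(ctx)
        return ctx

    def add_rule(self, rule_id: int, ctx: int) -> None:
        if ctx not in self.contexts:
            raise OSError(errno.EINVAL, "no such context")
        self.rules[rule_id] = ctx

    def delete_rule(self, rule_id: int) -> None:
        del self.rules[rule_id]

    def delete_context(self, ctx: int) -> None:
        # Refuse while any rule still steers traffic into this context.
        if ctx in self.rules.values():
            raise OSError(errno.EBUSY, f"context {ctx} in use")
        self.contexts.remove(ctx)
```

As in the selftest's deferred cleanup, the rule has to be removed before the context can be deleted.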

[PATCH,bpf-next v3 0/4] selftests/bpf: migrate and remove cgroup/tracing related tests

by Daniel T. Lee

The BPF testing framework has evolved significantly over time. However, some legacy tests in the samples/bpf directory have not kept up with these changes. These outdated tests can cause confusion and increase maintenance efforts.

This patchset focuses on migrating outdated cgroup and tracing-related tests from samples/bpf to selftests/bpf, ensuring the BPF test suite remains current and efficient. Tests that are already covered by selftests/bpf are removed, while those not yet covered are migrated. This includes cgroup sock create tests for setting socket attributes and blocking socket creation, as well as the removal of redundant cgroup and tracing tests that have been replaced by newer tests.

This patchset covers the following cgroup/tracing tests:
- test_overhead: tests the overhead of BPF programs with task_rename, now covered by selftests and benchmark tests (rename-*). [1]
- test_override_return: tests the return override functionality, now handled by kprobe_multi_override in selftests.
- test_probe_write_user: tests the probe_write_user functionality, now replaced by the probe_user test in selftests.
- test_cgrp2_sock: tests cgroup BPF's ability to set sk_bound_dev_if, mark, and priority during socket creation. Migrated to selftests as 'sock_create' since no existing tests fully cover this.
- test_cgrp2_sock2: tests blocking socket creation for specific types (AF_INET{6}, SOCK_DGRAM, IPPROTO_ICMP{V6}). Migrated to selftests in 'sock_create' test for coverage.
- test_current_task_under_cgroup: tests bpf_current_task_under_cgroup() to check if a task belongs to a cgroup. Already covered by task_under_cgroup in selftests and other cgroup ID tests.
- test_cgrp2_tc: tests bpf_skb_under_cgroup() to filter packets based on cgroup. This behavior is now validated by cgroup_skb_sk_lookup, which uses bpf_skb_cgroup_id, making this test redundant.
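The socket-blocking behavior migrated from test_cgrp2_sock2 boils down to a verdict on the (family, type, protocol) triple at socket creation. The Python sketch below models only that decision; the real test attaches a BPF program to the cgroup sock_create hook, where returning 1 allows the socket and 0 rejects it. Pairing AF_INET with IPPROTO_ICMP and AF_INET6 with IPPROTO_ICMPV6 is an assumption based on the description in the list.

```python
import socket

# Hypothetical model of the blocked combinations; not the BPF program itself.
BLOCKED = {
    (socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_ICMP),
    (socket.AF_INET6, socket.SOCK_DGRAM, socket.IPPROTO_ICMPV6),
}


def sock_create_verdict(family: int, sock_type: int, protocol: int) -> int:
    """Return 1 to allow socket creation, 0 to reject it (cgroup sock_create convention)."""
    return 0 if (family, sock_type, protocol) in BLOCKED else 1
```

Ordinary UDP or TCP sockets fall through to the allow verdict, so only ping-style ICMP datagram sockets are affected.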
[1]: https://patchwork.kernel.org/cover/13759916

---
Changes in v2:
- commit message fix

Changes in v3:
- Makefile fix

Daniel T. Lee (4):
  selftests/bpf: migrate cgroup sock create test for setting iface/mark/prio
  selftests/bpf: migrate cgroup sock create test for prohibiting sockets
  samples/bpf: remove obsolete cgroup related tests
  samples/bpf: remove obsolete tracing related tests

 samples/bpf/Makefile | 25 --
 samples/bpf/sock_flags.bpf.c | 47 ---
 samples/bpf/test_cgrp2_array_pin.c | 106 ------
 samples/bpf/test_cgrp2_attach.c | 177 ----------
 samples/bpf/test_cgrp2_sock.c | 296 ----------------
 samples/bpf/test_cgrp2_sock.sh | 137 -------
 samples/bpf/test_cgrp2_sock2.c | 95 -----
 samples/bpf/test_cgrp2_sock2.sh | 103 ------
 samples/bpf/test_cgrp2_tc.bpf.c | 56 ---
 samples/bpf/test_cgrp2_tc.sh | 187 ----------
 .../bpf/test_current_task_under_cgroup.bpf.c | 43 ---
 .../bpf/test_current_task_under_cgroup_user.c | 115 ------
 samples/bpf/test_overhead_kprobe.bpf.c | 41 ---
 samples/bpf/test_overhead_raw_tp.bpf.c | 17 -
 samples/bpf/test_overhead_tp.bpf.c | 23 --
 samples/bpf/test_overhead_user.c | 225 ------------
 samples/bpf/test_override_return.sh | 16 -
 samples/bpf/test_probe_write_user.bpf.c | 52 ---
 samples/bpf/test_probe_write_user_user.c | 108 ------
 samples/bpf/tracex7.bpf.c | 15 -
 samples/bpf/tracex7_user.c | 56 ---
 .../selftests/bpf/prog_tests/sock_create.c | 333 ++++++++++++++++++
 22 files changed, 333 insertions(+), 1940 deletions(-)
 delete mode 100644 samples/bpf/sock_flags.bpf.c
 delete mode 100644 samples/bpf/test_cgrp2_array_pin.c
 delete mode 100644 samples/bpf/test_cgrp2_attach.c
 delete mode 100644 samples/bpf/test_cgrp2_sock.c
 delete mode 100755 samples/bpf/test_cgrp2_sock.sh
 delete mode 100644 samples/bpf/test_cgrp2_sock2.c
 delete mode 100755 samples/bpf/test_cgrp2_sock2.sh
 delete mode 100644 samples/bpf/test_cgrp2_tc.bpf.c
 delete mode 100755 samples/bpf/test_cgrp2_tc.sh
 delete mode 100644 samples/bpf/test_current_task_under_cgroup.bpf.c
 delete mode 100644 samples/bpf/test_current_task_under_cgroup_user.c
 delete mode 100644 samples/bpf/test_overhead_kprobe.bpf.c
 delete mode 100644 samples/bpf/test_overhead_raw_tp.bpf.c
 delete mode 100644 samples/bpf/test_overhead_tp.bpf.c
 delete mode 100644 samples/bpf/test_overhead_user.c
 delete mode 100755 samples/bpf/test_override_return.sh
 delete mode 100644 samples/bpf/test_probe_write_user.bpf.c
 delete mode 100644 samples/bpf/test_probe_write_user_user.c
 delete mode 100644 samples/bpf/tracex7.bpf.c
 delete mode 100644 samples/bpf/tracex7_user.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/sock_create.c

-- 
2.43.0

1 year, 2 months

[PATCH bpf-next v3 0/3] selftests/bpf: add coverage for xdp_features in test_progs

by Alexis Lothoré (eBPF Foundation)

Hello,

This small series aims to increase coverage of XDP features in test_progs. The initial versions proposed reworking test_xdp_features.sh to make it fit in test_progs, but discussions on v1 and v2 showed that the script is still needed as a standalone tool. This new revision therefore leaves test_xdp_features.sh as-is and instead adds the missing coverage to an existing test (cpumap). The new revision is also a follow-up to the update performed by Florian Kauer in [1] for devmap program testing.

[1] https://lore.kernel.org/bpf/20240911-devel-koalo-fix-ingress-ifindex-v4-2-5…

---
Changes in v3:
- Drop xdp_features rework commit
- update xdp_cpumap_attach to extend its coverage
- Link to v2: https://lore.kernel.org/r/20240910-convert_xdp_tests-v2-1-a46367c9d038@boot…

Changes in v2:
- fix endianness management in userspace packet parsing (call htonl on constant rather than packet part)

The new test has been run in a local x86 environment and in CI:

#560/1 xdp_cpumap_attach/CPUMAP with programs in entries:OK
#560/2 xdp_cpumap_attach/CPUMAP with frags programs in entries:OK
#560/3 xdp_cpumap_attach/CPUMAP attach with programs in entries on veth:OK
#560 xdp_cpumap_attach:OK
Summary: 1/3 PASSED, 0 SKIPPED, 0 FAILED

---
Alexis Lothoré (eBPF Foundation) (3):
  selftests/bpf: fix bpf_map_redirect call for cpu map test
  selftests/bpf: make xdp_cpumap_attach keep redirect prog attached
  selftests/bpf: check program redirect in xdp_cpumap_attach

 .../selftests/bpf/prog_tests/xdp_cpumap_attach.c | 130 +++++++++++++++++++--
 .../bpf/progs/test_xdp_with_cpumap_helpers.c | 7 +-
 2 files changed, 129 insertions(+), 8 deletions(-)
---
base-commit: 058d7c3d1691e2e4a4963716ec6c047dff778637
change-id: 20240730-convert_xdp_tests-ccd66bfe33db

Best regards,
-- 
Alexis Lothoré, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com
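The v2 change note refers to a common idiom for parsing packets in user space. Header fields are left in network byte order, and the constant they are compared against is converted with htons()/htonl(), rather than converting the packet bytes. Below is a minimal sketch of that idiom, assuming an untagged Ethernet + IPv4 frame held in a plain byte buffer. The helper name frame_is_ipv4_to() is hypothetical and does not come from the series.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>
#include <net/ethernet.h>
#include <netinet/ip.h>

/*
 * Hypothetical helper: return true if the frame carries IPv4 addressed
 * to want_daddr (given in host byte order). Packet fields stay in
 * network byte order; only the constants are converted.
 */
static bool frame_is_ipv4_to(const uint8_t *frame, size_t len,
			     uint32_t want_daddr)
{
	struct ether_header eth;
	struct iphdr iph;

	if (len < sizeof(eth) + sizeof(iph))
		return false;

	/* memcpy() avoids unaligned access to the raw buffer. */
	memcpy(&eth, frame, sizeof(eth));
	if (eth.ether_type != htons(ETHERTYPE_IP))
		return false;

	memcpy(&iph, frame + sizeof(eth), sizeof(iph));
	return iph.daddr == htonl(want_daddr);
}
```

Converting the constant keeps each comparison a single equality check against the raw field. With a literal argument, the compiler can usually fold the conversion at build time.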

1 year, 2 months

[RFC PATCH 0/3] Allow sk_lookup UDP return traffic to egress.

by Tiago Lam

Currently, sk_lookup allows an eBPF program to run on the ingress socket lookup path and accept traffic not only on a range of addresses, but also on a range of ports. At Cloudflare we use sk_lookup for two main cases:

1. Sharing a single port between multiple services, i.e. two (or more) services use disjoint IP ranges but share the same port;
2. Receiving traffic on all ports, i.e. a service which accepts traffic on specific IP ranges but on any port [1].

However, one main challenge we face while using sk_lookup for these use cases is how to source return UDP traffic:

- For case 1 above, sometimes this range of addresses is not local (i.e. there are no local routes for these addresses on the server), which means we need IP_TRANSPARENT set to be able to egress traffic from addresses we've received traffic on (or simply IP_FREEBIND in the case of IPv6);
- For case 2 above, allowing traffic to a range of ports means a service could get traffic on multiple ports, but there is currently no way to set the source UDP port that egress traffic should be sourced from. It's possible to receive the original destination port using the IP_ORIGDSTADDR ancillary message in recvmsg, but not to set it in sendmsg.

Both of these limitations can be worked around, but in a sub-optimal way. Using IP_TRANSPARENT, for instance, requires special privileges. And while one could use connected UDP sockets to send return traffic, creating a connected socket for each different address that UDP traffic is received on has performance implications.

Given that sk_lookup allows services to accept traffic on a range of addresses or ports, it seems sensible to also allow return traffic to proceed, without needing extra configuration or setup. This patch set does exactly this by performing a reverse socket lookup on the egress path: it checks whether the egress socket matches a socket in the attached sk_lookup eBPF program for the traffic that's being sent. If it does, traffic is allowed to proceed.

The downside is that this runs on the egress hot path, although this work tries to minimise its impact by only performing the reverse socket lookup when necessary. Further performance measurements are still to be taken, but we're reaching out early for feedback to see what the technical concerns are and whether we can address them.

[1] https://blog.cloudflare.com/how-we-built-spectrum/

Suggested-by: Jakub Sitnicki <jakub(a)cloudflare.com>
Signed-off-by: Tiago Lam <tiagolam(a)cloudflare.com>
---
Tiago Lam (3):
  ipv4: Run a reverse sk_lookup on sendmsg.
  ipv6: Run a reverse sk_lookup on sendmsg.
  bpf: Add sk_lookup test to use ORIGDSTADDR cmsg.

 include/net/ip.h | 1 +
 net/ipv4/ip_sockglue.c | 11 ++++
 net/ipv4/udp.c | 33 +++++++++-
 net/ipv6/datagram.c | 76 ++++++++++++++++++++++
 net/ipv6/udp.c | 8 ++-
 tools/testing/selftests/bpf/prog_tests/sk_lookup.c | 70 +++++++++++++-------
 6 files changed, 174 insertions(+), 25 deletions(-)
---
base-commit: da3ea35007d0af457a0afc87e84fddaebc4e0b63
change-id: 20240909-reverse-sk-lookup-f7bf36292bc4

Best regards,
-- 
Tiago Lam <tiagolam(a)cloudflare.com>
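The receive side that the cover letter says already works can be sketched as follows. A UDP socket enables IP_RECVORIGDSTADDR and then reads the original destination address and port from the IP_ORIGDSTADDR control message returned by recvmsg(). This sketch uses only existing socket APIs and does not exercise the proposed sendmsg() change. The helper names enable_origdstaddr() and recv_with_origdst() are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Ask the kernel to attach IP_ORIGDSTADDR ancillary data to datagrams. */
static int enable_origdstaddr(int fd)
{
	int one = 1;

	return setsockopt(fd, IPPROTO_IP, IP_RECVORIGDSTADDR, &one, sizeof(one));
}

/*
 * Receive one datagram and copy the original destination address/port
 * from the IP_ORIGDSTADDR control message into *origdst. Returns the
 * payload length, or -1 on error or if the control message is missing.
 */
static ssize_t recv_with_origdst(int fd, void *buf, size_t len,
				 struct sockaddr_in *origdst)
{
	union {
		char buf[CMSG_SPACE(sizeof(struct sockaddr_in))];
		struct cmsghdr align; /* ensures cmsghdr alignment */
	} ctrl;
	struct iovec iov = { .iov_base = buf, .iov_len = len };
	struct msghdr msg = {
		.msg_iov = &iov,
		.msg_iovlen = 1,
		.msg_control = ctrl.buf,
		.msg_controllen = sizeof(ctrl.buf),
	};
	struct cmsghdr *cmsg;
	ssize_t n = recvmsg(fd, &msg, 0);

	if (n < 0)
		return -1;
	for (cmsg = CMSG_FIRSTHDR(&msg); cmsg; cmsg = CMSG_NXTHDR(&msg, cmsg)) {
		if (cmsg->cmsg_level == IPPROTO_IP &&
		    cmsg->cmsg_type == IP_ORIGDSTADDR) {
			memcpy(origdst, CMSG_DATA(cmsg), sizeof(*origdst));
			return n;
		}
	}
	return -1;
}
```

With today's API, a service that wants to reply from that port still needs the workarounds the cover letter describes: a connected socket per peer, or extra privileges. The series proposes to remove that need.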

1 year, 2 months

[PATCH] lib: Move KUnit tests into tests/ subdirectory

by Kees Cook

Following from the recent KUnit file naming discussion[1], move all KUnit tests in lib/ into lib/tests/.

Link: https://lore.kernel.org/lkml/20240720165441.it.320-kees@kernel.org/ [1]
Signed-off-by: Kees Cook <kees(a)kernel.org>
---
I can carry this in the hardening tree. To disrupt people as little as possible, I'm hoping to send this either at the end of -rc1 or early in -rc2.

Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Yury Norov <yury.norov(a)gmail.com>
Cc: Rasmus Villemoes <linux(a)rasmusvillemoes.dk>
Cc: David Gow <davidgow(a)google.com>
Cc: "Jason A. Donenfeld" <Jason(a)zx2c4.com>
Cc: Andy Shevchenko <andy(a)kernel.org>
Cc: "Naveen N. Rao" <naveen.n.rao(a)linux.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy(a)intel.com>
Cc: "David S. Miller" <davem(a)davemloft.net>
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mark Brown <broonie(a)kernel.org>
Cc: Matti Vaittinen <mazziesaccount(a)gmail.com>
Cc: linux-hardening(a)vger.kernel.org
Cc: linux-kselftest(a)vger.kernel.org
Cc: kunit-dev(a)googlegroups.com
Cc: linux-trace-kernel(a)vger.kernel.org
---
 MAINTAINERS | 18 ++++++-------
 lib/Makefile | 35 +----------------------------------
 lib/tests/Makefile | 37 ++++++++++++++++++++++++++
 lib/{ => tests}/bitfield_kunit.c | 0
 lib/{ => tests}/checksum_kunit.c | 0
 lib/{ => tests}/cmdline_kunit.c | 0
 lib/{ => tests}/cpumask_kunit.c | 0
 lib/{ => tests}/fortify_kunit.c | 0
 lib/{ => tests}/hashtable_test.c | 0
 lib/{ => tests}/is_signed_type_kunit.c | 0
 lib/{ => tests}/kunit_iov_iter.c | 0
 lib/{ => tests}/list-test.c | 0
 lib/{ => tests}/memcpy_kunit.c | 0
 lib/{ => tests}/overflow_kunit.c | 0
 lib/{ => tests}/siphash_kunit.c | 0
 lib/{ => tests}/slub_kunit.c | 0
 lib/{ => tests}/stackinit_kunit.c | 0
 lib/{ => tests}/string_helpers_kunit.c | 0
 lib/{ => tests}/string_kunit.c | 0
 lib/{ => tests}/test_bits.c | 0
 lib/{ => tests}/test_fprobe.c | 0
 lib/{ => tests}/test_hash.c | 0
 lib/{ => tests}/test_kprobes.c | 0
 lib/{ => tests}/test_linear_ranges.c | 0
 lib/{ => tests}/test_list_sort.c | 0
 lib/{ => tests}/test_sort.c | 0
 26 files changed, 47 insertions(+), 43 deletions(-)
 create mode 100644 lib/tests/Makefile
 rename lib/{ => tests}/bitfield_kunit.c (100%)
 rename lib/{ => tests}/checksum_kunit.c (100%)
 rename lib/{ => tests}/cmdline_kunit.c (100%)
 rename lib/{ => tests}/cpumask_kunit.c (100%)
 rename lib/{ => tests}/fortify_kunit.c (100%)
 rename lib/{ => tests}/hashtable_test.c (100%)
 rename lib/{ => tests}/is_signed_type_kunit.c (100%)
 rename lib/{ => tests}/kunit_iov_iter.c (100%)
 rename lib/{ => tests}/list-test.c (100%)
 rename lib/{ => tests}/memcpy_kunit.c (100%)
 rename lib/{ => tests}/overflow_kunit.c (100%)
 rename lib/{ => tests}/siphash_kunit.c (100%)
 rename lib/{ => tests}/slub_kunit.c (100%)
 rename lib/{ => tests}/stackinit_kunit.c (100%)
 rename lib/{ => tests}/string_helpers_kunit.c (100%)
 rename lib/{ => tests}/string_kunit.c (100%)
 rename lib/{ => tests}/test_bits.c (100%)
 rename lib/{ => tests}/test_fprobe.c (100%)
 rename lib/{ => tests}/test_hash.c (100%)
 rename lib/{ => tests}/test_kprobes.c (100%)
 rename lib/{ => tests}/test_linear_ranges.c (100%)
 rename lib/{ => tests}/test_list_sort.c (100%)
 rename lib/{ => tests}/test_sort.c (100%)

diff --git a/MAINTAINERS b/MAINTAINERS
index 8754ac2c259d..3f4b9d007cbb 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3737,10 +3737,10 @@ F:	include/vdso/bits.h
 F:	lib/bitmap-str.c
 F:	lib/bitmap.c
 F:	lib/cpumask.c
-F:	lib/cpumask_kunit.c
 F:	lib/find_bit.c
 F:	lib/find_bit_benchmark.c
 F:	lib/test_bitmap.c
+F:	lib/tests/cpumask_kunit.c
 F:	tools/include/linux/bitfield.h
 F:	tools/include/linux/bitmap.h
 F:	tools/include/linux/bits.h
@@ -8618,9 +8618,9 @@ L:	linux-hardening(a)vger.kernel.org
 S:	Supported
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git for-next/hardening
 F:	include/linux/fortify-string.h
-F:	lib/fortify_kunit.c
-F:	lib/memcpy_kunit.c
 F:	lib/test_fortify/*
+F:	lib/tests/fortify_kunit.c
+F:	lib/tests/memcpy_kunit.c
 F:	scripts/test_fortify.sh
 K:	\b__NO_FORTIFY\b
 
@@ -9246,9 +9246,9 @@ F:	include/linux/string.h
 F:	include/linux/string_choices.h
 F:	include/linux/string_helpers.h
 F:	lib/string.c
-F:	lib/string_kunit.c
 F:	lib/string_helpers.c
-F:	lib/string_helpers_kunit.c
+F:	lib/tests/string_helpers_kunit.c
+F:	lib/tests/string_kunit.c
 F:	scripts/coccinelle/api/string_choices.cocci
 
 GENERIC UIO DRIVER FOR PCI DEVICES
@@ -12347,7 +12347,7 @@ F:	Documentation/trace/kprobes.rst
 F:	include/asm-generic/kprobes.h
 F:	include/linux/kprobes.h
 F:	kernel/kprobes.c
-F:	lib/test_kprobes.c
+F:	lib/tests/test_kprobes.c
 F:	samples/kprobes
 
 KS0108 LCD CONTROLLER DRIVER
@@ -12697,7 +12697,7 @@ M:	Mark Brown <broonie(a)kernel.org>
 R:	Matti Vaittinen <mazziesaccount(a)gmail.com>
 F:	include/linux/linear_range.h
 F:	lib/linear_ranges.c
-F:	lib/test_linear_ranges.c
+F:	lib/tests/test_linear_ranges.c
 
 LINUX FOR POWER MACINTOSH
 L:	linuxppc-dev(a)lists.ozlabs.org
@@ -12824,7 +12824,7 @@ M:	David Gow <davidgow(a)google.com>
 L:	linux-kselftest(a)vger.kernel.org
 L:	kunit-dev(a)googlegroups.com
 S:	Maintained
-F:	lib/list-test.c
+F:	lib/tests/list-test.c
 
 LITEX PLATFORM
 M:	Karol Gugala <kgugala(a)antmicro.com>
@@ -20498,7 +20498,7 @@ M:	Jason A. Donenfeld <Jason(a)zx2c4.com>
 S:	Maintained
 F:	include/linux/siphash.h
 F:	lib/siphash.c
-F:	lib/siphash_kunit.c
+F:	lib/tests/siphash_kunit.c
 
 SIS 190 ETHERNET DRIVER
 M:	Francois Romieu <romieu(a)fr.zoreil.com>
diff --git a/lib/Makefile b/lib/Makefile
index 3b1769045651..f00fe120ee9e 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -49,9 +49,7 @@ obj-y += bcd.o sort.o parser.o debug_locks.o random32.o \
 	 percpu-refcount.o rhashtable.o base64.o \
 	 once.o refcount.o rcuref.o usercopy.o errseq.o bucket_locks.o \
 	 generic-radix-tree.o bitmap-str.o
-obj-$(CONFIG_STRING_KUNIT_TEST) += string_kunit.o
 obj-y += string_helpers.o
-obj-$(CONFIG_STRING_HELPERS_KUNIT_TEST) += string_helpers_kunit.o
 obj-y += hexdump.o
 obj-$(CONFIG_TEST_HEXDUMP) += test_hexdump.o
 obj-y += kstrtox.o
@@ -62,22 +60,17 @@ obj-$(CONFIG_TEST_DHRY) += test_dhry.o
 obj-$(CONFIG_TEST_FIRMWARE) += test_firmware.o
 obj-$(CONFIG_TEST_BITOPS) += test_bitops.o
 CFLAGS_test_bitops.o += -Werror
-obj-$(CONFIG_CPUMASK_KUNIT_TEST) += cpumask_kunit.o
 obj-$(CONFIG_TEST_SYSCTL) += test_sysctl.o
-obj-$(CONFIG_TEST_IOV_ITER) += kunit_iov_iter.o
-obj-$(CONFIG_HASH_KUNIT_TEST) += test_hash.o
 obj-$(CONFIG_TEST_IDA) += test_ida.o
 obj-$(CONFIG_TEST_UBSAN) += test_ubsan.o
 CFLAGS_test_ubsan.o += $(call cc-disable-warning, vla)
 CFLAGS_test_ubsan.o += $(call cc-disable-warning, unused-but-set-variable)
 UBSAN_SANITIZE_test_ubsan.o := y
 obj-$(CONFIG_TEST_KSTRTOX) += test-kstrtox.o
-obj-$(CONFIG_TEST_LIST_SORT) += test_list_sort.o
 obj-$(CONFIG_TEST_MIN_HEAP) += test_min_heap.o
 obj-$(CONFIG_TEST_LKM) += test_module.o
 obj-$(CONFIG_TEST_VMALLOC) += test_vmalloc.o
 obj-$(CONFIG_TEST_RHASHTABLE) += test_rhashtable.o
-obj-$(CONFIG_TEST_SORT) += test_sort.o
 obj-$(CONFIG_TEST_USER_COPY) += test_user_copy.o
 obj-$(CONFIG_TEST_STATIC_KEYS) += test_static_keys.o
 obj-$(CONFIG_TEST_STATIC_KEYS) += test_static_key_base.o
@@ -104,10 +97,7 @@ obj-$(CONFIG_TEST_MEMINIT) += test_meminit.o
 obj-$(CONFIG_TEST_LOCKUP) += test_lockup.o
 obj-$(CONFIG_TEST_HMM) += test_hmm.o
 obj-$(CONFIG_TEST_FREE_PAGES) += test_free_pages.o
-obj-$(CONFIG_KPROBES_SANITY_TEST) += test_kprobes.o
 obj-$(CONFIG_TEST_REF_TRACKER) += test_ref_tracker.o
-CFLAGS_test_fprobe.o += $(CC_FLAGS_FTRACE)
-obj-$(CONFIG_FPROBE_SANITY_TEST) += test_fprobe.o
 obj-$(CONFIG_TEST_OBJPOOL) += test_objpool.o
 
 obj-$(CONFIG_TEST_FPU) += test_fpu.o
@@ -129,7 +119,7 @@ endif
 obj-$(CONFIG_DEBUG_INFO_REDUCED) += debug_info.o
 CFLAGS_debug_info.o += $(call cc-option, -femit-struct-debug-detailed=any)
 
-obj-y += math/ crypto/
+obj-y += math/ crypto/ tests/
 
 obj-$(CONFIG_GENERIC_IOMAP) += iomap.o
 obj-$(CONFIG_HAS_IOMEM) += iomap_copy.o devres.o
@@ -366,29 +356,6 @@ obj-$(CONFIG_OBJAGG) += objagg.o
 # pldmfw library
 obj-$(CONFIG_PLDMFW) += pldmfw/
 
-# KUnit tests
-CFLAGS_bitfield_kunit.o := $(DISABLE_STRUCTLEAK_PLUGIN)
-obj-$(CONFIG_BITFIELD_KUNIT) += bitfield_kunit.o
-obj-$(CONFIG_CHECKSUM_KUNIT) += checksum_kunit.o
-obj-$(CONFIG_LIST_KUNIT_TEST) += list-test.o
-obj-$(CONFIG_HASHTABLE_KUNIT_TEST) += hashtable_test.o
-obj-$(CONFIG_LINEAR_RANGES_TEST) += test_linear_ranges.o
-obj-$(CONFIG_BITS_TEST) += test_bits.o
-obj-$(CONFIG_CMDLINE_KUNIT_TEST) += cmdline_kunit.o
-obj-$(CONFIG_SLUB_KUNIT_TEST) += slub_kunit.o
-obj-$(CONFIG_MEMCPY_KUNIT_TEST) += memcpy_kunit.o
-obj-$(CONFIG_IS_SIGNED_TYPE_KUNIT_TEST) += is_signed_type_kunit.o
-CFLAGS_overflow_kunit.o = $(call cc-disable-warning, tautological-constant-out-of-range-compare)
-obj-$(CONFIG_OVERFLOW_KUNIT_TEST) += overflow_kunit.o
-CFLAGS_stackinit_kunit.o += $(call cc-disable-warning, switch-unreachable)
-obj-$(CONFIG_STACKINIT_KUNIT_TEST) += stackinit_kunit.o
-CFLAGS_fortify_kunit.o += $(call cc-disable-warning, unsequenced)
-CFLAGS_fortify_kunit.o += $(call cc-disable-warning, stringop-overread)
-CFLAGS_fortify_kunit.o += $(call cc-disable-warning, stringop-truncation)
-CFLAGS_fortify_kunit.o += $(DISABLE_STRUCTLEAK_PLUGIN)
-obj-$(CONFIG_FORTIFY_KUNIT_TEST) += fortify_kunit.o
-obj-$(CONFIG_SIPHASH_KUNIT_TEST) += siphash_kunit.o
-
 obj-$(CONFIG_GENERIC_LIB_DEVMEM_IS_ALLOWED) += devmem_is_allowed.o
 
 obj-$(CONFIG_FIRMWARE_TABLE) += fw_table.o
diff --git a/lib/tests/Makefile b/lib/tests/Makefile
new file mode 100644
index 000000000000..c6a14cc8663e
--- /dev/null
+++ b/lib/tests/Makefile
@@ -0,0 +1,37 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Makefile for tests of kernel library functions.
+
+# KUnit tests
+CFLAGS_bitfield_kunit.o := $(DISABLE_STRUCTLEAK_PLUGIN)
+obj-$(CONFIG_BITFIELD_KUNIT) += bitfield_kunit.o
+obj-$(CONFIG_BITS_TEST) += test_bits.o
+obj-$(CONFIG_CHECKSUM_KUNIT) += checksum_kunit.o
+obj-$(CONFIG_CMDLINE_KUNIT_TEST) += cmdline_kunit.o
+obj-$(CONFIG_CPUMASK_KUNIT_TEST) += cpumask_kunit.o
+CFLAGS_fortify_kunit.o += $(call cc-disable-warning, unsequenced)
+CFLAGS_fortify_kunit.o += $(call cc-disable-warning, stringop-overread)
+CFLAGS_fortify_kunit.o += $(call cc-disable-warning, stringop-truncation)
+CFLAGS_fortify_kunit.o += $(DISABLE_STRUCTLEAK_PLUGIN)
+obj-$(CONFIG_FORTIFY_KUNIT_TEST) += fortify_kunit.o
+CFLAGS_test_fprobe.o += $(CC_FLAGS_FTRACE)
+obj-$(CONFIG_FPROBE_SANITY_TEST) += test_fprobe.o
+obj-$(CONFIG_HASHTABLE_KUNIT_TEST) += hashtable_test.o
+obj-$(CONFIG_HASH_KUNIT_TEST) += test_hash.o
+obj-$(CONFIG_TEST_IOV_ITER) += kunit_iov_iter.o
+obj-$(CONFIG_IS_SIGNED_TYPE_KUNIT_TEST) += is_signed_type_kunit.o
+obj-$(CONFIG_KPROBES_SANITY_TEST) += test_kprobes.o
+obj-$(CONFIG_LIST_KUNIT_TEST) += list-test.o
+obj-$(CONFIG_TEST_LIST_SORT) += test_list_sort.o
+obj-$(CONFIG_LINEAR_RANGES_TEST) += test_linear_ranges.o
+obj-$(CONFIG_MEMCPY_KUNIT_TEST) += memcpy_kunit.o
+CFLAGS_overflow_kunit.o = $(call cc-disable-warning, tautological-constant-out-of-range-compare)
+obj-$(CONFIG_OVERFLOW_KUNIT_TEST) += overflow_kunit.o
+obj-$(CONFIG_SIPHASH_KUNIT_TEST) += siphash_kunit.o
+obj-$(CONFIG_SLUB_KUNIT_TEST) += slub_kunit.o
+obj-$(CONFIG_TEST_SORT) += test_sort.o
+CFLAGS_stackinit_kunit.o += $(call cc-disable-warning, switch-unreachable)
+obj-$(CONFIG_STACKINIT_KUNIT_TEST) += stackinit_kunit.o
+obj-$(CONFIG_STRING_KUNIT_TEST) += string_kunit.o
+obj-$(CONFIG_STRING_HELPERS_KUNIT_TEST) += string_helpers_kunit.o
+
diff --git a/lib/bitfield_kunit.c b/lib/tests/bitfield_kunit.c
similarity index 100%
rename from lib/bitfield_kunit.c
rename to lib/tests/bitfield_kunit.c
diff --git a/lib/checksum_kunit.c b/lib/tests/checksum_kunit.c
similarity index 100%
rename from lib/checksum_kunit.c
rename to lib/tests/checksum_kunit.c
diff --git a/lib/cmdline_kunit.c b/lib/tests/cmdline_kunit.c
similarity index 100%
rename from lib/cmdline_kunit.c
rename to lib/tests/cmdline_kunit.c
diff --git a/lib/cpumask_kunit.c b/lib/tests/cpumask_kunit.c
similarity index 100%
rename from lib/cpumask_kunit.c
rename to lib/tests/cpumask_kunit.c
diff --git a/lib/fortify_kunit.c b/lib/tests/fortify_kunit.c
similarity index 100%
rename from lib/fortify_kunit.c
rename to lib/tests/fortify_kunit.c
diff --git a/lib/hashtable_test.c b/lib/tests/hashtable_test.c
similarity index 100%
rename from lib/hashtable_test.c
rename to lib/tests/hashtable_test.c
diff --git a/lib/is_signed_type_kunit.c b/lib/tests/is_signed_type_kunit.c
similarity index 100%
rename from lib/is_signed_type_kunit.c
rename to lib/tests/is_signed_type_kunit.c
diff --git a/lib/kunit_iov_iter.c b/lib/tests/kunit_iov_iter.c
similarity index 100%
rename from lib/kunit_iov_iter.c
rename to lib/tests/kunit_iov_iter.c
diff --git a/lib/list-test.c b/lib/tests/list-test.c
similarity index 100%
rename from lib/list-test.c
rename to lib/tests/list-test.c
diff --git a/lib/memcpy_kunit.c b/lib/tests/memcpy_kunit.c
similarity index 100%
rename from lib/memcpy_kunit.c
rename to lib/tests/memcpy_kunit.c
diff --git a/lib/overflow_kunit.c b/lib/tests/overflow_kunit.c
similarity index 100%
rename from lib/overflow_kunit.c
rename to lib/tests/overflow_kunit.c
diff --git a/lib/siphash_kunit.c b/lib/tests/siphash_kunit.c
similarity index 100%
rename from lib/siphash_kunit.c
rename to lib/tests/siphash_kunit.c
diff --git a/lib/slub_kunit.c b/lib/tests/slub_kunit.c
similarity index 100%
rename from lib/slub_kunit.c
rename to lib/tests/slub_kunit.c
diff --git a/lib/stackinit_kunit.c b/lib/tests/stackinit_kunit.c
similarity index 100%
rename from lib/stackinit_kunit.c
rename to lib/tests/stackinit_kunit.c
diff --git a/lib/string_helpers_kunit.c b/lib/tests/string_helpers_kunit.c
similarity index 100%
rename from lib/string_helpers_kunit.c
rename to lib/tests/string_helpers_kunit.c
diff --git a/lib/string_kunit.c b/lib/tests/string_kunit.c
similarity index 100%
rename from lib/string_kunit.c
rename to lib/tests/string_kunit.c
diff --git a/lib/test_bits.c b/lib/tests/test_bits.c
similarity index 100%
rename from lib/test_bits.c
rename to lib/tests/test_bits.c
diff --git a/lib/test_fprobe.c b/lib/tests/test_fprobe.c
similarity index 100%
rename from lib/test_fprobe.c
rename to lib/tests/test_fprobe.c
diff --git a/lib/test_hash.c b/lib/tests/test_hash.c
similarity index 100%
rename from lib/test_hash.c
rename to lib/tests/test_hash.c
diff --git a/lib/test_kprobes.c b/lib/tests/test_kprobes.c
similarity index 100%
rename from lib/test_kprobes.c
rename to lib/tests/test_kprobes.c
diff --git a/lib/test_linear_ranges.c b/lib/tests/test_linear_ranges.c
similarity index 100%
rename from lib/test_linear_ranges.c
rename to lib/tests/test_linear_ranges.c
diff --git a/lib/test_list_sort.c b/lib/tests/test_list_sort.c
similarity index 100%
rename from lib/test_list_sort.c
rename to lib/tests/test_list_sort.c
diff --git a/lib/test_sort.c b/lib/tests/test_sort.c
similarity index 100%
rename from lib/test_sort.c
rename to lib/tests/test_sort.c
-- 
2.34.1

1 year, 2 months

[PATCH v2 0/1] Add KUnit tests for kfifo

by Diego Vieira

Hi all,

This is part of a hackathon organized by LKCAMP [1], focused on writing tests using KUnit. We reached out a while ago asking for advice on what would be a useful contribution [2] and ended up choosing data structures that did not yet have tests.

This patch series depends on the patch that moves the KUnit tests in lib/ into lib/tests/ [3].

This patch adds tests for the kfifo data structure, defined in include/linux/kfifo.h, and is inspired by the KUnit tests for the doubly linked list in lib/tests/list-test.c (previously at lib/list-test.c) [4].

[1] https://lkcamp.dev/about/
[2] https://lore.kernel.org/all/Zktnt7rjKryTh9-N@arch/
[3] https://lore.kernel.org/all/20240720181025.work.002-kees@kernel.org/
[4] https://elixir.bootlin.com/linux/latest/source/lib/list-test.c

---
Changes in v2:
- Add MODULE_DESCRIPTION()
- Move the tests from lib/kfifo-test.c to lib/tests/kfifo_kunit.c

Diego Vieira (1):
  lib/tests/kfifo_kunit.c: add tests for the kfifo structure

 lib/Kconfig.debug | 14 +++
 lib/tests/Makefile | 1 +
 lib/tests/kfifo_kunit.c | 224 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 239 insertions(+)
 create mode 100644 lib/tests/kfifo_kunit.c

-- 
2.34.1
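Here is a rough user-space illustration of the FIFO properties that tests of this kind typically check: length, empty and full conditions, ordering, and index wrap-around. It is a simplified, single-threaded model, loosely patterned on kfifo's design: a power-of-two buffer indexed by free-running unsigned in/out counters that are masked on access. It is not the kernel implementation or its API. The type and function names (struct int_fifo, fifo_put(), fifo_get(), fifo_len()) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Must be a power of two so that "& (FIFO_SIZE - 1)" replaces "% FIFO_SIZE". */
#define FIFO_SIZE 8u

/* Hypothetical model type, not the kernel's struct kfifo. */
struct int_fifo {
	unsigned int in;  /* total elements ever put (free-running) */
	unsigned int out; /* total elements ever taken (free-running) */
	int data[FIFO_SIZE];
};

/* Number of stored elements; stays correct across unsigned wrap-around. */
static unsigned int fifo_len(const struct int_fifo *f)
{
	return f->in - f->out;
}

/* Append one element; returns false if the FIFO is full. */
static bool fifo_put(struct int_fifo *f, int v)
{
	if (fifo_len(f) == FIFO_SIZE)
		return false;
	f->data[f->in & (FIFO_SIZE - 1)] = v;
	f->in++;
	return true;
}

/* Remove the oldest element into *v; returns false if the FIFO is empty. */
static bool fifo_get(struct int_fifo *f, int *v)
{
	if (fifo_len(f) == 0)
		return false;
	*v = f->data[f->out & (FIFO_SIZE - 1)];
	f->out++;
	return true;
}
```

Starting the counters just below UINT_MAX, as in the check accompanying this sketch, exercises the wrap-around case. A model that stores wrapped indices instead would tend to get this case wrong.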

1 year, 3 months

[PATCH,bpf-next v2 0/4] selftests/bpf: migrate and remove cgroup/tracing related tests

by Daniel T. Lee

The BPF testing framework has evolved significantly over time. However, some legacy tests in the samples/bpf directory have not kept up with these changes. These outdated tests can cause confusion and increase maintenance effort.

This patchset migrates outdated cgroup and tracing-related tests from samples/bpf to selftests/bpf, keeping the BPF test suite current and efficient. Tests that are already covered by selftests/bpf are removed, while those not yet covered are migrated. This includes the cgroup sock create tests for setting socket attributes and for blocking socket creation, as well as the removal of redundant cgroup and tracing tests that have been replaced by newer tests.

This patchset covers the following cgroup/tracing tests:

- test_overhead: tests the overhead of BPF programs with task_rename; now covered by selftests and the benchmark tests (rename-*). [1]
- test_override_return: tests the return override functionality; now handled by kprobe_multi_override in selftests.
- test_probe_write_user: tests the probe_write_user functionality; now replaced by the probe_user test in selftests.
- test_cgrp2_sock: tests cgroup BPF's ability to set sk_bound_dev_if, mark, and priority during socket creation. Migrated to selftests as 'sock_create', since no existing test fully covers this.
- test_cgrp2_sock2: tests blocking socket creation for specific types (AF_INET{6}, SOCK_DGRAM, IPPROTO_ICMP{V6}). Migrated into the selftests 'sock_create' test for coverage.
- test_current_task_under_cgroup: tests bpf_current_task_under_cgroup() to check whether a task belongs to a cgroup. Already covered by task_under_cgroup in selftests and by other cgroup ID tests.
- test_cgrp2_tc: tests bpf_skb_under_cgroup() to filter packets based on cgroup. This behavior is now validated by cgroup_skb_sk_lookup, which uses bpf_skb_cgroup_id, making this test redundant.

[1]: https://patchwork.kernel.org/cover/13759916

Daniel T. Lee (4):
  selftests/bpf: migrate cgroup sock create test for setting iface/mark/prio
  selftests/bpf: migrate cgroup sock create tests for prohibitig sockets
  samples/bpf: remove obsolete cgroup related tests
  samples/bpf: remove obsolete tracing related tests

---
Changes in v2:
- commit message fix

 samples/bpf/Makefile | 24 --
 samples/bpf/sock_flags.bpf.c | 47 ---
 samples/bpf/test_cgrp2_array_pin.c | 106 ------
 samples/bpf/test_cgrp2_attach.c | 177 ----------
 samples/bpf/test_cgrp2_sock.c | 296 ----------------
 samples/bpf/test_cgrp2_sock.sh | 137 -------
 samples/bpf/test_cgrp2_sock2.c | 95 -----
 samples/bpf/test_cgrp2_sock2.sh | 103 ------
 samples/bpf/test_cgrp2_tc.bpf.c | 56 ---
 samples/bpf/test_cgrp2_tc.sh | 187 ----------
 .../bpf/test_current_task_under_cgroup.bpf.c | 43 ---
 .../bpf/test_current_task_under_cgroup_user.c | 115 ------
 samples/bpf/test_overhead_kprobe.bpf.c | 41 ---
 samples/bpf/test_overhead_raw_tp.bpf.c | 17 -
 samples/bpf/test_overhead_tp.bpf.c | 23 --
 samples/bpf/test_overhead_user.c | 225 ------------
 samples/bpf/test_override_return.sh | 16 -
 samples/bpf/test_probe_write_user.bpf.c | 52 ---
 samples/bpf/test_probe_write_user_user.c | 108 ------
 samples/bpf/tracex7.bpf.c | 15 -
 samples/bpf/tracex7_user.c | 56 ---
 .../selftests/bpf/prog_tests/sock_create.c | 333 ++++++++++++++++++
 22 files changed, 333 insertions(+), 1939 deletions(-)
 delete mode 100644 samples/bpf/sock_flags.bpf.c
 delete mode 100644 samples/bpf/test_cgrp2_array_pin.c
 delete mode 100644 samples/bpf/test_cgrp2_attach.c
 delete mode 100644 samples/bpf/test_cgrp2_sock.c
 delete mode 100755 samples/bpf/test_cgrp2_sock.sh
 delete mode 100644 samples/bpf/test_cgrp2_sock2.c
 delete mode 100755 samples/bpf/test_cgrp2_sock2.sh
 delete mode 100644 samples/bpf/test_cgrp2_tc.bpf.c
 delete mode 100755 samples/bpf/test_cgrp2_tc.sh
 delete mode 100644 samples/bpf/test_current_task_under_cgroup.bpf.c
 delete mode 100644 samples/bpf/test_current_task_under_cgroup_user.c
 delete mode 100644 samples/bpf/test_overhead_kprobe.bpf.c
 delete mode 100644 samples/bpf/test_overhead_raw_tp.bpf.c
 delete mode 100644 samples/bpf/test_overhead_tp.bpf.c
 delete mode 100644 samples/bpf/test_overhead_user.c
 delete mode 100755 samples/bpf/test_override_return.sh
 delete mode 100644 samples/bpf/test_probe_write_user.bpf.c
 delete mode 100644 samples/bpf/test_probe_write_user_user.c
 delete mode 100644 samples/bpf/tracex7.bpf.c
 delete mode 100644 samples/bpf/tracex7_user.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/sock_create.c

-- 
2.43.0

1 year, 3 months

[bpf-next 0/4] selftests/bpf: migrate and remove cgroup/tracing related tests

by Daniel T. Lee

The BPF testing framework has evolved significantly over time. However, some legacy tests in the samples/bpf directory have not kept up with these changes. These outdated tests can cause confusion and increase maintenance effort.

This patchset migrates outdated cgroup and tracing-related tests from samples/bpf to selftests/bpf, keeping the BPF test suite current and efficient. Tests that are already covered by selftests/bpf are removed, while those not yet covered are migrated. This includes the cgroup sock create tests for setting socket attributes and for blocking socket creation, as well as the removal of redundant cgroup and tracing tests that have been replaced by newer tests.

This patchset covers the following cgroup/tracing tests:

- test_overhead: tests the overhead of BPF programs with task_rename; now covered by selftests and the benchmark tests (rename-*). [1]
- test_override_return: tests the return override functionality; now handled by kprobe_multi_override in selftests.
- test_probe_write_user: tests the probe_write_user functionality; now replaced by the probe_user test in selftests.
- test_cgrp2_sock: tests cgroup BPF's ability to set sk_bound_dev_if, mark, and priority during socket creation. Migrated to selftests as 'sock_create', since no existing test fully covers this.
- test_cgrp2_sock2: tests blocking socket creation for specific types (AF_INET{6}, SOCK_DGRAM, IPPROTO_ICMP{V6}). Migrated into the selftests 'sock_create' test for coverage.
- test_current_task_under_cgroup: tests bpf_current_task_under_cgroup() to check whether a task belongs to a cgroup. Already covered by task_under_cgroup in selftests and by other cgroup ID tests.
- test_cgrp2_tc: tests bpf_skb_under_cgroup() to filter packets based on cgroup. This behavior is now validated by cgroup_skb_sk_lookup, which uses bpf_skb_cgroup_id, making this test redundant.

Daniel T. Lee (4):
  selftests/bpf: migrate cgroup sock create test for setting iface/mark/prio
  selftests/bpf: migrate sock create tests for prohibitig sockets
  samples/bpf: remove obsolete cgroup related tests
  samples/bpf: remove obsolete tracing related tests

 samples/bpf/Makefile | 24 --
 samples/bpf/sock_flags.bpf.c | 47 ---
 samples/bpf/test_cgrp2_array_pin.c | 106 ------
 samples/bpf/test_cgrp2_attach.c | 177 ----------
 samples/bpf/test_cgrp2_sock.c | 296 ----------------
 samples/bpf/test_cgrp2_sock.sh | 137 -------
 samples/bpf/test_cgrp2_sock2.c | 95 -----
 samples/bpf/test_cgrp2_sock2.sh | 103 ------
 samples/bpf/test_cgrp2_tc.bpf.c | 56 ---
 samples/bpf/test_cgrp2_tc.sh | 187 ----------
 .../bpf/test_current_task_under_cgroup.bpf.c | 43 ---
 .../bpf/test_current_task_under_cgroup_user.c | 115 ------
 samples/bpf/test_overhead_kprobe.bpf.c | 41 ---
 samples/bpf/test_overhead_raw_tp.bpf.c | 17 -
 samples/bpf/test_overhead_tp.bpf.c | 23 --
 samples/bpf/test_overhead_user.c | 225 ------------
 samples/bpf/test_override_return.sh | 16 -
 samples/bpf/test_probe_write_user.bpf.c | 52 ---
 samples/bpf/test_probe_write_user_user.c | 108 ------
 samples/bpf/tracex7.bpf.c | 15 -
 samples/bpf/tracex7_user.c | 56 ---
 .../selftests/bpf/prog_tests/sock_create.c | 333 ++++++++++++++++++
 22 files changed, 333 insertions(+), 1939 deletions(-)
 delete mode 100644 samples/bpf/sock_flags.bpf.c
 delete mode 100644 samples/bpf/test_cgrp2_array_pin.c
 delete mode 100644 samples/bpf/test_cgrp2_attach.c
 delete mode 100644 samples/bpf/test_cgrp2_sock.c
 delete mode 100755 samples/bpf/test_cgrp2_sock.sh
 delete mode 100644 samples/bpf/test_cgrp2_sock2.c
 delete mode 100755 samples/bpf/test_cgrp2_sock2.sh
 delete mode 100644 samples/bpf/test_cgrp2_tc.bpf.c
 delete mode 100755 samples/bpf/test_cgrp2_tc.sh
 delete mode 100644 samples/bpf/test_current_task_under_cgroup.bpf.c
 delete mode 100644 samples/bpf/test_current_task_under_cgroup_user.c
 delete mode 100644 samples/bpf/test_overhead_kprobe.bpf.c
 delete mode 100644 samples/bpf/test_overhead_raw_tp.bpf.c
 delete mode 100644 samples/bpf/test_overhead_tp.bpf.c
 delete mode 100644 samples/bpf/test_overhead_user.c
 delete mode 100755 samples/bpf/test_override_return.sh
 delete mode 100644 samples/bpf/test_probe_write_user.bpf.c
 delete mode 100644 samples/bpf/test_probe_write_user_user.c
 delete mode 100644 samples/bpf/tracex7.bpf.c
 delete mode 100644 samples/bpf/tracex7_user.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/sock_create.c

-- 
2.43.0

1 year, 3 months

[PATCH] selftests/bpf: Removed redundant fd after close in bpf_prog_load_log_buf

by Zhu Jun

Remove unnecessary `fd = -1` assignments after closing file descriptors: fd is either reassigned by the next bpf_prog_load() call or not used again. This improves code readability and removes redundant operations.

Signed-off-by: Zhu Jun <zhujun2(a)cmss.chinamobile.com>
---
 tools/testing/selftests/bpf/prog_tests/log_buf.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/log_buf.c b/tools/testing/selftests/bpf/prog_tests/log_buf.c
index 27676a04d0b6..169ce689b97c 100644
--- a/tools/testing/selftests/bpf/prog_tests/log_buf.c
+++ b/tools/testing/selftests/bpf/prog_tests/log_buf.c
@@ -169,7 +169,6 @@ static void bpf_prog_load_log_buf(void)
 	ASSERT_GE(fd, 0, "good_fd1");
 	if (fd >= 0)
 		close(fd);
-	fd = -1;
 
 	/* log_level == 2 should always fill log_buf, even for good prog */
 	log_buf[0] = '\0';
@@ -180,7 +179,6 @@ static void bpf_prog_load_log_buf(void)
 	ASSERT_GE(fd, 0, "good_fd2");
 	if (fd >= 0)
 		close(fd);
-	fd = -1;
 
 	/* log_level == 0 should fill log_buf for bad prog */
 	log_buf[0] = '\0';
@@ -191,7 +189,6 @@ static void bpf_prog_load_log_buf(void)
 	ASSERT_LT(fd, 0, "bad_fd");
 	if (fd >= 0)
 		close(fd);
-	fd = -1;
 
 	free(log_buf);
 }
-- 
2.17.1
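Whether `fd = -1` after `close()` is redundant depends on whether the variable is read again before its next assignment. The sketch below contrasts the two situations, assuming hypothetical helpers: `fake_load()` stands in for bpf_prog_load() (it simply opens /dev/null or fails), and `close_fd()` is an illustrative close-and-reset helper, not something the selftests provide.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: close *fd if it is open, then mark it closed. */
static void close_fd(int *fd)
{
	assert(fd != NULL);
	if (*fd >= 0) {
		close(*fd);
		*fd = -1;
	}
}

/* Hypothetical stand-in for bpf_prog_load(): new fd, or -1 on failure. */
static int fake_load(int ok)
{
	return ok ? open("/dev/null", O_RDONLY) : -1;
}

/* Pattern from the patch: each close() is followed either by a fresh
 * assignment or by no further use of fd, so a reset would be a dead
 * store. Returns the number of failed loads. */
static int load_three(void)
{
	int fd, failures = 0;

	fd = fake_load(1);
	if (fd >= 0)
		close(fd);	/* the next statement overwrites fd */

	fd = fake_load(1);
	if (fd >= 0)
		close(fd);

	fd = fake_load(0);
	if (fd < 0)
		failures++;
	if (fd >= 0)
		close(fd);	/* fd is not read after this point */

	return failures;
}

/* Contrast: a shared cleanup label reads fd again after an early
 * close, so here the reset is what prevents closing a stale fd. */
static int with_cleanup(int fail_early)
{
	int fd = fake_load(1);
	int err = 0;

	close_fd(&fd);		/* resets fd to -1 */
	if (fail_early) {
		err = -1;
		goto out;	/* cleanup reads fd again */
	}
	fd = fake_load(1);
out:
	close_fd(&fd);		/* no-op when fd == -1 */
	return err;
}
```

In `load_three()` the reset adds nothing, which is the situation the patch cleans up; in `with_cleanup()` dropping the reset would make the `out:` path close a descriptor that was already closed.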

1 year, 3 months

[PATCH bpf v1] selftests/bpf: Fix cross-compiling urandom_read

by Tony Ambardar

Linking of urandom_read and liburandom_read.so prefers LLVM's 'ld.lld' but falls back to using 'ld' if unsupported. However, this fallback discards any existing makefile macro for LD and can break cross-compilation.

Fix by changing the fallback to use the target linker $(LD), passed via '-fuse-ld=' using an absolute path rather than a linker "flavour".

Fixes: 08c79c9cd67f ("selftests/bpf: Don't force lld on non-x86 architectures")
Signed-off-by: Tony Ambardar <tony.ambardar(a)gmail.com>
---
 tools/testing/selftests/bpf/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 5e366f2fc02a..f2a0f912e038 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -263,7 +263,7 @@ $(OUTPUT)/%:%.c
 ifeq ($(SRCARCH),$(filter $(SRCARCH),x86 riscv))
 LLD := lld
 else
-LLD := ld
+LLD := $(shell command -v $(LD))
 endif
 
 # Filter out -static for liburandom_read.so and its dependent targets so that static builds
-- 
2.34.1
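The fix relies on `$(shell command -v $(LD))` expanding to the absolute path of the configured linker, which clang then receives through `-fuse-ld=`. As a rough approximation of that lookup, the sketch below resolves a tool name against a PATH-style string and builds the resulting flag. `resolve_tool()` and `make_fuse_ld()` are hypothetical names, and the search is simplified: it only checks execute permission and skips empty PATH entries, unlike a full shell `command -v`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper approximating `command -v NAME` for an external
 * executable: a name containing '/' is used as-is; otherwise each
 * non-empty directory in path_env is searched. Returns 0 and fills
 * out on success, -1 if nothing executable was found. */
static int resolve_tool(const char *name, const char *path_env,
			char *out, size_t outlen)
{
	assert(name && path_env && out && outlen > 0);

	if (strchr(name, '/')) {
		if (access(name, X_OK) != 0)
			return -1;
		return snprintf(out, outlen, "%s", name) < (int)outlen ? 0 : -1;
	}

	const char *dir = path_env;
	while (*dir) {
		const char *end = strchr(dir, ':');
		size_t len = end ? (size_t)(end - dir) : strlen(dir);

		if (len > 0) {
			int n = snprintf(out, outlen, "%.*s/%s", (int)len, dir, name);
			if (n > 0 && (size_t)n < outlen && access(out, X_OK) == 0)
				return 0;
		}
		if (!end)
			break;
		dir = end + 1;
	}
	out[0] = '\0';
	return -1;
}

/* Build the compiler driver flag from an already-resolved linker path. */
static int make_fuse_ld(const char *ld_path, char *out, size_t outlen)
{
	int n = snprintf(out, outlen, "-fuse-ld=%s", ld_path);

	return (n > 0 && (size_t)n < outlen) ? 0 : -1;
}
```

Passing an absolute path keeps the linker choice tied to whatever $(LD) names (for example a cross toolchain's prefixed ld), instead of letting the driver pick a host linker by flavour.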

1 year, 3 months

[PATCH bpf v1 1/2] bpf: fix unpopulated path_size when uprobe_multi fields unset

by Tyrone Wu

Previously, when retrieving `bpf_link_info.uprobe_multi` with the `path` and `path_size` fields unset, the `path_size` field was not populated (it remained 0). This behavior was inconsistent with how other input/output string buffer fields work, as the field should be populated in both of these cases:

- both buffer and length are set (currently works as expected)
- both buffer and length are unset (not working as expected)

This patch now fills the `path_size` field when `path` and `path_size` are unset.

Fixes: e56fdbfb06e2 ("bpf: Add link_info support for uprobe multi link")
Signed-off-by: Tyrone Wu <wudevelops(a)gmail.com>
---
 kernel/trace/bpf_trace.c | 35 +++++++++++++++++------------------
 1 file changed, 17 insertions(+), 18 deletions(-)

diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index a582cd25ca87..ba34e4f3fa8f 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -3133,7 +3133,8 @@ static int bpf_uprobe_multi_link_fill_link_info(const struct bpf_link *link,
 	struct bpf_uprobe_multi_link *umulti_link;
 	u32 ucount = info->uprobe_multi.count;
 	int err = 0, i;
-	long left;
+	char *p, *buf;
+	long left = 0;
 
 	if (!upath ^ !upath_size)
 		return -EINVAL;
@@ -3147,26 +3148,24 @@ static int bpf_uprobe_multi_link_fill_link_info(const struct bpf_link *link,
 	info->uprobe_multi.pid = umulti_link->task ?
 				 task_pid_nr_ns(umulti_link->task, task_active_pid_ns(current)) : 0;
 
-	if (upath) {
-		char *p, *buf;
-
-		upath_size = min_t(u32, upath_size, PATH_MAX);
+	upath_size = upath_size ? min_t(u32, upath_size, PATH_MAX) : PATH_MAX;
+	buf = kmalloc(upath_size, GFP_KERNEL);
+	if (!buf)
+		return -ENOMEM;
+	p = d_path(&umulti_link->path, buf, upath_size);
+	if (IS_ERR(p)) {
+		kfree(buf);
+		return PTR_ERR(p);
+	}
+	upath_size = buf + upath_size - p;
 
-		buf = kmalloc(upath_size, GFP_KERNEL);
-		if (!buf)
-			return -ENOMEM;
-		p = d_path(&umulti_link->path, buf, upath_size);
-		if (IS_ERR(p)) {
-			kfree(buf);
-			return PTR_ERR(p);
-		}
-		upath_size = buf + upath_size - p;
+	if (upath) {
 		left = copy_to_user(upath, p, upath_size);
-		kfree(buf);
-		if (left)
-			return -EFAULT;
-		info->uprobe_multi.path_size = upath_size;
 	}
+	kfree(buf);
+	if (left)
+		return -EFAULT;
+	info->uprobe_multi.path_size = upath_size;
 
 	if (!uoffsets && !ucookies && !uref_ctr_offsets)
 		return 0;
-- 
2.43.0
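With this change the field follows the usual two-call convention for input/output string buffers: a caller can pass a NULL buffer and zero length to learn the required size, then allocate and call again. The sketch below models that contract in userspace under stated assumptions: `fill_path()`, `query_twice()` and `MODEL_PATH_MAX` are hypothetical, the `-ENAMETOOLONG` branch stands in for d_path() failing on a too-small buffer, and the reported size includes the trailing NUL, as `buf + upath_size - p` does in the diff.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define MODEL_PATH_MAX 4096	/* stands in for the kernel's PATH_MAX */

/* Hypothetical userspace model of the patched fill logic (not kernel
 * code). On success *usize holds the path length including the NUL,
 * whether or not a buffer was supplied. */
static int fill_path(const char *path, char *ubuf, uint32_t *usize)
{
	assert(path && usize);

	/* Buffer and length must be both set or both unset. */
	if (!ubuf ^ !*usize)
		return -EINVAL;

	uint32_t cap = *usize ? (*usize < MODEL_PATH_MAX ? *usize : MODEL_PATH_MAX)
			      : MODEL_PATH_MAX;
	size_t need = strlen(path) + 1;

	if (need > cap)
		return -ENAMETOOLONG;	/* models d_path() running out of room */
	if (ubuf)
		memcpy(ubuf, path, need);
	*usize = (uint32_t)need;	/* now reported in both valid cases */
	return 0;
}

/* Caller side: probe for the size first, then fetch into a buffer. */
static int query_twice(const char *path, char *buf, uint32_t buflen,
		       uint32_t *needed)
{
	uint32_t size = 0;
	int err = fill_path(path, NULL, &size);	/* size-only query */

	if (err)
		return err;
	*needed = size;
	if (size > buflen)
		return -ENOSPC;
	return fill_path(path, buf, &size);
}
```

Before the patch, the size-only query in this model would have left the size at 0, so a caller had no way to size its buffer without guessing.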

1 year, 3 months

kselftest/fixes kselftest-seccomp: 2 runs, 1 regressions (linux_kselftest-fixes-6.12-rc2-6-ge26e42b5679ed)

by kernelci.org bot

kselftest/fixes kselftest-seccomp: 2 runs, 1 regressions (linux_kselftest-fixes-6.12-rc2-6-ge26e42b5679ed)

Regressions Summary
-------------------

platform                     | arch  | lab         | compiler | defconfig           | regressions
-----------------------------+-------+-------------+----------+---------------------+------------
meson-gxl-s905x-libretech-cc | arm64 | lab-broonie | gcc-12   | defconfig+kselftest | 1

  Details:  https://kernelci.org/test/job/kselftest/branch/fixes/kernel/linux_kselftest…

  Test:     kselftest-seccomp
  Tree:     kselftest
  Branch:   fixes
  Describe: linux_kselftest-fixes-6.12-rc2-6-ge26e42b5679ed
  URL:      https://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest.git
  SHA:      e26e42b5679edf8c1226970325366f962555e58f

Test Regressions
----------------

platform                     | arch  | lab         | compiler | defconfig           | regressions
-----------------------------+-------+-------------+----------+---------------------+------------
meson-gxl-s905x-libretech-cc | arm64 | lab-broonie | gcc-12   | defconfig+kselftest | 1

  Details:     https://kernelci.org/test/plan/id/670866b6913f5044d0c86896

  Results:     0 PASS, 1 FAIL, 0 SKIP
  Full config: defconfig+kselftest
  Compiler:    gcc-12 (aarch64-linux-gnu-gcc (Debian 12.2.0-14) 12.2.0)

  Plain log:   https://storage.kernelci.org//kselftest/fixes/linux_kselftest-fixes-6.12-rc…
  HTML log:    https://storage.kernelci.org//kselftest/fixes/linux_kselftest-fixes-6.12-rc…
  Rootfs:      http://storage.kernelci.org/images/rootfs/debian/bookworm-kselftest/2024031…

  * kselftest-seccomp.login: https://kernelci.org/test/case/id/670866b6913f5044d0c86897
      failing since 1 day (last pass: v6.12-rc1-5-g45a8897db67d4, first fail: linux_kselftest-fixes-6.12-rc2-4-g34d5b600172b)

1 year, 3 months
