Linux-kselftest-mirror

linux-kselftest-mirror@lists.linaro.org

141 participants
14285 discussions

[PATCH RFT v4 0/5] fork: Support shadow stacks in clone3()

by Mark Brown

The kernel has recently added support for shadow stacks, currently x86 only using their CET feature but both arm64 and RISC-V have equivalent features (GCS and Zicfiss respectively), I am actively working on GCS[1]. With shadow stacks the hardware maintains an additional stack containing only the return addresses for branch instructions which is not generally writeable by userspace and ensures that any returns are to the recorded addresses. This provides some protection against ROP attacks and making it easier to collect call stacks. These shadow stacks are allocated in the address space of the userspace process. Our API for shadow stacks does not currently offer userspace any flexiblity for managing the allocation of shadow stacks for newly created threads, instead the kernel allocates a new shadow stack with the same size as the normal stack whenever a thread is created with the feature enabled. The stacks allocated in this way are freed by the kernel when the thread exits or shadow stacks are disabled for the thread. This lack of flexibility and control isn't ideal, in the vast majority of cases the shadow stack will be over allocated and the implicit allocation and deallocation is not consistent with other interfaces. As far as I can tell the interface is done in this manner mainly because the shadow stack patches were in development since before clone3() was implemented. Since clone3() is readily extensible let's add support for specifying a shadow stack when creating a new thread or process in a similar manner to how the normal stack is specified, keeping the current implicit allocation behaviour if one is not specified either with clone3() or through the use of clone(). Unlike normal stacks only the shadow stack size is specified, similar issues to those that lead to the creation of map_shadow_stack() apply. Please note that the x86 portions of this code are build tested only, I don't appear to have a system that can run CET avaible to me, I have done testing with an integration into my pending work for GCS. There is some possibility that the arm64 implementation may require the use of clone3() and explicit userspace allocation of shadow stacks, this is still under discussion. A new architecture feature Kconfig option for shadow stacks is added as here, this was suggested as part of the review comments for the arm64 GCS series and since we need to detect if shadow stacks are supported it seemed sensible to roll it in here. [1] https://lore.kernel.org/r/20231009-arm64-gcs-v6-0-78e55deaa4dd@kernel.org/ Signed-off-by: Mark Brown <broonie(a)kernel.org> --- Changes in v4: - Formatting changes. - Use a define for minimum shadow stack size and move some basic validation to fork.c. - Link to v3: https://lore.kernel.org/r/20231120-clone3-shadow-stack-v3-0-a7b8ed3e2acc@ke… Changes in v3: - Rebase onto v6.7-rc2. - Remove stale shadow_stack in internal kargs. - If a shadow stack is specified unconditionally use it regardless of CLONE_ parameters. - Force enable shadow stacks in the selftest. - Update changelogs for RISC-V feature rename. - Link to v2: https://lore.kernel.org/r/20231114-clone3-shadow-stack-v2-0-b613f8681155@ke… Changes in v2: - Rebase onto v6.7-rc1. - Remove ability to provide preallocated shadow stack, just specify the desired size. - Link to v1: https://lore.kernel.org/r/20231023-clone3-shadow-stack-v1-0-d867d0b5d4d0@ke… --- Mark Brown (5): mm: Introduce ARCH_HAS_USER_SHADOW_STACK fork: Add shadow stack support to clone3() selftests/clone3: Factor more of main loop into test_clone3() selftests/clone3: Allow tests to flag if -E2BIG is a valid error code kselftest/clone3: Test shadow stack support arch/x86/Kconfig | 1 + arch/x86/include/asm/shstk.h | 11 +- arch/x86/kernel/process.c | 2 +- arch/x86/kernel/shstk.c | 56 ++++-- fs/proc/task_mmu.c | 2 +- include/linux/mm.h | 2 +- include/linux/sched/task.h | 1 + include/uapi/linux/sched.h | 4 + kernel/fork.c | 53 ++++-- mm/Kconfig | 6 + tools/testing/selftests/clone3/clone3.c | 200 +++++++++++++++++----- tools/testing/selftests/clone3/clone3_selftests.h | 7 + 12 files changed, 268 insertions(+), 77 deletions(-) --- base-commit: 98b1cc82c4affc16f5598d4fa14b1858671b2263 change-id: 20231019-clone3-shadow-stack-15d40d2bf536 Best regards, -- Mark Brown <broonie(a)kernel.org>

2 years, 1 month

[PATCH v1 00/23] Enable FRED with KVM VMX

by Xin Li

This patch set enables the Intel flexible return and event delivery (FRED) architecture with KVM VMX to allow guests to utilize FRED. The FRED architecture defines simple new transitions that change privilege level (ring transitions). The FRED architecture was designed with the following goals: 1) Improve overall performance and response time by replacing event delivery through the interrupt descriptor table (IDT event delivery) and event return by the IRET instruction with lower latency transitions. 2) Improve software robustness by ensuring that event delivery establishes the full supervisor context and that event return establishes the full user context. The new transitions defined by the FRED architecture are FRED event delivery and, for returning from events, two FRED return instructions. FRED event delivery can effect a transition from ring 3 to ring 0, but it is used also to deliver events incident to ring 0. One FRED instruction (ERETU) effects a return from ring 0 to ring 3, while the other (ERETS) returns while remaining in ring 0. Collectively, FRED event delivery and the FRED return instructions are FRED transitions. Intel VMX architecture is extended to run FRED guests, and the changes are majorly: 1) New VMCS fields for FRED context management, which includes two new event data VMCS fields, eight new guest FRED context VMCS fields and eight new host FRED context VMCS fields. 2) VMX nested-Exception support for proper virtualization of stack levels introduced with FRED architecture. Search for the latest FRED spec in most search engines with this search pattern: site:intel.com FRED (flexible return and event delivery) specification We want to send out the FRED VMX patch set for review while the FRED native patch set v12 is being reviewed @ https://lkml.kernel.org/kvm/20231003062458.23552-1-xin3.li@intel.com/. For easier review, I have set up a base tree with the latest FRED native patch set on top of tip tree in the 'fred_v12' branch of repo https://github.com/xinli-intel/linux-fred-public.git. Patch 1-2 are cleanups to VMX basic and misc MSRs, which were sent out earlier as a preparation for FRED changes: https://lore.kernel.org/kvm/20231030233940.438233-1-xin@zytor.com/. Patch 3-14 add FRED support to VMX. Patch 15-18 add FRED support to nested VMX. Patch 19 exposes FRED to KVM guests to complete the enabling. Patch 20-23 adds FRED selftests. Shan Kang (1): KVM: selftests: Add fred exception tests Xin Li (22): KVM: VMX: Cleanup VMX basic information defines and usages KVM: VMX: Cleanup VMX misc information defines and usages KVM: VMX: Add support for the secondary VM exit controls KVM: x86: Mark CR4.FRED as not reserved KVM: VMX: Initialize FRED VM entry/exit controls in vmcs_config KVM: VMX: Defer enabling FRED MSRs save/load until after set CPUID KVM: VMX: Disable intercepting FRED MSRs KVM: VMX: Initialize VMCS FRED fields KVM: VMX: Switch FRED RSP0 between host and guest KVM: VMX: Add support for FRED context save/restore KVM: x86: Add kvm_is_fred_enabled() KVM: VMX: Handle FRED event data KVM: VMX: Handle VMX nested exception for FRED KVM: VMX: Dump FRED context in dump_vmcs() KVM: nVMX: Add support for the secondary VM exit controls KVM: nVMX: Add FRED VMCS fields KVM: nVMX: Add support for VMX FRED controls KVM: nVMX: Add VMCS FRED states checking KVM: x86: Allow FRED/LKGS/WRMSRNS to be exposed to guests KVM: selftests: Add FRED VMCS fields to evmcs KVM: selftests: Run debug_regs test with FRED enabled KVM: selftests: Add a new VM guest mode to run user level code Documentation/virt/kvm/x86/nested-vmx.rst | 19 + arch/x86/include/asm/hyperv-tlfs.h | 19 + arch/x86/include/asm/kvm_host.h | 9 +- arch/x86/include/asm/msr-index.h | 15 +- arch/x86/include/asm/vmx.h | 57 ++- arch/x86/kvm/cpuid.c | 4 +- arch/x86/kvm/kvm_cache_regs.h | 10 + arch/x86/kvm/svm/svm.c | 4 +- arch/x86/kvm/vmx/capabilities.h | 20 +- arch/x86/kvm/vmx/hyperv.c | 61 ++- arch/x86/kvm/vmx/nested.c | 315 ++++++++++++-- arch/x86/kvm/vmx/nested.h | 2 +- arch/x86/kvm/vmx/vmcs.h | 1 + arch/x86/kvm/vmx/vmcs12.c | 19 + arch/x86/kvm/vmx/vmcs12.h | 38 ++ arch/x86/kvm/vmx/vmcs_shadow_fields.h | 6 +- arch/x86/kvm/vmx/vmx.c | 404 ++++++++++++++++-- arch/x86/kvm/vmx/vmx.h | 14 +- arch/x86/kvm/x86.c | 55 ++- arch/x86/kvm/x86.h | 5 +- tools/testing/selftests/kvm/Makefile | 1 + .../selftests/kvm/include/kvm_util_base.h | 1 + .../selftests/kvm/include/x86_64/evmcs.h | 146 +++++++ .../selftests/kvm/include/x86_64/processor.h | 33 ++ .../selftests/kvm/include/x86_64/vmx.h | 20 + tools/testing/selftests/kvm/lib/kvm_util.c | 5 +- .../selftests/kvm/lib/x86_64/processor.c | 15 +- tools/testing/selftests/kvm/lib/x86_64/vmx.c | 4 +- .../testing/selftests/kvm/x86_64/debug_regs.c | 50 ++- .../testing/selftests/kvm/x86_64/fred_test.c | 262 ++++++++++++ 30 files changed, 1464 insertions(+), 150 deletions(-) create mode 100644 tools/testing/selftests/kvm/x86_64/fred_test.c base-commit: d49b86c24e836941c85c4906e9519fca9426a6e0 -- 2.42.0

2 years, 1 month

[RFC PATCH v3 00/12] Device Memory TCP

by Mina Almasry

Changes in RFC v3: ------------------ 1. Pulled in the memory-provider dependency from Jakub's RFC[1] to make the series reviewable and mergable. 2. Implemented multi-rx-queue binding which was a todo in v2. 3. Fix to cmsg handling. The sticking point in RFC v2[2] was the device reset required to refill the device rx-queues after the dmabuf bind/unbind. The solution suggested as I understand is a subset of the per-queue management ops Jakub suggested or similar: https://lore.kernel.org/netdev/20230815171638.4c057dcd@kernel.org/ This is not addressed in this revision, because: 1. This point was discussed at netconf & netdev and there is openness to using the current approach of requiring a device reset. 2. Implementing individual queue resetting seems to be difficult for my test bed with GVE. My prototype to test this ran into issues with the rx-queues not coming back up properly if reset individually. At the moment I'm unsure if it's a mistake in the POC or a genuine issue in the virtualization stack behind GVE, which currently doesn't test individual rx-queue restart. 3. Our usecases are not bothered by requiring a device reset to refill the buffer queues, and we'd like to support NICs that run into this limitation with resetting individual queues. My thought is that drivers that have trouble with per-queue configs can use the support in this series, while drivers that support new netdev ops to reset individual queues can automatically reset the queue as part of the dma-buf bind/unbind. The same approach with device resets is presented again for consideration with other sticking points addressed. This proposal includes the rx devmem path only proposed for merge. For a snapshot of my entire tree which includes the GVE POC page pool support & device memory support: https://github.com/torvalds/linux/compare/master...mina:linux:tcpdevmem-v3 [1] https://lore.kernel.org/netdev/f8270765-a27b-6ccf-33ea-cda097168d79@redhat.… [2] https://lore.kernel.org/netdev/CAHS8izOVJGJH5WF68OsRWFKJid1_huzzUK+hpKbLcL4… Cc: Shakeel Butt <shakeelb(a)google.com> Cc: Jeroen de Borst <jeroendb(a)google.com> Cc: Praveen Kaligineedi <pkaligineedi(a)google.com> Changes in RFC v2: ------------------ The sticking point in RFC v1[1] was the dma-buf pages approach we used to deliver the device memory to the TCP stack. RFC v2 is a proof-of-concept that attempts to resolve this by implementing scatterlist support in the networking stack, such that we can import the dma-buf scatterlist directly. This is the approach proposed at a high level here[2]. Detailed changes: 1. Replaced dma-buf pages approach with importing scatterlist into the page pool. 2. Replace the dma-buf pages centric API with a netlink API. 3. Removed the TX path implementation - there is no issue with implementing the TX path with scatterlist approach, but leaving out the TX path makes it easier to review. 4. Functionality is tested with this proposal, but I have not conducted perf testing yet. I'm not sure there are regressions, but I removed perf claims from the cover letter until they can be re-confirmed. 5. Added Signed-off-by: contributors to the implementation. 6. Fixed some bugs with the RX path since RFC v1. Any feedback welcome, but specifically the biggest pending questions needing feedback IMO are: 1. Feedback on the scatterlist-based approach in general. 2. Netlink API (Patch 1 & 2). 3. Approach to handle all the drivers that expect to receive pages from the page pool (Patch 6). [1] https://lore.kernel.org/netdev/dfe4bae7-13a0-3c5d-d671-f61b375cb0b4@gmail.c… [2] https://lore.kernel.org/netdev/CAHS8izPm6XRS54LdCDZVd0C75tA1zHSu6jLVO8nzTLX… ---------------------- * TL;DR: Device memory TCP (devmem TCP) is a proposal for transferring data to and/or from device memory efficiently, without bouncing the data to a host memory buffer. * Problem: A large amount of data transfers have device memory as the source and/or destination. Accelerators drastically increased the volume of such transfers. Some examples include: - ML accelerators transferring large amounts of training data from storage into GPU/TPU memory. In some cases ML training setup time can be as long as 50% of TPU compute time, improving data transfer throughput & efficiency can help improving GPU/TPU utilization. - Distributed training, where ML accelerators, such as GPUs on different hosts, exchange data among them. - Distributed raw block storage applications transfer large amounts of data with remote SSDs, much of this data does not require host processing. Today, the majority of the Device-to-Device data transfers the network are implemented as the following low level operations: Device-to-Host copy, Host-to-Host network transfer, and Host-to-Device copy. The implementation is suboptimal, especially for bulk data transfers, and can put significant strains on system resources, such as host memory bandwidth, PCIe bandwidth, etc. One important reason behind the current state is the kernel’s lack of semantics to express device to network transfers. * Proposal: In this patch series we attempt to optimize this use case by implementing socket APIs that enable the user to: 1. send device memory across the network directly, and 2. receive incoming network packets directly into device memory. Packet _payloads_ go directly from the NIC to device memory for receive and from device memory to NIC for transmit. Packet _headers_ go to/from host memory and are processed by the TCP/IP stack normally. The NIC _must_ support header split to achieve this. Advantages: - Alleviate host memory bandwidth pressure, compared to existing network-transfer + device-copy semantics. - Alleviate PCIe BW pressure, by limiting data transfer to the lowest level of the PCIe tree, compared to traditional path which sends data through the root complex. * Patch overview: ** Part 1: netlink API Gives user ability to bind dma-buf to an RX queue. ** Part 2: scatterlist support Currently the standard for device memory sharing is DMABUF, which doesn't generate struct pages. On the other hand, networking stack (skbs, drivers, and page pool) operate on pages. We have 2 options: 1. Generate struct pages for dmabuf device memory, or, 2. Modify the networking stack to process scatterlist. Approach #1 was attempted in RFC v1. RFC v2 implements approach #2. ** part 3: page pool support We piggy back on page pool memory providers proposal: https://github.com/kuba-moo/linux/tree/pp-providers It allows the page pool to define a memory provider that provides the page allocation and freeing. It helps abstract most of the device memory TCP changes from the driver. ** part 4: support for unreadable skb frags Page pool iovs are not accessible by the host; we implement changes throughput the networking stack to correctly handle skbs with unreadable frags. ** Part 5: recvmsg() APIs We define user APIs for the user to send and receive device memory. Not included with this RFC is the GVE devmem TCP support, just to simplify the review. Code available here if desired: https://github.com/mina/linux/tree/tcpdevmem This RFC is built on top of net-next with Jakub's pp-providers changes cherry-picked. * NIC dependencies: 1. (strict) Devmem TCP require the NIC to support header split, i.e. the capability to split incoming packets into a header + payload and to put each into a separate buffer. Devmem TCP works by using device memory for the packet payload, and host memory for the packet headers. 2. (optional) Devmem TCP works better with flow steering support & RSS support, i.e. the NIC's ability to steer flows into certain rx queues. This allows the sysadmin to enable devmem TCP on a subset of the rx queues, and steer devmem TCP traffic onto these queues and non devmem TCP elsewhere. The NIC I have access to with these properties is the GVE with DQO support running in Google Cloud, but any NIC that supports these features would suffice. I may be able to help reviewers bring up devmem TCP on their NICs. * Testing: The series includes a udmabuf kselftest that show a simple use case of devmem TCP and validates the entire data path end to end without a dependency on a specific dmabuf provider. ** Test Setup Kernel: net-next with this RFC and memory provider API cherry-picked locally. Hardware: Google Cloud A3 VMs. NIC: GVE with header split & RSS & flow steering support. Jakub Kicinski (2): net: page_pool: factor out releasing DMA from releasing the page net: page_pool: create hooks for custom page providers Mina Almasry (10): net: netdev netlink api to bind dma-buf to a net device netdev: support binding dma-buf to netdevice netdev: netdevice devmem allocator memory-provider: dmabuf devmem memory provider page-pool: device memory support net: support non paged skb frags net: add support for skbs with unreadable frags tcp: RX path for devmem TCP net: add SO_DEVMEM_DONTNEED setsockopt to release RX pages selftests: add ncdevmem, netcat for devmem TCP Documentation/netlink/specs/netdev.yaml | 28 ++ include/linux/netdevice.h | 93 ++++ include/linux/skbuff.h | 56 ++- include/linux/socket.h | 1 + include/net/netdev_rx_queue.h | 1 + include/net/page_pool/helpers.h | 151 ++++++- include/net/page_pool/types.h | 55 +++ include/net/sock.h | 2 + include/net/tcp.h | 5 +- include/uapi/asm-generic/socket.h | 6 + include/uapi/linux/netdev.h | 10 + include/uapi/linux/uio.h | 10 + net/core/datagram.c | 6 + net/core/dev.c | 240 +++++++++++ net/core/gro.c | 7 +- net/core/netdev-genl-gen.c | 14 + net/core/netdev-genl-gen.h | 1 + net/core/netdev-genl.c | 118 +++++ net/core/page_pool.c | 209 +++++++-- net/core/skbuff.c | 80 +++- net/core/sock.c | 36 ++ net/ipv4/tcp.c | 205 ++++++++- net/ipv4/tcp_input.c | 13 +- net/ipv4/tcp_ipv4.c | 7 + net/ipv4/tcp_output.c | 5 +- net/packet/af_packet.c | 4 +- tools/include/uapi/linux/netdev.h | 10 + tools/net/ynl/generated/netdev-user.c | 42 ++ tools/net/ynl/generated/netdev-user.h | 47 ++ tools/testing/selftests/net/.gitignore | 1 + tools/testing/selftests/net/Makefile | 5 + tools/testing/selftests/net/ncdevmem.c | 546 ++++++++++++++++++++++++ 32 files changed, 1950 insertions(+), 64 deletions(-) create mode 100644 tools/testing/selftests/net/ncdevmem.c -- 2.42.0.869.gea05f2083d-goog

2 years, 1 month

[PATCH] kunit: test: Use an action wrapper instead of a cast

by David Gow

We missed one of the casts of kfree() to kunit_action_t in kunit-test, which was only enabled when debugfs was in use. This could potentially break CFI. Use the existing wrapper function instead. Signed-off-by: David Gow <davidgow(a)google.com> --- lib/kunit/kunit-test.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/kunit/kunit-test.c b/lib/kunit/kunit-test.c index 3e9c5192d095..ee6927c60979 100644 --- a/lib/kunit/kunit-test.c +++ b/lib/kunit/kunit-test.c @@ -559,7 +559,7 @@ static void kunit_log_test(struct kunit *test) KUNIT_EXPECT_TRUE(test, test->log->append_newlines); full_log = string_stream_get_string(test->log); - kunit_add_action(test, (kunit_action_t *)kfree, full_log); + kunit_add_action(test, kfree_wrapper, full_log); KUNIT_EXPECT_NOT_ERR_OR_NULL(test, strstr(full_log, "put this in log.")); KUNIT_EXPECT_NOT_ERR_OR_NULL(test, -- 2.43.0.rc2.451.g8631bc7472-goog

2 years, 1 month

[PATCH v3 1/2] kunit: tool: fix parsing of test attributes

by Rae Moar

Add parsing of attributes as diagnostic data. Fixes issue with test plan being parsed incorrectly as diagnostic data when located after suite-level attributes. Note that if there does not exist a test plan line, the diagnostic lines between the suite header and the first result will be saved in the suite log rather than the first test case log. Signed-off-by: Rae Moar <rmoar(a)google.com> --- tools/testing/kunit/kunit_parser.py | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/tools/testing/kunit/kunit_parser.py b/tools/testing/kunit/kunit_parser.py index 79d8832c862a..ce34be15c929 100644 --- a/tools/testing/kunit/kunit_parser.py +++ b/tools/testing/kunit/kunit_parser.py @@ -450,7 +450,7 @@ def parse_diagnostic(lines: LineStream) -> List[str]: Log of diagnostic lines """ log = [] # type: List[str] - non_diagnostic_lines = [TEST_RESULT, TEST_HEADER, KTAP_START, TAP_START] + non_diagnostic_lines = [TEST_RESULT, TEST_HEADER, KTAP_START, TAP_START, TEST_PLAN] while lines and not any(re.match(lines.peek()) for re in non_diagnostic_lines): log.append(lines.pop()) @@ -726,6 +726,7 @@ def parse_test(lines: LineStream, expected_num: int, log: List[str], is_subtest: # test plan test.name = "main" ktap_line = parse_ktap_header(lines, test) + test.log.extend(parse_diagnostic(lines)) parse_test_plan(lines, test) parent_test = True else: @@ -737,6 +738,7 @@ def parse_test(lines: LineStream, expected_num: int, log: List[str], is_subtest: if parent_test: # If KTAP version line and/or subtest header is found, attempt # to parse test plan and print test header + test.log.extend(parse_diagnostic(lines)) parse_test_plan(lines, test) print_test_header(test) expected_count = test.expected_count base-commit: b85ea95d086471afb4ad062012a4d73cd328fa86 -- 2.43.0.472.g3155946c3a-goog

2 years, 1 month

[PATCH] maple_tree: Fix mas_prev() state separation code

by Liam R. Howlett

mas_prev() was setting the ma_underflow condition when the limit was reached and not when the limit was attempting to go lower. This resulted in the incorrect behaviour on subsequent actions. This commit fixes the status setting to only set ma_underflow when the lower limit is attempted to be decremented, and modifies the testing to ensure that's the case. Signed-off-by: Liam R. Howlett <Liam.Howlett(a)oracle.com> --- Andrew, This should be clean to squash into 7f79fdb1d94d7 ("maple_tree: separate ma_state node from status.") which is currently in your mm-unstable branch. Thanks, Liam lib/maple_tree.c | 16 ++++++++++++---- lib/test_maple_tree.c | 9 +++++++-- 2 files changed, 19 insertions(+), 6 deletions(-) diff --git a/lib/maple_tree.c b/lib/maple_tree.c index 89f8d21602774..47f2a7a973852 100644 --- a/lib/maple_tree.c +++ b/lib/maple_tree.c @@ -4432,6 +4432,9 @@ static void *mas_prev_slot(struct ma_state *mas, unsigned long min, bool empty) mas->last = mas->index - 1; mas->index = mas_safe_min(mas, pivots, mas->offset); } else { + if (mas->index <= min) + goto underflow; + if (mas_prev_node(mas, min)) { mas_rewalk(mas, save_point); goto retry; @@ -4452,15 +4455,15 @@ static void *mas_prev_slot(struct ma_state *mas, unsigned long min, bool empty) if (unlikely(mas_rewalk_if_dead(mas, node, save_point))) goto retry; - if (mas->index <= min) - mas->status = ma_underflow; if (likely(entry)) return entry; if (!empty) { - if (mas_is_underflow(mas)) + if (mas->index <= min) { + mas->status = ma_underflow; return NULL; + } goto again; } @@ -4596,7 +4599,7 @@ static void *mas_next_slot(struct ma_state *mas, unsigned long max, bool empty) if (unlikely(mas_rewalk_if_dead(mas, node, save_point))) goto retry; - if (pivot >= max) { + if (pivot >= max) { /* Was at the limit, next will extend beyond */ mas->status = ma_overflow; return NULL; } @@ -4611,6 +4614,11 @@ static void *mas_next_slot(struct ma_state *mas, unsigned long max, bool empty) else mas->last = mas->max; } else { + if (mas->last >= max) { + mas->status = ma_overflow; + return NULL; + } + if (mas_next_node(mas, node, max)) { mas_rewalk(mas, save_point); goto retry; diff --git a/lib/test_maple_tree.c b/lib/test_maple_tree.c index 15fbeb788f3ac..29185ac5c727f 100644 --- a/lib/test_maple_tree.c +++ b/lib/test_maple_tree.c @@ -3294,8 +3294,8 @@ static noinline void __init check_state_handling(struct maple_tree *mt) MT_BUG_ON(mt, mas.last != 0x1500); MT_BUG_ON(mt, !mas_is_active(&mas)); - /* next:active ->active */ - entry = mas_next(&mas, ULONG_MAX); + /* next:active ->active (spanning limit) */ + entry = mas_next(&mas, 0x2100); MT_BUG_ON(mt, entry != ptr2); MT_BUG_ON(mt, mas.index != 0x2000); MT_BUG_ON(mt, mas.last != 0x2500); @@ -3360,6 +3360,11 @@ static noinline void __init check_state_handling(struct maple_tree *mt) MT_BUG_ON(mt, entry != ptr); MT_BUG_ON(mt, mas.index != 0x1000); MT_BUG_ON(mt, mas.last != 0x1500); + MT_BUG_ON(mt, !mas_is_active(&mas)); /* spanning limit */ + entry = mas_prev(&mas, 0x1200); /* underflow */ + MT_BUG_ON(mt, entry != NULL); + MT_BUG_ON(mt, mas.index != 0x1000); + MT_BUG_ON(mt, mas.last != 0x1500); MT_BUG_ON(mt, !mas_is_underflow(&mas)); /* prev:underflow -> underflow (lower limit) spanning end range */ -- 2.40.1

2 years, 1 month

[PATCH] kselftest: dt: Stop relying on dirname to improve performance

by Nícolas F. R. A. Prado

When walking directory trees, instead of looking for specific files and running dirname to get the parent folder, traverse all folders and ignore the ones not containing the desired files. This avoids the need to call dirname inside the loop, which gives a big performance boost, approximately halving run time: Running locally on a mt8192-asurada-spherion, which reports 160 test cases, has gone from 5.5s to 2.9s, while running remotely with an nfsroot has gone from 13.5s to 5.5s. This change has a side-effect, which is that the root DT node now also shows in the output, even though it isn't expected to bind to a driver. However there shouldn't be a matching driver for the board compatible, so the end result will be just an extra skipped test: ok 1 / # SKIP Reported-by: Mark Brown <broonie(a)kernel.org> Closes: https://lore.kernel.org/all/310391e8-fdf2-4c2f-a680-7744eb685177@sirena.org… Fixes: 14571ab1ad21 ("kselftest: Add new test for detecting unprobed Devicetree devices") Signed-off-by: Nícolas F. R. A. Prado <nfraprado(a)collabora.com> --- tools/testing/selftests/dt/test_unprobed_devices.sh | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/tools/testing/selftests/dt/test_unprobed_devices.sh b/tools/testing/selftests/dt/test_unprobed_devices.sh index b07af2a4c4de..7fae90293a9d 100755 --- a/tools/testing/selftests/dt/test_unprobed_devices.sh +++ b/tools/testing/selftests/dt/test_unprobed_devices.sh @@ -33,8 +33,8 @@ if [[ ! -d "${PDT}" ]]; then fi nodes_compatible=$( - for node_compat in $(find ${PDT} -name compatible); do - node=$(dirname "${node_compat}") + for node in $(find ${PDT} -type d); do + [ ! -f "${node}"/compatible ] && continue # Check if node is available if [[ -e "${node}"/status ]]; then status=$(tr -d '\000' < "${node}"/status) @@ -46,10 +46,11 @@ nodes_compatible=$( nodes_dev_bound=$( IFS=$'\n' - for uevent in $(find /sys/devices -name uevent); do - if [[ -d "$(dirname "${uevent}")"/driver ]]; then - grep '^OF_FULLNAME=' "${uevent}" | sed -e 's|OF_FULLNAME=||' - fi + for dev_dir in $(find /sys/devices -type d); do + [ ! -f "${dev_dir}"/uevent ] && continue + [ ! -d "${dev_dir}"/driver ] && continue + + grep '^OF_FULLNAME=' "${dev_dir}"/uevent | sed -e 's|OF_FULLNAME=||' done ) -- 2.43.0

2 years, 1 month

[PATCH v2 3/3] selftest/bpf: Test a perf bpf program that suppresses side effects.

by Kyle Huey

The test sets a hardware breakpoint and uses a bpf program to suppress the side effects of a perf event sample, including I/O availability signals, SIGTRAPs, and decrementing the event counter limit, if the ip matches the expected value. Then the function with the breakpoint is executed multiple times to test that all effects behave as expected. Signed-off-by: Kyle Huey <khuey(a)kylehuey.com> --- .../selftests/bpf/prog_tests/perf_skip.c | 145 ++++++++++++++++++ .../selftests/bpf/progs/test_perf_skip.c | 15 ++ 2 files changed, 160 insertions(+) create mode 100644 tools/testing/selftests/bpf/prog_tests/perf_skip.c create mode 100644 tools/testing/selftests/bpf/progs/test_perf_skip.c diff --git a/tools/testing/selftests/bpf/prog_tests/perf_skip.c b/tools/testing/selftests/bpf/prog_tests/perf_skip.c new file mode 100644 index 000000000000..f6fa9bfd9efa --- /dev/null +++ b/tools/testing/selftests/bpf/prog_tests/perf_skip.c @@ -0,0 +1,145 @@ +// SPDX-License-Identifier: GPL-2.0 +#define _GNU_SOURCE + +/* We need the latest siginfo from the kernel repo. */ +#include <asm/siginfo.h> +#define __have_siginfo_t 1 +#define __have_sigval_t 1 +#define __have_sigevent_t 1 +#define __siginfo_t_defined +#define __sigval_t_defined +#define __sigevent_t_defined +#define _BITS_SIGINFO_CONSTS_H 1 +#define _BITS_SIGEVENT_CONSTS_H 1 + +#include <test_progs.h> +#include "test_perf_skip.skel.h" +#include <linux/compiler.h> +#include <linux/hw_breakpoint.h> +#include <sys/mman.h> + +int signals_unexpected = 1; +int sigio_count, sigtrap_count; + +static void handle_sigio(int sig __always_unused) +{ + ASSERT_OK(signals_unexpected, "perf event not skipped"); + ++sigio_count; +} + +static void handle_sigtrap(int signum __always_unused, + siginfo_t *info, + void *ucontext __always_unused) +{ + ASSERT_OK(signals_unexpected, "perf event not skipped"); + ASSERT_EQ(info->si_code, TRAP_PERF, "wrong si_code"); + ++sigtrap_count; +} + +static noinline int test_function(void) +{ + asm volatile (""); + return 0; +} + +void serial_test_perf_skip(void) +{ + struct sigaction action = {}; + struct sigaction previous_sigtrap; + sighandler_t previous_sigio; + struct test_perf_skip *skel = NULL; + struct perf_event_attr attr = {}; + int perf_fd = -1; + int err; + struct f_owner_ex owner; + struct bpf_link *prog_link = NULL; + + action.sa_flags = SA_SIGINFO | SA_NODEFER; + action.sa_sigaction = handle_sigtrap; + sigemptyset(&action.sa_mask); + if (!ASSERT_OK(sigaction(SIGTRAP, &action, &previous_sigtrap), "sigaction")) + return; + + previous_sigio = signal(SIGIO, handle_sigio); + + skel = test_perf_skip__open_and_load(); + if (!ASSERT_OK_PTR(skel, "skel_load")) + goto cleanup; + + attr.type = PERF_TYPE_BREAKPOINT; + attr.size = sizeof(attr); + attr.bp_type = HW_BREAKPOINT_X; + attr.bp_addr = (uintptr_t)test_function; + attr.bp_len = sizeof(long); + attr.sample_period = 1; + attr.sample_type = PERF_SAMPLE_IP; + attr.pinned = 1; + attr.exclude_kernel = 1; + attr.exclude_hv = 1; + attr.precise_ip = 3; + attr.sigtrap = 1; + attr.remove_on_exec = 1; + + perf_fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0); + if (perf_fd < 0 && (errno == ENOENT || errno == EOPNOTSUPP)) { + printf("SKIP:no PERF_TYPE_BREAKPOINT/HW_BREAKPOINT_X\n"); + test__skip(); + goto cleanup; + } + if (!ASSERT_OK(perf_fd < 0, "perf_event_open")) + goto cleanup; + + // Configure the perf event to signal on sample. + err = fcntl(perf_fd, F_SETFL, O_ASYNC); + if (!ASSERT_OK(err, "fcntl(F_SETFL, O_ASYNC)")) + goto cleanup; + + owner.type = F_OWNER_TID; + owner.pid = syscall(__NR_gettid); + err = fcntl(perf_fd, F_SETOWN_EX, &owner); + if (!ASSERT_OK(err, "fcntl(F_SETOWN_EX)")) + goto cleanup; + + // Allow at most one sample. A sample rejected by bpf should + // not count against this. + err = ioctl(perf_fd, PERF_EVENT_IOC_REFRESH, 1); + if (!ASSERT_OK(err, "ioctl(PERF_EVENT_IOC_REFRESH)")) + goto cleanup; + + prog_link = bpf_program__attach_perf_event(skel->progs.handler, perf_fd); + if (!ASSERT_OK_PTR(prog_link, "bpf_program__attach_perf_event")) + goto cleanup; + + // Configure the bpf program to suppress the sample. + skel->bss->ip = (uintptr_t)test_function; + test_function(); + + ASSERT_EQ(sigio_count, 0, "sigio_count"); + ASSERT_EQ(sigtrap_count, 0, "sigtrap_count"); + + // Configure the bpf program to allow the sample. + skel->bss->ip = 0; + signals_unexpected = 0; + test_function(); + + ASSERT_EQ(sigio_count, 1, "sigio_count"); + ASSERT_EQ(sigtrap_count, 1, "sigtrap_count"); + + // Test that the sample above is the only one allowed (by perf, not + // by bpf) + test_function(); + + ASSERT_EQ(sigio_count, 1, "sigio_count"); + ASSERT_EQ(sigtrap_count, 1, "sigtrap_count"); + +cleanup: + if (prog_link) + bpf_link__destroy(prog_link); + if (perf_fd >= 0) + close(perf_fd); + if (skel) + test_perf_skip__destroy(skel); + + signal(SIGIO, previous_sigio); + sigaction(SIGTRAP, &previous_sigtrap, NULL); +} diff --git a/tools/testing/selftests/bpf/progs/test_perf_skip.c b/tools/testing/selftests/bpf/progs/test_perf_skip.c new file mode 100644 index 000000000000..417a9db3b770 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/test_perf_skip.c @@ -0,0 +1,15 @@ +// SPDX-License-Identifier: GPL-2.0 +#include "vmlinux.h" +#include <bpf/bpf_helpers.h> +#include <bpf/bpf_tracing.h> + +uintptr_t ip; + +SEC("perf_event") +int handler(struct bpf_perf_event_data *data) +{ + // Skip events that have the correct ip. + return ip != PT_REGS_IP(&data->regs); +} + +char _license[] SEC("license") = "GPL"; -- 2.34.1

2 years, 1 month

[PATCH v1] KVM: selftests: Fix Assertion on non-x86_64 platforms

by Shaoqin Huang

When running the set_memory_region_test on arm64 platform, it causes the below assert: ==== Test Assertion Failure ==== set_memory_region_test.c:355: r && errno == EINVAL pid=40695 tid=40695 errno=0 - Success 1 0x0000000000401baf: test_invalid_memory_region_flags at set_memory_region_test.c:355 2 (inlined by) main at set_memory_region_test.c:541 3 0x0000ffff951c879b: ?? ??:0 4 0x0000ffff951c886b: ?? ??:0 5 0x0000000000401caf: _start at ??:? KVM_SET_USER_MEMORY_REGION should have failed on v2 only flag 0x2 This is because the arm64 platform also support the KVM_MEM_READONLY flag, but the current implementation add it into the supportd_flags only on x86_64 platform, so this causes assert on other platform which also support the KVM_MEM_READONLY flag. Fix it by using the __KVM_HAVE_READONLY_MEM macro to detect if the current platform support the KVM_MEM_READONLY, thus fix this problem on all other platform which support KVM_MEM_READONLY. Fixes: 5d74316466f4 ("KVM: selftests: Add a memory region subtest to validate invalid flags") Signed-off-by: Shaoqin Huang <shahuang(a)redhat.com> --- This patch is based on the latest kvm-next[1] branch. [1] https://git.kernel.org/pub/scm/virt/kvm/kvm.git/log/?h=next --- tools/testing/selftests/kvm/set_memory_region_test.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/kvm/set_memory_region_test.c b/tools/testing/selftests/kvm/set_memory_region_test.c index 6637a0845acf..1ce710fd7a5a 100644 --- a/tools/testing/selftests/kvm/set_memory_region_test.c +++ b/tools/testing/selftests/kvm/set_memory_region_test.c @@ -333,9 +333,11 @@ static void test_invalid_memory_region_flags(void) struct kvm_vm *vm; int r, i; -#ifdef __x86_64__ +#ifdef __KVM_HAVE_READONLY_MEM supported_flags |= KVM_MEM_READONLY; +#endif +#ifdef __x86_64__ if (kvm_check_cap(KVM_CAP_VM_TYPES) & BIT(KVM_X86_SW_PROTECTED_VM)) vm = vm_create_barebones_protected_vm(); else -- 2.40.1

2 years, 1 month

[PATCH v3 0/3] Add a test to catch unprobed Devicetree devices

by Nícolas F. R. A. Prado

Regressions that cause a device to no longer be probed by a driver can have a big impact on the platform's functionality, and despite being relatively common there isn't currently any generic test to detect them. As an example, bootrr [1] does test for device probe, but it requires defining the expected probed devices for each platform. Given that the Devicetree already provides a static description of devices on the system, it is a good basis for building such a test on top. This series introduces a test to catch regressions that prevent devices from probing. Patches 1 and 2 extend the existing dt-extract-compatibles to be able to output only the compatibles that can be expected to match a Devicetree node to a driver. Patch 2 adds a kselftest that walks over the Devicetree nodes on the current platform and compares the compatibles to the ones on the list, and on an ignore list, to point out devices that failed to be probed. A compatible list is needed because not all compatibles that can show up in a Devicetree node can be used to match to a driver, for example the code for that compatible might use "OF_DECLARE" type macros and avoid the driver framework, or the node might be controlled by a driver that was bound to a different node. An ignore list is needed for the few cases where it's common for a driver to match a device but not probe, like for the "simple-mfd" compatible, where the driver only probes if that compatible is the node's first compatible. The reason for parsing the kernel source instead of relying on information exposed by the kernel at runtime (say, looking at modaliases or introducing some other mechanism), is to be able to catch issues where a config was renamed or a driver moved across configs, and the .config used by the kernel not updated accordingly. We need to parse the source to find all compatibles present in the kernel independent of the current config being run. [1] https://github.com/kernelci/bootrr Changes in v3: - Added DT selftest path to MAINTAINERS - Enabled device probe test for nodes with 'status = "ok"' - Added pass/fail/skip totals to end of test output Changes in v2: - Extended dt-extract-compatibles script to be able to extract driver matching compatibles, instead of adding a new one in Coccinelle - Made kselftest output in the KTAP format Nícolas F. R. A. Prado (3): dt: dt-extract-compatibles: Handle cfile arguments in generator function dt: dt-extract-compatibles: Add flag for driver matching compatibles kselftest: Add new test for detecting unprobed Devicetree devices MAINTAINERS | 1 + scripts/dtc/dt-extract-compatibles | 74 +++++++++++++---- tools/testing/selftests/Makefile | 1 + tools/testing/selftests/dt/.gitignore | 1 + tools/testing/selftests/dt/Makefile | 21 +++++ .../selftests/dt/compatible_ignore_list | 1 + tools/testing/selftests/dt/ktap_helpers.sh | 70 ++++++++++++++++ .../selftests/dt/test_unprobed_devices.sh | 83 +++++++++++++++++++ 8 files changed, 236 insertions(+), 16 deletions(-) create mode 100644 tools/testing/selftests/dt/.gitignore create mode 100644 tools/testing/selftests/dt/Makefile create mode 100644 tools/testing/selftests/dt/compatible_ignore_list create mode 100644 tools/testing/selftests/dt/ktap_helpers.sh create mode 100755 tools/testing/selftests/dt/test_unprobed_devices.sh -- 2.42.0

2 years, 1 month

Jump to page:

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

Linux-kselftest-mirror