- Linux-kselftest-mirror - lists.linaro.org

[PATCH v3 0/6] binder: Set up KUnit tests for alloc

by Tiffany Yang

Hello,

binder_alloc_selftest provides a robust set of checks for the binder allocator, but it rarely runs because it must hook into a running binder process and block all other binder threads until it completes. The test itself is a good candidate for conversion to KUnit, and it can be further isolated from user processes by using a test-specific lru freelist instead of the global one.

This series converts the selftest to KUnit to make it less burdensome to run and to set up a foundation for unit testing future binder_alloc changes.

Thanks,
Tiffany

Tiffany Yang (6):
  binder: Fix selftest page indexing
  binder: Store lru freelist in binder_alloc
  kunit: test: Export kunit_attach_mm()
  binder: Scaffolding for binder_alloc KUnit tests
  binder: Convert binder_alloc selftests to KUnit
  binder: encapsulate individual alloc test cases

 drivers/android/Kconfig                    |  15 +-
 drivers/android/Makefile                   |   2 +-
 drivers/android/binder.c                   |  10 +-
 drivers/android/binder_alloc.c             |  39 +-
 drivers/android/binder_alloc.h             |  14 +-
 drivers/android/binder_alloc_selftest.c    | 306 -----------
 drivers/android/binder_internal.h          |   4 +
 drivers/android/tests/.kunitconfig         |   3 +
 drivers/android/tests/Makefile             |   3 +
 drivers/android/tests/binder_alloc_kunit.c | 573 +++++++++++++++++++++
 include/kunit/test.h                       |  12 +
 lib/kunit/user_alloc.c                     |   4 +-
 12 files changed, 645 insertions(+), 340 deletions(-)
 delete mode 100644 drivers/android/binder_alloc_selftest.c
 create mode 100644 drivers/android/tests/.kunitconfig
 create mode 100644 drivers/android/tests/Makefile
 create mode 100644 drivers/android/tests/binder_alloc_kunit.c

-- 
2.50.0.727.gbf7dc18ff4-goog

1 day, 18 hours


[PATCH v7 0/7] use per-vma locks for /proc/pid/maps reads

by Suren Baghdasaryan

Reading /proc/pid/maps requires read-locking mmap_lock, which prevents any other task from concurrently modifying the address space. This guarantees coherent reporting of virtual address ranges, however it can block important updates from happening. Oftentimes /proc/pid/maps readers are low priority monitoring tasks, and them blocking high priority tasks results in priority inversion.

Locking the entire address space is required to present a fully coherent picture of the address space, however even the current implementation does not strictly guarantee that, because it outputs vmas in page-size chunks and drops mmap_lock in between each chunk. Address space modifications are possible while mmap_lock is dropped, and userspace reading the content is expected to deal with possible concurrent address space modifications. Considering these relaxed rules, holding mmap_lock is not strictly needed as long as we can guarantee that a concurrently modified vma is reported either in its original form or after it was modified.

This patchset switches from holding mmap_lock while reading /proc/pid/maps to taking per-vma locks as we walk the vma tree. This reduces the contention with tasks modifying the address space because they would have to contend for the same vma as opposed to the entire address space. The previous version of this patchset [1] tried to perform /proc/pid/maps reading under RCU, however its implementation is quite complex and the results are worse than the new version because it still relied on mmap_lock speculation, which retries if any part of the address space gets modified. The new implementation is both simpler and results in less contention. Note that a similar approach would not work for /proc/pid/smaps reading, as it also walks the page table and that's not RCU-safe.

Paul McKenney designed a test [2] to measure mmap/munmap latencies while concurrently reading /proc/pid/maps. The test has a pair of processes scanning /proc/PID/maps, and another process unmapping and remapping 4K pages from a 128MB range of anonymous memory. At the end of each 10 second run, the latency of each mmap() or munmap() operation is measured, and for each run the maximum and mean latency is printed. The map/unmap process is started first, its PID is passed to the scanners, and then the map/unmap process waits until both scanners are running before starting its timed test. The scanners keep scanning until the specified /proc/PID/maps file disappears.

The latest results from Paul:

Stock mm-unstable, all of the runs had maximum latencies in excess of 0.5 milliseconds, and with 80% of the runs' latencies exceeding a full millisecond, and ranging up beyond 4 full milliseconds. In contrast, 99% of the runs with this patch series applied had maximum latencies of less than 0.5 milliseconds, with the single outlier at only 0.608 milliseconds.

From a median-performance (as opposed to maximum-latency) viewpoint, this patch series also looks good, with stock mm weighing in at 11 microseconds and patch series at 6 microseconds, better than a 2x improvement.

Before the change:
./run-proc-vs-map.sh --nsamples 100 --rawdata -- --busyduration 2
0.011 0.008 0.521
0.011 0.008 0.552
0.011 0.008 0.590
0.011 0.008 0.660
...
0.011 0.015 2.987
0.011 0.015 3.038
0.011 0.016 3.431
0.011 0.016 4.707

After the change:
./run-proc-vs-map.sh --nsamples 100 --rawdata -- --busyduration 2
0.006 0.005 0.026
0.006 0.005 0.029
0.006 0.005 0.034
0.006 0.005 0.035
...
0.006 0.006 0.421
0.006 0.006 0.423
0.006 0.006 0.439
0.006 0.006 0.608

The patchset also adds a number of tests to check for /proc/pid/maps data coherency. They are designed to detect any unexpected data tearing while performing some common address space modifications (vma split, resize and remap).
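The raw rows quoted above can be summarised with a short script. This is only a sketch and not part of the series. It assumes each row holds three latencies in milliseconds and that the third column is the per-run maximum (the output itself does not label its columns), and it sees only the eight rows quoted for each case, not the elided ones.

```python
# Sketch: summarise the quoted run-proc-vs-map.sh rows.
# Assumption: the third column is the per-run maximum latency in ms.

BEFORE = """
0.011 0.008 0.521
0.011 0.008 0.552
0.011 0.008 0.590
0.011 0.008 0.660
0.011 0.015 2.987
0.011 0.015 3.038
0.011 0.016 3.431
0.011 0.016 4.707
"""

AFTER = """
0.006 0.005 0.026
0.006 0.005 0.029
0.006 0.005 0.034
0.006 0.005 0.035
0.006 0.006 0.421
0.006 0.006 0.423
0.006 0.006 0.439
0.006 0.006 0.608
"""


def parse_rows(text):
    """Turn whitespace-separated rows into tuples of floats."""
    return [tuple(float(f) for f in line.split())
            for line in text.strip().splitlines()]


def runs_over(rows, threshold_ms):
    """Count rows whose third column exceeds threshold_ms."""
    return sum(1 for row in rows if row[2] > threshold_ms)


before = parse_rows(BEFORE)
after = parse_rows(AFTER)
print("before: runs over 0.5 ms:", runs_over(before, 0.5), "of", len(before))
print("after:  runs over 0.5 ms:", runs_over(after, 0.5), "of", len(after))
```

Because the rows between the `...` markers are missing, the counts describe only the quoted extremes and should not be read as the full 100-sample distribution.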
Even before these changes, reading /proc/pid/maps might have inconsistent data because the file is read page-by-page with mmap_lock being dropped between the pages. An example of user-visible inconsistency can be that the same vma is printed twice: once before it was modified and then after the modifications. For example, if a vma was extended, it might be found and reported twice. What is not expected is to see a gap where there should have been a vma both before and after modification. This patchset increases the chances of such tearing, therefore it's even more important now to test for unexpected inconsistencies.

In [3] Lorenzo identified the following possible vma merging/splitting scenarios:

Merges with changes to existing vmas:
1. Merge both - mapping a vma over another one and between two vmas which can be merged after this replacement;
2. Merge left full - mapping a vma at the end of an existing one and completely over its right neighbor;
3. Merge left partial - mapping a vma at the end of an existing one and partially over its right neighbor;
4. Merge right full - mapping a vma before the start of an existing one and completely over its left neighbor;
5. Merge right partial - mapping a vma before the start of an existing one and partially over its left neighbor;

Merges without changes to existing vmas:
6. Merge both - mapping a vma into a gap between two vmas which can be merged after the insertion;
7. Merge left - mapping a vma at the end of an existing one;
8. Merge right - mapping a vma before the start of an existing one;

Splits:
9. Split with new vma at the lower address;
10. Split with new vma at the higher address;

If such merges or splits happen concurrently with the /proc/pid/maps reading, we might report a vma twice, once before the modification and once after it is modified:

Case 1 might report overwritten and previous vma along with the final merged vma;
Case 2 might report previous and the final merged vma;
Case 3 might cause us to retry once we detect the temporary gap caused by shrinking of the right neighbor;
Case 4 might report overwritten and the final merged vma;
Case 5 might cause us to retry once we detect the temporary gap caused by shrinking of the left neighbor;
Case 6 might report previous vma and the gap along with the final merged vma;
Case 7 might report previous and the final merged vma;
Case 8 might report the original gap and the final merged vma covering the gap;
Case 9 might cause us to retry once we detect the temporary gap caused by shrinking of the original vma at the vma start;
Case 10 might cause us to retry once we detect the temporary gap caused by shrinking of the original vma at the vma end;

In all these cases the retry mechanism prevents us from reporting possible temporary gaps.
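The distinction drawn above (a vma reported twice is tolerated, a gap where a vma existed both before and after the change is not) can be illustrated with a small model. This is a sketch with hypothetical names (`find_gaps` and the example addresses are made up for illustration). It is not the selftest added in proc-maps-race.c and it does not read /proc/pid/maps.

```python
# Toy model of the coherency rule described above; not the kernel selftest.
# Ranges are half-open (start, end) pairs, as in /proc/pid/maps lines.

def find_gaps(reported, region_start, region_end):
    """Return the parts of [region_start, region_end) no reported range covers."""
    gaps = []
    cursor = region_start
    for start, end in sorted(reported):
        if cursor >= region_end:
            break
        if start > cursor:
            gaps.append((cursor, min(start, region_end)))
        cursor = max(cursor, end)
    if cursor < region_end:
        gaps.append((cursor, region_end))
    return gaps


# A vma that was extended while being read may show up twice: tolerated.
extended_twice = [(0x1000, 0x2000), (0x1000, 0x3000)]
# A hole where a vma existed both before and after the change: not expected.
torn = [(0x1000, 0x2000), (0x2800, 0x3000)]

print(find_gaps(extended_twice, 0x1000, 0x3000))
print(find_gaps(torn, 0x1000, 0x3000))
```

A reader that tolerates duplicates but flags any non-empty result from such a check matches the expectation stated in the cover letter: overlapping or repeated entries are acceptable, uncovered ranges are not.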
Changes since v6 [4]:
- Updated patch 7/8 changelog, per Lorenzo Stoakes
- Added comments, per Lorenzo Stoakes
- Added Reviewed-by, per Lorenzo Stoakes and Liam Howlett
- Replaced iter with vmi, per Lorenzo Stoakes
- Renamed from lock_vma_under_mmap_lock() to lock_next_vma_under_mmap_lock(), per Lorenzo Stoakes
- Renamed lock_next_vma() parameter from addr to from_addr
- Renamed labels in lock_next_vma() to reflect fallback cases, per Lorenzo Stoakes
- Handle vma_start_read_locked() failure inside lock_next_vma_under_mmap_lock() and added fallback_to_mmap_lock() for that, per Vlastimil Babka
- Added missing vma_iter_init() after re-entering rcu read section inside lock_next_vma(), per Vlastimil Babka
- Replaced vma_iter_init() with vma_iter_set(), per Liam Howlett
- Removed the last patch converting PROCMAP_QUERY to use per-vma locks. That patch will be posted separately, per David Hildenbrand, Vlastimil Babka and Liam Howlett
- Updated performance numbers, per Paul E. McKenney

!!! NOTES FOR APPLYING THE PATCHSET !!!

Applies cleanly over mm-unstable after reverting v6 version of this patchset (from 2771a4b86aa1 to a20b00f7cf33 in mm-unstable).

[1] https://lore.kernel.org/all/20250418174959.1431962-1-surenb@google.com/
[2] https://github.com/paulmckrcu/proc-mmap_sem-test
[3] https://lore.kernel.org/all/e1863f40-39ab-4e5b-984a-c48765ffde1c@lucifer.lo…
[4] https://lore.kernel.org/all/20250704060727.724817-1-surenb@google.com/

Suren Baghdasaryan (7):
  selftests/proc: add /proc/pid/maps tearing from vma split test
  selftests/proc: extend /proc/pid/maps tearing test to include vma resizing
  selftests/proc: extend /proc/pid/maps tearing test to include vma remapping
  selftests/proc: test PROCMAP_QUERY ioctl while vma is concurrently modified
  selftests/proc: add verbose more for tests to facilitate debugging
  fs/proc/task_mmu: remove conversion of seq_file position to unsigned
  fs/proc/task_mmu: read proc/pid/maps under per-vma lock

 fs/proc/internal.h                            |   5 +
 fs/proc/task_mmu.c                            | 155 +++-
 include/linux/mmap_lock.h                     |  11 +
 mm/madvise.c                                  |   3 +-
 mm/mmap_lock.c                                |  93 ++
 tools/testing/selftests/proc/.gitignore       |   1 +
 tools/testing/selftests/proc/Makefile         |   1 +
 tools/testing/selftests/proc/proc-maps-race.c | 829 ++++++++++++++++++
 8 files changed, 1082 insertions(+), 16 deletions(-)
 create mode 100644 tools/testing/selftests/proc/proc-maps-race.c

-- 
2.50.0.727.gbf7dc18ff4-goog

1 day, 22 hours


[PATCH 0/2] bpf, arm64: relax constraint in BPF JIT compiler

by Alexis Lothoré (eBPF Foundation)

Hello,

this series follows up on the one introducing 9+ args for tracing programs [1]. It has been observed with this series that there are cases for which we cannot accurately identify the location of the target function arguments to correctly prepare the corresponding BPF trampoline. This is the case for example if:
- the function consumes a struct variable _by value_
- it is passed on the stack (no more register available for it)
- it has some __packed__ or __aligned(X)__ attribute

As a consequence, a small restrictive check has been added to the ARM64 side, highlighting that other archs supporting 9+ args in BPF trampolines are already suffering from the same issue.

After a bit of discussion and a few attempts, the chosen solution is, rather than applying the same constraint to all JIT compilers, to prevent such functions from being encoded at all in BTF info ([2]). As the pahole side is close to being integrated, we can now remove the restrictive check from the kernel side.

[1] https://lore.kernel.org/bpf/20250527-many_args_arm64-v3-0-3faf7bb8e4a2@boot…
[2] https://lore.kernel.org/bpf/20250707-btf_skip_structs_on_stack-v3-0-29569e0…

Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore(a)bootlin.com>
---
Alexis Lothoré (eBPF Foundation) (2):
      bpf, arm64: remove structs on stack constraint
      selftests/bpf: enable tracing_struct tests for arm64

 arch/arm64/net/bpf_jit_comp.c                | 5 -----
 tools/testing/selftests/bpf/DENYLIST.aarch64 | 1 -
 2 files changed, 6 deletions(-)
---
base-commit: 8da1e37fc84868b50ba6a7cdf082aa3b0d11e006
change-id: 20250708-arm64_relax_jit_comp-e8889647d8d2

Best regards,
-- 
Alexis Lothoré, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

1 day, 22 hours


[PATCH net-next v7 0/3] selftest: net: Add selftest for netpoll

by Breno Leitao

I am submitting a new selftest for the netpoll subsystem, specifically targeting the case where the RX is polling in the TX path, which is a case for which we don't have any test in the tree today. This happens when netpoll_poll_dev() is called, and this test creates a scenario where that is probable.

The test does the following:

1) Configuring a single RX/TX queue to increase contention on the interface.
2) Generating background traffic to saturate the network, mimicking real-world congestion.
3) Sending netconsole messages to trigger netpoll polling and monitor its behavior.
4) Using dynamic netconsole targets via configfs, with the ability to delete and recreate targets during the test.
5) Running bpftrace in parallel to verify that netpoll_poll_dev() is called when expected. If it is called, then the test passes, otherwise the test is marked as skipped.

In order to achieve it, I stole Jakub's bpftrace helper from [1], and made some small changes that I found useful when using the helper.

So, this patchset basically contains:

1) The code stolen from Jakub
2) Improvements on bpftrace() helper
3) The selftest itself

Link: https://lore.kernel.org/all/20250421222827.283737-22-kuba@kernel.org/ [1]

---
Changes in v7:
- Rebased on top of net-next
- Using `ethtool -l` json option instead of parsing it manually.
- Link to v6: https://lore.kernel.org/r/20250711-netpoll_test-v6-0-130465f286a8@debian.org

Changes in v6:
- Remove the network toggled (Jakub)
- Set ringsize and queue size (Jakub)
- Some other general improvements (Jakub)
- Link to v5: https://lore.kernel.org/r/20250709-netpoll_test-v5-0-b3737895affe@debian.org

Changes in v5:
- Rebased on top of net-next.
- Calling bpftrace_stop using the defer helper. (Willem)
- Link to v4: https://lore.kernel.org/r/20250702-netpoll_test-v4-0-cec227e85639@debian.org

Changes in v4:
- Make the test XFail if it doesn't hit the function we are looking for
- Toggle the interface while the traffic is flowing.
- Bumped the number of messages from 10 to 40 per iterations.
  * This is hitting ~15 times per run on my vng test.
- Decreased the time from 15 seconds to 10 seconds, given that if it didn't hit the function in 10 seconds, 5 seconds extra will not help.
- Link to v3: https://lore.kernel.org/r/20250627-netpoll_test-v3-0-575bd200c8a9@debian.org

Changes in v3:
- Make pylint happy (Simon)
- Remove the unnecessary patch in bpftrace to raise an exception when it fails. (Jakub)
- Improved the bpftrace code (Willem)
- Stop sending messages if bpftrace is not alive anymore.
- Link to v2: https://lore.kernel.org/r/20250625-netpoll_test-v2-0-47d27775222c@debian.org

Changes in v2:
- Stole Jakub's helper to run bpftrace
- Removed the DEBUG option and moved logs to logging
- Change the code to have a higher chance of calling netpoll_poll_dev(). In my current configuration, it is hitting multiple times during the test.
- Save and restore TX/RX queue size (Jakub)
- Link to v1: https://lore.kernel.org/r/20250620-netpoll_test-v1-1-5068832f72fc@debian.org

---
Breno Leitao (2):
      selftests: drv-net: Strip '@' prefix from bpftrace map keys
      selftests: net: add netpoll basic functionality test

Jakub Kicinski (1):
      selftests: drv-net: add helper/wrapper for bpftrace

 tools/testing/selftests/drivers/net/Makefile       |   1 +
 .../selftests/drivers/net/lib/py/__init__.py       |   4 +-
 .../testing/selftests/drivers/net/netpoll_basic.py | 396 +++++++++++++++++++++
 tools/testing/selftests/net/lib/py/utils.py        |  35 ++
 4 files changed, 434 insertions(+), 2 deletions(-)
---
base-commit: b06c4311711c57c5e558bd29824b08f0a6e2a155
change-id: 20250612-netpoll_test-a1324d2057c8

Best regards,
-- 
Breno Leitao <leitao(a)debian.org>

1 day, 23 hours


[PATCH net] selftests: rtnetlink: fix addrlft test flakiness on power-saving systems

by Hangbin Liu

Jakub reported that the rtnetlink test for the preferred lifetime of an address has become quite flaky. The issue started appearing around the 6.16 merge window in May, and the test fails with:

FAIL: preferred_lft addresses remaining

The flakiness might be related to power-saving behavior, as address expiration is handled by a "power-efficient" workqueue.

To address this, use slowwait to check more frequently whether the address still exists. This reduces the likelihood of the system entering a low-power state during the test, improving reliability.

Reported-by: Jakub Kicinski <kuba(a)kernel.org>
Signed-off-by: Hangbin Liu <liuhangbin(a)gmail.com>
---
 tools/testing/selftests/net/rtnetlink.sh | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/net/rtnetlink.sh b/tools/testing/selftests/net/rtnetlink.sh
index 2e8243a65b50..49141254065c 100755
--- a/tools/testing/selftests/net/rtnetlink.sh
+++ b/tools/testing/selftests/net/rtnetlink.sh
@@ -291,6 +291,17 @@ kci_test_route_get()
 	end_test "PASS: route get"
 }
 
+check_addr_not_exist()
+{
+	dev=$1
+	addr=$2
+	if ip addr show dev $dev | grep -q $addr; then
+		return 1
+	else
+		return 0
+	fi
+}
+
 kci_test_addrlft()
 {
 	for i in $(seq 10 100) ;do
@@ -298,9 +309,8 @@ kci_test_addrlft()
 		run_cmd ip addr add 10.23.11.$i/32 dev "$devdummy" preferred_lft $lft valid_lft $((lft+1))
 	done
 
-	sleep 5
-	run_cmd_grep_fail "10.23.11." ip addr show dev "$devdummy"
-	if [ $? -eq 0 ]; then
+	slowwait 5 check_addr_not_exist "$devdummy" "10.23.11."
+	if [ $? -eq 1 ]; then
 		check_err 1
 		end_test "FAIL: preferred_lft addresses remaining"
 		return
-- 
2.46.0
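The change above replaces a single fixed sleep with repeated polling of a condition until a deadline. The general pattern can be sketched in Python as follows. This is an illustrative analogue only: `wait_until`, its 0.1 second default interval and the fake address list are assumptions made for the example, and it does not reproduce the actual slowwait shell helper.

```python
import time


def wait_until(timeout_s, predicate, interval_s=0.1):
    """Poll predicate() until it returns True or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Stand-in for "ip addr show": pretend one address expires per poll.
remaining = ["10.23.11.10", "10.23.11.11", "10.23.11.12"]


def addresses_gone():
    """Hypothetical check, playing the role of check_addr_not_exist."""
    if remaining:
        remaining.pop()
    return not remaining


print(wait_until(5, addresses_gone, interval_s=0.01))
```

Compared with one sleep followed by one check, the polling loop returns as soon as the condition holds and keeps the test process waking up regularly, which is the effect the commit message relies on.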

2 days, 2 hours


[PATCH v3 00/10] mm/mremap: permit mremap() move of multiple VMAs

by Lorenzo Stoakes

Historically we've made it a uAPI requirement that mremap() may only operate on a single VMA at a time.

For instances where VMAs need to be resized, this makes sense, as it becomes very difficult to determine what a user actually wants should they indicate a desire to expand or shrink the size of multiple VMAs (truncate? Adjust sizes individually? Some other strategy?).

However, in instances where a user is moving VMAs, it is restrictive to disallow this.

This is especially the case when anonymous mapping remap may or may not be mergeable depending on whether VMAs have or have not been faulted due to anon_vma assignment and folio index alignment with vma->vm_pgoff.

Often this can result in surprising impact where a moved region is faulted, then moved back and a user fails to observe a merge from otherwise compatible, adjacent VMAs.

This change allows such cases to work without the user having to be cognizant of whether a prior mremap() move or other VMA operations have resulted in VMA fragmentation.

In order to do this, this series performs a large amount of refactoring, most pertinently grouping sanity checks together, separating those that check input parameters from those relating to VMAs.

We also simplify the post-mmap lock drop processing for uffd and mlock()'d VMAs.

With this done, we can then fairly straightforwardly implement this functionality.

This works exclusively for mremap() invocations which specify MREMAP_FIXED. It is not compatible with VMAs which use userfaultfd, as the notification of the userland fault handler would require us to drop the mmap lock.

It is also not compatible with file-backed mappings with customised get_unmapped_area() handlers as these may not honour MREMAP_FIXED.

The input and output address ranges must not overlap. We carefully account for moves which would result in VMA iterator invalidation.

While there can be gaps between VMAs in the input range, there can be no gap before the first VMA in the range.
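The constraints listed above can be collected into a small model. This is a sketch of the rules as stated in this cover letter, with hypothetical names (`Vma`, `multi_vma_move_problem` and the flag fields are invented for illustration). It is not the logic in mm/mremap.c, and it deliberately leaves single-VMA moves out of scope.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vma:
    start: int  # inclusive
    end: int    # exclusive
    uses_uffd: bool = False
    custom_get_unmapped_area: bool = False


def multi_vma_move_problem(vmas, old_addr, length, new_addr, mremap_fixed):
    """Return None if the stated rules permit the move, else a reason string."""
    old_end, new_end = old_addr + length, new_addr + length
    spanned = sorted(
        (v for v in vmas if v.start < old_end and v.end > old_addr),
        key=lambda v: v.start,
    )
    if not spanned:
        return "no VMAs in input range"
    if len(spanned) == 1:
        return None  # single-VMA moves are outside this model
    if not mremap_fixed:
        return "multi-VMA move requires MREMAP_FIXED"
    if old_addr < new_end and new_addr < old_end:
        return "input and output ranges overlap"
    if spanned[0].start > old_addr:
        return "gap before the first VMA"
    for v in spanned:
        if v.uses_uffd:
            return "userfaultfd VMA in range"
        if v.custom_get_unmapped_area:
            return "custom get_unmapped_area() VMA in range"
    return None  # gaps between VMAs inside the range are allowed


# Two VMAs with a hole between them, moved as one range.
vmas = [Vma(0x1000, 0x2000), Vma(0x3000, 0x4000)]
print(multi_vma_move_problem(vmas, 0x1000, 0x3000, 0x10000, True))
print(multi_vma_move_problem(vmas, 0x0, 0x4000, 0x10000, True))
```

The first call models a permitted move (a gap between VMAs is fine), while the second is rejected because the input range starts before the first VMA.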
v3:
* Disallowed move operation except for MREMAP_FIXED.
* Disallow gap at start of aggregate range to avoid confusion.
* Disallow any file-backed VMAs with custom get_unmapped_area.
* Renamed multi_vma to seen_vma to be clearer. Stop reusing new_addr, use separate target_addr var to track next target address.
* Check if first VMA fails multi VMA check, if so we'll allow one VMA but not multiple.
* Updated the commit message for patch 9 to be clearer about gap behaviour.
* Removed accidentally included debug goto statement in test (doh!). Test was and is passing regardless.
* Unmap target range in test, previously we ended up moving additional VMAs unintentionally. This still all passed :) but was not what was intended.
* Removed self-merge check - there is absolutely no way this can happen across multiple VMAs, as there is no means of moving VMAs such that a VMA merges with itself.

v2:
* Squashed uffd stub fix into series.
* Propagated tags, thanks!
* Fixed param naming in patch 4 as per Vlastimil.
* Renamed vma_reset to vmi_needs_reset + dropped reset on unmap as per Liam.
* Correctly return -EFAULT if no VMAs in input range.
* Account for get_unmapped_area() disregarding MAP_FIXED and returning an altered address.
* Added additional explanatory comment to the remap_move() function.
https://lore.kernel.org/all/cover.1751865330.git.lorenzo.stoakes@oracle.com/

v1:
https://lore.kernel.org/all/cover.1751865330.git.lorenzo.stoakes@oracle.com/

Lorenzo Stoakes (10):
  mm/mremap: perform some simple cleanups
  mm/mremap: refactor initial parameter sanity checks
  mm/mremap: put VMA check and prep logic into helper function
  mm/mremap: cleanup post-processing stage of mremap
  mm/mremap: use an explicit uffd failure path for mremap
  mm/mremap: check remap conditions earlier
  mm/mremap: move remap_is_valid() into check_prep_vma()
  mm/mremap: clean up mlock populate behaviour
  mm/mremap: permit mremap() move of multiple VMAs
  tools/testing/selftests: extend mremap_test to test multi-VMA mremap

 fs/userfaultfd.c                         |  15 +-
 include/linux/userfaultfd_k.h            |   5 +
 mm/mremap.c                              | 553 +++++++++++++++--------
 tools/testing/selftests/mm/mremap_test.c | 146 +++++-
 4 files changed, 518 insertions(+), 201 deletions(-)

-- 
2.50.0

2 days, 4 hours


[PATCH net 0/2] bonding: fix LACP negotiation issues in passive mode

by Hangbin Liu

This patchset fixes an issue where bonding fails to establish a stable LACP negotiation when operating in passive mode (lacp_active=off).

In passive mode, the current implementation only replies when the partner's state changes, which results in LACP timeout and unstable aggregator formation.

With this change, the bond responds to each received LACPDU in passive mode by setting ntt = true, ensuring timely replies and stable LACP negotiation.

Hangbin Liu (2):
  bonding: update ntt to true in passive mode
  selftests: bonding: add test for passive LACP mode

 drivers/net/bonding/bond_3ad.c                |  6 ++
 .../drivers/net/bonding/bond_passive_lacp.sh  | 21 +++++
 .../drivers/net/bonding/bond_topo_lacp.sh     | 77 +++++++++++++++++++
 3 files changed, 104 insertions(+)
 create mode 100755 tools/testing/selftests/drivers/net/bonding/bond_passive_lacp.sh
 create mode 100644 tools/testing/selftests/drivers/net/bonding/bond_topo_lacp.sh

-- 
2.46.0
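The behavioural difference described above can be shown with a toy model. This is a sketch based only on the wording of this cover letter: the function `ntt_after_rx` and its `patched` switch are hypothetical, and the real decision in bond_3ad.c involves state that is not modelled here.

```python
def ntt_after_rx(lacp_active, partner_state_changed, patched):
    """Return whether 'need to transmit' is set after receiving one LACPDU.

    Toy model: before the fix a reply is triggered only by a partner state
    change; after the fix every LACPDU received in passive mode triggers one.
    """
    if partner_state_changed:
        return True
    if patched and not lacp_active:
        return True
    return False


# Five LACPDUs arrive in passive mode; the partner's state changes only once.
state_changes = [True, False, False, False, False]

replies_before = sum(ntt_after_rx(False, c, patched=False) for c in state_changes)
replies_after = sum(ntt_after_rx(False, c, patched=True) for c in state_changes)
print(replies_before, replies_after)
```

In this model the unpatched bond answers once and then stays silent, which is the pattern the cover letter links to LACP timeouts on the partner side.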

2 days, 6 hours


[PATCH v2 4/4] selftests/rseq: Add test for mm_cid compaction

by Gabriele Monaco

A task in the kernel (task_mm_cid_work) runs somewhat periodically to compact the mm_cid for each process. Add a test to validate that it runs correctly and timely.

The test spawns 1 thread pinned to each CPU, then each thread, including the main one, runs in short bursts for some time. During this period, the mm_cids should be spanning all numbers between 0 and nproc.

At the end of this phase, a thread with high enough mm_cid (>= nproc/2) is selected to be the new leader, all other threads terminate.

After some time, the only running thread should see 0 as mm_cid, if that doesn't happen, the compaction mechanism didn't work and the test fails.

The test never fails if only 1 core is available, in which case, we cannot test anything as the only available mm_cid is 0.

Acked-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Gabriele Monaco <gmonaco(a)redhat.com>
---
 tools/testing/selftests/rseq/.gitignore       |   1 +
 tools/testing/selftests/rseq/Makefile         |   2 +-
 .../selftests/rseq/mm_cid_compaction_test.c   | 204 ++++++++++++++++++
 3 files changed, 206 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/rseq/mm_cid_compaction_test.c

diff --git a/tools/testing/selftests/rseq/.gitignore b/tools/testing/selftests/rseq/.gitignore
index 0fda241fa62b0..b3920c59bf401 100644
--- a/tools/testing/selftests/rseq/.gitignore
+++ b/tools/testing/selftests/rseq/.gitignore
@@ -3,6 +3,7 @@ basic_percpu_ops_test
 basic_percpu_ops_mm_cid_test
 basic_test
 basic_rseq_op_test
+mm_cid_compaction_test
 param_test
 param_test_benchmark
 param_test_compare_twice
diff --git a/tools/testing/selftests/rseq/Makefile b/tools/testing/selftests/rseq/Makefile
index 0d0a5fae59547..bc4d940f66d40 100644
--- a/tools/testing/selftests/rseq/Makefile
+++ b/tools/testing/selftests/rseq/Makefile
@@ -17,7 +17,7 @@ OVERRIDE_TARGETS = 1
 TEST_GEN_PROGS = basic_test basic_percpu_ops_test basic_percpu_ops_mm_cid_test param_test \
 		param_test_benchmark param_test_compare_twice param_test_mm_cid \
 		param_test_mm_cid_benchmark param_test_mm_cid_compare_twice \
-		syscall_errors_test
+		syscall_errors_test mm_cid_compaction_test
 
 TEST_GEN_PROGS_EXTENDED = librseq.so
 
diff --git a/tools/testing/selftests/rseq/mm_cid_compaction_test.c b/tools/testing/selftests/rseq/mm_cid_compaction_test.c
new file mode 100644
index 0000000000000..d13623625f5a9
--- /dev/null
+++ b/tools/testing/selftests/rseq/mm_cid_compaction_test.c
@@ -0,0 +1,204 @@
+// SPDX-License-Identifier: LGPL-2.1
+#define _GNU_SOURCE
+#include <assert.h>
+#include <pthread.h>
+#include <sched.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <stddef.h>
+
+#include "../kselftest.h"
+#include "rseq.h"
+
+#define VERBOSE 0
+#define printf_verbose(fmt, ...) \
+	do { \
+		if (VERBOSE) \
+			printf(fmt, ##__VA_ARGS__); \
+	} while (0)
+
+/* 50 ms */
+#define RUNNER_PERIOD 50000
+/*
+ * Number of runs before we terminate or get the token.
+ * The number is slowly increasing with the number of CPUs as the compaction
+ * process can take longer on larger systems. This is an arbitrary value.
+ */
+#define THREAD_RUNS (3 + args->num_cpus/8)
+
+/*
+ * Number of times we check that the mm_cid were compacted.
+ * Checks are repeated every RUNNER_PERIOD.
+ */
+#define MM_CID_COMPACT_TIMEOUT 10
+
+struct thread_args {
+	int cpu;
+	int num_cpus;
+	pthread_mutex_t *token;
+	pthread_barrier_t *barrier;
+	pthread_t *tinfo;
+	struct thread_args *args_head;
+};
+
+static void __noreturn *thread_runner(void *arg)
+{
+	struct thread_args *args = arg;
+	int i, ret, curr_mm_cid;
+	cpu_set_t cpumask;
+
+	CPU_ZERO(&cpumask);
+	CPU_SET(args->cpu, &cpumask);
+	ret = pthread_setaffinity_np(pthread_self(), sizeof(cpumask), &cpumask);
+	if (ret) {
+		errno = ret;
+		perror("Error: failed to set affinity");
+		abort();
+	}
+	pthread_barrier_wait(args->barrier);
+
+	for (i = 0; i < THREAD_RUNS; i++)
+		usleep(RUNNER_PERIOD);
+	curr_mm_cid = rseq_current_mm_cid();
+	/*
+	 * We select one thread with high enough mm_cid to be the new leader.
+	 * All other threads (including the main thread) will terminate.
+	 * After some time, the mm_cid of the only remaining thread should
+	 * converge to 0, if not, the test fails.
+	 */
+	if (curr_mm_cid >= args->num_cpus / 2 &&
+	    !pthread_mutex_trylock(args->token)) {
+		printf_verbose(
+			"cpu%d has mm_cid=%d and will be the new leader.\n",
+			sched_getcpu(), curr_mm_cid);
+		for (i = 0; i < args->num_cpus; i++) {
+			if (args->tinfo[i] == pthread_self())
+				continue;
+			ret = pthread_join(args->tinfo[i], NULL);
+			if (ret) {
+				errno = ret;
+				perror("Error: failed to join thread");
+				abort();
+			}
+		}
+		pthread_barrier_destroy(args->barrier);
+		free(args->tinfo);
+		free(args->token);
+		free(args->barrier);
+		free(args->args_head);
+
+		for (i = 0; i < MM_CID_COMPACT_TIMEOUT; i++) {
+			curr_mm_cid = rseq_current_mm_cid();
+			printf_verbose("run %d: mm_cid=%d on cpu%d.\n", i,
+				       curr_mm_cid, sched_getcpu());
+			if (curr_mm_cid == 0)
+				exit(EXIT_SUCCESS);
+			usleep(RUNNER_PERIOD);
+		}
+		exit(EXIT_FAILURE);
+	}
+	printf_verbose("cpu%d has mm_cid=%d and is going to terminate.\n",
+		       sched_getcpu(), curr_mm_cid);
+	pthread_exit(NULL);
+}
+
+int test_mm_cid_compaction(void)
+{
+	cpu_set_t affinity;
+	int i, j, ret = 0, num_threads;
+	pthread_t *tinfo;
+	pthread_mutex_t *token;
+	pthread_barrier_t *barrier;
+	struct thread_args *args;
+
+	sched_getaffinity(0, sizeof(affinity), &affinity);
+	num_threads = CPU_COUNT(&affinity);
+	tinfo = calloc(num_threads, sizeof(*tinfo));
+	if (!tinfo) {
+		perror("Error: failed to allocate tinfo");
+		return -1;
+	}
+	args = calloc(num_threads, sizeof(*args));
+	if (!args) {
+		perror("Error: failed to allocate args");
+		ret = -1;
+		goto out_free_tinfo;
+	}
+	token = malloc(sizeof(*token));
+	if (!token) {
+		perror("Error: failed to allocate token");
+		ret = -1;
+		goto out_free_args;
+	}
+	barrier = malloc(sizeof(*barrier));
+	if (!barrier) {
+		perror("Error: failed to allocate barrier");
+		ret = -1;
+		goto out_free_token;
+	}
+	if (num_threads == 1) {
+		fprintf(stderr, "Cannot test on a single cpu. "
+				"Skipping mm_cid_compaction test.\n");
+		/* only skipping the test, this is not a failure */
+		goto out_free_barrier;
+	}
+	pthread_mutex_init(token, NULL);
+	ret = pthread_barrier_init(barrier, NULL, num_threads);
+	if (ret) {
+		errno = ret;
+		perror("Error: failed to initialise barrier");
+		goto out_free_barrier;
+	}
+	for (i = 0, j = 0; i < CPU_SETSIZE && j < num_threads; i++) {
+		if (!CPU_ISSET(i, &affinity))
+			continue;
+		args[j].num_cpus = num_threads;
+		args[j].tinfo = tinfo;
+		args[j].token = token;
+		args[j].barrier = barrier;
+		args[j].cpu = i;
+		args[j].args_head = args;
+		if (!j) {
+			/* The first thread is the main one */
+			tinfo[0] = pthread_self();
+			++j;
+			continue;
+		}
+		ret = pthread_create(&tinfo[j], NULL, thread_runner, &args[j]);
+		if (ret) {
+			errno = ret;
+			perror("Error: failed to create thread");
+			abort();
+		}
+		++j;
+	}
+	printf_verbose("Started %d threads.\n", num_threads);
+
+	/* Also main thread will terminate if it is not selected as leader */
+	thread_runner(&args[0]);
+
+	/* only reached in case of errors */
+out_free_barrier:
+	free(barrier);
+out_free_token:
+	free(token);
+out_free_args:
+	free(args);
+out_free_tinfo:
+	free(tinfo);
+
+	return ret;
+}
+
+int main(int argc, char **argv)
+{
+	if (!rseq_mm_cid_available()) {
+		fprintf(stderr, "Error: rseq_mm_cid unavailable\n");
+		return -1;
+	}
+	if (test_mm_cid_compaction())
+		return -1;
+	return 0;
+}
-- 
2.50.1
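The timing constants in the patch above translate into a simple time budget. The following sketch mirrors the three macros (`RUNNER_PERIOD`, `THREAD_RUNS`, `MM_CID_COMPACT_TIMEOUT`) and the leader-selection threshold; the Python helper names are invented for illustration, and the figures ignore scheduling overhead.

```python
# Values copied from the macros in mm_cid_compaction_test.c.
RUNNER_PERIOD_US = 50_000       # 50 ms per sleep
MM_CID_COMPACT_TIMEOUT = 10     # number of compaction checks


def thread_runs(num_cpus):
    """THREAD_RUNS: 3 + num_cpus/8, using C-style integer division."""
    return 3 + num_cpus // 8


def run_phase_us(num_cpus):
    """Time each thread sleeps before reading its mm_cid."""
    return thread_runs(num_cpus) * RUNNER_PERIOD_US


def compaction_budget_us():
    """Upper bound on the time the leader waits for mm_cid to reach 0."""
    return MM_CID_COMPACT_TIMEOUT * RUNNER_PERIOD_US


def may_lead(mm_cid, num_cpus):
    """Leader candidates need mm_cid >= num_cpus / 2 (integer division)."""
    return mm_cid >= num_cpus // 2


for cpus in (2, 8, 64):
    print(cpus, thread_runs(cpus), run_phase_us(cpus) / 1000, "ms")
print("compaction budget:", compaction_budget_us() / 1000, "ms")
```

On an 8-CPU machine this gives 4 sleep periods (200 ms) before leader selection and up to roughly 500 ms for the leader's mm_cid to drop to 0.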

2 days, 8 hours


[PATCH net-next V5 0/5] net: netdevsim: hook in XDP handling

by Mohsin Bashir

This patch series adds tests to validate XDP native support for PASS, DROP, ABORT, and TX actions, as well as headroom and tailroom adjustment. For adjustment tests, validate support for both the extension and shrinking cases across various packet sizes and offset values.

The pass criteria for head/tail adjustment tests require that at least one adjustment value works for at least one packet size. This ensures that the variability in maximum supported head/tail adjustment offset across different drivers is taken into account.

The results reported in this series are based on fbnic. However, the series is tested against multiple other drivers including netdevsim.

Note: The XDP support for fbnic will be added later.

---
Change-log:
V5:
- Fix warning caused by rcu_dereference() in p1
- Fix checkpatch warnings with P3, P4, and P5

V4: https://lore.kernel.org/netdev/20250714210352.1115230-1-mohsin.bashr@gmail.…
V3: https://lore.kernel.org/netdev/20250712002648.2385849-1-mohsin.bashr@gmail.…
V2: https://lore.kernel.org/netdev/20250710184351.63797-1-mohsin.bashr@gmail.com
V1: https://lore.kernel.org/netdev/20250709173707.3177206-1-mohsin.bashr@gmail.…

Jakub Kicinski (1):
  net: netdevsim: hook in XDP handling

Mohsin Bashir (4):
  selftests: drv-net: Test XDP_PASS/DROP support
  selftests: drv-net: Test XDP_TX support
  selftests: drv-net: Test tail-adjustment support
  selftests: drv-net: Test head-adjustment support

 drivers/net/netdevsim/netdev.c               |  19 +-
 tools/testing/selftests/drivers/net/Makefile |   1 +
 tools/testing/selftests/drivers/net/xdp.py   | 656 ++++++++++++++++++
 .../selftests/net/lib/xdp_native.bpf.c       | 540 ++++++++++++++
 4 files changed, 1215 insertions(+), 1 deletion(-)
 create mode 100755 tools/testing/selftests/drivers/net/xdp.py
 create mode 100644 tools/testing/selftests/net/lib/xdp_native.bpf.c

-- 
2.47.1
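The pass criterion for the head/tail adjustment tests ("at least one adjustment value works for at least one packet size") amounts to a nested any(). The sketch below shows that rule in isolation. The function name, the result layout and the sample numbers are assumptions made for the example and are not taken from xdp.py.

```python
def adjustment_test_passes(results):
    """results maps packet size -> {adjustment offset: whether it worked}.

    Pass if any offset worked for any packet size.
    """
    return any(
        worked
        for per_offset in results.values()
        for worked in per_offset.values()
    )


# Hypothetical outcome: a driver that only supports small adjustments.
driver_results = {
    64: {32: True, 256: False, 4096: False},
    1500: {32: True, 256: True, 4096: False},
}
nothing_works = {64: {32: False}, 1500: {32: False}}

print(adjustment_test_passes(driver_results))
print(adjustment_test_passes(nothing_works))
```

Using such a lenient rule lets drivers with different maximum adjustment offsets all pass, while a driver on which no adjustment works at all still fails.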

2 days, 9 hours

2
6

[PATCH v6 0/8] use per-vma locks for /proc/pid/maps reads and PROCMAP_QUERY

by Suren Baghdasaryan

Reading /proc/pid/maps requires read-locking mmap_lock, which prevents any other task from concurrently modifying the address space. This guarantees coherent reporting of virtual address ranges; however, it can block important updates from happening. Oftentimes /proc/pid/maps readers are low-priority monitoring tasks, and when they block high-priority tasks the result is priority inversion.

Locking the entire address space is required to present a fully coherent picture of the address space; however, even the current implementation does not strictly guarantee that, because it outputs vmas in page-size chunks and drops mmap_lock in between each chunk. Address space modifications are possible while mmap_lock is dropped, and userspace reading the content is expected to deal with possible concurrent address space modifications. Considering these relaxed rules, holding mmap_lock is not strictly needed as long as we can guarantee that a concurrently modified vma is reported either in its original form or after it was modified.

This patchset switches from holding mmap_lock while reading /proc/pid/maps to taking per-vma locks as we walk the vma tree. This reduces the contention with tasks modifying the address space because they would have to contend for the same vma as opposed to the entire address space. The same is done for the PROCMAP_QUERY ioctl, which locks only the vma that fell into the requested range instead of the entire address space.

The previous version of this patchset [1] tried to perform /proc/pid/maps reading under RCU; however, its implementation is quite complex and the results are worse than the new version because it still relied on mmap_lock speculation, which retries if any part of the address space gets modified. The new implementation is both simpler and results in less contention. Note that a similar approach would not work for /proc/pid/smaps reading as it also walks the page table and that's not RCU-safe.

Paul McKenney designed a test [2] to measure mmap/munmap latencies while concurrently reading /proc/pid/maps. The test has a pair of processes scanning /proc/PID/maps, and another process unmapping and remapping 4K pages from a 128MB range of anonymous memory. At the end of each 10 second run, the latency of each mmap() or munmap() operation is measured, and for each run the maximum and mean latency is printed. The map/unmap process is started first, its PID is passed to the scanners, and then the map/unmap process waits until both scanners are running before starting its timed test. The scanners keep scanning until the specified /proc/PID/maps file disappears.

This test registered close to 10x improvement in update latencies:

Before the change:
./run-proc-vs-map.sh --nsamples 100 --rawdata -- --busyduration 2
    0.011     0.008     0.455
    0.011     0.008     0.472
    0.011     0.008     0.535
    0.011     0.009     0.545
...
    0.011     0.014     2.875
    0.011     0.014     2.913
    0.011     0.014     3.007
    0.011     0.015     3.018

After the change:
./run-proc-vs-map.sh --nsamples 100 --rawdata -- --busyduration 2
    0.006     0.005     0.036
    0.006     0.005     0.039
    0.006     0.005     0.039
    0.006     0.005     0.039
...
    0.006     0.006     0.403
    0.006     0.006     0.474
    0.006     0.006     0.479
    0.006     0.006     0.498

The patchset also adds a number of tests to check for /proc/pid/maps data coherency. They are designed to detect any unexpected data tearing while performing some common address space modifications (vma split, resize and remap). Even before these changes, reading /proc/pid/maps might have returned inconsistent data because the file is read page-by-page with mmap_lock being dropped between the pages. An example of user-visible inconsistency is the same vma being printed twice: once before it was modified and then again after the modifications. For example, if a vma was extended, it might be found and reported twice. What is not expected is to see a gap where there should have been a vma both before and after modification.
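
The distinction drawn above, between a tolerated duplicate report and an unexpected gap, can be illustrated with a small Python sketch that works on /proc/pid/maps-style text. This is not how proc-maps-race.c in the series is implemented; the helper names `parse_maps` and `find_gaps` and the sample addresses are hypothetical and only illustrate the idea.

```python
from typing import List, Tuple

Range = Tuple[int, int]


def parse_maps(text: str) -> List[Range]:
    """Extract (start, end) address pairs from /proc/pid/maps-style text."""
    ranges = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # The first field looks like "7f0000000000-7f0000001000".
        start, end = line.split()[0].split("-")
        ranges.append((int(start, 16), int(end, 16)))
    return ranges


def find_gaps(ranges: List[Range], lo: int, hi: int) -> List[Range]:
    """Return the parts of [lo, hi) that no reported range covers.

    Overlapping or duplicated ranges are tolerated; only holes are reported.
    """
    gaps = []
    cursor = lo
    for start, end in sorted(ranges):
        if end <= cursor:
            continue  # entirely below what is already covered
        if start >= hi:
            break     # entirely above the region of interest
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < hi:
        gaps.append((cursor, hi))
    return gaps


# An extended vma reported twice: tolerated, no gap.
duplicated = """\
7f0000000000-7f0000001000 rw-p 00000000 00:00 0
7f0000000000-7f0000002000 rw-p 00000000 00:00 0
"""
print(find_gaps(parse_maps(duplicated), 0x7F0000000000, 0x7F0000002000))

# A hole where a vma should have been: reported as a gap.
torn = """\
7f0000000000-7f0000001000 rw-p 00000000 00:00 0
7f0000002000-7f0000003000 rw-p 00000000 00:00 0
"""
print(find_gaps(parse_maps(torn), 0x7F0000000000, 0x7F0000003000))
```

A checker along these lines needs to know which range must stay mapped throughout the test, which is why `lo` and `hi` are passed in explicitly.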

This patchset increases the chances of such tearing, therefore it's even more important now to test for unexpected inconsistencies.

In [3] Lorenzo identified the following possible vma merging/splitting scenarios:

Merges with changes to existing vmas:
1. Merge both - mapping a vma over another one and between two vmas which can be merged after this replacement;
2. Merge left full - mapping a vma at the end of an existing one and completely over its right neighbor;
3. Merge left partial - mapping a vma at the end of an existing one and partially over its right neighbor;
4. Merge right full - mapping a vma before the start of an existing one and completely over its left neighbor;
5. Merge right partial - mapping a vma before the start of an existing one and partially over its left neighbor;

Merges without changes to existing vmas:
6. Merge both - mapping a vma into a gap between two vmas which can be merged after the insertion;
7. Merge left - mapping a vma at the end of an existing one;
8. Merge right - mapping a vma before the start of an existing one;

Splits:
9. Split with new vma at the lower address;
10. Split with new vma at the higher address;

If such merges or splits happen concurrently with the /proc/pid/maps reading, we might report a vma twice, once before the modification and once after it is modified:

Case 1 might report the overwritten and previous vma along with the final merged vma;
Case 2 might report the previous and the final merged vma;
Case 3 might cause us to retry once we detect the temporary gap caused by shrinking of the right neighbor;
Case 4 might report the overwritten and the final merged vma;
Case 5 might cause us to retry once we detect the temporary gap caused by shrinking of the left neighbor;
Case 6 might report the previous vma and the gap along with the final merged vma;
Case 7 might report the previous and the final merged vma;
Case 8 might report the original gap and the final merged vma covering the gap;
Case 9 might cause us to retry once we detect the temporary gap caused by shrinking of the original vma at the vma start;
Case 10 might cause us to retry once we detect the temporary gap caused by shrinking of the original vma at the vma end;

In all these cases the retry mechanism prevents us from reporting possible temporary gaps.

Changes since v5 [4]:
- Made /proc/pid/maps tearing test a separate selftest, per Alexey Dobriyan
- Changed asserts with or'ed conditions into separate ones, per Alexey Dobriyan
- Added a small cleanup patch [6/8] to avoid unnecessary seq_file position type casting
- Removed unnecessary is_sentinel_pos() helper
- Changed titles to use fs/proc/task_mmu instead of mm/maps prefix, per David Hildenbrand
- Included Lorenzo's fix for mmap lock assertion in anon_vma_name()
- Reworked the last patch to avoid allocation in the rcu read section, which replaces Jeongjun Park's fix

!!! NOTES FOR APPLYING THE PATCHSET !!!

Applies cleanly over mm-unstable after reverting old version with fixes.
The following patches should be reverted before applying this patchset:

b33ce1be8a40 ("selftests/proc: add /proc/pid/maps tearing from vma split test")
b538e0580fd6 ("selftests/proc: extend /proc/pid/maps tearing test to include vma resizing")
4996b4409cc6 ("selftests/proc: extend /proc/pid/maps tearing test to include vma remapping")
c39471f78d5e ("selftests/proc: test PROCMAP_QUERY ioctl while vma is concurrently modified")
487570f548f3 ("selftests/proc: add verbose more for tests to facilitate debugging")
e1ba4969cba1 ("mm/maps: read proc/pid/maps under per-vma lock")
ecb110179e77 ("mm/madvise: fixup stray mmap lock assert in anon_vma_name()")
6772c457a865 ("fs/proc/task_mmu:: execute PROCMAP_QUERY ioctl under per-vma locks")
d5c67bb2c5fb ("mm/maps: move kmalloc() call location in do_procmap_query() out of RCU critical section")

[1] https://lore.kernel.org/all/20250418174959.1431962-1-surenb@google.com/
[2] https://github.com/paulmckrcu/proc-mmap_sem-test
[3] https://lore.kernel.org/all/e1863f40-39ab-4e5b-984a-c48765ffde1c@lucifer.lo…
[4] https://lore.kernel.org/all/20250624193359.3865351-1-surenb@google.com/

Suren Baghdasaryan (8):
  selftests/proc: add /proc/pid/maps tearing from vma split test
  selftests/proc: extend /proc/pid/maps tearing test to include vma resizing
  selftests/proc: extend /proc/pid/maps tearing test to include vma remapping
  selftests/proc: test PROCMAP_QUERY ioctl while vma is concurrently modified
  selftests/proc: add verbose more for tests to facilitate debugging
  fs/proc/task_mmu: remove conversion of seq_file position to unsigned
  fs/proc/task_mmu: read proc/pid/maps under per-vma lock
  fs/proc/task_mmu: execute PROCMAP_QUERY ioctl under per-vma locks

 fs/proc/internal.h                            |   5 +
 fs/proc/task_mmu.c                            | 188 +++-
 include/linux/mmap_lock.h                     |  11 +
 mm/madvise.c                                  |   3 +-
 mm/mmap_lock.c                                |  88 ++
 tools/testing/selftests/proc/.gitignore       |   1 +
 tools/testing/selftests/proc/Makefile         |   1 +
 tools/testing/selftests/proc/proc-maps-race.c | 829 ++++++++++++++++++
 8 files changed, 1098 insertions(+), 28 deletions(-)
 create mode 100644 tools/testing/selftests/proc/proc-maps-race.c
--
2.50.0.727.gbf7dc18ff4-goog

2 days, 10 hours

6
46
