Linux-kselftest-mirror - lists.linaro.org

[PATCH v6 0/7] Buddy allocator like (or non-uniform) folio split

by Zi Yan

Hi all,

This patchset adds a new buddy allocator like (or non-uniform) large folio split to reduce the total number of after-split folios and the amount of memory needed for multi-index xarray split, and to keep more large folios after a split. It is on top of mm-everything-2025-02-01-05-58. It is ready to be merged.

Instead of duplicating existing split_huge_page*() code, __folio_split() is introduced as the shared backend code for both split_huge_page_to_list_to_order() and folio_split(). __folio_split() can support both uniform split and buddy allocator like (or non-uniform) split. All existing split_huge_page*() users can be gradually converted to use folio_split() if possible. In this patchset, I converted truncate_inode_partial_folio() to use folio_split().

xfstests quick group passed for both tmpfs and xfs.

Changelog
===
From V5[7]:
1. The patches for splitting shmem to any lower order are in the mm tree, so they are dropped from this series.
2. Rename split_folio_at() to try_folio_split() to clarify that non-uniform split will not be used if it is not supported.

From V4[6]:
1. Enabled shmem support in both uniform and buddy allocator like split and added selftests for it.
2. Added functions to check if uniform split and buddy allocator like split are supported for the given folio and order.
3. Made truncate fall back to uniform split if buddy allocator split is not supported (CONFIG_READ_ONLY_THP_FOR_FS and FS without large folio).
4. Added the missing folio_clear_has_hwpoisoned() to __split_unmapped_folio().

From V3[5]:
1. Used xas_split_alloc(GFP_NOWAIT) instead of xas_nomem(), since extra operations inside xas_split_alloc() are needed for correctness.
2. Enabled folio_split() for shmem and no issue was found with xfstests quick test group.
3. Split both ends of a truncate range in truncate_inode_partial_folio() to avoid wasting memory in shmem truncate (per David Hildenbrand).
4. Removed page_in_folio_offset() since page_folio() does the same thing.
5. Finished truncate related tests from xfstests quick test group on XFS and tmpfs without issues.
6. Disabled buddy allocator like split on CONFIG_READ_ONLY_THP_FOR_FS and FS without large folio. This check was missed in the prior versions.

From V2[3]:
1. Incorporated all the feedback from Kirill[4].
2. Used GFP_NOWAIT for xas_nomem().
3. Tested the code path when xas_nomem() fails.
4. Added selftests for folio_split().
5. Fixed the build error when THP is not configured.

From V1[2]:
1. Split the original patch 1 into multiple ones for easy review (per Kirill).
2. Added xas_destroy() to avoid memory leak.
3. Fixed nr_dropped not used error (per kernel test robot).
4. Added proper error handling when xas_nomem() fails to allocate memory for xas_split() during buddy allocator like split.

From RFC[1]:
1. Merged backend code of split_huge_page_to_list_to_order() and folio_split(). The same code is used for both uniform split and buddy allocator like split.
2. Use xas_nomem() instead of xas_split_alloc() for folio_split().
3. folio_split() now leaves the first after-split folio locked, instead of the one containing the given page, since the caller of truncate_inode_partial_folio() locks and unlocks the first folio.
4. Extended split_huge_page debugfs to use folio_split().
5. Added truncate_inode_partial_folio() as first user of folio_split().

Design
===
folio_split() splits a large folio in the same way as the buddy allocator splits a large free page for allocation. The purpose is to minimize the number of folios after the split. For example, if a user wants to free the 3rd subpage in an order-9 folio, folio_split() will split the order-9 folio as:

O-0, O-0, O-0, O-0, O-2, O-3, O-4, O-5, O-6, O-7, O-8 if it is anon (anon folios do not support order-1 yet);
O-1, O-0, O-0, O-2, O-3, O-4, O-5, O-6, O-7, O-8 if it is pagecache.

The split process is similar to the existing approach:
1. Unmap all page mappings (split PMD mappings if they exist);
2. Split metadata like memcg, page owner, page alloc tag;
3. Copy metadata in struct folio to sub pages, but instead of splitting the whole folio into multiple smaller ones with the same order in one shot, this approach splits the folio iteratively. Taking the example above, this approach first splits the original order-9 into two order-8, then splits the left order-8 into two order-7, and so on;
4. Post-process split folios, like writing mapping->i_pages for pagecache, adjusting folio refcounts, and adding split folios to the corresponding list;
5. Remap split folios;
6. Unlock split folios.

__split_unmapped_folio() and __split_folio_to_order() replace __split_huge_page() and __split_huge_page_tail() respectively. __split_unmapped_folio() uses different approaches to perform uniform split and buddy allocator like split:
1. uniform split: one single call to __split_folio_to_order() is used to uniformly split the given folio. All resulting folios are put back to the list after split. The folio containing the given page is left to the caller to unlock and the others are unlocked.
2. buddy allocator like (or non-uniform) split: (old_order - new_order) calls to __split_folio_to_order() are used, each splitting the current target folio from order N to order N-1. After each call, the target folio is changed to the one containing the page, which is given as a folio_split() parameter, and the folios not containing the page are put back to the list. The folio containing the page is put back to the list when its order is new_order. All folios are unlocked except the first folio, which is left to the caller to unlock.

Patch Overview
===
1. Patch 1 added __split_unmapped_folio() and __split_folio_to_order() to prepare for moving to new backend split code.
2. Patch 2 moved common code in split_huge_page_to_list_to_order() to __folio_split().
3. Patch 3 added new folio_split() and made split_huge_page_to_list_to_order() share the new __split_unmapped_folio() with folio_split().
4. Patch 4 removed no longer used __split_huge_page() and __split_huge_page_tail().
5. Patch 5 added a new in_folio_offset to split_huge_page debugfs for folio_split() test.
6. Patch 6 used try_folio_split() for truncate operation.
7. Patch 7 added folio_split() tests.

Any comments and/or suggestions are welcome. Thanks.

[1] https://lore.kernel.org/linux-mm/20241008223748.555845-1-ziy@nvidia.com/
[2] https://lore.kernel.org/linux-mm/20241028180932.1319265-1-ziy@nvidia.com/
[3] https://lore.kernel.org/linux-mm/20241101150357.1752726-1-ziy@nvidia.com/
[4] https://lore.kernel.org/linux-mm/e6ppwz5t4p4kvir6eqzoto4y5fmdjdxdyvxvtw43nc…
[5] https://lore.kernel.org/linux-mm/20241205001839.2582020-1-ziy@nvidia.com/
[6] https://lore.kernel.org/linux-mm/20250106165513.104899-1-ziy@nvidia.com/
[7] https://lore.kernel.org/linux-mm/20250116211042.741543-1-ziy@nvidia.com/

Zi Yan (7):
 mm/huge_memory: add two new (not yet used) functions for folio_split()
 mm/huge_memory: move folio split common code to __folio_split()
 mm/huge_memory: add buddy allocator like folio_split()
 mm/huge_memory: remove the old, unused __split_huge_page()
 mm/huge_memory: add folio_split() to debugfs testing interface.
 mm/truncate: use buddy allocator like folio split for truncate operation.
 selftests/mm: add tests for folio_split(), buddy allocator like split.

 include/linux/huge_mm.h | 36 +
 mm/huge_memory.c | 749 ++++++++++++------
 mm/truncate.c | 31 +-
 .../selftests/mm/split_huge_page_test.c | 34 +-
 4 files changed, 582 insertions(+), 268 deletions(-)

--
2.47.2
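The folio orders listed in the Design section can be reproduced with a small userspace model. This is a sketch, not the kernel implementation (__split_unmapped_folio() operates on real folios); buddy_split_orders() is a hypothetical helper that only tracks orders, and it assumes that the missing anon order-1 support can be modelled by splitting an order-1 buddy once more into two order-0 folios, which matches the anon example above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Model only, not kernel code: list the orders of the folios produced by a
 * buddy allocator like (non-uniform) split, in address order.
 *
 * old_order:    order of the folio being split
 * new_order:    order of the folio that ends up containing page_idx
 * page_idx:     index of the page of interest within the folio
 * allow_order1: false models anon folios, which cannot be order-1 yet
 * out/out_cap:  output array and its capacity
 *
 * Returns the number of resulting folios, or 0 if out_cap is too small.
 */
static size_t buddy_split_orders(unsigned int old_order, unsigned int new_order,
				 size_t page_idx, bool allow_order1,
				 unsigned int *out, size_t out_cap)
{
	unsigned int left[128], right[128];	/* generous bound per side */
	size_t nl = 0, nr = 0, n = 0;
	size_t start = 0;	/* first page of the block holding page_idx */

	assert(new_order < old_order && old_order < 8 * sizeof(size_t));
	assert(page_idx < ((size_t)1 << old_order));
	assert(allow_order1 || new_order != 1);

	/* One step per order: split the current block into two halves. */
	for (unsigned int order = old_order; order-- > new_order; ) {
		size_t half = (size_t)1 << order;
		bool in_right = page_idx >= start + half;
		/* The half without page_idx is final; record which side it is on. */
		unsigned int *side = in_right ? left : right;
		size_t *ns = in_right ? &nl : &nr;

		if (order == 1 && !allow_order1) {
			/* No order-1 folio allowed: emit two order-0 folios. */
			side[(*ns)++] = 0;
			side[(*ns)++] = 0;
		} else {
			side[(*ns)++] = order;
		}
		if (in_right)
			start += half;
	}

	if (nl + 1 + nr > out_cap)
		return 0;
	/* Left halves were produced from far to near, i.e. in address order. */
	for (size_t i = 0; i < nl; i++)
		out[n++] = left[i];
	out[n++] = new_order;	/* the folio containing page_idx */
	/* Right halves were produced from far to near, so emit them reversed. */
	while (nr > 0)
		out[n++] = right[--nr];
	return n;
}
```

For the example above (3rd subpage, i.e. page_idx = 2, of an order-9 folio split down to order 0), the model yields 1, 0, 0, 2, ..., 8 for pagecache and 0, 0, 0, 0, 2, ..., 8 for anon, and the page counts add up to 512 in both cases.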

[PATCH v4 0/2] scanf: convert self-test to KUnit

by Tamir Duberstein

This is one of just 3 remaining "Test Module" kselftests (the others being bitmap and printf), the rest having been converted to KUnit.

In addition to the enclosed patch, please consider this an RFC on the removal of the "Test Module" kselftest machinery.

I tested this using:

$ tools/testing/kunit/kunit.py run --arch arm64 --make_options LLVM=1 scanf

Signed-off-by: Tamir Duberstein <tamird(a)gmail.com>
---
Changes in v4:
- Bake `test` into various macros, greatly reducing diff noise.
- Revert control flow changes.
- Link to v3: https://lore.kernel.org/r/20250204-scanf-kunit-convert-v3-0-386d7c3ee714@gm…

Changes in v3:
- Reduce diff noise in lib/Makefile. (Petr Mladek)
- Split `scanf_test` into a few test cases. New output:

: =================== scanf (10 subtests) ====================
: [PASSED] numbers_simple
: ====================== numbers_list =======================
: [PASSED] delim=" "
: [PASSED] delim=":"
: [PASSED] delim=","
: [PASSED] delim="-"
: [PASSED] delim="/"
: ================== [PASSED] numbers_list ===================
: ============ numbers_list_field_width_typemax =============
: [PASSED] delim=" "
: [PASSED] delim=":"
: [PASSED] delim=","
: [PASSED] delim="-"
: [PASSED] delim="/"
: ======== [PASSED] numbers_list_field_width_typemax =========
: =========== numbers_list_field_width_val_width ============
: [PASSED] delim=" "
: [PASSED] delim=":"
: [PASSED] delim=","
: [PASSED] delim="-"
: [PASSED] delim="/"
: ======= [PASSED] numbers_list_field_width_val_width ========
: [PASSED] numbers_slice
: [PASSED] numbers_prefix_overflow
: [PASSED] test_simple_strtoull
: [PASSED] test_simple_strtoll
: [PASSED] test_simple_strtoul
: [PASSED] test_simple_strtol
: ====================== [PASSED] scanf ======================
: ============================================================
: Testing complete. Ran 22 tests: passed: 22
: Elapsed time: 5.517s total, 0.001s configuring, 5.440s building, 0.067s running

- Link to v2: https://lore.kernel.org/r/20250203-scanf-kunit-convert-v2-1-277a618d804e@gm…

Changes in v2:
- Rename lib/{test_scanf.c => scanf_kunit.c}. (Andy Shevchenko)
- Link to v1: https://lore.kernel.org/r/20250131-scanf-kunit-convert-v1-1-0976524f0eba@gm…

---
Tamir Duberstein (2):
 scanf: convert self-test to KUnit
 scanf: break kunit into test cases

 MAINTAINERS | 2 +-
 arch/m68k/configs/amiga_defconfig | 1 -
 arch/m68k/configs/apollo_defconfig | 1 -
 arch/m68k/configs/atari_defconfig | 1 -
 arch/m68k/configs/bvme6000_defconfig | 1 -
 arch/m68k/configs/hp300_defconfig | 1 -
 arch/m68k/configs/mac_defconfig | 1 -
 arch/m68k/configs/multi_defconfig | 1 -
 arch/m68k/configs/mvme147_defconfig | 1 -
 arch/m68k/configs/mvme16x_defconfig | 1 -
 arch/m68k/configs/q40_defconfig | 1 -
 arch/m68k/configs/sun3_defconfig | 1 -
 arch/m68k/configs/sun3x_defconfig | 1 -
 arch/powerpc/configs/ppc64_defconfig | 1 -
 lib/Kconfig.debug | 20 ++-
 lib/Makefile | 2 +-
 lib/{test_scanf.c => scanf_kunit.c} | 260 +++++++++++------------------------
 tools/testing/selftests/lib/Makefile | 2 +-
 tools/testing/selftests/lib/config | 1 -
 tools/testing/selftests/lib/scanf.sh | 4 -
 20 files changed, 149 insertions(+), 155 deletions(-)
---
base-commit: a86bf2283d2c9769205407e2b54777c03d012939
change-id: 20250131-scanf-kunit-convert-f70dc33bb34c

Best regards,
--
Tamir Duberstein <tamird(a)gmail.com>
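The delim=... subtests above are KUnit parameterized cases that run the same check once per delimiter. As a rough userspace illustration of that idea only (the real tests live in lib/scanf_kunit.c and use the KUnit API, which is not reproduced here), the hypothetical helper numbers_list_case() below joins a few numbers with a given delimiter and parses them back with sscanf(); the values and helper names are assumptions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Delimiters of the parameterized numbers_list cases in the output above. */
static const char *const delims[] = { " ", ":", ",", "-", "/" };

/*
 * Hypothetical userspace model of one parameterized case: join three
 * non-negative values with @delim, then parse them back with sscanf()
 * using a format string built from the same delimiter.
 * Returns 1 if the round trip reproduces the inputs, 0 otherwise.
 */
static int numbers_list_case(const char *delim)
{
	const int in[3] = { 1, 22, 333 };
	int out[3] = { 0, 0, 0 };
	char str[64], fmt[32];
	int len;

	len = snprintf(str, sizeof(str), "%d%s%d%s%d",
		       in[0], delim, in[1], delim, in[2]);
	assert(len > 0 && (size_t)len < sizeof(str));
	/* e.g. delim "-" gives the format "%d-%d-%d" */
	len = snprintf(fmt, sizeof(fmt), "%%d%s%%d%s%%d", delim, delim);
	assert(len > 0 && (size_t)len < sizeof(fmt));

	if (sscanf(str, fmt, &out[0], &out[1], &out[2]) != 3)
		return 0;
	return memcmp(in, out, sizeof(in)) == 0;
}

/* Run the case once per delimiter, as KUnit does per parameter. */
static size_t run_numbers_list(void)
{
	size_t passed = 0;

	for (size_t i = 0; i < sizeof(delims) / sizeof(delims[0]); i++)
		passed += numbers_list_case(delims[i]);
	return passed;
}
```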

[PATCH net-next v18 00/25] Introducing OpenVPN Data Channel Offload

by Antonio Quartulli

Notable changes since v17:
* fixed netdevice_tracker pointer assignment in netlink post_doit (triggered by kernel test robot on m68k)
* renamed nla_get_uint() to ovpn_nla_get_uint() in ovpn-cli.c to avoid clashes with libnl-3.11.0

FTR, here are the notable changes since v16:
* fixed usage of netdev tracker by removing the dev_tracker member from ovpn_priv and adding it to ovpn_peer and ovpn_socket, as those are the objects really holding a ref to the netdev
* switched ovpn_get_dev_from_attrs() to GFP_ATOMIC to prevent sleeping under rcu_read_lock
* allocated netdevice_tracker in ovpn_nl_pre_doit() [stored in user_ptr[1]] to keep track of the netdev reference held during netlink handler calls
* moved the whole socket detaching routine to a worker. This way the code is allowed to sleep and in turn it can be executed under lock_sock. This lock allows us to happily coordinate concurrent attach/detach calls. (note: the lock is acquired every time the refcnt for the socket is decremented, because this guarantees that setting the refcnt to 0 and detaching the socket will happen atomically)
* dropped kref_put_sock()/refcount handler as it's not required anymore, thanks to the point above
* re-arranged ovpn_socket_new() in order to simplify the error path by first allocating the new ovpn_sock and then attaching

Please note that some patches were already reviewed/tested by a few people. These patches have retained the tags as they have hardly been touched.

The latest code can also be found at:

https://github.com/OpenVPN/linux-kernel-ovpn

Thanks a lot!
Best Regards,

Antonio Quartulli
OpenVPN Inc.
---
Antonio Quartulli (25):
 net: introduce OpenVPN Data Channel Offload (ovpn)
 ovpn: add basic netlink support
 ovpn: add basic interface creation/destruction/management routines
 ovpn: keep carrier always on for MP interfaces
 ovpn: introduce the ovpn_peer object
 ovpn: introduce the ovpn_socket object
 ovpn: implement basic TX path (UDP)
 ovpn: implement basic RX path (UDP)
 ovpn: implement packet processing
 ovpn: store tunnel and transport statistics
 ipv6: export inet6_stream_ops via EXPORT_SYMBOL_GPL
 ovpn: implement TCP transport
 skb: implement skb_send_sock_locked_with_flags()
 ovpn: add support for MSG_NOSIGNAL in tcp_sendmsg
 ovpn: implement multi-peer support
 ovpn: implement peer lookup logic
 ovpn: implement keepalive mechanism
 ovpn: add support for updating local UDP endpoint
 ovpn: add support for peer floating
 ovpn: implement peer add/get/dump/delete via netlink
 ovpn: implement key add/get/del/swap via netlink
 ovpn: kill key and notify userspace in case of IV exhaustion
 ovpn: notify userspace when a peer is deleted
 ovpn: add basic ethtool support
 testing/selftests: add test tool and scripts for ovpn module

 Documentation/netlink/specs/ovpn.yaml | 372 +++
 Documentation/netlink/specs/rt_link.yaml | 16 +
 MAINTAINERS | 11 +
 drivers/net/Kconfig | 15 +
 drivers/net/Makefile | 1 +
 drivers/net/ovpn/Makefile | 22 +
 drivers/net/ovpn/bind.c | 55 +
 drivers/net/ovpn/bind.h | 101 +
 drivers/net/ovpn/crypto.c | 211 ++
 drivers/net/ovpn/crypto.h | 145 ++
 drivers/net/ovpn/crypto_aead.c | 382 ++++
 drivers/net/ovpn/crypto_aead.h | 33 +
 drivers/net/ovpn/io.c | 446 ++++
 drivers/net/ovpn/io.h | 34 +
 drivers/net/ovpn/main.c | 350 +++
 drivers/net/ovpn/main.h | 14 +
 drivers/net/ovpn/netlink-gen.c | 213 ++
 drivers/net/ovpn/netlink-gen.h | 41 +
 drivers/net/ovpn/netlink.c | 1183 ++++++++++
 drivers/net/ovpn/netlink.h | 18 +
 drivers/net/ovpn/ovpnstruct.h | 54 +
 drivers/net/ovpn/peer.c | 1269 +++++++++++
 drivers/net/ovpn/peer.h | 164 ++
 drivers/net/ovpn/pktid.c | 129 ++
 drivers/net/ovpn/pktid.h | 87 +
 drivers/net/ovpn/proto.h | 118 +
 drivers/net/ovpn/skb.h | 60 +
 drivers/net/ovpn/socket.c | 204 ++
 drivers/net/ovpn/socket.h | 49 +
 drivers/net/ovpn/stats.c | 21 +
 drivers/net/ovpn/stats.h | 47 +
 drivers/net/ovpn/tcp.c | 565 +++++
 drivers/net/ovpn/tcp.h | 33 +
 drivers/net/ovpn/udp.c | 421 ++++
 drivers/net/ovpn/udp.h | 22 +
 include/linux/skbuff.h | 2 +
 include/uapi/linux/if_link.h | 15 +
 include/uapi/linux/ovpn.h | 111 +
 include/uapi/linux/udp.h | 1 +
 net/core/skbuff.c | 18 +-
 net/ipv6/af_inet6.c | 1 +
 tools/testing/selftests/Makefile | 1 +
 tools/testing/selftests/net/ovpn/.gitignore | 2 +
 tools/testing/selftests/net/ovpn/Makefile | 17 +
 tools/testing/selftests/net/ovpn/config | 10 +
 tools/testing/selftests/net/ovpn/data64.key | 5 +
 tools/testing/selftests/net/ovpn/ovpn-cli.c | 2367 ++++++++++++++++++++
 tools/testing/selftests/net/ovpn/tcp_peers.txt | 5 +
 .../testing/selftests/net/ovpn/test-chachapoly.sh | 9 +
 tools/testing/selftests/net/ovpn/test-float.sh | 9 +
 tools/testing/selftests/net/ovpn/test-tcp.sh | 9 +
 tools/testing/selftests/net/ovpn/test.sh | 185 ++
 tools/testing/selftests/net/ovpn/udp_peers.txt | 5 +
 53 files changed, 9673 insertions(+), 5 deletions(-)
---
base-commit: 7d0da8f862340c5f42f0062b8560b8d0971a6ac4
change-id: 20241002-b4-ovpn-eeee35c694a2

Best regards,
--
Antonio Quartulli <antonio(a)openvpn.net>
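The v16 note about taking the lock on every refcount decrement can be illustrated with a small userspace model. This is not the ovpn code: struct sock_model and its helpers are hypothetical, a C11 spinlock stands in for lock_sock(), and the deferral of the detach to a worker (which lets the real code sleep) is left out. The point it shows is that "refcount reaches 0" and "socket is detached" happen inside one critical section, so a concurrent attach can never observe the state in between.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a socket wrapper shared by several users. */
struct sock_model {
	atomic_flag lock;	/* stands in for lock_sock(); a spinlock here */
	unsigned int refcnt;	/* only read or written with @lock held */
	bool attached;		/* true while the socket is attached */
};

static void model_lock(struct sock_model *s)
{
	while (atomic_flag_test_and_set_explicit(&s->lock, memory_order_acquire))
		;	/* spin until the holder releases the lock */
}

static void model_unlock(struct sock_model *s)
{
	atomic_flag_clear_explicit(&s->lock, memory_order_release);
}

static void sock_model_init(struct sock_model *s)
{
	atomic_flag_clear(&s->lock);
	s->refcnt = 0;
	s->attached = false;
}

/* Attach on first use, or just take another reference, under the lock. */
static void sock_model_get(struct sock_model *s)
{
	model_lock(s);
	if (s->refcnt++ == 0)
		s->attached = true;
	model_unlock(s);
}

/*
 * Drop a reference. Because every decrement holds the lock, reaching zero
 * and detaching are one atomic step with respect to sock_model_get().
 * Returns true if this call detached the socket.
 */
static bool sock_model_put(struct sock_model *s)
{
	bool detached = false;

	model_lock(s);
	assert(s->refcnt > 0);
	if (--s->refcnt == 0) {
		s->attached = false;
		detached = true;
	}
	model_unlock(s);
	return detached;
}
```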

[PATCHv2 net 0/2] bonding: fix incorrect mac address setting

by Hangbin Liu

The MAC address on the backup slave should be converted from the Solicited-Node multicast address, not from the bonding unicast target address.

v2: fix patch 01's subject

Hangbin Liu (2):
 bonding: fix incorrect MAC address setting to receive NS messages
 selftests: bonding: fix incorrect mac address

 drivers/net/bonding/bond_options.c | 4 +++-
 tools/testing/selftests/drivers/net/bonding/bond_options.sh | 4 ++--
 2 files changed, 5 insertions(+), 3 deletions(-)

--
2.46.0
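For reference, the standard derivation has two steps: the solicited-node multicast address is ff02::1:ff00:0/104 followed by the low 24 bits of the unicast address (RFC 4291), and an IPv6 multicast address maps to the Ethernet address 33:33 followed by its low 32 bits (RFC 2464). The userspace sketch below is not the bonding code, and its helper names are hypothetical (in the kernel, addrconf_addr_solict_mult() and ipv6_eth_mc_map() cover these two steps). It shows that mapping the unicast target directly gives a different MAC from mapping its solicited-node address.

```c
#include <stdint.h>
#include <string.h>

/* RFC 4291: ff02::1:ff00:0/104 plus the low 24 bits of the unicast address. */
static void ipv6_solicited_node(const uint8_t unicast[16], uint8_t snma[16])
{
	static const uint8_t prefix[13] = {
		0xff, 0x02, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0x01, 0xff
	};

	memcpy(snma, prefix, sizeof(prefix));
	memcpy(snma + 13, unicast + 13, 3);
}

/* RFC 2464: an IPv6 multicast address maps to 33:33 + its last 32 bits. */
static void ipv6_mcast_to_mac(const uint8_t mcast[16], uint8_t mac[6])
{
	mac[0] = 0x33;
	mac[1] = 0x33;
	memcpy(mac + 2, mcast + 12, 4);
}
```

For a target of 2001:db8::1234:5678, the solicited-node path gives 33:33:ff:34:56:78, whereas feeding the unicast address straight into the multicast mapping gives 33:33:12:34:56:78, which is not the address NS messages are sent to.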

[PATCH bpf-next v1] selftests/bpf: correct the check of join cgroup

by Jason Xing

Use ASSERT_OK_FD to check the return value of join cgroup, or else this test will pass even if the fd < 0. ASSERT_OK_FD can print the error message to the console.

Link: https://lore.kernel.org/all/6d62bd77-6733-40c7-b240-a1aeff55566c@linux.dev/
Suggested-by: Martin KaFai Lau <martin.lau(a)kernel.org>
Signed-off-by: Jason Xing <kerneljasonxing(a)gmail.com>
---
 tools/testing/selftests/bpf/prog_tests/setget_sockopt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/setget_sockopt.c b/tools/testing/selftests/bpf/prog_tests/setget_sockopt.c
index e12255121c15..e4dac529d424 100644
--- a/tools/testing/selftests/bpf/prog_tests/setget_sockopt.c
+++ b/tools/testing/selftests/bpf/prog_tests/setget_sockopt.c
@@ -202,7 +202,7 @@ static void test_nonstandard_opt(int family)
 void test_setget_sockopt(void)
 {
 	cg_fd = test__join_cgroup(CG_NAME);
-	if (cg_fd < 0)
+	if (!ASSERT_OK_FD(cg_fd, "join cgroup"))
 		return;
 
 	if (create_netns())
--
2.43.5
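The difference is easy to see in a small userspace model: with the old check, a failed join simply ends the test early without recording anything, so the run counts as a pass. The names below (test_failures, assert_ok_fd_model(), old_style(), new_style()) are hypothetical stand-ins, not the test_progs framework or its real ASSERT_OK_FD macro.

```c
#include <stdbool.h>
#include <stdio.h>

/* Stands in for the harness's record of failed checks (hypothetical). */
static int test_failures;

/*
 * Hypothetical stand-in for ASSERT_OK_FD(): report and count a failure
 * when fd is negative. Not the real test_progs macro.
 */
static bool assert_ok_fd_model(int fd, const char *name)
{
	if (fd < 0) {
		fprintf(stderr, "%s: unexpected fd %d\n", name, fd);
		test_failures++;
		return false;
	}
	return true;
}

/* Old pattern: a bad fd ends the test early and nothing is recorded. */
static void old_style(int cg_fd)
{
	if (cg_fd < 0)
		return;
	/* ... the rest of the test would run here ... */
}

/* New pattern: a bad fd is reported and counted before returning. */
static void new_style(int cg_fd)
{
	if (!assert_ok_fd_model(cg_fd, "join cgroup"))
		return;
	/* ... the rest of the test would run here ... */
}
```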

[PATCH bpf] bpf: skip non existing key in generic_map_lookup_batch

by Yan Zhai

generic_map_lookup_batch() currently returns EINTR if bpf_map_copy_value() keeps failing with ENOENT after several retries. The next batch would start from the same location, presuming it's a transient issue. This is incorrect if a map can actually have "holes", i.e. "get_next_key" can return a key that does not point to a valid value. At least the array of maps type may legitimately contain such holes. Right now, once these holes show up, generic batch lookup cannot proceed any more: it will always fail with EINTR errors.

Rather, do not retry in generic_map_lookup_batch. If it finds a non-existing element, skip to the next key.

Fixes: cb4d03ab499d ("bpf: Add generic support for lookup batch op")
Closes: https://lore.kernel.org/bpf/Z6JXtA1M5jAZx8xD@debian.debian/
Signed-off-by: Yan Zhai <yan(a)cloudflare.com>
---
 kernel/bpf/syscall.c | 16 ++----
 .../bpf/map_tests/map_in_map_batch_ops.c | 54 ++++++++++++++-----
 2 files changed, 45 insertions(+), 25 deletions(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index c420edbfb7c8..5691fc0d370d 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1979,7 +1979,7 @@ int generic_map_lookup_batch(struct bpf_map *map,
 	void __user *values = u64_to_user_ptr(attr->batch.values);
 	void __user *keys = u64_to_user_ptr(attr->batch.keys);
 	void *buf, *buf_prevkey, *prev_key, *key, *value;
-	int err, retry = MAP_LOOKUP_RETRIES;
+	int err;
 	u32 value_size, cp, max_count;
 
 	if (attr->batch.elem_flags & ~BPF_F_LOCK)
@@ -2026,14 +2026,8 @@ int generic_map_lookup_batch(struct bpf_map *map,
 		err = bpf_map_copy_value(map, key, value,
 					 attr->batch.elem_flags);
 
-		if (err == -ENOENT) {
-			if (retry) {
-				retry--;
-				continue;
-			}
-			err = -EINTR;
-			break;
-		}
+		if (err == -ENOENT)
+			goto next_key;
 
 		if (err)
 			goto free_buf;
@@ -2048,12 +2042,12 @@ int generic_map_lookup_batch(struct bpf_map *map,
 			goto free_buf;
 		}
 
+		cp++;
+next_key:
 		if (!prev_key)
 			prev_key = buf_prevkey;
 
 		swap(prev_key, key);
-		retry = MAP_LOOKUP_RETRIES;
-		cp++;
 		cond_resched();
 	}
 
diff --git a/tools/testing/selftests/bpf/map_tests/map_in_map_batch_ops.c b/tools/testing/selftests/bpf/map_tests/map_in_map_batch_ops.c
index 66191ae9863c..b38be71f06be 100644
--- a/tools/testing/selftests/bpf/map_tests/map_in_map_batch_ops.c
+++ b/tools/testing/selftests/bpf/map_tests/map_in_map_batch_ops.c
@@ -120,11 +120,12 @@ static void validate_fetch_results(int outer_map_fd,
 
 static void fetch_and_validate(int outer_map_fd,
 			       struct bpf_map_batch_opts *opts,
-			       __u32 batch_size, bool delete_entries)
+			       __u32 batch_size, bool delete_entries,
+			       bool has_holes)
 {
+	int err, max_entries = OUTER_MAP_ENTRIES - !!has_holes;
 	__u32 *fetched_keys, *fetched_values, total_fetched = 0;
 	__u32 batch_key = 0, fetch_count, step_size;
-	int err, max_entries = OUTER_MAP_ENTRIES;
 	__u32 value_size = sizeof(__u32);
 
 	/* Total entries needs to be fetched */
@@ -135,9 +136,9 @@ static void fetch_and_validate(int outer_map_fd,
 	      "error=%s\n", strerror(errno));
 
 	for (step_size = batch_size;
-	     step_size <= max_entries;
+	     step_size < max_entries + batch_size; /* allow read partial */
 	     step_size += batch_size) {
-		fetch_count = step_size;
+		fetch_count = batch_size;
 		err = delete_entries
 		      ? bpf_map_lookup_and_delete_batch(outer_map_fd,
 				      total_fetched ? &batch_key : NULL,
@@ -184,18 +185,19 @@ static void fetch_and_validate(int outer_map_fd,
 }
 
 static void _map_in_map_batch_ops(enum bpf_map_type outer_map_type,
-				  enum bpf_map_type inner_map_type)
+				  enum bpf_map_type inner_map_type,
+				  bool has_holes)
 {
+	__u32 max_entries = OUTER_MAP_ENTRIES - !!has_holes;
 	__u32 *outer_map_keys, *inner_map_fds;
-	__u32 max_entries = OUTER_MAP_ENTRIES;
 	LIBBPF_OPTS(bpf_map_batch_opts, opts);
 	__u32 value_size = sizeof(__u32);
 	int batch_size[2] = {5, 10};
 	__u32 map_index, op_index;
 	int outer_map_fd, ret;
 
-	outer_map_keys = calloc(max_entries, value_size);
-	inner_map_fds = calloc(max_entries, value_size);
+	outer_map_keys = calloc(OUTER_MAP_ENTRIES, value_size);
+	inner_map_fds = calloc(OUTER_MAP_ENTRIES, value_size);
 	CHECK((!outer_map_keys || !inner_map_fds),
 	      "Memory allocation failed for outer_map_keys or inner_map_fds",
 	      "error=%s\n", strerror(errno));
@@ -209,6 +211,24 @@ static void _map_in_map_batch_ops(enum bpf_map_type outer_map_type,
 			((outer_map_type == BPF_MAP_TYPE_ARRAY_OF_MAPS)
 			 ? 9 : 1000) - map_index;
 
+	/* This condition is only meaningful for array of maps.
+	 *
+	 * max_entries == OUTER_MAP_ENTRIES - 1 if it is true. Say
+	 * max_entries is short for n, then outer_map_keys looks like:
+	 *
+	 *   [n, n-1, ... 2, 1]
+	 *
+	 * We change it to
+	 *
+	 *   [n, n-1, ... 2, 0]
+	 *
+	 * So it will leave key 1 as a hole. It will serve to test the
+	 * correctness when batch on an array: a "non-exist" key might be
+	 * actually allocated and returned from key iteration.
+	 */
+	if (has_holes)
+		outer_map_keys[max_entries - 1]--;
+
 	/* batch operation - map_update */
 	ret = bpf_map_update_batch(outer_map_fd, outer_map_keys,
 				   inner_map_fds, &max_entries, &opts);
@@ -219,12 +239,14 @@ static void _map_in_map_batch_ops(enum bpf_map_type outer_map_type,
 	/* batch operation - map_lookup */
 	for (op_index = 0; op_index < 2; ++op_index)
 		fetch_and_validate(outer_map_fd, &opts,
-				   batch_size[op_index], false);
+				   batch_size[op_index], false,
+				   has_holes);
 
 	/* batch operation - map_lookup_delete */
 	if (outer_map_type == BPF_MAP_TYPE_HASH_OF_MAPS)
 		fetch_and_validate(outer_map_fd, &opts,
-				   max_entries, true /*delete*/);
+				   max_entries, true /*delete*/,
+				   has_holes);
 
 	/* close all map fds */
 	for (map_index = 0; map_index < max_entries; map_index++)
@@ -237,16 +259,20 @@ static void _map_in_map_batch_ops(enum bpf_map_type outer_map_type,
 
 void test_map_in_map_batch_ops_array(void)
 {
-	_map_in_map_batch_ops(BPF_MAP_TYPE_ARRAY_OF_MAPS, BPF_MAP_TYPE_ARRAY);
+	_map_in_map_batch_ops(BPF_MAP_TYPE_ARRAY_OF_MAPS, BPF_MAP_TYPE_ARRAY, false);
 	printf("%s:PASS with inner ARRAY map\n", __func__);
-	_map_in_map_batch_ops(BPF_MAP_TYPE_ARRAY_OF_MAPS, BPF_MAP_TYPE_HASH);
+	_map_in_map_batch_ops(BPF_MAP_TYPE_ARRAY_OF_MAPS, BPF_MAP_TYPE_HASH, false);
 	printf("%s:PASS with inner HASH map\n", __func__);
+	_map_in_map_batch_ops(BPF_MAP_TYPE_ARRAY_OF_MAPS, BPF_MAP_TYPE_ARRAY, true);
+	printf("%s:PASS with inner ARRAY map with holes\n", __func__);
+	_map_in_map_batch_ops(BPF_MAP_TYPE_ARRAY_OF_MAPS, BPF_MAP_TYPE_HASH, true);
+	printf("%s:PASS with inner HASH map with holes\n", __func__);
 }
 
 void test_map_in_map_batch_ops_hash(void)
 {
-	_map_in_map_batch_ops(BPF_MAP_TYPE_HASH_OF_MAPS, BPF_MAP_TYPE_ARRAY);
+	_map_in_map_batch_ops(BPF_MAP_TYPE_HASH_OF_MAPS, BPF_MAP_TYPE_ARRAY, false);
 	printf("%s:PASS with inner ARRAY map\n", __func__);
-	_map_in_map_batch_ops(BPF_MAP_TYPE_HASH_OF_MAPS, BPF_MAP_TYPE_HASH);
+	_map_in_map_batch_ops(BPF_MAP_TYPE_HASH_OF_MAPS, BPF_MAP_TYPE_HASH, false);
 	printf("%s:PASS with inner HASH map\n", __func__);
 }
--
2.30.2
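The behavioural change can be modelled in userspace with an array-like map in which some keys exist but have no value behind them. The sketch below is not kernel code: copy_value(), the two lookup loops, MODEL_ENTRIES and MODEL_RETRIES are hypothetical, and the retry count is arbitrary rather than the kernel's MAP_LOOKUP_RETRIES value. With the old loop, a hole is retried until the budget runs out and the batch fails with -EINTR; with the patched loop the hole is skipped.

```c
#include <errno.h>
#include <stddef.h>

#define MODEL_ENTRIES 8
#define MODEL_RETRIES 3	/* arbitrary; stands in for MAP_LOOKUP_RETRIES */

/* Hypothetical array-like map: a zero value marks a hole (key, no value). */
static int copy_value(const unsigned int *map, size_t key, unsigned int *value)
{
	if (map[key] == 0)
		return -ENOENT;	/* key iterates fine, but nothing is behind it */
	*value = map[key];
	return 0;
}

/* Old behaviour: retry the same key, then give up with -EINTR. */
static int lookup_batch_retry(const unsigned int *map, unsigned int *out,
			      size_t *count)
{
	size_t key = 0, cp = 0;
	int retry = MODEL_RETRIES;

	while (key < MODEL_ENTRIES && cp < *count) {
		/* Only 0 or -ENOENT is possible in this model. */
		if (copy_value(map, key, &out[cp]) == -ENOENT) {
			if (retry) {
				retry--;
				continue;	/* same key again: a hole never goes away */
			}
			*count = cp;
			return -EINTR;
		}
		cp++;
		key++;
		retry = MODEL_RETRIES;
	}
	*count = cp;
	return 0;
}

/* Patched behaviour: skip a missing element and move on to the next key. */
static int lookup_batch_skip(const unsigned int *map, unsigned int *out,
			     size_t *count)
{
	size_t cp = 0;

	for (size_t key = 0; key < MODEL_ENTRIES && cp < *count; key++) {
		if (copy_value(map, key, &out[cp]) == -ENOENT)
			continue;	/* hole: not counted, go to the next key */
		cp++;
	}
	*count = cp;
	return 0;
}
```

With map = {1, 2, 0, 4, 5, 6, 7, 8}, the retrying loop stops after two elements with -EINTR, while the skipping loop returns the seven values that exist.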

[PATCH v2] wireguard: selftests: Cleanup CONFIG_UBSAN_SANITIZE_ALL

by WangYuli

Commit 918327e9b7ff ("ubsan: Remove CONFIG_UBSAN_SANITIZE_ALL") removed the CONFIG_UBSAN_SANITIZE_ALL configuration option. Eliminate invalid configurations to improve code readability.

Reviewed-by: Simon Horman <horms(a)kernel.org>
Signed-off-by: WangYuli <wangyuli(a)uniontech.com>
---
Changelog:
 *v1->v2: Add Simon Horman's "Reviewed-by" tag.
---
 tools/testing/selftests/wireguard/qemu/debug.config | 1 -
 1 file changed, 1 deletion(-)

diff --git a/tools/testing/selftests/wireguard/qemu/debug.config b/tools/testing/selftests/wireguard/qemu/debug.config
index 139fd9aa8b12..828f14300d0a 100644
--- a/tools/testing/selftests/wireguard/qemu/debug.config
+++ b/tools/testing/selftests/wireguard/qemu/debug.config
@@ -22,7 +22,6 @@ CONFIG_HAVE_ARCH_KASAN=y
 CONFIG_KASAN=y
 CONFIG_KASAN_INLINE=y
 CONFIG_UBSAN=y
-CONFIG_UBSAN_SANITIZE_ALL=y
 CONFIG_DEBUG_KMEMLEAK=y
 CONFIG_DEBUG_STACK_USAGE=y
 CONFIG_DEBUG_SHIRQ=y
--
2.47.2

[PATCH net-next v4 0/3] netdev-genl: Add an xsk attribute to queues

by Joe Damato

Greetings:

Welcome to v4.

Small functional change, which makes the code cleaner (see changelog), and tests pass on my machine with mlx5 and netdevsim.

This is an attempt to follow up on something Jakub asked me about [1], adding an xsk attribute to queues and more clearly documenting which queues are linked to NAPIs...

After the RFC [2], Jakub suggested creating an empty nest for queues which have a pool, so I've adjusted this version to work that way.

The nest can be extended in the future to express attributes about XSK as needed. Queues which are not used for AF_XDP do not have the xsk attribute present.

I've run the included test on:
 - my mlx5 machine (via NETIF=)
 - without setting NETIF

And the test seems to pass in both cases.

Thanks,
Joe

[1]: https://lore.kernel.org/netdev/20250113143109.60afa59a@kernel.org/
[2]: https://lore.kernel.org/netdev/20250129172431.65773-1-jdamato@fastly.com/

v4:
 - Add patch 1, as suggested by Jakub, which adds an empty nest helper.
 - Use the helper in patch 2, which makes the code cleaner and prevents a possible bug.

v3: https://lore.kernel.org/netdev/20250204191108.161046-1-jdamato@fastly.com/
 - Change comment format in patch 2 to avoid kdoc warnings. No other changes.

v2: https://lore.kernel.org/all/20250203185828.19334-1-jdamato@fastly.com/
 - Switched from RFC to actual submission now that net-next is open
 - Adjusted patch 1 to include an empty nest as suggested by Jakub
 - Adjusted patch 2 to update the test based on changes to patch 1, and to incorporate some Python feedback from Jakub :)

rfc: https://lore.kernel.org/netdev/20250129172431.65773-1-jdamato@fastly.com/

Joe Damato (3):
 netlink: Add nla_put_empty_nest helper
 netdev-genl: Add an XSK attribute to queues
 selftests: drv-net: Test queue xsk attribute

 Documentation/netlink/specs/netdev.yaml | 13 ++-
 include/net/netlink.h | 15 ++++
 include/uapi/linux/netdev.h | 6 ++
 net/core/netdev-genl.c | 12 +++
 tools/include/uapi/linux/netdev.h | 6 ++
 .../testing/selftests/drivers/net/.gitignore | 2 +
 tools/testing/selftests/drivers/net/Makefile | 3 +
 tools/testing/selftests/drivers/net/queues.py | 35 +++++++-
 .../selftests/drivers/net/xdp_helper.c | 89 +++++++++++++++++++
 9 files changed, 178 insertions(+), 3 deletions(-)
 create mode 100644 tools/testing/selftests/drivers/net/.gitignore
 create mode 100644 tools/testing/selftests/drivers/net/xdp_helper.c

base-commit: f3eba8edd885db439f4bfaa2cf9d766bad1ae6c5
--
2.43.0
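On the wire, an empty nest is just a netlink attribute header with no payload. The userspace sketch below builds one by hand to show that layout; it is not the kernel's nla_put_empty_nest() (whose body is not shown in this cover letter). struct nla_hdr_model and the MODEL_* macros are a local mirror of the uapi struct nlattr header, alignment and NLA_F_NESTED flag (which nla_nest_start() also sets), and the attribute type numbers in the usage are arbitrary, not the real netdev queue attribute IDs.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the uapi struct nlattr header (linux/netlink.h). */
struct nla_hdr_model {
	uint16_t nla_len;	/* header + payload length, in bytes */
	uint16_t nla_type;	/* attribute type plus flag bits */
};

#define MODEL_NLA_F_NESTED	(1 << 15)
#define MODEL_NLA_ALIGN(len)	(((len) + 3U) & ~3U)
#define MODEL_NLA_HDRLEN	MODEL_NLA_ALIGN(sizeof(struct nla_hdr_model))

/*
 * Append a nest attribute that carries no payload: only the 4-byte
 * header, with the nested flag set. Returns 0, or -EMSGSIZE if full.
 */
static int put_empty_nest(unsigned char *buf, size_t cap, size_t *off,
			  uint16_t type)
{
	struct nla_hdr_model nla;

	if (*off > cap || cap - *off < MODEL_NLA_HDRLEN)
		return -EMSGSIZE;
	nla.nla_len = MODEL_NLA_HDRLEN;	/* header only: the nest is empty */
	nla.nla_type = type | MODEL_NLA_F_NESTED;
	memcpy(buf + *off, &nla, sizeof(nla));
	*off += MODEL_NLA_ALIGN(nla.nla_len);
	return 0;
}
```

A reader of such a message sees an attribute whose length equals the header size, which is enough to signal "this queue has an XSK pool" while leaving room to add nested attributes later.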

[PATCH RFT v14 0/8] fork: Support shadow stacks in clone3()

by Mark Brown

The kernel has recently added support for shadow stacks, currently x86 only using their CET feature, but both arm64 and RISC-V have equivalent features (GCS and Zicfiss respectively); I am actively working on GCS[1]. With shadow stacks the hardware maintains an additional stack containing only the return addresses for branch instructions, which is not generally writeable by userspace, and ensures that any returns are to the recorded addresses. This provides some protection against ROP attacks and makes it easier to collect call stacks. These shadow stacks are allocated in the address space of the userspace process.

Our API for shadow stacks does not currently offer userspace any flexibility for managing the allocation of shadow stacks for newly created threads; instead the kernel allocates a new shadow stack with the same size as the normal stack whenever a thread is created with the feature enabled. The stacks allocated in this way are freed by the kernel when the thread exits or shadow stacks are disabled for the thread. This lack of flexibility and control isn't ideal: in the vast majority of cases the shadow stack will be over-allocated, and the implicit allocation and deallocation is not consistent with other interfaces. As far as I can tell the interface is done in this manner mainly because the shadow stack patches were in development since before clone3() was implemented.

Since clone3() is readily extensible, let's add support for specifying a shadow stack when creating a new thread or process, keeping the current implicit allocation behaviour if one is not specified either with clone3() or through the use of clone(). The user must provide a shadow stack pointer; this must point to memory mapped for use as a shadow stack by map_shadow_stack() with an architecture-specified shadow stack token at the top of the stack.

Please note that the x86 portions of this code are build tested only; I don't appear to have a system that can run CET available to me.
[1] https://lore.kernel.org/linux-arm-kernel/20241001-arm64-gcs-v13-0-222b78d87…

Signed-off-by: Mark Brown <broonie(a)kernel.org>

---

Changes in v14:
- Rebase onto v6.14-rc1.
- Link to v13: https://lore.kernel.org/r/20241203-clone3-shadow-stack-v13-0-93b89a81a5ed@k…

Changes in v13:
- Rebase onto v6.13-rc1.
- Link to v12: https://lore.kernel.org/r/20241031-clone3-shadow-stack-v12-0-7183eb8bee17@k…

Changes in v12:
- Add the regular prctl() to the userspace API document since arm64 support is queued in -next.
- Link to v11: https://lore.kernel.org/r/20241005-clone3-shadow-stack-v11-0-2a6a2bd6d651@k…

Changes in v11:
- Rebase onto arm64 for-next/gcs, which is based on v6.12-rc1, and integrate arm64 support.
- Rework the interface to specify a shadow stack pointer rather than a base and size like we do for the regular stack.
- Link to v10: https://lore.kernel.org/r/20240821-clone3-shadow-stack-v10-0-06e8797b9445@k…

Changes in v10:
- Integrate fixes & improvements for the x86 implementation from Rick Edgecombe.
- Require that the shadow stack be VM_WRITE.
- Require that the shadow stack base and size be sizeof(void *) aligned.
- Clean up trailing newline.
- Link to v9: https://lore.kernel.org/r/20240819-clone3-shadow-stack-v9-0-962d74f99464@ke…

Changes in v9:
- Pull token validation earlier and report problems with an error return to parent rather than signal delivery to the child.
- Verify that the top of the supplied shadow stack is VM_SHADOW_STACK.
- Rework token validation to only do the page mapping once.
- Drop no longer needed support for testing for signals in selftest.
- Fix typo in comments.
- Link to v8: https://lore.kernel.org/r/20240808-clone3-shadow-stack-v8-0-0acf37caf14c@ke…

Changes in v8:
- Fix token verification with user specified shadow stack.
- Don't track user managed shadow stacks for child processes.
- Link to v7: https://lore.kernel.org/r/20240731-clone3-shadow-stack-v7-0-a9532eebfb1d@ke…

Changes in v7:
- Rebase onto v6.11-rc1.
- Typo fixes.
- Link to v6: https://lore.kernel.org/r/20240623-clone3-shadow-stack-v6-0-9ee7783b1fb9@ke…

Changes in v6:
- Rebase onto v6.10-rc3.
- Ensure we don't try to free the parent shadow stack in error paths of x86 arch code.
- Spelling fixes in userspace API document.
- Additional cleanups and improvements to the clone3() tests to support the shadow stack tests.
- Link to v5: https://lore.kernel.org/r/20240203-clone3-shadow-stack-v5-0-322c69598e4b@ke…

Changes in v5:
- Rebase onto v6.8-rc2.
- Rework ABI to have the user allocate the shadow stack memory with map_shadow_stack() and a token.
- Force inlining of the x86 shadow stack enablement.
- Move shadow stack enablement out into a shared header for reuse by other tests.
- Link to v4: https://lore.kernel.org/r/20231128-clone3-shadow-stack-v4-0-8b28ffe4f676@ke…

Changes in v4:
- Formatting changes.
- Use a define for minimum shadow stack size and move some basic validation to fork.c.
- Link to v3: https://lore.kernel.org/r/20231120-clone3-shadow-stack-v3-0-a7b8ed3e2acc@ke…

Changes in v3:
- Rebase onto v6.7-rc2.
- Remove stale shadow_stack in internal kargs.
- If a shadow stack is specified, unconditionally use it regardless of CLONE_ parameters.
- Force enable shadow stacks in the selftest.
- Update changelogs for RISC-V feature rename.
- Link to v2: https://lore.kernel.org/r/20231114-clone3-shadow-stack-v2-0-b613f8681155@ke…

Changes in v2:
- Rebase onto v6.7-rc1.
- Remove ability to provide preallocated shadow stack, just specify the desired size.
- Link to v1: https://lore.kernel.org/r/20231023-clone3-shadow-stack-v1-0-d867d0b5d4d0@ke…

---

Mark Brown (8):
arm64/gcs: Return a success value from gcs_alloc_thread_stack()
Documentation: userspace-api: Add shadow stack API documentation
selftests: Provide helper header for shadow stack testing
fork: Add shadow stack support to clone3()
selftests/clone3: Remove redundant flushes of output streams
selftests/clone3: Factor more of main loop into test_clone3()
selftests/clone3: Allow tests to flag if -E2BIG is a valid error code
selftests/clone3: Test shadow stack support

Documentation/userspace-api/index.rst | 1 +
Documentation/userspace-api/shadow_stack.rst | 44 +++++
arch/arm64/include/asm/gcs.h | 8 +-
arch/arm64/kernel/process.c | 8 +-
arch/arm64/mm/gcs.c | 62 +++++-
arch/x86/include/asm/shstk.h | 11 +-
arch/x86/kernel/process.c | 2 +-
arch/x86/kernel/shstk.c | 57 +++++-
include/asm-generic/cacheflush.h | 11 ++
include/linux/sched/task.h | 17 ++
include/uapi/linux/sched.h | 10 +-
kernel/fork.c | 96 +++++++--
tools/testing/selftests/clone3/clone3.c | 226 ++++++++++++++++++----
tools/testing/selftests/clone3/clone3_selftests.h | 65 ++++++-
tools/testing/selftests/ksft_shstk.h | 98 ++++++++++
15 files changed, 635 insertions(+), 81 deletions(-)

---

base-commit: 2014c95afecee3e76ca4a56956a936e23283f05b
change-id: 20231019-clone3-shadow-stack-15d40d2bf536

Best regards,

-- 
Mark Brown <broonie(a)kernel.org>

[PATCH bpf-next v9 0/5] xsk: TX metadata Launch Time support

by Song Yoong Siang

This series expands the XDP TX metadata framework to allow user applications to pass a per-packet 64-bit launch time directly to the kernel driver, requesting launch time hardware offload support. The XDP TX metadata framework will not perform any clock conversion or packet reordering.

Please note that the role of Tx metadata is just to pass the launch time, not to enable the offload feature. Users will need to enable the launch time hardware offload feature of the device by using the respective command, such as the tc-etf command. Although some devices use the tc-etf command to enable their launch time hardware offload feature, xsk packets will not go through the etf qdisc. Therefore, in my opinion, the launch time should always be based on the PTP Hardware Clock (PHC). Thus, I did not include a clock ID to indicate the clock source.

To simplify the test steps, I modified the xdp_hw_metadata bpf self-test tool so that it sets the launch time based on the offset provided by the user and the value of the Receive Hardware Timestamp, which is against the PHC. This eliminates the need to discipline the system clock with the PHC and then use clock_gettime() to get the time.

Please note that AF_XDP lacks a feedback mechanism to inform the application if the requested launch time is invalid. So, users are expected to be familiar with the launch time horizon of the device they use and not to request a launch time beyond that horizon. Otherwise, the driver might interpret the launch time incorrectly and react wrongly. For stmmac and igc, where modulo computation is used, a launch time beyond the horizon will cause the device to transmit the packet earlier than the requested launch time. Although there is no feedback mechanism for the launch time request for now, users can still check whether the requested launch time is working by requesting the Transmit Completion Hardware Timestamp.
V9:
- Remove the igc_desc_unused() checking (Maciej)
- Ensure that skb allocation and DMA mapping work before proceeding to fill in igc_tx_buffer info, context desc, and data desc (Maciej)
- Rate limit the error messages (Maciej)
- Update the comment to indicate that the 2 descriptors needed by the empty frame are already taken into consideration (Maciej)
- Handle the case where the insertion of an empty frame fails and explain the reason behind it (Maciej)
- put self SOB tag as last tag (Maciej)

V8: https://lore.kernel.org/netdev/20250205024116.798862-1-yoong.siang.song@int…
- check the number of used descriptors in xsk_tx_metadata_request() by using used_desc of struct igc_metadata_request, and then decrease the budget by it (Maciej)
- submit another bug fix patch to set the buffer type for empty frame (Maciej): https://lore.kernel.org/netdev/20250205023603.798819-1-yoong.siang.song@int…

V7: https://lore.kernel.org/netdev/20250204004907.789330-1-yoong.siang.song@int…
- split the refactoring code of igc empty packet insertion into a separate commit (Faizal)
- add explanation on why the value "4" is used as igc transmit budget (Faizal)
- perform a stress test by sending 1000 packets with 10ms interval and launch time set to 500us in the future (Faizal & Yong Liang)

V6: https://lore.kernel.org/netdev/20250116155350.555374-1-yoong.siang.song@int…
- fix selftest build errors by using asprintf() and realloc(), instead of managing the buffer sizes manually (Daniel, Stanislav)

V5: https://lore.kernel.org/netdev/20250114152718.120588-1-yoong.siang.song@int…
- change netdev feature name from tx-launch-time to tx-launch-time-fifo to explicitly state the FIFO behaviour (Stanislav)
- improve the looping of xdp_hw_metadata app to wait for packet tx completion to be more readable by using clock_gettime() (Stanislav)
- add launch time setup steps into xdp_hw_metadata app (Stanislav)

V4: https://lore.kernel.org/netdev/20250106135506.9687-1-yoong.siang.song@intel…
- added XDP launch time support to the igc driver (Jesper & Florian)
- added per-driver launch time limitation on xsk-tx-metadata.rst (Jesper)
- added explanation on FIFO behavior on xsk-tx-metadata.rst (Jakub)
- added step to enable launch time in the commit message (Jesper & Willem)
- explicitly documented the type of launch_time and which clock source it is against (Willem)

V3: https://lore.kernel.org/netdev/20231203165129.1740512-1-yoong.siang.song@in…
- renamed to use launch time (Jesper & Willem)
- changed the default launch time in xdp_hw_metadata apps from 1s to 0.1s because some NICs do not support such a large future time.

V2: https://lore.kernel.org/netdev/20231201062421.1074768-1-yoong.siang.song@in…
- renamed to use Earliest TxTime First (Willem)
- renamed to use txtime (Willem)

V1: https://lore.kernel.org/netdev/20231130162028.852006-1-yoong.siang.song@int…

Song Yoong Siang (5):
xsk: Add launch time hardware offload support to XDP Tx metadata
selftests/bpf: Add launch time request to xdp_hw_metadata
net: stmmac: Add launch time support to XDP ZC
igc: Refactor empty frame insertion for launch time support
igc: Add launch time support to XDP ZC

Documentation/netlink/specs/netdev.yaml | 4 +
Documentation/networking/xsk-tx-metadata.rst | 62 +++++++
drivers/net/ethernet/intel/igc/igc.h | 1 +
drivers/net/ethernet/intel/igc/igc_main.c | 141 +++++++++++----
drivers/net/ethernet/stmicro/stmmac/stmmac.h | 2 +
.../net/ethernet/stmicro/stmmac/stmmac_main.c | 13 ++
include/net/xdp_sock.h | 10 ++
include/net/xdp_sock_drv.h | 1 +
include/uapi/linux/if_xdp.h | 10 ++
include/uapi/linux/netdev.h | 3 +
net/core/netdev-genl.c | 2 +
net/xdp/xsk.c | 3 +
tools/include/uapi/linux/if_xdp.h | 10 ++
tools/include/uapi/linux/netdev.h | 3 +
tools/testing/selftests/bpf/xdp_hw_metadata.c | 168 +++++++++++++++++-
15 files changed, 394 insertions(+), 39 deletions(-)

-- 
2.34.1
