- Linux-kselftest-mirror - lists.linaro.org

[PATCH v6 0/3] printf: convert self-test to KUnit

by Tamir Duberstein

This is one of just 3 remaining "Test Module" kselftests (the others being bitmap and scanf), the rest having been converted to KUnit. I tested this using: $ tools/testing/kunit/kunit.py run --arch arm64 --make_options LLVM=1 printf I have also sent out a series converting scanf[0]. Link: https://lore.kernel.org/all/20250204-scanf-kunit-convert-v3-0-386d7c3ee714@… [0] Signed-off-by: Tamir Duberstein <tamird(a)gmail.com> --- Changes in v6: - Use __printf correctly on `__test`. (Petr Mladek) - Rebase on linux-next. - Remove leftover references to `printf.sh`. - Update comment in `hash_pointer`. (Petr Mladek) - Avoid overrun in `KUNIT_EXPECT_MEMNEQ`. (Petr Mladek) - Restore trailing newlines on printk strings and add some missing ones. (Petr Mladek) - Use `kunit_skip` on not-yet-initialized crng. (Petr Mladek) - Link to v5: https://lore.kernel.org/r/20250221-printf-kunit-convert-v5-0-5db840301730@g… Changes in v5: - Update `do_test` `__printf` annotation (Rasmus Villemoes). - Link to v4: https://lore.kernel.org/r/20250214-printf-kunit-convert-v4-0-c254572f1565@g… Changes in v4: - Add patch "implicate test line in failure messages". - Rebase on linux-next, move scanf_kunit.c into lib/tests/. - Link to v3: https://lore.kernel.org/r/20250210-printf-kunit-convert-v3-0-ee6ac5500f5e@g… Changes in v3: - Remove extraneous trailing newlines from failure messages. - Replace `pr_warn` with `kunit_warn`. - Drop arch changes. - Remove KUnit boilerplate from CONFIG_PRINTF_KUNIT_TEST help text. - Restore `total_tests` counting. - Remove tc_fail macro in last patch. - Link to v2: https://lore.kernel.org/r/20250207-printf-kunit-convert-v2-0-057b23860823@g… Changes in v2: - Incorporate code review from prior work[0] by Arpitha Raghunandan. - Link to v1: https://lore.kernel.org/r/20250204-printf-kunit-convert-v1-0-ecf1b846a4de@g… Link: https://lore.kernel.org/lkml/20200817043028.76502-1-98.arpi@gmail.com/t/#u [0] --- Tamir Duberstein (3): printf: convert self-test to KUnit printf: break kunit into test cases printf: implicate test line in failure messages Documentation/core-api/printk-formats.rst | 4 +- Documentation/dev-tools/kselftest.rst | 2 +- MAINTAINERS | 2 +- lib/Kconfig.debug | 12 +- lib/Makefile | 1 - lib/tests/Makefile | 1 + lib/{test_printf.c => tests/printf_kunit.c} | 442 ++++++++++++---------------- tools/testing/selftests/kselftest/module.sh | 2 +- tools/testing/selftests/lib/Makefile | 2 +- tools/testing/selftests/lib/config | 1 - tools/testing/selftests/lib/printf.sh | 4 - 11 files changed, 207 insertions(+), 266 deletions(-) --- base-commit: 7ec162622e66a4ff886f8f28712ea1b13069e1aa change-id: 20250131-printf-kunit-convert-fd4012aa2ec6 Best regards, -- Tamir Duberstein <tamird(a)gmail.com>

10 months

3
9
0 0

[PATCH v4 0/6] Fix some issues related to an interrupt type in pci_endpoint_test

by Kunihiko Hayashi

This series solves some issues about global "irq_type" that is used for indicating the current type for users. In addition, avoid an unexpected warning that occur due to interrupts remaining after displaying an error caused by devm_request_irq(). Patch 1 includes adding GET_IRQTYPE test (check for failure). Patch 2-4 include fixes for stable kernels that have global "irq_type". Patch 5-6 include improvements for the latest. Changes since v3: - Add GET_IRQTYPE check to pci_endpoint test in selftests - Add the reason why global variables aren't necessary (patch 5/6) - Add Reviewed-by: lines (patch {2, 4, 6}/6) Changes since v2: - Rebase to v6.14-rc1 - Update message to clarify, and add result of call trace (patch 1/5) - Add Reviewed-by: lines (patch 2/5) - Add new patch to remove global "irq_type" variable (patch 4/5) - Add new patch to replace "devm" version of IRQ functions (patch 5/5) Changes since v1: - Divide original patch into two - Add an error message example - Add "pcitest" display example - Add a patch to fix an interrupt remaining issue Kunihiko Hayashi (6): selftests: pci_endpoint: Add GET_IRQTYPE checks to each interrupt test misc: pci_endpoint_test: Avoid issue of interrupts remaining after request_irq error misc: pci_endpoint_test: Fix displaying irq_type after request_irq error misc: pci_endpoint_test: Fix irq_type to convey the correct type misc: pci_endpoint_test: Remove global 'irq_type' and 'no_msi' misc: pci_endpoint_test: Do not use managed irq functions drivers/misc/pci_endpoint_test.c | 31 +++++++------------ .../pci_endpoint/pci_endpoint_test.c | 11 ++++++- 2 files changed, 21 insertions(+), 21 deletions(-) -- 2.25.1

10 months

4
11
0 0

[PATCH v10 0/8] Buddy allocator like (or non-uniform) folio split

by Zi Yan

Hi all, This patchset adds a new buddy allocator like (or non-uniform) large folio split from a order-n folio to order-m with m < n. It reduces 1. the total number of after-split folios from 2^(n-m) to n-m+1; 2. the amount of memory needed for multi-index xarray split from 2^(n/6-m/6) to n/6-m/6, assuming XA_CHUNK_SHIFT=6; 3. keep more large folios after a split from all order-m folios to order-(n-1) to order-m folios. For example, to split an order-9 to order-0, folio split generates 10 (or 11 for anonymous memory) folios instead of 512, allocates 1 xa_node instead of 8, and leaves 1 order-8, 1 order-7, ..., 1 order-1 and 2 order-0 folios (or 4 order-0 for anonymous memory) instead of 512 order-0 folios. Instead of duplicating existing split_huge_page*() code, __folio_split() is introduced as the shared backend code for both split_huge_page_to_list_to_order() and folio_split(). __folio_split() can support both uniform split and buddy allocator like (or non-uniform) split. All existing split_huge_page*() users can be gradually converted to use folio_split() if possible. In this patchset, I converted truncate_inode_partial_folio() to use folio_split(). xfstests quick group passed for both tmpfs and xfs. I also semi-replicated Hugh's test[12] and ran it without any issue for almost 24 hours. It is on top of mm-everything-2025-03-07-07-55. It is ready to be merged. Changelog === From V9[13] 1. Incorporated Hugh's fixes[14] (Thanks Hugh): a) moved folio_set_order() in __split_folio_to_order() to be called only once for the input folio, b) used folio_test_swapcache() to catch both anon and shmem in swap cache cases, c) moved folio_next() out of for(;;), d) used mapping instead of origin_folio->mapping. 2. Added a TODO in __folio_split(), since large in-swap-cache shmem folio split is not supported yet. 3. Changed __split_folio_to_order() based on David Hildenbrand's MM owner tracking for large folios patchset[15], due to rebasing. From V8[11]: 1. Removed gfp parameter from xas_try_split() and GFP_NOWAIT is used all the time. (per Baolin Wang) 2. Used __xas_init_node_for_split() instead of __xas_alloc_node_for_split() and moved node allocation out. It fixed a bug when xa_node is pre-allocated by xas_nomem() before xas_try_split() is called without being initialized for split. From V7[9]: 1. Fixed a wrong function name in lib/test_xarray.c. 2. Made __split_folio_to_order() never fail, since the old order check is already done in __folio_split(). (per David Hildenbrand) 3. Fixed an issue reported by syzbot[10] by not dropping the original folio during truncate. 4. Fixed a WARNING when READ_ONLY_THP_FOR_FS is enabled. (Thank David Hildenbrand for reporting the issue) 5. Used two separate struct page* parameters, split_at and lock_at, to specify at which subpage the non-uniform split happens and which subpage to keep locked after the split, respectively. It improves code readability. From V6[8]: 1. Added an xarray function xas_try_split() to support iterative folio split, removing the need of using xas_split_alloc() and xas_split(). The function guarantees that at most one xa_node is allocated for each call. 2. Added concrete numbers of after-split folios and xa_node savings to cover letter, commit log. (per Andrew) From V5[7]: 1. Split shmem to any lower order patches are in mm tree, so dropped from this series. 2. Rename split_folio_at() to try_folio_split() to clarify that non-uniform split will not be used if it is not supported. From V4[6]: 1. Enabled shmem support in both uniform and buddy allocator like split and added selftests for it. 2. Added functions to check if uniform split and buddy allocator like split are supported for the given folio and order. 3. Made truncate fall back to uniform split if buddy allocator split is not supported (CONFIG_READ_ONLY_THP_FOR_FS and FS without large folio). 4. Added the missing folio_clear_has_hwpoisoned() to __split_unmapped_folio(). From V3[5]: 1. Used xas_split_alloc(GFP_NOWAIT) instead of xas_nomem(), since extra operations inside xas_split_alloc() are needed for correctness. 2. Enabled folio_split() for shmem and no issue was found with xfstests quick test group. 3. Split both ends of a truncate range in truncate_inode_partial_folio() to avoid wasting memory in shmem truncate (per David Hildenbrand). 4. Removed page_in_folio_offset() since page_folio() does the same thing. 5. Finished truncate related tests from xfstests quick test group on XFS and tmpfs without issues. 6. Disabled buddy allocator like split on CONFIG_READ_ONLY_THP_FOR_FS and FS without large folio. This check was missed in the prior versions. From V2[3]: 1. Incorporated all the feedback from Kirill[4]. 2. Used GFP_NOWAIT for xas_nomem(). 3. Tested the code path when xas_nomem() fails. 4. Added selftests for folio_split(). 5. Fixed no THP config build error. From V1[2]: 1. Split the original patch 1 into multiple ones for easy review (per Kirill). 2. Added xas_destroy() to avoid memory leak. 3. Fixed nr_dropped not used error (per kernel test robot). 4. Added proper error handling when xas_nomem() fails to allocate memory for xas_split() during buddy allocator like split. From RFC[1]: 1. Merged backend code of split_huge_page_to_list_to_order() and folio_split(). The same code is used for both uniform split and buddy allocator like split. 2. Use xas_nomem() instead of xas_split_alloc() for folio_split(). 3. folio_split() now leaves the first after-split folio unlocked, instead of the one containing the given page, since the caller of truncate_inode_partial_folio() locks and unlocks the first folio. 4. Extended split_huge_page debugfs to use folio_split(). 5. Added truncate_inode_partial_folio() as first user of folio_split(). Design === folio_split() splits a large folio in the same way as buddy allocator splits a large free page for allocation. The purpose is to minimize the number of folios after the split. For example, if user wants to free the 3rd subpage in a order-9 folio, folio_split() will split the order-9 folio as: O-0, O-0, O-0, O-0, O-2, O-3, O-4, O-5, O-6, O-7, O-8 if it is anon O-1, O-0, O-0, O-2, O-3, O-4, O-5, O-6, O-7, O-9 if it is pagecache Since anon folio does not support order-1 yet. The split process is similar to existing approach: 1. Unmap all page mappings (split PMD mappings if exist); 2. Split meta data like memcg, page owner, page alloc tag; 3. Copy meta data in struct folio to sub pages, but instead of spliting the whole folio into multiple smaller ones with the same order in a shot, this approach splits the folio iteratively. Taking the example above, this approach first splits the original order-9 into two order-8, then splits left part of order-8 to two order-7 and so on; 4. Post-process split folios, like write mapping->i_pages for pagecache, adjust folio refcounts, add split folios to corresponding list; 5. Remap split folios 6. Unlock split folios. __split_unmapped_folio() and __split_folio_to_order() replace __split_huge_page() and __split_huge_page_tail() respectively. __split_unmapped_folio() uses different approaches to perform uniform split and buddy allocator like split: 1. uniform split: one single call to __split_folio_to_order() is used to uniformly split the given folio. All resulting folios are put back to the list after split. The folio containing the given page is left to caller to unlock and others are unlocked. 2. buddy allocator like (or non-uniform) split: (old_order - new_order) calls to __split_folio_to_order() are used to split the given folio at order N to order N-1. After each call, the target folio is changed to the one containing the page, which is given as a folio_split() parameter. After each call, folios not containing the page are put back to the list. The folio containing the page is put back to the list when its order is new_order. All folios are unlocked except the first folio, which is left to caller to unlock. Patch Overview === 1. Patch 1 added a new xarray function xas_try_split() to perform iterative xarray split. 2. Patch 2 added __split_unmapped_folio() and __split_folio_to_order() to prepare for moving to new backend split code. 3. Patch 3 moved common code in split_huge_page_to_list_to_order() to __folio_split(). 4. Patch 4 added new folio_split() and made split_huge_page_to_list_to_order() share the new __split_unmapped_folio() with folio_split(). 5. Patch 5 removed no longer used __split_huge_page() and __split_huge_page_tail(). 6. Patch 6 added a new in_folio_offset to split_huge_page debugfs for folio_split() test. 7. Patch 7 used try_folio_split() for truncate operation. 8. Patch 8 added folio_split() tests. Any comments and/or suggestions are welcome. Thanks. [1] https://lore.kernel.org/linux-mm/20241008223748.555845-1-ziy@nvidia.com/ [2] https://lore.kernel.org/linux-mm/20241028180932.1319265-1-ziy@nvidia.com/ [3] https://lore.kernel.org/linux-mm/20241101150357.1752726-1-ziy@nvidia.com/ [4] https://lore.kernel.org/linux-mm/e6ppwz5t4p4kvir6eqzoto4y5fmdjdxdyvxvtw43nc… [5] https://lore.kernel.org/linux-mm/20241205001839.2582020-1-ziy@nvidia.com/ [6] https://lore.kernel.org/linux-mm/20250106165513.104899-1-ziy@nvidia.com/ [7] https://lore.kernel.org/linux-mm/20250116211042.741543-1-ziy@nvidia.com/ [8] https://lore.kernel.org/linux-mm/20250205031417.1771278-1-ziy@nvidia.com/ [9] https://lore.kernel.org/linux-mm/20250211155034.268962-1-ziy@nvidia.com/ [10] https://lore.kernel.org/all/67af65cb.050a0220.21dd3.004a.GAE@google.com/ [11] https://lore.kernel.org/linux-mm/20250218235012.1542225-1-ziy@nvidia.com/ [12] https://lore.kernel.org/linux-mm/D45D4F01-E5A5-47E6-8724-01610CC192CC@nvidi… [13] https://lore.kernel.org/linux-mm/20250226210032.2044041-1-ziy@nvidia.com/ [14] https://lore.kernel.org/linux-mm/2fae27fe-6e2e-3587-4b68-072118d80cf8@googl… [15] https://lore.kernel.org/all/20250303163014.1128035-4-david@redhat.com/ Zi Yan (8): xarray: add xas_try_split() to split a multi-index entry mm/huge_memory: add two new (not yet used) functions for folio_split() mm/huge_memory: move folio split common code to __folio_split() mm/huge_memory: add buddy allocator like (non-uniform) folio_split() mm/huge_memory: remove the old, unused __split_huge_page() mm/huge_memory: add folio_split() to debugfs testing interface mm/truncate: use folio_split() in truncate operation selftests/mm: add tests for folio_split(), buddy allocator like split Documentation/core-api/xarray.rst | 14 +- include/linux/huge_mm.h | 36 + include/linux/xarray.h | 6 + lib/test_xarray.c | 52 ++ lib/xarray.c | 132 ++- mm/huge_memory.c | 786 ++++++++++++------ mm/truncate.c | 37 +- tools/testing/radix-tree/Makefile | 1 + .../selftests/mm/split_huge_page_test.c | 34 +- 9 files changed, 809 insertions(+), 289 deletions(-) -- 2.47.2

10 months

2
16
0 0

[PATCH net 7/7] selftests: net: test for lwtunnel dst ref loops

by Justin Iurman

As recently specified by commit 0ea09cbf8350 ("docs: netdev: add a note on selftest posting") in net-next, the selftest is therefore shipped in this series. However, this selftest does not really test this series. It needs this series to avoid crashing the kernel. What it really tests, thanks to kmemleak, is what was fixed by the following commits: - commit c71a192976de ("net: ipv6: fix dst refleaks in rpl, seg6 and ioam6 lwtunnels") - commit 92191dd10730 ("net: ipv6: fix dst ref loops in rpl, seg6 and ioam6 lwtunnels") - commit c64a0727f9b1 ("net: ipv6: fix dst ref loop on input in seg6 lwt") - commit 13e55fbaec17 ("net: ipv6: fix dst ref loop on input in rpl lwt") - commit 0e7633d7b95b ("net: ipv6: fix dst ref loop in ila lwtunnel") - commit 5da15a9c11c1 ("net: ipv6: fix missing dst ref drop in ila lwtunnel") Cc: Shuah Khan <shuah(a)kernel.org> Cc: linux-kselftest(a)vger.kernel.org Signed-off-by: Justin Iurman <justin.iurman(a)uliege.be> --- tools/testing/selftests/net/Makefile | 1 + tools/testing/selftests/net/config | 2 + .../selftests/net/lwt_dst_cache_ref_loop.sh | 250 ++++++++++++++++++ 3 files changed, 253 insertions(+) create mode 100755 tools/testing/selftests/net/lwt_dst_cache_ref_loop.sh diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile index 73ee88d6b043..8f32b4f01aee 100644 --- a/tools/testing/selftests/net/Makefile +++ b/tools/testing/selftests/net/Makefile @@ -100,6 +100,7 @@ TEST_PROGS += vlan_bridge_binding.sh TEST_PROGS += bpf_offload.py TEST_PROGS += ipv6_route_update_soft_lockup.sh TEST_PROGS += busy_poll_test.sh +TEST_PROGS += lwt_dst_cache_ref_loop.sh # YNL files, must be before "include ..lib.mk" YNL_GEN_FILES := busy_poller netlink-dumps diff --git a/tools/testing/selftests/net/config b/tools/testing/selftests/net/config index 5b9baf708950..61e5116987f3 100644 --- a/tools/testing/selftests/net/config +++ b/tools/testing/selftests/net/config @@ -107,3 +107,5 @@ CONFIG_XFRM_INTERFACE=m CONFIG_XFRM_USER=m CONFIG_IP_NF_MATCH_RPFILTER=m CONFIG_IP6_NF_MATCH_RPFILTER=m +CONFIG_IPV6_ILA=m +CONFIG_IPV6_RPL_LWTUNNEL=y diff --git a/tools/testing/selftests/net/lwt_dst_cache_ref_loop.sh b/tools/testing/selftests/net/lwt_dst_cache_ref_loop.sh new file mode 100755 index 000000000000..9161f16154a5 --- /dev/null +++ b/tools/testing/selftests/net/lwt_dst_cache_ref_loop.sh @@ -0,0 +1,250 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0+ +# +# Author: Justin Iurman <justin.iurman(a)uliege.be> +# +# WARNING +# ------- +# This is just a dummy script that triggers encap cases with possible dst cache +# reference loops in affected lwt users (see list below). Some cases are +# pathological configurations for simplicity, others are valid. Overall, we +# don't want this issue to happen, no matter what. In order to catch any +# reference loops, kmemleak MUST be used. The results alone are always blindly +# successful, don't rely on them. Note that the following tests may crash the +# kernel if the fix to prevent lwtunnel_{input|output|xmit}() reentry loops is +# not present. +# +# Affected lwt users so far (please update accordingly if needed): +# - ila_lwt (output only) +# - ioam6_iptunnel (output only) +# - rpl_iptunnel (both input and output) +# - seg6_iptunnel (both input and output) + +source lib.sh + +check_compatibility() +{ + setup_ns tmp_node &>/dev/null + if [ $? != 0 ] + then + echo "SKIP: Cannot create netns." + exit $ksft_skip + fi + + ip link add name veth0 netns $tmp_node type veth \ + peer name veth1 netns $tmp_node &>/dev/null + local ret=$? + + ip -netns $tmp_node link set veth0 up &>/dev/null + ret=$((ret + $?)) + + ip -netns $tmp_node link set veth1 up &>/dev/null + ret=$((ret + $?)) + + if [ $ret != 0 ] + then + echo "SKIP: Cannot configure links." + cleanup_ns $tmp_node + exit $ksft_skip + fi + + lsmod 2>/dev/null | grep -q "ila" + ila_lsmod=$? + [ $ila_lsmod != 0 ] && modprobe ila &>/dev/null + + ip -netns $tmp_node route add 2001:db8:1::/64 \ + encap ila 1:2:3:4 csum-mode no-action ident-type luid hook-type output \ + dev veth0 &>/dev/null + + ip -netns $tmp_node route add 2001:db8:2::/64 \ + encap ioam6 trace prealloc type 0x800000 ns 0 size 4 dev veth0 &>/dev/null + + ip -netns $tmp_node route add 2001:db8:3::/64 \ + encap rpl segs 2001:db8:3::1 dev veth0 &>/dev/null + + ip -netns $tmp_node route add 2001:db8:4::/64 \ + encap seg6 mode inline segs 2001:db8:4::1 dev veth0 &>/dev/null + + ip -netns $tmp_node -6 route 2>/dev/null | grep -q "encap ila" + skip_ila=$? + + ip -netns $tmp_node -6 route 2>/dev/null | grep -q "encap ioam6" + skip_ioam6=$? + + ip -netns $tmp_node -6 route 2>/dev/null | grep -q "encap rpl" + skip_rpl=$? + + ip -netns $tmp_node -6 route 2>/dev/null | grep -q "encap seg6" + skip_seg6=$? + + cleanup_ns $tmp_node +} + +setup() +{ + setup_ns alpha beta gamma &>/dev/null + + ip link add name veth-alpha netns $alpha type veth \ + peer name veth-betaL netns $beta &>/dev/null + ip link add name veth-betaR netns $beta type veth \ + peer name veth-gamma netns $gamma &>/dev/null + + ip -netns $alpha link set veth-alpha name veth0 &>/dev/null + ip -netns $beta link set veth-betaL name veth0 &>/dev/null + ip -netns $beta link set veth-betaR name veth1 &>/dev/null + ip -netns $gamma link set veth-gamma name veth0 &>/dev/null + + ip -netns $alpha addr add 2001:db8:1::2/64 dev veth0 &>/dev/null + ip -netns $alpha link set veth0 up &>/dev/null + ip -netns $alpha link set lo up &>/dev/null + ip -netns $alpha route add 2001:db8:2::/64 \ + via 2001:db8:1::1 dev veth0 &>/dev/null + + ip -netns $beta addr add 2001:db8:1::1/64 dev veth0 &>/dev/null + ip -netns $beta addr add 2001:db8:2::1/64 dev veth1 &>/dev/null + ip -netns $beta link set veth0 up &>/dev/null + ip -netns $beta link set veth1 up &>/dev/null + ip -netns $beta link set lo up &>/dev/null + ip -netns $beta route del 2001:db8:2::/64 + ip -netns $beta route add 2001:db8:2::/64 dev veth1 + ip netns exec $beta sysctl -wq net.ipv6.conf.all.forwarding=1 &>/dev/null + + ip -netns $gamma addr add 2001:db8:2::2/64 dev veth0 &>/dev/null + ip -netns $gamma link set veth0 up &>/dev/null + ip -netns $gamma link set lo up &>/dev/null + ip -netns $gamma route add 2001:db8:1::/64 \ + via 2001:db8:2::1 dev veth0 &>/dev/null + + sleep 1 + + ip netns exec $alpha ping6 -c 5 -W 1 2001:db8:2::2 &>/dev/null + if [ $? != 0 ] + then + echo "SKIP: Setup failed." + cleanup + exit $ksft_skip + fi + + sleep 1 +} + +cleanup() +{ + cleanup_ns $alpha $beta $gamma + [ $ila_lsmod != 0 ] && modprobe -r ila &>/dev/null +} + +run_ila() +{ + if [ $skip_ila != 0 ] + then + echo "SKIP: ila (output)" + return + fi + + ip -netns $beta route del 2001:db8:2::/64 + + ip -netns $beta route add 2001:db8:2:0:0:0:0:2/128 \ + encap ila 2001:db8:2:0 csum-mode no-action ident-type luid hook-type output \ + dev veth1 &>/dev/null + sleep 1 + + echo "TEST: ila (output)" + ip netns exec $beta ping6 -c 2 -W 1 2001:db8:2::2 &>/dev/null + sleep 1 + + ip -netns $beta route del 2001:db8:2:0:0:0:0:2/128 + ip -netns $beta route add 2001:db8:2::/64 dev veth1 + sleep 1 +} + +run_ioam6() +{ + if [ $skip_ioam6 != 0 ] + then + echo "SKIP: ioam6 (output)" + return + fi + + ip -netns $beta route change 2001:db8:2::/64 \ + encap ioam6 trace prealloc type 0x800000 ns 1 size 4 \ + dev veth1 &>/dev/null + sleep 1 + + echo "TEST: ioam6 (output)" + ip netns exec $beta ping6 -c 2 -W 1 2001:db8:2::2 &>/dev/null + sleep 1 +} + +run_rpl() +{ + if [ $skip_rpl != 0 ] + then + echo "SKIP: rpl (input)" + echo "SKIP: rpl (output)" + return + fi + + ip -netns $beta route change 2001:db8:2::/64 \ + encap rpl segs 2001:db8:2::2 \ + dev veth1 &>/dev/null + sleep 1 + + echo "TEST: rpl (input)" + ip netns exec $alpha ping6 -c 2 -W 1 2001:db8:2::2 &>/dev/null + sleep 1 + + echo "TEST: rpl (output)" + ip netns exec $beta ping6 -c 2 -W 1 2001:db8:2::2 &>/dev/null + sleep 1 +} + +run_seg6() +{ + if [ $skip_seg6 != 0 ] + then + echo "SKIP: seg6 (input)" + echo "SKIP: seg6 (output)" + return + fi + + ip -netns $beta route change 2001:db8:2::/64 \ + encap seg6 mode inline segs 2001:db8:2::2 \ + dev veth1 &>/dev/null + sleep 1 + + echo "TEST: seg6 (input)" + ip netns exec $alpha ping6 -c 2 -W 1 2001:db8:2::2 &>/dev/null + sleep 1 + + echo "TEST: seg6 (output)" + ip netns exec $beta ping6 -c 2 -W 1 2001:db8:2::2 &>/dev/null + sleep 1 +} + +run() +{ + run_ila + run_ioam6 + run_rpl + run_seg6 +} + +if [ "$(id -u)" -ne 0 ] +then + echo "SKIP: Need root privileges." + exit $ksft_skip +fi + +if [ ! -x "$(command -v ip)" ] +then + echo "SKIP: Could not run test without ip tool." + exit $ksft_skip +fi + +check_compatibility +setup +run +cleanup + +exit $ksft_pass -- 2.34.1

10 months

2
1
0 0

Re: [PATCH 3/3] rust: replace `addr_of[_mut]!` with `&raw [mut]`

by Benno Lossin

On Thu Mar 13, 2025 at 6:33 AM CET, Antonio Hickey wrote: > Replacing all occurrences of `addr_of!(place)` with `&raw place`, and > all occurrences of `addr_of_mut!(place)` with `&raw mut place`. > > Utilizing the new feature will allow us to reduce macro complexity, and > improve consistency with existing reference syntax as `&raw`, `&raw mut` > is very similar to `&`, `&mut` making it fit more naturally with other > existing code. > > Depends on: Patch 1/3 0001-rust-enable-raw_ref_op-feature.patch This information shouldn't be in the commit message. You can put it below the `---` (that won't end up in the commit message). But since you sent this as part of a series, you don't need to mention it. > Suggested-by: Benno Lossin <y86-dev(a)protonmail.com> > Link: https://github.com/Rust-for-Linux/linux/issues/1148 > Signed-off-by: Antonio Hickey <contact(a)antoniohickey.com> > --- > rust/kernel/block/mq/request.rs | 4 ++-- > rust/kernel/faux.rs | 4 ++-- > rust/kernel/fs/file.rs | 2 +- > rust/kernel/init.rs | 8 ++++---- > rust/kernel/init/macros.rs | 28 +++++++++++++------------- > rust/kernel/jump_label.rs | 4 ++-- > rust/kernel/kunit.rs | 4 ++-- > rust/kernel/list.rs | 2 +- > rust/kernel/list/impl_list_item_mod.rs | 6 +++--- > rust/kernel/net/phy.rs | 4 ++-- > rust/kernel/pci.rs | 4 ++-- > rust/kernel/platform.rs | 4 +--- > rust/kernel/rbtree.rs | 22 ++++++++++---------- > rust/kernel/sync/arc.rs | 2 +- > rust/kernel/task.rs | 4 ++-- > rust/kernel/workqueue.rs | 8 ++++---- > 16 files changed, 54 insertions(+), 56 deletions(-) [...] > diff --git a/rust/kernel/jump_label.rs b/rust/kernel/jump_label.rs > index 4e974c768dbd..05d4564714c7 100644 > --- a/rust/kernel/jump_label.rs > +++ b/rust/kernel/jump_label.rs > @@ -20,8 +20,8 @@ > #[macro_export] > macro_rules! static_branch_unlikely { > ($key:path, $keytyp:ty, $field:ident) => {{ > - let _key: *const $keytyp = ::core::ptr::addr_of!($key); > - let _key: *const $crate::bindings::static_key_false = ::core::ptr::addr_of!((*_key).$field); > + let _key: *const $keytyp = &raw $key; This should be `&raw const $key`. I wrote that wrongly in the issue. > + let _key: *const $crate::bindings::static_key_false = &raw (*_key).$field; Same here. > let _key: *const $crate::bindings::static_key = _key.cast(); > > #[cfg(not(CONFIG_JUMP_LABEL))] > diff --git a/rust/kernel/kunit.rs b/rust/kernel/kunit.rs > index 824da0e9738a..18357dd782ed 100644 > --- a/rust/kernel/kunit.rs > +++ b/rust/kernel/kunit.rs > @@ -128,9 +128,9 @@ unsafe impl Sync for UnaryAssert {} > unsafe { > $crate::bindings::__kunit_do_failed_assertion( > kunit_test, > - core::ptr::addr_of!(LOCATION.0), > + &raw LOCATION.0, And here. > $crate::bindings::kunit_assert_type_KUNIT_ASSERTION, > - core::ptr::addr_of!(ASSERTION.0.assert), > + &raw ASSERTION.0.assert, Lastly here as well. --- Cheers, Benno

10 months

1
0
0 0

arm64: Kernel crash at devm_kmalloc include/linux/device.h: drivers/base/devres.c

by Naresh Kamboju

Regression on arm64 FVP and rock-pi-4-b while booting the Linux next-20250312 and next-20250313. the following crash noticed with KVM Kconfigs. First seen on next-20250312. Good: next-20250311 Bad: 6.14.0-rc6-next-20250312 and 6.14.0-rc6-next-20250313 Boot regression: arm64 devm_kmalloc rk_iommu_of_xlate kernel panic Reported-by: Linux Kernel Functional Testing <lkft(a)linaro.org> ## Boot log [ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd034] [ 0.000000] Linux version 6.14.0-rc6-next-20250313 (tuxmake@tuxmake) (aarch64-linux-gnu-gcc (Debian 13.3.0-12) 13.3.0, GNU ld (GNU Binutils for Debian) 2.44) #1 SMP PREEMPT @1741850074 [ 0.000000] KASLR disabled due to lack of seed [ 0.000000] Machine model: Radxa ROCK Pi 4B <trim> [ 1.028830] SuperH (H)SCI(F) driver initialized [ 1.030168] STM32 USART driver initialized [ 1.052086] Unable to handle kernel NULL pointer dereference at virtual address 00000000000002a0 [ 1.052877] Mem abort info: [ 1.053137] ESR = 0x0000000096000004 [ 1.053481] EC = 0x25: DABT (current EL), IL = 32 bits [ 1.053960] SET = 0, FnV = 0 [ 1.054241] EA = 0, S1PTW = 0 [ 1.054530] FSC = 0x04: level 0 translation fault [ 1.054971] Data abort info: [ 1.055236] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000 [ 1.055760] CM = 0, WnR = 0, TnD = 0, TagAccess = 0 [ 1.056216] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 [ 1.056693] [00000000000002a0] user address but active_mm is swapper [ 1.057264] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP [ 1.057821] Modules linked in: [ 1.058105] CPU: 4 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.14.0-rc6-next-20250313 #1 [ 1.058823] Hardware name: Radxa ROCK Pi 4B (DT) [ 1.059236] pstate: 00000005 (nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 1.059854] pc : devm_kmalloc (include/linux/device.h:805 drivers/base/devres.c:853) [ 1.060225] lr : rk_iommu_of_xlate (drivers/iommu/rockchip-iommu.c:1152) [ 1.060614] sp : ffff80008306b7c0 [ 1.060913] x29: ffff80008306b7c0 x28: ffff8000825a9068 x27: ffff80008225ce00 [ 1.061560] x26: 0000000000000001 x25: ffff80008225e188 x24: ffff000000eafa80 [ 1.062206] x23: 0000000000000000 x22: ffff800081af6a98 x21: 0000000000000000 [ 1.062850] x20: 0000000000000010 x19: 0000000000000000 x18: 00000000ffffffff [ 1.063493] x17: ffff0000019abc00 x16: ffff000001985400 x15: ffff80008306b820 [ 1.064136] x14: ffff000002736a1c x13: ffff00000273627c x12: 0101010101010101 [ 1.064780] x11: 7f7f7f7f7f7f7f7f x10: 000000000016f8a0 x9 : ffff800080b65870 [ 1.065424] x8 : ffff80008306b718 x7 : 0000000000000000 x6 : 0000000000000001 [ 1.066066] x5 : ffff800082890000 x4 : ffff8000828905f0 x3 : 0000000000000000 [ 1.066708] x2 : 0000000000000dc0 x1 : 0000000000000010 x0 : 0000000000000090 [ 1.067351] Call trace: [ 1.067576] devm_kmalloc (include/linux/device.h:805 drivers/base/devres.c:853) (P) [ 1.067937] rk_iommu_of_xlate (drivers/iommu/rockchip-iommu.c:1152) [ 1.068295] of_iommu_xlate (drivers/iommu/of_iommu.c:39) [ 1.068629] of_iommu_configure (drivers/iommu/of_iommu.c:71 drivers/iommu/of_iommu.c:98 drivers/iommu/of_iommu.c:149) [ 1.069008] of_dma_configure_id (drivers/of/device.c:161) [ 1.069392] platform_dma_configure (drivers/base/platform.c:1455) [ 1.069796] __iommu_probe_device (drivers/iommu/iommu.c:430 drivers/iommu/iommu.c:569) [ 1.070194] probe_iommu_group (drivers/iommu/iommu.c:1722) [ 1.070553] bus_for_each_dev (drivers/base/bus.c:370) [ 1.070906] iommu_device_register (drivers/iommu/iommu.c:1875 drivers/iommu/iommu.c:276) [ 1.071304] rk_iommu_probe (drivers/iommu/rockchip-iommu.c:1263) [ 1.071652] platform_probe (drivers/base/platform.c:1404) [ 1.071986] really_probe (drivers/base/dd.c:579 drivers/base/dd.c:658) [ 1.072316] __driver_probe_device (drivers/base/dd.c:800) [ 1.072716] driver_probe_device (drivers/base/dd.c:830) [ 1.073100] __driver_attach (drivers/base/dd.c:1217) [ 1.073453] bus_for_each_dev (drivers/base/bus.c:370) [ 1.073805] driver_attach (drivers/base/dd.c:1235) [ 1.074135] bus_add_driver (drivers/base/bus.c:678) [ 1.074488] driver_register (drivers/base/driver.c:249) [ 1.074836] __platform_driver_register (drivers/base/platform.c:868) [ 1.075261] rk_iommu_driver_init (drivers/iommu/rockchip-iommu.c:1380) [ 1.075644] do_one_initcall (init/main.c:1257) [ 1.075996] kernel_init_freeable (init/main.c:1318 (discriminator 1) init/main.c:1335 (discriminator 1) init/main.c:1354 (discriminator 1) init/main.c:1567 (discriminator 1)) [ 1.076393] kernel_init (init/main.c:1461) [ 1.076720] ret_from_fork (arch/arm64/kernel/entry.S:863) [ 1.077053] Code: aa0003f5 b1020020 a90153f3 aa0103f4 (b942a2b7) All code ======== 0: aa0003f5 mov x21, x0 4: b1020020 adds x0, x1, #0x80 8: a90153f3 stp x19, x20, [sp, #16] c: aa0103f4 mov x20, x1 10:* b942a2b7 ldr w23, [x21, #672] <-- trapping instruction Code starting with the faulting instruction =========================================== 0: b942a2b7 ldr w23, [x21, #672] [ 1.077594] ---[ end trace 0000000000000000 ]--- [ 1.078059] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b [ 1.078736] SMP: stopping secondary CPUs [ 1.079099] Kernel Offset: disabled [ 1.079412] CPU features: 0x0400,00041058,01000400,8200421b [ 1.079909] Memory Limit: none [ 1.080188] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]--- ## Source * Kernel version: 6.14.0-rc6-next-20250313 * Git tree: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git * Git sha: 613af589b566093ce7388bf3202fca70d742c166 * Git describe: 6.14.0-rc6-next-20250313 * Project details: https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20250305/ * Architectures: arm64 * Compilers: gcc-13 ## Test data * Test log: https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20250313/te… * Test history: https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20250313/te… * Test details: https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20250313/te… * Build link: https://storage.tuxsuite.com/public/linaro/lkft/builds/2uFgT27J8KGNrdNXzKgU… * Kernel config: https://storage.tuxsuite.com/public/linaro/lkft/builds/2uFgT27J8KGNrdNXzKgU… -- Linaro LKFT https://lkft.linaro.org

10 months

2
1
0 0

[PATCH net-next] selftests: net: bump GRO timeout for gro/setup_veth

by Jakub Kicinski

Commit 51bef03e1a71 ("selftests/net: deflake GRO tests") recently switched to NAPI suspension, and lowered the timeout from 1ms to 100us. This started causing flakes in netdev-run CI. Let's bump it to 200us. In a quick test of a debug kernel I see failures with 100us, with 200us in 5 runs I see 2 completely clean runs and 3 with a single retry (GRO test will retry up to 5 times). Signed-off-by: Jakub Kicinski <kuba(a)kernel.org> --- CC: krakauer(a)google.com CC: willemb(a)google.com CC: shuah(a)kernel.org CC: linux-kselftest(a)vger.kernel.org --- tools/testing/selftests/net/setup_veth.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/net/setup_veth.sh b/tools/testing/selftests/net/setup_veth.sh index eb3182066d12..152bf4c65747 100644 --- a/tools/testing/selftests/net/setup_veth.sh +++ b/tools/testing/selftests/net/setup_veth.sh @@ -11,7 +11,7 @@ setup_veth_ns() { local -r ns_mac="$4" [[ -e /var/run/netns/"${ns_name}" ]] || ip netns add "${ns_name}" - echo 100000 > "/sys/class/net/${ns_dev}/gro_flush_timeout" + echo 200000 > "/sys/class/net/${ns_dev}/gro_flush_timeout" echo 1 > "/sys/class/net/${ns_dev}/napi_defer_hard_irqs" ip link set dev "${ns_dev}" netns "${ns_name}" mtu 65535 ip -netns "${ns_name}" link set dev "${ns_dev}" up -- 2.48.1

10 months

3
2
0 0

[PATCH net-next v22 00/23] Introducing OpenVPN Data Channel Offload

by Antonio Quartulli

Notable changes since v21: * accessed crypto_slot->primary_idx via READ/WRITE_ONCE * made ovpn_aead_init() static * converted link tx/rx packet counters from u32 to to uint * ensured all u32 NL attributes are read by nla_get_u32() * ensured all u32 NL attrivutes are written by nla_put_u32() * reset cache upon float or local endpoint change * dropped check for delta > 0 in keepalive worker scheduling * improved comments in update endpoints logic * converted local_ip to void* to avoid useless casts Please note that some patches were already reviewed/tested by a few people. These patches have retained the tags as they have hardly been touched. The latest code can also be found at: https://github.com/OpenVPN/ovpn-net-next Thanks a lot! Best Regards, Antonio Quartulli OpenVPN Inc. --- Changes in v21: - EDITME: describe what is new in this series revision. - EDITME: use bulletpoints and terse descriptions. - Link to v20: https://lore.kernel.org/r/20250227-b4-ovpn-v20-0-93f363310834@openvpn.net --- Antonio Quartulli (23): net: introduce OpenVPN Data Channel Offload (ovpn) ovpn: add basic netlink support ovpn: add basic interface creation/destruction/management routines ovpn: keep carrier always on for MP interfaces ovpn: introduce the ovpn_peer object ovpn: introduce the ovpn_socket object ovpn: implement basic TX path (UDP) ovpn: implement basic RX path (UDP) ovpn: implement packet processing ovpn: store tunnel and transport statistics ovpn: implement TCP transport skb: implement skb_send_sock_locked_with_flags() ovpn: add support for MSG_NOSIGNAL in tcp_sendmsg ovpn: implement multi-peer support ovpn: implement peer lookup logic ovpn: implement keepalive mechanism ovpn: add support for updating local or remote UDP endpoint ovpn: implement peer add/get/dump/delete via netlink ovpn: implement key add/get/del/swap via netlink ovpn: kill key and notify userspace in case of IV exhaustion ovpn: notify userspace when a peer is deleted ovpn: add basic ethtool support testing/selftests: add test tool and scripts for ovpn module Documentation/netlink/specs/ovpn.yaml | 367 +++ Documentation/netlink/specs/rt_link.yaml | 16 + MAINTAINERS | 11 + drivers/net/Kconfig | 15 + drivers/net/Makefile | 1 + drivers/net/ovpn/Makefile | 22 + drivers/net/ovpn/bind.c | 55 + drivers/net/ovpn/bind.h | 101 + drivers/net/ovpn/crypto.c | 211 ++ drivers/net/ovpn/crypto.h | 145 ++ drivers/net/ovpn/crypto_aead.c | 409 ++++ drivers/net/ovpn/crypto_aead.h | 29 + drivers/net/ovpn/io.c | 462 ++++ drivers/net/ovpn/io.h | 34 + drivers/net/ovpn/main.c | 339 +++ drivers/net/ovpn/main.h | 14 + drivers/net/ovpn/netlink-gen.c | 213 ++ drivers/net/ovpn/netlink-gen.h | 41 + drivers/net/ovpn/netlink.c | 1249 ++++++++++ drivers/net/ovpn/netlink.h | 18 + drivers/net/ovpn/ovpnpriv.h | 57 + drivers/net/ovpn/peer.c | 1367 +++++++++++ drivers/net/ovpn/peer.h | 163 ++ drivers/net/ovpn/pktid.c | 129 ++ drivers/net/ovpn/pktid.h | 87 + drivers/net/ovpn/proto.h | 118 + drivers/net/ovpn/skb.h | 61 + drivers/net/ovpn/socket.c | 244 ++ drivers/net/ovpn/socket.h | 49 + drivers/net/ovpn/stats.c | 21 + drivers/net/ovpn/stats.h | 47 + drivers/net/ovpn/tcp.c | 592 +++++ drivers/net/ovpn/tcp.h | 36 + drivers/net/ovpn/udp.c | 442 ++++ drivers/net/ovpn/udp.h | 25 + include/linux/skbuff.h | 2 + include/uapi/linux/if_link.h | 15 + include/uapi/linux/ovpn.h | 109 + include/uapi/linux/udp.h | 1 + net/core/skbuff.c | 18 +- net/ipv6/af_inet6.c | 1 + net/ipv6/udp.c | 1 + tools/testing/selftests/Makefile | 1 + tools/testing/selftests/net/ovpn/.gitignore | 2 + tools/testing/selftests/net/ovpn/Makefile | 31 + tools/testing/selftests/net/ovpn/common.sh | 92 + tools/testing/selftests/net/ovpn/config | 10 + tools/testing/selftests/net/ovpn/data64.key | 5 + tools/testing/selftests/net/ovpn/ovpn-cli.c | 2395 ++++++++++++++++++++ tools/testing/selftests/net/ovpn/tcp_peers.txt | 5 + .../testing/selftests/net/ovpn/test-chachapoly.sh | 9 + .../selftests/net/ovpn/test-close-socket-tcp.sh | 9 + .../selftests/net/ovpn/test-close-socket.sh | 45 + tools/testing/selftests/net/ovpn/test-float.sh | 9 + tools/testing/selftests/net/ovpn/test-tcp.sh | 9 + tools/testing/selftests/net/ovpn/test.sh | 113 + tools/testing/selftests/net/ovpn/udp_peers.txt | 5 + 57 files changed, 10072 insertions(+), 5 deletions(-) --- base-commit: 40587f749df216889163dd6e02d88ad53e759e66 change-id: 20241002-b4-ovpn-eeee35c694a2 Best regards, -- Antonio Quartulli <antonio(a)openvpn.net>

10 months

2
24
0 0

[PATCH v7 00/10] iommufd: Add vIOMMU infrastructure (Part-2: vDEVICE)

by Nicolin Chen

Following the previous vIOMMU series, this adds another vDEVICE structure, representing the association from an iommufd_device to an iommufd_viommu. This gives the whole architecture a new "v" layer: _______________________________________________________________________ | iommufd (with vIOMMU/vDEVICE) | | _____________ _____________ | | | | | | | | |----------------| vIOMMU |<---| vDEVICE |<------| | | | | | |_____________| | | | | ______ | | _____________ ___|____ | | | | | | | | | | | | | | | IOAS |<---|(HWPT_PAGING)|<---| HWPT_NESTED |<--| DEVICE | | | | |______| |_____________| |_____________| |________| | |______|________|______________|__________________|_______________|_____| | | | | | ______v_____ | ______v_____ ______v_____ ___v__ | struct | | PFN | (paging) | | (nested) | |struct| |iommu_device| |------>|iommu_domain|<----|iommu_domain|<----|device| |____________| storage|____________| |____________| |______| This vDEVICE object is used to collect and store all vIOMMU-related device information/attributes in a VM. As an initial series for vDEVICE, add only the virt_id to the vDEVICE, which is a vIOMMU specific device ID in a VM: e.g. vSID of ARM SMMUv3, vDeviceID of AMD IOMMU, and vRID of Intel VT-d to a Context Table. This virt_id helps IOMMU drivers to link the vID to a pID of the device against the physical IOMMU instance. This is essential for a vIOMMU-based invalidation, where the request contains a device's vID for a device cache flush, e.g. ATC invalidation. Therefore, with this vDEVICE object, support a vIOMMU-based invalidation, by reusing IOMMUFD_CMD_HWPT_INVALIDATE for a vIOMMU object to flush cache with a given driver data. As for the implementation of the series, add driver support in ARM SMMUv3 for a real world use case. This series is on Github: https://github.com/nicolinc/iommufd/commits/iommufd_viommu_p2-v7 (QEMU branch for testing will be provided in Jason's nesting series) Changelog v7 * Added "Reviewed-by" from Jason * Corrected a line of comments in iommufd_vdevice_destroy() v6 https://lore.kernel.org/all/cover.1730313494.git.nicolinc@nvidia.com/ * Fixed kdoc in the uAPI header * Fixed indentations in iommufd.rst * Replaced vdev->idev with vdev->dev * Added "Reviewed-by" from Kevin and Jason * Updated kdoc of struct iommu_vdevice_alloc * Fixed lockdep function call in iommufd_viommu_find_dev * Added missing iommu_dev validation between viommu and idev * Skipped SMMUv3 driver changes (to post in a separate series) * Replaced !cache_invalidate_user in WARN_ON of the allocation path with cache_invalidate_user validation in iommufd_hwpt_invalidate v5 https://lore.kernel.org/all/cover.1729897278.git.nicolinc@nvidia.com/ * Dropped driver-allocated vDEVICE support * Changed vdev_to_dev helper to iommufd_viommu_find_dev v4 https://lore.kernel.org/all/cover.1729555967.git.nicolinc@nvidia.com/ * Added missing brackets in switch-case * Fixed the unreleased idev refcount issue * Reworked the iommufd_vdevice_alloc allocator * Dropped support for IOMMU_VIOMMU_TYPE_DEFAULT * Added missing TEST_LENGTH and fail_nth coverages * Added a verification to the driver-allocated vDEVICE object * Added an iommufd_vdevice_abort for a missing mutex protection * Added a u64 structure arm_vsmmu_invalidation_cmd for user command conversion v3 https://lore.kernel.org/all/cover.1728491532.git.nicolinc@nvidia.com/ * Added Jason's Reviewed-by * Split this invalidation part out of the part-1 series * Repurposed VDEV_ID ioctl to a wider vDEVICE structure and ioctl * Reduced viommu_api functions by allowing drivers to access viommu and vdevice structure directly * Dropped vdevs_rwsem by using xa_lock instead * Dropped arm_smmu_cache_invalidate_user v2 https://lore.kernel.org/all/cover.1724776335.git.nicolinc@nvidia.com/ * Limited vdev_id to one per idev * Added a rw_sem to protect the vdev_id list * Reworked driver-level APIs with proper lockings * Added a new viommu_api file for IOMMUFD_DRIVER config * Dropped useless iommu_dev point from the viommu structure * Added missing index numnbers to new types in the uAPI header * Dropped IOMMU_VIOMMU_INVALIDATE uAPI; Instead, reuse the HWPT one * Reworked mock_viommu_cache_invalidate() using the new iommu helper * Reordered details of set/unset_vdev_id handlers for proper lockings v1 https://lore.kernel.org/all/cover.1723061377.git.nicolinc@nvidia.com/ Thanks! Nicolin Jason Gunthorpe (1): iommu: Add iommu_copy_struct_from_full_user_array helper Nicolin Chen (9): iommufd/viommu: Add IOMMUFD_OBJ_VDEVICE and IOMMU_VDEVICE_ALLOC ioctl iommufd/selftest: Add IOMMU_VDEVICE_ALLOC test coverage iommu/viommu: Add cache_invalidate to iommufd_viommu_ops iommufd: Allow hwpt_id to carry viommu_id for IOMMU_HWPT_INVALIDATE iommufd/viommu: Add iommufd_viommu_find_dev helper iommufd/selftest: Add mock_viommu_cache_invalidate iommufd/selftest: Add IOMMU_TEST_OP_DEV_CHECK_CACHE test command iommufd/selftest: Add vIOMMU coverage for IOMMU_HWPT_INVALIDATE ioctl Documentation: userspace-api: iommufd: Update vDEVICE drivers/iommu/iommufd/iommufd_private.h | 18 ++ drivers/iommu/iommufd/iommufd_test.h | 30 +++ include/linux/iommu.h | 48 ++++- include/linux/iommufd.h | 22 ++ include/uapi/linux/iommufd.h | 31 ++- tools/testing/selftests/iommu/iommufd_utils.h | 83 +++++++ drivers/iommu/iommufd/driver.c | 13 ++ drivers/iommu/iommufd/hw_pagetable.c | 40 +++- drivers/iommu/iommufd/main.c | 6 + drivers/iommu/iommufd/selftest.c | 98 ++++++++- drivers/iommu/iommufd/viommu.c | 76 +++++++ tools/testing/selftests/iommu/iommufd.c | 204 +++++++++++++++++- .../selftests/iommu/iommufd_fail_nth.c | 4 + Documentation/userspace-api/iommufd.rst | 41 +++- 14 files changed, 688 insertions(+), 26 deletions(-) base-commit: 0780dd4af09a5360392f5c376c35ffc2599a9c0e -- 2.43.0

10 months

4
14
0 0

[PATCHv5 net 0/3] bond: fix xfrm offload issues

by Hangbin Liu

The first patch fixes the incorrect locks using in bond driver. The second patch fixes the xfrm offload feature during setup active-backup mode. The third patch add a ipsec offload testing. v5: use list_for_each_entry_safe() when del item in list (Nikolay Aleksandrov) do not call spin_lock_bh in sleep function xdo_dev_state_free (Nikolay Aleksandrov) set xso.real_dev = NULL to avoid __xfrm_state_delete() is called in parallel() (Cosmin Ratiu) remove spin lock in bond_ipsec_add_sa_all() as it doesn't resolve the race condition. v4: hold xs->lock for bond_ipsec_{del, add}_sa_all (Cosmin Ratiu) use the defer helpers in lib.sh for selftest (Petr Machata) v3: move the ipsec deletion to bond_ipsec_free_sa (Cosmin Ratiu) v2: do not turn carrier on if bond change link failed (Nikolay Aleksandrov) move the mutex lock to a work queue (Cosmin Ratiu) Hangbin Liu (3): bonding: fix calling sleeping function in spin lock and some race conditions bonding: fix xfrm offload feature setup on active-backup mode selftests: bonding: add ipsec offload test drivers/net/bonding/bond_main.c | 71 +++++--- drivers/net/bonding/bond_netlink.c | 16 +- include/net/bonding.h | 1 + .../selftests/drivers/net/bonding/Makefile | 3 +- .../drivers/net/bonding/bond_ipsec_offload.sh | 154 ++++++++++++++++++ .../selftests/drivers/net/bonding/config | 4 + 6 files changed, 222 insertions(+), 27 deletions(-) create mode 100755 tools/testing/selftests/drivers/net/bonding/bond_ipsec_offload.sh -- 2.46.0

10 months

5
13
0 0

[PATCH] selftests/mm/cow: Fix the incorrect error handling

by Cyan Yang

There are two error handlings did not check the correct return value. This patch will fix them. Fixes: f4b5fd6946e244cdedc3bbb9a1f24c8133b2077a ("selftests/vm: anon_cow: THP tests") Signed-off-by: Cyan Yang <cyan.yang(a)sifive.com> --- tools/testing/selftests/mm/cow.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/mm/cow.c b/tools/testing/selftests/mm/cow.c index 9446673645eb..16fcadc090a4 100644 --- a/tools/testing/selftests/mm/cow.c +++ b/tools/testing/selftests/mm/cow.c @@ -876,13 +876,13 @@ static void do_run_with_thp(test_fn fn, enum thp_run thp_run, size_t thpsize) mremap_size = thpsize / 2; mremap_mem = mmap(NULL, mremap_size, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); - if (mem == MAP_FAILED) { + if (mremap_mem == MAP_FAILED) { ksft_test_result_fail("mmap() failed\n"); goto munmap; } tmp = mremap(mem + mremap_size, mremap_size, mremap_size, MREMAP_MAYMOVE | MREMAP_FIXED, mremap_mem); - if (tmp != mremap_mem) { + if (tmp == MAP_FAILED) { ksft_test_result_fail("mremap() failed\n"); goto munmap; } -- 2.39.5 (Apple Git-154)

10 months

3
4
0 0

[PATCH] selftest/powerpc/mm/pkey: fix build-break introduced by commit 00894c3fc917

by Madhavan Srinivasan

Build break was reported in the powerpc mailing list for next-20250218 with below errors make[1]: Nothing to be done for 'all'. BUILD_TARGET=/root/venkat/linux-next/tools/testing/selftests/powerpc/mm; mkdir -p $BUILD_TARGET; make OUTPUT=$BUILD_TARGET -k -C mm all CC pkey_exec_prot In file included from pkey_exec_prot.c:18: /root/venkat/linux-next/tools/testing/selftests/powerpc/include/pkeys.h: In function ‘pkeys_unsupported’: /root/venkat/linux-next/tools/testing/selftests/powerpc/include/pkeys.h:96:34: error: ‘PKEY_UNRESTRICTED’ undeclared (first use in this function) 96 | pkey = sys_pkey_alloc(0, PKEY_UNRESTRICTED); | ^~~~~~~~~~~~~~~~~ https://lore.kernel.org/all/20250113170619.484698-2-yury.khrustalev@arm.com/ patchset has been queued to arm64/for-next/pkey_unrestricted which is causing a build break in the selftest/powerpc builds. Commit 6d61527d931ba ("mm/pkey: Add PKEY_UNRESTRICTED macro") added a macro PKEY_UNRESTRICTED to handle implicit literal value of 0x0 (which is "unrestricted"). Add the same to selftest/powerpc/pkeys.h to fix the reported build break. Reported-by: Venkat Rao Bagalkote <venkat88(a)linux.ibm.com> Closes: https://lore.kernel.org/lkml/3267ea6e-5a1a-4752-96ef-8351c912d386@linux.ibm… Tested-by: Venkat Rao Bagalkote <venkat88(a)linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy(a)linux.ibm.com> --- Catalin, can you take this fix via arm64/for-next/pkey_unrestricted? Patch applies clean on top of arm64/for-next/pkey_unrestricted tools/testing/selftests/powerpc/include/pkeys.h | 3 +++ 1 file changed, 3 insertions(+) diff --git a/tools/testing/selftests/powerpc/include/pkeys.h b/tools/testing/selftests/powerpc/include/pkeys.h index c6d4063dd4f6..d6deb6ffa1b9 100644 --- a/tools/testing/selftests/powerpc/include/pkeys.h +++ b/tools/testing/selftests/powerpc/include/pkeys.h @@ -24,6 +24,9 @@ #undef PKEY_DISABLE_EXECUTE #define PKEY_DISABLE_EXECUTE 0x4 +#undef PKEY_UNRESTRICTED +#define PKEY_UNRESTRICTED 0x0 + /* Older versions of libc do not define this */ #ifndef SEGV_PKUERR #define SEGV_PKUERR 4 -- 2.48.1

10 months

2
2
0 0

[PATCH 6.1.y] selftests/vm: fix undefined reference of the `default_huge_page_size`

by Andrey Kalachev

The commit a584c7734a4d ("selftests: mm: fix map_hugetlb failure on 64K page size systems") backported the fix from v6.8 to stable v6.1. The patch uses default_huge_page_size() function, which definition moved into vm_util.[ch] by commit bd4d67e76f699 ("selftests/mm: merge default_huge_page_size() into one") merged to upsream since v6.4. However, in v6.1 common definition/declaration for the default_huge_page_size() we doesn't have, the following build error is seen: map_hugetlb.c:79:25: warning: implicit declaration of function ‘default_huge_page_size’ [-Wimplicit-function-declaration] 79 | hugepage_size = default_huge_page_size(); | ^~~~~~~~~~~~~~~~~~~~~~ /usr/bin/ld: /tmp/ccx95BZz.o: in function `main': map_hugetlb.c:(.text+0x104): undefined reference to `default_huge_page_size' Place default_huge_page_size() function body into map_hugetlb.c to fix this issue. Fixes: a584c7734a4d ("selftests: mm: fix map_hugetlb failure on 64K page size systems") Signed-off-by: Andrey Kalachev <kalachev(a)swemel.ru> --- tools/testing/selftests/vm/map_hugetlb.c | 24 ++++++++++++++++++++++++ 1 file changed, 24 insertions(+) diff --git a/tools/testing/selftests/vm/map_hugetlb.c b/tools/testing/selftests/vm/map_hugetlb.c index c65c55b7a789..5826c50b6736 100644 --- a/tools/testing/selftests/vm/map_hugetlb.c +++ b/tools/testing/selftests/vm/map_hugetlb.c @@ -67,6 +67,30 @@ static int read_bytes(char *addr, size_t length) return 0; } +/* + * default_huge_page_size copied from mlock2-tests.c + */ +unsigned long default_huge_page_size(void) +{ + unsigned long hps = 0; + char *line = NULL; + size_t linelen = 0; + FILE *f = fopen("/proc/meminfo", "r"); + + if (!f) + return 0; + while (getline(&line, &linelen, f) > 0) { + if (sscanf(line, "Hugepagesize: %lu kB", &hps) == 1) { + hps <<= 10; + break; + } + } + + free(line); + fclose(f); + return hps; +} + int main(int argc, char **argv) { void *addr; -- 2.39.5

10 months

1
0
0 0

[PATCH 00/11] selftests: ublk: bug fixes & consolidation

by Ming Lei

Hello Jens and guys, This patchset fixes several issues(1, 2, 4) and consolidate & improve the tests in the following ways: - support shellcheck and fixes all warning - misc cleanup - improve cleanup code path(module load/unload, cleanup temp files) - help to reuse the same test source code and scripts for other projects(liburing[1], blktest, ...) - add two stress tests for covering IO workloads vs. removing device & killing ublk server, given buffer lifetime is one big thing for ublk-zc [1] https://github.com/ming1/liburing/commits/ublk-zc - just need one line change for overriding skip_code, libring uses 77 and kselftests takes 4 Ming Lei (11): selftests: ublk: make ublk_stop_io_daemon() more reliable selftests: ublk: fix build failure selftests: ublk: add --foreground command line selftests: ublk: fix parsing '-a' argument selftests: ublk: support shellcheck and fix all warning selftests: ublk: don't pass ${dev_id} to _cleanup_test() selftests: ublk: move zero copy feature check into _add_ublk_dev() selftests: ublk: load/unload ublk_drv when preparing & cleaning up tests selftests: ublk: add one stress test for covering IO vs. removing device selftests: ublk: add stress test for covering IO vs. killing ublk server selftests: ublk: improve test usability tools/testing/selftests/ublk/Makefile | 6 + tools/testing/selftests/ublk/kublk.c | 43 +++-- tools/testing/selftests/ublk/kublk.h | 2 + tools/testing/selftests/ublk/test_common.sh | 167 ++++++++++++++---- tools/testing/selftests/ublk/test_loop_01.sh | 13 +- tools/testing/selftests/ublk/test_loop_02.sh | 14 +- tools/testing/selftests/ublk/test_loop_03.sh | 16 +- tools/testing/selftests/ublk/test_loop_04.sh | 14 +- tools/testing/selftests/ublk/test_null_01.sh | 9 +- .../testing/selftests/ublk/test_stress_01.sh | 47 +++++ .../testing/selftests/ublk/test_stress_02.sh | 47 +++++ 11 files changed, 300 insertions(+), 78 deletions(-) create mode 100755 tools/testing/selftests/ublk/test_stress_01.sh create mode 100755 tools/testing/selftests/ublk/test_stress_02.sh -- 2.47.0

10 months, 1 week

3
20
0 0

[PATCH v4 00/12] selftests/mm: Some cleanups from trying to run them

by Brendan Jackman

I never had much luck running mm selftests so I spent a few hours digging into why. Looks like most of the reason is missing SKIP checks, so this series is just adding a bunch of those that I found. I did not do anything like all of them, just the ones I spotted in gup_longterm, gup_test, mmap, userfaultfd and memfd_secret. It's a bit unfortunate to have to skip those tests when ftruncate() fails, but I don't have time to dig deep enough into it to actually make them pass. I have observed the issue on 9pfs and heard rumours that NFS has a similar problem. I'm now able to run these test groups successfully: - mmap - gup_test - compaction - migration - page_frag - userfaultfd - mlock I've never gone past "Waiting for hugetlb memory to get depleted", in the hugetlb tests. I don't know if they are stuck or if they would eventually work if I was patient enough (testing on a 1G machine). I have not investigated further. I had some issues with mlock tests failing due to -ENOSRCH from mlock2(), I can no longer reproduce that though, things work OK now. Of the remaining tests there may be others that work fine, but there's no convenient way to survey the whole output of run_vmtests.sh so I'm just going test by test here. In my spare moments I am slowly chipping away at a setup to run these tests continuously in a reasonably hermetic QEMU environment via virtme-ng: https://github.com/bjackman/linux/blob/5fad4b9c592290f38e0f8bc73c9abb9c99d8… Hopefully that will eventually offer a way to provide a "canned" environment where the tests are known to work, which can be fairly easily reproduced by any developer. Signed-off-by: Brendan Jackman <jackmanb(a)google.com> --- Changes in v4: - NOT ADDRESSED: still using errno==ENOENT as a hacky way to detect buggy filesystems: https://lore.kernel.org/all/CA+i-1C3srkh44tN8dMQ5aD-jhoksUkdEpa+mMfdDtDrPAU… - Added some incomplete cleanups for the mlock tests. - Fixed divide-by-zero error when running uffd-stress on <32cpu systems. - Fixed misnamed nr_threads variable (now nr_parallel). - Fixed reporting io_uring errors (retval instead of errno). - Link to v3: https://lore.kernel.org/r/20250228-mm-selftests-v3-0-958e3b6f0203@google.com Changes in v3: - Added fix for userfaultfd tests. - Dropped attempts to use sudo. - Fixed garbage printf in uffd-stress. (Added EXTRA_CFLAGS=-Werror FORCE_TARGETS=1 to my scripts to prevent such errors happening again). - Fixed missing newlines in ksft_test_result_skip() calls. - Link to v2: https://lore.kernel.org/r/20250221-mm-selftests-v2-0-28c4d66383c5@google.com Changes in v2 (Thanks to Dev for the reviews): - Improve and cleanup some error messages - Add some extra SKIPs - Fix misnaming of nr_cpus variable in uffd tests - Link to v1: https://lore.kernel.org/r/20250220-mm-selftests-v1-0-9bbf57d64463@google.com --- Brendan Jackman (12): selftests/mm: Report errno when things fail in gup_longterm selftests/mm: Skip uffd-stress if userfaultfd not available selftests/mm: Skip uffd-wp-mremap if userfaultfd not available selftests/mm/uffd: Rename nr_cpus -> nr_parallel selftests/mm: Print some details when uffd-stress gets bad params selftests/mm: Don't fail uffd-stress if too many CPUs selftests/mm: Skip map_populate on weird filesystems selftests/mm: Skip gup_longterm tests on weird filesystems selftests/mm: Drop unnecessary sudo usage selftests/mm: Ensure uffd-wp-mremap gets pages of each size selftests/mm: Skip mlock tests if nobody user can't read it selftests/mm/mlock: Print error on failure tools/testing/selftests/mm/gup_longterm.c | 45 +++++++++++++++++--------- tools/testing/selftests/mm/map_populate.c | 7 ++++ tools/testing/selftests/mm/mlock-random-test.c | 4 +-- tools/testing/selftests/mm/mlock2.h | 8 ++++- tools/testing/selftests/mm/run_vmtests.sh | 27 ++++++++++++++-- tools/testing/selftests/mm/uffd-common.c | 8 ++--- tools/testing/selftests/mm/uffd-common.h | 2 +- tools/testing/selftests/mm/uffd-stress.c | 42 +++++++++++++++--------- tools/testing/selftests/mm/uffd-unit-tests.c | 2 +- tools/testing/selftests/mm/uffd-wp-mremap.c | 5 ++- 10 files changed, 105 insertions(+), 45 deletions(-) --- base-commit: dcb38e6757f1b7944af9347ce6b54263d3666478 change-id: 20250220-mm-selftests-2d7d0542face Best regards, -- Brendan Jackman <jackmanb(a)google.com>

10 months, 1 week

1
12
0 0

[PATCHv4 net 0/2] bonding: fix incorrect mac address setting

by Hangbin Liu

The mac address on backup slave should be convert from Solicited-Node Multicast address, not from bonding unicast target address. v4: no change, just repost. v3: also fix the mac setting for slave_set_ns_maddr. (Jay) Add function description for slave_set_ns_maddr/slave_set_ns_maddrs (Jay) v2: fix patch 01's subject Hangbin Liu (2): bonding: fix incorrect MAC address setting to receive NS messages selftests: bonding: fix incorrect mac address drivers/net/bonding/bond_options.c | 55 ++++++++++++++++--- .../drivers/net/bonding/bond_options.sh | 4 +- 2 files changed, 49 insertions(+), 10 deletions(-) -- 2.46.0

10 months, 1 week

3
5
0 0

[PATCH v7 bpf-next 0/2] security: Propagate caller information in bpf hooks

by Blaise Boscaccy

Hello, While trying to implement an eBPF gatekeeper program, we ran into an issue whereas the LSM hooks are missing some relevant data. Certain subcommands passed to the bpf() syscall can be invoked from either the kernel or userspace. Additionally, some fields in the bpf_attr struct contain pointers, and depending on where the subcommand was invoked, they could point to either user or kernel memory. One example of this is the bpf_prog_load subcommand and its fd_array. This data is made available and used by the verifier but not made available to the LSM subsystem. This patchset simply exposes that information to applicable LSM hooks. Change list: - v6 -> v7 - use gettid/pid in lieu of getpid/tgid in test condition - v5 -> v6 - fix regression caused by is_kernel renaming - simplify test logic - v4 -> v5 - merge v4 selftest breakout patch back into a single patch - change "is_kernel" to "kernel" - add selftest using new kernel flag - v3 -> v4 - split out selftest changes into a separate patch - v2 -> v3 - reorder params so that the new boolean flag is the last param - fixup function signatures in bpf selftests - v1 -> v2 - Pass a boolean flag in lieu of bpfptr_t Revisions: - v6 https://lore.kernel.org/bpf/20250308013314.719150-1-bboscaccy@linux.microso… - v5 https://lore.kernel.org/bpf/20250307213651.3065714-1-bboscaccy@linux.micros… - v4 https://lore.kernel.org/bpf/20250304203123.3935371-1-bboscaccy@linux.micros… - v3 https://lore.kernel.org/bpf/20250303222416.3909228-1-bboscaccy@linux.micros… - v2 https://lore.kernel.org/bpf/20250228165322.3121535-1-bboscaccy@linux.micros… - v1 https://lore.kernel.org/bpf/20250226003055.1654837-1-bboscaccy@linux.micros… Blaise Boscaccy (2): security: Propagate caller information in bpf hooks selftests/bpf: Add a kernel flag test for LSM bpf hook include/linux/lsm_hook_defs.h | 6 +-- include/linux/security.h | 12 +++--- kernel/bpf/syscall.c | 10 ++--- security/security.c | 15 ++++--- security/selinux/hooks.c | 6 +-- .../selftests/bpf/prog_tests/kernel_flag.c | 43 +++++++++++++++++++ .../selftests/bpf/progs/rcu_read_lock.c | 3 +- .../bpf/progs/test_cgroup1_hierarchy.c | 4 +- .../selftests/bpf/progs/test_kernel_flag.c | 28 ++++++++++++ .../bpf/progs/test_kfunc_dynptr_param.c | 6 +-- .../selftests/bpf/progs/test_lookup_key.c | 2 +- .../selftests/bpf/progs/test_ptr_untrusted.c | 2 +- .../bpf/progs/test_task_under_cgroup.c | 2 +- .../bpf/progs/test_verify_pkcs7_sig.c | 2 +- 14 files changed, 108 insertions(+), 33 deletions(-) create mode 100644 tools/testing/selftests/bpf/prog_tests/kernel_flag.c create mode 100644 tools/testing/selftests/bpf/progs/test_kernel_flag.c -- 2.48.1

10 months, 1 week

4
5
0 0

[PATCH net-next 4/4] tools/testing/selftests/cgroup: add test for SO_PEERCGROUPID

by Alexander Mikhalitsyn

Cc: linux-kselftest(a)vger.kernel.org Cc: linux-kernel(a)vger.kernel.org Cc: netdev(a)vger.kernel.org Cc: cgroups(a)vger.kernel.org Cc: "David S. Miller" <davem(a)davemloft.net> Cc: Eric Dumazet <edumazet(a)google.com> Cc: Jakub Kicinski <kuba(a)kernel.org> Cc: Paolo Abeni <pabeni(a)redhat.com> Cc: Willem de Bruijn <willemb(a)google.com> Cc: Leon Romanovsky <leon(a)kernel.org> Cc: Arnd Bergmann <arnd(a)arndb.de> Cc: Christian Brauner <brauner(a)kernel.org> Cc: Kuniyuki Iwashima <kuniyu(a)amazon.com> Cc: Lennart Poettering <mzxreary(a)0pointer.de> Cc: Luca Boccassi <bluca(a)debian.org> Cc: Tejun Heo <tj(a)kernel.org> Cc: Johannes Weiner <hannes(a)cmpxchg.org> Cc: "Michal Koutný" <mkoutny(a)suse.com> Cc: Shuah Khan <shuah(a)kernel.org> Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn(a)canonical.com> --- tools/testing/selftests/cgroup/Makefile | 2 + .../selftests/cgroup/test_so_peercgroupid.c | 308 ++++++++++++++++++ 2 files changed, 310 insertions(+) create mode 100644 tools/testing/selftests/cgroup/test_so_peercgroupid.c diff --git a/tools/testing/selftests/cgroup/Makefile b/tools/testing/selftests/cgroup/Makefile index 1b897152bab6..a932ff068081 100644 --- a/tools/testing/selftests/cgroup/Makefile +++ b/tools/testing/selftests/cgroup/Makefile @@ -16,6 +16,7 @@ TEST_GEN_PROGS += test_kill TEST_GEN_PROGS += test_kmem TEST_GEN_PROGS += test_memcontrol TEST_GEN_PROGS += test_pids +TEST_GEN_PROGS += test_so_peercgroupid TEST_GEN_PROGS += test_zswap LOCAL_HDRS += $(selfdir)/clone3/clone3_selftests.h $(selfdir)/pidfd/pidfd.h @@ -31,4 +32,5 @@ $(OUTPUT)/test_kill: cgroup_util.c $(OUTPUT)/test_kmem: cgroup_util.c $(OUTPUT)/test_memcontrol: cgroup_util.c $(OUTPUT)/test_pids: cgroup_util.c +$(OUTPUT)/test_so_peercgroupid: cgroup_util.c $(OUTPUT)/test_zswap: cgroup_util.c diff --git a/tools/testing/selftests/cgroup/test_so_peercgroupid.c b/tools/testing/selftests/cgroup/test_so_peercgroupid.c new file mode 100644 index 000000000000..2bf1f00a45c7 --- /dev/null +++ b/tools/testing/selftests/cgroup/test_so_peercgroupid.c @@ -0,0 +1,308 @@ +// SPDX-License-Identifier: GPL-2.0 OR MIT +#define _GNU_SOURCE +#include <error.h> +#include <inttypes.h> +#include <limits.h> +#include <stddef.h> +#include <stdio.h> +#include <stdlib.h> +#include <sys/socket.h> +#include <linux/socket.h> +#include <unistd.h> +#include <string.h> +#include <errno.h> +#include <sys/un.h> +#include <sys/signal.h> +#include <sys/types.h> +#include <sys/wait.h> + +#include "../kselftest_harness.h" +#include "cgroup_util.h" + +#define clean_errno() (errno == 0 ? "None" : strerror(errno)) +#define log_err(MSG, ...) \ + fprintf(stderr, "(%s:%d: errno: %s) " MSG "\n", __FILE__, __LINE__, \ + clean_errno(), ##__VA_ARGS__) + +#ifndef SO_PEERCGROUPID +#define SO_PEERCGROUPID 83 +#endif + +static void child_die() +{ + exit(1); +} + +struct sock_addr { + char sock_name[32]; + struct sockaddr_un listen_addr; + socklen_t addrlen; +}; + +FIXTURE(so_peercgroupid) +{ + int server; + pid_t client_pid; + int sync_sk[2]; + struct sock_addr server_addr; + struct sock_addr *client_addr; + char cgroup_root[PATH_MAX]; + char *test_cgroup1; + char *test_cgroup2; +}; + +FIXTURE_VARIANT(so_peercgroupid) +{ + int type; + bool abstract; +}; + +FIXTURE_VARIANT_ADD(so_peercgroupid, stream_pathname) +{ + .type = SOCK_STREAM, + .abstract = 0, +}; + +FIXTURE_VARIANT_ADD(so_peercgroupid, stream_abstract) +{ + .type = SOCK_STREAM, + .abstract = 1, +}; + +FIXTURE_VARIANT_ADD(so_peercgroupid, seqpacket_pathname) +{ + .type = SOCK_SEQPACKET, + .abstract = 0, +}; + +FIXTURE_VARIANT_ADD(so_peercgroupid, seqpacket_abstract) +{ + .type = SOCK_SEQPACKET, + .abstract = 1, +}; + +FIXTURE_VARIANT_ADD(so_peercgroupid, dgram_pathname) +{ + .type = SOCK_DGRAM, + .abstract = 0, +}; + +FIXTURE_VARIANT_ADD(so_peercgroupid, dgram_abstract) +{ + .type = SOCK_DGRAM, + .abstract = 1, +}; + +FIXTURE_SETUP(so_peercgroupid) +{ + self->client_addr = mmap(NULL, sizeof(*self->client_addr), PROT_READ | PROT_WRITE, + MAP_SHARED | MAP_ANONYMOUS, -1, 0); + ASSERT_NE(MAP_FAILED, self->client_addr); + + self->cgroup_root[0] = '\0'; +} + +FIXTURE_TEARDOWN(so_peercgroupid) +{ + close(self->server); + + kill(self->client_pid, SIGKILL); + waitpid(self->client_pid, NULL, 0); + + if (!variant->abstract) { + unlink(self->server_addr.sock_name); + unlink(self->client_addr->sock_name); + } + + if (strlen(self->cgroup_root) > 0) { + cg_enter_current(self->cgroup_root); + + if (self->test_cgroup1) + cg_destroy(self->test_cgroup1); + free(self->test_cgroup1); + + if (self->test_cgroup2) + cg_destroy(self->test_cgroup2); + free(self->test_cgroup2); + } +} + +static void fill_sockaddr(struct sock_addr *addr, bool abstract) +{ + char *sun_path_buf = (char *)&addr->listen_addr.sun_path; + + addr->listen_addr.sun_family = AF_UNIX; + addr->addrlen = offsetof(struct sockaddr_un, sun_path); + snprintf(addr->sock_name, sizeof(addr->sock_name), "so_peercgroupid_%d", getpid()); + addr->addrlen += strlen(addr->sock_name); + if (abstract) { + *sun_path_buf = '\0'; + addr->addrlen++; + sun_path_buf++; + } else { + unlink(addr->sock_name); + } + memcpy(sun_path_buf, addr->sock_name, strlen(addr->sock_name)); +} + +static void client(FIXTURE_DATA(so_peercgroupid) *self, + const FIXTURE_VARIANT(so_peercgroupid) *variant) +{ + int cfd, err; + socklen_t len; + uint64_t peer_cgroup_id = 0, test_cgroup1_id = 0, test_cgroup2_id = 0; + char state; + + cfd = socket(AF_UNIX, variant->type, 0); + if (cfd < 0) { + log_err("socket"); + child_die(); + } + + if (variant->type == SOCK_DGRAM) { + fill_sockaddr(self->client_addr, variant->abstract); + + if (bind(cfd, (struct sockaddr *)&self->client_addr->listen_addr, self->client_addr->addrlen)) { + log_err("bind"); + child_die(); + } + } + + /* negative testcase: no peer for socket yet */ + len = sizeof(peer_cgroup_id); + err = getsockopt(cfd, SOL_SOCKET, SO_PEERCGROUPID, &peer_cgroup_id, &len); + if (!err || (errno != ENODATA)) { + log_err("getsockopt must fail with errno == ENODATA when socket has no peer"); + child_die(); + } + + if (connect(cfd, (struct sockaddr *)&self->server_addr.listen_addr, + self->server_addr.addrlen) != 0) { + log_err("connect"); + child_die(); + } + + state = 'R'; + write(self->sync_sk[1], &state, sizeof(state)); + + read(self->sync_sk[1], &test_cgroup1_id, sizeof(uint64_t)); + read(self->sync_sk[1], &test_cgroup2_id, sizeof(uint64_t)); + + len = sizeof(peer_cgroup_id); + if (getsockopt(cfd, SOL_SOCKET, SO_PEERCGROUPID, &peer_cgroup_id, &len)) { + log_err("Failed to get SO_PEERCGROUPID"); + child_die(); + } + + /* + * There is a difference between connection-oriented sockets + * and connectionless ones from the perspective of SO_PEERCGROUPID. + * + * sk->sk_cgrp_data is getting filled when we allocate struct sock (see call to cgroup_sk_alloc()). + * For DGRAM socket, self->server socket is our peer and by the time when we allocate it, + * parent process sits in a test_cgroup1. Then it changes cgroup to test_cgroup2, but it does not + * affect anything. + * For STREAM/SEQPACKET socket, self->server is not our peer, but that one we get from accept() + * syscall. And by the time when we call accept(), parent process sits in test_cgroup2. + * + * Let's ensure that it works like that and if it get changed then we should detect it + * as it's a clear UAPI change. + */ + if (variant->type == SOCK_DGRAM) { + /* cgroup id from SO_PEERCGROUPID should be equal to the test_cgroup1_id */ + if (peer_cgroup_id != test_cgroup1_id) { + log_err("peer_cgroup_id != test_cgroup1_id: %" PRId64 " != %" PRId64, peer_cgroup_id, test_cgroup1_id); + child_die(); + } + } else { + /* cgroup id from SO_PEERCGROUPID should be equal to the test_cgroup2_id */ + if (peer_cgroup_id != test_cgroup2_id) { + log_err("peer_cgroup_id != test_cgroup2_id: %" PRId64 " != %" PRId64, peer_cgroup_id, test_cgroup2_id); + child_die(); + } + } +} + +TEST_F(so_peercgroupid, test) +{ + uint64_t test_cgroup1_id, test_cgroup2_id; + int err; + int pfd; + char state; + int child_status = 0; + + if (cg_find_unified_root(self->cgroup_root, sizeof(self->cgroup_root), NULL)) + ksft_exit_skip("cgroup v2 isn't mounted\n"); + + self->test_cgroup1 = cg_name(self->cgroup_root, "so_peercgroupid_cg1"); + ASSERT_NE(NULL, self->test_cgroup1); + + self->test_cgroup2 = cg_name(self->cgroup_root, "so_peercgroupid_cg2"); + ASSERT_NE(NULL, self->test_cgroup2); + + err = cg_create(self->test_cgroup1); + ASSERT_EQ(0, err); + + err = cg_create(self->test_cgroup2); + ASSERT_EQ(0, err); + + test_cgroup1_id = cg_get_id(self->test_cgroup1); + ASSERT_LT(0, test_cgroup1_id); + + test_cgroup2_id = cg_get_id(self->test_cgroup2); + ASSERT_LT(0, test_cgroup2_id); + + /* enter test_cgroup1 before allocating a socket */ + err = cg_enter_current(self->test_cgroup1); + ASSERT_EQ(0, err); + + self->server = socket(AF_UNIX, variant->type, 0); + ASSERT_NE(-1, self->server); + + /* enter test_cgroup2 after allocating a socket */ + err = cg_enter_current(self->test_cgroup2); + ASSERT_EQ(0, err); + + fill_sockaddr(&self->server_addr, variant->abstract); + + err = bind(self->server, (struct sockaddr *)&self->server_addr.listen_addr, self->server_addr.addrlen); + ASSERT_EQ(0, err); + + if (variant->type != SOCK_DGRAM) { + err = listen(self->server, 1); + ASSERT_EQ(0, err); + } + + err = socketpair(AF_UNIX, SOCK_DGRAM | SOCK_CLOEXEC, 0, self->sync_sk); + EXPECT_EQ(err, 0); + + self->client_pid = fork(); + ASSERT_NE(-1, self->client_pid); + if (self->client_pid == 0) { + close(self->server); + close(self->sync_sk[0]); + client(self, variant); + exit(0); + } + close(self->sync_sk[1]); + + if (variant->type != SOCK_DGRAM) { + pfd = accept(self->server, NULL, NULL); + ASSERT_NE(-1, pfd); + } else { + pfd = self->server; + } + + /* wait until the child arrives at checkpoint */ + read(self->sync_sk[0], &state, sizeof(state)); + ASSERT_EQ(state, 'R'); + + write(self->sync_sk[0], &test_cgroup1_id, sizeof(uint64_t)); + write(self->sync_sk[0], &test_cgroup2_id, sizeof(uint64_t)); + + close(pfd); + waitpid(self->client_pid, &child_status, 0); + ASSERT_EQ(0, WIFEXITED(child_status) ? WEXITSTATUS(child_status) : 1); +} + +TEST_HARNESS_MAIN -- 2.43.0

10 months, 1 week

3
2
0 0

[PATCH v12 3/3] selftests/rseq: Add test for mm_cid compaction

by Gabriele Monaco

A task in the kernel (task_mm_cid_work) runs somewhat periodically to compact the mm_cid for each process. Add a test to validate that it runs correctly and timely. The test spawns 1 thread pinned to each CPU, then each thread, including the main one, runs in short bursts for some time. During this period, the mm_cids should be spanning all numbers between 0 and nproc. At the end of this phase, a thread with high enough mm_cid (>= nproc/2) is selected to be the new leader, all other threads terminate. After some time, the only running thread should see 0 as mm_cid, if that doesn't happen, the compaction mechanism didn't work and the test fails. The test never fails if only 1 core is available, in which case, we cannot test anything as the only available mm_cid is 0. Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com> Signed-off-by: Gabriele Monaco <gmonaco(a)redhat.com> --- tools/testing/selftests/rseq/.gitignore | 1 + tools/testing/selftests/rseq/Makefile | 2 +- .../selftests/rseq/mm_cid_compaction_test.c | 200 ++++++++++++++++++ 3 files changed, 202 insertions(+), 1 deletion(-) create mode 100644 tools/testing/selftests/rseq/mm_cid_compaction_test.c diff --git a/tools/testing/selftests/rseq/.gitignore b/tools/testing/selftests/rseq/.gitignore index 16496de5f6ce4..2c89f97e4f737 100644 --- a/tools/testing/selftests/rseq/.gitignore +++ b/tools/testing/selftests/rseq/.gitignore @@ -3,6 +3,7 @@ basic_percpu_ops_test basic_percpu_ops_mm_cid_test basic_test basic_rseq_op_test +mm_cid_compaction_test param_test param_test_benchmark param_test_compare_twice diff --git a/tools/testing/selftests/rseq/Makefile b/tools/testing/selftests/rseq/Makefile index 5a3432fceb586..ce1b38f46a355 100644 --- a/tools/testing/selftests/rseq/Makefile +++ b/tools/testing/selftests/rseq/Makefile @@ -16,7 +16,7 @@ OVERRIDE_TARGETS = 1 TEST_GEN_PROGS = basic_test basic_percpu_ops_test basic_percpu_ops_mm_cid_test param_test \ param_test_benchmark param_test_compare_twice param_test_mm_cid \ - param_test_mm_cid_benchmark param_test_mm_cid_compare_twice + param_test_mm_cid_benchmark param_test_mm_cid_compare_twice mm_cid_compaction_test TEST_GEN_PROGS_EXTENDED = librseq.so diff --git a/tools/testing/selftests/rseq/mm_cid_compaction_test.c b/tools/testing/selftests/rseq/mm_cid_compaction_test.c new file mode 100644 index 0000000000000..7ddde3b657dd6 --- /dev/null +++ b/tools/testing/selftests/rseq/mm_cid_compaction_test.c @@ -0,0 +1,200 @@ +// SPDX-License-Identifier: LGPL-2.1 +#define _GNU_SOURCE +#include <assert.h> +#include <pthread.h> +#include <sched.h> +#include <stdint.h> +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <stddef.h> + +#include "../kselftest.h" +#include "rseq.h" + +#define VERBOSE 0 +#define printf_verbose(fmt, ...) \ + do { \ + if (VERBOSE) \ + printf(fmt, ##__VA_ARGS__); \ + } while (0) + +/* 0.5 s */ +#define RUNNER_PERIOD 500000 +/* Number of runs before we terminate or get the token */ +#define THREAD_RUNS 5 + +/* + * Number of times we check that the mm_cid were compacted. + * Checks are repeated every RUNNER_PERIOD. + */ +#define MM_CID_COMPACT_TIMEOUT 10 + +struct thread_args { + int cpu; + int num_cpus; + pthread_mutex_t *token; + pthread_barrier_t *barrier; + pthread_t *tinfo; + struct thread_args *args_head; +}; + +static void __noreturn *thread_runner(void *arg) +{ + struct thread_args *args = arg; + int i, ret, curr_mm_cid; + cpu_set_t cpumask; + + CPU_ZERO(&cpumask); + CPU_SET(args->cpu, &cpumask); + ret = pthread_setaffinity_np(pthread_self(), sizeof(cpumask), &cpumask); + if (ret) { + errno = ret; + perror("Error: failed to set affinity"); + abort(); + } + pthread_barrier_wait(args->barrier); + + for (i = 0; i < THREAD_RUNS; i++) + usleep(RUNNER_PERIOD); + curr_mm_cid = rseq_current_mm_cid(); + /* + * We select one thread with high enough mm_cid to be the new leader. + * All other threads (including the main thread) will terminate. + * After some time, the mm_cid of the only remaining thread should + * converge to 0, if not, the test fails. + */ + if (curr_mm_cid >= args->num_cpus / 2 && + !pthread_mutex_trylock(args->token)) { + printf_verbose( + "cpu%d has mm_cid=%d and will be the new leader.\n", + sched_getcpu(), curr_mm_cid); + for (i = 0; i < args->num_cpus; i++) { + if (args->tinfo[i] == pthread_self()) + continue; + ret = pthread_join(args->tinfo[i], NULL); + if (ret) { + errno = ret; + perror("Error: failed to join thread"); + abort(); + } + } + pthread_barrier_destroy(args->barrier); + free(args->tinfo); + free(args->token); + free(args->barrier); + free(args->args_head); + + for (i = 0; i < MM_CID_COMPACT_TIMEOUT; i++) { + curr_mm_cid = rseq_current_mm_cid(); + printf_verbose("run %d: mm_cid=%d on cpu%d.\n", i, + curr_mm_cid, sched_getcpu()); + if (curr_mm_cid == 0) + exit(EXIT_SUCCESS); + usleep(RUNNER_PERIOD); + } + exit(EXIT_FAILURE); + } + printf_verbose("cpu%d has mm_cid=%d and is going to terminate.\n", + sched_getcpu(), curr_mm_cid); + pthread_exit(NULL); +} + +int test_mm_cid_compaction(void) +{ + cpu_set_t affinity; + int i, j, ret = 0, num_threads; + pthread_t *tinfo; + pthread_mutex_t *token; + pthread_barrier_t *barrier; + struct thread_args *args; + + sched_getaffinity(0, sizeof(affinity), &affinity); + num_threads = CPU_COUNT(&affinity); + tinfo = calloc(num_threads, sizeof(*tinfo)); + if (!tinfo) { + perror("Error: failed to allocate tinfo"); + return -1; + } + args = calloc(num_threads, sizeof(*args)); + if (!args) { + perror("Error: failed to allocate args"); + ret = -1; + goto out_free_tinfo; + } + token = malloc(sizeof(*token)); + if (!token) { + perror("Error: failed to allocate token"); + ret = -1; + goto out_free_args; + } + barrier = malloc(sizeof(*barrier)); + if (!barrier) { + perror("Error: failed to allocate barrier"); + ret = -1; + goto out_free_token; + } + if (num_threads == 1) { + fprintf(stderr, "Cannot test on a single cpu. " + "Skipping mm_cid_compaction test.\n"); + /* only skipping the test, this is not a failure */ + goto out_free_barrier; + } + pthread_mutex_init(token, NULL); + ret = pthread_barrier_init(barrier, NULL, num_threads); + if (ret) { + errno = ret; + perror("Error: failed to initialise barrier"); + goto out_free_barrier; + } + for (i = 0, j = 0; i < CPU_SETSIZE && j < num_threads; i++) { + if (!CPU_ISSET(i, &affinity)) + continue; + args[j].num_cpus = num_threads; + args[j].tinfo = tinfo; + args[j].token = token; + args[j].barrier = barrier; + args[j].cpu = i; + args[j].args_head = args; + if (!j) { + /* The first thread is the main one */ + tinfo[0] = pthread_self(); + ++j; + continue; + } + ret = pthread_create(&tinfo[j], NULL, thread_runner, &args[j]); + if (ret) { + errno = ret; + perror("Error: failed to create thread"); + abort(); + } + ++j; + } + printf_verbose("Started %d threads.\n", num_threads); + + /* Also main thread will terminate if it is not selected as leader */ + thread_runner(&args[0]); + + /* only reached in case of errors */ +out_free_barrier: + free(barrier); +out_free_token: + free(token); +out_free_args: + free(args); +out_free_tinfo: + free(tinfo); + + return ret; +} + +int main(int argc, char **argv) +{ + if (!rseq_mm_cid_available()) { + fprintf(stderr, "Error: rseq_mm_cid unavailable\n"); + return -1; + } + if (test_mm_cid_compaction()) + return -1; + return 0; +} -- 2.48.1

10 months, 1 week

1
0
0 0

[PATCH v8 net-next 0/3] DUALPI2 patch

by chia-yu.chang＠nokia-bell-labs.com

From: Chia-Yu Chang <chia-yu.chang(a)nokia-bell-labs.com> Hello, Please find DUALPI2 patch v8. v8 - Fix warning messages in v7 v7 - Separate into 3 patches to avoid mixing changes of documentation, selftest, and code. (Cong Wang <xiyou.wangcong(a)gmail.com>) v6 - Add modprobe for dulapi2 in tc-testing script tc-testing/tdc.sh (Jakub Kicinski <kuba(a)kernel.org>) - Update test cases in dualpi2.json - Update commit message v5 - A comparison was done between MQ + DUALPI2, MQ + FQ_PIE, MQ + FQ_CODEL: Unshaped 1gigE with 4 download streams test: - Summary of tcp_4down run 'MQ + FQ_CODEL': avg median # data pts Ping (ms) ICMP : 1.19 1.34 ms 349 TCP download avg : 235.42 N/A Mbits/s 349 TCP download sum : 941.68 N/A Mbits/s 349 TCP download::1 : 235.19 235.39 Mbits/s 349 TCP download::2 : 235.03 235.35 Mbits/s 349 TCP download::3 : 236.89 235.44 Mbits/s 349 TCP download::4 : 234.57 235.19 Mbits/s 349 - Summary of tcp_4down run 'MQ + FQ_PIE' avg median # data pts Ping (ms) ICMP : 1.21 1.37 ms 350 TCP download avg : 235.42 N/A Mbits/s 350 TCP download sum : 941.61 N/A Mbits/s 350 TCP download::1 : 232.54 233.13 Mbits/s 350 TCP download::2 : 232.52 232.80 Mbits/s 350 TCP download::3 : 233.14 233.78 Mbits/s 350 TCP download::4 : 243.41 241.48 Mbits/s 350 - Summary of tcp_4down run 'MQ + DUALPI2' avg median # data pts Ping (ms) ICMP : 1.19 1.34 ms 349 TCP download avg : 235.42 N/A Mbits/s 349 TCP download sum : 941.68 N/A Mbits/s 349 TCP download::1 : 235.19 235.39 Mbits/s 349 TCP download::2 : 235.03 235.35 Mbits/s 349 TCP download::3 : 236.89 235.44 Mbits/s 349 TCP download::4 : 234.57 235.19 Mbits/s 349 Unshaped 1gigE with 128 download streams test: - Summary of tcp_128down run 'MQ + FQ_CODEL': avg median # data pts Ping (ms) ICMP : 1.88 1.86 ms 350 TCP download avg : 7.39 N/A Mbits/s 350 TCP download sum : 946.47 N/A Mbits/s 350 - Summary of tcp_128down run 'MQ + FQ_PIE': avg median # data pts Ping (ms) ICMP : 1.88 1.86 ms 350 TCP download avg : 7.39 N/A Mbits/s 350 TCP download sum : 946.47 N/A Mbits/s 350 - Summary of tcp_128down run 'MQ + DUALPI2': avg median # data pts Ping (ms) ICMP : 1.88 1.86 ms 350 TCP download avg : 7.39 N/A Mbits/s 350 TCP download sum : 946.47 N/A Mbits/s 350 Unshaped 10gigE with 4 download streams test: - Summary of tcp_4down run 'MQ + FQ_CODEL': avg median # data pts Ping (ms) ICMP : 0.22 0.23 ms 350 TCP download avg : 2354.08 N/A Mbits/s 350 TCP download sum : 9416.31 N/A Mbits/s 350 TCP download::1 : 2353.65 2352.81 Mbits/s 350 TCP download::2 : 2354.54 2354.21 Mbits/s 350 TCP download::3 : 2353.56 2353.78 Mbits/s 350 TCP download::4 : 2354.56 2354.45 Mbits/s 350 - Summary of tcp_4down run 'MQ + FQ_PIE': avg median # data pts Ping (ms) ICMP : 0.20 0.19 ms 350 TCP download avg : 2354.76 N/A Mbits/s 350 TCP download sum : 9419.04 N/A Mbits/s 350 TCP download::1 : 2354.77 2353.89 Mbits/s 350 TCP download::2 : 2353.41 2354.29 Mbits/s 350 TCP download::3 : 2356.18 2354.19 Mbits/s 350 TCP download::4 : 2354.68 2353.15 Mbits/s 350 - Summary of tcp_4down run 'MQ + DUALPI2': avg median # data pts Ping (ms) ICMP : 0.24 0.24 ms 350 TCP download avg : 2354.11 N/A Mbits/s 350 TCP download sum : 9416.43 N/A Mbits/s 350 TCP download::1 : 2354.75 2353.93 Mbits/s 350 TCP download::2 : 2353.15 2353.75 Mbits/s 350 TCP download::3 : 2353.49 2353.72 Mbits/s 350 TCP download::4 : 2355.04 2353.73 Mbits/s 350 Unshaped 10gigE with 128 download streams test: - Summary of tcp_128down run 'MQ + FQ_CODEL': avg median # data pts Ping (ms) ICMP : 7.57 8.69 ms 350 TCP download avg : 73.97 N/A Mbits/s 350 TCP download sum : 9467.82 N/A Mbits/s 350 - Summary of tcp_128down run 'MQ + FQ_PIE': avg median # data pts Ping (ms) ICMP : 7.82 8.91 ms 350 TCP download avg : 73.97 N/A Mbits/s 350 TCP download sum : 9468.42 N/A Mbits/s 350 - Summary of tcp_128down run 'MQ + DUALPI2': avg median # data pts Ping (ms) ICMP : 6.87 7.93 ms 350 TCP download avg : 73.95 N/A Mbits/s 350 TCP download sum : 9465.87 N/A Mbits/s 350 From the results shown above, we see small differences between combinations. - Update commit message to include results of no_split_gso and split_gso (Dave Taht <dave.taht(a)gmail.com> and Paolo Abeni <pabeni(a)redhat.com>) - Add memlimit in dualpi2 attribute, and add memory_used, max_memory_used, memory_limit in dualpi2 stats (Dave Taht <dave.taht(a)gmail.com>) - Update note in sch_dualpi2.c related to BBRv3 status (Dave Taht <dave.taht(a)gmail.com>) - Update license identifier (Dave Taht <dave.taht(a)gmail.com>) - Add selftest in tools/testing/selftests/tc-testing (Cong Wang <xiyou.wangcong(a)gmail.com>) - Use netlink policies for parameter checks (Jamal Hadi Salim <jhs(a)mojatatu.com>) - Modify texts & fix typos in Documentation/netlink/specs/tc.yaml (Dave Taht <dave.taht(a)gmail.com>) - Add dscsriptions of packet counter statistics and reset function of sch_dualpi2.c - Fix step_thresh in packets - Update code comments in sch_dualpi2.c v4 - Update statement in Kconfig for DualPI2 (Stephen Hemminger <stephen(a)networkplumber.org>) - Put a blank line after #define in sch_dualpi2.c (Stephen Hemminger <stephen(a)networkplumber.org>) - Fix line length warning v3 - Fix compilaiton error - Update Documentation/netlink/specs/tc.yaml (Jakub Kicinski <kuba(a)kernel.org>) v2 - Add Documentation/netlink/specs/tc.yaml (Jakub Kicinski <kuba(a)kernel.org>) - Use dualpi2 instead of skb prefix (Jamal Hadi Salim <jhs(a)mojatatu.com>) - Replace nla_parse_nested_deprecated with nla_parse_nested (Jamal Hadi Salim <jhs(a)mojatatu.com>) - Fix line length warning For more details of DualPI2, plesae refer IETF RFC9332 (https://datatracker.ietf.org/doc/html/rfc9332). Best regards, Chia-Yu Chia-Yu Chang (2): Documentation: netlink: specs: tc: Add DualPI2 specification selftests/tc-testing: Add selftests for qdisc DualPI2 Koen De Schepper (1): sched: Add dualpi2 qdisc Documentation/netlink/specs/tc.yaml | 140 +++ include/linux/netdevice.h | 1 + include/uapi/linux/pkt_sched.h | 38 + net/sched/Kconfig | 12 + net/sched/Makefile | 1 + net/sched/sch_dualpi2.c | 1082 +++++++++++++++++ tools/testing/selftests/tc-testing/config | 1 + .../tc-testing/tc-tests/qdiscs/dualpi2.json | 149 +++ tools/testing/selftests/tc-testing/tdc.sh | 1 + 9 files changed, 1425 insertions(+) create mode 100644 net/sched/sch_dualpi2.c create mode 100644 tools/testing/selftests/tc-testing/tc-tests/qdiscs/dualpi2.json -- 2.34.1

10 months, 1 week

2
4
0 0

[PATCH] selftests/bpf: Convert comma to semicolon

by Chen Ni

Replace comma between expressions with semicolons. Using a ',' in place of a ';' can have unintended side effects. Although that is not the case here, it is seems best to use ';' unless ',' is intended. Found by inspection. No functional change intended. Compile tested only. Signed-off-by: Chen Ni <nichen(a)iscas.ac.cn> --- tools/testing/selftests/bpf/prog_tests/fd_array.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/bpf/prog_tests/fd_array.c b/tools/testing/selftests/bpf/prog_tests/fd_array.c index a1d52e73fb16..9add890c2d37 100644 --- a/tools/testing/selftests/bpf/prog_tests/fd_array.c +++ b/tools/testing/selftests/bpf/prog_tests/fd_array.c @@ -83,8 +83,8 @@ static inline int bpf_prog_get_map_ids(int prog_fd, __u32 *nr_map_ids, __u32 *ma int err; memset(&info, 0, len); - info.nr_map_ids = *nr_map_ids, - info.map_ids = ptr_to_u64(map_ids), + info.nr_map_ids = *nr_map_ids; + info.map_ids = ptr_to_u64(map_ids); err = bpf_prog_get_info_by_fd(prog_fd, &info, &len); if (!ASSERT_OK(err, "bpf_prog_get_info_by_fd")) -- 2.25.1

10 months, 1 week

3
2
0 0

[PATCH v21 00/24] Introducing OpenVPN Data Channel Offload

by Antonio Quartulli

Notable changes since v20: * removed newline at the end of message in NL_SET_ERR_MSG_FMT_MOD * dropped udp_init() and related build_protos() and directly use .encap_destroy [instead of overriding sk->close()] * defered peer_del() call to worker in case of transport errors, as we may be in non-sleepable context * used kfree() instead of kfree_rcu() when releasing socket as we just invoked synchronize_rcu() * fix comment in ovpn_tcp_parse() * invoked peer->tcp.sk_cb.prot->release_cb instead of explicitly calling tcp_release_cb. This way we don't need to export it again * moved call to peer_put() after call to peer->tcp.sk_cb.prot->close * moved switch(mode) inside ovpn_peers_free() * simplified skip logic in ovpn_peers_free() * fixed check to avoid passing a negative delay to keepalive's schedule_work() Please note that some patches were already reviewed/tested by a few people. These patches have retained the tags as they have hardly been touched. The latest code can also be found at: https://github.com/OpenVPN/ovpn-net-next Thanks a lot! Best Regards, Antonio Quartulli OpenVPN Inc. To: netdev(a)vger.kernel.org To: Eric Dumazet <edumazet(a)google.com> To: Jakub Kicinski <kuba(a)kernel.org> To: Paolo Abeni <pabeni(a)redhat.com> To: Donald Hunter <donald.hunter(a)gmail.com> To: Antonio Quartulli <antonio(a)openvpn.net> To: Shuah Khan <shuah(a)kernel.org> To: sd(a)queasysnail.net To: ryazanov.s.a(a)gmail.com To: Andrew Lunn <andrew+netdev(a)lunn.ch> Cc: Simon Horman <horms(a)kernel.org> Cc: linux-kernel(a)vger.kernel.org Cc: linux-kselftest(a)vger.kernel.org Cc: Xiao Liang <shaw.leon(a)gmail.com> Signed-off-by: Antonio Quartulli <antonio(a)openvpn.net> --- Antonio Quartulli (24): net: introduce OpenVPN Data Channel Offload (ovpn) ovpn: add basic netlink support ovpn: add basic interface creation/destruction/management routines ovpn: keep carrier always on for MP interfaces ovpn: introduce the ovpn_peer object ovpn: introduce the ovpn_socket object ovpn: implement basic TX path (UDP) ovpn: implement basic RX path (UDP) ovpn: implement packet processing ovpn: store tunnel and transport statistics ovpn: implement TCP transport skb: implement skb_send_sock_locked_with_flags() ovpn: add support for MSG_NOSIGNAL in tcp_sendmsg ovpn: implement multi-peer support ovpn: implement peer lookup logic ovpn: implement keepalive mechanism ovpn: add support for updating local UDP endpoint ovpn: add support for peer floating ovpn: implement peer add/get/dump/delete via netlink ovpn: implement key add/get/del/swap via netlink ovpn: kill key and notify userspace in case of IV exhaustion ovpn: notify userspace when a peer is deleted ovpn: add basic ethtool support testing/selftests: add test tool and scripts for ovpn module Documentation/netlink/specs/ovpn.yaml | 371 +++ Documentation/netlink/specs/rt_link.yaml | 16 + MAINTAINERS | 11 + drivers/net/Kconfig | 15 + drivers/net/Makefile | 1 + drivers/net/ovpn/Makefile | 22 + drivers/net/ovpn/bind.c | 55 + drivers/net/ovpn/bind.h | 101 + drivers/net/ovpn/crypto.c | 211 ++ drivers/net/ovpn/crypto.h | 145 ++ drivers/net/ovpn/crypto_aead.c | 408 ++++ drivers/net/ovpn/crypto_aead.h | 33 + drivers/net/ovpn/io.c | 462 ++++ drivers/net/ovpn/io.h | 34 + drivers/net/ovpn/main.c | 339 +++ drivers/net/ovpn/main.h | 14 + drivers/net/ovpn/netlink-gen.c | 213 ++ drivers/net/ovpn/netlink-gen.h | 41 + drivers/net/ovpn/netlink.c | 1249 ++++++++++ drivers/net/ovpn/netlink.h | 18 + drivers/net/ovpn/ovpnpriv.h | 57 + drivers/net/ovpn/peer.c | 1352 +++++++++++ drivers/net/ovpn/peer.h | 163 ++ drivers/net/ovpn/pktid.c | 129 ++ drivers/net/ovpn/pktid.h | 87 + drivers/net/ovpn/proto.h | 118 + drivers/net/ovpn/skb.h | 61 + drivers/net/ovpn/socket.c | 244 ++ drivers/net/ovpn/socket.h | 49 + drivers/net/ovpn/stats.c | 21 + drivers/net/ovpn/stats.h | 47 + drivers/net/ovpn/tcp.c | 592 +++++ drivers/net/ovpn/tcp.h | 36 + drivers/net/ovpn/udp.c | 442 ++++ drivers/net/ovpn/udp.h | 25 + include/linux/skbuff.h | 2 + include/uapi/linux/if_link.h | 15 + include/uapi/linux/ovpn.h | 110 + include/uapi/linux/udp.h | 1 + net/core/skbuff.c | 18 +- net/ipv6/af_inet6.c | 1 + net/ipv6/udp.c | 1 + tools/testing/selftests/Makefile | 1 + tools/testing/selftests/net/ovpn/.gitignore | 2 + tools/testing/selftests/net/ovpn/Makefile | 31 + tools/testing/selftests/net/ovpn/common.sh | 92 + tools/testing/selftests/net/ovpn/config | 10 + tools/testing/selftests/net/ovpn/data64.key | 5 + tools/testing/selftests/net/ovpn/ovpn-cli.c | 2395 ++++++++++++++++++++ tools/testing/selftests/net/ovpn/tcp_peers.txt | 5 + .../testing/selftests/net/ovpn/test-chachapoly.sh | 9 + .../selftests/net/ovpn/test-close-socket-tcp.sh | 9 + .../selftests/net/ovpn/test-close-socket.sh | 45 + tools/testing/selftests/net/ovpn/test-float.sh | 9 + tools/testing/selftests/net/ovpn/test-tcp.sh | 9 + tools/testing/selftests/net/ovpn/test.sh | 113 + tools/testing/selftests/net/ovpn/udp_peers.txt | 5 + 57 files changed, 10065 insertions(+), 5 deletions(-) --- base-commit: fb05579a176f7bccc8d279665fc0e46dfed43dfb change-id: 20250304-b4-ovpn-tmp-153379e78603 Best regards, -- Antonio Quartulli <antonio(a)openvpn.net>

10 months, 1 week

2
45
0 0

[PATCH v15 00/15] PCI: EP: Add RC-to-EP doorbell with platform MSI controller

by Frank Li

┌────────────┐ ┌───────────────────────────────────┐ ┌────────────────┐ │ │ │ │ │ │ │ │ │ PCI Endpoint │ │ PCI Host │ │ │ │ │ │ │ │ │◄──┤ 1.platform_msi_domain_alloc_irqs()│ │ │ │ │ │ │ │ │ │ MSI ├──►│ 2.write_msi_msg() ├──►├─BAR<n> │ │ Controller │ │ update doorbell register address│ │ │ │ │ │ for BAR │ │ │ │ │ │ │ │ 3. Write BAR<n>│ │ │◄──┼───────────────────────────────────┼───┤ │ │ │ │ │ │ │ │ ├──►│ 4.Irq Handle │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ └────────────┘ └───────────────────────────────────┘ └────────────────┘ This patches based on old https://lore.kernel.org/imx/20221124055036.1630573-1-Frank.Li@nxp.com/ Original patch only target to vntb driver. But actually it is common method. This patches add new API to pci-epf-core, so any EP driver can use it. Previous v2 discussion here. https://lore.kernel.org/imx/20230911220920.1817033-1-Frank.Li@nxp.com/ Changes in v15: - rebase to v6.14-rc1 - fix build issue find by kernel test robot - Link to v14: https://lore.kernel.org/r/20250207-ep-msi-v14-0-9671b136f2b8@nxp.com Changes in v14: Marc Zyngier raised concerns about adding DOMAIN_BUS_DEVICE_PCI_EP_MSI. As a result, the approach has been reverted to the v9 method. However, there are several improvements: MSI now supports msi-map in addition to msi-parent. - The struct device: id is used as the endpoint function (EPF) device identity to map to the stream ID (sideband information). - The EPC device tree source (DTS) utilizes msi-map to provide such information. - The EPF device's of_node is set to the EPC controller’s node. This approach is commonly used for multi-function device (MFD) platform child devices, allowing them to inherit properties from the MFD device’s DTS, such as reset-cells and gpio-cells. This method is well-suited for the current case, as the EPF is inherently created/binded to the EPC and should inherit the EPC’s DTS node properties. Additionally: Since the basic IMX95 LUT support has already been merged into the mainline, a DTS and driver increment patch is added to complete the solution. The patch is rebased onto the latest linux-next tree and aligned with the new pcitest framework. - Link to v13: https://lore.kernel.org/r/20241218-ep-msi-v13-0-646e2192dc24@nxp.com Changes in v13: - Change to use DOMAIN_BUS_PCI_DEVICE_EP_MSI - Change request id as func | vfunc << 3 - Remove IRQ_DOMAIN_MSI_IMMUTABLE Thomas Gleixner: I hope capture all your points in review comments. If missed, let me know. - Link to v12: https://lore.kernel.org/r/20241211-ep-msi-v12-0-33d4532fa520@nxp.com Changes in v12: - Change to use IRQ_DOMAIN_MSI_IMMUTABLE and add help function irq_domain_msi_is_immuatble(). - split PCI: endpoint: pci-ep-msi: Add MSI address/data pair mutable check to 3 patches - Link to v11: https://lore.kernel.org/r/20241209-ep-msi-v11-0-7434fa8397bd@nxp.com Changes in v11: - Change to use MSI_FLAG_MSG_IMMUTABLE - Link to v10: https://lore.kernel.org/r/20241204-ep-msi-v10-0-87c378dbcd6d@nxp.com Changes in v10: Thomas Gleixner: There are big change in pci-ep-msi.c. I am sure if go on the corrent path. The key improvement is remove only 1 function devices's limitation. I use new patch for imutable check, which relative additional feature compared to base enablement patch. - Remove patch Add msi_remove_device_irq_domain() in platform_device_msi_free_irqs_all() - Add new patch irqchip/gic-v3-its: Avoid overwriting msi_prepare callback if provided by msi_domain_info - Remove only support 1 endpoint function limiation. - Create one MSI domain for each endpoint function devices. - Use "msi-map" in pci ep controler node, instead of of msi-parent. first argument is (func_no << 8 | vfunc_no) - Link to v9: https://lore.kernel.org/r/20241203-ep-msi-v9-0-a60dbc3f15dd@nxp.com Changes in v9 - Add patch platform-msi: Add msi_remove_device_irq_domain() in platform_device_msi_free_irqs_all() - Remove patch PCI: endpoint: Add pci_epc_get_fn() API for customizable filtering - Remove API pci_epf_align_inbound_addr_lo_hi - Move doorbell_alloc in to doorbell_enable function. - Link to v8: https://lore.kernel.org/r/20241116-ep-msi-v8-0-6f1f68ffd1bb@nxp.com Changes in v8: - update helper function name to pci_epf_align_inbound_addr() - Link to v7: https://lore.kernel.org/r/20241114-ep-msi-v7-0-d4ac7aafbd2c@nxp.com Changes in v7: - Add helper function pci_epf_align_addr(); - Link to v6: https://lore.kernel.org/r/20241112-ep-msi-v6-0-45f9722e3c2a@nxp.com Changes in v6: - change doorbell_addr to doorbell_offset - use round_down() - add Niklas's test by tag - rebase to pci/endpoint - Link to v5: https://lore.kernel.org/r/20241108-ep-msi-v5-0-a14951c0d007@nxp.com Changes in v5: - Move request_irq to epf test function driver for more flexiable user case - Add fixed size bar handler - Some minor improvememtn to see each patches's changelog. - Link to v4: https://lore.kernel.org/r/20241031-ep-msi-v4-0-717da2d99b28@nxp.com Changes in v4: - Remove patch genirq/msi: Add cleanup guard define for msi_lock_descs()/msi_unlock_descs() - Use new method to avoid compatible problem. Add new command DOORBELL_ENABLE and DOORBELL_DISABLE. pcitest -B send DOORBELL_ENABLE first, EP test function driver try to remap one of BAR_N (except test register bar) to ITS MSI MMIO space. Old driver don't support new command, so failure return, not side effect. After test, DOORBELL_DISABLE command send out to recover original map, so pcitest bar test can pass as normal. - Other detail change see each patches's change log - Link to v3: https://lore.kernel.org/r/20241015-ep-msi-v3-0-cedc89a16c1a@nxp.com Change from v2 to v3 - Fixed manivannan's comments - Move common part to pci-ep-msi.c and pci-ep-msi.h - rebase to 6.12-rc1 - use RevID to distingiush old version mkdir /sys/kernel/config/pci_ep/functions/pci_epf_test/func1 echo 16 > /sys/kernel/config/pci_ep/functions/pci_epf_test/func1/msi_interrupts echo 0x080c > /sys/kernel/config/pci_ep/functions/pci_epf_test/func1/deviceid echo 0x1957 > /sys/kernel/config/pci_ep/functions/pci_epf_test/func1/vendorid echo 1 > /sys/kernel/config/pci_ep/functions/pci_epf_test/func1/revid ^^^^^^ to enable platform msi support. ln -s /sys/kernel/config/pci_ep/functions/pci_epf_test/func1 /sys/kernel/config/pci_ep/controllers/4c380000.pcie-ep - use new device ID, which identify support doorbell to avoid broken compatility. Enable doorbell support only for PCI_DEVICE_ID_IMX8_DB, while other devices keep the same behavior as before. EP side RC with old driver RC with new driver PCI_DEVICE_ID_IMX8_DB no probe doorbell enabled Other device ID doorbell disabled* doorbell disabled* * Behavior remains unchanged. Change from v1 to v2 - Add missed patch for endpont/pci-epf-test.c - Move alloc and free to epc driver from epf. - Provide general help function for EPC driver to alloc platform msi irq. - Fixed manivannan's comments. Signed-off-by: Frank Li <Frank.Li(a)nxp.com> --- Frank Li (15): platform-msi: Add msi_remove_device_irq_domain() in platform_device_msi_free_irqs_all() irqdomain: Add IRQ_DOMAIN_FLAG_MSI_IMMUTABLE and irq_domain_is_msi_immutable() irqchip/gic-v3-its: Set IRQ_DOMAIN_FLAG_MSI_IMMUTABLE for ITS irqchip/gic-v3-its: Add support for device tree msi-map and msi-mask PCI: endpoint: Set ID and of_node for function driver PCI: endpoint: Add RC-to-EP doorbell support using platform MSI controller PCI: endpoint: pci-ep-msi: Add MSI address/data pair mutable check PCI: endpoint: Add pci_epf_align_inbound_addr() helper for address alignment PCI: endpoint: pci-epf-test: Add doorbell test support misc: pci_endpoint_test: Add doorbell test case selftests: pci_endpoint: Add doorbell test case pci: imx6: Add helper function imx_pcie_add_lut_by_rid() pci: imx6: Add LUT setting for MSI/IOMMU in Endpoint mode arm64: dts: imx95: Add msi-map for pci-ep device arm64: dts: imx95-19x19-evk: Add PCIe1 endpoint function overlay file arch/arm64/boot/dts/freescale/Makefile | 3 + .../dts/freescale/imx95-19x19-evk-pcie1-ep.dtso | 21 ++++ arch/arm64/boot/dts/freescale/imx95.dtsi | 1 + drivers/base/platform-msi.c | 1 + drivers/irqchip/irq-gic-v3-its-msi-parent.c | 8 ++ drivers/irqchip/irq-gic-v3-its.c | 2 +- drivers/misc/pci_endpoint_test.c | 81 +++++++++++++ drivers/pci/controller/dwc/pci-imx6.c | 25 ++-- drivers/pci/endpoint/Makefile | 1 + drivers/pci/endpoint/functions/pci-epf-test.c | 132 +++++++++++++++++++++ drivers/pci/endpoint/pci-ep-msi.c | 90 ++++++++++++++ drivers/pci/endpoint/pci-epf-core.c | 48 ++++++++ include/linux/irqdomain.h | 7 ++ include/linux/pci-ep-msi.h | 28 +++++ include/linux/pci-epf.h | 21 ++++ include/uapi/linux/pcitest.h | 1 + .../selftests/pci_endpoint/pci_endpoint_test.c | 25 ++++ 17 files changed, 486 insertions(+), 9 deletions(-) --- base-commit: 2014c95afecee3e76ca4a56956a936e23283f05b change-id: 20241010-ep-msi-8b4cab33b1be Best regards, --- Frank Li <Frank.Li(a)nxp.com>

10 months, 1 week

4
28
0 0

[PATCH v3 net 0/8] eth: bnxt: fix several bugs in the bnxt module

by Taehee Yoo

The first fixes setting incorrect skb->truesize. When xdp-mb prog returns XDP_PASS, skb is allocated and initialized. Currently, The truesize is calculated as BNXT_RX_PAGE_SIZE * sinfo->nr_frags, but sinfo->nr_frags is flushed by napi_build_skb(). So, it stores sinfo before calling napi_build_skb() and then use it for calculate truesize. The second fixes kernel panic in the bnxt_queue_mem_alloc(). The bnxt_queue_mem_alloc() accesses rx ring descriptor. rx ring descriptors are allocated when the interface is up and it's freed when the interface is down. So, if bnxt_queue_mem_alloc() is called when the interface is down, kernel panic occurs. This patch makes the bnxt_queue_mem_alloc() return -ENETDOWN if rx ring descriptors are not allocated. The third patch fixes kernel panic in the bnxt_queue_{start | stop}(). When a queue is restarted bnxt_queue_{start | stop}() are called. These functions set MRU to 0 to stop packet flow and then to set up the remaining things. MRU variable is a member of vnic_info[] the first vnic_info is for default and the second is for ntuple. The first vnic_info is always allocated when interface is up, but the second is allocated only when ntuple is enabled. (ethtool -K eth0 ntuple <on | off>). Currently, the bnxt_queue_{start | stop}() access vnic_info[BNXT_VNIC_NTUPLE] regardless of whether ntuple is enabled or not. So kernel panic occurs. This patch make the bnxt_queue_{start | stop}() use bp->nr_vnics instead of BNXT_VNIC_NTUPLE. The fourth patch fixes a warning due to checksum state. The bnxt_rx_pkt() checks whether skb->ip_summed is not CHECKSUM_NONE before updating ip_summed. if ip_summed is not CHECKSUM_NONE, it WARNS about it. However, the bnxt_xdp_build_skb() is called in XDP-MB-PASS path and it updates ip_summed earlier than bnxt_rx_pkt(). So, in the XDP-MB-PASS path, the bnxt_rx_pkt() always warns about checksum. Updating ip_summed at the bnxt_xdp_build_skb() is unnecessary and duplicate, so it is removed. The fifth patch fixes a kernel panic in the bnxt_get_queue_stats{rx | tx}(). The bnxt_get_queue_stats{rx | tx}() callback functions are called when a queue is resetting. These internally access rx and tx rings without null check, but rings are allocated and initialized when the interface is up. So, these functions are called when the interface is down, it occurs a kernel panic. The sixth patch fixes memory leak in queue reset logic. When a queue is resetting, tpa_info is allocated for the new queue and tpa_info for an old queue is not used anymore. So it should be freed, but not. The seventh patch makes net_devmem_unbind_dmabuf() ignore -ENETDOWN. When devmem socket is closed, net_devmem_unbind_dmabuf() is called to unbind/release resources. If interface is down, the driver returns -ENETDOWN. The -ENETDOWN return value is not an actual error, because the interface will release resources when the interface is down. So, net_devmem_unbind_dmabuf() needs to ignore -ENETDOWN. The last patch adds XDP testcases to tools/testing/selftests/drivers/net/ping.py. v3: - Copy nr_frags instead of full copy. (1/8) - Add Review tags from Somnath. (3/8) - Add new patch for fixing kernel panic in the bnxt_get_queue_stats{rx | tx}(). (5/8) - Add new patch for fixing memory leak in queue reset. (6/8) v2: - Do not use num_frags in the bnxt_xdp_build_skb(). (1/6) - Add Review tags from Somnath and Jakub. (2/6) - Add new patch for fixing checksum warning. (4/6) - Add new patch for fixing warning in net_devmem_unbind_dmabuf(). (5/6) - Add new XDP testcases to ping.py (6/6) Taehee Yoo (8): eth: bnxt: fix truesize for mb-xdp-pass case eth: bnxt: return fail if interface is down in bnxt_queue_mem_alloc() eth: bnxt: do not use BNXT_VNIC_NTUPLE unconditionally in queue restart logic eth: bnxt: do not update checksum in bnxt_xdp_build_skb() eth: bnxt: fix kernel panic in the bnxt_get_queue_stats{rx | tx} eth: bnxt: fix memory leak in queue reset net: devmem: do not WARN conditionally after netdev_rx_queue_restart() selftests: drv-net: add xdp cases for ping.py drivers/net/ethernet/broadcom/bnxt/bnxt.c | 25 ++- drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c | 13 +- drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.h | 3 +- net/core/devmem.c | 4 +- tools/testing/selftests/drivers/net/ping.py | 200 ++++++++++++++++-- .../testing/selftests/net/lib/xdp_dummy.bpf.c | 6 + 6 files changed, 220 insertions(+), 31 deletions(-) -- 2.34.1

10 months, 1 week

5
15
0 0

[PATCH v6 bpf-next 0/2] security: Propagate caller information in bpf hooks

by Blaise Boscaccy

Hello, While trying to implement an eBPF gatekeeper program, we ran into an issue whereas the LSM hooks are missing some relevant data. Certain subcommands passed to the bpf() syscall can be invoked from either the kernel or userspace. Additionally, some fields in the bpf_attr struct contain pointers, and depending on where the subcommand was invoked, they could point to either user or kernel memory. One example of this is the bpf_prog_load subcommand and its fd_array. This data is made available and used by the verifier but not made available to the LSM subsystem. This patchset simply exposes that information to applicable LSM hooks. Change list: - v5 -> v6 - fix regression caused by is_kernel renaming - simplify test logic - v4 -> v5 - merge v4 selftest breakout patch back into a single patch - change "is_kernel" to "kernel" - add selftest using new kernel flag - v3 -> v4 - split out selftest changes into a separate patch - v2 -> v3 - reorder params so that the new boolean flag is the last param - fixup function signatures in bpf selftests - v1 -> v2 - Pass a boolean flag in lieu of bpfptr_t Revisions: - v5 https://lore.kernel.org/bpf/20250307213651.3065714-1-bboscaccy@linux.micros… - v4 https://lore.kernel.org/bpf/20250304203123.3935371-1-bboscaccy@linux.micros… - v3 https://lore.kernel.org/bpf/20250303222416.3909228-1-bboscaccy@linux.micros… - v2 https://lore.kernel.org/bpf/20250228165322.3121535-1-bboscaccy@linux.micros… - v1 https://lore.kernel.org/bpf/20250226003055.1654837-1-bboscaccy@linux.micros… Blaise Boscaccy (2): security: Propagate caller information in bpf hooks selftests/bpf: Add a kernel flag test for LSM bpf hook include/linux/lsm_hook_defs.h | 6 +-- include/linux/security.h | 12 +++--- kernel/bpf/syscall.c | 10 ++--- security/security.c | 15 ++++--- security/selinux/hooks.c | 6 +-- .../selftests/bpf/prog_tests/kernel_flag.c | 43 +++++++++++++++++++ .../selftests/bpf/progs/rcu_read_lock.c | 3 +- .../bpf/progs/test_cgroup1_hierarchy.c | 4 +- .../selftests/bpf/progs/test_kernel_flag.c | 28 ++++++++++++ .../bpf/progs/test_kfunc_dynptr_param.c | 6 +-- .../selftests/bpf/progs/test_lookup_key.c | 2 +- .../selftests/bpf/progs/test_ptr_untrusted.c | 2 +- .../bpf/progs/test_task_under_cgroup.c | 2 +- .../bpf/progs/test_verify_pkcs7_sig.c | 2 +- 14 files changed, 108 insertions(+), 33 deletions(-) create mode 100644 tools/testing/selftests/bpf/prog_tests/kernel_flag.c create mode 100644 tools/testing/selftests/bpf/progs/test_kernel_flag.c -- 2.48.1

10 months, 1 week

3
8
0 0

[PATCH 6.1.y] KVM: selftests: Fix build error due to assert in dirty_log_test

by Andrey Kalachev

Hi all. Please apply that forgotten patch to fix v6.1 KVM selftests broken build. Origin of the patch can be founded here [1] Regards, AK [1] https://lore.kernel.org/stable/20240403164230.1722018-1-rananta@google.com/ -- 2.30.2

10 months, 1 week

1
1
0 0

[PATCH] mm/huge_memory: drop beyond-EOF folios with the right number of refs.

by Zi Yan

When an after-split folio is large and needs to be dropped due to EOF, folio_put_refs(folio, folio_nr_pages(folio)) should be used to drop all page cache refs. Otherwise, the folio will not be freed, causing memory leak. This leak would happen on a filesystem with blocksize > page_size and a truncate is performed, where the blocksize makes folios split to >0 order ones, causing truncated folios not being freed. Fixes: c010d47f107f ("mm: thp: split huge page to any lower order pages") Reported-by: Hugh Dickins <hughd(a)google.com> Closes: https://lore.kernel.org/all/fcbadb7f-dd3e-21df-f9a7-2853b53183c4@google.com/ Cc: stable(a)vger.kernel.org Signed-off-by: Zi Yan <ziy(a)nvidia.com> --- mm/huge_memory.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 3d3ebdc002d5..373781b21e5c 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -3304,7 +3304,7 @@ static void __split_huge_page(struct page *page, struct list_head *list, folio_account_cleaned(tail, inode_to_wb(folio->mapping->host)); __filemap_remove_folio(tail, NULL); - folio_put(tail); + folio_put_refs(tail, folio_nr_pages(tail)); } else if (!folio_test_anon(folio)) { __xa_store(&folio->mapping->i_pages, tail->index, tail, 0); -- 2.47.2

10 months, 1 week

1
0
0 0

[PATCH v9 0/8] Buddy allocator like (or non-uniform) folio split

by Zi Yan

Hi all, This patchset adds a new buddy allocator like (or non-uniform) large folio split from a order-n folio to order-m with m < n. It reduces 1. the total number of after-split folios from 2^(n-m) to n-m+1; 2. the amount of memory needed for multi-index xarray split from 2^(n/6-m/6) to n/6-m/6, assuming XA_CHUNK_SHIFT=6; 3. keep more large folios after a split from all order-m folios to order-(n-1) to order-m folios. For example, to split an order-9 to order-0, folio split generates 10 (or 11 for anonymous memory) folios instead of 512, allocates 1 xa_node instead of 8, and leaves 1 order-8, 1 order-7, ..., 1 order-1 and 2 order-0 folios (or 4 order-0 for anonymous memory) instead of 512 order-0 folios. Instead of duplicating existing split_huge_page*() code, __folio_split() is introduced as the shared backend code for both split_huge_page_to_list_to_order() and folio_split(). __folio_split() can support both uniform split and buddy allocator like (or non-uniform) split. All existing split_huge_page*() users can be gradually converted to use folio_split() if possible. In this patchset, I converted truncate_inode_partial_folio() to use folio_split(). xfstests quick group passed for both tmpfs and xfs. It is on top of mm-everything-2025-02-26-03-56 with V8 reverted. It is ready to be merged. Changelog === From V8[11]: 1. Removed gfp parameter from xas_try_split() and GFP_NOWAIT is used all the time. (per Baolin Wang) 2. Used __xas_init_node_for_split() instead of __xas_alloc_node_for_split() and moved node allocation out. It fixed a bug when xa_node is pre-allocated by xas_nomem() before xas_try_split() is called without being initialized for split. From V7[9]: 1. Fixed a wrong function name in lib/test_xarray.c. 2. Made __split_folio_to_order() never fail, since the old order check is already done in __folio_split(). (per David Hildenbrand) 3. Fixed an issue reported by syzbot[10] by not dropping the original folio during truncate. 4. Fixed a WARNING when READ_ONLY_THP_FOR_FS is enabled. (Thank David Hildenbrand for reporting the issue) 5. Used two separate struct page* parameters, split_at and lock_at, to specify at which subpage the non-uniform split happens and which subpage to keep locked after the split, respectively. It improves code readability. From V6[8]: 1. Added an xarray function xas_try_split() to support iterative folio split, removing the need of using xas_split_alloc() and xas_split(). The function guarantees that at most one xa_node is allocated for each call. 2. Added concrete numbers of after-split folios and xa_node savings to cover letter, commit log. (per Andrew) From V5[7]: 1. Split shmem to any lower order patches are in mm tree, so dropped from this series. 2. Rename split_folio_at() to try_folio_split() to clarify that non-uniform split will not be used if it is not supported. From V4[6]: 1. Enabled shmem support in both uniform and buddy allocator like split and added selftests for it. 2. Added functions to check if uniform split and buddy allocator like split are supported for the given folio and order. 3. Made truncate fall back to uniform split if buddy allocator split is not supported (CONFIG_READ_ONLY_THP_FOR_FS and FS without large folio). 4. Added the missing folio_clear_has_hwpoisoned() to __split_unmapped_folio(). From V3[5]: 1. Used xas_split_alloc(GFP_NOWAIT) instead of xas_nomem(), since extra operations inside xas_split_alloc() are needed for correctness. 2. Enabled folio_split() for shmem and no issue was found with xfstests quick test group. 3. Split both ends of a truncate range in truncate_inode_partial_folio() to avoid wasting memory in shmem truncate (per David Hildenbrand). 4. Removed page_in_folio_offset() since page_folio() does the same thing. 5. Finished truncate related tests from xfstests quick test group on XFS and tmpfs without issues. 6. Disabled buddy allocator like split on CONFIG_READ_ONLY_THP_FOR_FS and FS without large folio. This check was missed in the prior versions. From V2[3]: 1. Incorporated all the feedback from Kirill[4]. 2. Used GFP_NOWAIT for xas_nomem(). 3. Tested the code path when xas_nomem() fails. 4. Added selftests for folio_split(). 5. Fixed no THP config build error. From V1[2]: 1. Split the original patch 1 into multiple ones for easy review (per Kirill). 2. Added xas_destroy() to avoid memory leak. 3. Fixed nr_dropped not used error (per kernel test robot). 4. Added proper error handling when xas_nomem() fails to allocate memory for xas_split() during buddy allocator like split. From RFC[1]: 1. Merged backend code of split_huge_page_to_list_to_order() and folio_split(). The same code is used for both uniform split and buddy allocator like split. 2. Use xas_nomem() instead of xas_split_alloc() for folio_split(). 3. folio_split() now leaves the first after-split folio unlocked, instead of the one containing the given page, since the caller of truncate_inode_partial_folio() locks and unlocks the first folio. 4. Extended split_huge_page debugfs to use folio_split(). 5. Added truncate_inode_partial_folio() as first user of folio_split(). Design === folio_split() splits a large folio in the same way as buddy allocator splits a large free page for allocation. The purpose is to minimize the number of folios after the split. For example, if user wants to free the 3rd subpage in a order-9 folio, folio_split() will split the order-9 folio as: O-0, O-0, O-0, O-0, O-2, O-3, O-4, O-5, O-6, O-7, O-8 if it is anon O-1, O-0, O-0, O-2, O-3, O-4, O-5, O-6, O-7, O-9 if it is pagecache Since anon folio does not support order-1 yet. The split process is similar to existing approach: 1. Unmap all page mappings (split PMD mappings if exist); 2. Split meta data like memcg, page owner, page alloc tag; 3. Copy meta data in struct folio to sub pages, but instead of spliting the whole folio into multiple smaller ones with the same order in a shot, this approach splits the folio iteratively. Taking the example above, this approach first splits the original order-9 into two order-8, then splits left part of order-8 to two order-7 and so on; 4. Post-process split folios, like write mapping->i_pages for pagecache, adjust folio refcounts, add split folios to corresponding list; 5. Remap split folios 6. Unlock split folios. __split_unmapped_folio() and __split_folio_to_order() replace __split_huge_page() and __split_huge_page_tail() respectively. __split_unmapped_folio() uses different approaches to perform uniform split and buddy allocator like split: 1. uniform split: one single call to __split_folio_to_order() is used to uniformly split the given folio. All resulting folios are put back to the list after split. The folio containing the given page is left to caller to unlock and others are unlocked. 2. buddy allocator like (or non-uniform) split: (old_order - new_order) calls to __split_folio_to_order() are used to split the given folio at order N to order N-1. After each call, the target folio is changed to the one containing the page, which is given as a folio_split() parameter. After each call, folios not containing the page are put back to the list. The folio containing the page is put back to the list when its order is new_order. All folios are unlocked except the first folio, which is left to caller to unlock. Patch Overview === 1. Patch 1 added a new xarray function xas_try_split() to perform iterative xarray split. 2. Patch 2 added __split_unmapped_folio() and __split_folio_to_order() to prepare for moving to new backend split code. 3. Patch 3 moved common code in split_huge_page_to_list_to_order() to __folio_split(). 4. Patch 4 added new folio_split() and made split_huge_page_to_list_to_order() share the new __split_unmapped_folio() with folio_split(). 5. Patch 5 removed no longer used __split_huge_page() and __split_huge_page_tail(). 6. Patch 6 added a new in_folio_offset to split_huge_page debugfs for folio_split() test. 7. Patch 7 used try_folio_split() for truncate operation. 8. Patch 8 added folio_split() tests. Any comments and/or suggestions are welcome. Thanks. [1] https://lore.kernel.org/linux-mm/20241008223748.555845-1-ziy@nvidia.com/ [2] https://lore.kernel.org/linux-mm/20241028180932.1319265-1-ziy@nvidia.com/ [3] https://lore.kernel.org/linux-mm/20241101150357.1752726-1-ziy@nvidia.com/ [4] https://lore.kernel.org/linux-mm/e6ppwz5t4p4kvir6eqzoto4y5fmdjdxdyvxvtw43nc… [5] https://lore.kernel.org/linux-mm/20241205001839.2582020-1-ziy@nvidia.com/ [6] https://lore.kernel.org/linux-mm/20250106165513.104899-1-ziy@nvidia.com/ [7] https://lore.kernel.org/linux-mm/20250116211042.741543-1-ziy@nvidia.com/ [8] https://lore.kernel.org/linux-mm/20250205031417.1771278-1-ziy@nvidia.com/ [9] https://lore.kernel.org/linux-mm/20250211155034.268962-1-ziy@nvidia.com/ [10] https://lore.kernel.org/all/67af65cb.050a0220.21dd3.004a.GAE@google.com/ [11] https://lore.kernel.org/linux-mm/20250218235012.1542225-1-ziy@nvidia.com/ Zi Yan (8): xarray: add xas_try_split() to split a multi-index entry mm/huge_memory: add two new (not yet used) functions for folio_split() mm/huge_memory: move folio split common code to __folio_split() mm/huge_memory: add buddy allocator like (non-uniform) folio_split() mm/huge_memory: remove the old, unused __split_huge_page() mm/huge_memory: add folio_split() to debugfs testing interface mm/truncate: use buddy allocator like folio split for truncate operation selftests/mm: add tests for folio_split(), buddy allocator like split Documentation/core-api/xarray.rst | 14 +- include/linux/huge_mm.h | 36 + include/linux/xarray.h | 6 + lib/test_xarray.c | 52 ++ lib/xarray.c | 131 ++- mm/huge_memory.c | 755 ++++++++++++------ mm/truncate.c | 31 +- tools/testing/radix-tree/Makefile | 1 + .../selftests/mm/split_huge_page_test.c | 34 +- 9 files changed, 783 insertions(+), 277 deletions(-) -- 2.47.2

10 months, 1 week

5
30
0 0

[PATCH bpf-next v2 0/3] bpf: Fix use-after-free of sockmap

by Jiayuan Chen

1. Issue Syzkaller reported this issue [1]. 2. Reproduce We can reproduce this issue by using the test_sockmap_with_close_on_write() test I provided in selftest, also you need to apply the following patch to ensure 100% reproducibility (sleep after checking sock): ''' static void sk_psock_verdict_data_ready(struct sock *sk) { ....... if (unlikely(!sock)) return; + if (!strcmp("test_progs", current->comm)) { + printk("sleep 2s to wait socket freed\n"); + mdelay(2000); + printk("sleep end\n"); + } ops = READ_ONCE(sock->ops); if (!ops || !ops->read_skb) return; } ''' Then running './test_progs -v sockmap_basic', and if the kernel has KASAN enabled [2], you will see the following warning: ''' BUG: KASAN: slab-use-after-free in sk_psock_verdict_data_ready+0x29b/0x2d0 Read of size 8 at addr ffff88813a777020 by task test_progs/47055 Tainted: [O]=OOT_MODULE Call Trace: <TASK> dump_stack_lvl+0x53/0x70 print_address_description.constprop.0+0x30/0x420 ? sk_psock_verdict_data_ready+0x29b/0x2d0 print_report+0xb7/0x270 ? sk_psock_verdict_data_ready+0x29b/0x2d0 ? kasan_addr_to_slab+0xd/0xa0 ? sk_psock_verdict_data_ready+0x29b/0x2d0 kasan_report+0xca/0x100 ? sk_psock_verdict_data_ready+0x29b/0x2d0 sk_psock_verdict_data_ready+0x29b/0x2d0 unix_stream_sendmsg+0x4a6/0xa40 ? __pfx_unix_stream_sendmsg+0x10/0x10 ? fdget+0x2c1/0x3a0 __sys_sendto+0x39c/0x410 ''' 3. Reason ''' CPU0 CPU1 unix_stream_sendmsg(sk): other = unix_peer(sk) other->sk_data_ready(other): socket *sock = sk->sk_socket if (unlikely(!sock)) return; close(other): ... other->close() free(socket) READ_ONCE(sock->ops) ^ use 'sock' after free ''' For TCP, UDP, or other protocols, we have already performed rcu_read_lock() when the network stack receives packets in ip_input.c: ''' ip_local_deliver_finish(): rcu_read_lock() ip_protocol_deliver_rcu() xxx_rcv rcu_read_unlock() ''' However, for Unix sockets, sk_data_ready is called directly from the process context without rcu_read_lock() protection. 4. Solution Based on the fact that the 'struct socket' is released using call_rcu(), We add rcu_read_{un}lock() at the entrance and exit of our sk_data_ready. It will not increase performance overhead, at least for TCP and UDP, they are already in a relatively large critical section. Of course, we can also add a custom callback for Unix sockets and call rcu_read_lock() before calling _verdict_data_ready like this: ''' if (sk_is_unix(sk)) sk->sk_data_ready = sk_psock_verdict_data_ready_rcu; else sk->sk_data_ready = sk_psock_verdict_data_ready; sk_psock_verdict_data_ready_rcu(): rcu_read_lock() sk_psock_verdict_data_ready() rcu_read_unlock() ''' However, this will cause too many branches, and it's not suitable to distinguish network protocols in skmsg.c. [1] https://syzkaller.appspot.com/bug?extid=dd90a702f518e0eac072 [2] https://syzkaller.appspot.com/text?tag=KernelConfig&x=1362a5aee630ff34 --- v1 -> v2: 1. Add Fixes tag. 2. Extend selftest of edge case for TCP/UDP sockets. 3. Add Reviewed-by and Acked-by tag. https://lore.kernel.org/bpf/20250226132242.52663-1-jiayuan.chen@linux.dev/T… Jiayuan Chen (3): bpf, sockmap: avoid using sk_socket after free selftests/bpf: Add socketpair to create_pair to support unix socket selftests/bpf: Add edge case tests for sockmap net/core/skmsg.c | 18 ++++-- .../selftests/bpf/prog_tests/socket_helpers.h | 13 +++- .../selftests/bpf/prog_tests/sockmap_basic.c | 59 +++++++++++++++++++ 3 files changed, 84 insertions(+), 6 deletions(-) -- 2.47.1

10 months, 1 week

2
7
0 0

[PATCH bpf-next v12 0/5] xsk: TX metadata Launch Time support

by Song Yoong Siang

This series expands the XDP TX metadata framework to allow user applications to pass per packet 64-bit launch time directly to the kernel driver, requesting launch time hardware offload support. The XDP TX metadata framework will not perform any clock conversion or packet reordering. Please note that the role of Tx metadata is just to pass the launch time, not to enable the offload feature. Users will need to enable the launch time hardware offload feature of the device by using the respective command, such as the tc-etf command. Although some devices use the tc-etf command to enable their launch time hardware offload feature, xsk packets will not go through the etf qdisc. Therefore, in my opinion, the launch time should always be based on the PTP Hardware Clock (PHC). Thus, i did not include a clock ID to indicate the clock source. To simplify the test steps, I modified the xdp_hw_metadata bpf self-test tool in such a way that it will set the launch time based on the offset provided by the user and the value of the Receive Hardware Timestamp, which is against the PHC. This will eliminate the need to discipline System Clock with the PHC and then use clock_gettime() to get the time. Please note that AF_XDP lacks a feedback mechanism to inform the application if the requested launch time is invalid. So, users are expected to familiar with the horizon of the launch time of the device they use and not request a launch time that is beyond the horizon. Otherwise, the driver might interpret the launch time incorrectly and react wrongly. For stmmac and igc, where modulo computation is used, a launch time larger than the horizon will cause the device to transmit the packet earlier that the requested launch time. Although there is no feedback mechanism for the launch time request for now, user still can check whether the requested launch time is working or not, by requesting the Transmit Completion Hardware Timestamp. v12: - Fix the comment in include/uapi/linux/if_xdp.h to allign with what is generated by ./tools/net/ynl/ynl-regen.sh to avoid dirty tree error in the netdev/ynl checks. v11: https://lore.kernel.org/netdev/20250216074302.956937-1-yoong.siang.song@int… - regenerate netdev_xsk_flags based on latest netdev.yaml (Jakub) v10: https://lore.kernel.org/netdev/20250207021943.814768-1-yoong.siang.song@int… - use net_err_ratelimited(), instead of net_ratelimit() (Maciej) - accumulate the amount of used descs in local variable and update the igc_metadata_request::used_desc once (Maciej) - Ensure reverse christmas tree rule (Maciej) V9: https://lore.kernel.org/netdev/20250206060408.808325-1-yoong.siang.song@int… - Remove the igc_desc_unused() checking (Maciej) - Ensure that skb allocation and DMA mapping work before proceeding to fill in igc_tx_buffer info, context desc, and data desc (Maciej) - Rate limit the error messages (Maciej) - Update the comment to indicate that the 2 descriptors needed by the empty frame are already taken into consideration (Maciej) - Handle the case where the insertion of an empty frame fails and explain the reason behind (Maciej) - put self SOB tag as last tag (Maciej) V8: https://lore.kernel.org/netdev/20250205024116.798862-1-yoong.siang.song@int… - check the number of used descriptor in xsk_tx_metadata_request() by using used_desc of struct igc_metadata_request, and then decreases the budget with it (Maciej) - submit another bug fix patch to set the buffer type for empty frame (Maciej): https://lore.kernel.org/netdev/20250205023603.798819-1-yoong.siang.song@int… V7: https://lore.kernel.org/netdev/20250204004907.789330-1-yoong.siang.song@int… - split the refactoring code of igc empty packet insertion into a separate commit (Faizal) - add explanation on why the value "4" is used as igc transmit budget (Faizal) - perform a stress test by sending 1000 packets with 10ms interval and launch time set to 500us in the future (Faizal & Yong Liang) V6: https://lore.kernel.org/netdev/20250116155350.555374-1-yoong.siang.song@int… - fix selftest build errors by using asprintf() and realloc(), instead of managing the buffer sizes manually (Daniel, Stanislav) V5: https://lore.kernel.org/netdev/20250114152718.120588-1-yoong.siang.song@int… - change netdev feature name from tx-launch-time to tx-launch-time-fifo to explicitly state the FIFO behaviour (Stanislav) - improve the looping of xdp_hw_metadata app to wait for packet tx completion to be more readable by using clock_gettime() (Stanislav) - add launch time setup steps into xdp_hw_metadata app (Stanislav) V4: https://lore.kernel.org/netdev/20250106135506.9687-1-yoong.siang.song@intel… - added XDP launch time support to the igc driver (Jesper & Florian) - added per-driver launch time limitation on xsk-tx-metadata.rst (Jesper) - added explanation on FIFO behavior on xsk-tx-metadata.rst (Jakub) - added step to enable launch time in the commit message (Jesper & Willem) - explicitly documented the type of launch_time and which clock source it is against (Willem) V3: https://lore.kernel.org/netdev/20231203165129.1740512-1-yoong.siang.song@in… - renamed to use launch time (Jesper & Willem) - changed the default launch time in xdp_hw_metadata apps from 1s to 0.1s because some NICs do not support such a large future time. V2: https://lore.kernel.org/netdev/20231201062421.1074768-1-yoong.siang.song@in… - renamed to use Earliest TxTime First (Willem) - renamed to use txtime (Willem) V1: https://lore.kernel.org/netdev/20231130162028.852006-1-yoong.siang.song@int… Song Yoong Siang (5): xsk: Add launch time hardware offload support to XDP Tx metadata selftests/bpf: Add launch time request to xdp_hw_metadata net: stmmac: Add launch time support to XDP ZC igc: Refactor empty frame insertion for launch time support igc: Add launch time support to XDP ZC Documentation/netlink/specs/netdev.yaml | 4 + Documentation/networking/xsk-tx-metadata.rst | 62 +++++++ drivers/net/ethernet/intel/igc/igc.h | 1 + drivers/net/ethernet/intel/igc/igc_main.c | 143 +++++++++++---- drivers/net/ethernet/stmicro/stmmac/stmmac.h | 2 + .../net/ethernet/stmicro/stmmac/stmmac_main.c | 13 ++ include/net/xdp_sock.h | 10 ++ include/net/xdp_sock_drv.h | 1 + include/uapi/linux/if_xdp.h | 10 ++ include/uapi/linux/netdev.h | 3 + net/core/netdev-genl.c | 2 + net/xdp/xsk.c | 3 + tools/include/uapi/linux/if_xdp.h | 10 ++ tools/include/uapi/linux/netdev.h | 3 + tools/testing/selftests/bpf/xdp_hw_metadata.c | 168 +++++++++++++++++- 15 files changed, 396 insertions(+), 39 deletions(-) -- 2.34.1

10 months, 1 week

5
10
0 0

[PATCH net-next 3/4] tools/testing/selftests/cgroup/cgroup_util: add cg_get_id helper

by Alexander Mikhalitsyn

Cc: linux-kselftest(a)vger.kernel.org Cc: linux-kernel(a)vger.kernel.org Cc: netdev(a)vger.kernel.org Cc: cgroups(a)vger.kernel.org Cc: "David S. Miller" <davem(a)davemloft.net> Cc: Eric Dumazet <edumazet(a)google.com> Cc: Jakub Kicinski <kuba(a)kernel.org> Cc: Paolo Abeni <pabeni(a)redhat.com> Cc: Willem de Bruijn <willemb(a)google.com> Cc: Leon Romanovsky <leon(a)kernel.org> Cc: Arnd Bergmann <arnd(a)arndb.de> Cc: Christian Brauner <brauner(a)kernel.org> Cc: Kuniyuki Iwashima <kuniyu(a)amazon.com> Cc: Lennart Poettering <mzxreary(a)0pointer.de> Cc: Luca Boccassi <bluca(a)debian.org> Cc: Tejun Heo <tj(a)kernel.org> Cc: Johannes Weiner <hannes(a)cmpxchg.org> Cc: "Michal Koutný" <mkoutny(a)suse.com> Cc: Shuah Khan <shuah(a)kernel.org> Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn(a)canonical.com> --- tools/testing/selftests/cgroup/cgroup_util.c | 15 +++++++++++++++ tools/testing/selftests/cgroup/cgroup_util.h | 2 ++ 2 files changed, 17 insertions(+) diff --git a/tools/testing/selftests/cgroup/cgroup_util.c b/tools/testing/selftests/cgroup/cgroup_util.c index 1e2d46636a0c..b60e0e1433f4 100644 --- a/tools/testing/selftests/cgroup/cgroup_util.c +++ b/tools/testing/selftests/cgroup/cgroup_util.c @@ -205,6 +205,21 @@ int cg_open(const char *cgroup, const char *control, int flags) return open(path, flags); } +/* + * Returns cgroup id on success, or -1 on failure. + */ +uint64_t cg_get_id(const char *cgroup) +{ + struct stat st; + int ret; + + ret = stat(cgroup, &st); + if (ret) + return -1; + + return st.st_ino; +} + int cg_write_numeric(const char *cgroup, const char *control, long value) { char buf[64]; diff --git a/tools/testing/selftests/cgroup/cgroup_util.h b/tools/testing/selftests/cgroup/cgroup_util.h index 19b131ee7707..3f2d9676ceda 100644 --- a/tools/testing/selftests/cgroup/cgroup_util.h +++ b/tools/testing/selftests/cgroup/cgroup_util.h @@ -1,5 +1,6 @@ /* SPDX-License-Identifier: GPL-2.0 */ #include <stdbool.h> +#include <stdint.h> #include <stdlib.h> #include "../kselftest.h" @@ -39,6 +40,7 @@ long cg_read_key_long(const char *cgroup, const char *control, const char *key); extern long cg_read_lc(const char *cgroup, const char *control); extern int cg_write(const char *cgroup, const char *control, char *buf); extern int cg_open(const char *cgroup, const char *control, int flags); +extern uint64_t cg_get_id(const char *cgroup); int cg_write_numeric(const char *cgroup, const char *control, long value); extern int cg_run(const char *cgroup, int (*fn)(const char *cgroup, void *arg), -- 2.43.0

10 months, 1 week

1
0
0 0

🎁 Yes, a Truly Free SEO Cleanup Awaits!

by Free Backlinks Clean up

Hi there, I understand the skepticism—“free” offers often come with strings attached. But we’re genuinely committed to supporting webmasters and giving back to the community with no hidden catches. Simply fill out the form, and our team will deliver a comprehensive SEO cleanup within a day—no cost, no commitments. Get Your Free SEO Cleanup Here: https://www.1free-seo.com/get-started/ (https://www.1free-seo.com/get-started/) Cheers, Mike Larson WhatsApp: +1 (833) 854-6783 (https://wa.me/18338546783) Telegram: https://t.me/freeseosupport (https://t.me/freeseosupport) Book appointment: https://1free-seo.com/free-seo-consultaition/ (https://1free-seo.com/free-seo-consultaition/) Unsubscribe (https://clicks.1free-seo.com/?na=u&nk=2080001-9a11fc6fe9&nek=24-) | Manage your subscription (https://clicks.1free-seo.com/?na=p&nk=2080001-9a11fc6fe9&nek=24-) | View online (https://clicks.1free-seo.com/?na=v&nk=2080001-9a11fc6fe9&id=24)

10 months, 1 week

1
0
0 0

[PATCH v3 0/2] scanf: convert self-test to KUnit

by Tamir Duberstein

This is one of just 3 remaining "Test Module" kselftests (the others being bitmap and printf), the rest having been converted to KUnit. In addition to the enclosed patch, please consider this an RFC on the removal of the "Test Module" kselftest machinery. I tested this using: $ tools/testing/kunit/kunit.py run --arch arm64 --make_options LLVM=1 scanf Signed-off-by: Tamir Duberstein <tamird(a)gmail.com> --- Changes in v3: - Reduce diff noise in lib/Makefile. (Petr Mladek) - Split `scanf_test` into a few test cases. New output: : =================== scanf (10 subtests) ==================== : [PASSED] numbers_simple : ====================== numbers_list ======================= : [PASSED] delim=" " : [PASSED] delim=":" : [PASSED] delim="," : [PASSED] delim="-" : [PASSED] delim="/" : ================== [PASSED] numbers_list =================== : ============ numbers_list_field_width_typemax ============= : [PASSED] delim=" " : [PASSED] delim=":" : [PASSED] delim="," : [PASSED] delim="-" : [PASSED] delim="/" : ======== [PASSED] numbers_list_field_width_typemax ========= : =========== numbers_list_field_width_val_width ============ : [PASSED] delim=" " : [PASSED] delim=":" : [PASSED] delim="," : [PASSED] delim="-" : [PASSED] delim="/" : ======= [PASSED] numbers_list_field_width_val_width ======== : [PASSED] numbers_slice : [PASSED] numbers_prefix_overflow : [PASSED] test_simple_strtoull : [PASSED] test_simple_strtoll : [PASSED] test_simple_strtoul : [PASSED] test_simple_strtol : ====================== [PASSED] scanf ====================== : ============================================================ : Testing complete. Ran 22 tests: passed: 22 : Elapsed time: 5.517s total, 0.001s configuring, 5.440s building, 0.067s running - Link to v2: https://lore.kernel.org/r/20250203-scanf-kunit-convert-v2-1-277a618d804e@gm… Changes in v2: - Rename lib/{test_scanf.c => scanf_kunit.c}. (Andy Shevchenko) - Link to v1: https://lore.kernel.org/r/20250131-scanf-kunit-convert-v1-1-0976524f0eba@gm… --- Tamir Duberstein (2): scanf: convert self-test to KUnit scanf: break kunit into test cases MAINTAINERS | 2 +- arch/m68k/configs/amiga_defconfig | 1 - arch/m68k/configs/apollo_defconfig | 1 - arch/m68k/configs/atari_defconfig | 1 - arch/m68k/configs/bvme6000_defconfig | 1 - arch/m68k/configs/hp300_defconfig | 1 - arch/m68k/configs/mac_defconfig | 1 - arch/m68k/configs/multi_defconfig | 1 - arch/m68k/configs/mvme147_defconfig | 1 - arch/m68k/configs/mvme16x_defconfig | 1 - arch/m68k/configs/q40_defconfig | 1 - arch/m68k/configs/sun3_defconfig | 1 - arch/m68k/configs/sun3x_defconfig | 1 - arch/powerpc/configs/ppc64_defconfig | 1 - lib/Kconfig.debug | 20 +- lib/Makefile | 2 +- lib/scanf_kunit.c | 800 ++++++++++++++++++++++++++++++++++ lib/test_scanf.c | 814 ----------------------------------- tools/testing/selftests/lib/Makefile | 2 +- tools/testing/selftests/lib/config | 1 - tools/testing/selftests/lib/scanf.sh | 4 - 21 files changed, 820 insertions(+), 838 deletions(-) --- base-commit: a86bf2283d2c9769205407e2b54777c03d012939 change-id: 20250131-scanf-kunit-convert-f70dc33bb34c Best regards, -- Tamir Duberstein <tamird(a)gmail.com>

10 months, 1 week

3
6
0 0

[PATCH bpf-next 0/2] selftests/bpf: Move test_lwt_seg6local to test_progs

by Bastien Curutchet (eBPF Foundation)

Hi all, This patch series continues the work to migrate the script tests into prog_tests. test_lwt_seg6local.sh tests some bpf_lwt_* helpers. It contains only one test that uses a network topology quite different than the ones that can be found in others prog_tests/lwt_*.c files so I add a new prog_tests/lwt_seg6local.c file. While working on the migration I noticed that some routes present in the script weren't needed so PATCH 1 deletes them and then PATCH 2 migrates the test into the test_progs framework. Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet(a)bootlin.com> --- Bastien Curutchet (eBPF Foundation) (2): selftests/bpf: lwt_seg6local: Remove unused routes selftests/bpf: lwt_seg6local: Move test to test_progs tools/testing/selftests/bpf/Makefile | 1 - .../selftests/bpf/prog_tests/lwt_seg6local.c | 176 +++++++++++++++++++++ tools/testing/selftests/bpf/test_lwt_seg6local.sh | 156 ------------------ 3 files changed, 176 insertions(+), 157 deletions(-) --- base-commit: 86eb3a47230a41c6ccf5cdae8ee0a7e7292aa29d change-id: 20250214-seg6local-64bcde44b66e Best regards, -- Bastien Curutchet (eBPF Foundation) <bastien.curutchet(a)bootlin.com>

10 months, 1 week

3
4
0 0

[PATCH v2] kunit: tool: Fix bug in parsing test plan

by Rae Moar

A bug was identified where the KTAP below caused an infinite loop: TAP version 13 ok 4 test_case 1..4 The infinite loop was caused by the parser not parsing a test plan if following a test result line. Fix this bug to correctly parse test plan line. Signed-off-by: Rae Moar <rmoar(a)google.com> --- Changes since v1: - Remove error reported when test plan is missing. tools/testing/kunit/kunit_parser.py | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/tools/testing/kunit/kunit_parser.py b/tools/testing/kunit/kunit_parser.py index 29fc27e8949b..da53a709773a 100644 --- a/tools/testing/kunit/kunit_parser.py +++ b/tools/testing/kunit/kunit_parser.py @@ -759,7 +759,7 @@ def parse_test(lines: LineStream, expected_num: int, log: List[str], is_subtest: # If parsing the main/top-level test, parse KTAP version line and # test plan test.name = "main" - ktap_line = parse_ktap_header(lines, test, printer) + parse_ktap_header(lines, test, printer) test.log.extend(parse_diagnostic(lines)) parse_test_plan(lines, test) parent_test = True @@ -768,13 +768,12 @@ def parse_test(lines: LineStream, expected_num: int, log: List[str], is_subtest: # the KTAP version line and/or subtest header line ktap_line = parse_ktap_header(lines, test, printer) subtest_line = parse_test_header(lines, test) + test.log.extend(parse_diagnostic(lines)) + parse_test_plan(lines, test) parent_test = (ktap_line or subtest_line) if parent_test: - # If KTAP version line and/or subtest header is found, attempt - # to parse test plan and print test header - test.log.extend(parse_diagnostic(lines)) - parse_test_plan(lines, test) print_test_header(test, printer) + expected_count = test.expected_count subtests = [] test_num = 1 base-commit: 0619a4868fc1b32b07fb9ed6c69adc5e5cf4e4b2 -- 2.49.0.rc0.332.g42c0ae87b1-goog

10 months, 1 week

3
2
0 0

[PATCH v5 bpf-next 0/2] security: Propagate caller information in bpf hooks

by Blaise Boscaccy

Hello, While trying to implement an eBPF gatekeeper program, we ran into an issue whereas the LSM hooks are missing some relevant data. Certain subcommands passed to the bpf() syscall can be invoked from either the kernel or userspace. Additionally, some fields in the bpf_attr struct contain pointers, and depending on where the subcommand was invoked, they could point to either user or kernel memory. One example of this is the bpf_prog_load subcommand and its fd_array. This data is made available and used by the verifier but not made available to the LSM subsystem. This patchset simply exposes that information to applicable LSM hooks. Change list: - v4 -> v5 - merge v4 selftest breakout patch back into a single patch - change "is_kernel" to "kernel" - add selftest using new kernel flag - v3 -> v4 - split out selftest changes into a separate patch - v2 -> v3 - reorder params so that the new boolean flag is the last param - fixup function signatures in bpf selftests - v1 -> v2 - Pass a boolean flag in lieu of bpfptr_t Revisions: - v4 https://lore.kernel.org/bpf/20250304203123.3935371-1-bboscaccy@linux.micros… - v3 https://lore.kernel.org/bpf/20250303222416.3909228-1-bboscaccy@linux.micros… - v2 https://lore.kernel.org/bpf/20250228165322.3121535-1-bboscaccy@linux.micros… - v1 https://lore.kernel.org/bpf/20250226003055.1654837-1-bboscaccy@linux.micros… Blaise Boscaccy (2): security: Propagate caller information in bpf hooks selftests/bpf: Add a kernel flag test for LSM bpf hook include/linux/lsm_hook_defs.h | 6 +-- include/linux/security.h | 12 +++--- kernel/bpf/syscall.c | 10 ++--- security/security.c | 15 ++++--- security/selinux/hooks.c | 6 +-- .../selftests/bpf/prog_tests/kernel_flag.c | 43 +++++++++++++++++++ .../selftests/bpf/progs/rcu_read_lock.c | 3 +- .../bpf/progs/test_cgroup1_hierarchy.c | 4 +- .../selftests/bpf/progs/test_kernel_flag.c | 32 ++++++++++++++ .../bpf/progs/test_kfunc_dynptr_param.c | 6 +-- .../selftests/bpf/progs/test_lookup_key.c | 2 +- .../selftests/bpf/progs/test_ptr_untrusted.c | 2 +- .../bpf/progs/test_task_under_cgroup.c | 2 +- .../bpf/progs/test_verify_pkcs7_sig.c | 2 +- 14 files changed, 112 insertions(+), 33 deletions(-) create mode 100644 tools/testing/selftests/bpf/prog_tests/kernel_flag.c create mode 100644 tools/testing/selftests/bpf/progs/test_kernel_flag.c -- 2.48.1

10 months, 1 week

2
3
0 0

[PATCH v5 0/3] printf: convert self-test to KUnit

by Tamir Duberstein

This is one of just 3 remaining "Test Module" kselftests (the others being bitmap and scanf), the rest having been converted to KUnit. I tested this using: $ tools/testing/kunit/kunit.py run --arch arm64 --make_options LLVM=1 printf I have also sent out a series converting scanf[0]. Link: https://lore.kernel.org/all/20250204-scanf-kunit-convert-v3-0-386d7c3ee714@… [0] Signed-off-by: Tamir Duberstein <tamird(a)gmail.com> --- Changes in v5: - Update `do_test` `__printf` annotation (Rasmus Villemoes). - Link to v4: https://lore.kernel.org/r/20250214-printf-kunit-convert-v4-0-c254572f1565@g… Changes in v4: - Add patch "implicate test line in failure messages". - Rebase on linux-next, move scanf_kunit.c into lib/tests/. - Link to v3: https://lore.kernel.org/r/20250210-printf-kunit-convert-v3-0-ee6ac5500f5e@g… Changes in v3: - Remove extraneous trailing newlines from failure messages. - Replace `pr_warn` with `kunit_warn`. - Drop arch changes. - Remove KUnit boilerplate from CONFIG_PRINTF_KUNIT_TEST help text. - Restore `total_tests` counting. - Remove tc_fail macro in last patch. - Link to v2: https://lore.kernel.org/r/20250207-printf-kunit-convert-v2-0-057b23860823@g… Changes in v2: - Incorporate code review from prior work[0] by Arpitha Raghunandan. - Link to v1: https://lore.kernel.org/r/20250204-printf-kunit-convert-v1-0-ecf1b846a4de@g… Link: https://lore.kernel.org/lkml/20200817043028.76502-1-98.arpi@gmail.com/t/#u [0] --- Tamir Duberstein (3): printf: convert self-test to KUnit printf: break kunit into test cases printf: implicate test line in failure messages Documentation/core-api/printk-formats.rst | 4 +- MAINTAINERS | 2 +- lib/Kconfig.debug | 12 +- lib/Makefile | 1 - lib/tests/Makefile | 1 + lib/{test_printf.c => tests/printf_kunit.c} | 437 ++++++++++++---------------- tools/testing/selftests/lib/config | 1 - tools/testing/selftests/lib/printf.sh | 4 - 8 files changed, 200 insertions(+), 262 deletions(-) --- base-commit: d4b0fd87ff0d4338b259dc79b2b3c6f7e70e8afa change-id: 20250131-printf-kunit-convert-fd4012aa2ec6 Best regards, -- Tamir Duberstein <tamird(a)gmail.com>

10 months, 1 week

2
16
0 0

[PATCH net-next v8 0/6] tun: Introduce virtio-net hashing feature

by Akihiko Odaki

virtio-net have two usage of hashes: one is RSS and another is hash reporting. Conventionally the hash calculation was done by the VMM. However, computing the hash after the queue was chosen defeats the purpose of RSS. Another approach is to use eBPF steering program. This approach has another downside: it cannot report the calculated hash due to the restrictive nature of eBPF. Introduce the code to compute hashes to the kernel in order to overcome thse challenges. An alternative solution is to extend the eBPF steering program so that it will be able to report to the userspace, but it is based on context rewrites, which is in feature freeze. We can adopt kfuncs, but they will not be UAPIs. We opt to ioctl to align with other relevant UAPIs (KVM and vhost_net). The patches for QEMU to use this new feature was submitted as RFC and is available at: https://patchew.org/QEMU/20240915-hash-v3-0-79cb08d28647@daynix.com/ This work was presented at LPC 2024: https://lpc.events/event/18/contributions/1963/ V1 -> V2: Changed to introduce a new BPF program type. Signed-off-by: Akihiko Odaki <akihiko.odaki(a)daynix.com> --- Changes in v8: - Disabled IPv6 to eliminate noises in tests. - Added a branch in tap to avoid unnecessary dissection when hash reporting is disabled. - Removed unnecessary rtnl_lock(). - Extracted code to handle new ioctls into separate functions to avoid adding extra NULL checks to the code handling other ioctls. - Introduced variable named "fd" to __tun_chr_ioctl(). - s/-/=/g in a patch message to avoid confusing Git. - Link to v7: https://lore.kernel.org/r/20250228-rss-v7-0-844205cbbdd6@daynix.com Changes in v7: - Ensured to set hash_report to VIRTIO_NET_HASH_REPORT_NONE for VHOST_NET_F_VIRTIO_NET_HDR. - s/4/sizeof(u32)/ in patch "virtio_net: Add functions for hashing". - Added tap_skb_cb type. - Rebased. - Link to v6: https://lore.kernel.org/r/20250109-rss-v6-0-b1c90ad708f6@daynix.com Changes in v6: - Extracted changes to fill vnet header holes into another series. - Squashed patches "skbuff: Introduce SKB_EXT_TUN_VNET_HASH", "tun: Introduce virtio-net hash reporting feature", and "tun: Introduce virtio-net RSS" into patch "tun: Introduce virtio-net hash feature". - Dropped the RFC tag. - Link to v5: https://lore.kernel.org/r/20241008-rss-v5-0-f3cf68df005d@daynix.com Changes in v5: - Fixed a compilation error with CONFIG_TUN_VNET_CROSS_LE. - Optimized the calculation of the hash value according to: https://git.dpdk.org/dpdk/commit/?id=3fb1ea032bd6ff8317af5dac9af901f1f324ca… - Added patch "tun: Unify vnet implementation". - Dropped patch "tap: Pad virtio header with zero". - Added patch "selftest: tun: Test vnet ioctls without device". - Reworked selftests to skip for older kernels. - Documented the case when the underlying device is deleted and packets have queue_mapping set by TC. - Reordered test harness arguments. - Added code to handle fragmented packets. - Link to v4: https://lore.kernel.org/r/20240924-rss-v4-0-84e932ec0e6c@daynix.com Changes in v4: - Moved tun_vnet_hash_ext to if_tun.h. - Renamed virtio_net_toeplitz() to virtio_net_toeplitz_calc(). - Replaced htons() with cpu_to_be16(). - Changed virtio_net_hash_rss() to return void. - Reordered variable declarations in virtio_net_hash_rss(). - Removed virtio_net_hdr_v1_hash_from_skb(). - Updated messages of "tap: Pad virtio header with zero" and "tun: Pad virtio header with zero". - Fixed vnet_hash allocation size. - Ensured to free vnet_hash when destructing tun_struct. - Link to v3: https://lore.kernel.org/r/20240915-rss-v3-0-c630015db082@daynix.com Changes in v3: - Reverted back to add ioctl. - Split patch "tun: Introduce virtio-net hashing feature" into "tun: Introduce virtio-net hash reporting feature" and "tun: Introduce virtio-net RSS". - Changed to reuse hash values computed for automq instead of performing RSS hashing when hash reporting is requested but RSS is not. - Extracted relevant data from struct tun_struct to keep it minimal. - Added kernel-doc. - Changed to allow calling TUNGETVNETHASHCAP before TUNSETIFF. - Initialized num_buffers with 1. - Added a test case for unclassified packets. - Fixed error handling in tests. - Changed tests to verify that the queue index will not overflow. - Rebased. - Link to v2: https://lore.kernel.org/r/20231015141644.260646-1-akihiko.odaki@daynix.com --- Akihiko Odaki (6): virtio_net: Add functions for hashing net: flow_dissector: Export flow_keys_dissector_symmetric tun: Introduce virtio-net hash feature selftest: tun: Test vnet ioctls without device selftest: tun: Add tests for virtio-net hashing vhost/net: Support VIRTIO_NET_F_HASH_REPORT Documentation/networking/tuntap.rst | 7 + drivers/net/Kconfig | 1 + drivers/net/tap.c | 67 +++- drivers/net/tun.c | 98 +++++- drivers/net/tun_vnet.h | 159 ++++++++- drivers/vhost/net.c | 49 +-- include/linux/if_tap.h | 2 + include/linux/skbuff.h | 3 + include/linux/virtio_net.h | 188 ++++++++++ include/net/flow_dissector.h | 1 + include/uapi/linux/if_tun.h | 75 ++++ net/core/flow_dissector.c | 3 +- net/core/skbuff.c | 4 + tools/testing/selftests/net/Makefile | 2 +- tools/testing/selftests/net/tun.c | 656 ++++++++++++++++++++++++++++++++++- 15 files changed, 1254 insertions(+), 61 deletions(-) --- base-commit: dd83757f6e686a2188997cb58b5975f744bb7786 change-id: 20240403-rss-e737d89efa77 prerequisite-change-id: 20241230-tun-66e10a49b0c7:v6 prerequisite-patch-id: 871dc5f146fb6b0e3ec8612971a8e8190472c0fb prerequisite-patch-id: 2797ed249d32590321f088373d4055ff3f430a0e prerequisite-patch-id: ea3370c72d4904e2f0536ec76ba5d26784c0cede prerequisite-patch-id: 837e4cf5d6b451424f9b1639455e83a260c4440d prerequisite-patch-id: ea701076f57819e844f5a35efe5cbc5712d3080d prerequisite-patch-id: 701646fb43ad04cc64dd2bf13c150ccbe6f828ce prerequisite-patch-id: 53176dae0c003f5b6c114d43f936cf7140d31bb5 prerequisite-change-id: 20250116-buffers-96e14bf023fc:v2 prerequisite-patch-id: 25fd4f99d4236a05a5ef16ab79f3e85ee57e21cc Best regards, -- Akihiko Odaki <akihiko.odaki(a)daynix.com>

10 months, 1 week

4
9
0 0

[PATCH v2 0/4] RISC-V KVM PMU fix and selftest improvement

by Atish Patra

This series adds a fix for KVM PMU code and improves the pmu selftest by allowing generating precise number of interrupts. It also provided another additional option to the overflow test that allows user to generate custom number of LCOFI interrupts. Signed-off-by: Atish Patra <atishp(a)rivosinc.com> --- Changes in v2: - Initialized the local overflow irq variable to 0 indicate that it's not a allowed value. - Moved the introduction of argument option `n` to the last patch. - Link to v1: https://lore.kernel.org/r/20250226-kvm_pmu_improve-v1-0-74c058c2bf6d@rivosi… --- Atish Patra (4): RISC-V: KVM: Disable the kernel perf counter during configure KVM: riscv: selftests: Do not start the counter in the overflow handler KVM: riscv: selftests: Change command line option KVM: riscv: selftests: Allow number of interrupts to be configurable arch/riscv/kvm/vcpu_pmu.c | 1 + tools/testing/selftests/kvm/riscv/sbi_pmu_test.c | 81 ++++++++++++++++-------- 2 files changed, 57 insertions(+), 25 deletions(-) --- base-commit: 0ad2507d5d93f39619fc42372c347d6006b64319 change-id: 20250225-kvm_pmu_improve-fffd038b2404 -- Regards, Atish patra

10 months, 1 week

3
6
0 0

[PATCH v2 net 0/6] eth: bnxt: fix several bugs in the bnxt module

by Taehee Yoo

The first fixes setting incorrect skb->truesize. When xdp-mb prog returns XDP_PASS, skb is allocated and initialized. Currently, The truesize is calculated as BNXT_RX_PAGE_SIZE * sinfo->nr_frags, but sinfo->nr_frags is flushed by napi_build_skb(). So, it stores sinfo before calling napi_build_skb() and then use it for calculate truesize. The second fixes kernel panic in the bnxt_queue_mem_alloc(). The bnxt_queue_mem_alloc() accesses rx ring descriptor. rx ring descriptors are allocated when the interface is up and it's freed when the interface is down. So, if bnxt_queue_mem_alloc() is called when the interface is down, kernel panic occurs. This patch makes the bnxt_queue_mem_alloc() return -ENETDOWN if rx ring descriptors are not allocated. The third patch fixes kernel panic in the bnxt_queue_{start | stop}(). When a queue is restarted bnxt_queue_{start | stop}() are called. These functions set MRU to 0 to stop packet flow and then to set up the remaining things. MRU variable is a member of vnic_info[] the first vnic_info is for default and the second is for ntuple. The first vnic_info is always allocated when interface is up, but the second is allocated only when ntuple is enabled. (ethtool -K eth0 ntuple <on | off>). Currently, the bnxt_queue_{start | stop}() access vnic_info[BNXT_VNIC_NTUPLE] regardless of whether ntuple is enabled or not. So kernel panic occurs. This patch make the bnxt_queue_{start | stop}() use bp->nr_vnics instead of BNXT_VNIC_NTUPLE. The fourth patch fixes a warning due to checksum state. The bnxt_rx_pkt() checks whether skb->ip_summed is not CHECKSUM_NONE before updating ip_summed. if ip_summed is not CHECKSUM_NONE, it WARNS about it. However, the bnxt_xdp_build_skb() is called in XDP-MB-PASS path and it updates ip_summed earlier than bnxt_rx_pkt(). So, in the XDP-MB-PASS path, the bnxt_rx_pkt() always warns about checksum. Updating ip_summed at the bnxt_xdp_build_skb() is unnecessary and duplicate, so it is removed. The fifth patch makes net_devmem_unbind_dmabuf() ignore -ENETDOWN. When devmem socket is closed, net_devmem_unbind_dmabuf() is called to unbind/release resources. If interface is down, the driver returns -ENETDOWN. The -ENETDOWN return value is not an actual error, because the interface will release resources when the interface is down. So, net_devmem_unbind_dmabuf() needs to ignore -ENETDOWN. The last patch adds XDP testcases to tools/testing/selftests/drivers/net/ping.py. v2: - Do not use num_frags in the bnxt_xdp_build_skb(). (1/6) - Add Review tags from Somnath and Jakub. (2/6) - Add new patch for fixing checksum warning. (4/6) - Add new patch for fixing warning in net_devmem_unbind_dmabuf(). (5/6) - Add new XDP testcases to ping.py (6/6) Taehee Yoo (6): eth: bnxt: fix truesize for mb-xdp-pass case eth: bnxt: return fail if interface is down in bnxt_queue_mem_alloc() eth: bnxt: do not use BNXT_VNIC_NTUPLE unconditionally in queue restart logic eth: bnxt: do not update checksum in bnxt_xdp_build_skb() net: devmem: do not WARN conditionally after netdev_rx_queue_restart() selftests: drv-net: add xdp cases for ping.py drivers/net/ethernet/broadcom/bnxt/bnxt.c | 36 ++-- drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c | 18 +- drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.h | 6 +- net/core/devmem.c | 4 +- tools/testing/selftests/drivers/net/ping.py | 200 ++++++++++++++++-- .../testing/selftests/net/lib/xdp_dummy.bpf.c | 6 + 6 files changed, 221 insertions(+), 49 deletions(-) -- 2.34.1

10 months, 1 week

3
9
0 0

[PATCHv4 net 0/3] bond: fix xfrm offload issues

by Hangbin Liu

The first patch fixes the incorrect locks using in bond driver. The second patch fixes the xfrm offload feature during setup active-backup mode. The third patch add a ipsec offload testing. v4: hold xs->lock for bond_ipsec_{del, add}_sa_all (Cosmin Ratiu) use the defer helpers in lib.sh for selftest (Petr Machata) v3: move the ipsec deletion to bond_ipsec_free_sa (Cosmin Ratiu) v2: do not turn carrier on if bond change link failed (Nikolay Aleksandrov) move the mutex lock to a work queue (Cosmin Ratiu) Hangbin Liu (3): bonding: move IPsec deletion to bond_ipsec_free_sa bonding: fix xfrm offload feature setup on active-backup mode selftests: bonding: add ipsec offload test drivers/net/bonding/bond_main.c | 55 +++++-- drivers/net/bonding/bond_netlink.c | 16 +- include/net/bonding.h | 1 + .../selftests/drivers/net/bonding/Makefile | 3 +- .../drivers/net/bonding/bond_ipsec_offload.sh | 154 ++++++++++++++++++ .../selftests/drivers/net/bonding/config | 4 + 6 files changed, 208 insertions(+), 25 deletions(-) create mode 100755 tools/testing/selftests/drivers/net/bonding/bond_ipsec_offload.sh -- 2.46.0

10 months, 1 week

4
13
0 0

[PATCH net-next v2 1/2] selftests: drv-net: add path helper for net/lib

by Jakub Kicinski

Looks like a lot of users of recently added env.rpath() actually want to access stuff under net/lib. Add another helper. Signed-off-by: Jakub Kicinski <kuba(a)kernel.org> --- tools/testing/selftests/drivers/net/hds.py | 2 +- tools/testing/selftests/drivers/net/hw/csum.py | 2 +- tools/testing/selftests/drivers/net/lib/py/env.py | 7 +++++++ 3 files changed, 9 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/drivers/net/hds.py b/tools/testing/selftests/drivers/net/hds.py index 7cc74faed743..def44c10349a 100755 --- a/tools/testing/selftests/drivers/net/hds.py +++ b/tools/testing/selftests/drivers/net/hds.py @@ -20,7 +20,7 @@ from lib.py import defer, ethtool, ip def _xdp_onoff(cfg): - prog = cfg.rpath("../../net/lib/xdp_dummy.bpf.o") + prog = cfg.lpath("xdp_dummy.bpf.o") ip("link set dev %s xdp obj %s sec xdp" % (cfg.ifname, prog)) ip("link set dev %s xdp off" % cfg.ifname) diff --git a/tools/testing/selftests/drivers/net/hw/csum.py b/tools/testing/selftests/drivers/net/hw/csum.py index 701aca1361e0..49ec98aef579 100755 --- a/tools/testing/selftests/drivers/net/hw/csum.py +++ b/tools/testing/selftests/drivers/net/hw/csum.py @@ -88,7 +88,7 @@ from lib.py import bkg, cmd, wait_port_listen with NetDrvEpEnv(__file__, nsim_test=False) as cfg: check_nic_features(cfg) - cfg.bin_local = cfg.rpath("../../../net/lib/csum") + cfg.bin_local = cfg.lpath("csum") cfg.bin_remote = cfg.remote.deploy(cfg.bin_local) cases = [] diff --git a/tools/testing/selftests/drivers/net/lib/py/env.py b/tools/testing/selftests/drivers/net/lib/py/env.py index fd4d674e6c72..2a1f8bd0ec19 100644 --- a/tools/testing/selftests/drivers/net/lib/py/env.py +++ b/tools/testing/selftests/drivers/net/lib/py/env.py @@ -30,6 +30,13 @@ from .remote import Remote src_dir = Path(self.src_path).parent.resolve() return (src_dir / path).as_posix() + def lpath(self, path): + """ + Similar to rpath, but for files in net/lib TARGET. + """ + lib_dir = (Path(__file__).parent / "../../../../net/lib").resolve() + return (lib_dir / path).as_posix() + def _load_env_file(self): env = os.environ.copy() -- 2.48.1

10 months, 1 week

3
7
0 0

[PATCH net-next] selftests: openvswitch: don't hardcode the drop reason subsys

by Jakub Kicinski

WiFi removed one of their subsys entries from drop reasons, in commit 286e69677065 ("wifi: mac80211: Drop cooked monitor support") SKB_DROP_REASON_SUBSYS_OPENVSWITCH is now 2 not 3. The drop reasons are not uAPI, read the correct value from debug info. We need to enable vmlinux BTF, otherwise pahole needs a few GB of memory to decode the enum name. Signed-off-by: Jakub Kicinski <kuba(a)kernel.org> --- CC: shuah(a)kernel.org CC: pshelar(a)ovn.org CC: aconole(a)redhat.com CC: amorenoz(a)redhat.com CC: linux-kselftest(a)vger.kernel.org CC: dev(a)openvswitch.org --- tools/testing/selftests/net/config | 2 ++ .../testing/selftests/net/openvswitch/openvswitch.sh | 11 ++++++++--- 2 files changed, 10 insertions(+), 3 deletions(-) diff --git a/tools/testing/selftests/net/config b/tools/testing/selftests/net/config index 5b9baf708950..3365bcc35304 100644 --- a/tools/testing/selftests/net/config +++ b/tools/testing/selftests/net/config @@ -18,6 +18,8 @@ CONFIG_DUMMY=y CONFIG_BRIDGE_VLAN_FILTERING=y CONFIG_BRIDGE=y CONFIG_CRYPTO_CHACHA20POLY1305=m +CONFIG_DEBUG_INFO_BTF=y +CONFIG_DEBUG_INFO_BTF_MODULES=n CONFIG_VLAN_8021Q=y CONFIG_GENEVE=m CONFIG_IFB=y diff --git a/tools/testing/selftests/net/openvswitch/openvswitch.sh b/tools/testing/selftests/net/openvswitch/openvswitch.sh index 960e1ab4dd04..3c8d3455d8e7 100755 --- a/tools/testing/selftests/net/openvswitch/openvswitch.sh +++ b/tools/testing/selftests/net/openvswitch/openvswitch.sh @@ -330,6 +330,11 @@ test_psample() { # - drop packets and verify the right drop reason is reported test_drop_reason() { which perf >/dev/null 2>&1 || return $ksft_skip + which pahole >/dev/null 2>&1 || return $ksft_skip + + ovs_drop_subsys=$(pahole -C skb_drop_reason_subsys | + awk '/OPENVSWITCH/ { print $3; }' | + tr -d ,) sbx_add "test_drop_reason" || return $? @@ -373,7 +378,7 @@ test_drop_reason() { "in_port(2),eth(),eth_type(0x0800),ipv4(src=172.31.110.20,proto=1),icmp()" 'drop' ovs_drop_record_and_run "test_drop_reason" ip netns exec client ping -c 2 172.31.110.20 - ovs_drop_reason_count 0x30001 # OVS_DROP_FLOW_ACTION + ovs_drop_reason_count 0x${ovs_drop_subsys}0001 # OVS_DROP_FLOW_ACTION if [[ "$?" -ne "2" ]]; then info "Did not detect expected drops: $?" return 1 @@ -390,7 +395,7 @@ test_drop_reason() { ovs_drop_record_and_run \ "test_drop_reason" ip netns exec client nc -i 1 -zuv 172.31.110.20 6000 - ovs_drop_reason_count 0x30004 # OVS_DROP_EXPLICIT_ACTION_ERROR + ovs_drop_reason_count 0x${ovs_drop_subsys}0004 # OVS_DROP_EXPLICIT_ACTION_ERROR if [[ "$?" -ne "1" ]]; then info "Did not detect expected explicit error drops: $?" return 1 @@ -398,7 +403,7 @@ test_drop_reason() { ovs_drop_record_and_run \ "test_drop_reason" ip netns exec client nc -i 1 -zuv 172.31.110.20 7000 - ovs_drop_reason_count 0x30003 # OVS_DROP_EXPLICIT_ACTION + ovs_drop_reason_count 0x${ovs_drop_subsys}0003 # OVS_DROP_EXPLICIT_ACTION if [[ "$?" -ne "1" ]]; then info "Did not detect expected explicit drops: $?" return 1 -- 2.48.1

10 months, 1 week

4
3
0 0

[PATCH net-next 1/2] selftests: net: fix error message in bpf_offload

by Jakub Kicinski

We hit a following exception on timeout, nmaps is never set: Test bpftool bound info reporting (own ns)... Traceback (most recent call last): File "/home/virtme/testing-1/tools/testing/selftests/net/./bpf_offload.py", line 1128, in <module> check_dev_info(False, "") File "/home/virtme/testing-1/tools/testing/selftests/net/./bpf_offload.py", line 583, in check_dev_info maps = bpftool_map_list_wait(expected=2, ns=ns) File "/home/virtme/testing-1/tools/testing/selftests/net/./bpf_offload.py", line 215, in bpftool_map_list_wait raise Exception("Time out waiting for map counts to stabilize want %d, have %d" % (expected, nmaps)) NameError: name 'nmaps' is not defined Signed-off-by: Jakub Kicinski <kuba(a)kernel.org> --- CC: shuah(a)kernel.org CC: linux-kselftest(a)vger.kernel.org CC: bpf(a)vger.kernel.org --- tools/testing/selftests/net/bpf_offload.py | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/net/bpf_offload.py b/tools/testing/selftests/net/bpf_offload.py index fd0d959914e4..4a9be8c49561 100755 --- a/tools/testing/selftests/net/bpf_offload.py +++ b/tools/testing/selftests/net/bpf_offload.py @@ -207,9 +207,11 @@ netns = [] # net namespaces to be removed raise Exception("Time out waiting for program counts to stabilize want %d, have %d" % (expected, nprogs)) def bpftool_map_list_wait(expected=0, n_retry=20, ns=""): + nmaps = None for i in range(n_retry): maps = bpftool_map_list(ns=ns) - if len(maps) == expected: + nmaps = len(maps) + if nmaps == expected: return maps time.sleep(0.05) raise Exception("Time out waiting for map counts to stabilize want %d, have %d" % (expected, nmaps)) -- 2.48.1

10 months, 1 week

3
4
0 0

[PATCH net-next v6 0/8] Device memory TCP TX

by Mina Almasry

v6: https://lore.kernel.org/netdev/20250222191517.743530-1-almasrymina@google.c… === v6 has no major changes. Addressed a few issues from Paolo and David, and collected Acks from Stan. Thank you everyone for the review! Changes: - retain behavior to process MSG_FASTOPEN even if the provided cmsg is invalid (Paolo). - Rework the freeing of tx_vec slightly (it now has its own err label). (Paolo). - Squash the commit that makes dmabuf unbinding scheduled work into the same one which implements the TX path so we don't run into future errors on bisecting (Paolo). - Fix/add comments to explain how dmabuf binding refcounting works (David). v5: https://lore.kernel.org/netdev/20250220020914.895431-1-almasrymina@google.c… === v5 has no major changes; it clears up the relatively minor issues pointed out to in v4, and rebases the series on top of net-next to resolve the conflict with a patch that raced to the tree. It also collects the review tags from v4. Changes: - Rebase to net-next - Fix issues in selftest (Stan). - Address comments in the devmem and netmem driver docs (Stan and Bagas) - Fix zerocopy_fill_skb_from_devmem return error code (Stan). v4: https://lore.kernel.org/netdev/20250203223916.1064540-1-almasrymina@google.… === v4 mainly addresses the critical driver support issue surfaced in v3 by Paolo and Stan. Drivers aiming to support netmem_tx should make sure not to pass the netmem dma-addrs to the dma-mapping APIs, as these dma-addrs may come from dma-bufs. Additionally other feedback from v3 is addressed. Major changes: - Add helpers to handle netmem dma-addrs. Add GVE support for netmem_tx. - Fix binding->tx_vec not being freed on error paths during the tx binding. - Add a minimal devmem_tx test to devmem.py. - Clean up everything obsolete from the cover letter (Paolo). v3: https://patchwork.kernel.org/project/netdevbpf/list/?series=929401&state=* === Address minor comments from RFCv2 and fix a few build warnings and ynl-regen issues. No major changes. RFC v2: https://patchwork.kernel.org/project/netdevbpf/list/?series=920056&state=* ======= RFC v2 addresses much of the feedback from RFC v1. I plan on sending something close to this as net-next reopens, sending it slightly early to get feedback if any. Major changes: -------------- - much improved UAPI as suggested by Stan. We now interpret the iov_base of the passed in iov from userspace as the offset into the dmabuf to send from. This removes the need to set iov.iov_base = NULL which may be confusing to users, and enables us to send multiple iovs in the same sendmsg() call. ncdevmem and the docs show a sample use of that. - Removed the duplicate dmabuf iov_iter in binding->iov_iter. I think this is good improvment as it was confusing to keep track of 2 iterators for the same sendmsg, and mistracking both iterators caused a couple of bugs reported in the last iteration that are now resolved with this streamlining. - Improved test coverage in ncdevmem. Now multiple sendmsg() are tested, and sending multiple iovs in the same sendmsg() is tested. - Fixed issue where dmabuf unmapping was happening in invalid context (Stan). ==================================================================== The TX path had been dropped from the Device Memory TCP patch series post RFCv1 [1], to make that series slightly easier to review. This series rebases the implementation of the TX path on top of the net_iov/netmem framework agreed upon and merged. The motivation for the feature is thoroughly described in the docs & cover letter of the original proposal, so I don't repeat the lengthy descriptions here, but they are available in [1]. Full outline on usage of the TX path is detailed in the documentation included with this series. Test example is available via the kselftest included in the series as well. The series is relatively small, as the TX path for this feature largely piggybacks on the existing MSG_ZEROCOPY implementation. Patch Overview: --------------- 1. Documentation & tests to give high level overview of the feature being added. 1. Add netmem refcounting needed for the TX path. 2. Devmem TX netlink API. 3. Devmem TX net stack implementation. 4. Make dma-buf unbinding scheduled work to handle TX cases where it gets freed from contexts where we can't sleep. 5. Add devmem TX documentation. 6. Add scaffolding enabling driver support for netmem_tx. Add helpers, driver feature flag, and docs to enable drivers to declare netmem_tx support. 7. Guard netmem_tx against being enabled against drivers that don't support it. 8. Add devmem_tx selftests. Add TX path to ncdevmem and add a test to devmem.py. Testing: -------- Testing is very similar to devmem TCP RX path. The ncdevmem test used for the RX path is now augemented with client functionality to test TX path. * Test Setup: Kernel: net-next with this RFC and memory provider API cherry-picked locally. Hardware: Google Cloud A3 VMs. NIC: GVE with header split & RSS & flow steering support. Performance results are not included with this version, unfortunately. I'm having issues running the dma-buf exporter driver against the upstream kernel on my test setup. The issues are specific to that dma-buf exporter and do not affect this patch series. I plan to follow up this series with perf fixes if the tests point to issues once they're up and running. Special thanks to Stan who took a stab at rebasing the TX implementation on top of the netmem/net_iov framework merged. Parts of his proposal [2] that are reused as-is are forked off into their own patches to give full credit. [1] https://lore.kernel.org/netdev/20240909054318.1809580-1-almasrymina@google.… [2] https://lore.kernel.org/netdev/20240913150913.1280238-2-sdf@fomichev.me/T/#… Cc: sdf(a)fomichev.me Cc: asml.silence(a)gmail.com Cc: dw(a)davidwei.uk Cc: Jamal Hadi Salim <jhs(a)mojatatu.com> Cc: Victor Nogueira <victor(a)mojatatu.com> Cc: Pedro Tammela <pctammela(a)mojatatu.com> Cc: Samiullah Khawaja <skhawaja(a)google.com> Mina Almasry (7): net: add get_netmem/put_netmem support net: devmem: Implement TX path net: add devmem TCP TX documentation net: enable driver support for netmem TX gve: add netmem TX support to GVE DQO-RDA mode net: check for driver support in netmem TX selftests: ncdevmem: Implement devmem TCP TX Stanislav Fomichev (1): net: devmem: TCP tx netlink api Documentation/netlink/specs/netdev.yaml | 12 + Documentation/networking/devmem.rst | 150 ++++++++- .../networking/net_cachelines/net_device.rst | 1 + Documentation/networking/netdev-features.rst | 5 + Documentation/networking/netmem.rst | 23 +- drivers/net/ethernet/google/gve/gve_main.c | 4 + drivers/net/ethernet/google/gve/gve_tx_dqo.c | 8 +- include/linux/netdevice.h | 2 + include/linux/skbuff.h | 17 +- include/linux/skbuff_ref.h | 4 +- include/net/netmem.h | 23 ++ include/net/sock.h | 1 + include/uapi/linux/netdev.h | 1 + net/core/datagram.c | 48 ++- net/core/dev.c | 3 + net/core/devmem.c | 115 ++++++- net/core/devmem.h | 77 ++++- net/core/netdev-genl-gen.c | 13 + net/core/netdev-genl-gen.h | 1 + net/core/netdev-genl.c | 73 ++++- net/core/skbuff.c | 48 ++- net/core/sock.c | 6 + net/ipv4/ip_output.c | 3 +- net/ipv4/tcp.c | 50 ++- net/ipv6/ip6_output.c | 3 +- net/vmw_vsock/virtio_transport_common.c | 5 +- tools/include/uapi/linux/netdev.h | 1 + .../selftests/drivers/net/hw/devmem.py | 26 +- .../selftests/drivers/net/hw/ncdevmem.c | 300 +++++++++++++++++- 29 files changed, 950 insertions(+), 73 deletions(-) base-commit: 80c4a0015ce249cf0917a04dbb3cc652a6811079 -- 2.48.1.658.g4767266eb4-goog

10 months, 1 week

6
24
0 0

[PATCH] selftests/bpf: Move test_lwt_ip_encap to test_progs

by Bastien Curutchet (eBPF Foundation)

test_lwt_ip_encap.sh isn't used by the BPF CI. Add a new file in the test_progs framework to migrate the tests done by test_lwt_ip_encap.sh. It uses the same network topology and the same BPF programs located in progs/test_lwt_ip_encap.c. Rework the GSO part to avoid using nc and dd. Remove test_lwt_ip_encap.sh and its Makefile entry. Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet(a)bootlin.com> --- tools/testing/selftests/bpf/Makefile | 3 +- .../selftests/bpf/prog_tests/lwt_ip_encap.c | 540 +++++++++++++++++++++ tools/testing/selftests/bpf/test_lwt_ip_encap.sh | 476 ------------------ 3 files changed, 541 insertions(+), 478 deletions(-) diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile index e6a02d5b87d123cef7e6b41bfbc96d34083f38e1..df4814b5200a5a0e732b19ab3a5975957fb7cbc9 100644 --- a/tools/testing/selftests/bpf/Makefile +++ b/tools/testing/selftests/bpf/Makefile @@ -95,7 +95,7 @@ TEST_GEN_PROGS += test_progs-cpuv4 TEST_INST_SUBDIRS += cpuv4 endif -TEST_GEN_FILES = test_lwt_ip_encap.bpf.o test_tc_edt.bpf.o +TEST_GEN_FILES = test_tc_edt.bpf.o TEST_FILES = xsk_prereqs.sh $(wildcard progs/btf_dump_test_case_*.c) # Order correspond to 'make run_tests' order @@ -104,7 +104,6 @@ TEST_PROGS := test_kmod.sh \ test_lirc_mode2.sh \ test_xdp_vlan_mode_generic.sh \ test_xdp_vlan_mode_native.sh \ - test_lwt_ip_encap.sh \ test_tc_tunnel.sh \ test_tc_edt.sh \ test_xdping.sh \ diff --git a/tools/testing/selftests/bpf/prog_tests/lwt_ip_encap.c b/tools/testing/selftests/bpf/prog_tests/lwt_ip_encap.c new file mode 100644 index 0000000000000000000000000000000000000000..61fcded43b46cab7775237c6d85de07b5df7d87e --- /dev/null +++ b/tools/testing/selftests/bpf/prog_tests/lwt_ip_encap.c @@ -0,0 +1,540 @@ +// SPDX-License-Identifier: GPL-2.0-only +#include <netinet/in.h> + +#include "network_helpers.h" +#include "test_progs.h" + +#define BPF_FILE "test_lwt_ip_encap.bpf.o" + +#define NETNS_NAME_SIZE 32 +#define NETNS_BASE "ns-lwt-ip-encap" + +#define IP4_ADDR_1 "172.16.1.100" +#define IP4_ADDR_2 "172.16.2.100" +#define IP4_ADDR_3 "172.16.3.100" +#define IP4_ADDR_4 "172.16.4.100" +#define IP4_ADDR_5 "172.16.5.100" +#define IP4_ADDR_6 "172.16.6.100" +#define IP4_ADDR_7 "172.16.7.100" +#define IP4_ADDR_8 "172.16.8.100" +#define IP4_ADDR_GRE "172.16.16.100" + +#define IP4_ADDR_SRC IP4_ADDR_1 +#define IP4_ADDR_DST IP4_ADDR_4 + +#define IP6_ADDR_1 "fb01::1" +#define IP6_ADDR_2 "fb02::1" +#define IP6_ADDR_3 "fb03::1" +#define IP6_ADDR_4 "fb04::1" +#define IP6_ADDR_5 "fb05::1" +#define IP6_ADDR_6 "fb06::1" +#define IP6_ADDR_7 "fb07::1" +#define IP6_ADDR_8 "fb08::1" +#define IP6_ADDR_GRE "fb10::1" + +#define IP6_ADDR_SRC IP6_ADDR_1 +#define IP6_ADDR_DST IP6_ADDR_4 + +/* Setup/topology: + * + * NS1 NS2 NS3 + * veth1 <---> veth2 veth3 <---> veth4 (the top route) + * veth5 <---> veth6 veth7 <---> veth8 (the bottom route) + * + * Each vethN gets IP[4|6]_ADDR_N address. + * + * IP*_ADDR_SRC = IP*_ADDR_1 + * IP*_ADDR_DST = IP*_ADDR_4 + * + * All tests test pings from IP*_ADDR__SRC to IP*_ADDR_DST. + * + * By default, routes are configured to allow packets to go + * IP*_ADDR_1 <=> IP*_ADDR_2 <=> IP*_ADDR_3 <=> IP*_ADDR_4 (the top route). + * + * A GRE device is installed in NS3 with IP*_ADDR_GRE, and + * NS1/NS2 are configured to route packets to IP*_ADDR_GRE via IP*_ADDR_8 + * (the bottom route). + * + * Tests: + * + * 1. Routes NS2->IP*_ADDR_DST are brought down, so the only way a ping + * from IP*_ADDR_SRC to IP*_ADDR_DST can work is via IP*_ADDR_GRE. + * + * 2a. In an egress test, a bpf LWT_XMIT program is installed on veth1 + * that encaps the packets with an IP/GRE header to route to IP*_ADDR_GRE. + * + * ping: SRC->[encap at veth1:egress]->GRE:decap->DST + * ping replies go DST->SRC directly + * + * 2b. In an ingress test, a bpf LWT_IN program is installed on veth2 + * that encaps the packets with an IP/GRE header to route to IP*_ADDR_GRE. + * + * ping: SRC->[encap at veth2:ingress]->GRE:decap->DST + * ping replies go DST->SRC directly + */ + +static int create_ns(char *name, size_t name_sz) +{ + if (!name) + goto fail; + + if (!ASSERT_OK(append_tid(name, name_sz), "append TID")) + goto fail; + + SYS(fail, "ip netns add %s", name); + + /* rp_filter gets confused by what these tests are doing, so disable it */ + SYS(fail, "ip netns exec %s sysctl -wq net.ipv4.conf.all.rp_filter=0", name); + SYS(fail, "ip netns exec %s sysctl -wq net.ipv4.conf.default.rp_filter=0", name); + /* Disable IPv6 DAD because it sometimes takes too long and fails tests */ + SYS(fail, "ip netns exec %s sysctl -wq net.ipv6.conf.all.accept_dad=0", name); + SYS(fail, "ip netns exec %s sysctl -wq net.ipv6.conf.default.accept_dad=0", name); + + return 0; +fail: + return -1; +} + +static int set_top_addr(const char *ns1, const char *ns2, const char *ns3) +{ + SYS(fail, "ip -n %s a add %s/24 dev veth1", ns1, IP4_ADDR_1); + SYS(fail, "ip -n %s a add %s/24 dev veth2", ns2, IP4_ADDR_2); + SYS(fail, "ip -n %s a add %s/24 dev veth3", ns2, IP4_ADDR_3); + SYS(fail, "ip -n %s a add %s/24 dev veth4", ns3, IP4_ADDR_4); + SYS(fail, "ip -n %s -6 a add %s/128 dev veth1", ns1, IP6_ADDR_1); + SYS(fail, "ip -n %s -6 a add %s/128 dev veth2", ns2, IP6_ADDR_2); + SYS(fail, "ip -n %s -6 a add %s/128 dev veth3", ns2, IP6_ADDR_3); + SYS(fail, "ip -n %s -6 a add %s/128 dev veth4", ns3, IP6_ADDR_4); + + SYS(fail, "ip -n %s link set dev veth1 up", ns1); + SYS(fail, "ip -n %s link set dev veth2 up", ns2); + SYS(fail, "ip -n %s link set dev veth3 up", ns2); + SYS(fail, "ip -n %s link set dev veth4 up", ns3); + + return 0; +fail: + return 1; +} + +static int set_bottom_addr(const char *ns1, const char *ns2, const char *ns3) +{ + SYS(fail, "ip -n %s a add %s/24 dev veth5", ns1, IP4_ADDR_5); + SYS(fail, "ip -n %s a add %s/24 dev veth6", ns2, IP4_ADDR_6); + SYS(fail, "ip -n %s a add %s/24 dev veth7", ns2, IP4_ADDR_7); + SYS(fail, "ip -n %s a add %s/24 dev veth8", ns3, IP4_ADDR_8); + SYS(fail, "ip -n %s -6 a add %s/128 dev veth5", ns1, IP6_ADDR_5); + SYS(fail, "ip -n %s -6 a add %s/128 dev veth6", ns2, IP6_ADDR_6); + SYS(fail, "ip -n %s -6 a add %s/128 dev veth7", ns2, IP6_ADDR_7); + SYS(fail, "ip -n %s -6 a add %s/128 dev veth8", ns3, IP6_ADDR_8); + + SYS(fail, "ip -n %s link set dev veth5 up", ns1); + SYS(fail, "ip -n %s link set dev veth6 up", ns2); + SYS(fail, "ip -n %s link set dev veth7 up", ns2); + SYS(fail, "ip -n %s link set dev veth8 up", ns3); + + return 0; +fail: + return 1; +} + +static int configure_vrf(const char *ns1, const char *ns2) +{ + if (!ns1 || !ns2) + goto fail; + + SYS(fail, "ip -n %s link add red type vrf table 1001", ns1); + SYS(fail, "ip -n %s link set red up", ns1); + SYS(fail, "ip -n %s route add table 1001 unreachable default metric 8192", ns1); + SYS(fail, "ip -n %s -6 route add table 1001 unreachable default metric 8192", ns1); + SYS(fail, "ip -n %s link set veth1 vrf red", ns1); + SYS(fail, "ip -n %s link set veth5 vrf red", ns1); + + SYS(fail, "ip -n %s link add red type vrf table 1001", ns2); + SYS(fail, "ip -n %s link set red up", ns2); + SYS(fail, "ip -n %s route add table 1001 unreachable default metric 8192", ns2); + SYS(fail, "ip -n %s -6 route add table 1001 unreachable default metric 8192", ns2); + SYS(fail, "ip -n %s link set veth2 vrf red", ns2); + SYS(fail, "ip -n %s link set veth3 vrf red", ns2); + SYS(fail, "ip -n %s link set veth6 vrf red", ns2); + SYS(fail, "ip -n %s link set veth7 vrf red", ns2); + + return 0; +fail: + return -1; +} + +static int configure_ns1(const char *ns1, const char *vrf) +{ + struct nstoken *nstoken = NULL; + + if (!ns1 || !vrf) + goto fail; + + nstoken = open_netns(ns1); + if (!ASSERT_OK_PTR(nstoken, "open ns1")) + goto fail; + + /* Top route */ + SYS(fail, "ip route add %s/32 dev veth1 %s", IP4_ADDR_2, vrf); + SYS(fail, "ip route add default dev veth1 via %s %s", IP4_ADDR_2, vrf); + SYS(fail, "ip -6 route add %s/128 dev veth1 %s", IP6_ADDR_2, vrf); + SYS(fail, "ip -6 route add default dev veth1 via %s %s", IP6_ADDR_2, vrf); + /* Bottom route */ + SYS(fail, "ip route add %s/32 dev veth5 %s", IP4_ADDR_6, vrf); + SYS(fail, "ip route add %s/32 dev veth5 via %s %s", IP4_ADDR_7, IP4_ADDR_6, vrf); + SYS(fail, "ip route add %s/32 dev veth5 via %s %s", IP4_ADDR_8, IP4_ADDR_6, vrf); + SYS(fail, "ip -6 route add %s/128 dev veth5 %s", IP6_ADDR_6, vrf); + SYS(fail, "ip -6 route add %s/128 dev veth5 via %s %s", IP6_ADDR_7, IP6_ADDR_6, vrf); + SYS(fail, "ip -6 route add %s/128 dev veth5 via %s %s", IP6_ADDR_8, IP6_ADDR_6, vrf); + + close_netns(nstoken); + return 0; +fail: + close_netns(nstoken); + return -1; +} + +static int configure_ns2(const char *ns2, const char *vrf) +{ + struct nstoken *nstoken = NULL; + + if (!ns2 || !vrf) + goto fail; + + nstoken = open_netns(ns2); + if (!ASSERT_OK_PTR(nstoken, "open ns2")) + goto fail; + + SYS(fail, "ip netns exec %s sysctl -wq net.ipv4.ip_forward=1", ns2); + SYS(fail, "ip netns exec %s sysctl -wq net.ipv6.conf.all.forwarding=1", ns2); + + /* Top route */ + SYS(fail, "ip route add %s/32 dev veth2 %s", IP4_ADDR_1, vrf); + SYS(fail, "ip route add %s/32 dev veth3 %s", IP4_ADDR_4, vrf); + SYS(fail, "ip -6 route add %s/128 dev veth2 %s", IP6_ADDR_1, vrf); + SYS(fail, "ip -6 route add %s/128 dev veth3 %s", IP6_ADDR_4, vrf); + /* Bottom route */ + SYS(fail, "ip route add %s/32 dev veth6 %s", IP4_ADDR_5, vrf); + SYS(fail, "ip route add %s/32 dev veth7 %s", IP4_ADDR_8, vrf); + SYS(fail, "ip -6 route add %s/128 dev veth6 %s", IP6_ADDR_5, vrf); + SYS(fail, "ip -6 route add %s/128 dev veth7 %s", IP6_ADDR_8, vrf); + + close_netns(nstoken); + return 0; +fail: + close_netns(nstoken); + return -1; +} + +static int configure_ns3(const char *ns3) +{ + struct nstoken *nstoken = NULL; + + if (!ns3) + goto fail; + + nstoken = open_netns(ns3); + if (!ASSERT_OK_PTR(nstoken, "open ns3")) + goto fail; + + /* Top route */ + SYS(fail, "ip route add %s/32 dev veth4", IP4_ADDR_3); + SYS(fail, "ip route add %s/32 dev veth4 via %s", IP4_ADDR_1, IP4_ADDR_3); + SYS(fail, "ip route add %s/32 dev veth4 via %s", IP4_ADDR_2, IP4_ADDR_3); + SYS(fail, "ip -6 route add %s/128 dev veth4", IP6_ADDR_3); + SYS(fail, "ip -6 route add %s/128 dev veth4 via %s", IP6_ADDR_1, IP6_ADDR_3); + SYS(fail, "ip -6 route add %s/128 dev veth4 via %s", IP6_ADDR_2, IP6_ADDR_3); + /* Bottom route */ + SYS(fail, "ip route add %s/32 dev veth8", IP4_ADDR_7); + SYS(fail, "ip route add %s/32 dev veth8 via %s", IP4_ADDR_5, IP4_ADDR_7); + SYS(fail, "ip route add %s/32 dev veth8 via %s", IP4_ADDR_6, IP4_ADDR_7); + SYS(fail, "ip -6 route add %s/128 dev veth8", IP6_ADDR_7); + SYS(fail, "ip -6 route add %s/128 dev veth8 via %s", IP6_ADDR_5, IP6_ADDR_7); + SYS(fail, "ip -6 route add %s/128 dev veth8 via %s", IP6_ADDR_6, IP6_ADDR_7); + + /* Configure IPv4 GRE device */ + SYS(fail, "ip tunnel add gre_dev mode gre remote %s local %s ttl 255", + IP4_ADDR_1, IP4_ADDR_GRE); + SYS(fail, "ip link set gre_dev up"); + SYS(fail, "ip a add %s dev gre_dev", IP4_ADDR_GRE); + + /* Configure IPv6 GRE device */ + SYS(fail, "ip tunnel add gre6_dev mode ip6gre remote %s local %s ttl 255", + IP6_ADDR_1, IP6_ADDR_GRE); + SYS(fail, "ip link set gre6_dev up"); + SYS(fail, "ip a add %s dev gre6_dev", IP6_ADDR_GRE); + + close_netns(nstoken); + return 0; +fail: + close_netns(nstoken); + return -1; +} + +static int setup_network(char *ns1, char *ns2, char *ns3, const char *vrf) +{ + if (!ns1 || !ns2 || !ns3 || !vrf) + goto fail; + + SYS(fail, "ip -n %s link add veth1 type veth peer name veth2 netns %s", ns1, ns2); + SYS(fail, "ip -n %s link add veth3 type veth peer name veth4 netns %s", ns2, ns3); + SYS(fail, "ip -n %s link add veth5 type veth peer name veth6 netns %s", ns1, ns2); + SYS(fail, "ip -n %s link add veth7 type veth peer name veth8 netns %s", ns2, ns3); + + if (vrf[0]) { + if (!ASSERT_OK(configure_vrf(ns1, ns2), "configure vrf")) + goto fail; + } + if (!ASSERT_OK(set_top_addr(ns1, ns2, ns3), "set top addresses")) + goto fail; + + if (!ASSERT_OK(set_bottom_addr(ns1, ns2, ns3), "set bottom addresses")) + goto fail; + + if (!ASSERT_OK(configure_ns1(ns1, vrf), "configure ns1 routes")) + goto fail; + + if (!ASSERT_OK(configure_ns2(ns2, vrf), "configure ns2 routes")) + goto fail; + + if (!ASSERT_OK(configure_ns3(ns3), "configure ns3 routes")) + goto fail; + + /* Link bottom route to the GRE tunnels */ + SYS(fail, "ip -n %s route add %s/32 dev veth5 via %s %s", + ns1, IP4_ADDR_GRE, IP4_ADDR_6, vrf); + SYS(fail, "ip -n %s route add %s/32 dev veth7 via %s %s", + ns2, IP4_ADDR_GRE, IP4_ADDR_8, vrf); + SYS(fail, "ip -n %s -6 route add %s/128 dev veth5 via %s %s", + ns1, IP6_ADDR_GRE, IP6_ADDR_6, vrf); + SYS(fail, "ip -n %s -6 route add %s/128 dev veth7 via %s %s", + ns2, IP6_ADDR_GRE, IP6_ADDR_8, vrf); + + return 0; +fail: + return -1; +} + +int remove_routes_to_gredev(const char *ns1, const char *ns2, const char *vrf) +{ + SYS(fail, "ip -n %s route del %s dev veth5 %s", ns1, IP4_ADDR_GRE, vrf); + SYS(fail, "ip -n %s route del %s dev veth7 %s", ns2, IP4_ADDR_GRE, vrf); + SYS(fail, "ip -n %s -6 route del %s/128 dev veth5 %s", ns1, IP6_ADDR_GRE, vrf); + SYS(fail, "ip -n %s -6 route del %s/128 dev veth7 %s", ns2, IP6_ADDR_GRE, vrf); + + return 0; +fail: + return -1; +} + +int add_unreachable_routes_to_gredev(const char *ns1, const char *ns2, const char *vrf) +{ + SYS(fail, "ip -n %s route add unreachable %s/32 %s", ns1, IP4_ADDR_GRE, vrf); + SYS(fail, "ip -n %s route add unreachable %s/32 %s", ns2, IP4_ADDR_GRE, vrf); + SYS(fail, "ip -n %s -6 route add unreachable %s/128 %s", ns1, IP6_ADDR_GRE, vrf); + SYS(fail, "ip -n %s -6 route add unreachable %s/128 %s", ns2, IP6_ADDR_GRE, vrf); + + return 0; +fail: + return -1; +} + +#define GSO_SIZE 5000 +#define GSO_TCP_PORT 9000 +/* This tests the fix from commit ea0371f78799 ("net: fix GSO in bpf_lwt_push_ip_encap") */ +static int test_gso_fix(const char *ns1, const char *ns3, int family) +{ + const char *ip_addr = family == AF_INET ? IP4_ADDR_DST : IP6_ADDR_DST; + char gso_packet[GSO_SIZE] = {}; + struct nstoken *nstoken = NULL; + int sfd, cfd, afd; + ssize_t bytes; + int ret = -1; + + if (!ns1 || !ns3) + return ret; + + nstoken = open_netns(ns3); + if (!ASSERT_OK_PTR(nstoken, "open ns3")) + return ret; + + sfd = start_server_str(family, SOCK_STREAM, ip_addr, GSO_TCP_PORT, NULL); + if (!ASSERT_OK_FD(sfd, "start server")) + goto close_netns; + + close_netns(nstoken); + + nstoken = open_netns(ns1); + if (!ASSERT_OK_PTR(nstoken, "open ns1")) + goto close_server; + + cfd = connect_to_addr_str(family, SOCK_STREAM, ip_addr, GSO_TCP_PORT, NULL); + if (!ASSERT_OK_FD(cfd, "connect to server")) + goto close_server; + + close_netns(nstoken); + nstoken = NULL; + + afd = accept(sfd, NULL, NULL); + if (!ASSERT_OK_FD(afd, "accept")) + goto close_client; + + /* Send a packet larger than MTU */ + bytes = send(cfd, gso_packet, GSO_SIZE, 0); + if (!ASSERT_EQ(bytes, GSO_SIZE, "send packet")) + goto close_accept; + + /* Verify we received all expected bytes */ + bytes = read(afd, gso_packet, GSO_SIZE); + if (!ASSERT_EQ(bytes, GSO_SIZE, "receive packet")) + goto close_accept; + + ret = 0; + +close_accept: + close(afd); +close_client: + close(cfd); +close_server: + close(sfd); +close_netns: + close_netns(nstoken); + + return ret; +} + +static int check_ping_ok(const char *ns1) +{ + SYS(fail, "ip netns exec %s ping -c 1 -W1 -I veth1 %s > /dev/null", ns1, IP4_ADDR_DST); + SYS(fail, "ip netns exec %s ping6 -c 1 -W1 -I veth1 %s > /dev/null", ns1, IP6_ADDR_DST); + return 0; +fail: + return -1; +} + +static int check_ping_fails(const char *ns1) +{ + int ret; + + ret = SYS_NOFAIL("ip netns exec %s ping -c 1 -W1 -I veth1 %s", ns1, IP4_ADDR_DST); + if (!ret) + return -1; + + ret = SYS_NOFAIL("ip netns exec %s ping6 -c 1 -W1 -I veth1 %s", ns1, IP6_ADDR_DST); + if (!ret) + return -1; + + return 0; +} + +#define EGRESS true +#define INGRESS false +#define IPV4_ENCAP true +#define IPV6_ENCAP false +static void lwt_ip_encap(bool ipv4_encap, bool egress, const char *vrf) +{ + char ns1[NETNS_NAME_SIZE] = NETNS_BASE "-1-"; + char ns2[NETNS_NAME_SIZE] = NETNS_BASE "-2-"; + char ns3[NETNS_NAME_SIZE] = NETNS_BASE "-3-"; + char *sec = ipv4_encap ? "encap_gre" : "encap_gre6"; + + if (!vrf) + return; + + if (!ASSERT_OK(create_ns(ns1, NETNS_NAME_SIZE), "create ns1")) + goto out; + if (!ASSERT_OK(create_ns(ns2, NETNS_NAME_SIZE), "create ns2")) + goto out; + if (!ASSERT_OK(create_ns(ns3, NETNS_NAME_SIZE), "create ns3")) + goto out; + + if (!ASSERT_OK(setup_network(ns1, ns2, ns3, vrf), "setup network")) + goto out; + + /* By default, pings work */ + if (!ASSERT_OK(check_ping_ok(ns1), "ping OK")) + goto out; + + /* Remove NS2->DST routes, ping fails */ + SYS(out, "ip -n %s route del %s/32 dev veth3 %s", ns2, IP4_ADDR_DST, vrf); + SYS(out, "ip -n %s -6 route del %s/128 dev veth3 %s", ns2, IP6_ADDR_DST, vrf); + if (!ASSERT_OK(check_ping_fails(ns1), "ping expected fail")) + goto out; + + /* Install replacement routes (LWT/eBPF), pings succeed */ + if (egress) { + SYS(out, "ip -n %s route add %s encap bpf xmit obj %s sec %s dev veth1 %s", + ns1, IP4_ADDR_DST, BPF_FILE, sec, vrf); + SYS(out, "ip -n %s -6 route add %s encap bpf xmit obj %s sec %s dev veth1 %s", + ns1, IP6_ADDR_DST, BPF_FILE, sec, vrf); + } else { + SYS(out, "ip -n %s route add %s encap bpf in obj %s sec %s dev veth2 %s", + ns2, IP4_ADDR_DST, BPF_FILE, sec, vrf); + SYS(out, "ip -n %s -6 route add %s encap bpf in obj %s sec %s dev veth2 %s", + ns2, IP6_ADDR_DST, BPF_FILE, sec, vrf); + } + + if (!ASSERT_OK(check_ping_ok(ns1), "ping OK")) + goto out; + + /* Skip GSO tests with VRF: VRF routing needs properly assigned + * source IP/device, which is easy to do with ping but hard with TCP. + */ + if (egress && !vrf[0]) { + if (!ASSERT_OK(test_gso_fix(ns1, ns3, AF_INET), "test GSO")) + goto out; + } + + /* Negative test: remove routes to GRE devices: ping fails */ + if (!ASSERT_OK(remove_routes_to_gredev(ns1, ns2, vrf), "remove routes to gredev")) + goto out; + if (!ASSERT_OK(check_ping_fails(ns1), "ping expected fail")) + goto out; + + /* Another negative test */ + if (!ASSERT_OK(add_unreachable_routes_to_gredev(ns1, ns2, vrf), + "add unreachable routes")) + goto out; + ASSERT_OK(check_ping_fails(ns1), "ping expected fail"); + +out: + SYS_NOFAIL("ip netns del %s", ns1); + SYS_NOFAIL("ip netns del %s", ns2); + SYS_NOFAIL("ip netns del %s", ns3); +} + +void test_lwt_ip_encap_vrf_ipv6(void) +{ + if (test__start_subtest("egress")) + lwt_ip_encap(IPV6_ENCAP, EGRESS, "vrf red"); + + if (test__start_subtest("ingress")) + lwt_ip_encap(IPV6_ENCAP, INGRESS, "vrf red"); +} + +void test_lwt_ip_encap_vrf_ipv4(void) +{ + if (test__start_subtest("egress")) + lwt_ip_encap(IPV4_ENCAP, EGRESS, "vrf red"); + + if (test__start_subtest("ingress")) + lwt_ip_encap(IPV4_ENCAP, INGRESS, "vrf red"); +} + +void test_lwt_ip_encap_ipv6(void) +{ + if (test__start_subtest("egress")) + lwt_ip_encap(IPV6_ENCAP, EGRESS, ""); + + if (test__start_subtest("ingress")) + lwt_ip_encap(IPV6_ENCAP, INGRESS, ""); +} + +void test_lwt_ip_encap_ipv4(void) +{ + if (test__start_subtest("egress")) + lwt_ip_encap(IPV4_ENCAP, EGRESS, ""); + + if (test__start_subtest("ingress")) + lwt_ip_encap(IPV4_ENCAP, INGRESS, ""); +} diff --git a/tools/testing/selftests/bpf/test_lwt_ip_encap.sh b/tools/testing/selftests/bpf/test_lwt_ip_encap.sh deleted file mode 100755 index 1e565f47aca940d8dc7235d823c48537d7a708b8..0000000000000000000000000000000000000000 --- a/tools/testing/selftests/bpf/test_lwt_ip_encap.sh +++ /dev/null @@ -1,476 +0,0 @@ -#!/bin/bash -# SPDX-License-Identifier: GPL-2.0 -# -# Setup/topology: -# -# NS1 NS2 NS3 -# veth1 <---> veth2 veth3 <---> veth4 (the top route) -# veth5 <---> veth6 veth7 <---> veth8 (the bottom route) -# -# each vethN gets IPv[4|6]_N address -# -# IPv*_SRC = IPv*_1 -# IPv*_DST = IPv*_4 -# -# all tests test pings from IPv*_SRC to IPv*_DST -# -# by default, routes are configured to allow packets to go -# IP*_1 <=> IP*_2 <=> IP*_3 <=> IP*_4 (the top route) -# -# a GRE device is installed in NS3 with IPv*_GRE, and -# NS1/NS2 are configured to route packets to IPv*_GRE via IP*_8 -# (the bottom route) -# -# Tests: -# -# 1. routes NS2->IPv*_DST are brought down, so the only way a ping -# from IP*_SRC to IP*_DST can work is via IPv*_GRE -# -# 2a. in an egress test, a bpf LWT_XMIT program is installed on veth1 -# that encaps the packets with an IP/GRE header to route to IPv*_GRE -# -# ping: SRC->[encap at veth1:egress]->GRE:decap->DST -# ping replies go DST->SRC directly -# -# 2b. in an ingress test, a bpf LWT_IN program is installed on veth2 -# that encaps the packets with an IP/GRE header to route to IPv*_GRE -# -# ping: SRC->[encap at veth2:ingress]->GRE:decap->DST -# ping replies go DST->SRC directly - -BPF_FILE="test_lwt_ip_encap.bpf.o" -if [[ $EUID -ne 0 ]]; then - echo "This script must be run as root" - echo "FAIL" - exit 1 -fi - -readonly NS1="ns1-$(mktemp -u XXXXXX)" -readonly NS2="ns2-$(mktemp -u XXXXXX)" -readonly NS3="ns3-$(mktemp -u XXXXXX)" - -readonly IPv4_1="172.16.1.100" -readonly IPv4_2="172.16.2.100" -readonly IPv4_3="172.16.3.100" -readonly IPv4_4="172.16.4.100" -readonly IPv4_5="172.16.5.100" -readonly IPv4_6="172.16.6.100" -readonly IPv4_7="172.16.7.100" -readonly IPv4_8="172.16.8.100" -readonly IPv4_GRE="172.16.16.100" - -readonly IPv4_SRC=$IPv4_1 -readonly IPv4_DST=$IPv4_4 - -readonly IPv6_1="fb01::1" -readonly IPv6_2="fb02::1" -readonly IPv6_3="fb03::1" -readonly IPv6_4="fb04::1" -readonly IPv6_5="fb05::1" -readonly IPv6_6="fb06::1" -readonly IPv6_7="fb07::1" -readonly IPv6_8="fb08::1" -readonly IPv6_GRE="fb10::1" - -readonly IPv6_SRC=$IPv6_1 -readonly IPv6_DST=$IPv6_4 - -TEST_STATUS=0 -TESTS_SUCCEEDED=0 -TESTS_FAILED=0 - -TMPFILE="" - -process_test_results() -{ - if [[ "${TEST_STATUS}" -eq 0 ]] ; then - echo "PASS" - TESTS_SUCCEEDED=$((TESTS_SUCCEEDED+1)) - else - echo "FAIL" - TESTS_FAILED=$((TESTS_FAILED+1)) - fi -} - -print_test_summary_and_exit() -{ - echo "passed tests: ${TESTS_SUCCEEDED}" - echo "failed tests: ${TESTS_FAILED}" - if [ "${TESTS_FAILED}" -eq "0" ] ; then - exit 0 - else - exit 1 - fi -} - -setup() -{ - set -e # exit on error - TEST_STATUS=0 - - # create devices and namespaces - ip netns add "${NS1}" - ip netns add "${NS2}" - ip netns add "${NS3}" - - # rp_filter gets confused by what these tests are doing, so disable it - ip netns exec ${NS1} sysctl -wq net.ipv4.conf.all.rp_filter=0 - ip netns exec ${NS2} sysctl -wq net.ipv4.conf.all.rp_filter=0 - ip netns exec ${NS3} sysctl -wq net.ipv4.conf.all.rp_filter=0 - ip netns exec ${NS1} sysctl -wq net.ipv4.conf.default.rp_filter=0 - ip netns exec ${NS2} sysctl -wq net.ipv4.conf.default.rp_filter=0 - ip netns exec ${NS3} sysctl -wq net.ipv4.conf.default.rp_filter=0 - - # disable IPv6 DAD because it sometimes takes too long and fails tests - ip netns exec ${NS1} sysctl -wq net.ipv6.conf.all.accept_dad=0 - ip netns exec ${NS2} sysctl -wq net.ipv6.conf.all.accept_dad=0 - ip netns exec ${NS3} sysctl -wq net.ipv6.conf.all.accept_dad=0 - ip netns exec ${NS1} sysctl -wq net.ipv6.conf.default.accept_dad=0 - ip netns exec ${NS2} sysctl -wq net.ipv6.conf.default.accept_dad=0 - ip netns exec ${NS3} sysctl -wq net.ipv6.conf.default.accept_dad=0 - - ip link add veth1 type veth peer name veth2 - ip link add veth3 type veth peer name veth4 - ip link add veth5 type veth peer name veth6 - ip link add veth7 type veth peer name veth8 - - ip netns exec ${NS2} sysctl -wq net.ipv4.ip_forward=1 - ip netns exec ${NS2} sysctl -wq net.ipv6.conf.all.forwarding=1 - - ip link set veth1 netns ${NS1} - ip link set veth2 netns ${NS2} - ip link set veth3 netns ${NS2} - ip link set veth4 netns ${NS3} - ip link set veth5 netns ${NS1} - ip link set veth6 netns ${NS2} - ip link set veth7 netns ${NS2} - ip link set veth8 netns ${NS3} - - if [ ! -z "${VRF}" ] ; then - ip -netns ${NS1} link add red type vrf table 1001 - ip -netns ${NS1} link set red up - ip -netns ${NS1} route add table 1001 unreachable default metric 8192 - ip -netns ${NS1} -6 route add table 1001 unreachable default metric 8192 - ip -netns ${NS1} link set veth1 vrf red - ip -netns ${NS1} link set veth5 vrf red - - ip -netns ${NS2} link add red type vrf table 1001 - ip -netns ${NS2} link set red up - ip -netns ${NS2} route add table 1001 unreachable default metric 8192 - ip -netns ${NS2} -6 route add table 1001 unreachable default metric 8192 - ip -netns ${NS2} link set veth2 vrf red - ip -netns ${NS2} link set veth3 vrf red - ip -netns ${NS2} link set veth6 vrf red - ip -netns ${NS2} link set veth7 vrf red - fi - - # configure addesses: the top route (1-2-3-4) - ip -netns ${NS1} addr add ${IPv4_1}/24 dev veth1 - ip -netns ${NS2} addr add ${IPv4_2}/24 dev veth2 - ip -netns ${NS2} addr add ${IPv4_3}/24 dev veth3 - ip -netns ${NS3} addr add ${IPv4_4}/24 dev veth4 - ip -netns ${NS1} -6 addr add ${IPv6_1}/128 nodad dev veth1 - ip -netns ${NS2} -6 addr add ${IPv6_2}/128 nodad dev veth2 - ip -netns ${NS2} -6 addr add ${IPv6_3}/128 nodad dev veth3 - ip -netns ${NS3} -6 addr add ${IPv6_4}/128 nodad dev veth4 - - # configure addresses: the bottom route (5-6-7-8) - ip -netns ${NS1} addr add ${IPv4_5}/24 dev veth5 - ip -netns ${NS2} addr add ${IPv4_6}/24 dev veth6 - ip -netns ${NS2} addr add ${IPv4_7}/24 dev veth7 - ip -netns ${NS3} addr add ${IPv4_8}/24 dev veth8 - ip -netns ${NS1} -6 addr add ${IPv6_5}/128 nodad dev veth5 - ip -netns ${NS2} -6 addr add ${IPv6_6}/128 nodad dev veth6 - ip -netns ${NS2} -6 addr add ${IPv6_7}/128 nodad dev veth7 - ip -netns ${NS3} -6 addr add ${IPv6_8}/128 nodad dev veth8 - - ip -netns ${NS1} link set dev veth1 up - ip -netns ${NS2} link set dev veth2 up - ip -netns ${NS2} link set dev veth3 up - ip -netns ${NS3} link set dev veth4 up - ip -netns ${NS1} link set dev veth5 up - ip -netns ${NS2} link set dev veth6 up - ip -netns ${NS2} link set dev veth7 up - ip -netns ${NS3} link set dev veth8 up - - # configure routes: IP*_SRC -> veth1/IP*_2 (= top route) default; - # the bottom route to specific bottom addresses - - # NS1 - # top route - ip -netns ${NS1} route add ${IPv4_2}/32 dev veth1 ${VRF} - ip -netns ${NS1} route add default dev veth1 via ${IPv4_2} ${VRF} # go top by default - ip -netns ${NS1} -6 route add ${IPv6_2}/128 dev veth1 ${VRF} - ip -netns ${NS1} -6 route add default dev veth1 via ${IPv6_2} ${VRF} # go top by default - # bottom route - ip -netns ${NS1} route add ${IPv4_6}/32 dev veth5 ${VRF} - ip -netns ${NS1} route add ${IPv4_7}/32 dev veth5 via ${IPv4_6} ${VRF} - ip -netns ${NS1} route add ${IPv4_8}/32 dev veth5 via ${IPv4_6} ${VRF} - ip -netns ${NS1} -6 route add ${IPv6_6}/128 dev veth5 ${VRF} - ip -netns ${NS1} -6 route add ${IPv6_7}/128 dev veth5 via ${IPv6_6} ${VRF} - ip -netns ${NS1} -6 route add ${IPv6_8}/128 dev veth5 via ${IPv6_6} ${VRF} - - # NS2 - # top route - ip -netns ${NS2} route add ${IPv4_1}/32 dev veth2 ${VRF} - ip -netns ${NS2} route add ${IPv4_4}/32 dev veth3 ${VRF} - ip -netns ${NS2} -6 route add ${IPv6_1}/128 dev veth2 ${VRF} - ip -netns ${NS2} -6 route add ${IPv6_4}/128 dev veth3 ${VRF} - # bottom route - ip -netns ${NS2} route add ${IPv4_5}/32 dev veth6 ${VRF} - ip -netns ${NS2} route add ${IPv4_8}/32 dev veth7 ${VRF} - ip -netns ${NS2} -6 route add ${IPv6_5}/128 dev veth6 ${VRF} - ip -netns ${NS2} -6 route add ${IPv6_8}/128 dev veth7 ${VRF} - - # NS3 - # top route - ip -netns ${NS3} route add ${IPv4_3}/32 dev veth4 - ip -netns ${NS3} route add ${IPv4_1}/32 dev veth4 via ${IPv4_3} - ip -netns ${NS3} route add ${IPv4_2}/32 dev veth4 via ${IPv4_3} - ip -netns ${NS3} -6 route add ${IPv6_3}/128 dev veth4 - ip -netns ${NS3} -6 route add ${IPv6_1}/128 dev veth4 via ${IPv6_3} - ip -netns ${NS3} -6 route add ${IPv6_2}/128 dev veth4 via ${IPv6_3} - # bottom route - ip -netns ${NS3} route add ${IPv4_7}/32 dev veth8 - ip -netns ${NS3} route add ${IPv4_5}/32 dev veth8 via ${IPv4_7} - ip -netns ${NS3} route add ${IPv4_6}/32 dev veth8 via ${IPv4_7} - ip -netns ${NS3} -6 route add ${IPv6_7}/128 dev veth8 - ip -netns ${NS3} -6 route add ${IPv6_5}/128 dev veth8 via ${IPv6_7} - ip -netns ${NS3} -6 route add ${IPv6_6}/128 dev veth8 via ${IPv6_7} - - # configure IPv4 GRE device in NS3, and a route to it via the "bottom" route - ip -netns ${NS3} tunnel add gre_dev mode gre remote ${IPv4_1} local ${IPv4_GRE} ttl 255 - ip -netns ${NS3} link set gre_dev up - ip -netns ${NS3} addr add ${IPv4_GRE} dev gre_dev - ip -netns ${NS1} route add ${IPv4_GRE}/32 dev veth5 via ${IPv4_6} ${VRF} - ip -netns ${NS2} route add ${IPv4_GRE}/32 dev veth7 via ${IPv4_8} ${VRF} - - - # configure IPv6 GRE device in NS3, and a route to it via the "bottom" route - ip -netns ${NS3} -6 tunnel add name gre6_dev mode ip6gre remote ${IPv6_1} local ${IPv6_GRE} ttl 255 - ip -netns ${NS3} link set gre6_dev up - ip -netns ${NS3} -6 addr add ${IPv6_GRE} nodad dev gre6_dev - ip -netns ${NS1} -6 route add ${IPv6_GRE}/128 dev veth5 via ${IPv6_6} ${VRF} - ip -netns ${NS2} -6 route add ${IPv6_GRE}/128 dev veth7 via ${IPv6_8} ${VRF} - - TMPFILE=$(mktemp /tmp/test_lwt_ip_encap.XXXXXX) - - sleep 1 # reduce flakiness - set +e -} - -cleanup() -{ - if [ -f ${TMPFILE} ] ; then - rm ${TMPFILE} - fi - - ip netns del ${NS1} 2> /dev/null - ip netns del ${NS2} 2> /dev/null - ip netns del ${NS3} 2> /dev/null -} - -trap cleanup EXIT - -remove_routes_to_gredev() -{ - ip -netns ${NS1} route del ${IPv4_GRE} dev veth5 ${VRF} - ip -netns ${NS2} route del ${IPv4_GRE} dev veth7 ${VRF} - ip -netns ${NS1} -6 route del ${IPv6_GRE}/128 dev veth5 ${VRF} - ip -netns ${NS2} -6 route del ${IPv6_GRE}/128 dev veth7 ${VRF} -} - -add_unreachable_routes_to_gredev() -{ - ip -netns ${NS1} route add unreachable ${IPv4_GRE}/32 ${VRF} - ip -netns ${NS2} route add unreachable ${IPv4_GRE}/32 ${VRF} - ip -netns ${NS1} -6 route add unreachable ${IPv6_GRE}/128 ${VRF} - ip -netns ${NS2} -6 route add unreachable ${IPv6_GRE}/128 ${VRF} -} - -test_ping() -{ - local readonly PROTO=$1 - local readonly EXPECTED=$2 - local RET=0 - - if [ "${PROTO}" == "IPv4" ] ; then - ip netns exec ${NS1} ping -c 1 -W 1 -I veth1 ${IPv4_DST} 2>&1 > /dev/null - RET=$? - elif [ "${PROTO}" == "IPv6" ] ; then - ip netns exec ${NS1} ping6 -c 1 -W 1 -I veth1 ${IPv6_DST} 2>&1 > /dev/null - RET=$? - else - echo " test_ping: unknown PROTO: ${PROTO}" - TEST_STATUS=1 - fi - - if [ "0" != "${RET}" ]; then - RET=1 - fi - - if [ "${EXPECTED}" != "${RET}" ] ; then - echo " test_ping failed: expected: ${EXPECTED}; got ${RET}" - TEST_STATUS=1 - fi -} - -test_gso() -{ - local readonly PROTO=$1 - local readonly PKT_SZ=5000 - local IP_DST="" - : > ${TMPFILE} # trim the capture file - - # check that nc is present - command -v nc >/dev/null 2>&1 || \ - { echo >&2 "nc is not available: skipping TSO tests"; return; } - - # listen on port 9000, capture TCP into $TMPFILE - if [ "${PROTO}" == "IPv4" ] ; then - IP_DST=${IPv4_DST} - ip netns exec ${NS3} bash -c \ - "nc -4 -l -p 9000 > ${TMPFILE} &" - elif [ "${PROTO}" == "IPv6" ] ; then - IP_DST=${IPv6_DST} - ip netns exec ${NS3} bash -c \ - "nc -6 -l -p 9000 > ${TMPFILE} &" - RET=$? - else - echo " test_gso: unknown PROTO: ${PROTO}" - TEST_STATUS=1 - fi - sleep 1 # let nc start listening - - # send a packet larger than MTU - ip netns exec ${NS1} bash -c \ - "dd if=/dev/zero bs=$PKT_SZ count=1 > /dev/tcp/${IP_DST}/9000 2>/dev/null" - sleep 2 # let the packet get delivered - - # verify we received all expected bytes - SZ=$(stat -c %s ${TMPFILE}) - if [ "$SZ" != "$PKT_SZ" ] ; then - echo " test_gso failed: ${PROTO}" - TEST_STATUS=1 - fi -} - -test_egress() -{ - local readonly ENCAP=$1 - echo "starting egress ${ENCAP} encap test ${VRF}" - setup - - # by default, pings work - test_ping IPv4 0 - test_ping IPv6 0 - - # remove NS2->DST routes, ping fails - ip -netns ${NS2} route del ${IPv4_DST}/32 dev veth3 ${VRF} - ip -netns ${NS2} -6 route del ${IPv6_DST}/128 dev veth3 ${VRF} - test_ping IPv4 1 - test_ping IPv6 1 - - # install replacement routes (LWT/eBPF), pings succeed - if [ "${ENCAP}" == "IPv4" ] ; then - ip -netns ${NS1} route add ${IPv4_DST} encap bpf xmit obj \ - ${BPF_FILE} sec encap_gre dev veth1 ${VRF} - ip -netns ${NS1} -6 route add ${IPv6_DST} encap bpf xmit obj \ - ${BPF_FILE} sec encap_gre dev veth1 ${VRF} - elif [ "${ENCAP}" == "IPv6" ] ; then - ip -netns ${NS1} route add ${IPv4_DST} encap bpf xmit obj \ - ${BPF_FILE} sec encap_gre6 dev veth1 ${VRF} - ip -netns ${NS1} -6 route add ${IPv6_DST} encap bpf xmit obj \ - ${BPF_FILE} sec encap_gre6 dev veth1 ${VRF} - else - echo " unknown encap ${ENCAP}" - TEST_STATUS=1 - fi - test_ping IPv4 0 - test_ping IPv6 0 - - # skip GSO tests with VRF: VRF routing needs properly assigned - # source IP/device, which is easy to do with ping and hard with dd/nc. - if [ -z "${VRF}" ] ; then - test_gso IPv4 - test_gso IPv6 - fi - - # a negative test: remove routes to GRE devices: ping fails - remove_routes_to_gredev - test_ping IPv4 1 - test_ping IPv6 1 - - # another negative test - add_unreachable_routes_to_gredev - test_ping IPv4 1 - test_ping IPv6 1 - - cleanup - process_test_results -} - -test_ingress() -{ - local readonly ENCAP=$1 - echo "starting ingress ${ENCAP} encap test ${VRF}" - setup - - # need to wait a bit for IPv6 to autoconf, otherwise - # ping6 sometimes fails with "unable to bind to address" - - # by default, pings work - test_ping IPv4 0 - test_ping IPv6 0 - - # remove NS2->DST routes, pings fail - ip -netns ${NS2} route del ${IPv4_DST}/32 dev veth3 ${VRF} - ip -netns ${NS2} -6 route del ${IPv6_DST}/128 dev veth3 ${VRF} - test_ping IPv4 1 - test_ping IPv6 1 - - # install replacement routes (LWT/eBPF), pings succeed - if [ "${ENCAP}" == "IPv4" ] ; then - ip -netns ${NS2} route add ${IPv4_DST} encap bpf in obj \ - ${BPF_FILE} sec encap_gre dev veth2 ${VRF} - ip -netns ${NS2} -6 route add ${IPv6_DST} encap bpf in obj \ - ${BPF_FILE} sec encap_gre dev veth2 ${VRF} - elif [ "${ENCAP}" == "IPv6" ] ; then - ip -netns ${NS2} route add ${IPv4_DST} encap bpf in obj \ - ${BPF_FILE} sec encap_gre6 dev veth2 ${VRF} - ip -netns ${NS2} -6 route add ${IPv6_DST} encap bpf in obj \ - ${BPF_FILE} sec encap_gre6 dev veth2 ${VRF} - else - echo "FAIL: unknown encap ${ENCAP}" - TEST_STATUS=1 - fi - test_ping IPv4 0 - test_ping IPv6 0 - - # a negative test: remove routes to GRE devices: ping fails - remove_routes_to_gredev - test_ping IPv4 1 - test_ping IPv6 1 - - # another negative test - add_unreachable_routes_to_gredev - test_ping IPv4 1 - test_ping IPv6 1 - - cleanup - process_test_results -} - -VRF="" -test_egress IPv4 -test_egress IPv6 -test_ingress IPv4 -test_ingress IPv6 - -VRF="vrf red" -test_egress IPv4 -test_egress IPv6 -test_ingress IPv4 -test_ingress IPv6 - -print_test_summary_and_exit --- base-commit: 5fd21aaac37919abc5c5d0df1eb06a9f02518f27 change-id: 20250206-lwt_ip-b6a91d2787bf Best regards, -- Bastien Curutchet (eBPF Foundation) <bastien.curutchet(a)bootlin.com>

10 months, 1 week

2
1
0 0

[PATCH] selftests/nolibc: stop testing constructor order

by Thomas Weißschuh

The execution order of constructors in undefined and depends on the toolchain. While recent toolchains seems to have a stable order, it doesn't work for older ones and may also change at any time. Stop validating the order and instead only validate that all constructors are executed. Reported-by: Willy Tarreau <w(a)1wt.eu> Closes: https://lore.kernel.org/lkml/20250301110735.GA18621@1wt.eu/ Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net> --- tools/testing/selftests/nolibc/nolibc-test-linkage.c | 6 +++--- tools/testing/selftests/nolibc/nolibc-test.c | 8 ++++---- 2 files changed, 7 insertions(+), 7 deletions(-) diff --git a/tools/testing/selftests/nolibc/nolibc-test-linkage.c b/tools/testing/selftests/nolibc/nolibc-test-linkage.c index 5ff4c8a1db2a46cf3f8cb55bdabaa5e8819b344c..a7ca8325863face9cd4134a717fe4c7761bdeb7f 100644 --- a/tools/testing/selftests/nolibc/nolibc-test-linkage.c +++ b/tools/testing/selftests/nolibc/nolibc-test-linkage.c @@ -11,16 +11,16 @@ void *linkage_test_errno_addr(void) return &errno; } -int linkage_test_constructor_test_value; +int linkage_test_constructor_test_value = 0; __attribute__((constructor)) static void constructor1(void) { - linkage_test_constructor_test_value = 2; + linkage_test_constructor_test_value |= 1 << 0; } __attribute__((constructor)) static void constructor2(void) { - linkage_test_constructor_test_value *= 3; + linkage_test_constructor_test_value |= 1 << 1; } diff --git a/tools/testing/selftests/nolibc/nolibc-test.c b/tools/testing/selftests/nolibc/nolibc-test.c index a5abf16dbfe0f2aed286964fdfc391bc6201ef3b..5884a891c491544050fc35b07322c73a1a9dbaf3 100644 --- a/tools/testing/selftests/nolibc/nolibc-test.c +++ b/tools/testing/selftests/nolibc/nolibc-test.c @@ -692,14 +692,14 @@ int expect_strtox(int llen, void *func, const char *input, int base, intmax_t ex __attribute__((constructor)) static void constructor1(void) { - constructor_test_value = 1; + constructor_test_value |= 1 << 0; } __attribute__((constructor)) static void constructor2(int argc, char **argv, char **envp) { if (argc && argv && envp) - constructor_test_value *= 2; + constructor_test_value |= 1 << 1; } int run_startup(int min, int max) @@ -738,9 +738,9 @@ int run_startup(int min, int max) CASE_TEST(environ_HOME); EXPECT_PTRNZ(1, getenv("HOME")); break; CASE_TEST(auxv_addr); EXPECT_PTRGT(test_auxv != (void *)-1, test_auxv, brk); break; CASE_TEST(auxv_AT_UID); EXPECT_EQ(1, getauxval(AT_UID), getuid()); break; - CASE_TEST(constructor); EXPECT_EQ(is_nolibc, constructor_test_value, 2); break; + CASE_TEST(constructor); EXPECT_EQ(is_nolibc, constructor_test_value, 0x3); break; CASE_TEST(linkage_errno); EXPECT_PTREQ(1, linkage_test_errno_addr(), &errno); break; - CASE_TEST(linkage_constr); EXPECT_EQ(is_nolibc, linkage_test_constructor_test_value, 6); break; + CASE_TEST(linkage_constr); EXPECT_EQ(1, linkage_test_constructor_test_value, 0x3); break; case __LINE__: return ret; /* must be last */ /* note: do not set any defaults so as to permit holes above */ --- base-commit: 6e406202a44a1a37176da0333cec10d5320c4b33 change-id: 20250306-nolibc-constructor-order-6921e8c93591 Best regards, -- Thomas Weißschuh <linux(a)weissschuh.net>

10 months, 1 week

2
1
0 0

[PATCH v2] selftests: livepatch: test if ftrace can trace a livepatched function

by Filipe Xavier

This new test makes sure that ftrace can trace a function that was introduced by a livepatch. Signed-off-by: Filipe Xavier <felipeaggger(a)gmail.com> --- Changes in v2: - functions.sh: added reset tracing on push and pop_config. - test-ftrace.sh: enabled tracing_on before test init. - nitpick: added double quotations on filenames and fixed some wording. - Link to v1: https://lore.kernel.org/r/20250102-ftrace-selftest-livepatch-v1-1-84880baef… --- tools/testing/selftests/livepatch/functions.sh | 14 ++++++++++ tools/testing/selftests/livepatch/test-ftrace.sh | 33 ++++++++++++++++++++++++ 2 files changed, 47 insertions(+) diff --git a/tools/testing/selftests/livepatch/functions.sh b/tools/testing/selftests/livepatch/functions.sh index e5d06fb402335d85959bafe099087effc6ddce12..e6c13514002dae5f8d7461f90b8241ab43024ea4 100644 --- a/tools/testing/selftests/livepatch/functions.sh +++ b/tools/testing/selftests/livepatch/functions.sh @@ -62,6 +62,9 @@ function push_config() { awk -F'[: ]' '{print "file " $1 " line " $2 " " $4}') FTRACE_ENABLED=$(sysctl --values kernel.ftrace_enabled) KPROBE_ENABLED=$(cat "$SYSFS_KPROBES_DIR/enabled") + TRACING_ON=$(cat "$SYSFS_DEBUG_DIR/tracing/tracing_on") + CURRENT_TRACER=$(cat "$SYSFS_DEBUG_DIR/tracing/current_tracer") + FTRACE_FILTER=$(cat "$SYSFS_DEBUG_DIR/tracing/set_ftrace_filter") } function pop_config() { @@ -74,6 +77,17 @@ function pop_config() { if [[ -n "$KPROBE_ENABLED" ]]; then echo "$KPROBE_ENABLED" > "$SYSFS_KPROBES_DIR/enabled" fi + if [[ -n "$TRACING_ON" ]]; then + echo "$TRACING_ON" > "$SYSFS_DEBUG_DIR/tracing/tracing_on" + fi + if [[ -n "$CURRENT_TRACER" ]]; then + echo "$CURRENT_TRACER" > "$SYSFS_DEBUG_DIR/tracing/current_tracer" + fi + if [[ "$FTRACE_FILTER" == *"#"* ]]; then + echo > "$SYSFS_DEBUG_DIR/tracing/set_ftrace_filter" + elif [[ -n "$FTRACE_FILTER" ]]; then + echo "$FTRACE_FILTER" > "$SYSFS_DEBUG_DIR/tracing/set_ftrace_filter" + fi } function set_dynamic_debug() { diff --git a/tools/testing/selftests/livepatch/test-ftrace.sh b/tools/testing/selftests/livepatch/test-ftrace.sh index fe14f248913acbec46fb6c0fec38a2fc84209d39..66af5d726c52e48e5177804e182b4ff31784d5ac 100755 --- a/tools/testing/selftests/livepatch/test-ftrace.sh +++ b/tools/testing/selftests/livepatch/test-ftrace.sh @@ -61,4 +61,37 @@ livepatch: '$MOD_LIVEPATCH': unpatching complete % rmmod $MOD_LIVEPATCH" +# - verify livepatch can load +# - check if traces have a patched function +# - unload livepatch and reset trace + +start_test "trace livepatched function and check that the live patch remains in effect" + +TRACE_FILE="$SYSFS_DEBUG_DIR/tracing/trace" +FUNCTION_NAME="livepatch_cmdline_proc_show" + +load_lp $MOD_LIVEPATCH + +echo 1 > "$SYSFS_DEBUG_DIR/tracing/tracing_on" +echo $FUNCTION_NAME > "$SYSFS_DEBUG_DIR/tracing/set_ftrace_filter" +echo "function" > "$SYSFS_DEBUG_DIR/tracing/current_tracer" +echo "" > "$TRACE_FILE" + +if [[ "$(cat /proc/cmdline)" != "$MOD_LIVEPATCH: this has been live patched" ]] ; then + echo -e "FAIL\n\n" + die "livepatch kselftest(s) failed" +fi + +grep -q $FUNCTION_NAME "$TRACE_FILE" +FOUND=$? + +disable_lp $MOD_LIVEPATCH +unload_lp $MOD_LIVEPATCH + +if [ "$FOUND" -eq 1 ]; then + echo -e "FAIL\n\n" + die "livepatch kselftest(s) failed" +fi + + exit 0 --- base-commit: fc033cf25e612e840e545f8d5ad2edd6ba613ed5 change-id: 20250101-ftrace-selftest-livepatch-161fb77dbed8 Best regards, -- Filipe Xavier <felipeaggger(a)gmail.com>

10 months, 1 week

4
4
0 0

[PATCH v3 1/5] tools/nolibc: add support for openat(2)

by Louis Taylor

openat is useful to avoid needing to construct relative paths, so expose a wrapper for using it directly. Signed-off-by: Louis Taylor <louis(a)kragniz.eu> --- tools/include/nolibc/sys.h | 25 ++++++++++++++++++++ tools/testing/selftests/nolibc/nolibc-test.c | 17 +++++++++++++ 2 files changed, 42 insertions(+) diff --git a/tools/include/nolibc/sys.h b/tools/include/nolibc/sys.h index 8f44c33b1213..3cd938f9abda 100644 --- a/tools/include/nolibc/sys.h +++ b/tools/include/nolibc/sys.h @@ -765,6 +765,31 @@ int mount(const char *src, const char *tgt, return __sysret(sys_mount(src, tgt, fst, flags, data)); } +/* + * int openat(int dirfd, const char *path, int flags[, mode_t mode]); + */ + +static __attribute__((unused)) +int sys_openat(int dirfd, const char *path, int flags, mode_t mode) +{ + return my_syscall4(__NR_openat, dirfd, path, flags, mode); +} + +static __attribute__((unused)) +int openat(int dirfd, const char *path, int flags, ...) +{ + mode_t mode = 0; + + if (flags & O_CREAT) { + va_list args; + + va_start(args, flags); + mode = va_arg(args, mode_t); + va_end(args); + } + + return __sysret(sys_openat(dirfd, path, flags, mode)); +} /* * int open(const char *path, int flags[, mode_t mode]); diff --git a/tools/testing/selftests/nolibc/nolibc-test.c b/tools/testing/selftests/nolibc/nolibc-test.c index 79c3e6a845f3..e8faddcecf9d 100644 --- a/tools/testing/selftests/nolibc/nolibc-test.c +++ b/tools/testing/selftests/nolibc/nolibc-test.c @@ -1028,6 +1028,22 @@ int test_rlimit(void) return 0; } +int test_openat(void) +{ + int dev, null; + + dev = openat(AT_FDCWD, "/dev", O_DIRECTORY); + if (dev < 0) + return -1; + + null = openat(dev, "null", O_RDONLY); + close(dev); + if (null < 0) + return -1; + + close(null); + return 0; +} /* Run syscall tests between IDs <min> and <max>. * Return 0 on success, non-zero on failure. @@ -1116,6 +1132,7 @@ int run_syscall(int min, int max) CASE_TEST(mmap_munmap_good); EXPECT_SYSZR(1, test_mmap_munmap()); break; CASE_TEST(open_tty); EXPECT_SYSNE(1, tmp = open("/dev/null", 0), -1); if (tmp != -1) close(tmp); break; CASE_TEST(open_blah); EXPECT_SYSER(1, tmp = open("/proc/self/blah", 0), -1, ENOENT); if (tmp != -1) close(tmp); break; + CASE_TEST(openat_dir); EXPECT_SYSZR(1, test_openat()); break; CASE_TEST(pipe); EXPECT_SYSZR(1, test_pipe()); break; CASE_TEST(poll_null); EXPECT_SYSZR(1, poll(NULL, 0, 0)); break; CASE_TEST(poll_stdout); EXPECT_SYSNE(1, ({ struct pollfd fds = { 1, POLLOUT, 0}; poll(&fds, 1, 0); }), -1); break; -- 2.45.2

10 months, 1 week

2
2
0 0

[PATCH bpf-next v5 0/6] XDP metadata support for tun driver

by Marcus Wichelmann

Hi all, this v5 of the patch series is very similar to v4, but rebased onto the bpf-next/net branch instead of bpf-next/master. Because the commit c047e0e0e435 ("selftests/bpf: Optionally open a dedicated namespace to run test in it") is not yet included in this branch, I changed the xdp_context_tuntap test to manually create a namespace to run the test in. Not so successful pipeline: https://github.com/kernel-patches/bpf/actions/runs/13682405154 The CI pipeline failed because of veristat changes in seemingly unrelated eBPF programs. I don't think this has to do with the changes from this patch series, but if it does, please let me know what I may have to do different to make the CI pass. --- v5: - rebase onto bpf-next/net - resolve rebase conflicts - change xdp_context_tuntap test to manually create and open a network namespace using netns_new v4: https://lore.kernel.org/bpf/20250227142330.1605996-1-marcus.wichelmann@hetz… - strip unrelated changes from the selftest patches - extend commit message for "selftests/bpf: refactor xdp_context_functional test and bpf program" - the NOARP flag was not effective to prevent other packets from interfering with the tests, add a filter to the XDP program instead - run xdp_context_tuntap in a separate namespace to avoid conflicts with other tests v3: https://lore.kernel.org/bpf/20250224152909.3911544-1-marcus.wichelmann@hetz… - change the condition to handle xdp_buffs without metadata support, as suggested by Willem de Bruijn <willemb(a)google.com> - add clarifying comment why that condition is needed - set NOARP flag in selftests to ensure that the kernel does not send packets on the test interfaces that may interfere with the tests v2: https://lore.kernel.org/bpf/20250217172308.3291739-1-marcus.wichelmann@hetz… - submit against bpf-next subtree - split commits and improved commit messages - remove redundant metasize check and add clarifying comment instead - use max() instead of ternary operator - add selftest for metadata support in the tun driver v1: https://lore.kernel.org/all/20250130171614.1657224-1-marcus.wichelmann@hetz… Marcus Wichelmann (6): net: tun: enable XDP metadata support net: tun: enable transfer of XDP metadata to skb selftests/bpf: move open_tuntap to network helpers selftests/bpf: refactor xdp_context_functional test and bpf program selftests/bpf: add test for XDP metadata support in tun driver selftests/bpf: fix file descriptor assertion in open_tuntap helper drivers/net/tun.c | 28 +++- tools/testing/selftests/bpf/network_helpers.c | 28 ++++ tools/testing/selftests/bpf/network_helpers.h | 3 + .../selftests/bpf/prog_tests/lwt_helpers.h | 29 ---- .../bpf/prog_tests/xdp_context_test_run.c | 145 +++++++++++++++++- .../selftests/bpf/progs/test_xdp_meta.c | 53 +++++-- 6 files changed, 230 insertions(+), 56 deletions(-) -- 2.43.0

10 months, 1 week

2
7
0 0

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

Linux-kselftest-mirror