- Linux-kselftest-mirror - lists.linaro.org

[PATCH nf-next v8 0/3] Add IPIP flowtable SW acceleration

by Lorenzo Bianconi

Introduce SW acceleration for IPIP tunnels in the netfilter flowtable
infrastructure.
This series introduces basic infrastructure to accelerate other tunnel
types (e.g. IP6IP6).

---
Changes in v8:
- Rebase on top of the following series (not yet applied)
  https://patchwork.ozlabs.org/project/netfilter-devel/list/?series=477081
- Link to v7: https://lore.kernel.org/r/20251021-nf-flowtable-ipip-v7-0-a45214896106@kern…

Changes in v7:
- Introduce sw acceleration for tx path of IPIP tunnels
- Rely on exact match during flowtable entry lookup
- Fix typos
- Link to v6: https://lore.kernel.org/r/20250818-nf-flowtable-ipip-v6-0-eda90442739c@kern…

Changes in v6:
- Rebase on top of nf-next main branch
- Link to v5: https://lore.kernel.org/r/20250721-nf-flowtable-ipip-v5-0-0865af9e58c6@kern…

Changes in v5:
- Rely on __ipv4_addr_hash() to compute the hash used as encap ID
- Remove unnecessary pskb_may_pull() in nf_flow_tuple_encap()
- Add nf_flow_ip4_ecanp_pop utility routine
- Link to v4: https://lore.kernel.org/r/20250718-nf-flowtable-ipip-v4-0-f8bb1c18b986@kern…

Changes in v4:
- Use the hash value of the saddr, daddr and protocol of outer IP header as
  encapsulation id.
- Link to v3: https://lore.kernel.org/r/20250703-nf-flowtable-ipip-v3-0-880afd319b9f@kern…

Changes in v3:
- Add outer IP header sanity checks
- target nf-next tree instead of net-next
- Link to v2: https://lore.kernel.org/r/20250627-nf-flowtable-ipip-v2-0-c713003ce75b@kern…

Changes in v2:
- Introduce IPIP flowtable selftest
- Link to v1: https://lore.kernel.org/r/20250623-nf-flowtable-ipip-v1-1-2853596e3941@kern…

---
Lorenzo Bianconi (3):
      net: netfilter: Add IPIP flowtable rx sw acceleration
      net: netfilter: Add IPIP flowtable tx sw acceleration
      selftests: netfilter: nft_flowtable.sh: Add IPIP flowtable selftest

 include/linux/netdevice.h                          |  16 +++
 include/net/netfilter/nf_flow_table.h              |  22 ++++
 net/ipv4/ipip.c                                    |  29 +++++
 net/netfilter/nf_flow_table_core.c                 |   3 +
 net/netfilter/nf_flow_table_ip.c                   | 117 ++++++++++++++++++++-
 net/netfilter/nf_flow_table_path.c                 |  86 +++++++++++++--
 .../selftests/net/netfilter/nft_flowtable.sh       |  40 +++++++
 7 files changed, 298 insertions(+), 15 deletions(-)
---
base-commit: 32e4b1bf1bbfe63e52e2fff7ade0aaeb805defe3
change-id: 20250623-nf-flowtable-ipip-1b3d7b08d067

Best regards,
-- 
Lorenzo Bianconi <lorenzo(a)kernel.org>

2 months, 1 week


[PATCH net] selftests/vsock: avoid false-positives when checking dmesg

by Bobby Eshleman

From: Bobby Eshleman <bobbyeshleman(a)meta.com>

Sometimes VMs will have some intermittent dmesg warnings that are
unrelated to vsock. Change the dmesg parsing to filter on strings
containing 'vsock' to avoid false positive failures that are unrelated
to vsock. The downside is that it is possible for some vsock related
warnings to not contain the substring 'vsock', so those will be missed.

Fixes: a4a65c6fe08b ("selftests/vsock: add initial vmtest.sh for vsock")
Reviewed-by: Simon Horman <horms(a)kernel.org>
Signed-off-by: Bobby Eshleman <bobbyeshleman(a)meta.com>
---
Previously was part of the series:
https://lore.kernel.org/all/20251022-vsock-selftests-fixes-and-improvements…
---
 tools/testing/selftests/vsock/vmtest.sh | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/vsock/vmtest.sh b/tools/testing/selftests/vsock/vmtest.sh
index edacebfc1632..e1732f236d14 100755
--- a/tools/testing/selftests/vsock/vmtest.sh
+++ b/tools/testing/selftests/vsock/vmtest.sh
@@ -389,9 +389,9 @@ run_test() {
 	local rc
 
 	host_oops_cnt_before=$(dmesg | grep -c -i 'Oops')
-	host_warn_cnt_before=$(dmesg --level=warn | wc -l)
+	host_warn_cnt_before=$(dmesg --level=warn | grep -c -i 'vsock')
 	vm_oops_cnt_before=$(vm_ssh -- dmesg | grep -c -i 'Oops')
-	vm_warn_cnt_before=$(vm_ssh -- dmesg --level=warn | wc -l)
+	vm_warn_cnt_before=$(vm_ssh -- dmesg --level=warn | grep -c -i 'vsock')
 
 	name=$(echo "${1}" | awk '{ print $1 }')
 	eval test_"${name}"
@@ -403,7 +403,7 @@ run_test() {
 		rc=$KSFT_FAIL
 	fi
 
-	host_warn_cnt_after=$(dmesg --level=warn | wc -l)
+	host_warn_cnt_after=$(dmesg --level=warn | grep -c -i vsock)
 	if [[ ${host_warn_cnt_after} -gt ${host_warn_cnt_before} ]]; then
 		echo "FAIL: kernel warning detected on host" | log_host "${name}"
 		rc=$KSFT_FAIL
@@ -415,7 +415,7 @@ run_test() {
 		rc=$KSFT_FAIL
 	fi
 
-	vm_warn_cnt_after=$(vm_ssh -- dmesg --level=warn | wc -l)
+	vm_warn_cnt_after=$(vm_ssh -- dmesg --level=warn | grep -c -i vsock)
 	if [[ ${vm_warn_cnt_after} -gt ${vm_warn_cnt_before} ]]; then
 		echo "FAIL: kernel warning detected on vm" | log_host "${name}"
 		rc=$KSFT_FAIL

---
base-commit: 255d75ef029f33f75fcf5015052b7302486f7ad2
change-id: 20251104-vsock-vmtest-dmesg-fix-b2c59e1d9c38

Best regards,
-- 
Bobby Eshleman <bobbyeshleman(a)meta.com>
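
The before/after comparison in this patch can be tried outside a VM with a small sketch. This is only an illustration, not code from vmtest.sh: count_matching is a hypothetical helper, and canned log lines stand in for the output of "dmesg --level=warn". It shows that an unrelated warning arriving during the test no longer raises the filtered count, and that a count of zero needs care because grep -c exits non-zero when nothing matches.

```bash
#!/bin/bash
# Illustrative sketch only: count_matching is a hypothetical helper, not part
# of vmtest.sh. It counts the lines on stdin that contain a keyword.
count_matching() {
	local keyword=$1

	# grep -c prints the count but exits non-zero when the count is 0;
	# "|| true" keeps that case from being treated as an error.
	grep -c -i -- "$keyword" || true
}

# Canned log lines stand in for "dmesg --level=warn" output.
log_before=$(printf '%s\n' 'foo: unrelated warning' 'vsock: sample warning')
log_after=$(printf '%s\n' "$log_before" 'bar: another unrelated warning')

warn_cnt_before=$(printf '%s\n' "$log_before" | count_matching vsock)
warn_cnt_after=$(printf '%s\n' "$log_after" | count_matching vsock)

# Same comparison as in run_test(): only a growing count is a failure.
if [ "$warn_cnt_after" -gt "$warn_cnt_before" ]; then
	result="FAIL: kernel warning detected"
else
	result="ok: no new vsock warning"
fi
echo "$result"
```

With "wc -l" in place of the filter, the extra unrelated line would have raised the count from 2 to 3 and failed the test. The trade-off named in the commit message still applies: a vsock warning whose text lacks the substring is not counted.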

2 months, 1 week


[PATCH net v8 0/4] net: netpoll: fix memory leak and add comprehensive selftests

by Breno Leitao

Fix a memory leak in netpoll and introduce netconsole selftests that
expose the issue when running with kmemleak detection enabled.

This patchset includes a selftest for netpoll with multiple concurrent
users (netconsole + bonding), which simulates the scenario from test[1]
that originally demonstrated the issue allegedly fixed by commit
efa95b01da18 ("netpoll: fix use after free") - a commit that is now
being reverted.

Sending this to "net" branch because this is a fix, and the selftest
might help with the backports validation.

Link: https://lore.kernel.org/lkml/96b940137a50e5c387687bb4f57de8b0435a653f.14048… [1]

Signed-off-by: Breno Leitao <leitao(a)debian.org>
---
Changes in v8:
- Sending it again, now that commit 1a8fed52f7be1 ("netdevsim: set the
  carrier when the device goes up") has landed in net
- Created one namespace for TX and one for RX (Paolo)
- Used additional helpers to create and delete netdevsim (Paolo)
- Link to v7: https://lore.kernel.org/r/20251003-netconsole_torture-v7-0-aa92fcce62a9@deb…

Changes in v7:
- Rebased on top of `net`
- Link to v6: https://lore.kernel.org/r/20251002-netconsole_torture-v6-0-543bf52f6b46@deb…

Changes in v6:
- Expand the tests even more and some small fixups
- Moved the test to bonding selftests
- Link to v5: https://lore.kernel.org/r/20250918-netconsole_torture-v5-0-77e25e0a4eb6@deb…

Changes in v5:
- Set CONFIG_BONDING=m in selftests/drivers/net/config.
- Link to v4: https://lore.kernel.org/r/20250917-netconsole_torture-v4-0-0a5b3b8f81ce@deb…

Changes in v4:
- Added an additional selftest to test multiple netpoll users in
  parallel
- Link to v3: https://lore.kernel.org/r/20250905-netconsole_torture-v3-0-875c7febd316@deb…

Changes in v3:
- This patchset is a merge of the fix and the selftest together as
  recommended by Jakub.

Changes in v2:
- Reuse the netconsole creation from lib_netcons.sh. Thus, refactoring
  the create_dynamic_target() (Jakub)
- Move the "wait" to after all the messages have been sent.
- Link to v1: https://lore.kernel.org/r/20250902-netconsole_torture-v1-1-03c6066598e9@deb…

---
Breno Leitao (4):
      net: netpoll: fix incorrect refcount handling causing incorrect cleanup
      selftest: netcons: refactor target creation
      selftest: netcons: create a torture test
      selftest: netcons: add test for netconsole over bonded interfaces

 net/core/netpoll.c                                 |   7 +-
 tools/testing/selftests/drivers/net/Makefile       |   1 +
 .../testing/selftests/drivers/net/bonding/Makefile |   2 +
 tools/testing/selftests/drivers/net/bonding/config |   4 +
 .../drivers/net/bonding/netcons_over_bonding.sh    | 361 +++++++++++++++++++++
 .../selftests/drivers/net/lib/sh/lib_netcons.sh    |  82 ++++-
 .../selftests/drivers/net/netcons_torture.sh       | 130 ++++++++
 7 files changed, 569 insertions(+), 18 deletions(-)
---
base-commit: e120f46768d98151ece8756ebd688b0e43dc8b29
change-id: 20250902-netconsole_torture-8fc23f0aca99

Best regards,
-- 
Breno Leitao <leitao(a)debian.org>

2 months, 1 week


[PATCH net-next 0/4] mptcp: pm: in-kernel: fullmesh endp nb + bind cases

by Matthieu Baerts (NGI0)

Here is a small optimisation for the in-kernel PM, joined by a small
behavioural change to avoid confusion, and followed by a few more tests.

- Patch 1: record the number of fullmesh endpoints, to avoid iterating
  over all endpoints to check whether one is marked as fullmesh.

- Patch 2: when at least one endpoint is marked as fullmesh, only use
  these endpoints when reacting to an ADD_ADDR, even if there are no
  endpoints for this IP family: this is less confusing.

- Patch 3: reduce duplicated code to prepare the next patch.

- Patch 4: extra "bind" cases: the listen socket restricts the bind to
  one IP address, not allowing MP_JOIN to extra IP addresses, except if
  another listening socket accepts them.

Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
Matthieu Baerts (NGI0) (4):
      mptcp: pm: in-kernel: record fullmesh endp nb
      mptcp: pm: in kernel: only use fullmesh endp if any
      selftests: mptcp: join: do_transfer: reduce code dup
      selftests: mptcp: join: validate extra bind cases

 include/uapi/linux/mptcp.h                        |   3 +-
 net/mptcp/pm_kernel.c                             |  36 ++++-
 net/mptcp/protocol.h                              |   1 +
 net/mptcp/sockopt.c                               |   2 +
 tools/testing/selftests/net/mptcp/mptcp_connect.c |  10 +-
 tools/testing/selftests/net/mptcp/mptcp_join.sh   | 187 +++++++++++++++++++---
 6 files changed, 213 insertions(+), 26 deletions(-)
---
base-commit: 01cc760632b875c4ad0d8fec0b0c01896b8a36d4
change-id: 20251101-net-next-mptcp-fm-endp-nb-bind-cf7ab688d9f1

Best regards,
-- 
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>

2 months, 1 week


[PATCH net] selftests: netdevsim: Fix ethtool-features.sh fail

by Wang Liang

The test 'ethtool-features.sh' failed with the below output:

  TAP version 13
  1..1
  # timeout set to 600
  # selftests: drivers/net/netdevsim: ethtool-features.sh
  # Warning: file ethtool-features.sh is not executable
  # ethtool: bad command line argument(s)
  # For more information run ethtool -h
  # ethtool: bad command line argument(s)
  # For more information run ethtool -h
  # ethtool: bad command line argument(s)
  # For more information run ethtool -h
  # ethtool: bad command line argument(s)
  # For more information run ethtool -h
  # ethtool: bad command line argument(s)
  # For more information run ethtool -h
  # ethtool: bad command line argument(s)
  # For more information run ethtool -h
  # ethtool: bad command line argument(s)
  # For more information run ethtool -h
  # ethtool: bad command line argument(s)
  # For more information run ethtool -h
  # ethtool: bad command line argument(s)
  # For more information run ethtool -h
  # ethtool: bad command line argument(s)
  # For more information run ethtool -h
  # FAILED 10/10 checks
  not ok 1 selftests: drivers/net/netdevsim: ethtool-features.sh # exit=1

Similar to commit 18378b0e49d9 ("selftests/damon: Add executable
permission to test scripts"), the script 'ethtool-features.sh' has no
executable permission, which leads to the warning 'file
ethtool-features.sh is not executable'.

Older versions of ethtool (my ethtool version is 5.16) do not support
the command 'ethtool --json -k enp1s0', which leads to the output
'ethtool: bad command line argument(s)'.

This patch adds executable permission to the script
'ethtool-features.sh', and checks for 'ethtool --json -k' support.
After this patch:

  TAP version 13
  1..1
  # timeout set to 600
  # selftests: drivers/net/netdevsim: ethtool-features.sh
  # SKIP: No --json -k support in ethtool
  ok 1 selftests: drivers/net/netdevsim: ethtool-features.sh

Fixes: 0189270117c3 ("selftests: netdevsim: add a test checking ethtool features")
Signed-off-by: Wang Liang <wangliang74(a)huawei.com>
---
 .../selftests/drivers/net/netdevsim/ethtool-features.sh | 5 +++++
 1 file changed, 5 insertions(+)
 mode change 100644 => 100755 tools/testing/selftests/drivers/net/netdevsim/ethtool-features.sh

diff --git a/tools/testing/selftests/drivers/net/netdevsim/ethtool-features.sh b/tools/testing/selftests/drivers/net/netdevsim/ethtool-features.sh
old mode 100644
new mode 100755
index bc210dc6ad2d..f771dc6839ea
--- a/tools/testing/selftests/drivers/net/netdevsim/ethtool-features.sh
+++ b/tools/testing/selftests/drivers/net/netdevsim/ethtool-features.sh
@@ -7,6 +7,11 @@ NSIM_NETDEV=$(make_netdev)
 
 set -o pipefail
 
+if ! ethtool --json -k $NSIM_NETDEV > /dev/null 2>&1; then
+	echo "SKIP: No --json -k support in ethtool"
+	exit $ksft_skip
+fi
+
 FEATS="
   tx-checksum-ip-generic
   tx-scatter-gather
-- 
2.34.1
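
The probe-then-skip pattern added by this patch can be sketched in isolation. This is only an illustration under stated assumptions: probe_or_skip is a hypothetical helper that does not exist in the kselftest library, and "true" and "false" stand in for a supported and an unsupported ethtool invocation so that the sketch runs without ethtool or a netdevsim device. The value 4 is the exit code kselftest uses for skipped tests.

```bash
#!/bin/bash
# Illustrative sketch only: probe_or_skip is a hypothetical helper, not part
# of the kselftest library.
ksft_skip=4	# exit code kselftest treats as "skipped"

# Run a probe command quietly; print a SKIP line and return $ksft_skip if
# the probe fails, return 0 otherwise.
probe_or_skip() {
	local reason=$1
	shift

	if ! "$@" > /dev/null 2>&1; then
		echo "SKIP: $reason"
		return "$ksft_skip"
	fi
	return 0
}

# "true" and "false" stand in for a supported and an unsupported probe.
supported_rc=0
probe_or_skip "probe failed unexpectedly" true || supported_rc=$?

unsupported_rc=0
probe_or_skip "No --json -k support in ethtool" false || unsupported_rc=$?

echo "supported rc=$supported_rc, unsupported rc=$unsupported_rc"
```

In a real test the probe would be the actual command, for example: probe_or_skip "No --json -k support in ethtool" ethtool --json -k "$NSIM_NETDEV" || exit $?. Probing the exact invocation the test relies on, rather than parsing a version string, keeps the check valid across distributions that backport features.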

2 months, 1 week


[PATCH v4 nf-next] selftests: netfilter: Add bridge_fastpath.sh

by Eric Woudstra

Add a script to test various scenarios where a bridge is involved
in the fastpath. It runs tests in the forward path, and also in
a bridged path.

The setup is similar to a basic home router with multiple lan ports.

It uses 3 pairs of veth-devices. Each or all pairs can be
replaced by a pair of real interfaces, interconnected by wire.
This is necessary to test the behavior when dealing with
dsa ports, foreign (dsa) ports and switchdev userports that support
SWITCHDEV_OBJ_ID_PORT_VLAN.

See the head of the script for a detailed description.

Run without arguments to perform all tests on veth-devices.

Signed-off-by: Eric Woudstra <ericwouds(a)gmail.com>

---

This test script is written first for the proposed bridge-fastpath
patch-sets, but its use is more general and can easily be expanded.

Changes in v4:
- Also only match ct state in rule without fastpath.
- Dropped RFC
- Cosmetics

Changes in v3:
- Removed all warnings reported by shellcheck -x -e SC2317
- Improved del_pppoe(), check if interfaces are removed
- Added is_known_issue() to warn instead of error for known issues
- Link down and (hardware) interfaces to default netns at end of script
- Removed matching ip(v6) address

Changes in v2:
- Moved test-series to functions
- Moved code to set_pair_link() up/down
- Added conntrack zone to bridged traffic
- Test bridge chain prerouting in test without fastpath and
  bridge chain forward in tests with fastpath

Some example outputs of this last version of patches from different
hardware, without and with patches:

ALL VETH:
=========

./bridge_fastpath.sh -t

Setup:
            CLIENT 0
            veth0cl
               |
            veth0rt
              WAN
             ROUTER
        LAN1        LAN2
      veth1rt      veth2rt
         |            |
      veth1cl      veth2cl
      CLIENT 1     CLIENT 2

Without patches:

PASS:  unaware bridge, without encaps, without fastpath
PASS:  unaware bridge, with single vlan encap, without fastpath
WARN:  unaware bridge, with double q vlan encaps, without fastpath: ipv4/6: established bytes 0 < 4194304
WARN:  unaware bridge, with 802.1ad vlan encaps, without fastpath: ipv4/6: established bytes 0 < 4194304
WARN:  unaware bridge, with pppoe encap, without fastpath: ipv4/6: established bytes 0 < 4194304
WARN:  unaware bridge, with pppoe-in-q encaps, without fastpath: ipv4/6: established bytes 0 < 4194304
PASS:  aware bridge, without/without vlan encap, without fastpath
PASS:  aware bridge, with/without vlan encap, without fastpath
PASS:  aware bridge, with/with vlan encap, without fastpath
PASS:  aware bridge, without/with vlan encap, without fastpath
PASS:  forward, without vlan-device, without vlan encap, client1, without fastpath
PASS:  forward, without vlan-device, without vlan encap, client1, with fastpath
PASS:  forward, without vlan-device, with vlan encap, client1, without fastpath
WARN:  forward, without vlan-device, with vlan encap, client1, with fastpath: ipv4/6: tcp broken
PASS:  forward, with vlan-device, without vlan encap, client1, without fastpath
PASS:  forward, with vlan-device, without vlan encap, client1, with fastpath
PASS:  forward, with vlan-device, with vlan encap, client1, without fastpath
PASS:  forward, with vlan-device, with vlan encap, client1, with fastpath
PASS:  all tests passed

With patches:

PASS:  unaware bridge, without encaps, without fastpath
PASS:  unaware bridge, without encaps, with fastpath
PASS:  unaware bridge, with single vlan encap, without fastpath
PASS:  unaware bridge, with single vlan encap, with fastpath
PASS:  unaware bridge, with double q vlan encaps, without fastpath
PASS:  unaware bridge, with double q vlan encaps, with fastpath
PASS:  unaware bridge, with 802.1ad vlan encaps, without fastpath
PASS:  unaware bridge, with 802.1ad vlan encaps, with fastpath
PASS:  unaware bridge, with pppoe encap, without fastpath
PASS:  unaware bridge, with pppoe encap, with fastpath
PASS:  unaware bridge, with pppoe-in-q encaps, without fastpath
PASS:  unaware bridge, with pppoe-in-q encaps, with fastpath
PASS:  aware bridge, without/without vlan encap, without fastpath
PASS:  aware bridge, without/without vlan encap, with fastpath
PASS:  aware bridge, with/without vlan encap, without fastpath
PASS:  aware bridge, with/without vlan encap, with fastpath
PASS:  aware bridge, with/with vlan encap, without fastpath
PASS:  aware bridge, with/with vlan encap, with fastpath
PASS:  aware bridge, without/with vlan encap, without fastpath
PASS:  aware bridge, without/with vlan encap, with fastpath
PASS:  forward, without vlan-device, without vlan encap, client1, without fastpath
PASS:  forward, without vlan-device, without vlan encap, client1, with fastpath
PASS:  forward, without vlan-device, with vlan encap, client1, without fastpath
PASS:  forward, without vlan-device, with vlan encap, client1, with fastpath
PASS:  forward, with vlan-device, without vlan encap, client1, without fastpath
PASS:  forward, with vlan-device, without vlan encap, client1, with fastpath
PASS:  forward, with vlan-device, with vlan encap, client1, without fastpath
PASS:  forward, with vlan-device, with vlan encap, client1, with fastpath
PASS:  all tests passed

BANANAPI-R3 (lan1 & lan2 are dsa):
============

Without patches:

./bridge_fastpath.sh -t -0 enu1u2,lan2 -1 enu1u1,lan1 -2 lan4,eth1

Setup:
            CLIENT 0
             enu1u2
               |
              lan2
              WAN
             ROUTER
        LAN1        LAN2
        lan1        eth1
         |            |
       enu1u1       lan4
      CLIENT 1     CLIENT 2

PASS:  unaware bridge, without encaps, without fastpath
PASS:  unaware bridge, with single vlan encap, without fastpath
WARN:  unaware bridge, with pppoe encap, without fastpath: ipv4/6: established bytes 0 < 4194304
WARN:  unaware bridge, with pppoe-in-q encaps, without fastpath: ipv4/6: established bytes 0 < 4194304
PASS:  aware bridge, without/without vlan encap, without fastpath
PASS:  aware bridge, with/without vlan encap, without fastpath
PASS:  aware bridge, with/with vlan encap, without fastpath
PASS:  aware bridge, without/with vlan encap, without fastpath
PASS:  forward, without vlan-device, without vlan encap, client1, without fastpath
WARN:  forward, without vlan-device, without vlan encap, client1, with fastpath: ipv4: counted bytes 2110480 > 2097152
WARN:  forward, without vlan-device, without vlan encap, client1, with fastpath: ipv6: counted bytes 2116104 > 2097152
PASS:  forward, without vlan-device, without vlan encap, client1, with hw_fastpath
PASS:  forward, without vlan-device, without vlan encap, client2, without fastpath
PASS:  forward, without vlan-device, without vlan encap, client2, with fastpath
PASS:  forward, without vlan-device, without vlan encap, client2, with hw_fastpath
PASS:  forward, without vlan-device, with vlan encap, client1, without fastpath
WARN:  forward, without vlan-device, with vlan encap, client1, with fastpath: ipv4/6: tcp broken
WARN:  forward, without vlan-device, with vlan encap, client1, with hw_fastpath: ipv4/6: tcp broken
PASS:  forward, without vlan-device, with vlan encap, client2, without fastpath
WARN:  forward, without vlan-device, with vlan encap, client2, with fastpath: ipv4/6: tcp broken
WARN:  forward, without vlan-device, with vlan encap, client2, with hw_fastpath: ipv4/6: tcp broken
PASS:  forward, with vlan-device, without vlan encap, client1, without fastpath
PASS:  forward, with vlan-device, without vlan encap, client1, with fastpath
PASS:  forward, with vlan-device, without vlan encap, client1, with hw_fastpath
PASS:  forward, with vlan-device, without vlan encap, client2, without fastpath
WARN:  forward, with vlan-device, without vlan encap, client2, with fastpath: ipv4: counted bytes 2122388 > 2097152
WARN:  forward, with vlan-device, without vlan encap, client2, with fastpath: ipv6: counted bytes 2129280 > 2097152
WARN:  forward, with vlan-device, without vlan encap, client2, with hw_fastpath: ipv4: counted bytes 2110428 > 2097152
WARN:  forward, with vlan-device, without vlan encap, client2, with hw_fastpath: ipv6: counted bytes 2140144 > 2097152
PASS:  forward, with vlan-device, with vlan encap, client1, without fastpath
PASS:  forward, with vlan-device, with vlan encap, client1, with fastpath
PASS:  forward, with vlan-device, with vlan encap, client1, with hw_fastpath
PASS:  forward, with vlan-device, with vlan encap, client2, without fastpath
PASS:  forward, with vlan-device, with vlan encap, client2, with fastpath
PASS:  forward, with vlan-device, with vlan encap, client2, with hw_fastpath
PASS:  all tests passed

With patches:

PASS:  unaware bridge, without encaps, without fastpath
PASS:  unaware bridge, without encaps, with fastpath
PASS:  unaware bridge, without encaps, with hw_fastpath
PASS:  unaware bridge, with single vlan encap, without fastpath
PASS:  unaware bridge, with single vlan encap, with fastpath
PASS:  unaware bridge, with single vlan encap, with hw_fastpath
PASS:  unaware bridge, with pppoe encap, without fastpath
PASS:  unaware bridge, with pppoe encap, with fastpath
PASS:  unaware bridge, with pppoe encap, with hw_fastpath
PASS:  unaware bridge, with pppoe-in-q encaps, without fastpath
PASS:  unaware bridge, with pppoe-in-q encaps, with fastpath
PASS:  unaware bridge, with pppoe-in-q encaps, with hw_fastpath
PASS:  aware bridge, without/without vlan encap, without fastpath
PASS:  aware bridge, without/without vlan encap, with fastpath
PASS:  aware bridge, without/without vlan encap, with hw_fastpath
PASS:  aware bridge, with/without vlan encap, without fastpath
PASS:  aware bridge, with/without vlan encap, with fastpath
PASS:  aware bridge, with/without vlan encap, with hw_fastpath
PASS:  aware bridge, with/with vlan encap, without fastpath
PASS:  aware bridge, with/with vlan encap, with fastpath
PASS:  aware bridge, with/with vlan encap, with hw_fastpath
PASS:  aware bridge, without/with vlan encap, without fastpath
PASS:  aware bridge, without/with vlan encap, with fastpath
PASS:  aware bridge, without/with vlan encap, with hw_fastpath
PASS:  forward, without vlan-device, without vlan encap, client1, without fastpath
PASS:  forward, without vlan-device, without vlan encap, client1, with fastpath
PASS:  forward, without vlan-device, without vlan encap, client1, with hw_fastpath
PASS:  forward, without vlan-device, without vlan encap, client2, without fastpath
PASS:  forward, without vlan-device, without vlan encap, client2, with fastpath
PASS:  forward, without vlan-device, without vlan encap, client2, with hw_fastpath
PASS:  forward, without vlan-device, with vlan encap, client1, without fastpath
PASS:  forward, without vlan-device, with vlan encap, client1, with fastpath
PASS:  forward, without vlan-device, with vlan encap, client1, with hw_fastpath
PASS:  forward, without vlan-device, with vlan encap, client2, without fastpath
PASS:  forward, without vlan-device, with vlan encap, client2, with fastpath
PASS:  forward, without vlan-device, with vlan encap, client2, with hw_fastpath
PASS:  forward, with vlan-device, without vlan encap, client1, without fastpath
PASS:  forward, with vlan-device, without vlan encap, client1, with fastpath
PASS:  forward, with vlan-device, without vlan encap, client1, with hw_fastpath
PASS:  forward, with vlan-device, without vlan encap, client2, without fastpath
PASS:  forward, with vlan-device, without vlan encap, client2, with fastpath
PASS:  forward, with vlan-device, without vlan encap, client2, with hw_fastpath
PASS:  forward, with vlan-device, with vlan encap, client1, without fastpath
PASS:  forward, with vlan-device, with vlan encap, client1, with fastpath
PASS:  forward, with vlan-device, with vlan encap, client1, with hw_fastpath
PASS:  forward, with vlan-device, with vlan encap, client2, without fastpath
PASS:  forward, with vlan-device, with vlan encap, client2, with fastpath
PASS:  forward, with vlan-device, with vlan encap, client2, with hw_fastpath
PASS:  all tests passed

 .../testing/selftests/net/netfilter/Makefile  |    1 +
 .../net/netfilter/bridge_fastpath.sh          | 1050 +++++++++++++++++
 2 files changed, 1051 insertions(+)
 create mode 100755 tools/testing/selftests/net/netfilter/bridge_fastpath.sh

diff --git a/tools/testing/selftests/net/netfilter/Makefile b/tools/testing/selftests/net/netfilter/Makefile
index ee2d1a5254f8..a7edc6654040 100644
--- a/tools/testing/selftests/net/netfilter/Makefile
+++ b/tools/testing/selftests/net/netfilter/Makefile
@@ -10,6 +10,7 @@ TEST_PROGS := \
 	br_netfilter.sh \
 	br_netfilter_queue.sh \
 	bridge_brouter.sh \
+	bridge_fastpath.sh \
 	conntrack_clash.sh \
 	conntrack_dump_flush.sh \
 	conntrack_icmp_related.sh \
diff --git a/tools/testing/selftests/net/netfilter/bridge_fastpath.sh b/tools/testing/selftests/net/netfilter/bridge_fastpath.sh
new file mode 100755
index 000000000000..d09b704d7bc6
--- /dev/null
+++ b/tools/testing/selftests/net/netfilter/bridge_fastpath.sh
@@ -0,0 +1,1050 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Check if conntrack, nft chain and fastpath is functional in setups
+# where a bridge is in the fastpath.
+#
+# Commandline options make it possible to use real ethernet pairs
+# instead of veth-device pairs. Any, or all, pairs can be tested using
+# real hardware pairs. This can be useful to test dsa-ports,
+# switchdev (dsa) foreign ports and switchdev ports supporting
+# SWITCHDEV_OBJ_ID_PORT_VLAN.
+#
+# First tcp is tested. Conntrack and nft chain are tested using a counter.
+# When there is a fastpath possible between the interfaces then the
+# fastpath is also tested.
+# When there is a hardware offloaded fastpath possible between the
+# interfaces then the hardware offloaded path is also tested.
+#
+# Setup is as a typical router:
+#
+#            nsclientwan
+#                 |
+#               nsrt
+#             |      |
+#      nsclient1    nsclient2
+#
+# Masquerading for ipv4 only.
+#
+# First check if a bridge table forward chain can be set up, skip
+# these tests if this is not possible.
+# Then check if an inet table forward chain can be set up, skip
+# these tests if this is not possible.
+#
+# Different setups of paths are tested that involve a bridge in the
+# fastpath. This can be in the forward-fastpath or in the bridge-fastpath.
+#
+# The first series, in the bridge-fastpath, using a vlan-unaware bridge.
+# Traffic with the following vlan-tags is checked:
+#   a. without vlan
+#   b. single vlan
+#   c. double q vlan (only on veth-devices)
+#   d. 802.1ad vlan (only on veth-devices)
+#   e. pppoe (when available)
+#   f. pppoe-in-q (when available)
+#
+# (for items c to f fastpath can only work when a conntrack zone is set)
+# (double tag testing results in broken tcp traffic on most hardware,
+#  in this test setup, use '-a' argument to test it anyway)
+# (pppoe testing takes place if pppd and pppoe-server are installed)
+#
+# The second series, in the bridge-fastpath, using a vlan-aware bridge.
+# Here we test all combinations of ingress/egress with or without single
+# vlan encaps.
+#
+# The third series, in the forward-fastpath, using a vlan-aware bridge,
+# without a vlan-device linked to the master port. We test the same combinations
+# of ingress/egress with or without single vlan encaps.
+#
+# The fourth series, in the forward-fastpath, using a vlan-aware bridge,
+# with a vlan-device linked to the master port. We test the same combinations
+# of ingress/egress with or without single vlan encaps.
+#
+# Note 1: Using dsa userports on both sides of eth-pairs client1 or client2
+# gives erratic and unpredictable results. Use, for example, a usb-eth device
+# on the client side to test a dsa-userport.
+#
+# Note 2: Testing the hardware offloaded fastpath, it is not checked if the
+# packets do not follow the software fastpath instead. A universal way to
+# check this should be added at some point.
+#
+# Note 3: Some interfaces to test on the router side, are netns immutable.
+# Use the -d or --defaultnsrouter option so that the interfaces of the router
+# do not have to change netns. The router is built up in the default netns.
+#
+
+source lib.sh
+
+checktool "nft --version" "run test without nft"
+checktool "socat -h" "run test without socat"
+checktool "bridge -V" "run test without bridge"
+
+NR_OF_TESTS=4
+VID1=100
+VID2=101
+BRWAN=brwan
+BRLAN=brlan
+BRCL=brcl
+LINKUP_TIMEOUT=10
+PING_TIMEOUT=10
+SOCAT_TIMEOUT=10
+filesize=$((2 * 1024 * 1024))
+
+filein=$(mktemp)
+file1out=$(mktemp)
+file2out=$(mktemp)
+pppoeserveroptions=$(mktemp)
+pppoeserverpid=$(mktemp)
+
+setup_ns nsclientwan nsclientlan1 nsclientlan2
+
+ WAN=0 ; LAN1=1 ; LAN2=2 ; ADWAN=3 ; ADLAN=4
+nsa=( "$nsclientwan" "$nsclientlan1" "$nsclientlan2" ) # $nsrt $nsrt
+AD4=( '192.168.1.1' '192.168.2.101' '192.168.2.102' '192.168.1.2' '192.168.2.1' )
+AD6=( 'dead:1::1' 'dead:2::101' 'dead:2::102' 'dead:1::2' 'dead:2::1' )
+
+tests_string=$(seq 1 $NR_OF_TESTS)
+
+while [ "${1:-}" != '' ]; do
+	case "$1" in
+	'-0' | '--pairwan')
+		shift
+		vethcl[WAN]="${1%,*}"
+		vethrt[WAN]="${1#*,}"
+		;;
+	'-1' | '--pairlan1')
+		shift
+		vethcl[LAN1]="${1%,*}"
+		vethrt[LAN1]="${1#*,}"
+		;;
+	'-2' | '--pairlan2')
+		shift
+		vethcl[LAN2]="${1%,*}"
+		vethrt[LAN2]="${1#*,}"
+		;;
+	'-s' | '--filesize')
+		shift
+		filesize=$1
+		;;
+	'-p' | '--parts')
+		shift
+		tests_string=$1
+		;;
+	'-4' | '--ipv4')
+		do_ipv4=1
+		;;
+	'-6' | '--ipv6')
+		do_ipv6=1
+		;;
+	'-n' | '--noskip')
+		noskip=1
+		;;
+	'-d' | '--defaultnsrouter')
+		defaultnsrouter=1
+		;;
+	'-f' | '--fixmac')
+		fixmac=1
+		;;
+	'-t' | '--showtree')
+		showtree=1
+		;;
+	*)
+		cat <<-EOF
+		Usage: $(basename "$0") [OPTION]...
+		  -0 --pairwan  eth0cl,eth0rt  pair of real interfaces to use on wan side
+		  -1 --pairlan1 eth1cl,eth1rt  pair of real interfaces to use on lan1 side
+		  -2 --pairlan2 eth2cl,eth2rt  pair of real interfaces to use on lan2 side
+		  -s --filesize                filesize to use for testing in bytes
+		  -p --parts                   partnumbers of tests to run, comma separated
+		  -4|-6 --ipv4|--ipv6          test ipv4/6 only
+		  -d --defaultnsrouter         router in default network namespace, caution!
+		  -f --fixmac                  change mac address when conflict found
+		  -n --noskip                  also perform the normally skipped tests
+		  -t --showtree                show the tree of used interfaces
+		EOF
+		exit "$ksft_skip"
+		;;
+	esac
+	shift
+done
+
+for i in ${tests_string//','/' '}; do
+	tests[i]="yes"
+done
+
+if [ -n "$defaultnsrouter" ]; then
+	nsrt="nsrt-$(mktemp -u XXXXXX)"
+	touch "/var/run/netns/$nsrt"
+	mount --bind /proc/1/ns/net "/var/run/netns/$nsrt"
+else
+	setup_ns nsrt
+fi
+nsa+=("$nsrt" "$nsrt")
+
+cleanup() {
+	if [ -n "$defaultnsrouter" ]; then
+		umount "/var/run/netns/$nsrt"
+		rm -f "/var/run/netns/$nsrt"
+	fi
+	cleanup_all_ns
+	rm -f "$filein" "$file1out" "$file2out" "$pppoeserveroptions" "$pppoeserverpid"
+}
+
+trap cleanup EXIT
+
+head -c "$filesize" < /dev/urandom > "$filein"
+
+check_mac()
+{
+	local ns=$1
+	local dev=$2
+	local othermacs=$3
+	local mac
+
+	mac=$(ip -net "$ns" -br link show dev "$dev" | \
+		grep -o -E '([[:xdigit:]]{1,2}:){5}[[:xdigit:]]{1,2}')
+
+	if [[ ! "$othermacs" =~ $mac ]]; then
+		echo "$mac"
+		return 0
+	fi
+	echo "WARN: Conflicting mac address $dev $mac" 1>&2
+
+	[ -z "$fixmac" ] && return 1
+
+	for (( j = 0 ; j < 10 ; j++ )); do
+		mac="${mac::6}$(printf %02x:%02x:%02x:%02x $((RANDOM%256)) \
+			$((RANDOM%256)) $((RANDOM%256)) $((RANDOM%256)))"
+		[[ "$othermacs" =~ $mac ]] && continue
+		echo "$mac"
+		ip -net "$ns" link set dev "$dev" address "$mac" 1>&2
+		return $?
+	done
+	return 1
+}
+
+is_link()
+{
+	local updown=$1
+	local ns=$2
+	local dev=$3
+
+	if ip -net "$ns" link show dev "$dev" "${updown,,}" 2>/dev/null | \
+			grep -q "state ${updown^^}"
+	then
+		return 0
+	fi
+	return 1
+}
+
+set_pair_link()
+{
+	local updown=$1
+	local all="${*:2}"
+	local lret=0
+	local i j
+
+	for i in $all; do
+		ns="${nsa[$i]}"
+		ip -net "$ns" link set "${vethcl[$i]}" "$updown"
+		lret=$((lret | $?))
+		ip -net "$nsrt" link set "${vethrt[$i]}" "$updown"
+		lret=$((lret | $?))
+	done
+	[ $lret -ne 0 ] && return 1
+
+	for j in $(seq 1 $((LINKUP_TIMEOUT * 5 ))); do
+		lret=0
+		for i in $all; do
+			ns="${nsa[$i]}"
+			is_link "$updown" "$ns" "${vethcl[$i]}"
+			lret=$((lret | $?))
+			is_link "$updown" "$nsrt" "${vethrt[$i]}"
+			lret=$((lret | $?))
+		done
+		[ $lret -eq 0 ] && break
+		sleep 0.2
+	done
+	return $lret
+}
+
+wait_ping()
+{
+	local i1=$1
+	local i2=$2
+	local ns1=${nsa[$i1]}
+	local j
+	local lret
+
+	for j in $(seq 1 $((PING_TIMEOUT * 5 ))); do
+		ip netns exec "$ns1" ping -c 1 -w $PING_TIMEOUT -i 0.2 \
+			-q "${AD4[$i2]}" >/dev/null 2>&1
+		lret=$?
+		[ $lret -le 1 ] && return $lret
+		sleep 0.2
+	done
+	return 1
+}
+
+add_addr()
+{
+	local i=$1
+	local dev=$2
+	local ns=${nsa[$i]}
+	local ad4=${AD4[$i]}
+	local ad6=${AD6[$i]}
+
+	ip -net "$ns" addr add "${ad4}/24" dev "$dev"
+	ip -net "$ns" addr add "${ad6}/64" dev "$dev" nodad
+	if [[ "$ns" == "nsclientlan"* ]]; then
+		ip -net "$ns" route add default via "${AD4[$ADLAN]}"
+		ip -net "$ns" route add default via "${AD6[$ADLAN]}"
+	elif [[ "$ns" == "nsclientwan"* ]]; then
+		ip -net "$ns" route add default via "${AD6[$ADWAN]}"
+	fi
+
+}
+
+del_addr()
+{
+	local i=$1
+	local dev=$2
+	local ns=${nsa[$i]}
+	local ad4=${AD4[$i]}
+	local ad6=${AD6[$i]}
+
+	if [[ "$ns" == "nsclientlan"* ]]; then
+		ip -net "$ns" route del default via "${AD6[$ADLAN]}"
+		ip -net "$ns" route del default via "${AD4[$ADLAN]}"
+	elif [[ "$ns" == "nsclientwan"* ]]; then
+		ip -net "$ns" route del default via "${AD6[$ADWAN]}"
+	fi
+	ip -net "$ns" addr del "${ad6}/64" dev "$dev" nodad
+	ip -net "$ns" addr del "${ad4}/24" dev "$dev"
+}
+
+set_client()
+{
+	local i=$1
+	local vlan=$2
+	local arg=$3
+	local ns=${nsa[$i]}
+	local vdev="${vethcl[$i]}"
+	local brdev="$BRCL"
+	local proto=""
+	local pvidslave=""
+
+	unset_client "$i"
+
+	if [[ "$vlan" == "qq" ]]; then
+		ip -net "$ns" link add link "$vdev" name "$vdev.$VID1" type vlan id $VID1
+		ip -net "$ns" link add link "$vdev.$VID1" name "$vdev.$VID1.$VID2" \
+			type vlan id $VID2
+		ip -net "$ns" link set "$vdev.$VID1" up
+		ip -net "$ns" link set "$vdev.$VID1.$VID2" up
+		add_addr "$i" "$vdev.$VID1.$VID2"
+		return
+	fi
+
+	[[ "$vlan" == "none" ]] && pvidslave="pvid untagged"
+	[[ "$vlan" == "ad" ]] && proto="vlan_protocol 802.1ad"
+
+	# shellcheck disable=SC2086
+	ip -net "$ns" link add "$brdev" type bridge vlan_filtering 1 vlan_default_pvid 0 $proto
+	ip -net "$ns" link set "$vdev" master "$brdev"
+	ip -net "$ns" link set "$brdev" up
+
+	# shellcheck disable=SC2086
+	bridge -net "$ns" vlan add dev "$vdev" vid $VID1 $pvidslave
+	bridge -net "$ns" vlan add dev "$brdev" vid $VID1 pvid untagged self
+
+	if [[ "$vlan" == "ad" ]]; then
+		ip -net "$ns" link add link "$brdev" name "$brdev.$VID2" type vlan id $VID2
+		brdev="$brdev.$VID2"
+		ip -net "$ns" link set "$brdev" up
+	fi
+
+	if [[ "$arg" != "noaddress" ]]; then
+		add_addr "$i" "$brdev"
+	fi
+}
+
+unset_client()
+{
+	local i=$1
+	local ns=${nsa[$i]}
+	local vdev="${vethcl[$i]}"
+	local brdev="$BRCL"
+
+	ip -net "$ns" link del "$brdev" type bridge 2>/dev/null
+	ip -net "$ns" link del "$vdev.$VID1" 2>/dev/null
+}
+
+add_pppoe()
+{
+	local i1=$1
+	local i2=$2
+	local dev1=$3
+	local dev2=$4
+	local desc=$5
+	local ns1=${nsa[$i1]}
+	local ns2=${nsa[$i2]}
+
+	ppp1=0
+	while [ -n "$(ip -net "$ns1" link show ppp$ppp1 2>/dev/null)" ]
+	do ((ppp1++)); done
+	echo "noauth defaultroute noipdefault unit $ppp1" >"$pppoeserveroptions"
+	ppp1="ppp$ppp1"
+
+	if ! ip netns exec "$ns1" pppoe-server -k -L "${AD4[$i1]}" -R "${AD4[$i2]}" \
+			-I "$dev1" -X "$pppoeserverpid" -O "$pppoeserveroptions" >/dev/null; then
+		echo "ERROR: $desc: failed to setup pppoe server" 1>&2
+		return 1
+	fi
+
+	if ! ip netns exec "$ns2" pppd plugin pppoe.so nic-"$dev2" persist holdoff 0 noauth \
+			defaultroute noipdefault noaccomp nodeflate noproxyarp nopcomp \
+			novj novjccomp linkname "selftest-$$" >/dev/null; then
+		echo "ERROR: $desc: failed to setup pppoe client" 1>&2
+		return 1
+	fi
+
+	if !
wait_ping "$i1" "$i2"; then + echo "ERROR: $desc: failed to setup functional pppoe connection" 1>&2 + return 1 + fi + + ppp2=$(tail -n 1 < "/run/pppd/ppp-selftest-$$.pid") + + ip -net "$ns1" addr add "${AD6[$i1]}/64" dev "$ppp1" nodad + ip -net "$ns2" addr add "${AD6[$i2]}/64" dev "$ppp2" nodad + + return 0 +} + +del_pppoe() +{ + local i1=$1 + local i2=$2 + local dev1=$3 + local dev2=$4 + local ns1=${nsa[$i1]} + local ns2=${nsa[$i2]} + local i serverpid clientpid + + serverpid="$(head -n 1 < "$pppoeserverpid")" + clientpid="$(head -n 1 < "/run/pppd/ppp-selftest-$$.pid")" + + [[ -n "$ppp1" ]] && ip -net "$ns1" addr del "${AD6[$i1]}/64" dev "$ppp1" + [[ -n "$ppp2" ]] && ip -net "$ns2" addr del "${AD6[$i2]}/64" dev "$ppp2" + + for i in $(seq 1 $((PING_TIMEOUT * 5 ))); do + if ip -net "$ns2" link show dev "$ppp2" 1>/dev/null 2>/dev/null; then + kill -9 "$clientpid" 2>/dev/null + elif ip -net "$ns1" link show dev "$ppp1" 1>/dev/null 2>/dev/null; then + kill -SIGTERM "$serverpid" 2>/dev/null + else return 0 + fi + sleep 0.2 + done + echo "ERROR: failed to remove pppoe connection" 1>&2 + return 1 +} + +listener_ready() +{ + local ns=$1 + local ipv=$2 + + ss -N "$ns" --ipv"$ipv" -lnt -o "sport = :8080" | grep -q 8080 +} + +test_tcp() { + local i1=$1 + local i2=$2 + local dofast=$3 + local desc=$4 + local ns1=${nsa[$i1]} + local ns2=${nsa[$i2]} + local i=-1 + local lret=0 + local ads="" + local ipv ad a lpid bytes error + + if [ -n "$do_ipv4" ]; then ads="${AD4[$i2]}" + elif [ -n "$do_ipv6" ]; then ads="${AD6[$i2]}" + else ads="${AD4[$i2]} ${AD6[$i2]}" + fi + for ad in $ads; do + ((i++)) + if [[ "$ad" =~ ":" ]] + then ipv="6"; a="[${ad}]" + else ipv="4"; a="${ad}" + fi + + rm -f "$file1out" "$file2out" + + # ip netns exec "$nsrt" nft reset counters >/dev/null + # But on some systems this results in 4GB values in packet and byte count, so: + (echo "flush ruleset"; ip netns exec "$nsrt" nft --stateless list ruleset) | \ + ip netns exec "$nsrt" nft -f - + + timeout 
"$SOCAT_TIMEOUT" ip netns exec "$ns2" socat TCP$ipv-LISTEN:8080,reuseaddr \ + STDIO <"$filein" >"$file2out" 2>/dev/null & + lpid=$! + busywait 1000 listener_ready "$ns2" "$ipv" + + timeout "$SOCAT_TIMEOUT" ip netns exec "$ns1" socat TCP$ipv:"$a":8080 \ + STDIO <"$filein" >"$file1out" 2>/dev/null + + if ! wait $lpid; then + error[i]="tcp broken" + continue + fi + if ! cmp "$filein" "$file1out" >/dev/null 2>&1; then + error[i]="file mismatch to ${ad}" + continue + fi + if ! cmp "$filein" "$file2out" >/dev/null 2>&1; then + error[i]="file mismatch from ${ad}" + continue + fi + + bytes=$(ip netns exec "$nsrt" nft list counter $family filter "check" | \ + grep "packets" | cut -d' ' -f4) + if [ -z "$dofast" ] && [ "$bytes" -lt "$((2 * filesize))" ]; then + + error[i]="established bytes $bytes < $((2 * filesize))" + continue + fi + if [ -n "$dofast" ] && [ "$bytes" -gt "$filesize" ]; then + # Significant reduction of bytes expected + error[i]="counted bytes $bytes > $filesize" + continue + fi + + done + + if [ -n "${error[0]}" ]; then + if [[ "${error[0]}" == "${error[1]}" ]]; then + error[0]="$desc: ipv4/6: ${error[0]}" + error[1]="" + else + error[0]="$desc: ipv4: ${error[0]}" + fi + fi + if [ -n "${error[1]}" ]; then + error[1]="$desc: ipv6: ${error[1]}" + fi + + for i in 0 1; do + if [ -n "${error[i]}" ]; then + if is_known_issue "$desc: ${error[i]}"; then + echo "WARN: ${error[i]}" 1>&2 + lret=$((lret | 1)) + else + echo "ERROR: ${error[i]}" 1>&2 + lret=$((lret | 2)) + fi + fi + done + if [ $lret -eq 0 ]; then + echo "PASS: $desc" + fi + return $(( lret & 2 )) +} + +known_issues=( +'*unaware bridge,*with double q vlan encaps,*without fastpath*established*' # 1 +'*unaware bridge,*with 802.1ad vlan encaps,*without fastpath*established*' # 1 +'*unaware bridge,*with pppoe encap,*without fastpath*established*' # 1 +'*unaware bridge,*with pppoe-in-q encaps,*without fastpath*established*' # 1 +'*forward,*without vlan-device, without vlan encap,*with *fastpath:*counted*' # 2 
+'*forward,*without vlan-device, with vlan encap,*with *fastpath:*tcp broken*' # 3 +'*forward,*with vlan-device, without vlan encap,*with *fastpath:*counted*' # 4 +) + +is_known_issue() { + local err=$1 + for issue in "${known_issues[@]}"; do + # shellcheck disable=SC2053 + [[ "$err" == $issue ]] && return 0 + done + return 1 +} + +test_paths() { + local i1=$1 + local i2=$2 + local desc=$3 + + if ! setup_nftables "$i1" "$i2"; then + echo "ERROR: $desc: cannot setup nftables" 1>&2 + return 1 + fi + if ! test_tcp "$i1" "$i2" "" "$desc without fastpath"; then + return 1 + fi + + if ! setup_fastpath "$i1" "$i2" "" 2>/dev/null; then + return 0 + fi + if ! test_tcp "$i1" "$i2" "fast" "$desc with fastpath"; then + return 1 + fi + + if ! setup_fastpath "$i1" "$i2" "hw" 2>/dev/null; then + return 0 + fi + if ! test_tcp "$i1" "$i2" "fast" "$desc with hw_fastpath"; then + return 1 + fi + + return 0 + +} + +add_masq() +{ + if [[ $family != "bridge" ]]; then + ip netns exec "$nsrt" nft -f - <<-EOF + table ip nat { + chain postrouting { + type nat hook postrouting priority 0; + oifname ${BRWAN} masquerade + } + } + EOF + else + return 0 + fi +} + +add_zone() +{ + local devs=$1 + + if [[ $family == "bridge" ]]; then + ip netns exec "$nsrt" nft -f - <<-EOF + table ${family} filter { + chain preroutingzones { + type filter hook prerouting priority -300; + iif ${devs} ct zone set 23 + } + } + EOF + fi +} + +setup_nftables() +{ + local devs="{ ${vethrt[$1]} , ${vethrt[$2]} }" + local i1=$1 + local i2=$2 + + ip netns exec "$nsrt" nft flush ruleset + + if ! 
add_masq; then + return 1 + fi + + add_zone "${devs}" 2>/dev/null + + ip netns exec "$nsrt" nft -f - <<-EOF + table ${family} filter { + counter check { } + chain prerouting { + type filter hook prerouting priority 0; policy accept; + ct state established counter name "check" + } + } + EOF +} + +setup_fastpath() +{ + local devs="{ ${vethrt[$1]} , ${vethrt[$2]} }" + local arg=$3 + local flags="" + + [[ "$arg" == "hw" ]] && flags="flags offload" + + ip netns exec "$nsrt" nft flush ruleset + + if ! add_masq; then + return 1 + fi + + add_zone "${devs}" 2>/dev/null + + ip netns exec "$nsrt" nft -f - <<-EOF + table ${family} filter { + counter check { } + flowtable f { + hook ingress priority filter + devices = ${devs} + ${flags} + } + chain forward { + type filter hook forward priority 0; policy accept; + counter name "check" + ct state established flow add @f + } + } + EOF +} + +test_unaware_bridge() +{ + local lret=0 + local i + + for i in $LAN1 $LAN2; do + set_client "$i" none + done + + test_paths $LAN1 $LAN2 "unaware bridge, without encaps, " + lret=$((lret | $?)) + + for i in $LAN1 $LAN2; do + set_client "$i" q + done + + test_paths $LAN1 $LAN2 "unaware bridge, with single vlan encap, " + lret=$((lret | $?)) + + for i in $LAN1 $LAN2; do + set_client "$i" qq + done + + # Skip testing double tagged packets on real hardware + if [ -n "$lan_all_veth" ] || [ -n "$noskip" ]; then + + test_paths $LAN1 $LAN2 "unaware bridge, with double q vlan encaps, " + lret=$((lret | $?)) + + for i in $LAN1 $LAN2; do + set_client "$i" ad + done + + test_paths $LAN1 $LAN2 "unaware bridge, with 802.1ad vlan encaps, " + lret=$((lret | $?)) + + fi + # End Skip testing double tagged packets + + if [ -n "$(command -v pppd 2>/dev/null)" ] && + [ -n "$(command -v pppoe-server 2>/dev/null)" ]; then + # Start pppoe + + for i in $LAN1 $LAN2; do + set_client "$i" none noaddress + done + + if add_pppoe $LAN1 $LAN2 "$BRCL" "$BRCL" "unaware bridge, with pppoe encap"; then + test_paths $LAN1 $LAN2 
"unaware bridge, with pppoe encap, " + lret=$((lret | $?)) + fi + + del_pppoe $LAN1 $LAN2 "$BRCL" "$BRCL" + lret=$((lret | $?)) + + for i in $LAN1 $LAN2; do + set_client "$i" q noaddress + done + + if add_pppoe $LAN1 $LAN2 "$BRCL" "$BRCL" "unaware bridge, with pppoe-in-q encaps"; then + test_paths $LAN1 $LAN2 "unaware bridge, with pppoe-in-q encaps, " + lret=$((lret | $?)) + fi + + del_pppoe $LAN1 $LAN2 "$BRCL" "$BRCL" + lret=$((lret | $?)) + + # End pppoe + fi + + for i in $LAN1 $LAN2; do + unset_client "$i" + done + return $lret +} + +test_aware_bridge() +{ + local lret=0 + local i + + for i in $LAN1 $LAN2; do + bridge -net "$nsrt" vlan add dev "${vethrt[$i]}" vid $VID1 pvid untagged + set_client "$i" none + done + test_paths $LAN1 $LAN2 "aware bridge, without/without vlan encap," + lret=$((lret | $?)) + + i=$LAN1 + bridge -net "$nsrt" vlan del dev "${vethrt[$i]}" vid $VID1 pvid untagged + bridge -net "$nsrt" vlan add dev "${vethrt[$i]}" vid $VID1 + set_client $i q + + test_paths $LAN1 $LAN2 "aware bridge, with/without vlan encap, " + lret=$((lret | $?)) + + i=$LAN2 + bridge -net "$nsrt" vlan del dev "${vethrt[$i]}" vid $VID1 pvid untagged + bridge -net "$nsrt" vlan add dev "${vethrt[$i]}" vid $VID1 + set_client $i q + + test_paths $LAN1 $LAN2 "aware bridge, with/with vlan encap, " + lret=$((lret | $?)) + + i=$LAN1 + bridge -net "$nsrt" vlan del dev "${vethrt[$i]}" vid $VID1 + bridge -net "$nsrt" vlan add dev "${vethrt[$i]}" vid $VID1 pvid untagged + set_client $i none + + test_paths $LAN1 $LAN2 "aware bridge, without/with vlan encap, " + lret=$((lret | $?)) + + i=$LAN1 + bridge -net "$nsrt" vlan del dev "${vethrt[$i]}" vid $VID1 pvid untagged + unset_client $i + i=$LAN2 + bridge -net "$nsrt" vlan del dev "${vethrt[$i]}" vid $VID1 + unset_client $i + + return $lret +} + +test_forward_without_vlandev() +{ + local wo=$1 + local lret=0 + local i + + [[ "$wo" == "" ]] && wo="without" + + for i in $LAN1 $LAN2; do + bridge -net "$nsrt" vlan add dev "${vethrt[$i]}" vid 
$VID1 pvid untagged + set_client "$i" none + done + + test_paths $LAN1 $WAN "forward, $wo vlan-device, without vlan encap, client1," + lret=$((lret | $?)) + if [ -z "$lan_all_veth" ] || [ -n "$noskip" ]; then + test_paths $LAN2 $WAN "forward, $wo vlan-device, without vlan encap, client2," + lret=$((lret | $?)) + fi + + for i in $LAN1 $LAN2; do + bridge -net "$nsrt" vlan del dev "${vethrt[$i]}" vid $VID1 pvid untagged + bridge -net "$nsrt" vlan add dev "${vethrt[$i]}" vid $VID1 + set_client "$i" q + done + + test_paths $LAN1 $WAN "forward, $wo vlan-device, with vlan encap, client1," + lret=$((lret | $?)) + if [ -z "$lan_all_veth" ] || [ -n "$noskip" ]; then + test_paths $LAN2 $WAN "forward, $wo vlan-device, with vlan encap, client2," + lret=$((lret | $?)) + fi + + for i in $LAN1 $LAN2; do + bridge -net "$nsrt" vlan del dev "${vethrt[$i]}" vid $VID1 + unset_client "$i" + done + return $lret +} + +test_forward_with_vlandev() +{ + test_forward_without_vlandev "with" + return $? +} + +ret=0 +### Start Initial Setup ### + +for i in 4 6; do + ip netns exec "$nsrt" sysctl -q net.ipv$i.conf.all.forwarding=1 +done + +### Use brwan to make sure software fastpath is ### +### direct xmit in other direction also ### + +ip -net "$nsrt" link add $BRWAN type bridge +ret=$((ret | $?)) +ip -net "$nsrt" link set $BRWAN up +ret=$((ret | $?)) +if [ $ret -ne 0 ]; then + echo "SKIP: Can't create bridge" + exit "$ksft_skip" +fi + +# If both lan clients are veth-devices, only test 1 in the forward path +if [ -z "${vethcl[$LAN1]}" ] && [ -z "${vethcl[$LAN2]}" ]; then + lan_all_veth=1 +fi + +for i in $WAN $LAN1 $LAN2; do + ns="${nsa[$i]}" + if [ -z "${vethcl[$i]}" ]; then + vethcl[i]="veth${i}cl" + vethrt[i]="veth${i}rt" + ip link add "${vethcl[$i]}" netns "$ns" type veth \ + peer name "${vethrt[$i]}" netns "$nsrt" + ret=$((ret | $?)) + else # Use pair of interconnected hardware interfaces + ip link set "${vethrt[$i]}" netns "$nsrt" + ret=$((ret | $?)) + ip link set "${vethcl[$i]}" netns 
"$ns" + ret=$((ret | $?)) + fi +done +if [ $ret -ne 0 ]; then + echo "SKIP: (v)eth pairs cannot be used" + exit "$ksft_skip" +fi + +if [ -n "$showtree" ]; then + cat <<-EOF + Setup: + CLIENT 0 + ${vethcl[$WAN]} + | + ${vethrt[$WAN]} + WAN + ROUTER + LAN1 LAN2 + $(printf "%14.14s" "${vethrt[$LAN1]}") ${vethrt[$LAN2]} + | | + $(printf "%14.14s" "${vethcl[$LAN1]}") ${vethcl[$LAN2]} + CLIENT 1 CLIENT 2 + + EOF +fi + +for n in nsclientwan nsclientlan; do + routerside=""; clientside="" + for i in $WAN $LAN1 $LAN2; do + ns="${nsa[$i]}" + [[ "$ns" != "$n"* ]] && continue + mac=$(check_mac "$ns" "${vethcl[$i]}" "$routerside $clientside") + ret=$((ret | $?)) + clientside+=" $mac" + mac=$(check_mac "$nsrt" "${vethrt[$i]}" "$clientside") + ret=$((ret | $?)) + routerside+=" $mac" + done +done +if [ $ret -ne 0 ]; then + echo "SKIP: conflicting mac address" + exit "$ksft_skip" +fi + +set_pair_link up $WAN $LAN1 $LAN2 +ret=$((ret | $?)) +if [ $ret -ne 0 ]; then + echo "SKIP: setting (v)eth pairs link up failed" + exit "$ksft_skip" +fi + +i=$WAN +ip -net "$nsrt" link set "${vethrt[$i]}" master $BRWAN +set_client $i none +add_addr $ADWAN "$BRWAN" + +family="bridge" +if ! setup_nftables $LAN1 $LAN2 2>/dev/null; then + echo "INFO: Cannot add nftables table $family" + tests[1]=""; tests[2]="" +fi +family="inet" +if ! 
setup_nftables $WAN $LAN1 2>/dev/null; then + echo "INFO: Cannot add nftables table $family" + tests[3]=""; tests[4]="" +fi + +### End Initial Setup ### + +if [ -n "${tests[1]}" ]; then + # Setup brlan as vlan unaware bridge + family="bridge" + ip -net "$nsrt" link add $BRLAN type bridge + ip -net "$nsrt" link set $BRLAN up + for i in $LAN1 $LAN2; do + ip -net "$nsrt" link set "${vethrt[$i]}" master $BRLAN + done + test_unaware_bridge + ret=$((ret | $?)) + ip -net "$nsrt" link del $BRLAN type bridge +fi + +if [ -n "${tests[2]}" ] || [ -n "${tests[3]}" ] || [ -n "${tests[4]}" ]; then + # Setup brlan as vlan aware bridge + family="bridge" + + ip -net "$nsrt" link add $BRLAN type bridge vlan_filtering 1 vlan_default_pvid 0 + ip -net "$nsrt" link set $BRLAN up + bridge -net "$nsrt" vlan add dev $BRLAN vid $VID1 pvid untagged self + add_addr $ADLAN "$BRLAN" + for i in $LAN1 $LAN2; do + ip -net "$nsrt" link set "${vethrt[$i]}" master $BRLAN + done + + if [ -n "${tests[2]}" ]; then + test_aware_bridge + ret=$((ret | $?)) + fi + + family="inet" + + if [ -n "${tests[3]}" ]; then + test_forward_without_vlandev + ret=$((ret | $?)) + fi + + if [ -n "${tests[4]}" ]; then + # Setup vlan-device linked to brlan master port + del_addr $ADLAN "$BRLAN" + ip -net "$nsrt" link set $BRLAN down + bridge -net "$nsrt" vlan del dev $BRLAN vid $VID1 pvid untagged self + bridge -net "$nsrt" vlan add dev $BRLAN vid $VID1 self + ip -net "$nsrt" link add link $BRLAN name $BRLAN.$VID1 type vlan id $VID1 + ip -net "$nsrt" link set $BRLAN up + ip -net "$nsrt" link set "$BRLAN.$VID1" up + add_addr $ADLAN "$BRLAN.$VID1" + test_forward_with_vlandev + ret=$((ret | $?)) + fi + + ip -net "$nsrt" link del $BRLAN type bridge +fi + +### Finish tests ### + +ip -net "$nsrt" link del $BRWAN type bridge + +for i in $WAN $LAN1 $LAN2; do + unset_client "$i" +done + +set_pair_link down $WAN $LAN1 $LAN2 + +for i in $WAN $LAN1 $LAN2; do + ns="${nsa[$i]}" + if [[ "${vethcl[$i]:0:4}" != "veth" ]]; then + ip netns exec 
"$ns" ip link set "${vethcl[$i]}" netns 1 + fi + if [[ "${vethrt[$i]:0:4}" != "veth" ]]; then + ip netns exec "$nsrt" ip link set "${vethrt[$i]}" netns 1 + fi +done + +if [ $ret -eq 0 ]; then + echo "PASS: all tests passed" +else + echo "ERROR: bridge fastpath test has failed" +fi + +exit $ret -- 2.50.0

2 months, 1 week


[PATCH v2 7/7] KVM: LoongArch: selftests: Add time counter test

by Bibo Mao

Add a time counter test to verify that the time count starts from 0 and only increases afterwards.

Signed-off-by: Bibo Mao <maobibo(a)loongson.cn>
---
 .../selftests/kvm/lib/loongarch/processor.c   |  9 ++++++
 .../selftests/kvm/loongarch/arch_timer.c      | 29 +++++++++++++++++++
 2 files changed, 38 insertions(+)

diff --git a/tools/testing/selftests/kvm/lib/loongarch/processor.c b/tools/testing/selftests/kvm/lib/loongarch/processor.c
index 436990258068..ac2ffd076bff 100644
--- a/tools/testing/selftests/kvm/lib/loongarch/processor.c
+++ b/tools/testing/selftests/kvm/lib/loongarch/processor.c
@@ -3,6 +3,7 @@
 #include <assert.h>
 #include <linux/compiler.h>
 
+#include <asm/kvm.h>
 #include "kvm_util.h"
 #include "processor.h"
 #include "ucall_common.h"
@@ -256,6 +257,11 @@ static void loongarch_set_csr(struct kvm_vcpu *vcpu, uint64_t id, uint64_t val)
 	__vcpu_set_reg(vcpu, csrid, val);
 }
 
+static void loongarch_set_reg(struct kvm_vcpu *vcpu, uint64_t id, uint64_t val)
+{
+	__vcpu_set_reg(vcpu, id, val);
+}
+
 static void loongarch_vcpu_setup(struct kvm_vcpu *vcpu)
 {
 	int width;
@@ -279,6 +285,9 @@ static void loongarch_vcpu_setup(struct kvm_vcpu *vcpu)
 	loongarch_set_csr(vcpu, LOONGARCH_CSR_ECFG, 0);
 	loongarch_set_csr(vcpu, LOONGARCH_CSR_TCFG, 0);
 	loongarch_set_csr(vcpu, LOONGARCH_CSR_ASID, 1);
+	/* time count start from 0 */
+	val = 0;
+	loongarch_set_reg(vcpu, KVM_REG_LOONGARCH_COUNTER, val);
 
 	val = 0;
 	width = vm->page_shift - 3;
diff --git a/tools/testing/selftests/kvm/loongarch/arch_timer.c b/tools/testing/selftests/kvm/loongarch/arch_timer.c
index 579132a082cd..f3a25a0163fc 100644
--- a/tools/testing/selftests/kvm/loongarch/arch_timer.c
+++ b/tools/testing/selftests/kvm/loongarch/arch_timer.c
@@ -133,10 +133,39 @@ static void guest_test_emulate_timer(uint32_t cpu)
 	local_irq_enable();
 }
 
+static void guest_time_count_test(uint32_t cpu)
+{
+	uint32_t config_iter;
+	unsigned long start, end, prev, us;
+
+	/* Assuming that test case starts to run in 1 second */
+	start = timer_get_cycles();
+	us = msec_to_cycles(1000);
+	__GUEST_ASSERT(start <= us,
+			"start = 0x%lx, us = 0x%lx.\n",
+			start, us);
+
+	us = msec_to_cycles(test_args.timer_period_ms);
+	for (config_iter = 0; config_iter < test_args.nr_iter; config_iter++) {
+		start = timer_get_cycles();
+		end = start + us;
+		/* test time count growing up always */
+		while (start < end) {
+			prev = start;
+			start = timer_get_cycles();
+			__GUEST_ASSERT(prev <= start,
+					"prev = 0x%lx, start = 0x%lx.\n",
+					prev, start);
+		}
+	}
+}
+
 static void guest_code(void)
 {
 	uint32_t cpu = guest_get_vcpuid();
 
+	/* must run at first */
+	guest_time_count_test(cpu);
 	timer_irq_enable();
 	local_irq_enable();
 	guest_test_oneshot_timer(cpu);
-- 
2.39.3
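For readers who want to try the monotonicity check outside a guest, here is a minimal host-side sketch of the same loop. It is not part of the patch: the counter is passed in as a callback, and every `demo_` name (including the replay counter that stands in for timer_get_cycles()) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counter source; stands in for timer_get_cycles(). */
typedef uint64_t (*demo_counter_fn)(void *ctx);

/*
 * Same loop shape as guest_time_count_test(): keep reading the counter
 * until it has advanced by `span` and fail as soon as a read goes backwards.
 * Returns 0 on success, -1 if the counter ever decreased.
 */
static int demo_check_monotonic(demo_counter_fn read_cnt, void *ctx, uint64_t span)
{
	uint64_t start = read_cnt(ctx);
	uint64_t end = start + span;
	uint64_t prev;

	while (start < end) {
		prev = start;
		start = read_cnt(ctx);
		if (prev > start)
			return -1;
	}
	return 0;
}

/* Fake counter that replays a fixed sequence and then repeats the last value. */
struct demo_seq {
	const uint64_t *vals;
	size_t len;
	size_t pos;
};

static uint64_t demo_seq_read(void *ctx)
{
	struct demo_seq *s = ctx;
	uint64_t v;

	assert(s->len > 0);
	v = s->vals[s->pos];
	if (s->pos + 1 < s->len)
		s->pos++;
	return v;
}
```

Calling demo_check_monotonic(demo_seq_read, &seq, span) with a replayed sequence exercises both the passing and the failing path without any hardware counter.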

2 months, 1 week


[PATCH v2 6/7] KVM: LoongArch: selftests: Add SW emulated timer test

by Bibo Mao

This test case sets up a one-shot timer and immediately executes the idle instruction to give up the CPU. The hypervisor emulates a SW hrtimer and wakes up the vCPU when the SW hrtimer fires.

Signed-off-by: Bibo Mao <maobibo(a)loongson.cn>
---
 .../selftests/kvm/loongarch/arch_timer.c      | 40 +++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/tools/testing/selftests/kvm/loongarch/arch_timer.c b/tools/testing/selftests/kvm/loongarch/arch_timer.c
index a4a39f24bb7e..579132a082cd 100644
--- a/tools/testing/selftests/kvm/loongarch/arch_timer.c
+++ b/tools/testing/selftests/kvm/loongarch/arch_timer.c
@@ -94,6 +94,45 @@ static void guest_test_period_timer(uint32_t cpu)
 			irq_iter);
 }
 
+static void do_idle(void)
+{
+	unsigned int intid;
+	unsigned long estat;
+
+	__asm__ __volatile__("idle 0" : : : "memory");
+
+	estat = csr_read(LOONGARCH_CSR_ESTAT);
+	intid = !!(estat & BIT(INT_TI));
+
+	/* Make sure pending timer IRQ arrived */
+	GUEST_ASSERT_EQ(intid, 1);
+	csr_write(CSR_TINTCLR_TI, LOONGARCH_CSR_TINTCLR);
+}
+
+static void guest_test_emulate_timer(uint32_t cpu)
+{
+	uint32_t config_iter;
+	uint64_t xcnt_diff_us, us;
+	struct test_vcpu_shared_data *shared_data = &vcpu_shared_data[cpu];
+
+	local_irq_disable();
+	shared_data->nr_iter = 0;
+	us = msecs_to_usecs(test_args.timer_period_ms);
+	for (config_iter = 0; config_iter < test_args.nr_iter; config_iter++) {
+		shared_data->xcnt = timer_get_cycles();
+
+		/* Setup the next interrupt */
+		timer_set_next_cmp_ms(test_args.timer_period_ms, false);
+		do_idle();
+
+		xcnt_diff_us = cycles_to_usec(timer_get_cycles() - shared_data->xcnt);
+		__GUEST_ASSERT(xcnt_diff_us >= us,
+				"xcnt_diff_us = 0x%lx, us = 0x%lx.\n",
+				xcnt_diff_us, us);
+	}
+	local_irq_enable();
+}
+
 static void guest_code(void)
 {
 	uint32_t cpu = guest_get_vcpuid();
@@ -102,6 +141,7 @@ static void guest_code(void)
 	local_irq_enable();
 	guest_test_oneshot_timer(cpu);
 	guest_test_period_timer(cpu);
+	guest_test_emulate_timer(cpu);
 
 	GUEST_DONE();
 }
-- 
2.39.3
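The property this test asserts (a vCPU that idles on a one-shot timer never resumes before the programmed period has elapsed) can be modelled on the host with a virtual clock. The sketch below is only a model under stated assumptions: the `demo_` names, the counter frequency field and the wake-latency parameter are all hypothetical and do not correspond to selftest APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical virtual timer; not a selftest structure. */
struct demo_vtimer {
	uint64_t now;       /* current counter value, in cycles */
	uint64_t freq_hz;   /* assumed counter frequency */
	uint64_t deadline;  /* counter value at which the one-shot timer fires */
	int pending;        /* stands in for the INT_TI bit in ESTAT */
};

static uint64_t demo_cycles_to_usec(const struct demo_vtimer *t, uint64_t cycles)
{
	return cycles * 1000000ULL / t->freq_hz;
}

/* Program a one-shot expiry `ms` milliseconds from now. */
static void demo_arm_oneshot_ms(struct demo_vtimer *t, uint64_t ms)
{
	t->deadline = t->now + ms * t->freq_hz / 1000;
	t->pending = 0;
}

/*
 * Models "idle 0": the vCPU resumes no earlier than the deadline,
 * plus whatever wake-up latency the emulation adds.
 */
static void demo_idle(struct demo_vtimer *t, uint64_t wake_latency_cycles)
{
	if (t->now < t->deadline)
		t->now = t->deadline;
	t->now += wake_latency_cycles;
	t->pending = 1;
}

/* One iteration of the test loop; returns the elapsed time in microseconds. */
static uint64_t demo_emulated_timer_iter(struct demo_vtimer *t, uint64_t period_ms,
					 uint64_t wake_latency_cycles)
{
	uint64_t start = t->now;

	demo_arm_oneshot_ms(t, period_ms);
	demo_idle(t, wake_latency_cycles);
	assert(t->pending == 1);	/* the timer interrupt must be pending */
	t->pending = 0;			/* cleared, like the CSR_TINTCLR_TI write */
	return demo_cycles_to_usec(t, t->now - start);
}
```

In this model the elapsed time can only exceed the period by the wake-up latency, which is why the guest test checks xcnt_diff_us >= us rather than equality.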

2 months, 1 week


[PATCH v2 5/7] KVM: LoongArch: selftests: Add period mode timer test

by Bibo Mao

Add a period mode timer test. In period mode the timer only needs to be programmed once; its compare tick value is reloaded each time the timer fires.

Signed-off-by: Bibo Mao <maobibo(a)loongson.cn>
---
 .../kvm/include/loongarch/arch_timer.h        |  5 ++++
 .../selftests/kvm/loongarch/arch_timer.c      | 28 +++++++++++++++++++
 2 files changed, 33 insertions(+)

diff --git a/tools/testing/selftests/kvm/include/loongarch/arch_timer.h b/tools/testing/selftests/kvm/include/loongarch/arch_timer.h
index 94b1cba2744d..b6399e748f72 100644
--- a/tools/testing/selftests/kvm/include/loongarch/arch_timer.h
+++ b/tools/testing/selftests/kvm/include/loongarch/arch_timer.h
@@ -36,6 +36,11 @@ static inline void timer_set_next_cmp_ms(unsigned int msec, bool period)
 	csr_write(val, LOONGARCH_CSR_TCFG);
 }
 
+static inline void disable_timer(void)
+{
+	csr_write(0, LOONGARCH_CSR_TCFG);
+}
+
 static inline unsigned long timer_get_val(void)
 {
 	return csr_read(LOONGARCH_CSR_TVAL);
diff --git a/tools/testing/selftests/kvm/loongarch/arch_timer.c b/tools/testing/selftests/kvm/loongarch/arch_timer.c
index 2a2cebcf3885..a4a39f24bb7e 100644
--- a/tools/testing/selftests/kvm/loongarch/arch_timer.c
+++ b/tools/testing/selftests/kvm/loongarch/arch_timer.c
@@ -23,6 +23,13 @@ static void guest_irq_handler(struct ex_regs *regs)
 	GUEST_ASSERT_EQ(intid, 1);
 
 	cfg = timer_get_cfg();
+	if (cfg & CSR_TCFG_PERIOD) {
+		WRITE_ONCE(shared_data->nr_iter, shared_data->nr_iter - 1);
+		if (shared_data->nr_iter == 0)
+			disable_timer();
+		csr_write(CSR_TINTCLR_TI, LOONGARCH_CSR_TINTCLR);
+		return;
+	}
 
 	/*
 	 * On physical machine, value of LOONGARCH_CSR_TVAL is BIT_ULL(48) - 1
@@ -67,6 +74,26 @@ static void guest_test_oneshot_timer(uint32_t cpu)
 	}
 }
 
+static void guest_test_period_timer(uint32_t cpu)
+{
+	uint32_t irq_iter;
+	uint64_t us;
+	struct test_vcpu_shared_data *shared_data = &vcpu_shared_data[cpu];
+
+	shared_data->nr_iter = test_args.nr_iter;
+	shared_data->xcnt = timer_get_cycles();
+	us = msecs_to_usecs(test_args.timer_period_ms) + test_args.timer_err_margin_us;
+	timer_set_next_cmp_ms(test_args.timer_period_ms, true);
+	/* Setup a timeout for the interrupt to arrive */
+	udelay(us * test_args.nr_iter);
+	irq_iter = READ_ONCE(shared_data->nr_iter);
+	__GUEST_ASSERT(irq_iter == 0,
+			"irq_iter = 0x%x.\n"
+			" Guest period timer interrupt was not triggered within the specified\n"
+			" interval, try to increase the error margin by [-e] option.\n",
+			irq_iter);
+}
+
 static void guest_code(void)
 {
 	uint32_t cpu = guest_get_vcpuid();
@@ -74,6 +101,7 @@ static void guest_code(void)
 	timer_irq_enable();
 	local_irq_enable();
 	guest_test_oneshot_timer(cpu);
+	guest_test_period_timer(cpu);
 
 	GUEST_DONE();
 }
-- 
2.39.3
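The countdown logic in the periodic branch of guest_irq_handler() can be illustrated with a small host-side model. This is a sketch, not the selftest code: `struct demo_timer` and the `demo_` functions are hypothetical, and timer expirations are simply delivered in a loop.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical timer model; field names are illustrative only. */
struct demo_timer {
	int periodic;		/* stands in for CSR_TCFG_PERIOD */
	int enabled;		/* cleared by the disable_timer() equivalent */
	uint32_t nr_iter;	/* interrupts still expected */
	uint32_t fired;		/* interrupts actually handled */
};

/* Models one timer interrupt, following the periodic branch of the handler. */
static void demo_irq(struct demo_timer *t)
{
	if (!t->enabled)
		return;

	t->fired++;
	if (t->periodic) {
		assert(t->nr_iter > 0);
		t->nr_iter--;
		if (t->nr_iter == 0)
			t->enabled = 0;	/* like disable_timer() */
	} else {
		t->enabled = 0;		/* a one-shot timer fires only once */
	}
}

/* Deliver `ticks` expirations and return how many interrupts are still owed. */
static uint32_t demo_run(struct demo_timer *t, uint32_t ticks)
{
	uint32_t i;

	for (i = 0; i < ticks; i++)
		demo_irq(t);
	return t->nr_iter;
}
```

The guest test passes when the remaining count reaches zero within the timeout; in the model, that corresponds to demo_run() returning 0 once enough expirations have been delivered.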

2 months, 1 week


[PATCH v2 2/7] KVM: LoongArch: selftests: Add exception handler register interface

by Bibo Mao

Add an interface for registering interrupt and exception handlers. When an exception occurs, the registered handler is executed if one exists; otherwise an error is reported.

Signed-off-by: Bibo Mao <maobibo(a)loongson.cn>
---
 .../kvm/include/loongarch/processor.h         | 14 +++++++++
 .../selftests/kvm/lib/loongarch/processor.c   | 29 +++++++++++++++++++
 2 files changed, 43 insertions(+)

diff --git a/tools/testing/selftests/kvm/include/loongarch/processor.h b/tools/testing/selftests/kvm/include/loongarch/processor.h
index 374caddfb0db..a18ac7bff303 100644
--- a/tools/testing/selftests/kvm/include/loongarch/processor.h
+++ b/tools/testing/selftests/kvm/include/loongarch/processor.h
@@ -84,6 +84,11 @@
 #define LOONGARCH_CSR_EUEN 0x2
 #define LOONGARCH_CSR_ECFG 0x4
 #define LOONGARCH_CSR_ESTAT 0x5 /* Exception status */
+#define CSR_ESTAT_EXC_SHIFT 16
+#define CSR_ESTAT_EXC_WIDTH 6
+#define CSR_ESTAT_EXC (0x3f << CSR_ESTAT_EXC_SHIFT)
+#define EXCCODE_INT 0 /* Interrupt */
+#define INT_TI 11 /* Timer interrupt*/
 #define LOONGARCH_CSR_ERA 0x6 /* ERA */
 #define LOONGARCH_CSR_BADV 0x7 /* Bad virtual address */
 #define LOONGARCH_CSR_EENTRY 0xc
@@ -133,6 +138,15 @@ struct ex_regs {
 #define PRMD_OFFSET_EXREGS offsetof(struct ex_regs, prmd)
 #define EXREGS_SIZE sizeof(struct ex_regs)
 
+#define VECTOR_NUM 64
+typedef void(*handler_fn)(struct ex_regs *);
+struct handlers {
+	handler_fn exception_handlers[VECTOR_NUM];
+};
+
+void vm_init_descriptor_tables(struct kvm_vm *vm);
+void vm_install_exception_handler(struct kvm_vm *vm, int vector, handler_fn handler);
+
 #else
 #define PC_OFFSET_EXREGS ((EXREGS_GPRS + 0) * 8)
 #define ESTAT_OFFSET_EXREGS ((EXREGS_GPRS + 1) * 8)
diff --git a/tools/testing/selftests/kvm/lib/loongarch/processor.c b/tools/testing/selftests/kvm/lib/loongarch/processor.c
index 0ac1abcb71cb..be537c5ff74e 100644
--- a/tools/testing/selftests/kvm/lib/loongarch/processor.c
+++ b/tools/testing/selftests/kvm/lib/loongarch/processor.c
@@ -11,6 +11,7 @@
 #define LOONGARCH_GUEST_STACK_VADDR_MIN 0x200000
 
 static vm_paddr_t invalid_pgtable[4];
+static vm_vaddr_t exception_handlers;
 
 static uint64_t virt_pte_index(struct kvm_vm *vm, vm_vaddr_t gva, int level)
 {
@@ -184,6 +185,13 @@ void assert_on_unhandled_exception(struct kvm_vcpu *vcpu)
 void route_exception(struct ex_regs *regs)
 {
 	unsigned long pc, estat, badv;
+	int vector;
+	struct handlers *handlers;
+
+	handlers = (struct handlers *)exception_handlers;
+	vector = (regs->estat & CSR_ESTAT_EXC) >> CSR_ESTAT_EXC_SHIFT;
+	if (handlers && handlers->exception_handlers[vector])
+		return handlers->exception_handlers[vector](regs);
 
 	pc = regs->pc;
 	badv = regs->badv;
@@ -192,6 +200,27 @@ void route_exception(struct ex_regs *regs)
 	while (1) ;
 }
 
+void vm_init_descriptor_tables(struct kvm_vm *vm)
+{
+	void *addr;
+
+	vm->handlers = __vm_vaddr_alloc(vm, sizeof(struct handlers),
+			LOONGARCH_GUEST_STACK_VADDR_MIN, MEM_REGION_DATA);
+
+	addr = addr_gva2hva(vm, vm->handlers);
+	memset(addr, 0, vm->page_size);
+	exception_handlers = vm->handlers;
+	sync_global_to_guest(vm, exception_handlers);
+}
+
+void vm_install_exception_handler(struct kvm_vm *vm, int vector, handler_fn handler)
+{
+	struct handlers *handlers = addr_gva2hva(vm, vm->handlers);
+
+	assert(vector < VECTOR_NUM);
+	handlers->exception_handlers[vector] = handler;
+}
+
 void vcpu_args_set(struct kvm_vcpu *vcpu, unsigned int num, ...)
 {
 	int i;
-- 
2.39.3
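The dispatch in route_exception() amounts to extracting the exception code from ESTAT and indexing a table of function pointers. A stand-alone sketch of that idea follows; it reuses the three constants from the patch, while `struct demo_regs` and all `demo_` functions are hypothetical stand-ins, and the table lives in ordinary host memory rather than guest memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Constants as defined in the patch. */
#define CSR_ESTAT_EXC_SHIFT	16
#define CSR_ESTAT_EXC		(0x3f << CSR_ESTAT_EXC_SHIFT)
#define VECTOR_NUM		64

/* Hypothetical, trimmed-down stand-in for struct ex_regs. */
struct demo_regs {
	unsigned long estat;
	int handled;
};

typedef void (*demo_handler_fn)(struct demo_regs *);

static demo_handler_fn demo_handlers[VECTOR_NUM];

/* Extract the exception code field from an ESTAT value. */
static int demo_vector(unsigned long estat)
{
	return (int)((estat & CSR_ESTAT_EXC) >> CSR_ESTAT_EXC_SHIFT);
}

/* Register a handler; returns -1 instead of asserting on a bad vector. */
static int demo_install(int vector, demo_handler_fn fn)
{
	if (vector < 0 || vector >= VECTOR_NUM)
		return -1;
	demo_handlers[vector] = fn;
	return 0;
}

/* Returns 1 if a registered handler ran, 0 if the exception is unhandled. */
static int demo_route(struct demo_regs *regs)
{
	int vector = demo_vector(regs->estat);
	demo_handler_fn fn;

	assert(vector < VECTOR_NUM);
	fn = demo_handlers[vector];
	if (!fn)
		return 0;
	fn(regs);
	return 1;
}

/* Example handler used to show that dispatch reached the registered function. */
static void demo_mark_handled(struct demo_regs *regs)
{
	regs->handled = 1;
}
```

Because the code field is 6 bits wide, the extracted vector is always below VECTOR_NUM (64), so the table lookup cannot go out of bounds.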

2 months, 1 week


[PATCH v2 1/7] KVM: LoongArch: selftests: Add system registers save and restore on exception

by Bibo Mao

When the system returns from an exception with the ertn instruction, PC is restored from LOONGARCH_CSR_ERA and CSR_CRMD from LOONGARCH_CSR_PRMD. Save the CSR_ERA and CSR_PRMD registers on the stack and restore them from the stack, so that an exception handler can modify them in the future.

Signed-off-by: Bibo Mao <maobibo(a)loongson.cn>
---
 tools/testing/selftests/kvm/include/loongarch/processor.h | 5 ++++-
 tools/testing/selftests/kvm/lib/loongarch/exception.S     | 6 ++++++
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/kvm/include/loongarch/processor.h b/tools/testing/selftests/kvm/include/loongarch/processor.h
index 6427a3275e6a..374caddfb0db 100644
--- a/tools/testing/selftests/kvm/include/loongarch/processor.h
+++ b/tools/testing/selftests/kvm/include/loongarch/processor.h
@@ -124,18 +124,21 @@ struct ex_regs {
 	unsigned long pc;
 	unsigned long estat;
 	unsigned long badv;
+	unsigned long prmd;
 };
 
 #define PC_OFFSET_EXREGS offsetof(struct ex_regs, pc)
 #define ESTAT_OFFSET_EXREGS offsetof(struct ex_regs, estat)
 #define BADV_OFFSET_EXREGS offsetof(struct ex_regs, badv)
+#define PRMD_OFFSET_EXREGS offsetof(struct ex_regs, prmd)
 #define EXREGS_SIZE sizeof(struct ex_regs)
 
 #else
 #define PC_OFFSET_EXREGS ((EXREGS_GPRS + 0) * 8)
 #define ESTAT_OFFSET_EXREGS ((EXREGS_GPRS + 1) * 8)
 #define BADV_OFFSET_EXREGS ((EXREGS_GPRS + 2) * 8)
-#define EXREGS_SIZE ((EXREGS_GPRS + 3) * 8)
+#define PRMD_OFFSET_EXREGS ((EXREGS_GPRS + 3) * 8)
+#define EXREGS_SIZE ((EXREGS_GPRS + 4) * 8)
 #endif
 
 #endif /* SELFTEST_KVM_PROCESSOR_H */
diff --git a/tools/testing/selftests/kvm/lib/loongarch/exception.S b/tools/testing/selftests/kvm/lib/loongarch/exception.S
index 88bfa505c6f5..3f1e4b67c5ae 100644
--- a/tools/testing/selftests/kvm/lib/loongarch/exception.S
+++ b/tools/testing/selftests/kvm/lib/loongarch/exception.S
@@ -51,9 +51,15 @@ handle_exception:
 	st.d t0, sp, ESTAT_OFFSET_EXREGS
 	csrrd t0, LOONGARCH_CSR_BADV
 	st.d t0, sp, BADV_OFFSET_EXREGS
+	csrrd t0, LOONGARCH_CSR_PRMD
+	st.d t0, sp, PRMD_OFFSET_EXREGS
 
 	or a0, sp, zero
 	bl route_exception
+	ld.d t0, sp, PC_OFFSET_EXREGS
+	csrwr t0, LOONGARCH_CSR_ERA
+	ld.d t0, sp, PRMD_OFFSET_EXREGS
+	csrwr t0, LOONGARCH_CSR_PRMD
 	restore_gprs sp
 	csrrd sp, LOONGARCH_CSR_KS0
 	ertn
-- 
2.39.3
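The header carries two sets of offsets: offsetof() expressions for C code, and hand-computed constants for assembly. Adding the prmd field therefore requires updating both in step. The sketch below shows one way to cross-check the two on the host. It makes assumptions that are not stated in the patch: the register array is taken to hold 32 entries (EXREGS_GPRS is not shown in the diff), uint64_t replaces unsigned long so the check is independent of the host ABI, and all `DEMO_`/`demo_` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 32 general-purpose register slots precede the CSR slots. */
#define DEMO_EXREGS_GPRS 32

/* Hypothetical mirror of struct ex_regs after this patch. */
struct demo_ex_regs {
	uint64_t regs[DEMO_EXREGS_GPRS];
	uint64_t pc;
	uint64_t estat;
	uint64_t badv;
	uint64_t prmd;
};

/* Assembly-side constants, written the same way as the #else branch. */
#define DEMO_ASM_PC_OFFSET	((DEMO_EXREGS_GPRS + 0) * 8)
#define DEMO_ASM_ESTAT_OFFSET	((DEMO_EXREGS_GPRS + 1) * 8)
#define DEMO_ASM_BADV_OFFSET	((DEMO_EXREGS_GPRS + 2) * 8)
#define DEMO_ASM_PRMD_OFFSET	((DEMO_EXREGS_GPRS + 3) * 8)
#define DEMO_ASM_EXREGS_SIZE	((DEMO_EXREGS_GPRS + 4) * 8)

/* Returns 1 when the C layout agrees with the hand-written assembly offsets. */
static int demo_layout_matches(void)
{
	return offsetof(struct demo_ex_regs, pc) == (size_t)DEMO_ASM_PC_OFFSET &&
	       offsetof(struct demo_ex_regs, estat) == (size_t)DEMO_ASM_ESTAT_OFFSET &&
	       offsetof(struct demo_ex_regs, badv) == (size_t)DEMO_ASM_BADV_OFFSET &&
	       offsetof(struct demo_ex_regs, prmd) == (size_t)DEMO_ASM_PRMD_OFFSET &&
	       sizeof(struct demo_ex_regs) == (size_t)DEMO_ASM_EXREGS_SIZE;
}
```

If a field were added to the struct without bumping the size constant, as the old EXREGS_SIZE of (EXREGS_GPRS + 3) * 8 would be after this patch, demo_layout_matches() would return 0.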

2 months, 1 week

1
0
0 0

[PATCH v10 0/9] support FEAT_LSUI

by Yeoreum Yun

Since Armv9.6, FEAT_LSUI supplies load/store instructions that let a
privileged level access user memory without clearing the PSTATE.PAN bit.

This patchset adds support for FEAT_LSUI and applies it to the futex atomic
operations and the user_swpX emulation, where the ldxr/st{l}xr pair
implementation that clears the PSTATE.PAN bit can be replaced with the
corresponding unprivileged load/store atomic operations that do not clear
the PSTATE.PAN bit.

Patch Sequences
================

Patch #1 adds cpufeature for FEAT_LSUI
Patch #2-#3 expose FEAT_LSUI to guest
Patch #4 adds Kconfig for FEAT_LSUI
Patch #5-#6 support futex atomic-op with FEAT_LSUI
Patch #7-#9 support user_swpX emulation with FEAT_LSUI

Patch History
==============
from v9 to v10:
  - apply FEAT_LSUI to user_swpX emulation.
  - add test coverage for LSUI bit in ID_AA64ISAR3_EL1
  - rebase to v6.18-rc4
  - https://lore.kernel.org/all/20250922102244.2068414-1-yeoreum.yun@arm.com/

from v8 to v9:
  - refactor __lsui_cmpxchg64()
  - rebase to v6.17-rc7
  - https://lore.kernel.org/all/20250917110838.917281-1-yeoreum.yun@arm.com/

from v7 to v8:
  - implement futex_atomic_eor() and futex_atomic_cmpxchg() with casalt
    with C helper.
  - Drop the small optimisation on ll/sc futex_atomic_set operation.
  - modify some commit messages.
  - https://lore.kernel.org/all/20250816151929.197589-1-yeoreum.yun@arm.com/

from v6 to v7:
  - wrap FEAT_LSUI with CONFIG_AS_HAS_LSUI in cpufeature
  - remove unnecessary addition of indentation.
  - remove unnecessary mte_tco_enable()/disable() on LSUI operation.
  - https://lore.kernel.org/all/20250811163635.1562145-1-yeoreum.yun@arm.com/

from v5 to v6:
  - rebase to v6.17-rc1
  - https://lore.kernel.org/all/20250722121956.1509403-1-yeoreum.yun@arm.com/

from v4 to v5:
  - remove futex_ll_sc.h, futex_lsui and lsui.h and move them to futex.h
  - reorganize the patches.
  - https://lore.kernel.org/all/20250721083618.2743569-1-yeoreum.yun@arm.com/

from v3 to v4:
  - rebase to v6.16-rc7
  - modify some patch titles.
  - https://lore.kernel.org/all/20250617183635.1266015-1-yeoreum.yun@arm.com/

from v2 to v3:
  - expose FEAT_LSUI to guest
  - add help section for LSUI Kconfig
  - https://lore.kernel.org/all/20250611151154.46362-1-yeoreum.yun@arm.com/

from v1 to v2:
  - remove empty v9.6 menu entry
  - locate HAS_LSUI in cpucaps in order
  - https://lore.kernel.org/all/20250611104916.10636-1-yeoreum.yun@arm.com/

Yeoreum Yun (9):
  arm64: cpufeature: add FEAT_LSUI
  KVM: arm64: expose FEAT_LSUI to guest
  KVM: arm64: kselftest: set_id_regs: add test for FEAT_LSUI
  arm64: Kconfig: Detect toolchain support for LSUI
  arm64: futex: refactor futex atomic operation
  arm64: futex: support futex with FEAT_LSUI
  arm64: separate common LSUI definitions into lsui.h
  arm64: armv8_deprecated: convert user_swpX to inline function
  arm64: armv8_deprecated: apply FEAT_LSUI for swpX emulation.

 arch/arm64/Kconfig                            |   5 +
 arch/arm64/include/asm/futex.h                | 291 +++++++++++++++---
 arch/arm64/include/asm/lsui.h                 |  25 ++
 arch/arm64/kernel/armv8_deprecated.c          |  86 +++++-
 arch/arm64/kernel/cpufeature.c                |  10 +
 arch/arm64/kvm/sys_regs.c                     |   3 +-
 arch/arm64/tools/cpucaps                      |   1 +
 .../testing/selftests/kvm/arm64/set_id_regs.c |   1 +
 8 files changed, 360 insertions(+), 62 deletions(-)
 create mode 100644 arch/arm64/include/asm/lsui.h

base-commit: 6146a0f1dfae5d37442a9ddcba012add260bceb0
-- 
LEVI:{C3F47F37-75D8-414A-A8BA-3980EC8A46D7}

2 months, 1 week

2
13
0 0

[PATCH] kunit: Implement ftrace-based stubbing

by Eddie Phillips

Allow function redirection using ftrace. This is basically equivalent to
the static_stub support in the previous patch, but does not require the
function being replaced to be modified (save for the addition of
KUNIT_STUBBABLE/noinline).

This is hidden behind the CONFIG_KUNIT_FTRACE_STUBS option, and has a
number of dependencies, including ftrace and CONFIG_KALLSYMS_ALL. As a
result, it only works on architectures where these are available.

You can run the KUnit example tests with the following:
$ ./tools/testing/kunit/kunit.py run --kunitconfig lib/kunit/stubs_example.kunitconfig --arch=x86_64

To the end user, replacing a function is very simple, e.g.
  KUNIT_STUBBABLE void real_func(int n);
  void replacement_func(int n);

  /* in tests */
  kunit_activate_ftrace_stub(test, real_func, replacement_func);

The implementation is inspired by Steven's snippet here [1].

Some more details:
* stubbing is automatically undone at the end of tests
  * it can also be manually undone with kunit_deactivate_ftrace_stub()
* stubbing only applies when current->kunit_test == test
  * note: currently can't have more than one test running at a time
* KUNIT_STUBBABLE marks functions as noinline when
  CONFIG_KUNIT_FTRACE_STUBS is set
  * this ensures we can actually stub all calls
* KUNIT_STUBBABLE_TRAMPOLINE is a version that evaluates to
  __always_inline when stubbing is not enabled
  * This may need to be used with a wrapper function.
  * See the doc comment for more details.

Sharp-edges:
* kernel livepatch only works on some arches (not UML)
* if you don't use noinline/KUNIT_STUBBABLE, functions might be inlined
  and thus none of this works:
  * if it's always inlined, at least the attempt to stub will fail
  * if it's sometimes inlined, then the stub silently won't work

[1] https://lore.kernel.org/lkml/20220224091550.2b7e8784@gandalf.local.home

Co-developed-by: Daniel Latypov <dlatypov(a)google.com>
Signed-off-by: Eddie Phillips <eddiephillips(a)google.com>
---
Link to original: https://lore.kernel.org/all/20220910212804.670622-3-davidgow@google.com/

 include/kunit/ftrace_stub.h         |  84 ++++++++++++++++
 lib/kunit/Kconfig                   |  11 +++
 lib/kunit/Makefile                  |   4 +
 lib/kunit/ftrace_stub.c             | 146 ++++++++++++++++++++++++++++
 lib/kunit/kunit-example-test.c      |  29 +++++-
 lib/kunit/stubs_example.kunitconfig |  10 ++
 6 files changed, 282 insertions(+), 2 deletions(-)
 create mode 100644 include/kunit/ftrace_stub.h
 create mode 100644 lib/kunit/ftrace_stub.c
 create mode 100644 lib/kunit/stubs_example.kunitconfig

diff --git a/include/kunit/ftrace_stub.h b/include/kunit/ftrace_stub.h
new file mode 100644
index 000000000000..bfd57ea6289c
--- /dev/null
+++ b/include/kunit/ftrace_stub.h
@@ -0,0 +1,84 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _KUNIT_FTRACE_STUB_H
+#define _KUNIT_FTRACE_STUB_H
+
+/** KUNIT_STUBBABLE - marks a function as stubbable when stubbing support is
+ * enabled.
+ *
+ * Stubbing uses ftrace internally, so we can only stub out functions when they
+ * are not inlined. This macro evaluates to noinline when stubbing support is
+ * enabled to thus make it safe.
+ *
+ * If you cannot add this annotation to the function, you can instead use
+ * KUNIT_STUBBABLE_TRAMPOLINE, which is the same, but evaluates to
+ * __always_inline when stubbing is not enabled.
+ *
+ * Consider copy_to_user, which is marked as __always_inline:
+ *
+ * .. code-block:: c
+ *	static KUNIT_STUBBABLE_TRAMPOLINE unsigned long
+ *	copy_to_user_trampoline(void __user *to, const void *from, unsigned long n)
+ *	{
+ *		return copy_to_user(to, from, n);
+ *	}
+ *
+ * Then we simply need to update our code to go through this function instead
+ * (in the places where we want to stub it out).
+ */
+#if IS_ENABLED(CONFIG_KUNIT_FTRACE_STUBS)
+#define KUNIT_STUBBABLE noinline
+#define KUNIT_STUBBABLE_TRAMPOLINE noinline
+#else
+#define KUNIT_STUBBABLE
+#define KUNIT_STUBBABLE_TRAMPOLINE __always_inline
+#endif
+
+struct kunit;
+
+/**
+ * kunit_activate_ftrace_stub() - makes all calls to @func go to @replacement during @test.
+ * @test: The test context object.
+ * @func: The function to stub out, must be annotated with KUNIT_STUBBABLE.
+ * @replacement: The function to replace @func with.
+ *
+ * All calls to @func will instead call @replacement for the duration of the
+ * current test. If called from outside the test's thread, the function will
+ * not be redirected.
+ *
+ * The redirection can be disabled again with kunit_deactivate_ftrace_stub().
+ *
+ * Example:
+ *
+ * .. code-block:: c
+ *	KUNIT_STUBBABLE int real_func(int n)
+ *	{
+ *		pr_info("real_func() called with %d", n);
+ *		return 0;
+ *	}
+ *
+ *	int replacement_func(int n)
+ *	{
+ *		pr_info("replacement_func() called with %d", n);
+ *		return 42;
+ *	}
+ *
+ *	void example_test(struct kunit *test)
+ *	{
+ *		kunit_activate_ftrace_stub(test, real_func, replacement_func);
+ *		KUNIT_EXPECT_EQ(test, real_func(1), 42);
+ *	}
+ *
+ */
+#define kunit_activate_ftrace_stub(test, real_fn_addr, replacement_addr) do { \
+	typecheck_fn(typeof(&replacement_addr), real_fn_addr); \
+	__kunit_activate_ftrace_stub(test, #real_fn_addr, real_fn_addr, replacement_addr); \
+} while (0)
+
+void __kunit_activate_ftrace_stub(struct kunit *test,
+				  const char *name,
+				  void *real_fn_addr,
+				  void *replacement_addr);
+
+
+void kunit_deactivate_ftrace_stub(struct kunit *test, void *real_fn_addr);
+#endif /* _KUNIT_FTRACE_STUB_H */
diff --git a/lib/kunit/Kconfig b/lib/kunit/Kconfig
index 7a6af361d2fc..8a629017b917 100644
--- a/lib/kunit/Kconfig
+++ b/lib/kunit/Kconfig
@@ -70,6 +70,17 @@ config KUNIT_ALL_TESTS

 	  If unsure, say N.

+config KUNIT_FTRACE_STUBS
+	bool "Support for stubbing out functions in KUnit tests with ftrace and kernel livepatch"
+	depends on FTRACE=y && FUNCTION_TRACER=y && MODULES=y && DEBUG_KERNEL=y && KALLSYMS_ALL=y
+	help
+	  Builds support for stubbing out functions for the duration of KUnit
+	  test cases or suites using ftrace.
+	  See KUNIT_EXAMPLE_TEST for an example.
+
+	  NOTE: this does not work on all architectures (like UML) and
+	  relies on a lot of magic (see the dependencies list).
+
 config KUNIT_DEFAULT_ENABLED
 	bool "Default value of kunit.enable"
 	default y
diff --git a/lib/kunit/Makefile b/lib/kunit/Makefile
index 656f1fa35abc..f04f6ea4d6a8 100644
--- a/lib/kunit/Makefile
+++ b/lib/kunit/Makefile
@@ -29,3 +29,7 @@ obj-$(CONFIG_KUNIT_TEST) += assert_test.o
 endif

 obj-$(CONFIG_KUNIT_EXAMPLE_TEST) += kunit-example-test.o
+
+ifeq ($(CONFIG_KUNIT_FTRACE_STUBS),y)
+kunit-objs += ftrace_stub.o
+endif
\ No newline at end of file
diff --git a/lib/kunit/ftrace_stub.c b/lib/kunit/ftrace_stub.c
new file mode 100644
index 000000000000..b19eaa35f5ed
--- /dev/null
+++ b/lib/kunit/ftrace_stub.c
@@ -0,0 +1,146 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <kunit/ftrace_stub.h>
+#include <kunit/test.h>
+
+#include <linux/typecheck.h>
+
+#include <linux/ftrace.h>
+#include <linux/livepatch.h>
+#include <linux/sched.h>
+
+
+struct kunit_ftrace_stub_ctx {
+	struct kunit *test;
+	unsigned long real_fn_addr; /* used as a key to lookup the stub */
+	unsigned long replacement_addr;
+	struct ftrace_ops ops; /* a copy of kunit_stub_base_ops with .private set */
+};
+
+static void kunit_stub_trampoline(unsigned long ip, unsigned long parent_ip,
+				  struct ftrace_ops *ops,
+				  struct ftrace_regs *fregs)
+{
+	struct kunit_ftrace_stub_ctx *ctx = ops->private;
+	int lock_bit;
+
+	if (current->kunit_test != ctx->test)
+		return;
+
+	lock_bit = ftrace_test_recursion_trylock(ip, parent_ip);
+	KUNIT_ASSERT_GE(ctx->test, lock_bit, 0);
+
+	ftrace_regs_set_instruction_pointer(fregs, ctx->replacement_addr);
+
+	ftrace_test_recursion_unlock(lock_bit);
+}
+
+static struct ftrace_ops kunit_stub_base_ops = {
+	.func = &kunit_stub_trampoline,
+	.flags = FTRACE_OPS_FL_IPMODIFY |
+#ifndef CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS
+		FTRACE_OPS_FL_SAVE_REGS |
+#endif
+		FTRACE_OPS_FL_DYNAMIC
+};
+
+static void __kunit_ftrace_stub_resource_free(struct kunit_resource *res)
+{
+	struct kunit_ftrace_stub_ctx *ctx = res->data;
+
+	unregister_ftrace_function(&ctx->ops);
+	kfree(ctx);
+}
+
+/* Matching function for kunit_find_resource(). match_data is real_fn_addr. */
+static bool __kunit_static_stub_resource_match(struct kunit *test,
+						struct kunit_resource *res,
+						void *match_real_fn_addr)
+{
+	/* This pointer is only valid if res is a static stub resource. */
+	struct kunit_ftrace_stub_ctx *ctx = res->data;
+
+	/* Make sure the resource is a static stub resource. */
+	if (res->free != &__kunit_ftrace_stub_resource_free)
+		return false;
+
+	return ctx->real_fn_addr == (unsigned long)match_real_fn_addr;
+}
+
+void kunit_deactivate_ftrace_stub(struct kunit *test, void *real_fn_addr)
+{
+	struct kunit_resource *res;
+
+	KUNIT_ASSERT_PTR_NE_MSG(test, real_fn_addr, NULL,
+				"Tried to deactivate a NULL stub.");
+
+	/* Look up the existing stub for this function. */
+	res = kunit_find_resource(test,
+				  __kunit_static_stub_resource_match,
+				  real_fn_addr);
+
+	/* Error out if the stub doesn't exist. */
+	KUNIT_ASSERT_PTR_NE_MSG(test, res, NULL,
+				"Tried to deactivate a nonexistent stub.");
+
+	/* Free the stub. We 'put' twice, as we got a reference
+	 * from kunit_find_resource(). The free function will deactivate the
+	 * ftrace stub.
+	 */
+	kunit_remove_resource(test, res);
+	kunit_put_resource(res);
+}
+EXPORT_SYMBOL_GPL(kunit_deactivate_ftrace_stub);
+
+void __kunit_activate_ftrace_stub(struct kunit *test,
+				  const char *name,
+				  void *real_fn_addr,
+				  void *replacement_addr)
+{
+	unsigned long ftrace_ip;
+	struct kunit_ftrace_stub_ctx *ctx;
+	int ret;
+
+	ftrace_ip = ftrace_location((unsigned long)real_fn_addr);
+	if (!ftrace_ip)
+		KUNIT_FAIL_ASSERTION(test, KUNIT_ASSERTION,
+				     "%s ip is invalid: not a function, or is marked notrace or inline", name);
+
+	/* Allocate the stub context, which contains pointers to the replacement
+	 * function and the test object. It's also registered as a KUnit
+	 * resource which can be looked up by address (to deactivate manually)
+	 * and is destroyed automatically on test exit.
+	 */
+	ctx = kmalloc(sizeof(*ctx), GFP_KERNEL);
+	KUNIT_ASSERT_PTR_NE_MSG(test, ctx, NULL, "failed to allocate kunit stub for %s", name);
+
+	ctx->test = test;
+	ctx->ops = kunit_stub_base_ops;
+	ctx->ops.private = ctx;
+	ctx->real_fn_addr = (unsigned long)real_fn_addr;
+	ctx->replacement_addr = (unsigned long)replacement_addr;
+
+	ret = ftrace_set_filter_ip(&ctx->ops, ftrace_ip, 0, 0);
+	if (ret) {
+		kfree(ctx);
+		KUNIT_FAIL_ASSERTION(test, KUNIT_ASSERTION,
+				     "failed to set filter ip for %s: %d", name, ret);
+	}
+
+	ret = register_ftrace_function(&ctx->ops);
+	if (ret) {
+		kfree(ctx);
+		if (ret == -EBUSY)
+			KUNIT_FAIL_ASSERTION(
+				test, KUNIT_ASSERTION,
+				"failed to register stub (-EBUSY) for %s, likely due to already stubbing it?",
+				name);
+		KUNIT_FAIL_ASSERTION(test, KUNIT_ASSERTION,
+				     "failed to register stub for %s: %d", name,
+				     ret);
+	}
+
+	kunit_alloc_resource(test, NULL,
+			     __kunit_ftrace_stub_resource_free,
+			     GFP_KERNEL, ctx);
+}
+EXPORT_SYMBOL_GPL(__kunit_activate_ftrace_stub);
diff --git a/lib/kunit/kunit-example-test.c b/lib/kunit/kunit-example-test.c
index 9452b163956f..676ad552ae7b 100644
--- a/lib/kunit/kunit-example-test.c
+++ b/lib/kunit/kunit-example-test.c
@@ -6,8 +6,9 @@
  * Author: Brendan Higgins <brendanhiggins(a)google.com>
  */

-#include <kunit/test.h>
+#include <kunit/ftrace_stub.h>
 #include <kunit/static_stub.h>
+#include <kunit/test.h>

 /*
  * This is the most fundamental element of KUnit, the test case. A test case
@@ -152,7 +153,7 @@ static void example_all_expect_macros_test(struct kunit *test)
 }

 /* This is a function we'll replace with static stubs. */
-static int add_one(int i)
+static KUNIT_STUBBABLE int add_one(int i)
 {
 	/* This will trigger the stub if active. */
 	KUNIT_STATIC_STUB_REDIRECT(add_one, i);
@@ -221,6 +222,29 @@ static void example_static_stub_using_fn_ptr_test(struct kunit *test)
 	KUNIT_EXPECT_EQ(test, add_one(1), 2);
 }

+/*
+ * This test shows the use of dynamic stubs.
+ */
+static void example_ftrace_stub_test(struct kunit *test)
+{
+#if !IS_ENABLED(CONFIG_KUNIT_FTRACE_STUBS)
+	kunit_skip(test, "KUNIT_FTRACE_STUBS not enabled");
+#else
+	/* By default, function is not stubbed. */
+	KUNIT_EXPECT_EQ(test, add_one(1), 2);
+
+	/* Replace add_one() with subtract_one(). */
+	kunit_activate_ftrace_stub(test, add_one, subtract_one);
+
+	/* add_one() is now replaced. */
+	KUNIT_EXPECT_EQ(test, add_one(1), 0);
+
+	/* Return add_one() to normal. */
+	kunit_deactivate_ftrace_stub(test, add_one);
+	KUNIT_EXPECT_EQ(test, add_one(1), 2);
+#endif
+}
+
 static const struct example_param {
 	int value;
 } example_params_array[] = {
@@ -506,6 +530,7 @@ static struct kunit_case example_test_cases[] = {
 	KUNIT_CASE(example_all_expect_macros_test),
 	KUNIT_CASE(example_static_stub_test),
 	KUNIT_CASE(example_static_stub_using_fn_ptr_test),
+	KUNIT_CASE(example_ftrace_stub_test),
 	KUNIT_CASE(example_priv_test),
 	KUNIT_CASE_PARAM(example_params_test, example_gen_params),
 	KUNIT_CASE_PARAM_WITH_INIT(example_params_test_with_init, kunit_array_gen_params,
diff --git a/lib/kunit/stubs_example.kunitconfig b/lib/kunit/stubs_example.kunitconfig
new file mode 100644
index 000000000000..20af4da9bc75
--- /dev/null
+++ b/lib/kunit/stubs_example.kunitconfig
@@ -0,0 +1,10 @@
+CONFIG_KUNIT=y
+CONFIG_KUNIT_FTRACE_STUBS=y
+CONFIG_KUNIT_EXAMPLE_TEST=y
+
+# Dependencies
+CONFIG_FTRACE=y
+CONFIG_FUNCTION_TRACER=y
+CONFIG_MODULES=y
+CONFIG_DEBUG_KERNEL=y
+CONFIG_KALLSYMS_ALL=y
-- 
2.51.1.851.g4ebd6896fd-goog
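
The rule that stubbing only applies when current->kunit_test == test can be modelled in plain userspace C. The sketch below is not ftrace: it swaps the instruction-pointer rewrite for an explicit check at the top of the function. current_test, struct stub_ctx, activate_stub() and deactivate_stub() are hypothetical stand-ins for the kernel's per-task pointer and the KUnit resource.

```c
#include <assert.h>
#include <stddef.h>

struct test {
	const char *name;
};

/* Stand-in for current->kunit_test. */
static struct test *current_test;

struct stub_ctx {
	struct test *test;		/* test that owns the stub */
	int (*replacement)(int);	/* where calls are redirected to */
};

/* Zero-initialised: no stub is active at start. */
static struct stub_ctx add_one_stub;

static int subtract_one(int i)
{
	return i - 1;
}

static int add_one(int i)
{
	/* Redirect only when a stub is active and owned by the running test. */
	if (add_one_stub.replacement && current_test == add_one_stub.test)
		return add_one_stub.replacement(i);

	return i + 1;
}

static void activate_stub(struct test *t, int (*replacement)(int))
{
	assert(t != NULL && replacement != NULL);
	add_one_stub.test = t;
	add_one_stub.replacement = replacement;
}

static void deactivate_stub(void)
{
	add_one_stub.test = NULL;
	add_one_stub.replacement = NULL;
}
```

With a stub activated for one test, a call made while current_test points at a different test still reaches the real add_one(), which mirrors the early return in kunit_stub_trampoline().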

2 months, 1 week

1
0
0 0

[PATCH bpf-next v4 0/2] bpf: Skip bounds adjustment for conditional jumps on same scalar register

by KaFai Wan

This small patchset avoids a verifier bug warning when a conditional jump
compares a register with itself while that register holds a scalar with a
range.

v4:
  - make code better. (Alexei)

v3: https://lore.kernel.org/bpf/20251031154107.403054-1-kafai.wan@linux.dev/
  - Enhance is_scalar_branch_taken() to handle scalar case. (Eduard)
  - Update the selftest to cover all conditional jump opcodes. (Eduard)

v2: https://lore.kernel.org/bpf/20251025053017.2308823-1-kafai.wan@linux.dev/
  - Enhance is_branch_taken() and is_scalar_branch_taken() to handle
    branch direction computation for same register. (Eduard and Alexei)
  - Update the selftest.

v1: https://lore.kernel.org/bpf/20251022164457.1203756-1-kafai.wan@linux.dev/

---
KaFai Wan (2):
  bpf: Skip bounds adjustment for conditional jumps on same scalar register
  selftests/bpf: Add test for conditional jumps on same scalar register

 kernel/bpf/verifier.c                         |  31 ++++
 .../selftests/bpf/progs/verifier_bounds.c     | 154 ++++++++++++++++++
 2 files changed, 185 insertions(+)

-- 
2.43.0
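
When both operands of a conditional jump are the same register, the branch direction follows from the opcode alone for every ordering and equality comparison, and only the bit-test jump still depends on the value. A minimal sketch of that reasoning follows. enum jmp_op and same_reg_branch_taken() are hypothetical names, and this is not the code in kernel/bpf/verifier.c. It returns 1 for always taken, 0 for never taken, and -1 when the direction is unknown.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical opcode names: unsigned, signed and bit-test comparisons. */
enum jmp_op { JEQ, JNE, JGT, JGE, JLT, JLE, JSGT, JSGE, JSLT, JSLE, JSET };

/* Branch direction for "if rX <op> rX", given rX's signed range. */
static int same_reg_branch_taken(enum jmp_op op, int64_t smin, int64_t smax)
{
	assert(smin <= smax);

	switch (op) {
	case JEQ: case JGE: case JLE: case JSGE: case JSLE:
		return 1;	/* r == r, r >= r, r <= r always hold */
	case JNE: case JGT: case JLT: case JSGT: case JSLT:
		return 0;	/* r != r, r > r, r < r never hold */
	case JSET:
		/* r & r is nonzero exactly when r is nonzero. */
		if (smin == 0 && smax == 0)
			return 0;
		if (smin > 0 || smax < 0)
			return 1;
		return -1;	/* range holds both zero and nonzero values */
	}

	return -1;
}
```

Because the direction is known up front in all but one case, there is nothing to learn about the register's bounds from the comparison, which is consistent with the series title about skipping bounds adjustment.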

2 months, 1 week

4
6
0 0

[PATCH v4 0/5] mm: Refactor and improve VMA count limit code

by Kalesh Singh

Hi all,

This series refactors the VMA count limit code to improve clarity, test
coverage, and observability.

The VMA count limit, controlled by sysctl_max_map_count, is a safeguard
that prevents a single process from consuming excessive kernel memory by
creating too many memory mappings.

A major change since v3 is the first patch in the series which instead of
attempting to fix overshooting the limit now documents that this is the
intended behavior. As Hugh pointed out, the lenient check (>) in do_mmap()
and do_brk_flags() is intentional to allow for potential VMA merges or
expansions when the process is at the sysctl_max_map_count limit. The
consensus is that this historical behavior is correct but non-obvious.

This series now focuses on making that behavior clear and the surrounding
code more robust. Based on feedback from Lorenzo and David, this series
retains the helper function and the rename of map_count.

The refined v4 series is now structured as follows:

1. Documents the lenient VMA count checks with comments to clarify
   their purpose.

2. Adds a comprehensive selftest to codify the expected behavior at the
   limit, including the lenient mmap case.

3. Introduces max_vma_count() to abstract the max map count sysctl,
   making the sysctl static and converting all callers to use the new
   helper.

4. Renames mm_struct->map_count to the more explicit vma_count for
   better code clarity.

5. Adds a tracepoint for observability when a process fails to
   allocate a VMA due to the count limit.

Tested on x86_64 and arm64:

1. Build test:
     allyesconfig for rename

2. Selftests:
     cd tools/testing/selftests/mm && \
       make && \
       ./run_vmtests.sh -t max_vma_count

3. vma tests:
     cd tools/testing/vma && \
       make && \
       ./vma

Link to v3:
https://lore.kernel.org/r/20251013235259.589015-1-kaleshsingh@google.com/

Thanks to everyone for the valuable discussion on previous revisions.

-- Kalesh

Kalesh Singh (5):
  mm: Document lenient map_count checks
  mm/selftests: add max_vma_count tests
  mm: Introduce max_vma_count() to abstract the max map count sysctl
  mm: rename mm_struct::map_count to vma_count
  mm/tracing: introduce trace_mm_insufficient_vma_slots event

 MAINTAINERS                                   |   2 +
 fs/binfmt_elf.c                               |   2 +-
 fs/coredump.c                                 |   2 +-
 include/linux/mm.h                            |   2 -
 include/linux/mm_types.h                      |   2 +-
 include/trace/events/vma.h                    |  32 +
 kernel/fork.c                                 |   2 +-
 mm/debug.c                                    |   2 +-
 mm/internal.h                                 |   3 +
 mm/mmap.c                                     |  25 +-
 mm/mremap.c                                   |  13 +-
 mm/nommu.c                                    |   8 +-
 mm/util.c                                     |   1 -
 mm/vma.c                                      |  42 +-
 mm/vma_internal.h                             |   2 +
 tools/testing/selftests/mm/.gitignore         |   1 +
 tools/testing/selftests/mm/Makefile           |   1 +
 .../selftests/mm/max_vma_count_tests.c        | 716 ++++++++++++++++++
 tools/testing/selftests/mm/run_vmtests.sh     |   5 +
 tools/testing/vma/vma.c                       |  32 +-
 tools/testing/vma/vma_internal.h              |  13 +-
 21 files changed, 856 insertions(+), 52 deletions(-)
 create mode 100644 include/trace/events/vma.h
 create mode 100644 tools/testing/selftests/mm/max_vma_count_tests.c

base-commit: b227c04932039bccc21a0a89cd6df50fa57e4716
-- 
2.51.1.851.g4ebd6896fd-goog
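
The difference between the lenient check and a stricter alternative is a single comparison operator. The sketch below models both in userspace. vma_limit_exceeded_lenient() and vma_limit_exceeded_strict() are hypothetical names, and the strict variant is included only for contrast; it is not something the series adds.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Lenient form (>): a process sitting exactly at the limit may still
 * proceed, because the new mapping might merge into or expand an
 * existing VMA instead of adding one.
 */
static bool vma_limit_exceeded_lenient(int vma_count, int max_count)
{
	assert(vma_count >= 0 && max_count >= 0);
	return vma_count > max_count;
}

/* Strict form (>=), for contrast: refuses as soon as the limit is reached. */
static bool vma_limit_exceeded_strict(int vma_count, int max_count)
{
	assert(vma_count >= 0 && max_count >= 0);
	return vma_count >= max_count;
}
```

At vma_count == max_count the two disagree: the lenient form lets the call continue, so a mapping that cannot be merged can leave the process one VMA over the limit, which is the overshoot the first patch documents as intended.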

2 months, 1 week

2
8
0 0

[PATCH bpf-next 0/3] selftests/bpf: small improvements on tc_tunnel

by Alexis Lothoré (eBPF Foundation)

Hello,
this series is a small follow-up to the test_tc_tunnel recent integration,
to address some small missing details raised during the final review
([1]). This is mostly about adding some missing checks on net namespaces
management.

[1] https://lore.kernel.org/bpf/1ac9d14e-4250-480c-b863-410be78ac6c6@linux.dev/

Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore(a)bootlin.com>
---
Alexis Lothoré (eBPF Foundation) (3):
      selftests/bpf: skip tc_tunnel subtest if its setup fails
      selftests/bpf: add checks in tc_tunnel when entering net namespaces
      selftests/bpf: use start_server_str rather than start_reuseport_server in tc_tunnel

 .../selftests/bpf/prog_tests/test_tc_tunnel.c      | 162 ++++++++++++++-------
 1 file changed, 107 insertions(+), 55 deletions(-)
---
base-commit: 1e2d874b04ba46a3b9fe6697097aa437641f4339
change-id: 20251030-tc_tunnel_improv-6b9d1c22c6f6

Best regards,
-- 
Alexis Lothoré, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

2 months, 1 week

3
5
0 0

[PATCH net v3 0/3] mptcp: Fix conflicts between MPTCP and sockmap

by Jiayuan Chen

Overall, we encountered a warning [1] that can be triggered by running the
selftest I provided.

MPTCP creates subflows for data transmission between two endpoints.
However, BPF can use sockops to perform additional operations when TCP
completes the three-way handshake. The issue arose because we used sockmap
in sockops, which replaces sk->sk_prot and some handlers. Since subflows
also have their own specialized handlers, this creates a conflict and leads
to traffic failure. Therefore, we need to reject operations targeting
subflows.

This patchset simply prevents the combination of subflows and sockmap
without changing any functionality.

A complete integration of MPTCP and sockmap would require more effort, for
example, we would need to retrieve the parent socket from subflows in
sockmap and implement handlers like read_skb.

If maintainers don't object, we can further improve this in subsequent
work.

[1] truncated warning:
[ 18.234652] ------------[ cut here ]------------
[ 18.234664] WARNING: CPU: 1 PID: 388 at net/mptcp/protocol.c:68 mptcp_stream_accept+0x34c/0x380
[ 18.234726] Modules linked in:
[ 18.234755] RIP: 0010:mptcp_stream_accept+0x34c/0x380
[ 18.234762] RSP: 0018:ffffc90000cf3cf8 EFLAGS: 00010202
[ 18.234800] PKRU: 55555554
[ 18.234806] Call Trace:
[ 18.234810] <TASK>
[ 18.234837] do_accept+0xeb/0x190
[ 18.234861] ? __x64_sys_pselect6+0x61/0x80
[ 18.234898] ? _raw_spin_unlock+0x12/0x30
[ 18.234915] ? alloc_fd+0x11e/0x190
[ 18.234925] __sys_accept4+0x8c/0x100
[ 18.234930] __x64_sys_accept+0x1f/0x30
[ 18.234933] x64_sys_call+0x202f/0x20f0
[ 18.234966] do_syscall_64+0x72/0x9a0
[ 18.234979] ? switch_fpu_return+0x60/0xf0
[ 18.234993] ? irqentry_exit_to_user_mode+0xdb/0x1e0
[ 18.235002] ? irqentry_exit+0x3f/0x50
[ 18.235005] ? clear_bhb_loop+0x50/0xa0
[ 18.235022] ? clear_bhb_loop+0x50/0xa0
[ 18.235025] ? clear_bhb_loop+0x50/0xa0
[ 18.235028] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 18.235066] </TASK>
[ 18.235109] ---[ end trace 0000000000000000 ]---

---
v2: https://lore.kernel.org/bpf/20251020060503.325369-1-jiayuan.chen@linux.dev/…
    Some advice suggested by Jakub Sitnicki

v1: https://lore.kernel.org/mptcp/a0a2b87119a06c5ffaa51427a0964a05534fe6f1@linu…
    Some advice from Matthieu Baerts.

Jiayuan Chen (3):
  net,mptcp: fix proto fallback detection with BPF sockmap
  bpf,sockmap: disallow MPTCP sockets from sockmap
  selftests/bpf: Add mptcp test with sockmap

 net/core/sock_map.c                           |  27 ++++
 net/mptcp/protocol.c                          |   9 +-
 .../testing/selftests/bpf/prog_tests/mptcp.c  | 150 ++++++++++++++++++
 .../selftests/bpf/progs/mptcp_sockmap.c       |  43 +++++
 4 files changed, 227 insertions(+), 2 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/mptcp_sockmap.c

-- 
2.43.0

2 months, 1 week

3
16
0 0

[PATCH] selftests/timers: Skip some posix_timers tests on kernels < 6.13

by Wake Liu

Several tests in the posix_timers selftest fail on kernels older than 6.13.
These tests check for timer behavior related to SIG_IGN, which was
refactored in the 6.13 kernel cycle, notably by commit caf77435dd8a
("signal: Handle ignored signals in do_sigaction(action != SIG_IGN)").

To ensure the selftests pass on older, stable kernels, gate the affected
tests with a ksft_min_kernel_version(6, 13) check.

Signed-off-by: Wake Liu <wakel(a)google.com>
---
 tools/testing/selftests/timers/posix_timers.c | 21 +++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/tools/testing/selftests/timers/posix_timers.c b/tools/testing/selftests/timers/posix_timers.c
index f0eceb0faf34..f228e51f8b58 100644
--- a/tools/testing/selftests/timers/posix_timers.c
+++ b/tools/testing/selftests/timers/posix_timers.c
@@ -256,6 +256,11 @@ static void *ignore_thread(void *arg)

 static void check_sig_ign(int thread)
 {
+	if (!ksft_min_kernel_version(6, 13)) {
+		// see caf77435dd8a
+		ksft_test_result_skip("Depends on refactor of posix timers in 6.13\n");
+		return;
+	}
 	struct tmrsig tsig = { };
 	struct itimerspec its;
 	unsigned int tid = 0;
@@ -342,6 +347,10 @@ static void check_sig_ign(int thread)

 static void check_rearm(void)
 {
+	if (!ksft_min_kernel_version(6, 13)) {
+		ksft_test_result_skip("Depends on refactor of posix timers in 6.13\n");
+		return;
+	}
 	struct tmrsig tsig = { };
 	struct itimerspec its;
 	struct sigaction sa;
@@ -398,6 +407,10 @@ static void check_rearm(void)

 static void check_delete(void)
 {
+	if (!ksft_min_kernel_version(6, 13)) {
+		ksft_test_result_skip("Depends on refactor of posix timers in 6.13\n");
+		return;
+	}
 	struct tmrsig tsig = { };
 	struct itimerspec its;
 	struct sigaction sa;
@@ -455,6 +468,10 @@ static inline int64_t calcdiff_ns(struct timespec t1, struct timespec t2)

 static void check_sigev_none(int which, const char *name)
 {
+	if (!ksft_min_kernel_version(6, 13)) {
+		ksft_test_result_skip("Depends on refactor of posix timers in 6.13\n");
+		return;
+	}
 	struct timespec start, now;
 	struct itimerspec its;
 	struct sigevent sev;
@@ -493,6 +510,10 @@ static void check_sigev_none(int which, const char *name)

 static void check_gettime(int which, const char *name)
 {
+	if (!ksft_min_kernel_version(6, 13)) {
+		ksft_test_result_skip("Depends on refactor of posix timers in 6.13\n");
+		return;
+	}
 	struct itimerspec its, prev;
 	struct timespec start, now;
 	struct sigevent sev;
-- 
2.50.1.703.g449372360f-goog
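
The gate itself is a major/minor comparison against the running kernel's release string. The sketch below shows that comparison on a string passed in by the caller. release_at_least() is a hypothetical helper written for illustration, not the kselftest implementation of ksft_min_kernel_version().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Returns 1 if release (for example "6.13.0-rc1" or "6.12.9") is at least
 * min_major.min_minor, 0 if it is older, and -1 if it cannot be parsed.
 */
static int release_at_least(const char *release, unsigned int min_major,
			    unsigned int min_minor)
{
	unsigned int major, minor;

	assert(release != NULL);

	/* Only the first two numeric components matter for the gate. */
	if (sscanf(release, "%u.%u", &major, &minor) != 2)
		return -1;

	return major > min_major ||
	       (major == min_major && minor >= min_minor);
}
```

In a test, the release string would typically come from uname(2); passing it in as a parameter keeps the comparison easy to check in isolation.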

2 months, 1 week

2
2
0 0

[PATCH bpf-next 0/4] selftests/bpf: convert test_tc_edt.sh into test_progs

by Alexis Lothoré (eBPF Foundation)

Hello,
this is yet another conversion series, this time tackling test_tc_edt.sh.
This one was at the bottom of our list because it is based on a bandwidth
measurement (which increases the risk of making it flaky in CI), but here
is an attempt anyway, as it also showcases a nice example of BPF-based
rate shaping.

The converted test roughly follows the original script logic, with two
veths in two namespaces, a TCP connection between a client and a server,
and the client pushing as much data as possible during a specific period.
We then compute the effective data rate, shaped by the eBPF program, by
reading the RX interface stats, and compare it to the target rate. The
test passes if the measured rate is within a defined error margin.

There are two knobs driving the robustness of the test in CI:
- the test duration (the longer it runs, the more precise the effective
  rate is)
- the tolerated error margin

The original test was configured with a 20s duration and a 1% error
margin. The new test is configured with a 2s duration and a 2% error
margin, to:
- make the duration tolerable in CI
- while keeping enough margin for rate measurement fluctuations depending
  on the load of the CI machines

This has been run multiple times locally to ensure that those values are
sane, and once in CI before sending the series, but I suggest letting it
run for a few days in CI to see how it really behaves.

Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore(a)bootlin.com>
---
Alexis Lothoré (eBPF Foundation) (4):
      selftests/bpf: rename test_tc_edt.bpf.c section to expose program type
      selftests/bpf: integrate test_tc_edt into test_progs
      selftests/bpf: remove test_tc_edt.sh
      selftests/bpf: do not hardcode target rate in test_tc_edt BPF program

 tools/testing/selftests/bpf/Makefile               |   2 -
 .../testing/selftests/bpf/prog_tests/test_tc_edt.c | 274 +++++++++++++++++++++
 tools/testing/selftests/bpf/progs/test_tc_edt.c    |   9 +-
 tools/testing/selftests/bpf/test_tc_edt.sh         | 100 --------
 4 files changed, 279 insertions(+), 106 deletions(-)
---
base-commit: 1e2d874b04ba46a3b9fe6697097aa437641f4339
change-id: 20251030-tc_edt-3ea8e8d3d14e

Best regards,
-- 
Alexis Lothoré, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com
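
The pass criterion described in this cover letter reduces to computing the effective rate from the RX byte counter and comparing it with the target within a relative margin. A minimal sketch follows, with hypothetical names (effective_rate_bps(), rate_within_margin()) and with the caller supplying the byte count and duration rather than reading interface stats. It is not the code from prog_tests/test_tc_edt.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Effective rate in bits per second from bytes received over duration_ms. */
static uint64_t effective_rate_bps(uint64_t rx_bytes, uint64_t duration_ms)
{
	assert(duration_ms > 0);
	return rx_bytes * 8 * 1000 / duration_ms;
}

/* True when measured_bps is within margin_pct percent of target_bps. */
static bool rate_within_margin(uint64_t measured_bps, uint64_t target_bps,
			       unsigned int margin_pct)
{
	uint64_t diff = measured_bps > target_bps ?
			measured_bps - target_bps : target_bps - measured_bps;

	/* diff / target <= margin_pct / 100, kept in integer arithmetic. */
	return diff * 100 <= target_bps * margin_pct;
}
```

For a fixed absolute jitter in the byte count, a shorter run produces a larger relative error in the computed rate, which is why the shorter 2s duration is paired with the wider 2% margin.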

2 months, 1 week

4
7
0 0

[PATCH bpf-next v3 0/2] bpf: Skip bounds adjustment for conditional jumps on same scalar register

by KaFai Wan

This small patchset avoids a verifier bug warning when a conditional jump
compares a register with itself while that register holds a scalar with a
range.

v3:
  - Enhance is_scalar_branch_taken() to handle scalar case. (Eduard)
  - Update the selftest to cover all conditional jump opcodes. (Eduard)

v2: https://lore.kernel.org/bpf/20251025053017.2308823-1-kafai.wan@linux.dev/
  - Enhance is_branch_taken() and is_scalar_branch_taken() to handle
    branch direction computation for same register. (Eduard and Alexei)
  - Update the selftest.

v1: https://lore.kernel.org/bpf/20251022164457.1203756-1-kafai.wan@linux.dev/

---
KaFai Wan (2):
  bpf: Skip bounds adjustment for conditional jumps on same scalar register
  selftests/bpf: Add test for conditional jumps on same scalar register

 kernel/bpf/verifier.c                         |  33 ++++
 .../selftests/bpf/progs/verifier_bounds.c     | 154 ++++++++++++++++++
 2 files changed, 187 insertions(+)

-- 
2.43.0

2 months, 1 week

2
4
0 0

[PATCH 00/12] tools/nolibc: always use 64-bit ino_t, off_t and time-related types

by Thomas Weißschuh

nolibc currently uses 32-bit types for various APIs. These are problematic
as their reduced value range can lead to truncated values.

Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
---
Thomas Weißschuh (12):
      tools/nolibc: use 64-bit ino_t
      tools/nolibc: handle 64-bit off_t for llseek
      tools/nolibc: prefer the llseek syscall
      tools/nolibc: use 64-bit off_t
      tools/nolibc: remove now superfluous overflow check in llseek
      tools/nolibc: remove more __nolibc_enosys() fallbacks
      tools/nolibc: prefer explicit 64-bit time-related system calls
      tools/nolibc: gettimeofday(): avoid libgcc 64-bit divisions
      tools/nolibc: use a custom struct timespec
      tools/nolibc: always use 64-bit time types
      selftests/nolibc: test compatibility of timespec and __kernel_timespec
      tools/nolibc: remove time conversions

 tools/include/nolibc/arch-s390.h             |  3 +
 tools/include/nolibc/poll.h                  | 12 ++--
 tools/include/nolibc/std.h                   |  6 +-
 tools/include/nolibc/sys.h                   | 21 +++---
 tools/include/nolibc/sys/time.h              |  2 +-
 tools/include/nolibc/sys/timerfd.h           | 20 +-----
 tools/include/nolibc/time.h                  | 96 ++++++----------------------
 tools/include/nolibc/types.h                 |  9 ++-
 tools/testing/selftests/nolibc/nolibc-test.c | 18 ++++++
 9 files changed, 68 insertions(+), 119 deletions(-)
---
base-commit: 90ee85c0e1e4b5804ceebbd731653e10ef3849a6
change-id: 20251001-nolibc-uapi-types-1c072d10fcc7

Best regards,
-- 
Thomas Weißschuh <linux(a)weissschuh.net>
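
The truncation risk is easy to reproduce on a host: a value that needs more than 31 bits cannot be represented in a 32-bit signed type, and a 32-bit field keeps only its low bits. The sketch below checks representability and shows what would be kept. fits_in_s32() and low_32_bits() are hypothetical helpers, and the sample values used with them (a 5 GiB file offset and the first second past the 32-bit signed limit) are illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when value can be stored in a 32-bit signed type without loss. */
static bool fits_in_s32(int64_t value)
{
	return value >= INT32_MIN && value <= INT32_MAX;
}

/*
 * What a 32-bit field would keep: only the low 32 bits of the value.
 * Conversion to an unsigned type is well defined (modulo 2^32).
 */
static uint32_t low_32_bits(int64_t value)
{
	return (uint32_t)value;
}
```

For example, an offset of 5 GiB (5368709120) does not fit, and its low 32 bits read back as 1 GiB, so a seek through a 32-bit off_t would land in the wrong place without any error being reported.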

2 months, 1 week


[PATCH v2 0/2] Print map ID on successful creation

by Harshit Mogalapalli

Hi all,

I have tried looking at an issue from the bpftool repository: https://github.com/libbpf/bpftool/issues/121 and this RFC tries to add that enhancement.

Summary: Currently, when a map creation is successful, there is no message on the terminal. Printing IDs on successful creation of maps can help notify the user and can be used in CI/CD.

The first patch adds the logic for printing and the second patch adds a simple selftest for the same.

The github issue is not fully solved with these two patches, as there are other bpf objects that might need similar additions.

Would appreciate any inputs on this.

Thank you very much.

V1 --> V2: PATCH 1 updated [Thanks Yonghong for suggesting better way of error handling with a new label for close(fd); instead of calling multiple times]

Regards,
Harshit

Harshit Mogalapalli (2):
  bpftool: Print map ID upon creation and support JSON output
  selftests/bpf: Add test for bpftool map ID printing

 tools/bpf/bpftool/map.c                            | 21 ++++++++---
 .../testing/selftests/bpf/test_bpftool_map.sh      | 36 +++++++++++++++++++
 2 files changed, 53 insertions(+), 4 deletions(-)

--
2.50.1

2 months, 1 week


[PATCH 00/22] mm/damon/tests: fix memory bugs in kunit tests

by SeongJae Park

DAMON kunit tests were initially written assuming those will be run on environments that are well controlled and therefore tolerant to transient test failures and bugs in the test code itself. The user-mode linux based manual run of the tests is one example of such an environment. And the test code was written for adding more test coverage as fast as possible, over making those safe and reliable.

As a result, the tests resulted in having a number of bugs including real memory leaks, theoretical unhandled memory allocation failures, and unused memory allocations. The allocation failures that are not handled well are unlikely in the real world, since those allocations are too small to fail. But in theory, it can happen and cause inappropriate memory access.

It is arguable if bugs in test code can really harm users. But, anyway bugs are bugs that need to be fixed. Fix the bugs one by one. Also Cc stable@ for the fixes of memory leak and unhandled memory allocation failures. The unused memory allocations are only a matter of memory efficiency, so not Cc-ing stable@.

The first patch fixes memory leaks in the test code for the DAMON core layer.

Following fifteen, three, and one patches respectively fix unhandled memory allocation failures in the test code for DAMON core layer, virtual address space DAMON operation set, and DAMON sysfs interface, one by one per test function.

Final two patches remove memory allocations that are correctly deallocated at the end, but not really being used by any code.

SeongJae Park (22):
  mm/damon/tests/core-kunit: fix memory leak in damon_test_set_filters_default_reject()
  mm/damon/tests/core-kunit: handle allocation failures in damon_test_regions()
  mm/damon/tests/core-kunit: handle memory failure from damon_test_target()
  mm/damon/tests/core-kunit: handle memory alloc failure from damon_test_aggregate()
  mm/damon/tests/core-kunit: handle alloc failures on damon_test_split_at()
  mm/damon/tests/core-kunit: handle alloc failures on damon_test_merge_two()
  mm/damon/tests/core-kunit: handle alloc failures on dasmon_test_merge_regions_of()
  mm/damon/tests/core-kunit: handle alloc failures on damon_test_split_regions_of()
  mm/damon/tests/core-kunit: handle alloc failures in damon_test_ops_registration()
  mm/damon/tests/core-kunit: handle alloc failures in damon_test_set_regions()
  mm/damon/tests/core-kunit: handle alloc failures in damon_test_update_monitoring_result()
  mm/damon/tests/core-kunit: handle alloc failure on damon_test_set_attrs()
  mm/damon/tests/core-kunit: handle alloc failres in damon_test_new_filter()
  mm/damon/tests/core-kunit: handle alloc failure on damos_test_commit_filter()
  mm/damon/tests/core-kunit: handle alloc failures on damos_test_filter_out()
  mm/damon/tests/core-kunit: handle alloc failures on damon_test_set_filters_default_reject()
  mm/damon/tests/vaddr-kunit: handle alloc failures on damon_do_test_apply_three_regions()
  mm/damon/tests/vaddr-kunit: handle alloc failures in damon_test_split_evenly_fail()
  mm/damon/tests/vaddr-kunit: handle alloc failures on damon_test_split_evenly_succ()
  mm/damon/tests/sysfs-kunit: handle alloc failures on damon_sysfs_test_add_targets()
  mm/damon/tests/core-kunit: remove unnecessary damon_ctx variable on damon_test_split_at()
  mm/damon/tests/core-kunit: remove unused ctx in damon_test_split_regions_of()

 mm/damon/tests/core-kunit.h  | 125 ++++++++++++++++++++++++++++++++---
 mm/damon/tests/sysfs-kunit.h |  25 +++++++
 mm/damon/tests/vaddr-kunit.h |  26 +++++++-
 3 files changed, 163 insertions(+), 13 deletions(-)

base-commit: 75f0c76bb8c01fdea838a601dc3326b11177c0d8
--
2.47.3

2 months, 1 week


[PATCH] selftests/user_events: Avoid taking address of packed member in perf_test

by Ankit Khushwaha

Accessing 'reg.write_index' directly triggers a -Waddress-of-packed-member warning due to potential unaligned pointer access:

perf_test.c:239:38: warning: taking address of packed member 'write_index' of class or structure 'user_reg' may result in an unaligned pointer value [-Waddress-of-packed-member]
  239 |         ASSERT_NE(-1, write(self->data_fd, &reg.write_index,
      |                                             ^~~~~~~~~~~~~~~

Use memcpy() instead to safely copy the value and avoid unaligned pointer access across architectures.

Signed-off-by: Ankit Khushwaha <ankitkhushwaha.linux(a)gmail.com>
---
 tools/testing/selftests/user_events/perf_test.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/user_events/perf_test.c b/tools/testing/selftests/user_events/perf_test.c
index 201459d8094d..e4385f4aa231 100644
--- a/tools/testing/selftests/user_events/perf_test.c
+++ b/tools/testing/selftests/user_events/perf_test.c
@@ -201,6 +201,7 @@ TEST_F(user, perf_empty_events) {
 	struct perf_event_mmap_page *perf_page;
 	int page_size = sysconf(_SC_PAGESIZE);
 	int id, fd;
+	__u32 write_index;
 	__u32 *val;

 	reg.size = sizeof(reg);
@@ -236,7 +237,8 @@ TEST_F(user, perf_empty_events) {
 	ASSERT_EQ(1 << reg.enable_bit, self->check);

 	/* Ensure write shows up at correct offset */
-	ASSERT_NE(-1, write(self->data_fd, &reg.write_index,
+	memcpy(&write_index, &reg.write_index, sizeof(reg.write_index));
+	ASSERT_NE(-1, write(self->data_fd, &write_index,
 			    sizeof(reg.write_index)));
 	val = (void *)(((char *)perf_page) + perf_page->data_offset);
 	ASSERT_EQ(PERF_RECORD_SAMPLE, *val);
--
2.51.0
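For readers unfamiliar with the pattern, a standalone sketch of copying a packed member into an aligned local follows. The struct here is a made-up stand-in, not the real struct user_reg, and it relies on the GCC/Clang packed attribute.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical packed layout: 'index' sits at offset 1, so it is misaligned. */
struct packed_reg {
	uint8_t  flags;
	uint32_t index;
} __attribute__((packed));

/* Copy the member into an aligned local instead of handing out a pointer to it. */
static uint32_t read_index(const struct packed_reg *reg)
{
	uint32_t copy;

	memcpy(&copy, &reg->index, sizeof(copy));
	return copy;
}
```

The local copy has natural alignment, so its address can be passed on to code that expects an ordinary __u32 buffer.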

2 months, 1 week


[PATCH net v2] selftests: netdevsim: Fix ethtool-coalesce.sh fail by installing ethtool-common.sh

by Wang Liang

The script "ethtool-common.sh" is not installed in INSTALL_PATH, and triggers some errors when I try to run the test 'drivers/net/netdevsim/ethtool-coalesce.sh':

  TAP version 13
  1..1
  # timeout set to 600
  # selftests: drivers/net/netdevsim: ethtool-coalesce.sh
  # ./ethtool-coalesce.sh: line 4: ethtool-common.sh: No such file or directory
  # ./ethtool-coalesce.sh: line 25: make_netdev: command not found
  # ethtool: bad command line argument(s)
  # ./ethtool-coalesce.sh: line 124: check: command not found
  # ./ethtool-coalesce.sh: line 126: [: -eq: unary operator expected
  # FAILED /0 checks
  not ok 1 selftests: drivers/net/netdevsim: ethtool-coalesce.sh # exit=1

Install this file to avoid this error. After this patch:

  TAP version 13
  1..1
  # timeout set to 600
  # selftests: drivers/net/netdevsim: ethtool-coalesce.sh
  # PASSED all 22 checks
  ok 1 selftests: drivers/net/netdevsim: ethtool-coalesce.sh

Fixes: fbb8531e58bd ("selftests: extract common functions in ethtool-common.sh")
Signed-off-by: Wang Liang <wangliang74(a)huawei.com>
---
 tools/testing/selftests/drivers/net/netdevsim/Makefile | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/tools/testing/selftests/drivers/net/netdevsim/Makefile b/tools/testing/selftests/drivers/net/netdevsim/Makefile
index daf51113c827..df10c7243511 100644
--- a/tools/testing/selftests/drivers/net/netdevsim/Makefile
+++ b/tools/testing/selftests/drivers/net/netdevsim/Makefile
@@ -20,4 +20,8 @@ TEST_PROGS := \
 	udp_tunnel_nic.sh \
 # end of TEST_PROGS

+TEST_FILES := \
+	ethtool-common.sh
+# end of TEST_FILES
+
 include ../../../lib.mk
--
2.34.1

2 months, 1 week


[PATCH net] selftests/net: use destination options instead of hop-by-hop

by Anubhav Singh

The GRO self-test, gro.c, currently constructs IPv6 packets containing a Hop-by-Hop Options header (IPPROTO_HOPOPTS) to ensure the GRO path correctly handles IPv6 extension headers.

However, network elements may be configured to drop packets with the Hop-by-Hop Options header (HBH). This causes the self-test to fail in environments where such network elements are present.

To improve the robustness and reliability of this test in diverse network environments, switch from using IPPROTO_HOPOPTS to IPPROTO_DSTOPTS (Destination Options).

The Destination Options header is less likely to be dropped by intermediate routers and still serves the core purpose of the test: validating GRO's handling of an IPv6 extension header. This change ensures the test can execute successfully without being incorrectly failed by network policies outside the kernel's control.

Fixes: 7d1575014a63 ("selftests/net: GRO coalesce test")
Reviewed-by: Willem de Bruijn <willemb(a)google.com>
Signed-off-by: Anubhav Singh <anubhavsinggh(a)google.com>
---
 tools/testing/selftests/net/gro.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/net/gro.c b/tools/testing/selftests/net/gro.c
index 2b1d9f2b3e9e..d8c29fe39c1d 100644
--- a/tools/testing/selftests/net/gro.c
+++ b/tools/testing/selftests/net/gro.c
@@ -754,11 +754,11 @@ static void send_ipv6_exthdr(int fd, struct sockaddr_ll *daddr, char *ext_data1,
 	static char exthdr_pck[sizeof(buf) + MIN_EXTHDR_SIZE];

 	create_packet(buf, 0, 0, PAYLOAD_LEN, 0);
-	add_ipv6_exthdr(buf, exthdr_pck, IPPROTO_HOPOPTS, ext_data1);
+	add_ipv6_exthdr(buf, exthdr_pck, IPPROTO_DSTOPTS, ext_data1);
 	write_packet(fd, exthdr_pck, total_hdr_len + PAYLOAD_LEN + MIN_EXTHDR_SIZE, daddr);

 	create_packet(buf, PAYLOAD_LEN * 1, 0, PAYLOAD_LEN, 0);
-	add_ipv6_exthdr(buf, exthdr_pck, IPPROTO_HOPOPTS, ext_data2);
+	add_ipv6_exthdr(buf, exthdr_pck, IPPROTO_DSTOPTS, ext_data2);
 	write_packet(fd, exthdr_pck, total_hdr_len + PAYLOAD_LEN + MIN_EXTHDR_SIZE, daddr);
 }
--
2.51.1.851.g4ebd6896fd-goog
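As background for the change above, this is a rough sketch of what a minimal 8-byte Destination Options header looks like on the wire. It is not the add_ipv6_exthdr() helper from gro.c; build_min_dstopts() and EXTHDR_LEN are names invented for this example, and the header carries only PadN padding.

```c
#include <stdint.h>
#include <string.h>
#include <netinet/in.h>	/* IPPROTO_DSTOPTS, IPPROTO_HOPOPTS, IPPROTO_TCP */

#define EXTHDR_LEN 8	/* smallest possible IPv6 extension header */

/*
 * Fill 'out' (EXTHDR_LEN bytes) with a Destination Options header that
 * carries only padding. Returns the protocol number to place in the
 * preceding header's Next Header field.
 */
static uint8_t build_min_dstopts(uint8_t *out, uint8_t next_header)
{
	memset(out, 0, EXTHDR_LEN);
	out[0] = next_header;	/* Next Header: what follows this header */
	out[1] = 0;		/* Hdr Ext Len: 8-octet units beyond the first 8 */
	out[2] = 1;		/* option type 1 = PadN */
	out[3] = 4;		/* PadN data length: 4 zero bytes follow */
	return IPPROTO_DSTOPTS;
}
```

A Hop-by-Hop Options header has the same layout; only the protocol number announced by the preceding header differs (0 instead of 60), which is what on-path devices key their drop policy on.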

2 months, 1 week


[PATCH net] selftests/net: fix out-of-order delivery of FIN in gro:tcp test

by Anubhav Singh

Due to the gro_sender sending data packets and FIN packets in very quick succession, these are received almost simultaneously by the gro_receiver. FIN packets are sometimes processed before the data packets leading to intermittent (~1/100) test failures.

This change adds a delay of 100ms before sending FIN packets in gro:tcp test to avoid the out-of-order delivery. The same mitigation already exists for the gro:ip test.

Fixes: 7d1575014a63 ("selftests/net: GRO coalesce test")
Reviewed-by: Willem de Bruijn <willemb(a)google.com>
Signed-off-by: Anubhav Singh <anubhavsinggh(a)google.com>
---
 tools/testing/selftests/net/gro.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/tools/testing/selftests/net/gro.c b/tools/testing/selftests/net/gro.c
index 2b1d9f2b3e9e..3fa63bd85dea 100644
--- a/tools/testing/selftests/net/gro.c
+++ b/tools/testing/selftests/net/gro.c
@@ -989,6 +989,7 @@ static void check_recv_pkts(int fd, int *correct_payload,

 static void gro_sender(void)
 {
+	const int fin_delay_us = 100 * 1000;
 	static char fin_pkt[MAX_HDR_LEN];
 	struct sockaddr_ll daddr = {};
 	int txfd = -1;
@@ -1032,15 +1033,22 @@ static void gro_sender(void)
 		write_packet(txfd, fin_pkt, total_hdr_len, &daddr);
 	} else if (strcmp(testname, "tcp") == 0) {
 		send_changed_checksum(txfd, &daddr);
+		/* Adding sleep before sending FIN so that it is not
+		 * received prior to other packets.
+		 */
+		usleep(fin_delay_us);
 		write_packet(txfd, fin_pkt, total_hdr_len, &daddr);

 		send_changed_seq(txfd, &daddr);
+		usleep(fin_delay_us);
 		write_packet(txfd, fin_pkt, total_hdr_len, &daddr);

 		send_changed_ts(txfd, &daddr);
+		usleep(fin_delay_us);
 		write_packet(txfd, fin_pkt, total_hdr_len, &daddr);

 		send_diff_opt(txfd, &daddr);
+		usleep(fin_delay_us);
 		write_packet(txfd, fin_pkt, total_hdr_len, &daddr);
 	} else if (strcmp(testname, "ip") == 0) {
 		send_changed_ECN(txfd, &daddr);
--
2.51.1.851.g4ebd6896fd-goog

2 months, 1 week


[RFC PATCH 00/21] VFIO live update support

by Vipin Sharma

Hello,

This series adds live update support in the VFIO PCI subsystem on top of Live Update Orchestrator (LUO) [1].

This series can also be found on GitHub: https://github.com/shvipin/linux vfio/liveupdate/rfc-v1

The goal of live update in the VFIO subsystem is to preserve VFIO PCI devices while the host kernel is going through a live update. A preserved device can continue to work, perform DMA, and avoid being reset while the host under live update is rebooted via kexec.

This series registers VFIO with LUO, implements LUO callbacks, skips DMA clear, skips device reset, and preserves and restores a device's virtual config during live update.

I have added a selftest towards the end of this series, vfio_pci_liveupdate_test, which sets certain properties of a VFIO PCI device, performs a live update, and then validates that those properties are still the same on the device.

Overall flow for a VFIO device going through a live update will be something like:

1. Userspace passes a VFIO cdev FD along with a token to LUO for preservation.
2. LUO passes the FD to the VFIO subsystem to verify if the FD can be preserved. If yes, it increases the refcount on the FD.
3. Eventually, userspace tells LUO to prepare for live update, which results in LUO calling the prepare() callback of each of its registered filesystem handlers with the passed FD it should be preparing.
4. VFIO subsystem saves certain properties which will be either lost or hard to recover from the device.
5. VFIO saves the needed data to KHO and provides LUO with the physical address of the data preserved by KHO.
6. Userspace sends a FREEZE event to freeze the system. LUO forwards this to each of its registered subsystems.
7. VFIO disables interrupts configured on the device during the freeze call.
8. Userspace performs kexec.
9. During kexec reboot, generally, all PCI devices get their Bus Master Enable bit disabled. In the live update case, preserved VFIO devices are skipped.
10. During boot, usual device enumeration happens and LUO also initializes itself.
11. Userspace uses the same token value (step 1), and asks LUO to return the VFIO FD corresponding to the token.
12. LUO asks VFIO to return the VFIO cdev FD corresponding to the token. It gives it the physical address which VFIO returned to it in step 5.
13. VFIO restores the KHO data and reads the BDF value it saved. It iterates through all of the VFIO devices it has in its VFIO cdev class and finds the BDF device.
14. VFIO creates an anonymous inode and file corresponding to the VFIO PCI device and returns it to LUO, and LUO returns it to userspace.
15. Now the FD returned to userspace works exactly the same as if userspace had opened a VFIO device from the /dev/vfio/device/* location.
16. It makes the usual bind iommufd and attach page table calls.
17. During bind, when the VFIO device is internally opened for the first time:
    - VFIO skips Bus Master Disable.
    - VFIO skips device reset.
    - VFIO, instead of initializing vconfig from scratch, uses the vconfig stored in KHO, and the same for a few other fields.

This is what the current series is implementing and validating through the selftest.

There are other things which are not implemented yet, and some are also dependent on other subsystems. For example:

1. Once a device has been prepared, VFIO should not allow any changes to its state from userspace, for example, changing PCI config values, resetting the device, etc.
2. Device IOVA is not preserved in this series. This work is done separately in IOMMUFD live update preservation [2].
3. During PCI device enumeration, the PCI subsystem writes to PCI config space and attaches the device to its original driver if present. This work is being done in PCI preservation [3].
4. Enabling the PCI device, done in the VFIO subsystem, should be handled in the PCI subsystem. Currently, this patch series hasn't changed the behavior.
5. If live update gets canceled, interrupts which are disabled in freeze need to be reconfigured again.
6. In finish, if a device is not restored, how to know if the KHO folio has been restored or not.
7. VFIO cdev is restored in an anonymous file system. This should instead be done on devtmpfs.

For reviewers, following are the grouping of patches in this series:

Patches 1-4
-----------
Feel free to ignore if you are only interested in VFIO. These are only for live update selftests. I had to make some changes on top of the LUO v4 series, to create a library out of them which can be used in other selftests (vfio), and fix some build issues.

Patches 5-9
-----------
Adds basic live update support in VFIO. Registers to LUO, saves the device BDF in KHO during prepare, and returns the VFIO cdev FD during restore. It doesn't save or skip anything else.

Patches 10-17
-------------
Adds support for skipping certain operations and preserving certain data needed to restore a device.

Patches 18-21
-------------
- Integrate VFIO selftest with live update selftest library.
- Adds a basic vfio_pci_liveupdate_test test which validates that the Bus Master Enable bit is preserved, and virtual config is restored properly.

Testing
-------
I have done testing on QEMU with a test pci device and also on bare metal with an Intel DSA device. Make sure the IDXD driver is not built in your kernel if testing with an Intel DSA device. Basically, whichever device you use, it should not get auto-bound to any other driver.

Important config options which should be enabled to test this series:
- CONFIG_KEXEC_FILE
- CONFIG_LIVEUPDATE
- CONFIG_KEXEC_HANDOVER

Besides this, the usual VFIO, VFIO_PCI, IOMMU and other dependencies are enabled.

To build the test, provide KHDR_INCLUDES to your make command if your headers are out-of-tree.

  KHDR_INCLUDES="-isystem ../../../../build/usr/include" make

vfio_pci_liveupdate_test needs to be executed manually. This test needs to be executed two times: once before the live update and once after.

  ./run.sh -d 0000:00:04.0 vfio_pci_liveupdate_test

Next Steps
----------
1. Looking forward to feedback on this series.
   - What other things should we save?
   - Which things should not be saved?
   - Any locks or incorrect locking done in the series.
   - Any optimizations.
2. Integration with the IOMMUFD and PCI series for a complete workflow where a device continues DMA while undergoing live update.

I will be going on paternity leave soon, so my responses are going to be intermittent. David Matlack (dmatlack(a)google.com) has graciously offered to work on this series and continue upstream engagement on this feature until I am back. Thank you, David!

[1] LUO-v4: https://lore.kernel.org/linux-mm/20250929010321.3462457-1-pasha.tatashin@so…
[2] IOMMUFD: https://lore.kernel.org/linux-iommu/20250928190624.3735830-1-skhawaja@googl…
[3] PCI: https://lore.kernel.org/linux-pci/20250916-luo-pci-v2-0-c494053c3c08@kernel…

Vipin Sharma (21):
  selftests/liveupdate: Build tests from the selftests/liveupdate directory
  selftests/liveupdate: Create library of core live update ioctls
  selftests/liveupdate: Move do_kexec.sh script to liveupdate/lib
  selftests/liveupdate: Move LUO ioctls calls to liveupdate library
  vfio/pci: Register VFIO live update file handler to Live Update Orchestrator
  vfio/pci: Accept live update preservation request for VFIO cdev
  vfio/pci: Store VFIO PCI device preservation data in KHO for live update
  vfio/pci: Retrieve preserved VFIO device for Live Update Orechestrator
  vfio/pci: Add Live Update finish callback implementation
  PCI: Add option to skip Bus Master Enable reset during kexec
  vfio/pci: Skip clearing bus master on live update device during kexec
  vfio/pci: Skip clearing bus master on live update restored device
  vfio/pci: Preserve VFIO PCI config space through live update
  vfio/pci: Skip device reset on live update restored device.
  PCI: Make PCI saved state and capability structs public
  vfio/pci: Save and restore the PCI state of the VFIO device
  vfio/pci: Disable interrupts before going live update kexec
  vfio: selftests: Build liveupdate library in VFIO selftests
  vfio: selftests: Initialize vfio_pci_device using a VFIO cdev FD
  vfio: selftests: Add VFIO live update test
  vfio: selftests: Validate vconfig preservation of VFIO PCI device during live update

 drivers/pci/pci-driver.c                           |   6 +-
 drivers/pci/pci.c                                  |   5 -
 drivers/pci/pci.h                                  |   7 -
 drivers/vfio/pci/Makefile                          |   1 +
 drivers/vfio/pci/vfio_pci_config.c                 |  17 +
 drivers/vfio/pci/vfio_pci_core.c                   |  31 +-
 drivers/vfio/pci/vfio_pci_liveupdate.c             | 461 ++++++++++++++++++
 drivers/vfio/pci/vfio_pci_priv.h                   |  17 +
 drivers/vfio/vfio_main.c                           |  20 +-
 include/linux/pci.h                                |  15 +
 include/linux/vfio.h                               |   8 +
 include/linux/vfio_pci_core.h                      |   1 +
 tools/testing/selftests/liveupdate/.gitignore      |   7 +-
 tools/testing/selftests/liveupdate/Makefile        |  31 +-
 .../liveupdate/{ => lib}/do_kexec.sh               |   0
 .../liveupdate/lib/include/liveupdate_util.h       |  27 +
 .../selftests/liveupdate/lib/libliveupdate.mk      |  18 +
 .../liveupdate/lib/liveupdate_util.c               | 106 ++++
 .../selftests/liveupdate/luo_multi_file.c          |   2 -
 .../selftests/liveupdate/luo_multi_kexec.c         |   2 -
 .../selftests/liveupdate/luo_multi_session.c       |   2 -
 .../selftests/liveupdate/luo_test_utils.c          |  73 +--
 .../selftests/liveupdate/luo_test_utils.h          |  10 +-
 .../selftests/liveupdate/luo_unreclaimed.c         |   1 -
 tools/testing/selftests/vfio/Makefile              |  15 +-
 .../selftests/vfio/lib/include/vfio_util.h         |   1 +
 .../selftests/vfio/lib/vfio_pci_device.c           |  33 +-
 .../selftests/vfio/vfio_pci_liveupdate_test.c      | 116 +++++
 28 files changed, 900 insertions(+), 133 deletions(-)
 create mode 100644 drivers/vfio/pci/vfio_pci_liveupdate.c
 rename tools/testing/selftests/liveupdate/{ => lib}/do_kexec.sh (100%)
 create mode 100644 tools/testing/selftests/liveupdate/lib/include/liveupdate_util.h
 create mode 100644 tools/testing/selftests/liveupdate/lib/libliveupdate.mk
 create mode 100644 tools/testing/selftests/liveupdate/lib/liveupdate_util.c
 create mode 100644 tools/testing/selftests/vfio/vfio_pci_liveupdate_test.c

base-commit: e48be01cadc981362646dc3a87d57316421590a5
--
2.51.0.858.gf9c4a03a3a-goog
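The restore path described above matches a saved BDF against known devices, and the selftest takes a BDF such as 0000:00:04.0 on its command line. As a userspace illustration only, a BDF string could be parsed and compared roughly like this. The struct and function names are invented here and are not taken from the series.

```c
#include <stdio.h>

/* PCI address split into its parts; the struct name is illustrative. */
struct pci_bdf {
	unsigned int domain, bus, device, function;
};

/* Parse "DDDD:BB:DD.F" (hex fields). Returns 0 on success, -1 on bad input. */
static int parse_bdf(const char *s, struct pci_bdf *out)
{
	char trailing;

	/* The extra %c catches garbage after the function number. */
	if (sscanf(s, "%4x:%2x:%2x.%1x%c", &out->domain, &out->bus,
		   &out->device, &out->function, &trailing) != 4)
		return -1;
	if (out->device > 0x1f || out->function > 7)
		return -1;
	return 0;
}

/* Two addresses name the same device only if every field matches. */
static int bdf_equal(const struct pci_bdf *a, const struct pci_bdf *b)
{
	return a->domain == b->domain && a->bus == b->bus &&
	       a->device == b->device && a->function == b->function;
}
```

A lookup would then walk the candidate devices and stop at the first one for which bdf_equal() returns non-zero.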

2 months, 2 weeks


[PATCH] gpio-selftests: replace fixed sleep with polling+timeout

by zntsproj

Replace the hard-coded sleep 0.1 with a polling loop with timeout to check the sysfs GPIO value. This avoids timing-dependent flaky failures in CI and on slower machines.
---
 .../testing/selftests/gpio/gpio-aggregator.sh | 59 +++++++++++++++----
 1 file changed, 46 insertions(+), 13 deletions(-)

diff --git a/tools/testing/selftests/gpio/gpio-aggregator.sh b/tools/testing/selftests/gpio/gpio-aggregator.sh
index 9b6f80ad9..1e81e62e9 100755
--- a/tools/testing/selftests/gpio/gpio-aggregator.sh
+++ b/tools/testing/selftests/gpio/gpio-aggregator.sh
@@ -671,26 +671,59 @@ teardown_4() {
 	agg_configfs_cleanup
 }

+# helper: wait for sysfs file to become a given value (timeout in seconds)
+wait_for_sysfs_value() {
+	file="$1"
+	expected="$2"
+	timeout="${3:-2}" # seconds
+	interval="0.01" # seconds per poll
+	max=$((timeout * 100))
+	i=0
+
+	while [ "$i" -lt "$max" ]; do
+		if [ "$(cat "$file")" = "$expected" ]; then
+			return 0
+		fi
+		sleep "$interval"
+		i=$((i + 1))
+	done
+
+	return 1
+}
+
 echo "4.1. Forwarding set values"
 setup_4
 OFFSET=0
 for SETTING in $SETTINGS; do
-	CHIP=$(echo "$SETTING" | cut -d: -f1)
-	BANK=$(echo "$SETTING" | cut -d: -f2)
-	LINE=$(echo "$SETTING" | cut -d: -f3)
-	DEVNAME=$(cat "$CONFIGFS_SIM_DIR/$CHIP/dev_name")
-	CHIPNAME=$(cat "$CONFIGFS_SIM_DIR/$CHIP/$BANK/chip_name")
-	VAL_PATH="/sys/devices/platform/$DEVNAME/$CHIPNAME/sim_gpio${LINE}/value"
-	test $(cat $VAL_PATH) = "0" || fail "incorrect value read from sysfs"
-	$BASE_DIR/gpio-mockup-cdev -s 1 "/dev/$(agg_configfs_chip_name agg0)" "$OFFSET" &
-	mock_pid=$!
-	sleep 0.1 # FIXME Any better way?
-	test "$(cat $VAL_PATH)" = "1" || fail "incorrect value read from sysfs"
-	kill "$mock_pid"
-	OFFSET=$(expr $OFFSET + 1)
+	CHIP=$(echo "$SETTING" | cut -d: -f1)
+	BANK=$(echo "$SETTING" | cut -d: -f2)
+	LINE=$(echo "$SETTING" | cut -d: -f3)
+	DEVNAME=$(cat "$CONFIGFS_SIM_DIR/$CHIP/dev_name")
+	CHIPNAME=$(cat "$CONFIGFS_SIM_DIR/$CHIP/$BANK/chip_name")
+	VAL_PATH="/sys/devices/platform/$DEVNAME/$CHIPNAME/sim_gpio${LINE}/value"
+
+	test "$(cat "$VAL_PATH")" = "0" || fail "incorrect value read from sysfs"
+
+	$BASE_DIR/gpio-mockup-cdev -s 1 "/dev/$(agg_configfs_chip_name agg0)" "$OFFSET" &
+	mock_pid=$!
+
+	# wait up to 2s for value to flip to "1"
+	if ! wait_for_sysfs_value "$VAL_PATH" "1" 2; then
+		kill "$mock_pid" 2>/dev/null || true
+		wait "$mock_pid" 2>/dev/null || true
+		fail "timeout waiting for $VAL_PATH to become 1"
+	fi
+
+	test "$(cat "$VAL_PATH")" = "1" || fail "incorrect value read from sysfs"
+
+	kill "$mock_pid" 2>/dev/null || true
+	wait "$mock_pid" 2>/dev/null || true
+
+	OFFSET=$((OFFSET + 1))
 done
 teardown_4

+
 echo "4.2. Forwarding set config"
 setup_4
 OFFSET=0
--
2.51.2
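The same poll-with-timeout idea, written as a small C helper for comparison. This is only a sketch with invented function names; like the shell helper in the patch, it counts poll intervals rather than measuring wall-clock time.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Read the first line of 'path' into buf without the trailing newline. */
static int read_first_line(const char *path, char *buf, size_t len)
{
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, (int)len, f))
		buf[0] = '\0';
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}

/*
 * Poll 'path' every 10 ms until its first line equals 'expected'.
 * Returns 0 on a match, -1 once timeout_ms has elapsed without one.
 */
static int wait_for_file_value(const char *path, const char *expected,
			       int timeout_ms)
{
	const struct timespec interval = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 };
	char buf[64];
	int waited_ms;

	for (waited_ms = 0; waited_ms <= timeout_ms; waited_ms += 10) {
		if (read_first_line(path, buf, sizeof(buf)) == 0 &&
		    strcmp(buf, expected) == 0)
			return 0;
		nanosleep(&interval, NULL);
	}
	return -1;
}
```

The fast path returns on the first successful read, so a value that flips quickly costs almost nothing, while a slow machine gets the full timeout before the check fails.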

2 months, 2 weeks


[PATCH v2] selftests: af_unix: Add tests for ECONNRESET and EOF semantics

by Sunday Adelodun

Add selftests to verify and document Linux’s intended behaviour for UNIX domain sockets (SOCK_STREAM and SOCK_DGRAM) when a peer closes. The tests cover:

1. EOF returned when a SOCK_STREAM peer closes normally.
2. ECONNRESET returned when a SOCK_STREAM peer closes with unread data.
3. SOCK_DGRAM sockets not returning ECONNRESET on peer close.

This follows up on review feedback suggesting a selftest to clarify Linux’s semantics.

Suggested-by: Kuniyuki Iwashima <kuniyu(a)google.com>
Signed-off-by: Sunday Adelodun <adelodunolaoluwa(a)yahoo.com>
---
Changelog:

Changes made from v1:
- Patch prefix updated to selftest: af_unix:.
- All mentions of “UNIX” changed to AF_UNIX.
- Removed BSD references from comments.
- Shared setup refactored using FIXTURE_VARIANT().
- Cleanup moved to FIXTURE_TEARDOWN() to always run.
- Tests consolidated to reduce duplication: EOF, ECONNRESET, SOCK_DGRAM peer close.
- Corrected ASSERT usage and initialization style.
- Makefile updated for new directory af_unix.

 tools/testing/selftests/net/af_unix/Makefile       |   1 +
 .../selftests/net/af_unix/unix_connreset.c         | 161 ++++++++++++++++++
 2 files changed, 162 insertions(+)
 create mode 100644 tools/testing/selftests/net/af_unix/unix_connreset.c

diff --git a/tools/testing/selftests/net/af_unix/Makefile b/tools/testing/selftests/net/af_unix/Makefile
index de805cbbdf69..5826a8372451 100644
--- a/tools/testing/selftests/net/af_unix/Makefile
+++ b/tools/testing/selftests/net/af_unix/Makefile
@@ -7,6 +7,7 @@ TEST_GEN_PROGS := \
 	scm_pidfd \
 	scm_rights \
 	unix_connect \
+	unix_connreset \
 # end of TEST_GEN_PROGS

 include ../../lib.mk
diff --git a/tools/testing/selftests/net/af_unix/unix_connreset.c b/tools/testing/selftests/net/af_unix/unix_connreset.c
new file mode 100644
index 000000000000..c65ec997d77d
--- /dev/null
+++ b/tools/testing/selftests/net/af_unix/unix_connreset.c
@@ -0,0 +1,161 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Selftest for AF_UNIX socket close and ECONNRESET behaviour.
+ *
+ * This test verifies that:
+ * 1. SOCK_STREAM sockets return EOF when peer closes normally.
+ * 2. SOCK_STREAM sockets return ECONNRESET if peer closes with unread data.
+ * 3. SOCK_DGRAM sockets do not return ECONNRESET when peer closes.
+ *
+ * These tests document the intended Linux behaviour.
+ *
+ */
+
+#define _GNU_SOURCE
+#include <stdlib.h>
+#include <string.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <errno.h>
+#include <sys/socket.h>
+#include <sys/un.h>
+#include "../../kselftest_harness.h"
+
+#define SOCK_PATH "/tmp/af_unix_connreset.sock"
+
+static void remove_socket_file(void)
+{
+	unlink(SOCK_PATH);
+}
+
+FIXTURE(unix_sock)
+{
+	int server;
+	int client;
+	int child;
+};
+
+FIXTURE_VARIANT(unix_sock)
+{
+	int socket_type;
+	const char *name;
+};
+
+/* Define variants: stream and datagram */
+FIXTURE_VARIANT_ADD(unix_sock, stream) {
+	.socket_type = SOCK_STREAM,
+	.name = "SOCK_STREAM",
+};
+
+FIXTURE_VARIANT_ADD(unix_sock, dgram) {
+	.socket_type = SOCK_DGRAM,
+	.name = "SOCK_DGRAM",
+};
+
+FIXTURE_SETUP(unix_sock)
+{
+	struct sockaddr_un addr = {};
+	int err;
+
+	addr.sun_family = AF_UNIX;
+	strcpy(addr.sun_path, SOCK_PATH);
+
+	self->server = socket(AF_UNIX, variant->socket_type, 0);
+	ASSERT_LT(-1, self->server);
+
+	err = bind(self->server, (struct sockaddr *)&addr, sizeof(addr));
+	ASSERT_EQ(0, err);
+
+	if (variant->socket_type == SOCK_STREAM) {
+		err = listen(self->server, 1);
+		ASSERT_EQ(0, err);
+
+		self->client = socket(AF_UNIX, SOCK_STREAM, 0);
+		ASSERT_LT(-1, self->client);
+
+		err = connect(self->client, (struct sockaddr *)&addr, sizeof(addr));
+		ASSERT_EQ(0, err);
+
+		self->child = accept(self->server, NULL, NULL);
+		ASSERT_LT(-1, self->child);
+	} else {
+		/* Datagram: bind and connect only */
+		self->client = socket(AF_UNIX, SOCK_DGRAM | SOCK_NONBLOCK, 0);
+		ASSERT_LT(-1, self->client);
+
+		err = connect(self->client, (struct sockaddr *)&addr, sizeof(addr));
+		ASSERT_EQ(0, err);
+	}
+}
+
+FIXTURE_TEARDOWN(unix_sock)
+{
+	if (variant->socket_type == SOCK_STREAM)
+		close(self->child);
+
+	close(self->client);
+	close(self->server);
+	remove_socket_file();
+}
+
+/* Test 1: peer closes normally */
+TEST_F(unix_sock, eof)
+{
+	char buf[16] = {};
+	ssize_t n;
+
+	if (variant->socket_type != SOCK_STREAM)
+		SKIP(return, "This test only applies to SOCK_STREAM");
+
+	/* Peer closes normally */
+	close(self->child);
+
+	n = recv(self->client, buf, sizeof(buf), 0);
+	TH_LOG("%s: recv=%zd errno=%d (%s)", variant->name, n, errno, strerror(errno));
+	if (n == -1)
+		ASSERT_EQ(ECONNRESET, errno);
+
+	if (n != -1)
+		ASSERT_EQ(0, n);
+}
+
+/* Test 2: peer closes with unread data */
+TEST_F(unix_sock, reset_unread)
+{
+	char buf[16] = {};
+	ssize_t n;
+
+	if (variant->socket_type != SOCK_STREAM)
+		SKIP(return, "This test only applies to SOCK_STREAM");
+
+	/* Send data that will remain unread by client */
+	send(self->client, "hello", 5, 0);
+	close(self->child);
+
+	n = recv(self->client, buf, sizeof(buf), 0);
+	TH_LOG("%s: recv=%zd errno=%d (%s)", variant->name, n, errno, strerror(errno));
+	ASSERT_EQ(-1, n);
+	ASSERT_EQ(ECONNRESET, errno);
+}
+
+/* Test 3: SOCK_DGRAM peer close */
+TEST_F(unix_sock, dgram_reset)
+{
+	char buf[16] = {};
+	ssize_t n;
+
+	if (variant->socket_type != SOCK_DGRAM)
+		SKIP(return, "This test only applies to SOCK_DGRAM");
+
+	send(self->client, "hello", 5, 0);
+	close(self->server);
+
+	n = recv(self->client, buf, sizeof(buf), 0);
+	TH_LOG("%s: recv=%zd errno=%d (%s)", variant->name, n, errno, strerror(errno));
+	/* Expect EAGAIN because there is no datagram and peer is closed. */
+	ASSERT_EQ(-1, n);
+	ASSERT_EQ(EAGAIN, errno);
+}
+
+TEST_HARNESS_MAIN
+
--
2.43.0
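The two SOCK_STREAM cases from the patch can also be reproduced without a filesystem socket. The following sketch uses socketpair() instead of the bind/connect/accept setup in the selftest; recv_after_peer_close() is a name made up for this example, and the ECONNRESET result is the Linux behaviour the patch sets out to document.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create a connected AF_UNIX stream pair, optionally leave unread data
 * queued at the peer, close the peer, then recv() on the surviving end.
 * Returns recv()'s result (0 = EOF), or -errno if a call failed.
 */
static int recv_after_peer_close(int leave_unread)
{
	char buf[16];
	ssize_t n;
	int sv[2];
	int ret;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -errno;

	if (leave_unread && send(sv[0], "hello", 5, 0) < 0) {
		ret = -errno;
		close(sv[0]);
		close(sv[1]);
		return ret;
	}

	close(sv[1]);	/* peer goes away, possibly with "hello" unread */

	n = recv(sv[0], buf, sizeof(buf), 0);
	ret = n < 0 ? -errno : (int)n;
	close(sv[0]);
	return ret;
}
```

Calling it with 0 models the normal close (EOF), and calling it with 1 models the close with unread data.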

2 months, 2 weeks


[PATCH v22 00/28] riscv control-flow integrity for usermode

by Deepak Gupta

v22: Fix a build error caused by -march=zicfiss being accepted by gcc-13 and above without the compiler actually generating code for, or recognizing, zicfiss instructions. v22 adds a dependency on the `-fcf-protection=full` compiler flag to ensure that the toolchain has support; only then is CONFIG_RISCV_USER_CFI visible in menuconfig.

v21: fixed build errors.

Basics and overview
===================

Software with larger attack surfaces (e.g. network facing apps like databases, browsers or apps relying on browser runtimes) suffers from memory corruption issues which can be utilized by attackers to bend the control flow of the program to eventually gain control (by making their payload executable). Attackers are able to perform such attacks by leveraging call-sites which rely on indirect calls or return sites which rely on obtaining the return address from stack memory.

To mitigate such attacks, risc-v extension zicfilp enforces that all indirect calls must land on a landing pad instruction `lpad`, else the cpu will raise a software check exception (a new cpu exception cause code on riscv). Similarly for return flow, risc-v extension zicfiss extends the architecture with

- `sspush` instruction to push return address on a shadow stack
- `sspopchk` instruction to pop return address from shadow stack and compare with input operand (i.e. return address on stack)
- `sspopchk` to raise software check exception if the comparison above was a mismatch
- Protection mechanism using which shadow stack is not writeable via regular store instructions

More information and details can be found at the extensions github repo [1].

The equivalent to landing pad (zicfilp) on x86 is the `ENDBRANCH` instruction in Intel CET [3], and branch target identification (BTI) [4] on arm. Similarly, x86's Intel CET has shadow stack [5] and arm64 has guarded control stack (GCS) [6], which are very similar to risc-v's zicfiss shadow stack.

x86 and arm64 support for user mode shadow stack is already in mainline.
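As a mental model of the return-address check described above, here is a toy software simulation. It is purely illustrative: the real `sspush` and `sspopchk` are hardware instructions operating on protected memory, and every name below is invented for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

#define SS_DEPTH 64

/* Toy model only: a real shadow stack lives in protected memory. */
struct shadow_stack {
	uintptr_t slot[SS_DEPTH];
	size_t top;
};

/* Models sspush: record the return address at call time. Returns -1 if full. */
static int ss_push(struct shadow_stack *ss, uintptr_t ret_addr)
{
	if (ss->top == SS_DEPTH)
		return -1;
	ss->slot[ss->top++] = ret_addr;
	return 0;
}

/*
 * Models sspopchk: pop the saved address and compare it with the one
 * about to be used for the return. Returns 0 on a match and -1 on a
 * mismatch or empty stack, where hardware would raise a software
 * check exception instead.
 */
static int ss_popchk(struct shadow_stack *ss, uintptr_t ret_addr)
{
	if (ss->top == 0)
		return -1;
	return ss->slot[--ss->top] == ret_addr ? 0 : -1;
}
```

An attacker who overwrites the return address on the regular stack cannot also update the shadow copy with ordinary stores, so the comparison fails before the corrupted address is used.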
Kernel awareness for user control flow integrity
================================================

This series picks up Samuel Holland's envcfg changes [2] as well. So if those are being applied independently, they should be removed from this series.

Enabling:

In order to maintain compatibility and not break anything in user mode, the kernel doesn't enable control flow integrity cpu extensions on a binary by default. Instead it exposes a prctl interface to enable, disable and lock the shadow stack or landing pad feature for a task. This allows userspace (loader) to enumerate if all objects in its address space are compiled with shadow stack and landing pad support and accordingly enable the feature. Additionally if a subsequent `dlopen` happens on a library, user mode can take a decision again to disable the feature (if the incoming library is not compiled with support) OR terminate the task (if user mode policy is strict to have all objects in the address space compiled with the control flow integrity cpu feature). The prctl to enable shadow stack results in allocating a shadow stack from virtual memory and activating it for the user address space. x86 and arm64 are also following the same direction due to similar reason(s).

clone/fork:

On clone and fork, cfi state for the task is inherited by the child. Shadow stack is part of virtual memory and is writeable memory from the kernel's perspective (writeable via a restricted set of instructions aka shadow stack instructions). Thus kernel changes ensure that this memory is converted into read-only when fork/clone happens and COWed when a fault is taken due to sspush, sspopchk or ssamoswap. In case `CLONE_VM` is specified and shadow stack is to be enabled, the kernel will automatically allocate a shadow stack for that clone call.

map_shadow_stack:

x86 introduced the `map_shadow_stack` system call to allow user space to explicitly map shadow stack memory in its address space. It is useful to allocate shadow stacks for different contexts managed by a single thread (green threads or contexts). risc-v implements this system call as well.

signal management:

If shadow stack is enabled for a task, the kernel performs an asynchronous control flow diversion to deliver the signal and eventually expects userspace to issue sigreturn so that the original execution can be resumed. Even though the resume context is prepared by the kernel, it is in user space memory and is subject to memory corruption, and corruption bugs can be utilized by an attacker in this race window to perform an arbitrary sigreturn and eventually bypass the cfi mechanism. Another issue is how to ensure that cfi related state in the sigcontext area is not trampled by legacy apps or apps compiled with old kernel headers.

In order to mitigate control-flow hijacking, the kernel prepares a token and places it on the shadow stack before signal delivery and places the address of the token in the sigcontext structure. During sigreturn, the kernel obtains the address of the token from the sigcontext structure, reads the token from the shadow stack, validates it and only then allows sigreturn to succeed. The compatibility issue is solved by adopting the dynamic sigcontext management introduced for the vector extension. This series refactors the code a little to make future sigcontext management easier (as proposed by Andy Chiu from SiFive).

config and compilation:

Introduce a new risc-v config option `CONFIG_RISCV_USER_CFI`. Selecting this config option picks the kernel support for user control flow integrity. This option is presented only if the toolchain has shadow stack and landing pad support; it is guarded by toolchain support on purpose, the reason being that eventually the vDSO also needs to be compiled with shadow stack and landing pad support. vDSO compile patches are not included as of now because the landing pad labeling scheme is yet to settle for usermode runtime.
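The token check described under "signal management" above can be sketched as a small Python model. This is only an illustration of the idea, not the code in this series: the field name `ss_token_addr`, the 8-byte slot size, the starting shadow stack pointer, and the choice to make the token encode its own address are all assumptions made for the sketch.

```python
SLOT = 8  # assumed shadow stack slot size in bytes


class SigreturnRejected(Exception):
    """Raised when the model refuses a sigreturn (illustrative)."""


class Task:
    """Toy task with a sparse shadow stack: address -> stored value."""

    def __init__(self, ssp=0x7000_0000):
        self.shadow_stack = {}
        self.ssp = ssp  # hypothetical shadow stack pointer


def deliver_signal(task):
    """Model of signal delivery: push a token, record its address."""
    task.ssp -= SLOT
    # Assumption: the token simply encodes the address it is stored at.
    task.shadow_stack[task.ssp] = task.ssp
    # The sigcontext lives in ordinary user memory, so it may be corrupted.
    return {"ss_token_addr": task.ssp}


def sigreturn(task, sigcontext):
    """Model of sigreturn: validate the token before resuming."""
    addr = sigcontext["ss_token_addr"]
    # Userspace cannot forge this value with regular stores, because the
    # shadow stack is not writeable that way.
    if task.shadow_stack.get(addr) != addr:
        raise SigreturnRejected(f"no valid token at {addr:#x}")
    del task.shadow_stack[addr]  # consume the token so it cannot be replayed
    task.ssp = addr + SLOT


if __name__ == "__main__":
    task = Task()
    ctx = deliver_signal(task)
    sigreturn(task, ctx)
    print("resumed with ssp", hex(task.ssp))

    victim = Task()
    forged = deliver_signal(victim)
    forged["ss_token_addr"] += 64  # attacker rewrites the sigcontext
    try:
        sigreturn(victim, forged)
    except SigreturnRejected as exc:
        print("rejected:", exc)
```

In this model a sigcontext that was tampered with, or one that is replayed after its token was consumed, fails validation because no matching token exists at the claimed address.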
To get more information on kernel interactions with respect to zicfilp and zicfiss, the patch series adds documentation for `zicfilp` and `zicfiss` in the following:

Documentation/arch/riscv/zicfiss.rst
Documentation/arch/riscv/zicfilp.rst

How to test this series
=======================

Toolchain
---------
$ git clone git@github.com:sifive/riscv-gnu-toolchain.git -b cfi-dev
$ riscv-gnu-toolchain/configure --prefix=<path-to-where-to-build> --with-arch=rv64gc_zicfilp_zicfiss --enable-linux --disable-gdb --with-extra-multilib-test="rv64gc_zicfilp_zicfiss-lp64d:-static"
$ make -j$(nproc)

Qemu
----
Get the latest qemu
$ cd qemu
$ mkdir build
$ cd build
$ ../configure --target-list=riscv64-softmmu
$ make -j$(nproc)

Opensbi
-------
$ git clone git@github.com:deepak0414/opensbi.git -b v6_cfi_spec_split_opensbi
$ make CROSS_COMPILE=<your riscv toolchain> -j$(nproc) PLATFORM=generic

Linux
-----
Running defconfig is fine. CFI is enabled by default if the toolchain supports it.

$ make ARCH=riscv CROSS_COMPILE=<path-to-cfi-riscv-gnu-toolchain>/build/bin/riscv64-unknown-linux-gnu- -j$(nproc) defconfig
$ make ARCH=riscv CROSS_COMPILE=<path-to-cfi-riscv-gnu-toolchain>/build/bin/riscv64-unknown-linux-gnu- -j$(nproc)

Running
-------

Modify your qemu command to have:
-bios <path-to-cfi-opensbi>/build/platform/generic/firmware/fw_dynamic.bin
-cpu rv64,zicfilp=true,zicfiss=true,zimop=true,zcmop=true

References
==========
[1] - https://github.com/riscv/riscv-cfi
[2] - https://lore.kernel.org/all/20240814081126.956287-1-samuel.holland@sifive.c…
[3] - https://lwn.net/Articles/889475/
[4] - https://developer.arm.com/documentation/109576/0100/Branch-Target-Identific…
[5] - https://www.intel.com/content/dam/develop/external/us/en/documents/catc17-i…
[6] - https://lwn.net/Articles/940403/

To: Thomas Gleixner <tglx(a)linutronix.de>
To: Ingo Molnar <mingo(a)redhat.com>
To: Borislav Petkov <bp(a)alien8.de>
To: Dave Hansen <dave.hansen(a)linux.intel.com>
To: x86(a)kernel.org
To: H. Peter Anvin <hpa(a)zytor.com>
To: Andrew Morton <akpm(a)linux-foundation.org>
To: Liam R. Howlett <Liam.Howlett(a)oracle.com>
To: Vlastimil Babka <vbabka(a)suse.cz>
To: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
To: Paul Walmsley <paul.walmsley(a)sifive.com>
To: Palmer Dabbelt <palmer(a)dabbelt.com>
To: Albert Ou <aou(a)eecs.berkeley.edu>
To: Conor Dooley <conor(a)kernel.org>
To: Rob Herring <robh(a)kernel.org>
To: Krzysztof Kozlowski <krzk+dt(a)kernel.org>
To: Arnd Bergmann <arnd(a)arndb.de>
To: Christian Brauner <brauner(a)kernel.org>
To: Peter Zijlstra <peterz(a)infradead.org>
To: Oleg Nesterov <oleg(a)redhat.com>
To: Eric Biederman <ebiederm(a)xmission.com>
To: Kees Cook <kees(a)kernel.org>
To: Jonathan Corbet <corbet(a)lwn.net>
To: Shuah Khan <shuah(a)kernel.org>
To: Jann Horn <jannh(a)google.com>
To: Conor Dooley <conor+dt(a)kernel.org>
To: Miguel Ojeda <ojeda(a)kernel.org>
To: Alex Gaynor <alex.gaynor(a)gmail.com>
To: Boqun Feng <boqun.feng(a)gmail.com>
To: Gary Guo <gary(a)garyguo.net>
To: Björn Roy Baron <bjorn3_gh(a)protonmail.com>
To: Benno Lossin <benno.lossin(a)proton.me>
To: Andreas Hindborg <a.hindborg(a)kernel.org>
To: Alice Ryhl <aliceryhl(a)google.com>
To: Trevor Gross <tmgross(a)umich.edu>
Cc: linux-kernel(a)vger.kernel.org
Cc: linux-fsdevel(a)vger.kernel.org
Cc: linux-mm(a)kvack.org
Cc: linux-riscv(a)lists.infradead.org
Cc: devicetree(a)vger.kernel.org
Cc: linux-arch(a)vger.kernel.org
Cc: linux-doc(a)vger.kernel.org
Cc: linux-kselftest(a)vger.kernel.org
Cc: alistair.francis(a)wdc.com
Cc: richard.henderson(a)linaro.org
Cc: jim.shu(a)sifive.com
Cc: andybnac(a)gmail.com
Cc: kito.cheng(a)sifive.com
Cc: charlie(a)rivosinc.com
Cc: atishp(a)rivosinc.com
Cc: evan(a)rivosinc.com
Cc: cleger(a)rivosinc.com
Cc: alexghiti(a)rivosinc.com
Cc: samitolvanen(a)google.com
Cc: broonie(a)kernel.org
Cc: rick.p.edgecombe(a)intel.com
Cc: rust-for-linux(a)vger.kernel.org

changelog
---------
v22:
- CONFIG_RISCV_USER_CFI was by default "n". With dual vdso support it is default "y" (if toolchain supports it). Fixing build error due to "-march=zicfiss" being picked in gcc-13 partially. gcc-13 only recognizes the flag but does not actually do any codegen or recognize instructions for zicfiss. Change in v22 makes CONFIG_RISCV_USER_CFI depend on the `-fcf-protection=full` compiler flag to ensure that the toolchain has support, and only then will CONFIG_RISCV_USER_CFI be visible in menuconfig.
- picked up tags and some cosmetic changes in commit message for dual vdso patch.

v21:
- Fixing build errors due to changes in arch/riscv/include/asm/vdso.h. Using #ifdef instead of IS_ENABLED in arch/riscv/include/asm/vdso.h. vdso-cfi-offsets.h should be included only when CONFIG_RISCV_USER_CFI is selected.

v20:
- rebased on v6.18-rc1.
- Added two vDSO support. If `CONFIG_RISCV_USER_CFI` is selected two vDSOs are compiled (one for hardware prior to RVA23 and one for RVA23 onwards). Kernel exposes RVA23 vDSO if hardware/cpu implements zimop else exposes existing vDSO to userspace.
- default selection for `CONFIG_RISCV_USER_CFI` is "Yes".
- replaced "__ASSEMBLY__" with "__ASSEMBLER__"

v19:
- riscv_nousercfi was `int`. changed it to unsigned long. Thanks to Alex Ghiti for reporting it. It was a bug.
- ELP is cleared on trap entry only when CONFIG_64BIT.
- restore ssp back on return to usermode was being done before `riscv_v_context_nesting_end` on trap exit path. If kernel shadow stack were enabled this would result in kernel operating on user shadow stack and panic (as I found in my testing of kcfi patch series). So fixed that.

v18:
- rebased on 6.16-rc1
- uprobe handling clears ELP in sstatus image in pt_regs
- vdso was missing shadow stack elf note for object files. added that. Additional asm file for vdso needed the elf marker flag. toolchain should complain if `-fcf-protection=full` and marker is missing for object generated from asm file. Asked toolchain folks to fix this. Although no reason to gate the merge on that.
- Split up compile options for march and fcf-protection in vdso Makefile
- CONFIG_RISCV_USER_CFI option is moved under "Kernel features" menu. Added `arch/riscv/configs/hardening.config` fragment which selects CONFIG_RISCV_USER_CFI

v17:
- fixed warnings due to empty macros in usercfi.h (reported by alexg)
- fixed prefixes in commit titles reported by alexg
- took below uprobe with fcfi v2 patch from Zong Li and squashed it with "riscv/traps: Introduce software check exception and uprobe handling"
  https://lore.kernel.org/all/20250604093403.10916-1-zong.li@sifive.com/

v16:
- If FWFT is not implemented or returns error for shadow stack activation, then no_usercfi is set to disable shadow stack. Although this should be picked up by extension validation and activation. Fixed this bug for zicfilp and zicfiss both. Thanks to Charlie Jenkins for reporting this.
- If toolchain doesn't support cfi, cfi kselftest shouldn't build. Suggested by Charlie Jenkins.
- Default for CONFIG_RISCV_USER_CFI is set to no. Charlie/Atish suggested to keep it off till we have more hardware availability with RVA23 profile and zimop/zcmop implemented. Else this will start breaking people's workflow
- Includes the fix if "!RV64 and !SBI" then definitions for FWFT in asm-offsets.c error.

v15:
- Toolchain has been updated to include `-fcf-protection` flag. This exists for x86 as well. Updated kernel patches to compile vDSO and selftest to compile with `fcf-protection=full` flag.
- selecting CONFIG_RISCV_USERCFI selects CONFIG_RISCV_SBI.
- Patch to enable shadow stack for kernel wasn't hidden behind CONFIG_RISCV_USERCFI and CONFIG_RISCV_SBI. fixed that.

v14:
- rebased on top of palmer/sbi-v3. Thus dropped clement's FWFT patches. Updated RISCV_ISA_EXT_XXXX in hwcap and hwprobe constants.
- Took Radim's suggestions on bitfields.
- Placed cfi_state at the end of thread_info block so that current situation is not disturbed with respect to member fields of thread_info in single cacheline.

v13:
- cpu_supports_shadow_stack/cpu_supports_indirect_br_lp_instr uses riscv_has_extension_unlikely()
- uses nops(count) to create nop slide
- RISCV_ACQUIRE_BARRIER is not needed in `amo_user_shstk`. Removed it
- changed ternaries to simply use implicit casting to convert to bool.
- kernel command line allows to disable zicfilp and zicfiss independently. updated kernel-parameters.txt.
- ptrace user abi for cfi uses bitmasks instead of bitfields. Added ptrace kselftest.
- cosmetic and grammatical changes to documentation.

v12:
- It seems like I had accidentally squashed arch agnostic indirect branch tracking prctl and riscv implementation of those prctls. Split them again.
- set_shstk_status/set_indir_lp_status perform CSR writes only when CPU support is available. As suggested by Zong Li.
- Some minor clean up in kselftests as suggested by Zong Li.

v11:
- patch "arch/riscv: compile vdso with landing pad" was unconditionally selecting `_zicfilp` for vDSO compile. fixed that. Changed `lpad 1` to `lpad 0`.

v10:
- dropped "mm: helper `is_shadow_stack_vma` to check shadow stack vma". This patch is not that interesting to this patch series for risc-v. There are instances in arch directories where VM_SHADOW_STACK flag is anyways used. Dropping this patch to expedite merging in riscv tree.
- Took suggestions from `Clement` on "riscv: zicfiss / zicfilp enumeration" to validate presence of cfi based on config.
- Added a patch for vDSO to have `lpad 0`. I had omitted this earlier to make sure we add single vdso object with cfi enabled. But a vdso object with scheme of zero labeled landing pad is least common denominator and should work with all objects of zero labeled as well as function-signature labeled objects.

v9:
- rebased on master (39a803b754d5 fix braino in "9p: fix ->rename_sem exclusion")
- dropped "mm: Introduce ARCH_HAS_USER_SHADOW_STACK" (master has it from arm64/gcs)
- dropped "prctl: arch-agnostic prctl for shadow stack" (master has it from arm64/gcs)

v8:
- rebased on palmer/for-next
- dropped samuel holland's `envcfg` context switch patches. they are in palmer/for-next

v7:
- Removed "riscv/Kconfig: enable HAVE_EXIT_THREAD for riscv". Instead using `deactivate_mm` flow to clean up. see here for more context
  https://lore.kernel.org/all/20230908203655.543765-1-rick.p.edgecombe@intel.…
- Changed the header include in `kselftest`. Hopefully this fixes compile issue faced by Zong Li at SiFive.
- Cleaned up an orphaned change to `mm/mmap.c` in below patch "riscv/mm : ensure PROT_WRITE leads to VM_READ | VM_WRITE"
- Lock interfaces for shadow stack and indirect branch tracking expect arg == 0. Any future evolution of this interface should accordingly define how arg should be setup.
- `mm/map.c` has an instance of using `VM_SHADOW_STACK`. Fixed it to use helper `is_shadow_stack_vma`.
- Link to v6: https://lore.kernel.org/r/20241008-v5_user_cfi_series-v6-0-60d9fe073f37@riv…

v6:
- Picked up Samuel Holland's changes as is with `envcfg` placed in `thread` instead of `thread_info`
- fixed unaligned newline escapes in kselftest
- cleaned up messages in kselftest and included test output in commit message
- fixed a bug in clone path reported by Zong Li
- fixed a build issue if CONFIG_RISCV_ISA_V is not selected (this was introduced due to re-factoring signal context management code)

v5:
- rebased on v6.12-rc1
- Fixed schema related issues in device tree file
- Fixed some of the documentation related issues in zicfilp/ss.rst (style issues and added index)
- added `SHADOW_STACK_SET_MARKER` so that implementation can define base of shadow stack.
- Fixed warnings on definitions added in usercfi.h when CONFIG_RISCV_USER_CFI is not selected.
- Adopted context header based signal handling as proposed by Andy Chiu
- Added support for enabling kernel mode access to shadow stack using FWFT (https://github.com/riscv-non-isa/riscv-sbi-doc/blob/master/src/ext-firmware…)
- Link to v5: https://lore.kernel.org/r/20241001-v5_user_cfi_series-v1-0-3ba65b6e550f@riv…
  (Note: I had an issue in my workflow due to which version number wasn't picked up correctly while sending out patches)

v4:
- rebased on 6.11-rc6
- envcfg: Converged with Samuel Holland's patches for envcfg management on per-thread basis.
- vma_is_shadow_stack is renamed to is_vma_shadow_stack
- picked up Mark Brown's `ARCH_HAS_USER_SHADOW_STACK` patch
- signal context: using extended context management to maintain compatibility.
- fixed `-Wmissing-prototypes` compiler warnings for prctl functions
- Documentation fixes and amending typos.
- Link to v4: https://lore.kernel.org/all/20240912231650.3740732-1-debug@rivosinc.com/

v3:
- envcfg
  logic to pick up base envcfg had a bug where `ENVCFG_CBZE` could have been picked on per task basis, even though CPU didn't implement it. Fixed in this series.
- dt-bindings
  As suggested, split into separate commit. fixed the messaging that spec is in public review
- arch_is_shadow_stack change
  arch_is_shadow_stack changed to vma_is_shadow_stack
- hwprobe
  zicfiss / zicfilp if present will get enumerated in hwprobe
- selftests
  As suggested, added object and binary filenames to .gitignore. Selftest binary anyways need to be compiled with cfi enabled compiler which will make sure that landing pad and shadow stack are enabled. Thus removed separate enable/disable tests. Cleaned up tests a bit.
- Link to v3: https://lore.kernel.org/lkml/20240403234054.2020347-1-debug@rivosinc.com/

v2:
- Using config `CONFIG_RISCV_USER_CFI`, kernel support for riscv control flow integrity for user mode programs can be compiled in the kernel.
- Enabling of control flow integrity for user programs is left to user runtime
- This patch series introduces arch agnostic `prctls` to enable shadow stack and indirect branch tracking. And implements them on riscv.

---
Changes in v22:
- Link to v21: https://lore.kernel.org/r/20251015-v5_user_cfi_series-v21-0-6a07856e90e7@ri…

Changes in v21:
- Link to v20: https://lore.kernel.org/r/20251013-v5_user_cfi_series-v20-0-b9de4be9912e@ri…

Changes in v20:
- Link to v19: https://lore.kernel.org/r/20250731-v5_user_cfi_series-v19-0-09b468d7beab@ri…

Changes in v19:
- Link to v18: https://lore.kernel.org/r/20250711-v5_user_cfi_series-v18-0-a8ee62f9f38e@ri…

Changes in v18:
- Link to v17: https://lore.kernel.org/r/20250604-v5_user_cfi_series-v17-0-4565c2cf869f@ri…

Changes in v17:
- Link to v16: https://lore.kernel.org/r/20250522-v5_user_cfi_series-v16-0-64f61a35eee7@ri…

Changes in v16:
- Link to v15: https://lore.kernel.org/r/20250502-v5_user_cfi_series-v15-0-914966471885@ri…

Changes in v15:
- changelog posted just below cover letter
- Link to v14: https://lore.kernel.org/r/20250429-v5_user_cfi_series-v14-0-5239410d012a@ri…

Changes in v14:
- changelog posted just below cover letter
- Link to v13: https://lore.kernel.org/r/20250424-v5_user_cfi_series-v13-0-971437de586a@ri…

Changes in v13:
- changelog posted just below cover letter
- Link to v12: https://lore.kernel.org/r/20250314-v5_user_cfi_series-v12-0-e51202b53138@ri…

Changes in v12:
- changelog posted just below cover letter
- Link to v11: https://lore.kernel.org/r/20250310-v5_user_cfi_series-v11-0-86b36cbfb910@ri…

Changes in v11:
- changelog posted just below cover letter
- Link to v10: https://lore.kernel.org/r/20250210-v5_user_cfi_series-v10-0-163dcfa31c60@ri…

---
Andy Chiu (1):
      riscv: signal: abstract header saving for setup_sigcontext

Deepak Gupta (26):
      mm: VM_SHADOW_STACK definition for riscv
      dt-bindings: riscv: zicfilp and zicfiss in dt-bindings (extensions.yaml)
      riscv: zicfiss / zicfilp enumeration
      riscv: zicfiss / zicfilp extension csr and bit definitions
      riscv: usercfi state for task and save/restore of CSR_SSP on trap entry/exit
      riscv/mm : ensure PROT_WRITE leads to VM_READ | VM_WRITE
      riscv/mm: manufacture shadow stack pte
      riscv/mm: teach pte_mkwrite to manufacture shadow stack PTEs
      riscv/mm: write protect and shadow stack
      riscv/mm: Implement map_shadow_stack() syscall
      riscv/shstk: If needed allocate a new shadow stack on clone
      riscv: Implements arch agnostic shadow stack prctls
      prctl: arch-agnostic prctl for indirect branch tracking
      riscv: Implements arch agnostic indirect branch tracking prctls
      riscv/traps: Introduce software check exception and uprobe handling
      riscv/signal: save and restore of shadow stack for signal
      riscv/kernel: update __show_regs to print shadow stack register
      riscv/ptrace: riscv cfi status and state via ptrace and in core files
      riscv/hwprobe: zicfilp / zicfiss enumeration in hwprobe
      riscv: kernel command line option to opt out of user cfi
      riscv: enable kernel access to shadow stack memory via FWFT sbi call
      arch/riscv: dual vdso creation logic and select vdso based on hw
      riscv: create a config for shadow stack and landing pad instr support
      riscv: Documentation for landing pad / indirect branch tracking
      riscv: Documentation for shadow stack on riscv
      kselftest/riscv: kselftest for user mode cfi

Jim Shu (1):
      arch/riscv: compile vdso with landing pad and shadow stack note

 Documentation/admin-guide/kernel-parameters.txt | 8 +
 Documentation/arch/riscv/index.rst | 2 +
 Documentation/arch/riscv/zicfilp.rst | 115 +++++
 Documentation/arch/riscv/zicfiss.rst | 179 +++++++
 .../devicetree/bindings/riscv/extensions.yaml | 14 +
 arch/riscv/Kconfig | 22 +
 arch/riscv/Makefile | 8 +-
 arch/riscv/configs/hardening.config | 4 +
 arch/riscv/include/asm/asm-prototypes.h | 1 +
 arch/riscv/include/asm/assembler.h | 44 ++
 arch/riscv/include/asm/cpufeature.h | 12 +
 arch/riscv/include/asm/csr.h | 16 +
 arch/riscv/include/asm/entry-common.h | 2 +
 arch/riscv/include/asm/hwcap.h | 2 +
 arch/riscv/include/asm/mman.h | 26 +
 arch/riscv/include/asm/mmu_context.h | 7 +
 arch/riscv/include/asm/pgtable.h | 30 +-
 arch/riscv/include/asm/processor.h | 1 +
 arch/riscv/include/asm/thread_info.h | 3 +
 arch/riscv/include/asm/usercfi.h | 95 ++++
 arch/riscv/include/asm/vdso.h | 13 +-
 arch/riscv/include/asm/vector.h | 3 +
 arch/riscv/include/uapi/asm/hwprobe.h | 2 +
 arch/riscv/include/uapi/asm/ptrace.h | 34 ++
 arch/riscv/include/uapi/asm/sigcontext.h | 1 +
 arch/riscv/kernel/Makefile | 2 +
 arch/riscv/kernel/asm-offsets.c | 10 +
 arch/riscv/kernel/cpufeature.c | 27 +
 arch/riscv/kernel/entry.S | 38 ++
 arch/riscv/kernel/head.S | 27 +
 arch/riscv/kernel/process.c | 27 +-
 arch/riscv/kernel/ptrace.c | 95 ++++
 arch/riscv/kernel/signal.c | 148 +++++-
 arch/riscv/kernel/sys_hwprobe.c | 2 +
 arch/riscv/kernel/sys_riscv.c | 10 +
 arch/riscv/kernel/traps.c | 54 ++
 arch/riscv/kernel/usercfi.c | 545 +++++++++++++++++++++
 arch/riscv/kernel/vdso.c | 7 +
 arch/riscv/kernel/vdso/Makefile | 40 +-
 arch/riscv/kernel/vdso/flush_icache.S | 4 +
 arch/riscv/kernel/vdso/gen_vdso_offsets.sh | 4 +-
 arch/riscv/kernel/vdso/getcpu.S | 4 +
 arch/riscv/kernel/vdso/note.S | 3 +
 arch/riscv/kernel/vdso/rt_sigreturn.S | 4 +
 arch/riscv/kernel/vdso/sys_hwprobe.S | 4 +
 arch/riscv/kernel/vdso/vgetrandom-chacha.S | 5 +-
 arch/riscv/kernel/vdso_cfi/Makefile | 25 +
 arch/riscv/kernel/vdso_cfi/vdso-cfi.S | 11 +
 arch/riscv/mm/init.c | 2 +-
 arch/riscv/mm/pgtable.c | 16 +
 include/linux/cpu.h | 4 +
 include/linux/mm.h | 7 +
 include/uapi/linux/elf.h | 2 +
 include/uapi/linux/prctl.h | 27 +
 kernel/sys.c | 30 ++
 tools/testing/selftests/riscv/Makefile | 2 +-
 tools/testing/selftests/riscv/cfi/.gitignore | 3 +
 tools/testing/selftests/riscv/cfi/Makefile | 16 +
 tools/testing/selftests/riscv/cfi/cfi_rv_test.h | 82 ++++
 tools/testing/selftests/riscv/cfi/riscv_cfi_test.c | 173 +++++++
 tools/testing/selftests/riscv/cfi/shadowstack.c | 385 +++++++++++++++
 tools/testing/selftests/riscv/cfi/shadowstack.h | 27 +
 62 files changed, 2475 insertions(+), 41 deletions(-)
---
base-commit: 3a8660878839faadb4f1a6dd72c3179c1df56787
change-id: 20240930-v5_user_cfi_series-3dc332f8f5b2
--
- debug

2 months, 2 weeks


[PATCH bpf-next v7 00/15] selftests/bpf: Integrate test_xsk.c to test_progs framework

by Bastien Curutchet (eBPF Foundation)

Hi all,

The test_xsk.sh script covers many AF_XDP use cases. The tests it runs are defined in xskxceiver.c. Since this script is used to test real hardware, the goal here is to leave it as it is, and only integrate the tests that run on veth peers into the test_progs framework.

PATCH 1 extracts test_xsk[.c/.h] from xskxceiver[.c/.h] to make the tests available to test_progs.
PATCH 2 to 7 fix small issues in the current test
PATCH 8 to 13 handle all errors to release resources instead of calling exit() when any error occurs.
PATCH 14 isolates the tests that won't fit in the CI
PATCH 15 integrates the CI tests to the test_progs framework

Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet(a)bootlin.com>
---
Changes in v7:
- Restore 'test_ns' prefix to allow parallel execution.
- PATCH 11: fix potential uninitialized variable spotted by AI.
- PATCH 12: fix potential resource leak spotted by AI
- Link to v6: https://lore.kernel.org/r/20251029-xsk-v6-0-5a63a64dff98@bootlin.com

Changes in v6:
- Setup veth peer once for each mode instead of once for each subtest
- Rename the 'flaky' table 'skip-ci' table and move the automatically skipped and the longest tests into it
- Link to v5: https://lore.kernel.org/r/20251016-xsk-v5-0-662c95eb8005@bootlin.com

Changes in v5:
- Rebase on latest bpf-next_base
- Move XDP_ADJUST_TAIL_SHRINK_MULTI_BUFF to the flaky table
- Add Maciej's reviewed-by
- Link to v4: https://lore.kernel.org/r/20250924-xsk-v4-0-20e57537b876@bootlin.com

Changes in v4:
- Fix test_xsk.sh's summary report.
- Merge PATCH 11 & 12 together, otherwise PATCH 11 fails to build.
- Split old PATCH 3 in two patches. The first one fixes testapp_stats_rx_dropped(), the second one fixes testapp_xdp_shared_umem(). The unnecessary frees (in testapp_stats_rx_full() and testapp_stats_fill_empty()) are removed
- Link to v3: https://lore.kernel.org/r/20250904-xsk-v3-0-ce382e331485@bootlin.com

Changes in v3:
- Rebase on latest bpf-next_base to integrate commit c9110e6f7237 ("selftests/bpf: Fix count write in testapp_xdp_metadata_copy()").
- Move XDP_METADATA_COPY_* tests from flaky-tests to nominal tests
- Link to v2: https://lore.kernel.org/r/20250902-xsk-v2-0-17c6345d5215@bootlin.com

Changes in v2:
- Rebase on the latest bpf-next_base and integrate the newly added tests to the work (adjust_tail* and tx_queue_consumer tests)
- Re-order patches to split xskxceiver sooner.
- Fix the bug reported by Maciej.
- Fix verbose mode in test_xsk.sh by keeping kselftest (remove PATCH 1, 7 and 8)
- Link to v1: https://lore.kernel.org/r/20250313-xsk-v1-0-7374729a93b9@bootlin.com

---
Bastien Curutchet (eBPF Foundation) (15):
      selftests/bpf: test_xsk: Split xskxceiver
      selftests/bpf: test_xsk: Initialize bitmap before use
      selftests/bpf: test_xsk: Fix __testapp_validate_traffic()'s return value
      selftests/bpf: test_xsk: fix memory leak in testapp_stats_rx_dropped()
      selftests/bpf: test_xsk: fix memory leak in testapp_xdp_shared_umem()
      selftests/bpf: test_xsk: Wrap test clean-up in functions
      selftests/bpf: test_xsk: Release resources when swap fails
      selftests/bpf: test_xsk: Add return value to init_iface()
      selftests/bpf: test_xsk: Don't exit immediately when xsk_attach fails
      selftests/bpf: test_xsk: Don't exit immediately when gettimeofday fails
      selftests/bpf: test_xsk: Don't exit immediately when workers fail
      selftests/bpf: test_xsk: Don't exit immediately if validate_traffic fails
      selftests/bpf: test_xsk: Don't exit immediately on allocation failures
      selftests/bpf: test_xsk: Isolate non-CI tests
      selftests/bpf: test_xsk: Integrate test_xsk.c to test_progs framework

 tools/testing/selftests/bpf/Makefile | 11 +-
 tools/testing/selftests/bpf/prog_tests/test_xsk.c | 2596 ++++++++++++++++++++
 tools/testing/selftests/bpf/prog_tests/test_xsk.h | 298 +++
 tools/testing/selftests/bpf/prog_tests/xsk.c | 151 ++
 tools/testing/selftests/bpf/xskxceiver.c | 2696 +--------------------
 tools/testing/selftests/bpf/xskxceiver.h | 156 --
 6 files changed, 3184 insertions(+), 2724 deletions(-)
---
base-commit: 1e2d874b04ba46a3b9fe6697097aa437641f4339
change-id: 20250218-xsk-0cf90e975d14

Best regards,
--
Bastien Curutchet (eBPF Foundation) <bastien.curutchet(a)bootlin.com>

2 months, 2 weeks


[PATCH v7 00/15] Consolidate iommu page table implementations (AMD)

by Jason Gunthorpe

[Kevin has done a great job to get through reviews on all these, and Vasant/Ankit have been looking at it on AMD systems, I think we are close to being done now!]

Currently each of the iommu page table formats duplicates all of the logic to maintain the page table and perform map/unmap/etc operations. There are several different versions of the algorithms between all the different formats. The io-pgtable system provides an interface to help isolate the page table code from the iommu driver, but doesn't provide tools to implement the common algorithms.

This makes it very hard to improve the state of the pagetable code under the iommu domains as any proposed improvement needs to alter a large number of different driver code paths. Combined with a lack of software based testing this makes improvement in this area very hard.

iommufd wants several new page table operations:
 - More efficient map/unmap operations, using iommufd's batching logic
 - unmap that returns the physical addresses into a batch as it progresses
 - cut that allows splitting areas so large pages can have holes poked in them dynamically (ie guestmemfd hitless shared/private transitions)
 - More aggressive freeing of table memory to avoid waste
 - Fragmenting large pages so that dirty tracking can be more granular
 - Reassembling large pages so that VMs can run at full IO performance in migration/dirty tracking error flows
 - KHO integration for kernel live upgrade

Together these are algorithmically complex enough to be a very significant task to go and implement in all the page table formats we support. Just the "server" focused drivers use almost all the formats (ARMv8 S1&S2 / x86 PAE / AMDv1 / VT-d SS / RISCV)

Instead of doing the duplicated work, this series takes the first step to consolidate the algorithms into one place. In spirit it is similar to the work Christoph did a few years back to pull the redundant get_user_pages() implementations out of the arch code into core MM. This unlocked a great deal of improvement in that space in the following years. I would like to see the same benefit in iommu as well.

My first RFC showed a bigger picture with almost all formats and more algorithms. This series reorganizes that to be narrowly focused on just enough to convert the AMD driver to use the new mechanism.

kunit tests are provided that allow good testing of the algorithms and all formats on x86, nothing is arch specific.

AMD is one of the simpler options as the HW is quite uniform with few different options/bugs while still requiring the complicated contiguous pages support. The HW also has a very simple range based invalidation approach that is easy to implement.

The AMD v1 and AMD v2 page table formats are implemented bit for bit identical to the current code, tested using a compare kunit test that checks against the io-pgtable version (on github, see below).

Updating the AMD driver to replace the io-pgtable layer with the new stuff is fairly straightforward now. The layering is fixed up in the new version so that all the invalidation goes through function pointers.

Several small fixing patches have come out of this as I've been fixing the problems that the test suite uncovers in the current code, and implementing the fixed version in iommupt.

On performance, there is a quite wide variety of implementation designs across all the drivers. Looking at some key performance across the main formats:

iommu_map():
   pgsz  ,avg new,old ns, min new,old ns  , min % (+ve is better)
     2^12,     53,66    ,      51,63      ,  19.19 (AMDv1)
 256*2^12,    386,1909  ,     367,1795    ,  79.79
 256*2^21,    362,1633  ,     355,1556    ,  77.77
     2^12,     56,62    ,      52,59      ,  11.11 (AMDv2)
 256*2^12,    405,1355  ,     357,1292    ,  72.72
 256*2^21,    393,1160  ,     358,1114    ,  67.67
     2^12,     55,65    ,      53,62      ,  14.14 (VT-d second stage)
 256*2^12,    391,518   ,     332,512     ,  35.35
 256*2^21,    383,635   ,     336,624     ,  46.46
     2^12,     57,65    ,      55,63      ,  12.12 (ARM 64 bit)
 256*2^12,    380,389   ,     361,369     ,   2.02
 256*2^21,    358,419   ,     345,400     ,  13.13

iommu_unmap():
   pgsz  ,avg new,old ns, min new,old ns  , min % (+ve is better)
     2^12,     69,88    ,      65,85      ,  23.23 (AMDv1)
 256*2^12,    353,6498  ,     331,6029    ,  94.94
 256*2^21,    373,6014  ,     360,5706    ,  93.93
     2^12,     71,72    ,      66,69      ,   4.04 (AMDv2)
 256*2^12,    228,891   ,     206,871     ,  76.76
 256*2^21,    254,721   ,     245,711     ,  65.65
     2^12,     69,87    ,      65,82      ,  20.20 (VT-d second stage)
 256*2^12,    210,321   ,     200,315     ,  36.36
 256*2^21,    255,349   ,     238,342     ,  30.30
     2^12,     72,77    ,      68,74      ,   8.08 (ARM 64 bit)
 256*2^12,    521,357   ,     447,346     , -29.29
 256*2^21,    489,358   ,     433,345     , -25.25

  * Above numbers include additional patches to remove the iommu_pgsize() overheads. gcc 13.3.0, i7-12700

This version provides fairly consistent performance across formats. ARM unmap performance is quite different because this version supports contiguous pages and uses a very different algorithm for unmapping. Though why it is so much worse compared to AMDv1 I haven't figured out yet.

The per-format commits include a more detailed chart.
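For readers who want to recompute the "min %" column, here is a small Python sketch. The formula `(old - new) / old` is an assumption inferred from the "+ve is better" header, not something the cover letter states. For the rows checked it reproduces the integer part of the column, but not the fractional digits as printed in the table.

```python
def min_improvement_percent(new_ns, old_ns):
    """Relative reduction of the minimum latency in percent (+ve is better).

    Assumed formula: (old - new) / old * 100.
    """
    return (old_ns - new_ns) / old_ns * 100.0


# (min new ns, min old ns) pairs copied from the table above.
samples = {
    "iommu_map AMDv1 2^12": (51, 63),
    "iommu_unmap AMDv1 256*2^12": (331, 6029),
    "iommu_unmap ARM 64 bit 256*2^12": (447, 346),
}

if __name__ == "__main__":
    for name, (new_ns, old_ns) in samples.items():
        pct = min_improvement_percent(new_ns, old_ns)
        # int() truncates toward zero, matching the integer part in the table.
        print(f"{name}: {pct:.2f}% (integer part {int(pct)})")
```

A negative result, as in the ARM unmap rows, means the new code's minimum latency is higher than the old one for that case.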
There is a second branch:
   https://github.com/jgunthorpe/linux/commits/iommu_pt_all
Containing supporting work and future steps:
 - ARM short descriptor (32 bit), ARM long descriptor (64 bit) formats
 - RISCV format and RISCV conversion
   https://github.com/jgunthorpe/linux/commits/iommu_pt_riscv
 - Support for a DMA incoherent HW page table walker
 - VT-d second stage format and VT-d conversion
   https://github.com/jgunthorpe/linux/commits/iommu_pt_vtd
 - DART v1 & v2 format
 - Draft of an iommufd 'cut' operation to break down huge pages
 - A compare test that checks the iommupt formats against the iopgtable interface, including updating AMD to have a working iopgtable and patches to make VT-d have an iopgtable for testing.
 - A performance test to micro-benchmark map and unmap against iopgtable

My strategy is to go one by one for the drivers:
 - AMD driver conversion
 - RISCV page table and driver
 - Intel VT-d driver and VTDSS page table
 - Flushing improvements for RISCV
 - ARM SMMUv3

And concurrently work on the algorithm side:
 - debugfs content dump, like VT-d has
 - Cut support
 - Increase/Decrease page size support
 - map/unmap batching
 - KHO

As we make more algorithm improvements the value to convert the drivers increases.
This is on github: https://github.com/jgunthorpe/linux/commits/iommu_pt

v7:
 - Rebase to v6.18-rc2
 - Improve comments and documentation
 - Add a few missed __sme_sets() for AMD CC
 - Rename
    pt_iommu_flush_ops -> pt_iommu_driver_ops
    VT-D -> VT-d
    pt_clear_entry -> pt_clear_entries
    pt_entry_write_is_dirty -> pt_entry_is_write_dirty
    pt_entry_set_write_clean -> pt_entry_make_write_clean
 - Tidy some of the map flow into a new function do_map()
 - Fix ffz64()
v6: https://patch.msgid.link/r/0-v6-0fb54a1d9850+36b-iommu_pt_jgg@nvidia.com
 - Improve comments and documentation
 - Rename
    pt_entry_oa_full -> pt_entry_oa_exact
    pt_has_system_page -> pt_has_system_page_size
    pt_max_output_address_lg2 -> pt_max_oa_lg2
    log2_f*() -> vaf* / oaf* / f*_t
    pt_item_fully_covered -> pt_entry_fully_covered
 - Fix missed constant propagation causing division
 - Consolidate debugging checks to pt_check_install_leaf_args()
 - Change collect->ignore_mapped to check_mapped
 - Shuffle some hunks around to more appropriate patches
 - Two new mini kunit tests
v5: https://patch.msgid.link/r/0-v5-116c4948af3d+68091-iommu_pt_jgg@nvidia.com
 - Text grammar updates and kdoc fixes
v4: https://patch.msgid.link/r/0-v4-0d6a6726a372+18959-iommu_pt_jgg@nvidia.com
 - Rebase on v6.16-rc3
 - Integrate the HATS/HATDis changes
 - Remove 'default n' from kconfig
 - Remove unused 'PT_FIXED_TOP_LEVEL'
 - Improve comments and documentation
 - Fix some compile warnings from kbuild robots
v3: https://patch.msgid.link/r/0-v3-a93aab628dbc+521-iommu_pt_jgg@nvidia.com
 - Rebase on v6.16-rc2
 - s/PT_ENTRY_WORD_SIZE/PT_ITEM_WORD_SIZE/s to follow the language better
 - Comment and documentation updates
 - Add PT_TOP_PHYS_MASK to help manage alignment restrictions on the top pointer
 - Add missed force_aperture = true
 - Make pt_iommu_deinit() take care of the not-yet-inited error case internally as AMD/RISCV/VTD all shared this logic
 - Change gather_range() into gather_range_pages() so it also deals with the page list. This makes the following cache flushing series simpler
 - Fix missed update of unmap->unmapped in some error cases
 - Change clear_contig() to order the gather more logically
 - Remove goto from the error handling in __map_range_leaf()
 - s/log2_/oalog2_/ in places where the argument is an oaddr_t
 - Pass the pts to pt_table_install64/32()
 - Do not use SIGN_EXTEND for the AMDv2 page table because of Vasant's information on how PASID 0 works.
v2: https://patch.msgid.link/r/0-v2-5c26bde5c22d+58b-iommu_pt_jgg@nvidia.com
 - AMD driver only, many code changes
RFC: https://lore.kernel.org/all/0-v1-01fa10580981+1d-iommu_pt_jgg@nvidia.com/

Cc: Michael Roth <michael.roth(a)amd.com>
Cc: Alexey Kardashevskiy <aik(a)amd.com>
Cc: Pasha Tatashin <pasha.tatashin(a)soleen.com>
Cc: James Gowans <jgowans(a)amazon.com>
Signed-off-by: Jason Gunthorpe <jgg(a)nvidia.com>

Alejandro Jimenez (1):
  iommu/amd: Use the generic iommu page table

Jason Gunthorpe (14):
  genpt: Generic Page Table base API
  genpt: Add Documentation/ files
  iommupt: Add the basic structure of the iommu implementation
  iommupt: Add the AMD IOMMU v1 page table format
  iommupt: Add iova_to_phys op
  iommupt: Add unmap_pages op
  iommupt: Add map_pages op
  iommupt: Add read_and_clear_dirty op
  iommupt: Add a kunit test for Generic Page Table
  iommupt: Add a mock pagetable format for iommufd selftest to use
  iommufd: Change the selftest to use iommupt instead of xarray
  iommupt: Add the x86 64 bit page table format
  iommu/amd: Remove AMD io_pgtable support
  iommupt: Add a kunit test for the IOMMU implementation

 .clang-format | 1 +
 Documentation/driver-api/generic_pt.rst | 142 ++
 Documentation/driver-api/index.rst | 1 +
 drivers/iommu/Kconfig | 2 +
 drivers/iommu/Makefile | 1 +
 drivers/iommu/amd/Kconfig | 5 +-
 drivers/iommu/amd/Makefile | 2 +-
 drivers/iommu/amd/amd_iommu.h | 1 -
 drivers/iommu/amd/amd_iommu_types.h | 110 +-
 drivers/iommu/amd/io_pgtable.c | 577 --------
 drivers/iommu/amd/io_pgtable_v2.c | 370 ------
 drivers/iommu/amd/iommu.c | 538 ++++----
 drivers/iommu/generic_pt/.kunitconfig | 13 +
 drivers/iommu/generic_pt/Kconfig | 68 +
 drivers/iommu/generic_pt/fmt/Makefile | 26 +
 drivers/iommu/generic_pt/fmt/amdv1.h | 415 ++++++
 drivers/iommu/generic_pt/fmt/defs_amdv1.h | 21 +
 drivers/iommu/generic_pt/fmt/defs_x86_64.h | 21 +
 drivers/iommu/generic_pt/fmt/iommu_amdv1.c | 15 +
 drivers/iommu/generic_pt/fmt/iommu_mock.c | 10 +
 drivers/iommu/generic_pt/fmt/iommu_template.h | 48 +
 drivers/iommu/generic_pt/fmt/iommu_x86_64.c | 11 +
 drivers/iommu/generic_pt/fmt/x86_64.h | 259 ++++
 drivers/iommu/generic_pt/iommu_pt.h | 1162 +++++++++++++++++
 drivers/iommu/generic_pt/kunit_generic_pt.h | 713 ++++++++++
 drivers/iommu/generic_pt/kunit_iommu.h | 183 +++
 drivers/iommu/generic_pt/kunit_iommu_pt.h | 487 +++++++
 drivers/iommu/generic_pt/pt_common.h | 358 +++++
 drivers/iommu/generic_pt/pt_defs.h | 329 +++++
 drivers/iommu/generic_pt/pt_fmt_defaults.h | 233 ++++
 drivers/iommu/generic_pt/pt_iter.h | 636 +++++++++
 drivers/iommu/generic_pt/pt_log2.h | 122 ++
 drivers/iommu/io-pgtable.c | 4 -
 drivers/iommu/iommufd/Kconfig | 1 +
 drivers/iommu/iommufd/iommufd_test.h | 11 +-
 drivers/iommu/iommufd/selftest.c | 438 +++----
 include/linux/generic_pt/common.h | 167 +++
 include/linux/generic_pt/iommu.h | 271 ++++
 include/linux/io-pgtable.h | 2 -
 include/linux/irqchip/riscv-imsic.h | 3 +-
 tools/testing/selftests/iommu/iommufd.c | 60 +-
 tools/testing/selftests/iommu/iommufd_utils.h | 12 +
 42 files changed, 6237 insertions(+), 1612 deletions(-)
 create mode 100644 Documentation/driver-api/generic_pt.rst
 delete mode 100644 drivers/iommu/amd/io_pgtable.c
 delete mode 100644 drivers/iommu/amd/io_pgtable_v2.c
 create mode 100644 drivers/iommu/generic_pt/.kunitconfig
 create mode 100644 drivers/iommu/generic_pt/Kconfig
 create mode 100644 drivers/iommu/generic_pt/fmt/Makefile
 create mode 100644 drivers/iommu/generic_pt/fmt/amdv1.h
 create mode 100644 drivers/iommu/generic_pt/fmt/defs_amdv1.h
 create mode 100644 drivers/iommu/generic_pt/fmt/defs_x86_64.h
 create mode 100644 drivers/iommu/generic_pt/fmt/iommu_amdv1.c
 create mode 100644 drivers/iommu/generic_pt/fmt/iommu_mock.c
 create mode 100644 drivers/iommu/generic_pt/fmt/iommu_template.h
 create mode 100644 drivers/iommu/generic_pt/fmt/iommu_x86_64.c
 create mode 100644 drivers/iommu/generic_pt/fmt/x86_64.h
 create mode 100644 drivers/iommu/generic_pt/iommu_pt.h
 create mode 100644 drivers/iommu/generic_pt/kunit_generic_pt.h
 create mode 100644 drivers/iommu/generic_pt/kunit_iommu.h
 create mode 100644 drivers/iommu/generic_pt/kunit_iommu_pt.h
 create mode 100644 drivers/iommu/generic_pt/pt_common.h
 create mode 100644 drivers/iommu/generic_pt/pt_defs.h
 create mode 100644 drivers/iommu/generic_pt/pt_fmt_defaults.h
 create mode 100644 drivers/iommu/generic_pt/pt_iter.h
 create mode 100644 drivers/iommu/generic_pt/pt_log2.h
 create mode 100644 include/linux/generic_pt/common.h
 create mode 100644 include/linux/generic_pt/iommu.h

base-commit: bf3db0366052dcdf7dea89a07929b690aac59b15
-- 
2.43.0


[PATCH v3] selftests/run_kselftest.sh: exit with error if tests fail

by Brendan Jackman

Parsing KTAP is quite an inconvenience, but most of the time the thing you really want to know is "did anything fail"?

Let's give the user this information without them needing to parse anything.

Because of the use of subshells and namespaces, this needs to be communicated via a file. Just write arbitrary data into the file and treat non-empty content as a signal that something failed.

In case any user depends on the current behaviour, such as running this from a script with `set -e` and parsing the result for failures afterwards, add a flag they can set to get the old behaviour, namely --no-error-on-fail.

Signed-off-by: Brendan Jackman <jackmanb(a)google.com>
---
Changes in v3:
- Fixed quoting
- Link to v2: https://lore.kernel.org/r/20251014-b4-ksft-error-on-fail-v2-1-b3e2657237b8@…

Changes in v2:
- Fixed bug in report_failure()
- Made error-on-fail the default
- Link to v1: https://lore.kernel.org/r/20251007-b4-ksft-error-on-fail-v1-1-71bf058f5662@…
---
 tools/testing/selftests/kselftest/runner.sh | 14 ++++++++++----
 tools/testing/selftests/run_kselftest.sh | 14 ++++++++++++++
 2 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/kselftest/runner.sh b/tools/testing/selftests/kselftest/runner.sh
index 2c3c58e65a419f5ee8d7dc51a37671237a07fa0b..3a62039fa6217f3453423ff011575d0a1eb8c275 100644
--- a/tools/testing/selftests/kselftest/runner.sh
+++ b/tools/testing/selftests/kselftest/runner.sh
@@ -44,6 +44,12 @@ tap_timeout()
 	fi
 }
 
+report_failure()
+{
+	echo "not ok $*"
+	echo "$*" >> "$kselftest_failures_file"
+}
+
 run_one()
 {
 	DIR="$1"
@@ -105,7 +111,7 @@ run_one()
 	echo "# $TEST_HDR_MSG"
 	if [ ! -e "$TEST" ]; then
 		echo "# Warning: file $TEST is missing!"
-		echo "not ok $test_num $TEST_HDR_MSG"
+		report_failure "$test_num $TEST_HDR_MSG"
 	else
 		if [ -x /usr/bin/stdbuf ]; then
 			stdbuf="/usr/bin/stdbuf --output=L "
@@ -123,7 +129,7 @@ run_one()
 				interpreter=$(head -n 1 "$TEST" | cut -c 3-)
 				cmd="$stdbuf $interpreter ./$BASENAME_TEST"
 			else
-				echo "not ok $test_num $TEST_HDR_MSG"
+				report_failure "$test_num $TEST_HDR_MSG"
 				return
 			fi
 		fi
@@ -137,9 +143,9 @@ run_one()
 			echo "ok $test_num $TEST_HDR_MSG # SKIP"
 		elif [ $rc -eq $timeout_rc ]; then \
 			echo "#"
-			echo "not ok $test_num $TEST_HDR_MSG # TIMEOUT $kselftest_timeout seconds"
+			report_failure "$test_num $TEST_HDR_MSG # TIMEOUT $kselftest_timeout seconds"
 		else
-			echo "not ok $test_num $TEST_HDR_MSG # exit=$rc"
+			report_failure "$test_num $TEST_HDR_MSG # exit=$rc"
 		fi)
 		cd - >/dev/null
 	fi
diff --git a/tools/testing/selftests/run_kselftest.sh b/tools/testing/selftests/run_kselftest.sh
index 0443beacf3621ae36cb12ffd57f696ddef3526b5..d4be97498b32e975c63a1167d3060bdeba674c8c 100755
--- a/tools/testing/selftests/run_kselftest.sh
+++ b/tools/testing/selftests/run_kselftest.sh
@@ -33,6 +33,7 @@ Usage: $0 [OPTIONS]
   -c | --collection COLLECTION Run all tests from COLLECTION
   -l | --list List the available collection:test entries
   -d | --dry-run Don't actually run any tests
+  -f | --no-error-on-fail Don't exit with an error just because tests failed
   -n | --netns Run each test in namespace
   -h | --help Show this usage info
   -o | --override-timeout Number of seconds after which we timeout
@@ -44,6 +45,7 @@ COLLECTIONS=""
 TESTS=""
 dryrun=""
 kselftest_override_timeout=""
+ERROR_ON_FAIL=true
 while true; do
 	case "$1" in
 		-s | --summary)
@@ -65,6 +67,9 @@ while true; do
 		-d | --dry-run)
 			dryrun="echo"
 			shift ;;
+		-f | --no-error-on-fail)
+			ERROR_ON_FAIL=false
+			shift ;;
 		-n | --netns)
 			RUN_IN_NETNS=1
 			shift ;;
@@ -105,9 +110,18 @@ if [ -n "$TESTS" ]; then
 	available="$(echo "$valid" | sed -e 's/ /\n/g')"
 fi
 
+kselftest_failures_file="$(mktemp --tmpdir kselftest-failures-XXXXXX)"
+export kselftest_failures_file
+
 collections=$(echo "$available" | cut -d: -f1 | sort | uniq)
 for collection in $collections ; do
 	[ -w /dev/kmsg ] && echo "kselftest: Running tests in $collection" >> /dev/kmsg
 	tests=$(echo "$available" | grep "^$collection:" | cut -d: -f2)
 	($dryrun cd "$collection" && $dryrun run_many $tests)
 done
+
+failures="$(cat "$kselftest_failures_file")"
+rm "$kselftest_failures_file"
+if "$ERROR_ON_FAIL" && [ "$failures" ]; then
+	exit 1
+fi

---
base-commit: 8f5ae30d69d7543eee0d70083daf4de8fe15d585
change-id: 20251007-b4-ksft-error-on-fail-0c2cb3246041

Best regards,
-- 
Brendan Jackman <jackmanb(a)google.com>
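
The failure-file idea in this patch can be modelled outside of shell. The following is a minimal Python sketch, not the patch's code: a wrapper process runs each test and always exits 0 itself (standing in for the subshells whose status the runner does not see), so failures are reported only by appending to a shared temp file. The names `WRAPPER` and `run_all` are invented for illustration; only the `kselftest_failures_file` variable name is taken from the patch.

```python
import os
import subprocess
import sys
import tempfile

# Wrapper run in a child process: executes one test command and, on failure,
# appends a line to the file named by $kselftest_failures_file. The wrapper
# itself always exits 0, so its exit status carries no failure information.
WRAPPER = r"""
import os, subprocess, sys
num, cmd = sys.argv[1], sys.argv[2:]
rc = subprocess.run(cmd).returncode
if rc == 0:
    print(f"ok {num}")
else:
    print(f"not ok {num} # exit={rc}")
    with open(os.environ["kselftest_failures_file"], "a") as f:
        f.write(f"{num}\n")
"""


def run_all(commands, error_on_fail=True):
    """Run every command through the wrapper; return the runner's exit status."""
    fd, failures_file = tempfile.mkstemp(prefix="kselftest-failures-")
    os.close(fd)
    env = dict(os.environ, kselftest_failures_file=failures_file)
    try:
        for num, cmd in enumerate(commands, start=1):
            subprocess.run([sys.executable, "-c", WRAPPER, str(num), *cmd], env=env)
        with open(failures_file) as f:
            failures = f.read()
    finally:
        os.remove(failures_file)
    # Any content at all means at least one test failed.
    return 1 if error_on_fail and failures else 0
```

Passing `error_on_fail=False` corresponds to the `--no-error-on-fail` flag: failures are still printed as `not ok` lines, but the runner's own status stays 0.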


[PATCH 0/6] KVM: LoongArch: selftests: Add timer test case

by Bibo Mao

This patch set adds a timer test case for LoongArch systems, based on the common arch_timer test case. It covers the time counter function, one-shot/period mode interrupts, and the software emulated timer function.

Bibo Mao (6):
  KVM: LoongArch: selftests: Add system registers save and restore on exception
  KVM: LoongArch: selftests: Add exception handler register interface
  KVM: LoongArch: selftests: Add basic interfaces
  KVM: LoongArch: selftests: Add timer test case with one-shot mode
  KVM: LoongArch: selftests: Add period mode timer and time counter test
  KVM: LoongArch: selftests: Add SW emulated timer test

 tools/testing/selftests/kvm/Makefile.kvm | 10 +-
 .../kvm/include/loongarch/arch_timer.h | 84 ++++++++
 .../kvm/include/loongarch/processor.h | 81 +++++++-
 .../selftests/kvm/lib/loongarch/exception.S | 6 +
 .../selftests/kvm/lib/loongarch/processor.c | 38 +++-
 .../selftests/kvm/loongarch/arch_timer.c | 187 ++++++++++++++++++
 6 files changed, 400 insertions(+), 6 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/include/loongarch/arch_timer.h
 create mode 100644 tools/testing/selftests/kvm/loongarch/arch_timer.c

base-commit: e53642b87a4f4b03a8d7e5f8507fc3cd0c595ea6
-- 
2.39.3


[GIT PULL] kselftest fixes update for Linux 6.18-rc4

by Shuah Khan

Hi Linus, Please pull the following kselftest fixes update for Linux 6.18-rc4. Fixes build warning in cachestat found during clang build and adds tmpshmcstat to .gitignore. diff is attached. thanks, -- Shuah ---------------------------------------------------------------- The following changes since commit 3a8660878839faadb4f1a6dd72c3179c1df56787: Linux 6.18-rc1 (2025-10-12 13:42:36 -0700) are available in the Git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest tags/linux_kselftest-fixes-6.18-rc4 for you to fetch changes up to 920aa3a7705a061cb3004572d8b7932b54463dbf: selftests: cachestat: Fix warning on declaration under label (2025-10-22 09:23:18 -0600) ---------------------------------------------------------------- linux_kselftest-fixes-6.18-rc4 Fixes build warning in cachestat found during clang build and adds tmpshmcstat to .gitignore. ---------------------------------------------------------------- Madhur Kumar (1): selftests/cachestat: add tmpshmcstat file to .gitignore Sidharth Seela (1): selftests: cachestat: Fix warning on declaration under label tools/testing/selftests/cachestat/.gitignore | 1 + tools/testing/selftests/cachestat/test_cachestat.c | 4 ++-- 2 files changed, 3 insertions(+), 2 deletions(-) ----------------------------------------------------------------


[GIT PULL] kunit fixes update for Linux 6.18-rc4

by Shuah Khan

Hi Linus, Please pull the following kunit fixes update for Linux 6.18-rc4. Fixes log overwrite in param_tests and fixes incorrect cast of priv pointer in test_dev_action(). Updates email address for Rae Moar in MAINTAINERS KUnit entry. diff is attached. thanks, -- Shuah ---------------------------------------------------------------- The following changes since commit 3a8660878839faadb4f1a6dd72c3179c1df56787: Linux 6.18-rc1 (2025-10-12 13:42:36 -0700) are available in the Git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest tags/linux_kselftest-kunit-fixes-6.18-rc4 for you to fetch changes up to f3903ec76ae6afcdba0347681d1dda005fb145cd: MAINTAINERS: Update KUnit email address for Rae Moar (2025-10-29 14:57:54 -0600) ---------------------------------------------------------------- linux_kselftest-kunit-fixes-6.18-rc4 Fixes log overwrite in param_tests and fixes incorrect cast of priv pointer in test_dev_action(). Updates email address for Rae Moar in MAINTAINERS KUnit entry. ---------------------------------------------------------------- Carlos Llamas (1): kunit: prevent log overwrite in param_tests Florian Schmaus (1): kunit: test_dev_action: Correctly cast 'priv' pointer to long* Rae Moar (1): MAINTAINERS: Update KUnit email address for Rae Moar .mailmap | 1 + MAINTAINERS | 2 +- lib/kunit/kunit-test.c | 2 +- lib/kunit/test.c | 3 ++- 4 files changed, 5 insertions(+), 3 deletions(-) ----------------------------------------------------------------


[PATCH bpf 0/2] use rqspinlock for bpf lru map

by Menglong Dong

Convert the raw_spinlock to rqspinlock to fix the possible deadlock in [1] for bpf lru map. Meanwhile, add the testcase for the deadlock. Link: https://lore.kernel.org/bpf/CAEf4BzbTJCUx0D=zjx6+5m5iiGhwLzaP94hnw36ZMDHAf4… Menglong Dong (2): bpf: use rqspinlock for lru map selftests/bpf: test map deadlock caused by NMI kernel/bpf/bpf_lru_list.c | 47 +++--- kernel/bpf/bpf_lru_list.h | 5 +- .../selftests/bpf/prog_tests/map_deadlock.c | 134 ++++++++++++++++++ .../selftests/bpf/progs/map_deadlock.c | 52 +++++++ 4 files changed, 217 insertions(+), 21 deletions(-) create mode 100644 tools/testing/selftests/bpf/prog_tests/map_deadlock.c create mode 100644 tools/testing/selftests/bpf/progs/map_deadlock.c -- 2.51.2


[PATCH net-next v3] selftests: drv-net: replace the nsim ring test with a drv-net one

by Jakub Kicinski

We are trying to move away from netdevsim-only tests and towards tests which can be run both against netdevsim and real drivers. Replace the simple bash script we have for checking ethtool -g/-G on netdevsim with a Python test tweaking those params as well as channel count. The new test is not exactly equivalent to the netdevsim one, but real drivers don't often support random ring sizes, let alone modifying max values via debugfs. Signed-off-by: Jakub Kicinski <kuba(a)kernel.org> --- v3: - let ring sizes fall all the way down to 0 v2: https://lore.kernel.org/20251027192131.2053792-1-kuba@kernel.org - add the new test to Makefile and remove the old one turns out NIPA checking for Makefile presence was busted v1: https://lore.kernel.org/20251024215552.1249838-1-kuba@kernel.org CC: andrew(a)lunn.ch CC: shuah(a)kernel.org CC: linux-kselftest(a)vger.kernel.org --- tools/testing/selftests/drivers/net/Makefile | 1 + .../selftests/drivers/net/netdevsim/Makefile | 1 - .../drivers/net/netdevsim/ethtool-ring.sh | 85 --------- .../selftests/drivers/net/ring_reconfig.py | 167 ++++++++++++++++++ 4 files changed, 168 insertions(+), 86 deletions(-) delete mode 100755 tools/testing/selftests/drivers/net/netdevsim/ethtool-ring.sh create mode 100755 tools/testing/selftests/drivers/net/ring_reconfig.py diff --git a/tools/testing/selftests/drivers/net/Makefile b/tools/testing/selftests/drivers/net/Makefile index 6e41635bd55a..68e0bb603a9d 100644 --- a/tools/testing/selftests/drivers/net/Makefile +++ b/tools/testing/selftests/drivers/net/Makefile @@ -22,6 +22,7 @@ TEST_PROGS := \ ping.py \ psp.py \ queues.py \ + ring_reconfig.py \ shaper.py \ stats.py \ xdp.py \ diff --git a/tools/testing/selftests/drivers/net/netdevsim/Makefile b/tools/testing/selftests/drivers/net/netdevsim/Makefile index daf51113c827..833abd8e6fdc 100644 --- a/tools/testing/selftests/drivers/net/netdevsim/Makefile +++ b/tools/testing/selftests/drivers/net/netdevsim/Makefile @@ -8,7 +8,6 @@ TEST_PROGS := \ 
ethtool-features.sh \ ethtool-fec.sh \ ethtool-pause.sh \ - ethtool-ring.sh \ fib.sh \ fib_notifications.sh \ hw_stats_l3.sh \ diff --git a/tools/testing/selftests/drivers/net/netdevsim/ethtool-ring.sh b/tools/testing/selftests/drivers/net/netdevsim/ethtool-ring.sh deleted file mode 100755 index c969559ffa7a..000000000000 --- a/tools/testing/selftests/drivers/net/netdevsim/ethtool-ring.sh +++ /dev/null @@ -1,85 +0,0 @@ -#!/bin/bash -# SPDX-License-Identifier: GPL-2.0-only - -source ethtool-common.sh - -function get_value { - local query="${SETTINGS_MAP[$1]}" - - echo $(ethtool -g $NSIM_NETDEV | \ - tail -n +$CURR_SETT_LINE | \ - awk -F':' -v pattern="$query:" '$0 ~ pattern {gsub(/[\t ]/, "", $2); print $2}') -} - -function update_current_settings { - for key in ${!SETTINGS_MAP[@]}; do - CURRENT_SETTINGS[$key]=$(get_value $key) - done - echo ${CURRENT_SETTINGS[@]} -} - -if ! ethtool -h | grep -q set-ring >/dev/null; then - echo "SKIP: No --set-ring support in ethtool" - exit 4 -fi - -NSIM_NETDEV=$(make_netdev) - -set -o pipefail - -declare -A SETTINGS_MAP=( - ["rx"]="RX" - ["rx-mini"]="RX Mini" - ["rx-jumbo"]="RX Jumbo" - ["tx"]="TX" -) - -declare -A EXPECTED_SETTINGS=( - ["rx"]="" - ["rx-mini"]="" - ["rx-jumbo"]="" - ["tx"]="" -) - -declare -A CURRENT_SETTINGS=( - ["rx"]="" - ["rx-mini"]="" - ["rx-jumbo"]="" - ["tx"]="" -) - -MAX_VALUE=$((RANDOM % $((2**32-1)))) -RING_MAX_LIST=$(ls $NSIM_DEV_DFS/ethtool/ring/) - -for ring_max_entry in $RING_MAX_LIST; do - echo $MAX_VALUE > $NSIM_DEV_DFS/ethtool/ring/$ring_max_entry -done - -CURR_SETT_LINE=$(ethtool -g $NSIM_NETDEV | grep -i -m1 -n 'Current hardware settings' | cut -f1 -d:) - -# populate the expected settings map -for key in ${!SETTINGS_MAP[@]}; do - EXPECTED_SETTINGS[$key]=$(get_value $key) -done - -# test -for key in ${!SETTINGS_MAP[@]}; do - value=$((RANDOM % $MAX_VALUE)) - - ethtool -G $NSIM_NETDEV "$key" "$value" - - EXPECTED_SETTINGS[$key]="$value" - expected=${EXPECTED_SETTINGS[@]} - 
current=$(update_current_settings) - - check $? "$current" "$expected" - set +x -done - -if [ $num_errors -eq 0 ]; then - echo "PASSED all $((num_passes)) checks" - exit 0 -else - echo "FAILED $num_errors/$((num_errors+num_passes)) checks" - exit 1 -fi diff --git a/tools/testing/selftests/drivers/net/ring_reconfig.py b/tools/testing/selftests/drivers/net/ring_reconfig.py new file mode 100755 index 000000000000..f9530a8b0856 --- /dev/null +++ b/tools/testing/selftests/drivers/net/ring_reconfig.py @@ -0,0 +1,167 @@ +#!/usr/bin/env python3 +# SPDX-License-Identifier: GPL-2.0 + +""" +Test channel and ring size configuration via ethtool (-L / -G). +""" + +from lib.py import ksft_run, ksft_exit, ksft_pr +from lib.py import ksft_eq +from lib.py import NetDrvEpEnv, EthtoolFamily, GenerateTraffic +from lib.py import defer, NlError + + +def channels(cfg) -> None: + """ + Twiddle channel counts in various combinations of parameters. + We're only looking for driver adhering to the requested config + if the config is accepted and crashes. 
+ """ + ehdr = {'header':{'dev-index': cfg.ifindex}} + chans = cfg.eth.channels_get(ehdr) + + all_keys = ["rx", "tx", "combined"] + mixes = [{"combined"}, {"rx", "tx"}, {"rx", "combined"}, {"tx", "combined"}, + {"rx", "tx", "combined"},] + + # Get the set of keys that device actually supports + restore = {} + supported = set() + for key in all_keys: + if key + "-max" in chans: + supported.add(key) + restore |= {key + "-count": chans[key + "-count"]} + + defer(cfg.eth.channels_set, ehdr | restore) + + def test_config(config): + try: + cfg.eth.channels_set(ehdr | config) + get = cfg.eth.channels_get(ehdr) + for k, v in config.items(): + ksft_eq(get.get(k, 0), v) + except NlError as e: + failed.append(mix) + ksft_pr("Can't set", config, e) + else: + ksft_pr("Okay", config) + + failed = [] + for mix in mixes: + if not mix.issubset(supported): + continue + + # Set all the values in the mix to 1, other supported to 0 + config = {} + for key in all_keys: + config[key + "-count"] = 1 if key in mix else 0 + test_config(config) + + for mix in mixes: + if not mix.issubset(supported): + continue + if mix in failed: + continue + + # Set all the values in the mix to max, other supported to 0 + config = {} + for key in all_keys: + config[key + "-count"] = chans[key + '-max'] if key in mix else 0 + test_config(config) + + +def _configure_min_ring_cnt(cfg) -> None: + """ Try to configure a single Rx/Tx ring. 
""" + ehdr = {'header':{'dev-index': cfg.ifindex}} + chans = cfg.eth.channels_get(ehdr) + + all_keys = ["rx-count", "tx-count", "combined-count"] + restore = {} + config = {} + for key in all_keys: + if key in chans: + restore[key] = chans[key] + config[key] = 0 + + if chans.get('combined-count', 0) > 1: + config['combined-count'] = 1 + elif chans.get('rx-count', 0) > 1 and chans.get('tx-count', 0) > 1: + config['tx-count'] = 1 + config['rx-count'] = 1 + else: + # looks like we're already on 1 channel + return + + cfg.eth.channels_set(ehdr | config) + defer(cfg.eth.channels_set, ehdr | restore) + + +def ringparam(cfg) -> None: + """ + Tweak the ringparam configuration. Try to run some traffic over min + ring size to make sure it actually functions. + """ + ehdr = {'header':{'dev-index': cfg.ifindex}} + rings = cfg.eth.rings_get(ehdr) + + restore = {} + maxes = {} + params = set() + for key in rings.keys(): + if 'max' in key: + param = key[:-4] + maxes[param] = rings[key] + params.add(param) + restore[param] = rings[param] + + defer(cfg.eth.rings_set, ehdr | restore) + + # Speed up the reconfig by configuring just one ring + _configure_min_ring_cnt(cfg) + + # Try to reach min on all settings + for param in params: + val = rings[param] + while True: + try: + cfg.eth.rings_set({'header':{'dev-index': cfg.ifindex}, + param: val // 2}) + if val == 0: + break + val //= 2 + except NlError: + break + + get = cfg.eth.rings_get(ehdr) + ksft_eq(get[param], val) + + ksft_pr(f"Reached min for '{param}' at {val} (max {rings[param]})") + + GenerateTraffic(cfg).wait_pkts_and_stop(10000) + + # Try max across all params, if the driver supports large rings + # this may OOM so we ignore errors + try: + ksft_pr("Applying max settings") + config = {p: maxes[p] for p in params} + cfg.eth.rings_set(ehdr | config) + except NlError as e: + ksft_pr("Can't set max params", config, e) + else: + GenerateTraffic(cfg).wait_pkts_and_stop(10000) + + +def main() -> None: + """ Ksft boiler plate main 
""" + + with NetDrvEpEnv(__file__) as cfg: + cfg.eth = EthtoolFamily() + + ksft_run([channels, + ringparam], + args=(cfg, )) + ksft_exit() + + +if __name__ == "__main__": + main() -- 2.51.0


[PATCH bpf-next v3 0/4] selftests/bpf: convert test_tc_tunnel.sh to test_progs

by Alexis Lothoré (eBPF Foundation)

Hello,
this is the v3 of test_tc_tunnel conversion into test_progs framework. This new revision:
- fixes a few issues spotted by the bot reviewer
- removes any test ensuring connection failure (and so depending on a timeout) to keep the execution time reasonable

test_tc_tunnel.sh tests a variety of tunnels based on BPF: packets are encapsulated by a BPF program on the client egress. We then check that those packets can be decapsulated on server ingress side, either thanks to kernel-based or BPF-based decapsulation. Those tests are run thanks to two veths in two dedicated namespaces.

- patches 1 and 2 are preparatory patches
- patch 3 introduces tc_tunnel test into test_progs
- patch 4 gets rid of the test_tc_tunnel.sh script

The new test has been executed both in some x86 local qemu machine, as well as in CI:

  # ./test_progs -a tc_tunnel
  #454/1 tc_tunnel/ipip_none:OK
  #454/2 tc_tunnel/ipip6_none:OK
  #454/3 tc_tunnel/ip6tnl_none:OK
  #454/4 tc_tunnel/sit_none:OK
  #454/5 tc_tunnel/vxlan_eth:OK
  #454/6 tc_tunnel/ip6vxlan_eth:OK
  #454/7 tc_tunnel/gre_none:OK
  #454/8 tc_tunnel/gre_eth:OK
  #454/9 tc_tunnel/gre_mpls:OK
  #454/10 tc_tunnel/ip6gre_none:OK
  #454/11 tc_tunnel/ip6gre_eth:OK
  #454/12 tc_tunnel/ip6gre_mpls:OK
  #454/13 tc_tunnel/udp_none:OK
  #454/14 tc_tunnel/udp_eth:OK
  #454/15 tc_tunnel/udp_mpls:OK
  #454/16 tc_tunnel/ip6udp_none:OK
  #454/17 tc_tunnel/ip6udp_eth:OK
  #454/18 tc_tunnel/ip6udp_mpls:OK
  #454 tc_tunnel:OK
  Summary: 1/18 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore(a)bootlin.com>
---
Changes in v3:
- remove systematic "connection must fail" test part of each subtest
- also remove kernel-based decap test for subtests supposed to fail on kernel side
- fix potential fd leak if connection structure allocation fails
- fix wrong early return in run_test
- Link to v2: https://lore.kernel.org/r/20251022-tc_tunnel-v2-0-a44a0bd52902@bootlin.com

Changes in v2:
- declare a single tc_prog_attach helper rather than multiple, intermediate helpers
- move the new helper to network_helpers.c rather than a dedicated file
- do not rename existing tc_helpers.c/h pair (drop patch)
- keep only the minimal set of needed NS switches
- Link to v1: https://lore.kernel.org/r/20251017-tc_tunnel-v1-0-2d86808d86b2@bootlin.com

---
Alexis Lothoré (eBPF Foundation) (4):
  selftests/bpf: add tc helpers
  selftests/bpf: make test_tc_tunnel.bpf.c compatible with big endian platforms
  selftests/bpf: integrate test_tc_tunnel.sh tests into test_progs
  selftests/bpf: remove test_tc_tunnel.sh

 tools/testing/selftests/bpf/Makefile | 1 -
 tools/testing/selftests/bpf/network_helpers.c | 45 ++
 tools/testing/selftests/bpf/network_helpers.h | 16 +
 .../selftests/bpf/prog_tests/test_tc_tunnel.c | 674 +++++++++++++++++++++
 .../testing/selftests/bpf/prog_tests/test_tunnel.c | 107 +---
 tools/testing/selftests/bpf/progs/test_tc_tunnel.c | 95 ++-
 tools/testing/selftests/bpf/test_tc_tunnel.sh | 320 ----------
 7 files changed, 790 insertions(+), 468 deletions(-)
---
base-commit: ecdeefe65eaeb82a1262e20401ba750b8c9e0b97
change-id: 20250811-tc_tunnel-c61342683f18

Best regards,
-- 
Alexis Lothoré, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com


[PATCH] KVM: arm64: selftests: Filter ZCR_EL2 in get-reg-list

by Mark Brown

get-reg-list includes ZCR_EL2 in the list of EL2 registers that it looks for when NV is enabled but does not have any feature gate for this register, meaning that testing any combination of features that includes EL2 but does not include SVE will result in a test failure due to a missing register being reported:

| The following lines are missing registers:
|
| 	ARM64_SYS_REG(3, 4, 1, 2, 0),

Add ZCR_EL2 to feat_id_regs so that the test knows not to expect to see it without SVE being enabled.

Fixes: 3a90b6f27964 ("KVM: arm64: selftests: get-reg-list: Add base EL2 registers")
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
 tools/testing/selftests/kvm/arm64/get-reg-list.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/testing/selftests/kvm/arm64/get-reg-list.c b/tools/testing/selftests/kvm/arm64/get-reg-list.c
index c9b84eeaab6b..7ae26ce875ad 100644
--- a/tools/testing/selftests/kvm/arm64/get-reg-list.c
+++ b/tools/testing/selftests/kvm/arm64/get-reg-list.c
@@ -68,6 +68,7 @@ static struct feature_id_reg feat_id_regs[] = {
 	REG_FEAT(VNCR_EL2, ID_AA64MMFR4_EL1, NV_frac, NV2_ONLY),
 	REG_FEAT(CNTHV_CTL_EL2, ID_AA64MMFR1_EL1, VH, IMP),
 	REG_FEAT(CNTHV_CVAL_EL2,ID_AA64MMFR1_EL1, VH, IMP),
+	REG_FEAT(ZCR_EL2, ID_AA64PFR0_EL1, SVE, IMP),
 };
 
 bool filter_reg(__u64 reg)

---
base-commit: 211ddde0823f1442e4ad052a2f30f050145ccada
change-id: 20251023-kvm-arm64-get-reg-list-zcr-el2-c43090e11f23

Best regards,
-- 
Mark Brown <broonie(a)kernel.org>
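
The effect of a feature gate can be pictured with a small model. This Python sketch is a simplification under stated assumptions and is not how get-reg-list.c is written: `FEATURE_GATES`, `expected_registers` and `missing_registers` are hypothetical names, and features are reduced to plain strings instead of ID register fields.

```python
# Hypothetical, simplified model: register name -> feature that must be
# present for the register to be expected. It mirrors the idea behind
# feat_id_regs, not its actual C layout.
FEATURE_GATES = {
    "ZCR_EL2": "SVE",
    "CNTHV_CTL_EL2": "VH",
    "CNTHV_CVAL_EL2": "VH",
}


def expected_registers(candidates, features):
    """Drop registers whose gating feature is absent; ungated ones are kept."""
    return [reg for reg in candidates
            if FEATURE_GATES.get(reg) is None or FEATURE_GATES[reg] in features]


def missing_registers(candidates, features, reported):
    """Registers a test would flag: expected for these features, not reported."""
    return [reg for reg in expected_registers(candidates, features)
            if reg not in reported]


el2_regs = ["SCTLR_EL2", "ZCR_EL2"]
# A vCPU with EL2 but without SVE reports only SCTLR_EL2.
print(missing_registers(el2_regs, {"VH"}, {"SCTLR_EL2"}))
```

Without the `ZCR_EL2` entry in the gate table, the same call would list `ZCR_EL2` as missing, which is the failure the commit message describes.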


[PATCH] KVM: arm64: selftests: Add SCTLR2_EL2 to get-reg-list

by Mark Brown

We recently added support for SCTLR2_EL2 to the kernel but did not add it to get-reg-list, resulting in it reporting the missing register when it is available. Add it.

Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
 tools/testing/selftests/kvm/arm64/get-reg-list.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/testing/selftests/kvm/arm64/get-reg-list.c b/tools/testing/selftests/kvm/arm64/get-reg-list.c
index c9b84eeaab6b..2abef0a86d46 100644
--- a/tools/testing/selftests/kvm/arm64/get-reg-list.c
+++ b/tools/testing/selftests/kvm/arm64/get-reg-list.c
@@ -63,6 +63,7 @@ static struct feature_id_reg feat_id_regs[] = {
 	REG_FEAT(HDFGWTR2_EL2, ID_AA64MMFR0_EL1, FGT, FGT2),
 	REG_FEAT(ZCR_EL2, ID_AA64PFR0_EL1, SVE, IMP),
 	REG_FEAT(SCTLR2_EL1, ID_AA64MMFR3_EL1, SCTLRX, IMP),
+	REG_FEAT(SCTLR2_EL2, ID_AA64MMFR3_EL1, SCTLRX, IMP),
 	REG_FEAT(VDISR_EL2, ID_AA64PFR0_EL1, RAS, IMP),
 	REG_FEAT(VSESR_EL2, ID_AA64PFR0_EL1, RAS, IMP),
 	REG_FEAT(VNCR_EL2, ID_AA64MMFR4_EL1, NV_frac, NV2_ONLY),
@@ -718,6 +719,7 @@ static __u64 el2_regs[] = {
 	SYS_REG(VMPIDR_EL2),
 	SYS_REG(SCTLR_EL2),
 	SYS_REG(ACTLR_EL2),
+	SYS_REG(SCTLR2_EL2),
 	SYS_REG(HCR_EL2),
 	SYS_REG(MDCR_EL2),
 	SYS_REG(CPTR_EL2),

---
base-commit: 211ddde0823f1442e4ad052a2f30f050145ccada
change-id: 20251023-b4-kvm-arm64-get-reg-list-sctlr-el2-222e463e8aaf

Best regards,
-- 
Mark Brown <broonie(a)kernel.org>

2 months, 2 weeks


[PATCH v2] KVM: selftests: fix MAPC RDbase target formatting in vgic_lpi_stress

by Maximilian Dittgen

Since GITS_TYPER.PTA == 0, the ITS MAPC command demands a CPU ID, rather than a physical redistributor address, for its RDbase command argument. As such, when MAPC-ing guest ITS collections, vgic_lpi_stress iterates over CPU IDs in the range [0, nr_cpus), passing them as the RDbase vcpu_id argument to its_send_mapc_cmd().

However, its_encode_target() in the its_send_mapc_cmd() selftest handler expects RDbase arguments to be formatted with a 16-bit offset, as shown by the 16-bit target_addr right shift in its implementation:

	its_mask_encode(&cmd->raw_cmd[2], target_addr >> 16, 51, 16)

At the moment, all CPU IDs passed into its_send_mapc_cmd() have no offset, therefore becoming 0x0 after the bit shift. Thus, when vgic_its_cmd_handle_mapc() receives the ITS command in vgic-its.c, it always interprets the RDbase target CPU as CPU 0. All interrupts sent to collections will be processed by vCPU 0, which defeats the purpose of this multi-vCPU test.

Fix by creating a procnum_to_rdbase() helper function, which left-shifts the vCPU parameter received by its_send_mapc_cmd() by 16 bits before passing it to its_encode_target() for encoding.

Signed-off-by: Maximilian Dittgen <mdittgen(a)amazon.de>
---
v2: Refactor the vcpu_id left shift into procnum_to_rdbase() helper. Rename and rewrite commit to reflect root cause of bug which was improper RDbase formatting, not that MAPC expects a physical address as the RDbase parameter.

To validate the patch, I added the following debug code at the top of vgic_its_cmd_handle_mapc:

	u64 raw_cmd2 = le64_to_cpu(its_cmd[2]);
	u32 target_addr = its_cmd_get_target_addr(its_cmd);

	kvm_info("MAPC: coll_id=%d, raw_cmd[2]=0x%llx, parsed_target=%u\n",
		 coll_id, raw_cmd2, target_addr);
	vcpu = kvm_get_vcpu_by_id(kvm, its_cmd_get_target_addr(its_cmd));
	kvm_info("MAPC: coll_id=%d, vcpu_id=%d\n",
		 coll_id, vcpu ? vcpu->vcpu_id : -1);

I then ran `./vgic_lpi_stress -v 3` to trigger the stress selftest with 3 vCPUs.

Before the patch, the debug logs read:

kvm [20832]: MAPC: coll_id=0, raw_cmd[2]=0x8000000000000000, parsed_target=0
kvm [20832]: MAPC: coll_id=0, vcpu_id=0
kvm [20832]: MAPC: coll_id=1, raw_cmd[2]=0x8000000000000001, parsed_target=0
kvm [20832]: MAPC: coll_id=1, vcpu_id=0
kvm [20832]: MAPC: coll_id=2, raw_cmd[2]=0x8000000000000002, parsed_target=0
kvm [20832]: MAPC: coll_id=2, vcpu_id=0

Note that the last digit of the cmd string reflects the collection ID, but the rest of the cmd string reads 0. The handler parses out vCPU 0 for all 3 mapc calls.

After the patch, the debug logs read:

kvm [20019]: MAPC: coll_id=0, raw_cmd[2]=0x8000000000000000, parsed_target=0
kvm [20019]: MAPC: coll_id=0, vcpu_id=0
kvm [20019]: MAPC: coll_id=1, raw_cmd[2]=0x8000000000010001, parsed_target=1
kvm [20019]: MAPC: coll_id=1, vcpu_id=1
kvm [20019]: MAPC: coll_id=2, raw_cmd[2]=0x8000000000020002, parsed_target=2
kvm [20019]: MAPC: coll_id=2, vcpu_id=2

Note that the target vcpu and target collection are both visible in the cmd string. The handler parses out the correct vCPU for all 3 mapc calls.
---
 tools/testing/selftests/kvm/lib/arm64/gic_v3_its.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/kvm/lib/arm64/gic_v3_its.c b/tools/testing/selftests/kvm/lib/arm64/gic_v3_its.c
index 09f270545646..0e2f8ed90f30 100644
--- a/tools/testing/selftests/kvm/lib/arm64/gic_v3_its.c
+++ b/tools/testing/selftests/kvm/lib/arm64/gic_v3_its.c
@@ -15,6 +15,8 @@
 #include "gic_v3.h"
 #include "processor.h"
 
+#define GITS_COLLECTION_TARGET_SHIFT 16
+
 static u64 its_read_u64(unsigned long offset)
 {
 	return readq_relaxed(GITS_BASE_GVA + offset);
@@ -163,6 +165,11 @@ static void its_encode_collection(struct its_cmd_block *cmd, u16 col)
 	its_mask_encode(&cmd->raw_cmd[2], col, 15, 0);
 }
 
+static u64 procnum_to_rdbase(u32 vcpu_id)
+{
+	return vcpu_id << GITS_COLLECTION_TARGET_SHIFT;
+}
+
 #define GITS_CMDQ_POLL_ITERATIONS 0
 
 static void its_send_cmd(void *cmdq_base, struct its_cmd_block *cmd)
@@ -217,7 +224,7 @@ void its_send_mapc_cmd(void *cmdq_base, u32 vcpu_id, u32 collection_id, bool val
 
 	its_encode_cmd(&cmd, GITS_CMD_MAPC);
 	its_encode_collection(&cmd, collection_id);
-	its_encode_target(&cmd, vcpu_id);
+	its_encode_target(&cmd, procnum_to_rdbase(vcpu_id));
 	its_encode_valid(&cmd, valid);
 
 	its_send_cmd(cmdq_base, &cmd);
-- 
2.50.1 (Apple Git-155)

Amazon Web Services Development Center Germany GmbH
Tamara-Danz-Str. 13
10243 Berlin
Geschaeftsfuehrung: Christian Schlaeger
Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
Sitz: Berlin
Ust-ID: DE 365 538 597
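
The encoding problem described in this patch can be reproduced outside the kernel tree. The following is a minimal user-space sketch, not the selftest code: mask_encode() and mapc_word2() are hypothetical stand-ins that assume the field layout quoted above (collection in bits 15:0, target in bits 51:16, valid in bit 63), and procnum_to_rdbase() here widens to 64 bits before shifting, which is a choice of this sketch rather than something taken from the patch.

```c
#include <stdint.h>

#define TARGET_SHIFT 16	/* mirrors GITS_COLLECTION_TARGET_SHIFT from the patch */

/* Hypothetical stand-in for its_mask_encode(): store val in bits [h:l] of *raw. */
static void mask_encode(uint64_t *raw, uint64_t val, int h, int l)
{
	uint64_t mask = (~0ULL >> (63 - h)) & (~0ULL << l);

	*raw &= ~mask;
	*raw |= (val << l) & mask;
}

/* Format a vCPU number the way the target encoder expects it (bits 16 and up). */
static uint64_t procnum_to_rdbase(uint32_t vcpu_id)
{
	return (uint64_t)vcpu_id << TARGET_SHIFT;
}

/* Build the third 64-bit word of a MAPC command, assuming the layout above. */
static uint64_t mapc_word2(uint64_t target_addr, uint16_t collection, int valid)
{
	uint64_t raw = 0;

	mask_encode(&raw, collection, 15, 0);
	mask_encode(&raw, target_addr >> 16, 51, 16);	/* same shift as its_encode_target() */
	mask_encode(&raw, !!valid, 63, 63);
	return raw;
}
```

Passing a bare vCPU number, as in mapc_word2(1, 1, 1), loses the target in the right shift, while mapc_word2(procnum_to_rdbase(1), 1, 1) keeps it in bits 51:16, which corresponds to the difference between the two sets of debug logs.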

2 months, 2 weeks


[PATCH 0/3] KVM: selftests: arm64: Improve diagnostics from set_id_regs

by Mark Brown

While debugging issues related to AArch64-only systems I ran into speedbumps due to the lack of detail in the results reported when the guest register read and reset value preservation tests were run: they generated an immediately fatal assert without indicating which register was being tested. Update these tests to report a result per register, making it much easier to see what problem is being reported.

A similar, though less severe, issue exists with the validation of the individual bitfields in registers due to the use of immediately fatal asserts. Update those asserts to be standard kselftest reports.

Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Mark Brown (3):
      KVM: selftests: arm64: Report set_id_reg reads of test registers as tests
      KVM: selftests: arm64: Report register reset tests individually
      KVM: selftests: arm64: Make set_id_regs bitfield validatity checks non-fatal

 tools/testing/selftests/kvm/arm64/set_id_regs.c | 108 ++++++++++++++++++------
 1 file changed, 82 insertions(+), 26 deletions(-)

---
base-commit: 211ddde0823f1442e4ad052a2f30f050145ccada
change-id: 20251028-kvm-arm64-set-id-regs-aarch64-ebb77969401c

Best regards,
-- 
Mark Brown <broonie(a)kernel.org>
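
To illustrate the difference between a fatal assert and a per-register result, here is a small user-space sketch. It does not use the kselftest helpers or any code from the series; struct reg_check and report_reg_checks() are hypothetical names, and the output simply follows the TAP "ok"/"not ok" line convention.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical record of one register comparison. */
struct reg_check {
	const char *name;	/* register name shown in the report */
	uint64_t expected;	/* value the test expects to read back */
	uint64_t actual;	/* value that was actually read */
};

/*
 * Print one TAP-style result line per register instead of stopping at the
 * first mismatch. Returns the number of failed checks.
 */
static size_t report_reg_checks(FILE *out, const struct reg_check *checks, size_t n)
{
	size_t i, failures = 0;

	for (i = 0; i < n; i++) {
		int ok = checks[i].actual == checks[i].expected;

		if (!ok)
			failures++;
		fprintf(out, "%s %zu %s\n", ok ? "ok" : "not ok", i + 1, checks[i].name);
	}
	return failures;
}
```

With this shape, a single mismatching register produces one "not ok" line naming that register, and the remaining registers are still checked and reported.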

2 months, 2 weeks


[PATCH bpf-next v3 0/3] bpf: Add overwrite mode for BPF ring buffer

by Xu Kuohai

When the BPF ring buffer is full, a new event cannot be recorded until one or more old events are consumed to make enough space for it. In cases such as fault diagnostics, where recent events are more useful than older ones, this mechanism may lead to critical events being lost.

So add overwrite mode for BPF ring buffer to address it. In this mode, the new event overwrites the oldest event when the buffer is full.

v3:
- remove half-round wakeup, drop unnecessary min in ringbuf_avail_data_sz(), switch to smp_load_acquire, update tests and fix typos, etc (Andrii)
- rebase and re-collect performance data

v2: https://lore.kernel.org/bpf/20250905150641.2078838-1-xukuohai@huaweicloud.c…
- remove libbpf changes (Andrii)
- update overwrite benchmark

v1: https://lore.kernel.org/bpf/20250804022101.2171981-1-xukuohai@huaweicloud.c…

Xu Kuohai (3):
  bpf: Add overwrite mode for BPF ring buffer
  selftests/bpf: Add overwrite mode test for BPF ring buffer
  selftests/bpf/benchs: Add overwrite mode benchmark for BPF ring buffer

 include/uapi/linux/bpf.h                      |   4 +
 kernel/bpf/ringbuf.c                          | 109 +++++++++++++++---
 tools/include/uapi/linux/bpf.h                |   4 +
 tools/testing/selftests/bpf/Makefile          |   3 +-
 .../selftests/bpf/benchs/bench_ringbufs.c     |  66 ++++++++++-
 .../bpf/benchs/run_bench_ringbufs.sh          |   4 +
 .../selftests/bpf/prog_tests/ringbuf.c        |  64 ++++++++++
 .../selftests/bpf/progs/ringbuf_bench.c       |  11 ++
 .../bpf/progs/test_ringbuf_overwrite.c        |  98 ++++++++++++++++
 9 files changed, 337 insertions(+), 26 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/test_ringbuf_overwrite.c

-- 
2.43.0
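
The behavioural difference between the two modes can be sketched with a toy ring of fixed-size records. This is only an illustration of "reject when full" versus "overwrite the oldest"; it is not the BPF ring buffer implementation, which handles variable-sized records and concurrent producers and consumers. struct ring, ring_push() and ring_pop() are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define RING_SLOTS 4	/* illustrative capacity */

struct ring {
	uint64_t slots[RING_SLOTS];
	size_t head;	/* total number of records written */
	size_t tail;	/* total number of records consumed or discarded */
	int overwrite;	/* non-zero: discard the oldest record when full */
};

/* Returns 0 on success, -1 if the ring is full and overwrite mode is off. */
static int ring_push(struct ring *r, uint64_t value)
{
	if (r->head - r->tail == RING_SLOTS) {
		if (!r->overwrite)
			return -1;
		r->tail++;	/* drop the oldest record to make room */
	}
	r->slots[r->head % RING_SLOTS] = value;
	r->head++;
	return 0;
}

/* Returns 0 and stores the oldest record in *value, or -1 if the ring is empty. */
static int ring_pop(struct ring *r, uint64_t *value)
{
	if (r->head == r->tail)
		return -1;
	*value = r->slots[r->tail % RING_SLOTS];
	r->tail++;
	return 0;
}
```

In the default mode the fifth push into a four-slot ring fails and the oldest record survives; in overwrite mode every push succeeds and the consumer only sees the most recent four records.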

2 months, 2 weeks


[PATCH bpf-next v6 00/15] selftests/bpf: Integrate test_xsk.c to test_progs framework

by Bastien Curutchet (eBPF Foundation)

Hi all,

The test_xsk.sh script covers many AF_XDP use cases. The tests it runs are defined in xskxceiver.c. Since this script is used to test real hardware, the goal here is to leave it as it is, and only integrate the tests that run on veth peers into the test_progs framework.

I've looked into what could improve the speed in the CI:
- some tests are skipped when run on veth peers in a VM (because they rely on huge page allocation or HW rings). This skipping logic still takes some time and can be easily avoided.
- the TEARDOWN test is quite long (several seconds on its own) because it runs the same test 10 times in a row to ensure the teardown process works properly

With these tests fully skipped in the CI and the veth setup done only once for each mode (DRV / SKB), the execution time is reduced to about 5 seconds on my setup.

```
$ tools/testing/selftests/bpf/vmtest.sh -d $HOME/ebpf/output-regular/ -- time ./test_progs -t xsk
[...]
real 0m 5.04s
user 0m 0.38s
sys 0m 1.61s
```

It still feels a bit long, but there are 24 tests run in both DRV and SKB modes, which means around 100ms for each one. I'm not sure I can make it much faster without randomizing the tests so that not all of them run in every CI execution.

PATCH 1 extracts test_xsk[.c/.h] from xskxceiver[.c/.h] to make the tests available to test_progs.
PATCH 2 to 7 fix small issues in the current test
PATCH 8 to 13 handle all errors to release resources instead of calling exit() when any error occurs.
PATCH 14 isolates the tests that won't fit in the CI
PATCH 15 integrates the CI tests to the test_progs framework

Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet(a)bootlin.com>
---
Changes in v6:
- Setup veth peer once for each mode instead of once for each subtest
- Rename the 'flaky' table 'skip-ci' table and move the automatically skipped and the longest tests into it
- Link to v5: https://lore.kernel.org/r/20251016-xsk-v5-0-662c95eb8005@bootlin.com

Changes in v5:
- Rebase on latest bpf-next_base
- Move XDP_ADJUST_TAIL_SHRINK_MULTI_BUFF to the flaky table
- Add Maciej's reviewed-by
- Link to v4: https://lore.kernel.org/r/20250924-xsk-v4-0-20e57537b876@bootlin.com

Changes in v4:
- Fix test_xsk.sh's summary report.
- Merge PATCH 11 & 12 together, otherwise PATCH 11 fails to build.
- Split old PATCH 3 in two patches. The first one fixes testapp_stats_rx_dropped(), the second one fixes testapp_xdp_shared_umem(). The unnecessary frees (in testapp_stats_rx_full() and testapp_stats_fill_empty()) are removed
- Link to v3: https://lore.kernel.org/r/20250904-xsk-v3-0-ce382e331485@bootlin.com

Changes in v3:
- Rebase on latest bpf-next_base to integrate commit c9110e6f7237 ("selftests/bpf: Fix count write in testapp_xdp_metadata_copy()").
- Move XDP_METADATA_COPY_* tests from flaky-tests to nominal tests
- Link to v2: https://lore.kernel.org/r/20250902-xsk-v2-0-17c6345d5215@bootlin.com

Changes in v2:
- Rebase on the latest bpf-next_base and integrate the newly added tests to the work (adjust_tail* and tx_queue_consumer tests)
- Re-order patches to split xskxceiver sooner.
- Fix the bug reported by Maciej.
- Fix verbose mode in test_xsk.sh by keeping kselftest (remove PATCH 1, 7 and 8)
- Link to v1: https://lore.kernel.org/r/20250313-xsk-v1-0-7374729a93b9@bootlin.com

---
Bastien Curutchet (eBPF Foundation) (15):
      selftests/bpf: test_xsk: Split xskxceiver
      selftests/bpf: test_xsk: Initialize bitmap before use
      selftests/bpf: test_xsk: Fix __testapp_validate_traffic()'s return value
      selftests/bpf: test_xsk: fix memory leak in testapp_stats_rx_dropped()
      selftests/bpf: test_xsk: fix memory leak in testapp_xdp_shared_umem()
      selftests/bpf: test_xsk: Wrap test clean-up in functions
      selftests/bpf: test_xsk: Release resources when swap fails
      selftests/bpf: test_xsk: Add return value to init_iface()
      selftests/bpf: test_xsk: Don't exit immediately when xsk_attach fails
      selftests/bpf: test_xsk: Don't exit immediately when gettimeofday fails
      selftests/bpf: test_xsk: Don't exit immediately when workers fail
      selftests/bpf: test_xsk: Don't exit immediately if validate_traffic fails
      selftests/bpf: test_xsk: Don't exit immediately on allocation failures
      selftests/bpf: test_xsk: Isolate non-CI tests
      selftests/bpf: test_xsk: Integrate test_xsk.c to test_progs framework

 tools/testing/selftests/bpf/Makefile              |   11 +-
 tools/testing/selftests/bpf/prog_tests/test_xsk.c | 2595 ++++++++++++++++++++
 tools/testing/selftests/bpf/prog_tests/test_xsk.h |  298 +++
 tools/testing/selftests/bpf/prog_tests/xsk.c      |  151 ++
 tools/testing/selftests/bpf/xskxceiver.c          | 2696 +--------------------
 tools/testing/selftests/bpf/xskxceiver.h          |  156 --
 6 files changed, 3183 insertions(+), 2724 deletions(-)

---
base-commit: 4481a8590725400f37d3015f0ee0d53a2cdc1bd6
change-id: 20250218-xsk-0cf90e975d14

Best regards,
-- 
Bastien Curutchet (eBPF Foundation) <bastien.curutchet(a)bootlin.com>

2 months, 2 weeks


[PATCH 0/1] selftests: net: use BASH for bareudp testing

by Po-Hsu Lin

The bareudp.sh script uses /bin/sh, and at the very beginning it loads lib.sh, which is a BASH script. But on some operating systems like Ubuntu, /bin/sh points to DASH, so the BASH commands are run with DASH, which leads to syntax errors.

This patch fixes the syntax failures on systems where /bin/sh is not BASH by explicitly using BASH for bareudp.sh.

Po-Hsu Lin (1):
  selftests: net: use BASH for bareudp testing

 tools/testing/selftests/net/bareudp.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

-- 
2.34.1

2 months, 2 weeks


[PATCH v2] selftest: net: fix socklen_t type mismatch in sctp_collision test

by Ankit Khushwaha

Socket APIs like recvfrom(), accept(), and getsockname() expect a socklen_t * argument, but the test was using an int variable. This causes -Wpointer-sign warnings on platforms where socklen_t is unsigned.

Change the variable type from int to socklen_t to resolve the warning and ensure type safety across platforms.

Warning fixed:

sctp_collision.c:62:70: warning: passing 'int *' to parameter of type 'socklen_t *' (aka 'unsigned int *') converts between pointers to integer types with different sign [-Wpointer-sign]
   62 |         ret = recvfrom(sd, buf, sizeof(buf), 0, (struct sockaddr *)&daddr, &len);
      |                                                                            ^~~~
/usr/include/sys/socket.h:165:27: note: passing argument to parameter '__addr_len' here
  165 |                          socklen_t *__restrict __addr_len);
      |                                                ^

Reviewed-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Signed-off-by: Ankit Khushwaha <ankitkhushwaha.linux(a)gmail.com>
---
 tools/testing/selftests/net/netfilter/sctp_collision.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/net/netfilter/sctp_collision.c b/tools/testing/selftests/net/netfilter/sctp_collision.c
index 21bb1cfd8a85..91df996367e9 100644
--- a/tools/testing/selftests/net/netfilter/sctp_collision.c
+++ b/tools/testing/selftests/net/netfilter/sctp_collision.c
@@ -9,7 +9,8 @@
 int main(int argc, char *argv[])
 {
 	struct sockaddr_in saddr = {}, daddr = {};
-	int sd, ret, len = sizeof(daddr);
+	socklen_t len = sizeof(daddr);
 	struct timeval tv = {25, 0};
 	char buf[] = "hello";
+	int sd, ret;

-- 
2.51.0
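
The same pattern applies to any call that takes an address length by pointer. The following stand-alone sketch, which is not part of the selftest, uses getsockname() with a socklen_t length so that it compiles without -Wpointer-sign warnings; bound_udp_port() is a hypothetical helper name.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Bind a UDP socket to an ephemeral port and return that port in host byte
 * order, or -1 on failure.
 */
static int bound_udp_port(void)
{
	struct sockaddr_in addr;
	socklen_t len = sizeof(addr);	/* socklen_t, not int */
	int sd, port = -1;

	sd = socket(AF_INET, SOCK_DGRAM, 0);
	if (sd < 0)
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_ANY);
	addr.sin_port = 0;	/* let the kernel pick a free port */

	/* getsockname() takes a socklen_t *, so &len needs no cast. */
	if (bind(sd, (struct sockaddr *)&addr, sizeof(addr)) == 0 &&
	    getsockname(sd, (struct sockaddr *)&addr, &len) == 0 &&
	    len == sizeof(addr))
		port = ntohs(addr.sin_port);

	close(sd);
	return port;
}
```

Declaring len as int instead would make the compiler emit the pointer-sign warning quoted in the patch on platforms where socklen_t is unsigned.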

2 months, 2 weeks


[PATCH v3 0/3] KHO: kfence + KHO memory corruption fix

by Pasha Tatashin

This series fixes a memory corruption bug in KHO that occurs when KFENCE is enabled.

The root cause is that KHO metadata, allocated via kzalloc(), can be randomly serviced by kfence_alloc(). When a kernel boots via KHO, the early memblock allocator is restricted to a "scratch area". This forces the KFENCE pool to be allocated within this scratch area, creating a conflict. If KHO metadata is subsequently placed in this pool, it gets corrupted during the next kexec operation.

Patch 1/3 introduces a debug-only feature (CONFIG_KEXEC_HANDOVER_DEBUG) that adds checks to detect and fail any operation that attempts to place KHO metadata or preserved memory within the scratch area. This serves as a validation and diagnostic tool to confirm the problem without affecting production builds.

Patch 2/3 increases the metadata bitmap size to PAGE_SIZE, so that the buddy allocator can be used.

Patch 3/3 provides the fix by modifying KHO to allocate its metadata directly from the buddy allocator instead of slab. This bypasses the KFENCE interception entirely.

Pasha Tatashin (3):
  liveupdate: kho: warn and fail on metadata or preserved memory in scratch area
  liveupdate: kho: Increase metadata bitmap size to PAGE_SIZE
  liveupdate: kho: allocate metadata directly from the buddy allocator

 include/linux/gfp.h              |  3 ++
 kernel/Kconfig.kexec             |  9 ++++
 kernel/Makefile                  |  1 +
 kernel/kexec_handover.c          | 72 ++++++++++++++++++++------------
 kernel/kexec_handover_debug.c    | 25 +++++++++++
 kernel/kexec_handover_internal.h | 16 +++++++
 6 files changed, 100 insertions(+), 26 deletions(-)
 create mode 100644 kernel/kexec_handover_debug.c
 create mode 100644 kernel/kexec_handover_internal.h

base-commit: 6548d364a3e850326831799d7e3ea2d7bb97ba08
-- 
2.51.0.869.ge66316f041-goog

2 months, 2 weeks

