This series adds ONE_REG interface for SBI FWFT extension implemented
by KVM RISC-V. This was missed out in accepted SBI FWFT patches for
KVM RISC-V.
These patches can also be found in the riscv_kvm_fwft_one_reg_v3 branch
at: https://github.com/avpatel/linux.git
Changes since v2:
- Re-based on latest KVM RISC-V queue
- Improved FWFT ONE_REG interface to allow enabling/disabling each
FWFT feature from KVM userspace
Changes since v1:
- Dropped have_state in PATCH4 as suggested by Drew
- Added Drew's Reviewed-by in appropriate patches
Anup Patel (6):
RISC-V: KVM: Set initial value of hedeleg in kvm_arch_vcpu_create()
RISC-V: KVM: Introduce feature specific reset for SBI FWFT
RISC-V: KVM: Introduce optional ONE_REG callbacks for SBI extensions
RISC-V: KVM: Move copy_sbi_ext_reg_indices() to SBI implementation
RISC-V: KVM: Implement ONE_REG interface for SBI FWFT state
KVM: riscv: selftests: Add SBI FWFT to get-reg-list test
arch/riscv/include/asm/kvm_vcpu_sbi.h | 22 +-
arch/riscv/include/asm/kvm_vcpu_sbi_fwft.h | 1 +
arch/riscv/include/uapi/asm/kvm.h | 15 ++
arch/riscv/kvm/vcpu.c | 3 +-
arch/riscv/kvm/vcpu_onereg.c | 60 +----
arch/riscv/kvm/vcpu_sbi.c | 172 +++++++++++--
arch/riscv/kvm/vcpu_sbi_fwft.c | 227 ++++++++++++++++--
arch/riscv/kvm/vcpu_sbi_sta.c | 63 +++--
.../selftests/kvm/riscv/get-reg-list.c | 32 +++
9 files changed, 467 insertions(+), 128 deletions(-)
--
2.43.0
When many ADD_ADDR need to be sent, it can take some time to send each
of them, and create new subflows. Some CIs seem to occasionally have
issues with these tests, especially with "debug" kernels.
Two subtests will now run for a slightly longer time: the last two where
3 or more ADD_ADDR are sent during the test.
Reviewed-by: Geliang Tang <geliang(a)kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
tools/testing/selftests/net/mptcp/mptcp_join.sh | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh b/tools/testing/selftests/net/mptcp/mptcp_join.sh
index e9e11a9e60fd5374c8a98c3b7159ccbca8053030..b41cebfa1f921ce9ea6a88a908bf6d5e6027b367 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh
@@ -2268,7 +2268,8 @@ signal_address_tests()
pm_nl_add_endpoint $ns1 10.0.3.1 flags signal
pm_nl_add_endpoint $ns1 10.0.4.1 flags signal
pm_nl_set_limits $ns2 3 3
- run_tests $ns1 $ns2 10.0.1.1
+ speed=slow \
+ run_tests $ns1 $ns2 10.0.1.1
chk_join_nr 3 3 3
chk_add_nr 3 3
fi
@@ -2280,7 +2281,8 @@ signal_address_tests()
pm_nl_add_endpoint $ns1 10.0.3.1 flags signal
pm_nl_add_endpoint $ns1 10.0.14.1 flags signal
pm_nl_set_limits $ns2 3 3
- run_tests $ns1 $ns2 10.0.1.1
+ speed=slow \
+ run_tests $ns1 $ns2 10.0.1.1
join_syn_tx=3 \
chk_join_nr 1 1 1
chk_add_nr 3 3
--
2.51.0
ADD_ADDR can be retransmitted, and with, the parent commit, these
retransmissions can be sent quicker: from 2 minutes to less than one
second.
To avoid false positives where retransmitted ADD_ADDR causes higher
counters than expected, it is required to be more tolerant. Errors are
now only reported when fewer ADD_ADDRs have been sent/received, except
if no ADD_ADDR are expected.
Before the parent commit, the tolerance was present for each tests where
the ADD_ADDR could be retransmitted in a reasonable time (1 sec). Now
that all tests can have retransmitted ADD_ADDR, it is normal to apply
the same tolerance for all tests.
An alternative could be to disable the ADD_ADDR retransmissions by
default, but that's changing the default kernel behaviour. Plus,
ADD_ADDR retransmissions can be required for some tests. To avoid adding
exceptions to many tests, it seems better to increase the tolerance.
Later, we could add a new MIB counter to identify the ADD_ADDR
retransmissions, and remove the tolerance when this counter is
available.
Reviewed-by: Geliang Tang <geliang(a)kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
tools/testing/selftests/net/mptcp/mptcp_join.sh | 19 +++++++------------
1 file changed, 7 insertions(+), 12 deletions(-)
diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh b/tools/testing/selftests/net/mptcp/mptcp_join.sh
index 2f046167a0b6cc6fb5531a033d8d95c9ea399cf9..e9e11a9e60fd5374c8a98c3b7159ccbca8053030 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh
@@ -358,6 +358,7 @@ reset_with_add_addr_timeout()
tables="${ip6tables}"
fi
+ # set a maximum, to avoid too long timeout with exponential backoff
ip netns exec $ns1 sysctl -q net.mptcp.add_addr_timeout=1
if ! ip netns exec $ns2 $tables -A OUTPUT -p tcp \
@@ -1669,7 +1670,6 @@ chk_add_nr()
local tx=""
local rx=""
local count
- local timeout
if [[ $ns_invert = "invert" ]]; then
ns_tx=$ns2
@@ -1678,15 +1678,13 @@ chk_add_nr()
rx=" server"
fi
- timeout=$(ip netns exec ${ns_tx} sysctl -n net.mptcp.add_addr_timeout)
-
print_check "add addr rx${rx}"
count=$(mptcp_lib_get_counter ${ns_rx} "MPTcpExtAddAddr")
if [ -z "$count" ]; then
print_skip
- # if the test configured a short timeout tolerate greater then expected
- # add addrs options, due to retransmissions
- elif [ "$count" != "$add_nr" ] && { [ "$timeout" -gt 1 ] || [ "$count" -lt "$add_nr" ]; }; then
+ # Tolerate more ADD_ADDR then expected (if any), due to retransmissions
+ elif [ "$count" != "$add_nr" ] &&
+ { [ "$add_nr" -eq 0 ] || [ "$count" -lt "$add_nr" ]; }; then
fail_test "got $count ADD_ADDR[s] expected $add_nr"
else
print_ok
@@ -1774,18 +1772,15 @@ chk_add_tx_nr()
{
local add_tx_nr=$1
local echo_tx_nr=$2
- local timeout
local count
- timeout=$(ip netns exec $ns1 sysctl -n net.mptcp.add_addr_timeout)
-
print_check "add addr tx"
count=$(mptcp_lib_get_counter ${ns1} "MPTcpExtAddAddrTx")
if [ -z "$count" ]; then
print_skip
- # if the test configured a short timeout tolerate greater then expected
- # add addrs options, due to retransmissions
- elif [ "$count" != "$add_tx_nr" ] && { [ "$timeout" -gt 1 ] || [ "$count" -lt "$add_tx_nr" ]; }; then
+ # Tolerate more ADD_ADDR then expected (if any), due to retransmissions
+ elif [ "$count" != "$add_tx_nr" ] &&
+ { [ "$add_tx_nr" -eq 0 ] || [ "$count" -lt "$add_tx_nr" ]; }; then
fail_test "got $count ADD_ADDR[s] TX, expected $add_tx_nr"
else
print_ok
--
2.51.0
From: Geliang Tang <tanggeliang(a)kylinos.cn>
Currently the ADD_ADDR option is retransmitted with a fixed timeout. This
patch makes the retransmission timeout adaptive by using the maximum RTO
among all the subflows, while still capping it at the configured maximum
value (add_addr_timeout_max). This improves responsiveness when
establishing new subflows.
Specifically:
1. Adds mptcp_adjust_add_addr_timeout() helper to compute the adaptive
timeout.
2. Uses maximum subflow RTO (icsk_rto) when available.
3. Applies exponential backoff based on retransmission count.
4. Maintains fallback to configured max timeout when no RTO data exists.
This slightly changes the behaviour of the MPTCP "add_addr_timeout"
sysctl knob to be used as a maximum instead of a fixed value. But this
is seen as an improvement: the ADD_ADDR might be sent quicker than
before to improve the overall MPTCP connection. Also, the default
value is set to 2 min, which was already way too long, and caused the
ADD_ADDR not to be retransmitted for connections shorter than 2 minutes.
Suggested-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/576
Reviewed-by: Christoph Paasch <cpaasch(a)openai.com>
Signed-off-by: Geliang Tang <tanggeliang(a)kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
v2: no changes.
---
Documentation/networking/mptcp-sysctl.rst | 8 +++++---
net/mptcp/pm.c | 28 ++++++++++++++++++++++++----
2 files changed, 29 insertions(+), 7 deletions(-)
diff --git a/Documentation/networking/mptcp-sysctl.rst b/Documentation/networking/mptcp-sysctl.rst
index 1683c139821e3ba6d9eaa3c59330a523d29f1164..1eb6af26b4a7acdedd575a126c576210a78f0d4d 100644
--- a/Documentation/networking/mptcp-sysctl.rst
+++ b/Documentation/networking/mptcp-sysctl.rst
@@ -8,9 +8,11 @@ MPTCP Sysfs variables
===============================
add_addr_timeout - INTEGER (seconds)
- Set the timeout after which an ADD_ADDR control message will be
- resent to an MPTCP peer that has not acknowledged a previous
- ADD_ADDR message.
+ Set the maximum value of timeout after which an ADD_ADDR control message
+ will be resent to an MPTCP peer that has not acknowledged a previous
+ ADD_ADDR message. A dynamically estimated retransmission timeout based
+ on the estimated connection round-trip-time is used if this value is
+ lower than the maximum one.
Do not retransmit if set to 0.
diff --git a/net/mptcp/pm.c b/net/mptcp/pm.c
index 136a380602cae872b76560649c924330e5f42533..204e1f61212e2be77a8476f024b59be67d04b80a 100644
--- a/net/mptcp/pm.c
+++ b/net/mptcp/pm.c
@@ -268,6 +268,27 @@ int mptcp_pm_mp_prio_send_ack(struct mptcp_sock *msk,
return -EINVAL;
}
+static unsigned int mptcp_adjust_add_addr_timeout(struct mptcp_sock *msk)
+{
+ const struct net *net = sock_net((struct sock *)msk);
+ unsigned int rto = mptcp_get_add_addr_timeout(net);
+ struct mptcp_subflow_context *subflow;
+ unsigned int max = 0;
+
+ mptcp_for_each_subflow(msk, subflow) {
+ struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
+ struct inet_connection_sock *icsk = inet_csk(ssk);
+
+ if (icsk->icsk_rto > max)
+ max = icsk->icsk_rto;
+ }
+
+ if (max && max < rto)
+ rto = max;
+
+ return rto;
+}
+
static void mptcp_pm_add_timer(struct timer_list *timer)
{
struct mptcp_pm_add_entry *entry = timer_container_of(entry, timer,
@@ -292,7 +313,7 @@ static void mptcp_pm_add_timer(struct timer_list *timer)
goto out;
}
- timeout = mptcp_get_add_addr_timeout(sock_net(sk));
+ timeout = mptcp_adjust_add_addr_timeout(msk);
if (!timeout)
goto out;
@@ -307,7 +328,7 @@ static void mptcp_pm_add_timer(struct timer_list *timer)
if (entry->retrans_times < ADD_ADDR_RETRANS_MAX)
sk_reset_timer(sk, timer,
- jiffies + timeout);
+ jiffies + (timeout << entry->retrans_times));
spin_unlock_bh(&msk->pm.lock);
@@ -348,7 +369,6 @@ bool mptcp_pm_alloc_anno_list(struct mptcp_sock *msk,
{
struct mptcp_pm_add_entry *add_entry = NULL;
struct sock *sk = (struct sock *)msk;
- struct net *net = sock_net(sk);
unsigned int timeout;
lockdep_assert_held(&msk->pm.lock);
@@ -374,7 +394,7 @@ bool mptcp_pm_alloc_anno_list(struct mptcp_sock *msk,
timer_setup(&add_entry->add_timer, mptcp_pm_add_timer, 0);
reset_timer:
- timeout = mptcp_get_add_addr_timeout(net);
+ timeout = mptcp_adjust_add_addr_timeout(msk);
if (timeout)
sk_reset_timer(sk, &add_entry->add_timer, jiffies + timeout);
--
2.51.0
Changes in v2:
- Optimized the logic in descriptions. (Song Liu)
- Created a new header file to declare kfuncs for future extensions included by other files. (Christian Loehle)
- Fixed some logical issues in the code. (Christian Loehle)
Reference:
[1] https://lore.kernel.org/bpf/20250829101137.9507-1-yikai.lin@vivo.com/
Summary
----------
Hi, everyone,
This patch set introduces an extensible cpuidle governor framework
using BPF struct_ops, enabling dynamic implementation of idle-state selection policies
via BPF programs.
Motivation
----------
As is well-known, CPUs support multiple idle states (e.g., C0, C1, C2, ...),
where deeper states reduce power consumption, but results in longer wakeup latency,
potentially affecting performance.
Existing generic cpuidle governors operate effectively in common scenarios
but exhibit suboptimal behavior in specific Android phone's use cases.
Our testing reveals that during low-utilization scenarios
(e.g., screen-off background tasks like music playback with CPU utilization <10%),
the C0 state occupies ~50% of idle time, causing significant energy inefficiency.
Reducing C0 to ≤20% could yield ≥5% power savings on mobile phones.
To address this, we expect:
1.Dynamic governor switching to power-saved policies for low cpu utilization scenarios (e.g., screen-off mode)
2.Dynamic switching to alternate governors for high-performance scenarios (e.g., gaming)
OverView
----------
The BPF cpuidle ext governor registers at postcore_initcall()
but remains disabled by default due to its low priority "rating" with value "1".
Activation requires adjust higer "rating" than other governors within BPF.
Core Components:
1.**struct cpuidle_gov_ext_ops** – BPF-overridable operations:
- ops.enable()/ops.disable(): enable or disable callback
- ops.select(): cpu Idle-state selection logic
- ops.set_stop_tick(): Scheduler tick management after state selection
- ops.reflect(): feedback info about previous idle state.
- ops.init()/ops.deinit(): Initialization or cleanup.
2.**Critical kfuncs for kernel state access**:
- bpf_cpuidle_ext_gov_update_rating():
Activate ext governor by raising rating must be called from "ops.init()"
- bpf_cpuidle_ext_gov_latency_req(): get idle-state latency constraints
- bpf_tick_nohz_get_sleep_length(): get CPU sleep duration in tickless mode
Future work
----------
1. Scenario detection: Identifying low-utilization states (e.g., screen-off + background music)
2. Policy optimization: Optimizing state-selection algorithms for specific scenarios
Is it related to sched_ext?
---------------------------
The cpuidle framework is as follows.
----------------------------------------------------------
Scheduler Core
----------------------------------------------------------
|
v
----------------------------------------------------------
| FAIR Class | EXT Class | IDLE Class |
----------------------------------------------------------
| | | |
| | | v
| | | ------------------------
| | | enter_cpu_idle()
| | | ------------------------
| | | |
| | | v
| | | ------------------------------
| | | | CPUIDLE Governor |
| | | ------------------------------
| | | | | |
| | | v v v
| | |-----------------------------------
| | | default | | other | | BPF ext |
| | | Governor | | Governor | | Governor | <<===Here is the feature we add.
| | |-----------------------------------
| | | | | |
| | | v v v
| | |-------------------------------------
| | | select idle state
| | |-------------------------------------
Whereas cpuidle is invoked after switching to idle class when no tasks are present in the scheduling RQ.
They are not directly related, so implementing kfuncs or other extensions through sched_ext is not feasible.
Lin Yikai (2):
cpuidle: Implement BPF extensible cpuidle governor class
selftests/bpf: Add selftests for cpuidle_gov_ext
drivers/cpuidle/Kconfig | 12 +
drivers/cpuidle/governors/Makefile | 1 +
drivers/cpuidle/governors/ext.c | 537 ++++++++++++++++++
.../bpf/prog_tests/test_cpuidle_gov_ext.c | 28 +
.../selftests/bpf/progs/cpuidle_common.h | 13 +
.../selftests/bpf/progs/cpuidle_gov_ext.c | 200 +++++++
6 files changed, 791 insertions(+)
create mode 100644 drivers/cpuidle/governors/ext.c
create mode 100644 tools/testing/selftests/bpf/prog_tests/test_cpuidle_gov_ext.c
create mode 100644 tools/testing/selftests/bpf/progs/cpuidle_common.h
create mode 100644 tools/testing/selftests/bpf/progs/cpuidle_gov_ext.c
--
2.43.0
From: Chia-Yu Chang <chia-yu.chang(a)nokia-bell-labs.com>
Hello,
Please find the v16 AccECN protocol patch series, which covers the core
functionality of Accurate ECN, AccECN negotiation, AccECN TCP options,
and AccECN failure handling. The Accurate ECN draft can be found in
https://datatracker.ietf.org/doc/html/draft-ietf-tcpm-accurate-ecn-28, and it
will be RFC9768.
This patch series is part of the full AccECN patch series, which is available at
https://github.com/L4STeam/linux-net-next/commits/upstream_l4steam/
Best Regards,
Chia-Yu
---
v16 (6-Sep-2025)
- Use TCP_ECN_IN_ACCECN_OUT_ACCECN, TCP_ECN_IN_ECN_OUT_ECN, and TCP_ECN_IN_ACCECN_OUT_ECN in comments of tcp_ecn_send_syn() (Eric Dumazet <edumazet(a)google.com>)
- Add tcpi_accecn_fail_mode and tcpi_accecn_opt_seen to make tcp_info be multiple of 64 bits in patch #12
v15 (14-Aug-205)
- Update pahole results in commit messages
- Accurate ECN will become RFC9768
v14 (22-Jul-2025)
- Add missing const for struct tcp_sock of tcp_accecn_option_beacon_check() of #11 (Simon Horman <horms(a)kernel.org>)
v13 (18-Jul-2025)
- Implement tcp_accecn_extract_syn_ect() and tcp_accecn_reflector_flags() with static array lookup of patch #6 (Paolo Abeni <pabeni(a)redhat.com>)
- Fix typos in comments of #6 and remove patch #7 of v12 about simulatenous connect (Paolo Abeni <pabeni(a)redhat.com>)
- Move TCP_ACCECN_E1B_INIT_OFFSET, TCP_ACCECN_E0B_INIT_OFFSET, and TCP_ACCECN_CEB_INIT_OFFSET from patch #7 to #11 (Paolo Abeni <pabeni(a)redhat.com>)
- Use static array lookup in tcp_accecn_optfield_to_ecnfield() of patch #11 (Paolo Abeni <pabeni(a)redhat.com>)
- Return false when WARN_ON_ONCE() is true in tcp_accecn_process_option() of patch #11 (Paolo Abeni <pabeni(a)redhat.com>)
- Make synack_ecn_bytes as static const array and use const u32 pointer in tcp_options_write() of #11 (Paolo Abeni <pabeni(a)redhat.com>)
- Use ALIGN() and ALIGN_DOWN() in tcp_options_fit_accecn() to pad TCP AccECN option to dword of #11 (Paolo Abeni <pabeni(a)redhat.com>)
- Return TCP_ACCECN_OPT_FAIL_SEEN if WARN_ON_ONCE() is true in tcp_accecn_option_init() of #12 (Paolo Abeni <pabeni(a)redhat.com>)
v12 (04-Jul-2025)
- Fix compilation issues with some intermediate patches in v11
- Add more comments for AccECN helpers of tcp_ecn.h
v11 (03-Jul-2025)
- Fix compilation issues with some intermediate patches in v10
v10 (02-Jul-2025)
- Add new patch of separated header file include/net/tcp_ecn.h to include ECN and AccECN functions (Eric Dumazet <edumazet(a)google.com>)
- Add comments on the AccECN helper functions in tcp_ecn.h (Eric Dumazet <edumazet(a)google.com>)
- Add documentation of tcp_ecn, tcp_ecn_option, tcp_ecn_beacon in ip-sysctl.rst to the corresponding patch (Eric Dumazet <edumazet(a)google.com>)
- Split wait third ACK functionality into a separated patch from AccECN negotiation patch (Eric Dumazet <edumazet(a)google.com>)
- Add READ_ONCE() over every reads of sysctl for all patches in the series (Eric Dumazet <edumazet(a)google.com>)
- Merge heuristics of AccECN option ceb/cep and ACE field multi-wrap into a single patch
- Add a table of SACK block reduction and required AccECN field in patch #15 commit message (Eric Dumazet <edumazet(a)google.com>)
v9 (21-Jun-2025)
- Use tcp_data_ecn_check() to set TCP_ECN_SEE flag only for RFC3168 ECN (Paolo Abeni <pabeni(a)redhat.com>)
- Add comments about setting TCP_ECN_SEEN flag for RFC3168 and Accruate ECN (Paolo Abeni <pabeni(a)redhat.com>)
- Restruct the code in the for loop of tcp_accecn_process_option() (Paolo Abeni <pabeni(a)redhat.com>)
- Remove ecn_bytes and add use_synack_ecn_bytes flag to identify whether syn_ack_bytes or received_ecn_bytes is used (Paolo Abeni <pabeni(a)redhat.com>)
- Replace leftover_bytes and leftover_size with leftover_highbyte and leftover_lowbyte and add comments in tcp_options_write() (Paolo Abeni <pabeni(a)redhat.com>)
- Add comments and commit message about the 1st retx SYN still attempt AccECN negotiation (Paolo Abeni <pabeni(a)redhat.com>)
v8 (10-Jun-2025)
- Add new helper function tcp_ecn_received_counters_payload() in #6 (Paolo Abeni <pabeni(a)redhat.com>)
- Set opts->num_sack_blocks=0 to avoid potential undefined value in #8 (Paolo Abeni <pabeni(a)redhat.com>)
- Reset leftover_size to 2 once leftover_bytes is used in #9 (Paolo Abeni <pabeni(a)redhat.com>)
- Add new helper function tcp_accecn_opt_demand_min() in #10 (Paolo Abeni <pabeni(a)redhat.com>)
- Add new helper function tcp_accecn_saw_opt_fail_recv() in #11 (Paolo Abeni <pabeni(a)redhat.com>)
- Update tcp_options_fit_accecn() to avoid using recursion in #14 (Paolo Abeni <pabeni(a)redhat.com>)
v7 (14-May-2025)
- Modify group sizes of tcp_sock_write_txrx and tcp_sock_write_rx in #3 based on pahole results (Paolo Abeni <pabeni(a)redhat.com>)
- Fix the issue in #4 and #5 where the RFC3168 ECN behavior in tcp_ecn_send() is changed (Paolo Abeni <pabeni(a)redhat.com>)
- Modify group size of tcp_sock_write_txrx in #4 and #6 based on pahole results (Paolo Abeni <pabeni(a)redhat.com>)
- Update commit message for #9 to explain the increase in tcp_sock_write_rx group size
- Modify group size of tcp_sock_write_tx in #10 based on pahole results
v6 (09-May-2025)
- Add #3 to utilize exisintg holes of tcp_sock_write_txrx group for later patches (#4, #9, #10) with new u8 members (Paolo Abeni <pabeni(a)redhat.com>)
- Add pahole outcomes before and after commit in #4, #5, #6, #9, #10, #15 (Paolo Abeni <pabeni(a)redhat.com>)
- Define new helper function tcp_send_ack_reflect_ect() for sending ACK with reflected ECT in #5 (Paolo Abeni <pabeni(a)redhat.com>)
- Add comments for function tcp_ecn_rcv_synack() in #5 (Paolo Abeni <pabeni(a)redhat.com>)
- Add enum/define to be used by sysctl_tcp_ecn in #5, sysctl_tcp_ecn_option in #9, and sysctl_tcp_ecn_option_beacon in #10 (Paolo Abeni <pabeni(a)redhat.com>)
- Move accecn_fail_mode and saw_accecn_opt in #5 and #11 to use exisintg holes of tcp_sock (Paolo Abeni <pabeni(a)redhat.com>)
- Change data type of new members of tcp_request_sock and move them to the end of struct in #5 and #11 (Paolo Abeni <pabeni(a)redhat.com>)
- Move new members of tcp_info to the end of struct in #6 (Paolo Abeni <pabeni(a)redhat.com>)
- Merge previous #7 into #9 (Paolo Abeni <pabeni(a)redhat.com>)
- Mask ecnfield with INET_ECN_MASK to remove WARN_ONCE in #9 (Paolo Abeni <pabeni(a)redhat.com>)
- Reduce the indentation levels for reabability in #9 and #10 (Paolo Abeni <pabeni(a)redhat.com>)
- Move delivered_ecn_bytes to the RX group in #9, accecn_opt_tstamp to the TX group in #10, pkts_acked_ewma to the RX group in #15 (Paolo Abeni <pabeni(a)redhat.com>)
- Add changes in Documentation/networking/net_cachelines/tcp_sock.rst for new tcp_sock members in #3, #5, #6, #9, #10, #15
v5 (22-Apr-2025)
- Further fix for 32-bit ARM alignment in tcp.c (Simon Horman <horms(a)kernel.org>)
v4 (18-Apr-2025)
- Fix 32-bit ARM assertion for alignment requirement (Simon Horman <horms(a)kernel.org>)
v3 (14-Apr-2025)
- Fix patch apply issue in v2 (Jakub Kicinski <kuba(a)kernel.org>)
v2 (18-Mar-2025)
- Add one missing patch from the previous AccECN protocol preparation patch series to this patch series.
---
Chia-Yu Chang (5):
tcp: reorganize tcp_sock_write_txrx group for variables later
tcp: ecn functions in separated include file
tcp: accecn: AccECN option send control
tcp: accecn: AccECN option failure handling
tcp: accecn: try to fit AccECN option with SACK
Ilpo Järvinen (9):
tcp: reorganize SYN ECN code
tcp: fast path functions later
tcp: AccECN core
tcp: accecn: AccECN negotiation
tcp: accecn: add AccECN rx byte counters
tcp: accecn: AccECN needs to know delivered bytes
tcp: sack option handling improvements
tcp: accecn: AccECN option
tcp: accecn: AccECN option ceb/cep and ACE field multi-wrap heuristics
Documentation/networking/ip-sysctl.rst | 55 +-
.../networking/net_cachelines/tcp_sock.rst | 12 +
include/linux/tcp.h | 32 +-
include/net/netns/ipv4.h | 2 +
include/net/tcp.h | 87 ++-
include/net/tcp_ecn.h | 642 ++++++++++++++++++
include/uapi/linux/tcp.h | 9 +
net/ipv4/syncookies.c | 4 +
net/ipv4/sysctl_net_ipv4.c | 19 +
net/ipv4/tcp.c | 30 +-
net/ipv4/tcp_input.c | 353 ++++++++--
net/ipv4/tcp_ipv4.c | 8 +-
net/ipv4/tcp_minisocks.c | 40 +-
net/ipv4/tcp_output.c | 294 ++++++--
net/ipv6/syncookies.c | 2 +
net/ipv6/tcp_ipv6.c | 1 +
16 files changed, 1406 insertions(+), 184 deletions(-)
create mode 100644 include/net/tcp_ecn.h
--
2.34.1
devmem test fails on NIPA. Most likely we get skb(s) with readable
frags (why?) but the failure manifests as an OOM. The OOM happens
because ncdevmem spams the following message:
recvmsg ret=-1
recvmsg: Bad address
As of today, ncdevmem can't deal with various reasons of EFAULT:
- falling back to regular recvmsg for non-devmem skbs
- increasing ctrl_data size (can't happen with ncdevmem's large buffer)
Exit (cleanly) with error when recvmsg returns EFAULT. This should at
least cause the test to cleanup its state.
Signed-off-by: Stanislav Fomichev <sdf(a)fomichev.me>
---
tools/testing/selftests/drivers/net/hw/ncdevmem.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/tools/testing/selftests/drivers/net/hw/ncdevmem.c b/tools/testing/selftests/drivers/net/hw/ncdevmem.c
index 8dc9511d046f..c0a22938bed2 100644
--- a/tools/testing/selftests/drivers/net/hw/ncdevmem.c
+++ b/tools/testing/selftests/drivers/net/hw/ncdevmem.c
@@ -945,6 +945,10 @@ static int do_server(struct memory_buffer *mem)
continue;
if (ret < 0) {
perror("recvmsg");
+ if (errno == EFAULT) {
+ pr_err("received EFAULT, won't recover");
+ goto err_close_client;
+ }
continue;
}
if (ret == 0) {
--
2.51.0
Create a netconsole test that puts a lot of pressure on the netconsole
list manipulation. Do it by creating dynamic targets and deleting
targets while messages are being sent. Also put interface down while the
In order to do it, refactor create_dynamic_target(), so it can be used to
create random targets in the torture test.
Signed-off-by: Breno Leitao <leitao(a)debian.org>
---
Changes in v2:
- Reuse the netconsole creation from lib_netcons.sh. Thus, refactoring
the create_dynamic_target() (Jakub)
- Move the "wait" to after all the messages has been sent.
- Link to v1: https://lore.kernel.org/r/20250902-netconsole_torture-v1-1-03c6066598e9@deb…
---
Breno Leitao (2):
selftest: netcons: refactor target creation
selftest: netcons: create a torture test
tools/testing/selftests/drivers/net/Makefile | 1 +
.../selftests/drivers/net/lib/sh/lib_netcons.sh | 30 +++--
.../selftests/drivers/net/netcons_torture.sh | 127 +++++++++++++++++++++
3 files changed, 147 insertions(+), 11 deletions(-)
---
base-commit: 2fd4161d0d2547650d9559d57fc67b4e0a26a9e3
change-id: 20250902-netconsole_torture-8fc23f0aca99
Best regards,
--
Breno Leitao <leitao(a)debian.org>
[ I think at this point everyone is OK with the ABI, and the x86
implementation has been tested so hopefully we are near to being
able to get this merged? If there are any outstanding issues let
me know and I can look at addressing them. The one possible issue
I am aware of is that the RISC-V shadow stack support was briefly
in -next but got dropped along with the general RISC-V issues during
the last merge window, rebasing for that is still in progress. I
guess ideally this could be applied on a branch and then pulled into
the RISC-V tree? ]
The kernel has recently added support for shadow stacks, currently
x86 only using their CET feature but both arm64 and RISC-V have
equivalent features (GCS and Zicfiss respectively), I am actively
working on GCS[1]. With shadow stacks the hardware maintains an
additional stack containing only the return addresses for branch
instructions which is not generally writeable by userspace and ensures
that any returns are to the recorded addresses. This provides some
protection against ROP attacks and making it easier to collect call
stacks. These shadow stacks are allocated in the address space of the
userspace process.
Our API for shadow stacks does not currently offer userspace any
flexiblity for managing the allocation of shadow stacks for newly
created threads, instead the kernel allocates a new shadow stack with
the same size as the normal stack whenever a thread is created with the
feature enabled. The stacks allocated in this way are freed by the
kernel when the thread exits or shadow stacks are disabled for the
thread. This lack of flexibility and control isn't ideal, in the vast
majority of cases the shadow stack will be over allocated and the
implicit allocation and deallocation is not consistent with other
interfaces. As far as I can tell the interface is done in this manner
mainly because the shadow stack patches were in development since before
clone3() was implemented.
Since clone3() is readily extensible let's add support for specifying a
shadow stack when creating a new thread or process, keeping the current
implicit allocation behaviour if one is not specified either with
clone3() or through the use of clone(). The user must provide a shadow
stack pointer, this must point to memory mapped for use as a shadow
stackby map_shadow_stack() with an architecture specified shadow stack
token at the top of the stack.
Yuri Khrustalev has raised questions from the libc side regarding
discoverability of extended clone3() structure sizes[2], this seems like
a general issue with clone3(). There was a suggestion to add a hwcap on
arm64 which isn't ideal but is doable there, though architecture
specific mechanisms would also be needed for x86 (and RISC-V if it's
support gets merged before this does). The idea has, however, had
strong pushback from the architecture maintainers and it is possible to
detect support for this in clone3() by attempting a call with a
misaligned shadow stack pointer specified so no hwcap has been added.
[1] https://lore.kernel.org/linux-arm-kernel/20241001-arm64-gcs-v13-0-222b78d87…
[2] https://lore.kernel.org/r/aCs65ccRQtJBnZ_5@arm.com
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v20:
- Comment fixes and clarifications in x86 arch_shstk_validate_clone()
from Rick Edgecombe.
- Spelling fix in documentation.
- Link to v19: https://lore.kernel.org/r/20250819-clone3-shadow-stack-v19-0-bc957075479b@k…
Changes in v19:
- Rebase onto v6.17-rc1.
- Link to v18: https://lore.kernel.org/r/20250702-clone3-shadow-stack-v18-0-7965d2b694db@k…
Changes in v18:
- Rebase onto v6.16-rc3.
- Thanks to pointers from Yuri Khrustalev this version has been tested
on x86 so I have removed the RFT tag.
- Clarify clone3_shadow_stack_valid() comment about the Kconfig check.
- Remove redundant GCSB DSYNCs in arm64 code.
- Fix token validation on x86.
- Link to v17: https://lore.kernel.org/r/20250609-clone3-shadow-stack-v17-0-8840ed97ff6f@k…
Changes in v17:
- Rebase onto v6.16-rc1.
- Link to v16: https://lore.kernel.org/r/20250416-clone3-shadow-stack-v16-0-2ffc9ca3917b@k…
Changes in v16:
- Rebase onto v6.15-rc2.
- Roll in fixes from x86 testing from Rick Edgecombe.
- Rework so that the argument is shadow_stack_token.
- Link to v15: https://lore.kernel.org/r/20250408-clone3-shadow-stack-v15-0-3fa245c6e3be@k…
Changes in v15:
- Rebase onto v6.15-rc1.
- Link to v14: https://lore.kernel.org/r/20250206-clone3-shadow-stack-v14-0-805b53af73b9@k…
Changes in v14:
- Rebase onto v6.14-rc1.
- Link to v13: https://lore.kernel.org/r/20241203-clone3-shadow-stack-v13-0-93b89a81a5ed@k…
Changes in v13:
- Rebase onto v6.13-rc1.
- Link to v12: https://lore.kernel.org/r/20241031-clone3-shadow-stack-v12-0-7183eb8bee17@k…
Changes in v12:
- Add the regular prctl() to the userspace API document since arm64
support is queued in -next.
- Link to v11: https://lore.kernel.org/r/20241005-clone3-shadow-stack-v11-0-2a6a2bd6d651@k…
Changes in v11:
- Rebase onto arm64 for-next/gcs, which is based on v6.12-rc1, and
integrate arm64 support.
- Rework the interface to specify a shadow stack pointer rather than a
base and size like we do for the regular stack.
- Link to v10: https://lore.kernel.org/r/20240821-clone3-shadow-stack-v10-0-06e8797b9445@k…
Changes in v10:
- Integrate fixes & improvements for the x86 implementation from Rick
Edgecombe.
- Require that the shadow stack be VM_WRITE.
- Require that the shadow stack base and size be sizeof(void *) aligned.
- Clean up trailing newline.
- Link to v9: https://lore.kernel.org/r/20240819-clone3-shadow-stack-v9-0-962d74f99464@ke…
Changes in v9:
- Pull token validation earlier and report problems with an error return
to parent rather than signal delivery to the child.
- Verify that the top of the supplied shadow stack is VM_SHADOW_STACK.
- Rework token validation to only do the page mapping once.
- Drop no longer needed support for testing for signals in selftest.
- Fix typo in comments.
- Link to v8: https://lore.kernel.org/r/20240808-clone3-shadow-stack-v8-0-0acf37caf14c@ke…
Changes in v8:
- Fix token verification with user specified shadow stack.
- Don't track user managed shadow stacks for child processes.
- Link to v7: https://lore.kernel.org/r/20240731-clone3-shadow-stack-v7-0-a9532eebfb1d@ke…
Changes in v7:
- Rebase onto v6.11-rc1.
- Typo fixes.
- Link to v6: https://lore.kernel.org/r/20240623-clone3-shadow-stack-v6-0-9ee7783b1fb9@ke…
Changes in v6:
- Rebase onto v6.10-rc3.
- Ensure we don't try to free the parent shadow stack in error paths of
x86 arch code.
- Spelling fixes in userspace API document.
- Additional cleanups and improvements to the clone3() tests to support
the shadow stack tests.
- Link to v5: https://lore.kernel.org/r/20240203-clone3-shadow-stack-v5-0-322c69598e4b@ke…
Changes in v5:
- Rebase onto v6.8-rc2.
- Rework ABI to have the user allocate the shadow stack memory with
map_shadow_stack() and a token.
- Force inlining of the x86 shadow stack enablement.
- Move shadow stack enablement out into a shared header for reuse by
other tests.
- Link to v4: https://lore.kernel.org/r/20231128-clone3-shadow-stack-v4-0-8b28ffe4f676@ke…
Changes in v4:
- Formatting changes.
- Use a define for minimum shadow stack size and move some basic
validation to fork.c.
- Link to v3: https://lore.kernel.org/r/20231120-clone3-shadow-stack-v3-0-a7b8ed3e2acc@ke…
Changes in v3:
- Rebase onto v6.7-rc2.
- Remove stale shadow_stack in internal kargs.
- If a shadow stack is specified unconditionally use it regardless of
CLONE_ parameters.
- Force enable shadow stacks in the selftest.
- Update changelogs for RISC-V feature rename.
- Link to v2: https://lore.kernel.org/r/20231114-clone3-shadow-stack-v2-0-b613f8681155@ke…
Changes in v2:
- Rebase onto v6.7-rc1.
- Remove ability to provide preallocated shadow stack, just specify the
desired size.
- Link to v1: https://lore.kernel.org/r/20231023-clone3-shadow-stack-v1-0-d867d0b5d4d0@ke…
---
Mark Brown (8):
arm64/gcs: Return a success value from gcs_alloc_thread_stack()
Documentation: userspace-api: Add shadow stack API documentation
selftests: Provide helper header for shadow stack testing
fork: Add shadow stack support to clone3()
selftests/clone3: Remove redundant flushes of output streams
selftests/clone3: Factor more of main loop into test_clone3()
selftests/clone3: Allow tests to flag if -E2BIG is a valid error code
selftests/clone3: Test shadow stack support
Documentation/userspace-api/index.rst | 1 +
Documentation/userspace-api/shadow_stack.rst | 44 +++++
arch/arm64/include/asm/gcs.h | 8 +-
arch/arm64/kernel/process.c | 8 +-
arch/arm64/mm/gcs.c | 55 +++++-
arch/x86/include/asm/shstk.h | 11 +-
arch/x86/kernel/process.c | 2 +-
arch/x86/kernel/shstk.c | 53 ++++-
include/asm-generic/cacheflush.h | 11 ++
include/linux/sched/task.h | 17 ++
include/uapi/linux/sched.h | 9 +-
kernel/fork.c | 93 +++++++--
tools/testing/selftests/clone3/clone3.c | 226 ++++++++++++++++++----
tools/testing/selftests/clone3/clone3_selftests.h | 65 ++++++-
tools/testing/selftests/ksft_shstk.h | 98 ++++++++++
15 files changed, 620 insertions(+), 81 deletions(-)
---
base-commit: 8f5ae30d69d7543eee0d70083daf4de8fe15d585
change-id: 20231019-clone3-shadow-stack-15d40d2bf536
Best regards,
--
Mark Brown <broonie(a)kernel.org>
When the bpf ring buffer is full, new events can not be recorded util
the consumer consumes some events to free space. This may cause critical
events to be discarded, such as in fault diagnostic, where recent events
are more critical than older ones.
So add ovewrite mode for bpf ring buffer. In this mode, the new event
overwrites the oldest event when the buffer is full.
v2:
- remove libbpf changes (Andrii)
- update overwrite benchmark
v1:
https://lore.kernel.org/bpf/20250804022101.2171981-1-xukuohai@huaweicloud.c…
Xu Kuohai (3):
bpf: Add overwrite mode for bpf ring buffer
selftests/bpf: Add test for overwrite ring buffer
selftests/bpf/benchs: Add producer and overwrite bench for ring buffer
include/uapi/linux/bpf.h | 4 +
kernel/bpf/ringbuf.c | 159 +++++++++++++++---
tools/include/uapi/linux/bpf.h | 4 +
tools/testing/selftests/bpf/Makefile | 3 +-
tools/testing/selftests/bpf/bench.c | 2 +
.../selftests/bpf/benchs/bench_ringbufs.c | 95 ++++++++++-
.../bpf/benchs/run_bench_ringbufs.sh | 4 +
.../selftests/bpf/prog_tests/ringbuf.c | 74 ++++++++
.../selftests/bpf/progs/ringbuf_bench.c | 10 ++
.../bpf/progs/test_ringbuf_overwrite.c | 98 +++++++++++
10 files changed, 418 insertions(+), 35 deletions(-)
create mode 100644 tools/testing/selftests/bpf/progs/test_ringbuf_overwrite.c
--
2.43.0
Ever since the introduction of pid namespaces, procfs has had very
implicit behaviour surrounding them (the pidns used by a procfs mount is
auto-selected based on the mounting process's active pidns, and the
pidns itself is basically hidden once the mount has been constructed).
/* pidns mount option for procfs */
This implicit behaviour has historically meant that userspace was
required to do some special dances in order to configure the pidns of a
procfs mount as desired. Examples include:
* In order to bypass the mnt_too_revealing() check, Kubernetes creates
a procfs mount from an empty pidns so that user namespaced containers
can be nested (without this, the nested containers would fail to
mount procfs). But this requires forking off a helper process because
you cannot just one-shot this using mount(2).
* Container runtimes in general need to fork into a container before
configuring its mounts, which can lead to security issues in the case
of shared-pidns containers (a privileged process in the pidns can
interact with your container runtime process). While
SUID_DUMP_DISABLE and user namespaces make this less of an issue, the
strict need for this due to a minor uAPI wart is kind of unfortunate.
Things would be much easier if there was a way for userspace to just
specify the pidns they want. Patch 1 implements a new "pidns" argument
which can be set using fsconfig(2):
fsconfig(procfd, FSCONFIG_SET_FD, "pidns", NULL, nsfd);
fsconfig(procfd, FSCONFIG_SET_STRING, "pidns", "/proc/self/ns/pid", 0);
or classic mount(2) / mount(8):
// mount -t proc -o pidns=/proc/self/ns/pid proc /tmp/proc
mount("proc", "/tmp/proc", "proc", MS_..., "pidns=/proc/self/ns/pid");
The initial security model I have in this RFC is to be as conservative
as possible and just mirror the security model for setns(2) -- which
means that you can only set pidns=... to pid namespaces that your
current pid namespace is a direct ancestor of and you have CAP_SYS_ADMIN
privileges over the pid namespace. This fulfils the requirements of
container runtimes, but I suspect that this may be too strict for some
usecases.
The pidns argument is not displayed in mountinfo -- it's not clear to me
what value it would make sense to show (maybe we could just use ns_dname
to provide an identifier for the namespace, but this number would be
fairly useless to userspace). I'm open to suggestions. Note that
PROCFS_GET_PID_NAMESPACE (see below) does at least let userspace get
information about this outside of mountinfo.
Note that you cannot change the pidns of an already-created procfs
instance. The primary reason is that allowing this to be changed would
require RCU-protecting proc_pid_ns(sb) and thus auditing all of
fs/proc/* and some of the users in fs/* to make sure they wouldn't UAF
the pid namespace. Since creating procfs instances is very cheap, it
seems unnecessary to overcomplicate this upfront. Trying to reconfigure
procfs this way errors out with -EBUSY.
/* ioctl(PROCFS_GET_PID_NAMESPACE) */
In addition, being able to figure out what pid namespace is being used
by a procfs mount is quite useful when you have an administrative
process (such as a container runtime) which wants to figure out the
correct way of mapping PIDs between its own namespace and the namespace
for procfs (using NS_GET_{PID,TGID}_{IN,FROM}_PIDNS). There are
alternative ways to do this, but they all rely on ancillary information
that third-party libraries and tools do not necessarily have access to.
To make this easier, add a new ioctl (PROCFS_GET_PID_NAMESPACE) which
can be used to get a reference to the pidns that a procfs is using.
Rather than copying the (fairly strict) security model for setns(2),
apply a slightly looser model to better match what userspace can already
do:
* Make the ioctl only valid on the root (meaning that a process without
access to the procfs root -- such as only having an fd to a procfs
file or some open_tree(2)-like subset -- cannot use this API). This
means that the process already has some level of access to the
/proc/$pid directories.
* If the calling process is in an ancestor pidns, then they can already
create pidfd for processes inside the pidns, which is morally
equivalent to a pidns file descriptor according to setns(2). So it
seems reasonable to just allow it in this case. (The justification
for this model was suggested by Christian.)
* If the process has access to /proc/1/ns/pid already (i.e. has
ptrace-read access to the pidns pid1), then this ioctl is equivalent
to just opening a handle to it that way.
Ideally we would check for ptrace-read access against all processes
in the pidns (which is very likely to be true for at least one
process, as SUID_DUMP_DISABLE is cleared on exec(2) and is rarely set
by most programs), but this would obviously not scale.
I'm open to suggestions for whether we need to make this stricter (or
possibly allow more cases).
Signed-off-by: Aleksa Sarai <cyphar(a)cyphar.com>
---
Changes in v4:
- Remove unneeded EXPORT_SYMBOL_GPL. [Christian Brauner]
- Return -EOPNOTSUPP for new APIs for CONFIG_PID_NS=n rather than
pretending they don't exist entirely. [Christian Brauner]
- PROCFS_IOCTL_MAGIC conflicts with XSDFEC_MAGIC, so we need to allocate
subvalues more carefully (switch to _IO(PROCFS_IOCTL_MAGIC, 32)).
- Add some more selftests for PROCFS_GET_PID_NAMESPACE.
- Reword argument for PROCFS_GET_PID_NAMESPACE security model based on
Christian's suggestion, and remove CAP_SYS_ADMIN edge-case (in most
cases, such a process would also have ptrace-read credentials over the
pidns pid1).
- v3: <https://lore.kernel.org/r/20250724-procfs-pidns-api-v3-0-4c685c910923@cypha…>
Changes in v3:
- Disallow changing pidns for existing procfs instances, as we'd
probably have to RCU-protect everything that touches the pinned pidns
reference.
- Improve tests with slightly nicer ASSERT_ERRNO* macros.
- v2: <https://lore.kernel.org/r/20250723-procfs-pidns-api-v2-0-621e7edd8e40@cypha…>
Changes in v2:
- #ifdef CONFIG_PID_NS
- Improve cover letter wording to make it clear we're talking about two
separate features with different permission models. [Andy Lutomirski]
- Fix build warnings in pidns_is_ancestor() patch. [kernel test robot]
- v1: <https://lore.kernel.org/r/20250721-procfs-pidns-api-v1-0-5cd9007e512d@cypha…>
---
Aleksa Sarai (4):
pidns: move is-ancestor logic to helper
procfs: add "pidns" mount option
procfs: add PROCFS_GET_PID_NAMESPACE ioctl
selftests/proc: add tests for new pidns APIs
Documentation/filesystems/proc.rst | 12 ++
fs/proc/root.c | 166 +++++++++++++++-
include/linux/pid_namespace.h | 9 +
include/uapi/linux/fs.h | 4 +
kernel/pid_namespace.c | 22 ++-
tools/testing/selftests/proc/.gitignore | 1 +
tools/testing/selftests/proc/Makefile | 1 +
tools/testing/selftests/proc/proc-pidns.c | 315 ++++++++++++++++++++++++++++++
8 files changed, 514 insertions(+), 16 deletions(-)
---
base-commit: 66639db858112bf6b0f76677f7517643d586e575
change-id: 20250717-procfs-pidns-api-8ed1583431f0
Best regards,
--
Aleksa Sarai <cyphar(a)cyphar.com>
Print a message so that people reading dmesg know that these NULL
dereferences are not a bug, but instead a deliberate part of
the testing.
Signed-off-by: Dan Carpenter <dan.carpenter(a)linaro.org>
---
lib/kunit/kunit-test.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/lib/kunit/kunit-test.c b/lib/kunit/kunit-test.c
index 8c01eabd4eaf..a8b6e16f4465 100644
--- a/lib/kunit/kunit-test.c
+++ b/lib/kunit/kunit-test.c
@@ -119,6 +119,8 @@ static void kunit_test_null_dereference(void *data)
struct kunit *test = data;
int *null = NULL;
+ pr_info("Triggering deliberate NULL derefence.\n");
+
*null = 0;
KUNIT_FAIL(test, "This line should never be reached\n");
--
2.47.2
Hello Jiayuan Chen,
Commit 7b2fa44de5e7 ("selftest/bpf/benchs: Add benchmark for sockmap
usage") from Apr 7, 2025 (linux-next), leads to the following Smatch
static checker warning:
tools/testing/selftests/bpf/benchs/bench_sockmap.c:129 bench_sockmap_prog_destroy()
error: buffer overflow 'ctx.fds' 5 <= 19
tools/testing/selftests/bpf/benchs/bench_sockmap.c
123 static void bench_sockmap_prog_destroy(void)
124 {
125 int i;
126
127 for (i = 0; i < sizeof(ctx.fds); i++) {
^^^^^^^^^^^^^^^
This should be ARRAY_SIZE(ctx.fds) otherwise it's a buffer overflow.
128 if (ctx.fds[0] > 0)
^^^^^^^^^^
Instead of .fds[0] it should be .fds[i], right?
--> 129 close(ctx.fds[i]);
130 }
131
132 bench_sockmap_prog__destroy(ctx.skel);
133 }
regards,
dan carpenter
There are currently no kernel tests that verify setting and getting
options of the team driver.
In the future, options may be added that implicitly change other
options, which will make it useful to have tests like these that show
nothing breaks. There will be a follow up patch to this that adds new
"rx_enabled" and "tx_enabled" options, which will implicitly affect the
"enabled" option value and vice versa.
The tests use teamnl to first set options to specific values and then
gets them to compare to the set values.
Signed-off-by: Marc Harvey <marcharvey(a)google.com>
---
Changes in v2:
- Fixed shellcheck failures.
- Fixed test failing in vng by adding a config option to enable the
team driver's active backup mode.
- Link to v1: https://lore.kernel.org/netdev/20250902235504.4190036-1-marcharvey@google.c…
.../selftests/drivers/net/team/Makefile | 6 +-
.../testing/selftests/drivers/net/team/config | 1 +
.../selftests/drivers/net/team/options.sh | 192 ++++++++++++++++++
3 files changed, 197 insertions(+), 2 deletions(-)
create mode 100755 tools/testing/selftests/drivers/net/team/options.sh
diff --git a/tools/testing/selftests/drivers/net/team/Makefile b/tools/testing/selftests/drivers/net/team/Makefile
index eaf6938f100e..8b00b70ce67f 100644
--- a/tools/testing/selftests/drivers/net/team/Makefile
+++ b/tools/testing/selftests/drivers/net/team/Makefile
@@ -1,11 +1,13 @@
# SPDX-License-Identifier: GPL-2.0
# Makefile for net selftests
-TEST_PROGS := dev_addr_lists.sh propagation.sh
+TEST_PROGS := dev_addr_lists.sh propagation.sh options.sh
TEST_INCLUDES := \
../bonding/lag_lib.sh \
../../../net/forwarding/lib.sh \
- ../../../net/lib.sh
+ ../../../net/lib.sh \
+ ../../../net/in_netns.sh \
+ ../../../net/lib/sh/defer.sh \
include ../../../lib.mk
diff --git a/tools/testing/selftests/drivers/net/team/config b/tools/testing/selftests/drivers/net/team/config
index 636b3525b679..558e1d0cf565 100644
--- a/tools/testing/selftests/drivers/net/team/config
+++ b/tools/testing/selftests/drivers/net/team/config
@@ -3,4 +3,5 @@ CONFIG_IPV6=y
CONFIG_MACVLAN=y
CONFIG_NETDEVSIM=m
CONFIG_NET_TEAM=y
+CONFIG_NET_TEAM_MODE_ACTIVEBACKUP=y
CONFIG_NET_TEAM_MODE_LOADBALANCE=y
diff --git a/tools/testing/selftests/drivers/net/team/options.sh b/tools/testing/selftests/drivers/net/team/options.sh
new file mode 100755
index 000000000000..82bf22aa3480
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/team/options.sh
@@ -0,0 +1,192 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+# These tests verify basic set and get functionality of the team
+# driver options over netlink.
+
+# Run in private netns.
+test_dir="$(dirname "$0")"
+if [[ $# -eq 0 ]]; then
+ "${test_dir}"/../../../net/in_netns.sh "$0" __subprocess
+ exit $?
+fi
+
+ALL_TESTS="
+ team_test_options
+"
+
+source "${test_dir}/../../../net/lib.sh"
+
+TEAM_PORT="team0"
+MEMBER_PORT="dummy0"
+
+setup()
+{
+ ip link add name "${MEMBER_PORT}" type dummy
+ ip link add name "${TEAM_PORT}" type team
+}
+
+get_and_check_value()
+{
+ local option_name="$1"
+ local expected_value="$2"
+ local port_flag="$3"
+
+ local value_from_get
+
+ if ! value_from_get=$(teamnl "${TEAM_PORT}" getoption "${option_name}" \
+ "${port_flag}"); then
+ echo "Could not get option '${option_name}'" >&2
+ return 1
+ fi
+
+ if [[ "${value_from_get}" != "${expected_value}" ]]; then
+ echo "Incorrect value for option '${option_name}'" >&2
+ echo "get (${value_from_get}) != set (${expected_value})" >&2
+ return 1
+ fi
+}
+
+set_and_check_get()
+{
+ local option_name="$1"
+ local option_value="$2"
+ local port_flag="$3"
+
+ local value_from_get
+
+ if ! teamnl "${TEAM_PORT}" setoption "${option_name}" "${option_value}" \
+ "${port_flag}"; then
+ echo "'setoption ${option_name} ${option_value}' failed" >&2
+ return 1
+ fi
+
+ get_and_check_value "${option_name}" "${option_value}" "${port_flag}"
+ return $?
+}
+
+# Get a "port flag" to pass to the `teamnl` command.
+# E.g. $1="dummy0" -> "port=dummy0",
+# $1="" -> ""
+get_port_flag()
+{
+ local port_name="$1"
+
+ if [[ -n "${port_name}" ]]; then
+ echo "--port=${port_name}"
+ fi
+}
+
+attach_port_if_specified()
+{
+ local port_name="${1}"
+
+ if [[ -n "${port_name}" ]]; then
+ ip link set dev "${port_name}" master "${TEAM_PORT}"
+ return $?
+ fi
+}
+
+detach_port_if_specified()
+{
+ local port_name="${1}"
+
+ if [[ -n "${port_name}" ]]; then
+ ip link set dev "${port_name}" nomaster
+ return $?
+ fi
+}
+
+#######################################
+# Test that an option's get value matches its set value.
+# Globals:
+# RET - Used by testing infra like `check_err`.
+# EXIT_STATUS - Used by `log_test` to whole script exit value.
+# Arguments:
+# option_name - The name of the option.
+# value_1 - The first value to try setting.
+# value_2 - The second value to try setting.
+# port_name - The (optional) name of the attached port.
+#######################################
+team_test_option()
+{
+ local option_name="$1"
+ local value_1="$2"
+ local value_2="$3"
+ local possible_values="$2 $3 $2"
+ local port_name="$4"
+ local port_flag
+
+ RET=0
+
+ echo "Setting '${option_name}' to '${value_1}' and '${value_2}'"
+
+ attach_port_if_specified "${port_name}"
+ check_err $? "Couldn't attach ${port_name} to master"
+ port_flag=$(get_port_flag "${port_name}")
+
+ # Set and get both possible values.
+ for value in ${possible_values}; do
+ set_and_check_get "${option_name}" "${value}" "${port_flag}"
+ check_err $? "Failed to set '${option_name}' to '${value}'"
+ done
+
+ detach_port_if_specified "${port_name}"
+ check_err $? "Couldn't detach ${port_name} from its master"
+
+ log_test "Set + Get '${option_name}' test"
+}
+
+#######################################
+# Test that getting a non-existant option fails.
+# Globals:
+# RET - Used by testing infra like `check_err`.
+# EXIT_STATUS - Used by `log_test` to whole script exit value.
+# Arguments:
+# option_name - The name of the option.
+# port_name - The (optional) name of the attached port.
+#######################################
+team_test_get_option_fails()
+{
+ local option_name="$1"
+ local port_name="$2"
+ local port_flag
+
+ RET=0
+
+ attach_port_if_specified "${port_name}"
+ check_err $? "Couldn't attach ${port_name} to master"
+ port_flag=$(get_port_flag "${port_name}")
+
+ # Just confirm that getting the value fails.
+ teamnl "${TEAM_PORT}" getoption "${option_name}" "${port_flag}"
+ check_fail $? "Shouldn't be able to get option '${option_name}'"
+
+ detach_port_if_specified "${port_name}"
+
+ log_test "Get '${option_name}' fails"
+}
+
+team_test_options()
+{
+ # Wrong option name behavior.
+ team_test_get_option_fails fake_option1
+ team_test_get_option_fails fake_option2 "${MEMBER_PORT}"
+
+ # Correct set and get behavior.
+ team_test_option mode activebackup loadbalance
+ team_test_option notify_peers_count 0 5
+ team_test_option notify_peers_interval 0 5
+ team_test_option mcast_rejoin_count 0 5
+ team_test_option mcast_rejoin_interval 0 5
+ team_test_option enabled true false "${MEMBER_PORT}"
+ team_test_option user_linkup true false "${MEMBER_PORT}"
+ team_test_option user_linkup_enabled true false "${MEMBER_PORT}"
+ team_test_option priority 10 20 "${MEMBER_PORT}"
+ team_test_option queue_id 0 1 "${MEMBER_PORT}"
+}
+
+require_command teamnl
+setup
+tests_run
+exit "${EXIT_STATUS}"
--
2.51.0.338.gd7d06c2dae-goog
Usually the autodefer helpers in lib.sh are expected to be run in context
where success is the expected outcome. However when using them for feature
detection, failure can legitimately occur. But the failed command still
schedules a cleanup, which will likely fail again.
Instead, only schedule deferred cleanup when the positive command succeeds.
This way of organizing the cleanup has the added benefit that now the
return code from these functions reflects whether the command passed.
Signed-off-by: Petr Machata <petrm(a)nvidia.com>
---
Notes:
CC: Shuah Khan <shuah(a)kernel.org>
CC: linux-kselftest(a)vger.kernel.org
tools/testing/selftests/net/lib.sh | 32 +++++++++++++++---------------
1 file changed, 16 insertions(+), 16 deletions(-)
diff --git a/tools/testing/selftests/net/lib.sh b/tools/testing/selftests/net/lib.sh
index c7add0dc4c60..80cf1a75136c 100644
--- a/tools/testing/selftests/net/lib.sh
+++ b/tools/testing/selftests/net/lib.sh
@@ -547,8 +547,8 @@ ip_link_add()
{
local name=$1; shift
- ip link add name "$name" "$@"
- defer ip link del dev "$name"
+ ip link add name "$name" "$@" && \
+ defer ip link del dev "$name"
}
ip_link_set_master()
@@ -556,8 +556,8 @@ ip_link_set_master()
local member=$1; shift
local master=$1; shift
- ip link set dev "$member" master "$master"
- defer ip link set dev "$member" nomaster
+ ip link set dev "$member" master "$master" && \
+ defer ip link set dev "$member" nomaster
}
ip_link_set_addr()
@@ -566,8 +566,8 @@ ip_link_set_addr()
local addr=$1; shift
local old_addr=$(mac_get "$name")
- ip link set dev "$name" address "$addr"
- defer ip link set dev "$name" address "$old_addr"
+ ip link set dev "$name" address "$addr" && \
+ defer ip link set dev "$name" address "$old_addr"
}
ip_link_has_flag()
@@ -590,8 +590,8 @@ ip_link_set_up()
local name=$1; shift
if ! ip_link_is_up "$name"; then
- ip link set dev "$name" up
- defer ip link set dev "$name" down
+ ip link set dev "$name" up && \
+ defer ip link set dev "$name" down
fi
}
@@ -600,8 +600,8 @@ ip_link_set_down()
local name=$1; shift
if ip_link_is_up "$name"; then
- ip link set dev "$name" down
- defer ip link set dev "$name" up
+ ip link set dev "$name" down && \
+ defer ip link set dev "$name" up
fi
}
@@ -609,20 +609,20 @@ ip_addr_add()
{
local name=$1; shift
- ip addr add dev "$name" "$@"
- defer ip addr del dev "$name" "$@"
+ ip addr add dev "$name" "$@" && \
+ defer ip addr del dev "$name" "$@"
}
ip_route_add()
{
- ip route add "$@"
- defer ip route del "$@"
+ ip route add "$@" && \
+ defer ip route del "$@"
}
bridge_vlan_add()
{
- bridge vlan add "$@"
- defer bridge vlan del "$@"
+ bridge vlan add "$@" && \
+ defer bridge vlan del "$@"
}
wait_local_port_listen()
--
2.49.0
The fact that all cleanup (ideally) goes through the defer framework makes
debugging of these commands a bit tricky. However, this also gives us a
nice point to place a hook along the lines of PAUSE_ON_FAIL. When the
environment variable DEFER_PAUSE_ON_FAIL is set, and a cleanup command
results in non-zero exit status, show a bit of debuginfo and give the user
an opportunity to interrupt the execution altogether.
Signed-off-by: Petr Machata <petrm(a)nvidia.com>
---
Notes:
CC: Shuah Khan <shuah(a)kernel.org>
CC: linux-kselftest(a)vger.kernel.org
tools/testing/selftests/net/lib/sh/defer.sh | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/tools/testing/selftests/net/lib/sh/defer.sh b/tools/testing/selftests/net/lib/sh/defer.sh
index 6c642f3d0ced..47ab78c4d465 100644
--- a/tools/testing/selftests/net/lib/sh/defer.sh
+++ b/tools/testing/selftests/net/lib/sh/defer.sh
@@ -1,6 +1,10 @@
#!/bin/bash
# SPDX-License-Identifier: GPL-2.0
+# Whether to pause and allow debugging when an executed deferred command has a
+# non-zero exit code.
+: "${DEFER_PAUSE_ON_FAIL:=no}"
+
# map[(scope_id,track,cleanup_id) -> cleanup_command]
# track={d=default | p=priority}
declare -A __DEFER__JOBS
@@ -38,8 +42,20 @@ __defer__run()
local track=$1; shift
local defer_ix=$1; shift
local defer_key=$(__defer__defer_key $track $defer_ix)
+ local ret
eval ${__DEFER__JOBS[$defer_key]}
+ ret=$?
+
+ if [[ "$DEFER_PAUSE_ON_FAIL" == yes && "$ret" -ne 0 ]]; then
+ echo "Deferred command (track $track index $defer_ix):"
+ echo " ${__DEFER__JOBS[$defer_key]}"
+ echo "... ended with an exit status of $ret"
+ echo "Hit enter to continue, 'q' to quit"
+ read a
+ [[ "$a" == q ]] && exit 1
+ fi
+
unset __DEFER__JOBS[$defer_key]
}
--
2.49.0
Currently the way deferred commands are stored and invoked causes any
whitespace to act as an argument separator when the command is executed.
To make it possible to use spaces in deferred commands, store the commands
quoted, and then eval the string prior to execution.
Fixes: a6e263f125cd ("selftests: net: lib: Introduce deferred commands")
Signed-off-by: Petr Machata <petrm(a)nvidia.com>
---
Notes:
CC: Shuah Khan <shuah(a)kernel.org>
CC: linux-kselftest(a)vger.kernel.org
tools/testing/selftests/net/lib/sh/defer.sh | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/net/lib/sh/defer.sh b/tools/testing/selftests/net/lib/sh/defer.sh
index 082f5d38321b..6c642f3d0ced 100644
--- a/tools/testing/selftests/net/lib/sh/defer.sh
+++ b/tools/testing/selftests/net/lib/sh/defer.sh
@@ -39,7 +39,7 @@ __defer__run()
local defer_ix=$1; shift
local defer_key=$(__defer__defer_key $track $defer_ix)
- ${__DEFER__JOBS[$defer_key]}
+ eval ${__DEFER__JOBS[$defer_key]}
unset __DEFER__JOBS[$defer_key]
}
@@ -49,7 +49,7 @@ __defer__schedule()
local ndefers=$(__defer__ndefers $track)
local ndefers_key=$(__defer__ndefer_key $track)
local defer_key=$(__defer__defer_key $track $ndefers)
- local defer="$@"
+ local defer="${@@Q}"
__DEFER__JOBS[$defer_key]="$defer"
__DEFER__NJOBS[$ndefers_key]=$((ndefers + 1))
--
2.49.0
This series adds namespace support to vhost-vsock and loopback. It does
not add namespaces to any of the other guest transports (virtio-vsock,
hyperv, or vmci).
The current revision only supports two modes: local or global. Local
mode is complete isolation of namespaces, while global mode is complete
sharing between namespaces of CIDs (the original behavior).
The mode is set using /proc/sys/net/vsock/ns_mode.
Modes are per-netns and write-once. This allows a system to configure
namespaces independently (some may share CIDs, others are completely
isolated). This also supports future possible mixed use cases, where
there may be namespaces in global mode spinning up VMs while there are
mixed mode namespaces that provide services to the VMs, but are not
allowed to allocate from the global CID pool.
Additionally, added tests for the new semantics:
tools/testing/selftests/vsock/vmtest.sh
1..22
ok 1 vm_server_host_client
ok 2 vm_client_host_server
ok 3 vm_loopback
ok 4 host_vsock_ns_mode_ok
ok 5 host_vsock_ns_mode_write_once_ok
ok 6 global_same_cid_fails
ok 7 local_same_cid_ok
ok 8 global_local_same_cid_ok
ok 9 local_global_same_cid_ok
ok 10 diff_ns_global_host_connect_to_global_vm_ok
ok 11 diff_ns_global_host_connect_to_local_vm_fails
ok 12 diff_ns_global_vm_connect_to_global_host_ok
ok 13 diff_ns_global_vm_connect_to_local_host_fails
ok 14 diff_ns_local_host_connect_to_local_vm_fails
ok 15 diff_ns_local_vm_connect_to_local_host_fails
ok 16 diff_ns_global_to_local_loopback_local_fails
ok 17 diff_ns_local_to_global_loopback_fails
ok 18 diff_ns_local_to_local_loopback_fails
ok 19 diff_ns_global_to_global_loopback_ok
ok 20 same_ns_local_loopback_ok
ok 21 same_ns_local_host_connect_to_local_vm_ok
ok 22 same_ns_local_vm_connect_to_local_host_ok
SUMMARY: PASS=22 SKIP=0 FAIL=0
Log: /tmp/vsock_vmtest_OQC4.log
Thanks again for everyone's help and reviews!
Signed-off-by: Bobby Eshleman <bobbyeshleman(a)gmail.com>
To: Stefano Garzarella <sgarzare(a)redhat.com>
To: Shuah Khan <shuah(a)kernel.org>
To: David S. Miller <davem(a)davemloft.net>
To: Eric Dumazet <edumazet(a)google.com>
To: Jakub Kicinski <kuba(a)kernel.org>
To: Paolo Abeni <pabeni(a)redhat.com>
To: Simon Horman <horms(a)kernel.org>
To: Stefan Hajnoczi <stefanha(a)redhat.com>
To: Michael S. Tsirkin <mst(a)redhat.com>
To: Jason Wang <jasowang(a)redhat.com>
To: Xuan Zhuo <xuanzhuo(a)linux.alibaba.com>
To: Eugenio Pérez <eperezma(a)redhat.com>
To: K. Y. Srinivasan <kys(a)microsoft.com>
To: Haiyang Zhang <haiyangz(a)microsoft.com>
To: Wei Liu <wei.liu(a)kernel.org>
To: Dexuan Cui <decui(a)microsoft.com>
To: Bryan Tan <bryan-bt.tan(a)broadcom.com>
To: Vishnu Dasa <vishnu.dasa(a)broadcom.com>
To: Broadcom internal kernel review list <bcm-kernel-feedback-list(a)broadcom.com>
Cc: virtualization(a)lists.linux.dev
Cc: netdev(a)vger.kernel.org
Cc: linux-kselftest(a)vger.kernel.org
Cc: linux-kernel(a)vger.kernel.org
Cc: kvm(a)vger.kernel.org
Cc: linux-hyperv(a)vger.kernel.org
Cc: berrange(a)redhat.com
Changes in v5:
- /proc/net/vsock_ns_mode -> /proc/sys/net/vsock/ns_mode
- vsock_global_net -> vsock_global_dummy_net
- fix netns lookup in vhost_vsock to respect pid namespaces
- add callbacks for vsock_loopback to avoid circular dependency
- vmtest.sh loads vsock_loopback module
- remove vsock_net_mode_can_set()
- change vsock_net_write_mode() to return true/false based on success
- make vsock_net_mode enum instead of u8
- Link to v4: https://lore.kernel.org/r/20250805-vsock-vmtest-v4-0-059ec51ab111@meta.com
Changes in v4:
- removed RFC tag
- implemented loopback support
- renamed new tests to better reflect behavior
- completed suite of tests with permutations of ns modes and vsock_test
as guest/host
- simplified socat bridging with unix socket instead of tcp + veth
- only use vsock_test for success case, socat for failure case (context
in commit message)
- lots of cleanup
Changes in v3:
- add notion of "modes"
- add procfs /proc/net/vsock_ns_mode
- local and global modes only
- no /dev/vhost-vsock-netns
- vmtest.sh already merged, so new patch just adds new tests for NS
- Link to v2:
https://lore.kernel.org/kvm/20250312-vsock-netns-v2-0-84bffa1aa97a@gmail.com
Changes in v2:
- only support vhost-vsock namespaces
- all g2h namespaces retain old behavior, only common API changes
impacted by vhost-vsock changes
- add /dev/vhost-vsock-netns for "opt-in"
- leave /dev/vhost-vsock to old behavior
- removed netns module param
- Link to v1:
https://lore.kernel.org/r/20200116172428.311437-1-sgarzare@redhat.com
Changes in v1:
- added 'netns' module param to vsock.ko to enable the
network namespace support (disabled by default)
- added 'vsock_net_eq()' to check the "net" assigned to a socket
only when 'netns' support is enabled
- Link to RFC: https://patchwork.ozlabs.org/cover/1202235/
---
Bobby Eshleman (9):
vsock: a per-net vsock NS mode state
vsock: add net to vsock skb cb
vsock: add netns to vsock core
vsock/loopback: add netns support
vsock/virtio: add netns to virtio transport common
vhost/vsock: add netns support
selftests/vsock: improve logging in vmtest.sh
selftests/vsock: invoke vsock_test through helpers
selftests/vsock: add namespace tests
MAINTAINERS | 1 +
drivers/vhost/vsock.c | 30 +-
include/linux/virtio_vsock.h | 12 +
include/net/af_vsock.h | 89 ++-
include/net/net_namespace.h | 4 +
include/net/netns/vsock.h | 25 +
net/vmw_vsock/af_vsock.c | 312 ++++++++-
net/vmw_vsock/hyperv_transport.c | 2 +-
net/vmw_vsock/virtio_transport.c | 5 +-
net/vmw_vsock/virtio_transport_common.c | 14 +-
net/vmw_vsock/vmci_transport.c | 4 +-
net/vmw_vsock/vsock_loopback.c | 76 ++-
tools/testing/selftests/vsock/vmtest.sh | 1092 ++++++++++++++++++++++++++-----
13 files changed, 1475 insertions(+), 191 deletions(-)
---
base-commit: 242041164339594ca019481d54b4f68a7aaff64e
change-id: 20250325-vsock-vmtest-b3a21d2102c2
Best regards,
--
Bobby Eshleman <bobbyeshleman(a)meta.com>
Add the benchmark testcase "kprobe-multi-all", which will hook all the
kernel functions during the testing.
This series is separated out from [1].
Changes since V2:
* add some comment to attach_ksyms_all, which notes that don't run the
testing on a debug kernel
Changes since V1:
* introduce trace_blacklist instead of copy-pasting strcmp in the 2nd
patch
* use fprintf() instead of printf() in 3rd patch
Link: https://lore.kernel.org/bpf/20250817024607.296117-1-dongml2@chinatelecom.cn/ [1]
Menglong Dong (3):
selftests/bpf: move get_ksyms and get_addrs to trace_helpers.c
selftests/bpf: skip recursive functions for kprobe_multi
selftests/bpf: add benchmark testing for kprobe-multi-all
tools/testing/selftests/bpf/bench.c | 4 +
.../selftests/bpf/benchs/bench_trigger.c | 61 +++++
.../selftests/bpf/benchs/run_bench_trigger.sh | 4 +-
.../bpf/prog_tests/kprobe_multi_test.c | 220 +---------------
.../selftests/bpf/progs/trigger_bench.c | 12 +
tools/testing/selftests/bpf/trace_helpers.c | 234 ++++++++++++++++++
tools/testing/selftests/bpf/trace_helpers.h | 3 +
7 files changed, 319 insertions(+), 219 deletions(-)
--
2.51.0
The two patches fix the va_high_addr_switch.sh test failure on x86_64.
Patch 1 fixes the hugepages setup issue that nr_hugepages is reset too
early in run_vmtests.sh and break the later va_high_addr_switch testing.
Patch 2 fixes the test failure caused by the hint addr align method change
in hugetlb_get_unmapped_area().
Chunyu Hu (2):
selftests/mm: fix hugepages cleanup too early
selftests/mm: fix va_high_addr_switch.sh failure on x86_64
tools/testing/selftests/mm/run_vmtests.sh | 9 +++++++--
tools/testing/selftests/mm/va_high_addr_switch.c | 4 ++--
2 files changed, 9 insertions(+), 4 deletions(-)
--
2.49.0
Hi all,
This is a second version of a series I sent some time ago, it continues
the work of migrating the script tests into prog_tests.
The test_xsk.sh script covers many AF_XDP use cases. The tests it runs
are defined in xksxceiver.c. Since this script is used to test real
hardware, the goal here is to leave it as it is, and only integrate the
tests that run on veth peers into the test_progs framework.
Some tests are flaky so they can't be integrated in the CI as they are.
I think that fixing their flakyness would require a significant amount of
work. So, as first step, I've excluded them from the list of tests
migrated to the CI (see PATCH 13). If these tests get fixed at some
point, integrating them into the CI will be straightforward.
PATCH 1 extracts test_xsk[.c/.h] from xskxceiver[.c/.h] to make the
tests available to test_progs.
PATCH 2 to 5 fix small issues in the current test
PATCH 7 to 12 handle all errors to release resources instead of calling
exit() when any error occurs.
PATCH 13 isolates some flaky tests
PATCH 14 integrate the non-flaky tests to the test_progs framework
Maciej, I've fixed the bug you found in the initial series. I've
looked for any hardware able to run test_xsk.sh in my office, but I
couldn't find one ... So here again, only the veth part has been tested,
sorry about that.
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet(a)bootlin.com>
---
Changes in v3:
- Rebase on latest bpf-next_base to integrate commit c9110e6f7237 ("selftests/bpf:
Fix count write in testapp_xdp_metadata_copy()").
- Move XDP_METADATA_COPY_* tests from flaky-tests to nominal tests
- Link to v2: https://lore.kernel.org/r/20250902-xsk-v2-0-17c6345d5215@bootlin.com
Changes in v2:
- Rebase on the latest bpf-next_base and integrate the newly added tests
to the work (adjust_tail* and tx_queue_consumer tests)
- Re-order patches to split xkxceiver sooner.
- Fix the bug reported by Maciej.
- Fix verbose mode in test_xsk.sh by keeping kselftest (remove PATCH 1,
7 and 8)
- Link to v1: https://lore.kernel.org/r/20250313-xsk-v1-0-7374729a93b9@bootlin.com
---
Bastien Curutchet (eBPF Foundation) (14):
selftests/bpf: test_xsk: Split xskxceiver
selftests/bpf: test_xsk: Initialize bitmap before use
selftests/bpf: test_xsk: Fix memory leaks
selftests/bpf: test_xsk: Wrap test clean-up in functions
selftests/bpf: test_xsk: Release resources when swap fails
selftests/bpf: test_xsk: Add return value to init_iface()
selftests/bpf: test_xsk: Don't exit immediately when xsk_attach fails
selftests/bpf: test_xsk: Don't exit immediately when gettimeofday fails
selftests/bpf: test_xsk: Don't exit immediately when workers fail
selftests/bpf: test_xsk: Don't exit immediately if validate_traffic fails
selftests/bpf: test_xsk: Don't exit immediately on allocation failures
selftests/bpf: test_xsk: Move exit_with_error to xskxceiver.c
selftests/bpf: test_xsk: Isolate flaky tests
selftests/bpf: test_xsk: Integrate test_xsk.c to test_progs framework
tools/testing/selftests/bpf/Makefile | 11 +-
tools/testing/selftests/bpf/prog_tests/test_xsk.c | 2604 ++++++++++++++++++++
tools/testing/selftests/bpf/prog_tests/test_xsk.h | 294 +++
tools/testing/selftests/bpf/prog_tests/xsk.c | 146 ++
tools/testing/selftests/bpf/xskxceiver.c | 2685 +--------------------
tools/testing/selftests/bpf/xskxceiver.h | 156 --
6 files changed, 3170 insertions(+), 2726 deletions(-)
---
base-commit: e4e08c130231eb8071153ab5f056874d8f70430b
change-id: 20250218-xsk-0cf90e975d14
Best regards,
--
Bastien Curutchet (eBPF Foundation) <bastien.curutchet(a)bootlin.com>
From: Dong Yang <dayss1224(a)gmail.com>
Add supported KVM test cases and fix the compilation dependencies.
---
Changes in v3:
- Reorder patches to fix build dependencies
- Sort common supported test cases alphabetically
- Move ucall_common.h include from common header to specific source files
Changes in v2:
- Delete some repeat KVM test cases on riscv
- Add missing headers to fix the build for new RISC-V KVM selftests
Dong Yang (1):
KVM: riscv: selftests: Add missing headers for new testcases
Quan Zhou (2):
KVM: riscv: selftests: Use the existing RISCV_FENCE macro in
`rseq-riscv.h`
KVM: riscv: selftests: Add common supported test cases
tools/testing/selftests/kvm/Makefile.kvm | 6 ++++++
tools/testing/selftests/kvm/access_tracking_perf_test.c | 1 +
tools/testing/selftests/kvm/include/riscv/processor.h | 1 +
.../selftests/kvm/memslot_modification_stress_test.c | 1 +
tools/testing/selftests/kvm/memslot_perf_test.c | 1 +
tools/testing/selftests/rseq/rseq-riscv.h | 3 +--
6 files changed, 11 insertions(+), 2 deletions(-)
--
2.34.1
Create a netconsole test that puts a lot of pressure on the netconsole
list manipulation. Do it by creating dynamic targets and deleting
targets while messages are being sent. Also put interface down while the
messages are being sent, as creating parallel targets.
The code launches three background jobs on distinct schedules:
* Toggle netcons target every 30 iterations
* create and delete random_target every 50 iterations
* toggle iface every 70 iterations
This creates multiple concurrency sources that interact with netconsole
states. This is good practice to simulate stress, and exercise netpoll
and netconsole locks.
This test already found an issue as reported in [1]
Link: https://lore.kernel.org/all/20250901-netpoll_memleak-v1-1-34a181977dfc@debi… [1]
Signed-off-by: Breno Leitao <leitao(a)debian.org>
---
tools/testing/selftests/drivers/net/Makefile | 1 +
.../selftests/drivers/net/netcons_torture.sh | 133 +++++++++++++++++++++
2 files changed, 134 insertions(+)
diff --git a/tools/testing/selftests/drivers/net/Makefile b/tools/testing/selftests/drivers/net/Makefile
index 984ece05f7f92..2b253b1ff4f38 100644
--- a/tools/testing/selftests/drivers/net/Makefile
+++ b/tools/testing/selftests/drivers/net/Makefile
@@ -17,6 +17,7 @@ TEST_PROGS := \
netcons_fragmented_msg.sh \
netcons_overflow.sh \
netcons_sysdata.sh \
+ netcons_torture.sh \
netpoll_basic.py \
ping.py \
queues.py \
diff --git a/tools/testing/selftests/drivers/net/netcons_torture.sh b/tools/testing/selftests/drivers/net/netcons_torture.sh
new file mode 100755
index 0000000000000..d41884c83cab3
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/netcons_torture.sh
@@ -0,0 +1,133 @@
+#!/usr/bin/env bash
+# SPDX-License-Identifier: GPL-2.0
+
+# Repeatedly send kernel messages, toggles netconsole targets on and off,
+# creates and deletes targets in parallel, and toggles the source interface to
+# simulate stress conditions.
+#
+# This test aims verify the robustness of netconsole under dynamic
+# configurations and concurrent operations.
+#
+# The major goal is to run this test with LOCKDEP, Kmemleak and KASAN to make
+# sure no issues is reported.
+#
+# Author: Breno Leitao <leitao(a)debian.org>
+
+set -euo pipefail
+
+SCRIPTDIR=$(dirname "$(readlink -e "${BASH_SOURCE[0]}")")
+
+source "${SCRIPTDIR}"/lib/sh/lib_netcons.sh
+
+# Number of times the main loop run
+ITERATIONS=${1:-1000}
+
+# Only test extended format
+FORMAT="extended"
+# And ipv6 only
+IP_VERSION="ipv6"
+
+# Create, enable and delete some targets.
+create_and_delete_random_target() {
+ COUNT=1
+ RND_PREFIX=$(mktemp -u netcons_rnd_XXXX_)
+
+ if [ -d "${NETCONS_CONFIGFS}/${RND_PREFIX}${COUNT}" ] || \
+ [ -d "${NETCONS_CONFIGFS}/${RND_PREFIX}0" ]; then
+ echo "Function didn't finish yet, skipping it." >&2
+ return
+ fi
+
+ # enable COUNT targets
+ for i in $(seq 0 ${COUNT})
+ do
+ RND_TARGET="${RND_PREFIX}"${i}
+ RND_TARGET_PATH="${NETCONS_CONFIGFS}"/"${RND_TARGET}"
+
+ # Basic population so the target can come up
+ mkdir "${RND_TARGET_PATH}"
+ echo "${DSTIP}" > "${RND_TARGET_PATH}"/remote_ip
+ echo "${SRCIP}" > "${RND_TARGET_PATH}"/local_ip
+ echo "${DSTMAC}" > "${RND_TARGET_PATH}"/remote_mac
+ echo "${SRCIF}" > "${RND_TARGET_PATH}"/dev_name
+
+ echo 1 > "${RND_TARGET_PATH}"/enabled
+ done
+
+ echo "netconsole selftest: ${COUNT} additional target was created" > /dev/kmsg
+ # disable them all
+ for i in $(seq 0 ${COUNT})
+ do
+ RND_TARGET="${RND_PREFIX}"${i}
+ RND_TARGET_PATH="${NETCONS_CONFIGFS}"/"${RND_TARGET}"
+ echo 0 > "${RND_TARGET_PATH}"/enabled
+ rmdir "${RND_TARGET_PATH}"
+ done
+}
+
+# Disable and enable the target mid-air, while messages
+# are being transmitted.
+toggle_netcons_target() {
+ for i in $(seq 2)
+ do
+ if [ ! -d "${NETCONS_PATH}" ]
+ then
+ break
+ fi
+ echo 0 > "${NETCONS_PATH}"/enabled 2> /dev/null || true
+ # Try to enable a bit harder, given it might fail to enable
+ # Write to `enabled` might fail depending on the lock, which is
+ # highly contentious here
+ for _ in $(seq 5)
+ do
+ echo 1 > "${NETCONS_PATH}"/enabled 2> /dev/null || true
+ done
+ done
+}
+
+toggle_iface(){
+ ip link set "${SRCIF}" down
+ ip link set "${SRCIF}" up
+}
+
+# Start here
+
+modprobe netdevsim 2> /dev/null || true
+modprobe netconsole 2> /dev/null || true
+
+# Check for basic system dependency and exit if not found
+check_for_dependencies
+# Set current loglevel to KERN_INFO(6), and default to KERN_NOTICE(5)
+echo "6 5" > /proc/sys/kernel/printk
+# Remove the namespace, interfaces and netconsole target on exit
+trap cleanup EXIT
+# Create one namespace and two interfaces
+set_network "${IP_VERSION}"
+# Create a dynamic target for netconsole
+create_dynamic_target "${FORMAT}"
+
+for i in $(seq "$ITERATIONS")
+do
+ for _ in $(seq 10)
+ do
+ echo "${MSG}: ${TARGET} ${i}" > /dev/kmsg
+ wait
+ done
+
+ if (( i % 30 == 0 )); then
+ toggle_netcons_target &
+ fi
+
+ if (( i % 50 == 0 )); then
+ # create some targets, enable them, send msg and disable
+ # all in a parallel thread
+ create_and_delete_random_target &
+ fi
+
+ if (( i % 70 == 0 )); then
+ toggle_iface &
+ fi
+done
+wait
+
+exit "${ksft_pass}"
---
base-commit: 2fd4161d0d2547650d9559d57fc67b4e0a26a9e3
change-id: 20250902-netconsole_torture-8fc23f0aca99
Best regards,
--
Breno Leitao <leitao(a)debian.org>
This is v9 of the TDX selftests.
Thanks everyone for the thorough review on v8 [1]. I tried addressing
all the comments. I'm terribly sorry if I missed something.
The original v8 series [1] was split to make reviewing the test framework
changes easier. This series includes the original patches up to the TDX
lifecycle test which is the first TDX selftest in the series.
This series is based on v6.17-rc2
Changes from v8:
- Rebased on top of v6.17-rc2
- Drop several patches which are no longer needed now that TDX support
is integrated into the common flow.
- Split several patches to make reviewing easier.
- Massive refactor compared to v8 to pull TDX special handling into
__vm_create() and vm_vcpu_add() instead of creating separate functions
for TDX.
- Use kbuild to expose values from c to assembly code.
- Move setup of the reset vectors to c code as suggested by Sean.
- Drop redundant cpuid masking functions which are no longer necessary.
- Initialize TDX protected pages one at a time instead of allocating
large chinks of memory.
- Add UCALL support for TDX to align with the rest of the selftests.
- Minor fixes to kselftest_harness.h and virt_map() that were identified
as part of this work.
[1] https://lore.kernel.org/lkml/20250807201628.1185915-1-sagis@google.com/
Ackerley Tng (2):
KVM: selftests: Add helpers to init TDX memory and finalize VM
KVM: selftests: Add ucall support for TDX
Erdem Aktas (2):
KVM: selftests: Add TDX boot code
KVM: selftests: Add support for TDX TDCALL from guest
Isaku Yamahata (2):
KVM: selftests: Update kvm_init_vm_address_properties() for TDX
KVM: selftests: TDX: Use KVM_TDX_CAPABILITIES to validate TDs'
attribute configuration
Sagi Shahar (13):
KVM: selftests: Include overflow.h instead of redefining
is_signed_type()
KVM: selftests: Allocate pgd in virt_map() as necessary
KVM: selftests: Expose functions to get default sregs values
KVM: selftests: Expose function to allocate guest vCPU stack
KVM: selftests: Expose segment definitons to assembly files
KVM: selftests: Add kbuild definitons
KVM: selftests: Define structs to pass parameters to TDX boot code
KVM: selftests: Set up TDX boot code region
KVM: selftests: Set up TDX boot parameters region
KVM: selftests: Add helper to initialize TDX VM
KVM: selftests: Hook TDX support to vm and vcpu creation
KVM: selftests: Add wrapper for TDX MMIO from guest
KVM: selftests: Add TDX lifecycle test
tools/include/linux/kbuild.h | 18 +
tools/testing/selftests/kselftest_harness.h | 3 +-
tools/testing/selftests/kvm/Makefile.kvm | 32 ++
.../selftests/kvm/include/x86/processor.h | 8 +
.../selftests/kvm/include/x86/processor_asm.h | 12 +
.../selftests/kvm/include/x86/tdx/td_boot.h | 81 ++++
.../kvm/include/x86/tdx/td_boot_asm.h | 16 +
.../selftests/kvm/include/x86/tdx/tdcall.h | 34 ++
.../selftests/kvm/include/x86/tdx/tdx.h | 14 +
.../selftests/kvm/include/x86/tdx/tdx_util.h | 86 ++++
.../testing/selftests/kvm/include/x86/ucall.h | 4 +-
tools/testing/selftests/kvm/lib/kvm_util.c | 25 +-
.../testing/selftests/kvm/lib/x86/processor.c | 122 ++++--
.../selftests/kvm/lib/x86/tdx/td_boot.S | 60 +++
.../kvm/lib/x86/tdx/td_boot_offsets.c | 21 +
.../selftests/kvm/lib/x86/tdx/tdcall.S | 93 +++++
.../kvm/lib/x86/tdx/tdcall_offsets.c | 16 +
tools/testing/selftests/kvm/lib/x86/tdx/tdx.c | 22 +
.../selftests/kvm/lib/x86/tdx/tdx_util.c | 391 ++++++++++++++++++
tools/testing/selftests/kvm/lib/x86/ucall.c | 45 +-
tools/testing/selftests/kvm/x86/tdx_vm_test.c | 31 ++
21 files changed, 1095 insertions(+), 39 deletions(-)
create mode 100644 tools/include/linux/kbuild.h
create mode 100644 tools/testing/selftests/kvm/include/x86/processor_asm.h
create mode 100644 tools/testing/selftests/kvm/include/x86/tdx/td_boot.h
create mode 100644 tools/testing/selftests/kvm/include/x86/tdx/td_boot_asm.h
create mode 100644 tools/testing/selftests/kvm/include/x86/tdx/tdcall.h
create mode 100644 tools/testing/selftests/kvm/include/x86/tdx/tdx.h
create mode 100644 tools/testing/selftests/kvm/include/x86/tdx/tdx_util.h
create mode 100644 tools/testing/selftests/kvm/lib/x86/tdx/td_boot.S
create mode 100644 tools/testing/selftests/kvm/lib/x86/tdx/td_boot_offsets.c
create mode 100644 tools/testing/selftests/kvm/lib/x86/tdx/tdcall.S
create mode 100644 tools/testing/selftests/kvm/lib/x86/tdx/tdcall_offsets.c
create mode 100644 tools/testing/selftests/kvm/lib/x86/tdx/tdx.c
create mode 100644 tools/testing/selftests/kvm/lib/x86/tdx/tdx_util.c
create mode 100644 tools/testing/selftests/kvm/x86/tdx_vm_test.c
--
2.51.0.rc1.193.gad69d77794-goog
There are currently no kernel tests that verify setting and getting
options of the team driver.
In the future, options may be added that implicitly change other
options, which will make it useful to have tests like these that show
nothing breaks. There will be a follow up patch to this that adds new
"rx_enabled" and "tx_enabled" options, which will implicitly affect the
"enabled" option value and vice versa.
The tests use teamnl to first set options to specific values and then
gets them to compare to the set values.
Signed-off-by: Marc Harvey <marcharvey(a)google.com>
---
.../selftests/drivers/net/team/Makefile | 6 +-
.../selftests/drivers/net/team/options.sh | 194 ++++++++++++++++++
2 files changed, 198 insertions(+), 2 deletions(-)
create mode 100755 tools/testing/selftests/drivers/net/team/options.sh
diff --git a/tools/testing/selftests/drivers/net/team/Makefile b/tools/testing/selftests/drivers/net/team/Makefile
index eaf6938f100e..8b00b70ce67f 100644
--- a/tools/testing/selftests/drivers/net/team/Makefile
+++ b/tools/testing/selftests/drivers/net/team/Makefile
@@ -1,11 +1,13 @@
# SPDX-License-Identifier: GPL-2.0
# Makefile for net selftests
-TEST_PROGS := dev_addr_lists.sh propagation.sh
+TEST_PROGS := dev_addr_lists.sh propagation.sh options.sh
TEST_INCLUDES := \
../bonding/lag_lib.sh \
../../../net/forwarding/lib.sh \
- ../../../net/lib.sh
+ ../../../net/lib.sh \
+ ../../../net/in_netns.sh \
+ ../../../net/lib/sh/defer.sh \
include ../../../lib.mk
diff --git a/tools/testing/selftests/drivers/net/team/options.sh b/tools/testing/selftests/drivers/net/team/options.sh
new file mode 100755
index 000000000000..b9c7aa357ad5
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/team/options.sh
@@ -0,0 +1,194 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+# These tests verify basic set and get functionality of the team
+# driver options over netlink.
+
+# Run in private netns.
+test_dir="$(dirname "$0")"
+if [[ $# -eq 0 ]]; then
+ "${test_dir}"/../../../net/in_netns.sh "$0" __subprocess
+ exit $?
+fi
+
+ALL_TESTS="
+ team_test_options
+"
+
+source "${test_dir}/../../../net/lib.sh"
+
+TEAM_PORT="team0"
+MEMBER_PORT="dummy0"
+
+setup()
+{
+ ip link add name "${MEMBER_PORT}" type dummy
+ ip link add name "${TEAM_PORT}" type team
+}
+
+get_and_check_value()
+{
+ local option_name="$1"
+ local expected_value="$2"
+ local port_flag="$3"
+
+ local value_from_get
+
+ value_from_get=$(teamnl "${TEAM_PORT}" getoption "${option_name}" \
+ "${port_flag}")
+ if [[ $? != 0 ]]; then
+ echo "Could not get option '${option_name}'" >&2
+ return 1
+ fi
+
+ if [[ "${value_from_get}" != "${expected_value}" ]]; then
+ echo "Incorrect value for option '${option_name}'" >&2
+ echo "get (${value_from_get}) != set (${expected_value})" >&2
+ return 1
+ fi
+}
+
+set_and_check_get()
+{
+ local option_name="$1"
+ local option_value="$2"
+ local port_flag="$3"
+
+ local value_from_get
+
+ teamnl "${TEAM_PORT}" setoption "${option_name}" "${option_value}" \
+ "${port_flag}"
+ if [[ $? != 0 ]]; then
+ echo "'setoption ${option_name} ${option_value}' failed" >&2
+ return 1
+ fi
+
+ get_and_check_value "${option_name}" "${option_value}" "${port_flag}"
+ return $?
+}
+
+# Get a "port flag" to pass to the `teamnl` command.
+# E.g. $?="dummy0" -> "port=dummy0",
+# $?="" -> ""
+get_port_flag()
+{
+ local port_name="$1"
+
+ if [[ -n "${port_name}" ]]; then
+ echo "--port=${port_name}"
+ fi
+}
+
+attach_port_if_specified()
+{
+ local port_name="${1}"
+
+ if [[ -n "${port_name}" ]]; then
+ ip link set dev "${port_name}" master "${TEAM_PORT}"
+ return $?
+ fi
+}
+
+detach_port_if_specified()
+{
+ local port_name="${1}"
+
+ if [[ -n "${port_name}" ]]; then
+ ip link set dev "${port_name}" nomaster
+ return $?
+ fi
+}
+
+#######################################
+# Test that an option's get value matches its set value.
+# Globals:
+# RET - Used by testing infra like `check_err`.
+# EXIT_STATUS - Used by `log_test` to whole script exit value.
+# Arguments:
+# option_name - The name of the option.
+# value_1 - The first value to try setting.
+# value_2 - The second value to try setting.
+# port_name - The (optional) name of the attached port.
+#######################################
+team_test_option()
+{
+ local option_name="$1"
+ local value_1="$2"
+ local value_2="$3"
+ local possible_values="$2 $3 $2"
+ local port_name="$4"
+ local port_flag
+
+ RET=0
+
+ echo "Setting '${option_name}' to '${value_1}' and '${value_2}'"
+
+ attach_port_if_specified "${port_name}"
+ check_err $? "Couldn't attach ${port_name} to master"
+ port_flag=$(get_port_flag "${port_name}")
+
+ # Set and get both possible values.
+ for value in ${possible_values}; do
+ set_and_check_get "${option_name}" "${value}" "${port_flag}"
+ check_err $? "Failed to set '${option_name}' to '${value}'"
+ done
+
+ detach_port_if_specified "${port_name}"
+ check_err $? "Couldn't detach ${port_name} from its master"
+
+ log_test "Set + Get '${option_name}' test"
+}
+
+#######################################
+# Test that getting a non-existant option fails.
+# Globals:
+# RET - Used by testing infra like `check_err`.
+# EXIT_STATUS - Used by `log_test` to whole script exit value.
+# Arguments:
+# option_name - The name of the option.
+# port_name - The (optional) name of the attached port.
+#######################################
+team_test_get_option_fails()
+{
+ local option_name="$1"
+ local port_name="$2"
+ local port_flag
+
+ RET=0
+
+ attach_port_if_specified "${port_name}"
+ check_err $? "Couldn't attach ${port_name} to master"
+ port_flag=$(get_port_flag "${port_name}")
+
+ # Just confirm that getting the value fails.
+ teamnl "${TEAM_PORT}" getoption "${option_name}" "${port_flag}"
+ check_fail $? "Shouldn't be able to get option '${option_name}'"
+
+ detach_port_if_specified "${port_name}"
+
+ log_test "Get '${option_name}' fails"
+}
+
+team_test_options()
+{
+ # Wrong option name behavior.
+ team_test_get_option_fails fake_option1
+ team_test_get_option_fails fake_option2 "${MEMBER_PORT}"
+
+ # Correct set and get behavior.
+ team_test_option mode activebackup loadbalance
+ team_test_option notify_peers_count 0 5
+ team_test_option notify_peers_interval 0 5
+ team_test_option mcast_rejoin_count 0 5
+ team_test_option mcast_rejoin_interval 0 5
+ team_test_option enabled true false "${MEMBER_PORT}"
+ team_test_option user_linkup true false "${MEMBER_PORT}"
+ team_test_option user_linkup_enabled true false "${MEMBER_PORT}"
+ team_test_option priority 10 20 "${MEMBER_PORT}"
+ team_test_option queue_id 0 1 "${MEMBER_PORT}"
+}
+
+require_command teamnl
+setup
+tests_run
+exit "${EXIT_STATUS}"
--
2.51.0.355.g5224444f11-goog
Add the benchmark testcase "kprobe-multi-all", which will hook all the
kernel functions during the testing.
This series is separated out from [1].
Changes since V2:
* add some comment to attach_ksyms_all, which notes that don't run the
testing on a debug kernel
Changes since V1:
* introduce trace_blacklist instead of copy-pasting strcmp in the 2nd
patch
* use fprintf() instead of printf() in 3rd patch
Link: https://lore.kernel.org/bpf/20250817024607.296117-1-dongml2@chinatelecom.cn/ [1]
Menglong Dong (3):
selftests/bpf: move get_ksyms and get_addrs to trace_helpers.c
selftests/bpf: skip recursive functions for kprobe_multi
selftests/bpf: add benchmark testing for kprobe-multi-all
tools/testing/selftests/bpf/bench.c | 4 +
.../selftests/bpf/benchs/bench_trigger.c | 61 +++++
.../selftests/bpf/benchs/run_bench_trigger.sh | 4 +-
.../bpf/prog_tests/kprobe_multi_test.c | 220 +---------------
.../selftests/bpf/progs/trigger_bench.c | 12 +
tools/testing/selftests/bpf/trace_helpers.c | 234 ++++++++++++++++++
tools/testing/selftests/bpf/trace_helpers.h | 3 +
7 files changed, 319 insertions(+), 219 deletions(-)
--
2.51.0
Currently, even if some subtests fails, the end result will still yield
"ok 1 selftests: bpf: test_xsk.sh". Fix it by exiting with 1 if there are
any failures.
Signed-off-by: Ricardo B. Marlière <rbm(a)suse.com>
---
tools/testing/selftests/bpf/test_xsk.sh | 2 ++
1 file changed, 2 insertions(+)
diff --git a/tools/testing/selftests/bpf/test_xsk.sh b/tools/testing/selftests/bpf/test_xsk.sh
index 65aafe0003db054e9dfd156092fed53b07be06a0..62db060298a4a3b4391ee4cfa50557cf4a62d3d5 100755
--- a/tools/testing/selftests/bpf/test_xsk.sh
+++ b/tools/testing/selftests/bpf/test_xsk.sh
@@ -241,4 +241,6 @@ done
if [ $failures -eq 0 ]; then
echo "All tests successful!"
+else
+ exit 1
fi
---
base-commit: 5b6d6fe1ca7b712c74f78426bb23c465fd34b322
change-id: 20250828-selftests-bpf-test_xsk_ret-1eb27dbac071
Best regards,
--
Ricardo B. Marlière <rbm(a)suse.com>
This series contains 4 independent new features:
- Patch 1: use HMAC-SHA256 library instead of open-coded HMAC.
- Patch 2: selftests: check for unexpected fallback counter increments.
- Patches 3-4: record subflows in RPS table, for aRFS support.
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
Changes in v2:
- Drop previous patches 2 ("mptcp: make ADD_ADDR retransmission timeout
adaptive") + 3 ("selftests: mptcp: remove add_addr_timeout settings"):
They were introducing instabilities in the selftests.
- Rebased. Other patches have not been modified.
- Link to v1: https://lore.kernel.org/r/20250901-net-next-mptcp-misc-feat-6-18-v1-0-80ae8…
---
Christoph Paasch (2):
net: Add rfs_needed() helper
mptcp: record subflows in RPS table
Eric Biggers (1):
mptcp: use HMAC-SHA256 library instead of open-coded HMAC
Gang Yan (1):
selftests: mptcp: add checks for fallback counters
include/net/rps.h | 85 ++++++++++------
net/mptcp/crypto.c | 35 +------
net/mptcp/protocol.c | 21 ++++
tools/testing/selftests/net/mptcp/mptcp_join.sh | 123 ++++++++++++++++++++++++
4 files changed, 202 insertions(+), 62 deletions(-)
---
base-commit: cd8a4cfa6bb43a441901e82f5c222dddc75a18a3
change-id: 20250829-net-next-mptcp-misc-feat-6-18-722fa87a60f1
Best regards,
--
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
One fix for occasional failures I found while testing and a bunch of
cleanups that should make that test easier to digest.
Tested on x86-64, the test seems to reliably pass.
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Cc: Zi Yan <ziy(a)nvidia.com>
Cc: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Cc: "Liam R. Howlett" <Liam.Howlett(a)oracle.com>
Cc: Nico Pache <npache(a)redhat.com>
Cc: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: Dev Jain <dev.jain(a)arm.com>
Cc: Barry Song <baohua(a)kernel.org>
Cc: Wei Yang <richard.weiyang(a)gmail.com>
--
Mostly a resend, because I accidentally disabled "ccover = true" in my
git config so people were only CCed on the cover letter.
v1 -> v2:
* "selftests/mm: split_huge_page_test: fix occasional is_backed_by_folio()
wrong results"
-> Fixup missing ")" in patch description
David Hildenbrand (2):
selftests/mm: split_huge_page_test: fix occasional
is_backed_by_folio() wrong results
selftests/mm: split_huge_page_test: cleanups for split_pte_mapped_thp
test
.../selftests/mm/split_huge_page_test.c | 138 ++++++++++--------
1 file changed, 81 insertions(+), 57 deletions(-)
base-commit: ef42a39c44ef6da64ae3495d27e28dd6fca62a51
--
2.50.1