IDT event delivery has a debug hole in which it does not generate #DB
upon returning to userspace before the first userspace instruction is
executed if the Trap Flag (TF) is set.
FRED closes this hole by introducing a software event flag, i.e., bit
17 of the augmented SS: if the bit is set and ERETU would result in
RFLAGS.TF = 1, a single-step trap will be pending upon completion of
ERETU.
However I overlooked properly setting and clearing the bit in different
situations. Thus when FRED is enabled, if the Trap Flag (TF) is set
without an external debugger attached, it can lead to an infinite loop
in the SIGTRAP handler. To avoid this, the software event flag in the
augmented SS must be cleared, ensuring that no single-step trap remains
pending when ERETU completes.
This patch set combines the fix [1] and its corresponding selftest [2]
(requested by Dave Hansen) into one patch set.
[1] https://lore.kernel.org/lkml/20250523050153.3308237-1-xin@zytor.com/
[2] https://lore.kernel.org/lkml/20250530230707.2528916-1-xin@zytor.com/
This patch set is based on tip/x86/urgent branch as of today.
Xin Li (Intel) (2):
x86/fred/signal: Prevent single-step upon ERETU completion
selftests/x86: Add a test to detect infinite sigtrap handler loop
arch/x86/include/asm/sighandling.h | 22 +++++
arch/x86/kernel/signal_32.c | 4 +
arch/x86/kernel/signal_64.c | 4 +
tools/testing/selftests/x86/Makefile | 2 +-
tools/testing/selftests/x86/sigtrap_loop.c | 97 ++++++++++++++++++++++
5 files changed, 128 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/x86/sigtrap_loop.c
base-commit: dd2922dcfaa3296846265e113309e5f7f138839f
--
2.49.0
This series addresses a regression in ethtool flow steering where rules
targeting the default RSS context (context 0) were incorrectly rejected.
The default RSS context always exists but is not stored in the rss_ctx
xarray like additional contexts. The current validation logic was
checking for the existence of context 0 in this array, causing valid
flow steering rules to be rejected.
This prevented configurations such as:
- High priority rules directing specific traffic to the default context
- Low priority catch-all rules directing remaining traffic to additional
contexts
Patch 1 fixes the validation logic to skip the existence check for
context 0.
Patch 2 adds a selftest that verifies this behavior.
Changelog -
v2->v3: https://lore.kernel.org/all/20250609120250.1630125-1-gal@nvidia.com/
* Reworded commit message.
* Fix pylint warning.
v1->v2: https://lore.kernel.org/all/20250225071348.509432-1-gal@nvidia.com/
* Reworded commit message.
* Added a selftest.
Gal Pressman (2):
net: ethtool: Don't check if RSS context exists in case of context 0
selftests: drv-net: rss_ctx: Add test for ntuple rules targeting
default RSS context
net/ethtool/ioctl.c | 3 +-
.../selftests/drivers/net/hw/rss_ctx.py | 59 ++++++++++++++++++-
2 files changed, 60 insertions(+), 2 deletions(-)
--
2.40.1
The SBI Firmware Feature extension allows the S-mode to request some
specific features (either hardware or software) to be enabled. This
series uses this extension to request misaligned access exception
delegation to S-mode in order to let the kernel handle it. It also adds
support for the KVM FWFT SBI extension based on the misaligned access
handling infrastructure.
FWFT SBI extension is part of the SBI V3.0 specifications [1]. It can be
tested using the qemu provided at [2] which contains the series from
[3]. Upstream kvm-unit-tests can be used inside kvm to tests the correct
delegation of misaligned exceptions. Upstream OpenSBI can be used.
Note: Since SBI V3.0 is not yet ratified, FWFT extension API is split
between interface only and implementation, allowing to pick only the
interface which do not have hard dependencies on SBI.
The tests can be run using the kselftest from series [4].
$ qemu-system-riscv64 \
-cpu rv64,trap-misaligned-access=true,v=true \
-M virt \
-m 1024M \
-bios fw_dynamic.bin \
-kernel Image
...
# ./misaligned
TAP version 13
1..23
# Starting 23 tests from 1 test cases.
# RUN global.gp_load_lh ...
# OK global.gp_load_lh
ok 1 global.gp_load_lh
# RUN global.gp_load_lhu ...
# OK global.gp_load_lhu
ok 2 global.gp_load_lhu
# RUN global.gp_load_lw ...
# OK global.gp_load_lw
ok 3 global.gp_load_lw
# RUN global.gp_load_lwu ...
# OK global.gp_load_lwu
ok 4 global.gp_load_lwu
# RUN global.gp_load_ld ...
# OK global.gp_load_ld
ok 5 global.gp_load_ld
# RUN global.gp_load_c_lw ...
# OK global.gp_load_c_lw
ok 6 global.gp_load_c_lw
# RUN global.gp_load_c_ld ...
# OK global.gp_load_c_ld
ok 7 global.gp_load_c_ld
# RUN global.gp_load_c_ldsp ...
# OK global.gp_load_c_ldsp
ok 8 global.gp_load_c_ldsp
# RUN global.gp_load_sh ...
# OK global.gp_load_sh
ok 9 global.gp_load_sh
# RUN global.gp_load_sw ...
# OK global.gp_load_sw
ok 10 global.gp_load_sw
# RUN global.gp_load_sd ...
# OK global.gp_load_sd
ok 11 global.gp_load_sd
# RUN global.gp_load_c_sw ...
# OK global.gp_load_c_sw
ok 12 global.gp_load_c_sw
# RUN global.gp_load_c_sd ...
# OK global.gp_load_c_sd
ok 13 global.gp_load_c_sd
# RUN global.gp_load_c_sdsp ...
# OK global.gp_load_c_sdsp
ok 14 global.gp_load_c_sdsp
# RUN global.fpu_load_flw ...
# OK global.fpu_load_flw
ok 15 global.fpu_load_flw
# RUN global.fpu_load_fld ...
# OK global.fpu_load_fld
ok 16 global.fpu_load_fld
# RUN global.fpu_load_c_fld ...
# OK global.fpu_load_c_fld
ok 17 global.fpu_load_c_fld
# RUN global.fpu_load_c_fldsp ...
# OK global.fpu_load_c_fldsp
ok 18 global.fpu_load_c_fldsp
# RUN global.fpu_store_fsw ...
# OK global.fpu_store_fsw
ok 19 global.fpu_store_fsw
# RUN global.fpu_store_fsd ...
# OK global.fpu_store_fsd
ok 20 global.fpu_store_fsd
# RUN global.fpu_store_c_fsd ...
# OK global.fpu_store_c_fsd
ok 21 global.fpu_store_c_fsd
# RUN global.fpu_store_c_fsdsp ...
# OK global.fpu_store_c_fsdsp
ok 22 global.fpu_store_c_fsdsp
# RUN global.gen_sigbus ...
[12797.988647] misaligned[618]: unhandled signal 7 code 0x1 at 0x0000000000014dc0 in misaligned[4dc0,10000+76000]
[12797.988990] CPU: 0 UID: 0 PID: 618 Comm: misaligned Not tainted 6.13.0-rc6-00008-g4ec4468967c9-dirty #51
[12797.989169] Hardware name: riscv-virtio,qemu (DT)
[12797.989264] epc : 0000000000014dc0 ra : 0000000000014d00 sp : 00007fffe165d100
[12797.989407] gp : 000000000008f6e8 tp : 0000000000095760 t0 : 0000000000000008
[12797.989544] t1 : 00000000000965d8 t2 : 000000000008e830 s0 : 00007fffe165d160
[12797.989692] s1 : 000000000000001a a0 : 0000000000000000 a1 : 0000000000000002
[12797.989831] a2 : 0000000000000000 a3 : 0000000000000000 a4 : ffffffffdeadbeef
[12797.989964] a5 : 000000000008ef61 a6 : 626769735f6e0000 a7 : fffffffffffff000
[12797.990094] s2 : 0000000000000001 s3 : 00007fffe165d838 s4 : 00007fffe165d848
[12797.990238] s5 : 000000000000001a s6 : 0000000000010442 s7 : 0000000000010200
[12797.990391] s8 : 000000000000003a s9 : 0000000000094508 s10: 0000000000000000
[12797.990526] s11: 0000555567460668 t3 : 00007fffe165d070 t4 : 00000000000965d0
[12797.990656] t5 : fefefefefefefeff t6 : 0000000000000073
[12797.990756] status: 0000000200004020 badaddr: 000000000008ef61 cause: 0000000000000006
[12797.990911] Code: 8793 8791 3423 fcf4 3783 fc84 c737 dead 0713 eef7 (c398) 0001
# OK global.gen_sigbus
ok 23 global.gen_sigbus
# PASSED: 23 / 23 tests passed.
# Totals: pass:23 fail:0 xfail:0 xpass:0 skip:0 error:0
With kvm-tools:
# lkvm run -k sbi.flat -m 128
Info: # lkvm run -k sbi.flat -m 128 -c 1 --name guest-97
Info: Removed ghost socket file "/root/.lkvm//guest-97.sock".
##########################################################################
# kvm-unit-tests
##########################################################################
... [test messages elided]
PASS: sbi: fwft: FWFT extension probing no error
PASS: sbi: fwft: get/set reserved feature 0x6 error == SBI_ERR_DENIED
PASS: sbi: fwft: get/set reserved feature 0x3fffffff error == SBI_ERR_DENIED
PASS: sbi: fwft: get/set reserved feature 0x80000000 error == SBI_ERR_DENIED
PASS: sbi: fwft: get/set reserved feature 0xbfffffff error == SBI_ERR_DENIED
PASS: sbi: fwft: misaligned_deleg: Get misaligned deleg feature no error
PASS: sbi: fwft: misaligned_deleg: Set misaligned deleg feature invalid value error
PASS: sbi: fwft: misaligned_deleg: Set misaligned deleg feature invalid value error
PASS: sbi: fwft: misaligned_deleg: Set misaligned deleg feature value no error
PASS: sbi: fwft: misaligned_deleg: Set misaligned deleg feature value 0
PASS: sbi: fwft: misaligned_deleg: Set misaligned deleg feature value no error
PASS: sbi: fwft: misaligned_deleg: Set misaligned deleg feature value 1
PASS: sbi: fwft: misaligned_deleg: Verify misaligned load exception trap in supervisor
SUMMARY: 50 tests, 2 unexpected failures, 12 skipped
This series is available at [5].
Link: https://github.com/riscv-non-isa/riscv-sbi-doc/releases/download/vv3.0-rc2/… [1]
Link: https://github.com/rivosinc/qemu/tree/dev/cleger/misaligned [2]
Link: https://lore.kernel.org/all/20241211211933.198792-3-fkonrad@amd.com/T/ [3]
Link: https://lore.kernel.org/linux-riscv/20250414123543.1615478-1-cleger@rivosin… [4]
Link: https://github.com/rivosinc/linux/tree/dev/cleger/fwft [5]
---
V8:
- Move misaligned_access_speed under CONFIG_RISCV_MISALIGNED and add a
separate commit for that.
V7:
- Fix ifdefery build problems
- Move sbi_fwft_is_supported with fwft_set_req struct
- Added Atish Reviewed-by
- Updated KVM vcpu cfg hedeleg value in set_delegation
- Changed SBI ETIME error mapping to ETIMEDOUT
- Fixed a few typo reported by Alok
V6:
- Rename FWFT interface to remove "_local"
- Fix test for MEDELEG values in KVM FWFT support
- Add __init for unaligned_access_init()
- Rebased on master
V5:
- Return ERANGE as mapping for SBI_ERR_BAD_RANGE
- Removed unused sbi_fwft_get()
- Fix kernel for sbi_fwft_local_set_cpumask()
- Fix indentation for sbi_fwft_local_set()
- Remove spurious space in kvm_sbi_fwft_ops.
- Rebased on origin/master
- Remove fixes commits and sent them as a separate series [4]
V4:
- Check SBI version 3.0 instead of 2.0 for FWFT presence
- Use long for kvm_sbi_fwft operation return value
- Init KVM sbi extension even if default_disabled
- Remove revert_on_fail parameter for sbi_fwft_feature_set().
- Fix comments for sbi_fwft_set/get()
- Only handle local features (there are no globals yet in the spec)
- Add new SBI errors to sbi_err_map_linux_errno()
V3:
- Added comment about kvm sbi fwft supported/set/get callback
requirements
- Move struct kvm_sbi_fwft_feature in kvm_sbi_fwft.c
- Add a FWFT interface
V2:
- Added Kselftest for misaligned testing
- Added get_user() usage instead of __get_user()
- Reenable interrupt when possible in misaligned access handling
- Document that riscv supports unaligned-traps
- Fix KVM extension state when an init function is present
- Rework SBI misaligned accesses trap delegation code
- Added support for CPU hotplugging
- Added KVM SBI reset callback
- Added reset for KVM SBI FWFT lock
- Return SBI_ERR_DENIED_LOCKED when LOCK flag is set
Clément Léger (14):
riscv: sbi: add Firmware Feature (FWFT) SBI extensions definitions
riscv: sbi: remove useless parenthesis
riscv: sbi: add new SBI error mappings
riscv: sbi: add FWFT extension interface
riscv: sbi: add SBI FWFT extension calls
riscv: misaligned: request misaligned exception from SBI
riscv: misaligned: use on_each_cpu() for scalar misaligned access
probing
riscv: misaligned: declare misaligned_access_speed under
CONFIG_RISCV_MISALIGNED
riscv: misaligned: move emulated access uniformity check in a function
riscv: misaligned: add a function to check misalign trap delegability
RISC-V: KVM: add SBI extension init()/deinit() functions
RISC-V: KVM: add SBI extension reset callback
RISC-V: KVM: add support for FWFT SBI extension
RISC-V: KVM: add support for SBI_FWFT_MISALIGNED_DELEG
arch/riscv/include/asm/cpufeature.h | 14 +-
arch/riscv/include/asm/kvm_host.h | 5 +-
arch/riscv/include/asm/kvm_vcpu_sbi.h | 12 +
arch/riscv/include/asm/kvm_vcpu_sbi_fwft.h | 29 +++
arch/riscv/include/asm/sbi.h | 60 +++++
arch/riscv/include/uapi/asm/kvm.h | 1 +
arch/riscv/kernel/sbi.c | 81 ++++++-
arch/riscv/kernel/traps_misaligned.c | 112 ++++++++-
arch/riscv/kernel/unaligned_access_speed.c | 8 +-
arch/riscv/kvm/Makefile | 1 +
arch/riscv/kvm/vcpu.c | 4 +-
arch/riscv/kvm/vcpu_sbi.c | 54 +++++
arch/riscv/kvm/vcpu_sbi_fwft.c | 257 +++++++++++++++++++++
arch/riscv/kvm/vcpu_sbi_sta.c | 3 +-
14 files changed, 620 insertions(+), 21 deletions(-)
create mode 100644 arch/riscv/include/asm/kvm_vcpu_sbi_fwft.h
create mode 100644 arch/riscv/kvm/vcpu_sbi_fwft.c
--
2.49.0
This patch series introduces a new feature to netconsole which allows
appending a message ID to the userdata dictionary.
If the msgid feature is enabled, the message ID is built from a per-target 32
bit counter that is incremented and appended to every message sent to the target.
Example::
echo 1 > "/sys/kernel/config/netconsole/cmdline0/userdata/msgid_enabled"
echo "This is message #1" > /dev/kmsg
echo "This is message #2" > /dev/kmsg
13,434,54928466,-;This is message #1
msgid=1
13,435,54934019,-;This is message #2
msgid=2
This feature can be used by the target to detect if messages were dropped or
reordered before reaching the target. This allows system administrators to
assess the reliability of their netconsole pipeline and detect loss of messages
due to network contention or temporary unavailability.
Suggested-by: Breno Leitao <leitao(a)debian.org>
Signed-off-by: Gustavo Luiz Duarte <gustavold(a)gmail.com>
Note to maintainer:
This will conflict with a fix I sent recently to net:
c85bf1975108 netconsole: fix appending sysdata when sysdata_fields ==
SYSDATA_RELEASE
Please let me know if I should rebase at some point and send a v2.
---
Gustavo Luiz Duarte (5):
netconsole: introduce 'msgid' as a new sysdata field
netconsole: implement configfs for msgid_enabled
netconsole: append msgid to sysdata
selftests: netconsole: Add tests for 'msgid' feature in sysdata
docs: netconsole: document msgid feature
Documentation/networking/netconsole.rst | 22 +++++++
drivers/net/netconsole.c | 67 +++++++++++++++++++++-
.../selftests/drivers/net/netcons_sysdata.sh | 30 ++++++++++
3 files changed, 118 insertions(+), 1 deletion(-)
---
base-commit: 0097c4195b1d0ca57d15979626c769c74747b5a0
change-id: 20250609-netconsole-msgid-b93c6f8e9c60
Best regards,
--
Gustavo Luiz Duarte <gustavold(a)gmail.com>
Tests may wish to add other interfaces to listen on. Notably locally
generated traffic uses dummy interfaces. The multicast daemon needs to know
about these so that it allows forming rules that involve these interfaces,
and so that net.ipv4.conf.X.mc_forwarding is set for the interfaces.
To that end, allow passing in a list of interfaces to configure in addition
to all the physical ones.
Signed-off-by: Petr Machata <petrm(a)nvidia.com>
---
Notes:
CC: Shuah Khan <shuah(a)kernel.org>
CC: linux-kselftest(a)vger.kernel.org
tools/testing/selftests/net/forwarding/lib.sh | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/tools/testing/selftests/net/forwarding/lib.sh b/tools/testing/selftests/net/forwarding/lib.sh
index 88e63562f5c5..5f144d75167a 100644
--- a/tools/testing/selftests/net/forwarding/lib.sh
+++ b/tools/testing/selftests/net/forwarding/lib.sh
@@ -1760,6 +1760,8 @@ mc_send()
adf_mcd_start()
{
+ local ifs=("$@")
+ local if
local i
check_command $MCD || return 1
@@ -1775,6 +1777,16 @@ adf_mcd_start()
$smcroutedir/$table_name.conf
done
+ for if in ${ifs[@]}; do
+ if ! ip_link_has_flag "$if" MULTICAST; then
+ ip link set dev "$if" multicast on
+ defer ip link set dev "$if" multicast off
+ fi
+
+ echo "phyint $if enable" >> \
+ $smcroutedir/$table_name.conf
+ done
+
$MCD -N -I $table_name -f $smcroutedir/$table_name.conf \
-P $smcroutedir/$table_name.pid
busywait "$BUSYWAIT_TIMEOUT" test -e $smcroutedir/$table_name.pid
--
2.49.0
If CONFIG_UPROBES is not set, a merge subtest fails:
Failure log:
7151 12:46:54.627936 # # # RUN merge.handle_uprobe_upon_merged_vma ...
7152 12:46:54.639014 # # f /sys/bus/event_source/devices/uprobe/type
7153 12:46:54.639306 # # fopen: No such file or directory
7154 12:46:54.650451 # # # merge.c:473:handle_uprobe_upon_merged_vma:Expected read_sysfs("/sys/bus/event_source/devices/uprobe/type", &type) (1) == 0 (0)
7155 12:46:54.650730 # # # handle_uprobe_upon_merged_vma: Test terminated by assertion
7156 12:46:54.661750 # # # FAIL merge.handle_uprobe_upon_merged_vma
7157 12:46:54.662030 # # not ok 8 merge.handle_uprobe_upon_merged_vma
CONFIG_UPROBES is enabled by CONFIG_UPROBE_EVENTS, which gets enabled by
CONFIG_FTRACE. Therefore add this config to selftests/mm/config so that
CI systems can include this config in the kernel build.
Fixes: efe99fabeb11b ("selftests/mm: add test about uprobe pte be orphan during vma merge")
Reported-by: Aishwarya <aishwarya.tcv(a)arm.com>
Closes: https://lore.kernel.org/all/20250610103729.72440-1-aishwarya.tcv@arm.com/
Signed-off-by: Dev Jain <dev.jain(a)arm.com>
---
tools/testing/selftests/mm/config | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/mm/config b/tools/testing/selftests/mm/config
index a28baa536332..e600b41030c1 100644
--- a/tools/testing/selftests/mm/config
+++ b/tools/testing/selftests/mm/config
@@ -8,3 +8,4 @@ CONFIG_GUP_TEST=y
CONFIG_TRANSPARENT_HUGEPAGE=y
CONFIG_MEM_SOFT_DIRTY=y
CONFIG_ANON_VMA_NAME=y
+CONFIG_FTRACE=y
--
2.30.2
This commit adds a new kernel selftest to verify RTNLGRP_IPV4_MCADDR
and RTNLGRP_IPV6_MCADDR notifications. The test works by adding and
removing a dummy interface and then confirming that the system
correctly receives join and removal notifications for the 224.0.0.1
and ff02::1 multicast addresses.
The test relies on the iproute2 version to be 6.13+.
Tested by the following command:
$ vng -v --user root --cpus 16 -- \
make -C tools/testing/selftests TARGETS=net TEST_PROGS=rtnetlink.sh \
TEST_GEN_PROGS="" run_tests
Cc: Maciej Żenczykowski <maze(a)google.com>
Cc: Lorenzo Colitti <lorenzo(a)google.com>
Signed-off-by: Yuyang Huang <yuyanghuang(a)google.com>
---
Changelog since v1:
- Skip the test if the iproute2 is too old.
tools/testing/selftests/net/rtnetlink.sh | 39 ++++++++++++++++++++++++
1 file changed, 39 insertions(+)
diff --git a/tools/testing/selftests/net/rtnetlink.sh b/tools/testing/selftests/net/rtnetlink.sh
index 2e8243a65b50..74d4afb55d7c 100755
--- a/tools/testing/selftests/net/rtnetlink.sh
+++ b/tools/testing/selftests/net/rtnetlink.sh
@@ -21,6 +21,7 @@ ALL_TESTS="
kci_test_vrf
kci_test_encap
kci_test_macsec
+ kci_test_mcast_addr_notification
kci_test_ipsec
kci_test_ipsec_offload
kci_test_fdb_get
@@ -1334,6 +1335,44 @@ kci_test_mngtmpaddr()
return $ret
}
+kci_test_mcast_addr_notification()
+{
+ local tmpfile
+ local monitor_pid
+ local match_result
+
+ tmpfile=$(mktemp)
+
+ ip monitor maddr > $tmpfile &
+ monitor_pid=$!
+ sleep 1
+ if [ ! -e "/proc/$monitor_pid" ]; then
+ end_test "SKIP: mcast addr notification: iproute2 too old"
+ rm $tmpfile
+ return $ksft_skip
+ fi
+
+ run_cmd ip link add name test-dummy1 type dummy
+ run_cmd ip link set test-dummy1 up
+ run_cmd ip link del dev test-dummy1
+ sleep 1
+
+ match_result=$(grep -cE "test-dummy1.*(224.0.0.1|ff02::1)" $tmpfile)
+
+ kill $monitor_pid
+ rm $tmpfile
+ # There should be 4 line matches as follows.
+ # 13: test-dummy1 inet6 mcast ff02::1 scope global
+ # 13: test-dummy1 inet mcast 224.0.0.1 scope global
+ # Deleted 13: test-dummy1 inet mcast 224.0.0.1 scope global
+ # Deleted 13: test-dummy1 inet6 mcast ff02::1 scope global
+ if [ $match_result -ne 4 ];then
+ end_test "FAIL: mcast addr notification"
+ return 1
+ fi
+ end_test "PASS: mcast addr notification"
+}
+
kci_test_rtnl()
{
local current_test
--
2.49.0.1204.g71687c7c1d-goog
A not-so-careful NAT46 BPF program can crash the kernel
if it indiscriminately flips ingress packets from v4 to v6:
BUG: kernel NULL pointer dereference, address: 0000000000000000
ip6_rcv_core (net/ipv6/ip6_input.c:190:20)
ipv6_rcv (net/ipv6/ip6_input.c:306:8)
process_backlog (net/core/dev.c:6186:4)
napi_poll (net/core/dev.c:6906:9)
net_rx_action (net/core/dev.c:7028:13)
do_softirq (kernel/softirq.c:462:3)
netif_rx (net/core/dev.c:5326:3)
dev_loopback_xmit (net/core/dev.c:4015:2)
ip_mc_finish_output (net/ipv4/ip_output.c:363:8)
NF_HOOK (./include/linux/netfilter.h:314:9)
ip_mc_output (net/ipv4/ip_output.c:400:5)
dst_output (./include/net/dst.h:459:9)
ip_local_out (net/ipv4/ip_output.c:130:9)
ip_send_skb (net/ipv4/ip_output.c:1496:8)
udp_send_skb (net/ipv4/udp.c:1040:8)
udp_sendmsg (net/ipv4/udp.c:1328:10)
The output interface has a 4->6 program attached at ingress.
We try to loop the multicast skb back to the sending socket.
Ingress BPF runs as part of netif_rx(), pushes a valid v6 hdr
and changes skb->protocol to v6. We enter ip6_rcv_core which
tries to use skb_dst(). But the dst is still an IPv4 one left
after IPv4 mcast output.
Clear the dst in all BPF helpers which change the protocol.
Try to preserve metadata dsts, those may carry non-routing
metadata.
Cc: stable(a)vger.kernel.org
Reviewed-by: Maciej Żenczykowski <maze(a)google.com>
Acked-by: Daniel Borkmann <daniel(a)iogearbox.net>
Fixes: d219df60a70e ("bpf: Add ipip6 and ip6ip decap support for bpf_skb_adjust_room()")
Fixes: 1b00e0dfe7d0 ("bpf: update skb->protocol in bpf_skb_net_grow")
Fixes: 6578171a7ff0 ("bpf: add bpf_skb_change_proto helper")
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
---
v3:
- go back to v1, the encap / decap which don't change proto
will be added in -next
- split out the test
v2: https://lore.kernel.org/20250607204734.1588964-1-kuba@kernel.org
- drop on encap/decap
- fix typo (protcol)
- add the test to the Makefile
v1: https://lore.kernel.org/20250604210604.257036-1-kuba@kernel.org
I wonder if we should not skip ingress (tc_skip_classify?)
for looped back packets in the first place. But that doesn't
seem robust enough vs multiple redirections to solve the crash.
Ignoring LOOPBACK packets (like the NAT46 prog should) doesn't
work either, since BPF can change pkt_type arbitrarily.
CC: martin.lau(a)linux.dev
CC: daniel(a)iogearbox.net
CC: john.fastabend(a)gmail.com
CC: eddyz87(a)gmail.com
CC: sdf(a)fomichev.me
CC: haoluo(a)google.com
CC: willemb(a)google.com
CC: william.xuanziyang(a)huawei.com
CC: alan.maguire(a)oracle.com
CC: bpf(a)vger.kernel.org
CC: edumazet(a)google.com
CC: maze(a)google.com
CC: shuah(a)kernel.org
CC: linux-kselftest(a)vger.kernel.org
CC: yonghong.song(a)linux.dev
---
net/core/filter.c | 19 +++++++++++++------
1 file changed, 13 insertions(+), 6 deletions(-)
diff --git a/net/core/filter.c b/net/core/filter.c
index 327ca73f9cd7..7a72f766aacf 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3233,6 +3233,13 @@ static const struct bpf_func_proto bpf_skb_vlan_pop_proto = {
.arg1_type = ARG_PTR_TO_CTX,
};
+static void bpf_skb_change_protocol(struct sk_buff *skb, u16 proto)
+{
+ skb->protocol = htons(proto);
+ if (skb_valid_dst(skb))
+ skb_dst_drop(skb);
+}
+
static int bpf_skb_generic_push(struct sk_buff *skb, u32 off, u32 len)
{
/* Caller already did skb_cow() with len as headroom,
@@ -3329,7 +3336,7 @@ static int bpf_skb_proto_4_to_6(struct sk_buff *skb)
}
}
- skb->protocol = htons(ETH_P_IPV6);
+ bpf_skb_change_protocol(skb, ETH_P_IPV6);
skb_clear_hash(skb);
return 0;
@@ -3359,7 +3366,7 @@ static int bpf_skb_proto_6_to_4(struct sk_buff *skb)
}
}
- skb->protocol = htons(ETH_P_IP);
+ bpf_skb_change_protocol(skb, ETH_P_IP);
skb_clear_hash(skb);
return 0;
@@ -3550,10 +3557,10 @@ static int bpf_skb_net_grow(struct sk_buff *skb, u32 off, u32 len_diff,
/* Match skb->protocol to new outer l3 protocol */
if (skb->protocol == htons(ETH_P_IP) &&
flags & BPF_F_ADJ_ROOM_ENCAP_L3_IPV6)
- skb->protocol = htons(ETH_P_IPV6);
+ bpf_skb_change_protocol(skb, ETH_P_IPV6);
else if (skb->protocol == htons(ETH_P_IPV6) &&
flags & BPF_F_ADJ_ROOM_ENCAP_L3_IPV4)
- skb->protocol = htons(ETH_P_IP);
+ bpf_skb_change_protocol(skb, ETH_P_IP);
}
if (skb_is_gso(skb)) {
@@ -3606,10 +3613,10 @@ static int bpf_skb_net_shrink(struct sk_buff *skb, u32 off, u32 len_diff,
/* Match skb->protocol to new outer l3 protocol */
if (skb->protocol == htons(ETH_P_IP) &&
flags & BPF_F_ADJ_ROOM_DECAP_L3_IPV6)
- skb->protocol = htons(ETH_P_IPV6);
+ bpf_skb_change_protocol(skb, ETH_P_IPV6);
else if (skb->protocol == htons(ETH_P_IPV6) &&
flags & BPF_F_ADJ_ROOM_DECAP_L3_IPV4)
- skb->protocol = htons(ETH_P_IP);
+ bpf_skb_change_protocol(skb, ETH_P_IP);
if (skb_is_gso(skb)) {
struct skb_shared_info *shinfo = skb_shinfo(skb);
--
2.49.0
This commit adds a new kernel selftest to verify RTNLGRP_IPV4_MCADDR
and RTNLGRP_IPV6_MCADDR notifications. The test works by adding and
removing a dummy interface and then confirming that the system
correctly receives join and removal notifications for the 224.0.0.1
and ff02::1 multicast addresses.
The test relies on the iproute2 version to be 6.13+.
Tested by the following command:
$ vng -v --user root --cpus 16 -- \
make -C tools/testing/selftests TARGETS=net TEST_PROGS=rtnetlink.sh \
TEST_GEN_PROGS="" run_tests
Cc: Maciej Żenczykowski <maze(a)google.com>
Cc: Lorenzo Colitti <lorenzo(a)google.com>
Signed-off-by: Yuyang Huang <yuyanghuang(a)google.com>
---
Changelog since v1:
- Skip the test if the iproute2 is too old.
tools/testing/selftests/net/rtnetlink.sh | 39 ++++++++++++++++++++++++
1 file changed, 39 insertions(+)
diff --git a/tools/testing/selftests/net/rtnetlink.sh b/tools/testing/selftests/net/rtnetlink.sh
index 2e8243a65b50..74d4afb55d7c 100755
--- a/tools/testing/selftests/net/rtnetlink.sh
+++ b/tools/testing/selftests/net/rtnetlink.sh
@@ -21,6 +21,7 @@ ALL_TESTS="
kci_test_vrf
kci_test_encap
kci_test_macsec
+ kci_test_mcast_addr_notification
kci_test_ipsec
kci_test_ipsec_offload
kci_test_fdb_get
@@ -1334,6 +1335,44 @@ kci_test_mngtmpaddr()
return $ret
}
+kci_test_mcast_addr_notification()
+{
+ local tmpfile
+ local monitor_pid
+ local match_result
+
+ tmpfile=$(mktemp)
+
+ ip monitor maddr > $tmpfile &
+ monitor_pid=$!
+ sleep 1
+ if [ ! -e "/proc/$monitor_pid" ]; then
+ end_test "SKIP: mcast addr notification: iproute2 too old"
+ rm $tmpfile
+ return $ksft_skip
+ fi
+
+ run_cmd ip link add name test-dummy1 type dummy
+ run_cmd ip link set test-dummy1 up
+ run_cmd ip link del dev test-dummy1
+ sleep 1
+
+ match_result=$(grep -cE "test-dummy1.*(224.0.0.1|ff02::1)" $tmpfile)
+
+ kill $monitor_pid
+ rm $tmpfile
+ # There should be 4 line matches as follows.
+ # 13: test-dummy1 inet6 mcast ff02::1 scope global
+ # 13: test-dummy1 inet mcast 224.0.0.1 scope global
+ # Deleted 13: test-dummy1 inet mcast 224.0.0.1 scope global
+ # Deleted 13: test-dummy1 inet6 mcast ff02::1 scope global
+ if [ $match_result -ne 4 ];then
+ end_test "FAIL: mcast addr notification"
+ return 1
+ fi
+ end_test "PASS: mcast addr notification"
+}
+
kci_test_rtnl()
{
local current_test
--
2.49.0.1204.g71687c7c1d-goog
Hello,
This is RFC v2 for the TDX intra-host migration patch series. It
addresses comments in RFC v1 [1] and is rebased onto the latest kvm/next
(v6.16-rc1).
This patchset was built on top of the latest TDX selftests [2] and gmem
linking [3] RFC patch series.
Here is the series stitched together for your convenience:
https://github.com/googleprodkernel/linux-cc/tree/tdx-copyless-rfc-v2
Changes from RFC v1:
+ Added patch to prevent deadlock warnings by re-ordering locking order.
+ Added patch to allow vCPUs to be created for uninitialized VMs.
+ Minor optimizations to TDX intra-host migration core logic.
+ Moved lapic state transfer into TDX intra-host migration core logic.
+ Added logic to handle posted interrupts that are injected during
migration.
+ Added selftests.
+ Addressed comments from RFC v1.
+ Various small changes to make patchset compatible with latest version
of kvm/next.
[1] https://lore.kernel.org/lkml/20230407201921.2703758-2-sagis@google.com
[2] https://lore.kernel.org/lkml/20250414214801.2693294-2-sagis@google.com
[3] https://lore.kernel.org/all/cover.1747368092.git.afranji@google.com
Ackerley Tng (2):
KVM: selftests: Add TDX support for ucalls
KVM: selftests: Add irqfd/interrupts test for TDX with migration
Ryan Afranji (3):
KVM: x86: Adjust locking order in move_enc_context_from
KVM: TDX: Allow vCPUs to be created for migration
KVM: selftests: Refactor userspace_mem_region creation out of
vm_mem_add
Sagi Shahar (5):
KVM: Split tdp_mmu_pages to mirror and direct counters
KVM: TDX: Add base implementation for tdx_vm_move_enc_context_from
KVM: TDX: Implement moving mirror pages between 2 TDs
KVM: TDX: Add core logic for TDX intra-host migration
KVM: selftests: TDX: Add tests for TDX in-place migration
arch/x86/include/asm/kvm_host.h | 7 +-
arch/x86/kvm/mmu.h | 2 +
arch/x86/kvm/mmu/mmu.c | 66 ++++
arch/x86/kvm/mmu/tdp_mmu.c | 72 +++-
arch/x86/kvm/mmu/tdp_mmu.h | 6 +
arch/x86/kvm/svm/sev.c | 13 +-
arch/x86/kvm/vmx/main.c | 12 +-
arch/x86/kvm/vmx/tdx.c | 236 +++++++++++-
arch/x86/kvm/vmx/x86_ops.h | 1 +
arch/x86/kvm/x86.c | 14 +-
tools/testing/selftests/kvm/Makefile.kvm | 2 +
.../testing/selftests/kvm/include/kvm_util.h | 25 ++
.../selftests/kvm/include/x86/tdx/tdx_util.h | 3 +
.../selftests/kvm/include/x86/tdx/test_util.h | 5 +
.../testing/selftests/kvm/include/x86/ucall.h | 4 +-
tools/testing/selftests/kvm/lib/kvm_util.c | 222 ++++++++----
.../testing/selftests/kvm/lib/ucall_common.c | 2 +-
.../selftests/kvm/lib/x86/tdx/tdx_util.c | 63 +++-
.../selftests/kvm/lib/x86/tdx/test_util.c | 17 +
tools/testing/selftests/kvm/lib/x86/ucall.c | 108 ++++--
.../kvm/x86/tdx_irqfd_migrate_test.c | 264 ++++++++++++++
.../selftests/kvm/x86/tdx_migrate_tests.c | 337 ++++++++++++++++++
22 files changed, 1349 insertions(+), 132 deletions(-)
create mode 100644 tools/testing/selftests/kvm/x86/tdx_irqfd_migrate_test.c
create mode 100644 tools/testing/selftests/kvm/x86/tdx_migrate_tests.c
--
2.50.0.rc1.591.g9c95f17f64-goog
> > Modify several functions in tools/bpf/bpftool/common.c to allow
> > specification of requested access for file descriptors, such as
> > read-only access.
> >
> > Update bpftool to request only read access for maps when write
> > access is not required. This fixes errors when reading from maps
> > that are protected from modification via security_bpf_map.
> >
> > Signed-off-by: Slava Imameev <slava.imameev(a)crowdstrike.com>
>
>
> Thanks for this!
>
> I think the topic of map access in bpftool has been discussed in the
> path, but I can't remember what we said or find it again - maybe I don't
> remember correctly. Looks good to me overall.
>
> One question: How thoroughly have you tested that write permissions are
> necessary for the different cases? I'm asking because I'm wondering
> whether we could restrict to read-only in a couple more cases, see
> below. (At the end of the day it doesn't matter too much, it's fine
> being conservative and conserving write permissions for now, we can
> always refine later; it's already an improvement to do read-only for the
> dump/list cases).
The goal of this patch was to fix bpftool errors we experienced on our systems.
The efforts were focused only on changes to the affected subset of map commands.
> > + /* Get an fd with the requested options. */
> > + close(fd);
> > + fd = bpf_map_get_fd_by_id_opts(id, opts);
> > + if (fd < 0) {
> > + p_err("can't get map by id (%u): %s", id,
> > + strerror(errno));
> > + goto err_close_fds;
> > + }
>
>
> We could maybe skip this step if the requested options are read-only, no
> need to close and re-open a fd in that case?
I agree. The change will be submitted with version 3.
> > -int map_parse_fds(int *argc, char ***argv, int **fds)
> > +int map_parse_fds(int *argc, char ***argv, int **fds, __u32 open_flags)
> > {
> > + LIBBPF_OPTS(bpf_get_fd_by_id_opts, opts);
> > +
> > + if (open_flags & ~BPF_F_RDONLY) {
> > + p_err("invalid open_flags: %x", open_flags);
> > + return -1;
> > + }
>
>
> I don't think we need this check, the flag is never passed by users. If
> you want to catch a bug, use an assert() instead?
I agree. This check is replaced with an assert and will be submitted with v3.
> > diff --git a/tools/bpf/bpftool/iter.c b/tools/bpf/bpftool/iter.c
> > index 5c39c2ed36a2..ad318a8667a4 100644
> > --- a/tools/bpf/bpftool/iter.c
> > +++ b/tools/bpf/bpftool/iter.c
> > @@ -37,7 +37,7 @@ static int do_pin(int argc, char **argv)
> > return -1;
> > }
> >
> > - map_fd = map_parse_fd(&argc, &argv);
> > + map_fd = map_parse_fd(&argc, &argv, 0);
>
>
> Do you need write permissions here? (I don't remember.)
Iterator requires only read access. I changed it to BPF_F_RDONLY for v3.
An iterator test is added to v3.
> > - fd = map_parse_fd_and_info(&argc, &argv, &info, &len);
> > + fd = map_parse_fd_and_info(&argc, &argv, &info, &len, BPF_F_RDONLY);
>
>
> This one is surprising, don't you need write permissions to delete an
> element from the map? Please double-check if you haven't already, I
> wouldn't want to break "bpftool map delete".
>
> I note you don't test items deletion in your tests, by the way.
Right, the delete command requires write access. I changed it and added
an item deletion test to v3.
> > static int do_pin(int argc, char **argv)
> > {
> > int err;
> >
> > - err = do_pin_any(argc, argv, map_parse_fd);
> > + err = do_pin_any(argc, argv, map_parse_read_only_fd);
> > if (!err && json_output)
> > jsonw_null(json_wtr);
> > return err;
> > @@ -1319,7 +1329,7 @@ static int do_create(int argc, char **argv)
> > if (!REQ_ARGS(2))
> > usage();
> > inner_map_fd = map_parse_fd_and_info(&argc, &argv,
> > - &info, &len);
> > + &info, &len, 0);
>
>
> Do you need write permissions for the inner map's fd? This is something
> that could be worth checking in the tests, as well.
The inner map fd can be created with read only access. I changed it and added
a test for map-of-maps creation to v3.
> > @@ -128,7 +128,8 @@ int do_event_pipe(int argc, char **argv)
> > int err, map_fd;
> >
> > map_info_len = sizeof(map_info);
> > - map_fd = map_parse_fd_and_info(&argc, &argv, &map_info, &map_info_len);
> > + map_fd = map_parse_fd_and_info(&argc, &argv, &map_info, &map_info_len,
> > + 0);
>
>
> This one might be worth checking, too.
An event pipe map fd requires write access as the map is updated by bpf_map_update_elem
inside __perf_buffer__new .
This commit introduces a new vmtest.sh runner for vsock.
It uses virtme-ng/qemu to run tests in a VM. The tests validate G2H,
H2G, and loopback. The testing tools from tools/testing/vsock/ are
reused. Currently, only vsock_test is used.
VMCI and hyperv support is included in the config file to be built with
the -b option, though not used in the tests.
Only tested on x86.
To run:
$ make -C tools/testing/selftests TARGETS=vsock
$ tools/testing/selftests/vsock/vmtest.sh
or
$ make -C tools/testing/selftests TARGETS=vsock run_tests
Example runs (after make -C tools/testing/selftests TARGETS=vsock):
$ ./tools/testing/selftests/vsock/vmtest.sh
1..3
ok 0 vm_server_host_client
ok 1 vm_client_host_server
ok 2 vm_loopback
SUMMARY: PASS=3 SKIP=0 FAIL=0
Log: /tmp/vsock_vmtest_m7DI.log
$ ./tools/testing/selftests/vsock/vmtest.sh vm_loopback
1..1
ok 0 vm_loopback
SUMMARY: PASS=1 SKIP=0 FAIL=0
Log: /tmp/vsock_vmtest_a1IO.log
$ mkdir -p ~/scratch
$ make -C tools/testing/selftests install TARGETS=vsock INSTALL_PATH=~/scratch
[... omitted ...]
$ cd ~/scratch
$ ./run_kselftest.sh
TAP version 13
1..1
# timeout set to 300
# selftests: vsock: vmtest.sh
# 1..3
# ok 0 vm_server_host_client
# ok 1 vm_client_host_server
# ok 2 vm_loopback
# SUMMARY: PASS=3 SKIP=0 FAIL=0
# Log: /tmp/vsock_vmtest_svEl.log
ok 1 selftests: vsock: vmtest.sh
Future work can include vsock_diag_test.
Because vsock requires a VM to test anything other than loopback, this
patch adds vmtest.sh as a kselftest itself. This is different than other
systems that have a "vmtest.sh", where it is used as a utility script to
spin up a VM to run the selftests as a guest (but isn't hooked into
kselftest).
Signed-off-by: Bobby Eshleman <bobbyeshleman(a)gmail.com>
---
Changes in v10:
- remove dupes in tools/testing/selftests/vsock/config
- Link to v9: https://lore.kernel.org/r/20250527-vsock-vmtest-v9-1-24eaeec6fa55@gmail.com
Changes in v9:
- make kselftest build target depend on tools/testing/vsock sources (Stefano)
- add check_vng() for vng version checking (Stefano)
- add virtme_ssh_channel=tcp to kernel cmdline (Stefano)
- Link to v8: https://lore.kernel.org/r/20250522-vsock-vmtest-v8-1-367619bef134@gmail.com
Changes in v8:
- remove NIPA comment from commit msg
- remove tap_* functions and TAP_PREFIX
- add -b for building kernel
- Link to v7: https://lore.kernel.org/r/20250515-vsock-vmtest-v7-1-ba6fa86d6c2c@gmail.com
Changes in v7:
- fix exit code bug when ran is kselftest: use cnt_total instead of KSFT_NUM_TESTS
- updated commit message with updated output
- updated commit message with commands for installing/running as
kselftest
- Link to v6: https://lore.kernel.org/r/20250515-vsock-vmtest-v6-1-9af1cc023900@gmail.com
Changes in v6:
- add make cmd in commit message in vmtest.sh example (Stefano)
- check nonzero size of QEMU_PIDFILE using -s conditional (Stefano)
- display log file path after tests so it is easier to find amongst other random names
- cleanup qemu pidfile if qemu is unable to remove it
- make oops/warning failures more obvious with 'FAIL' prefix in log
(simply saying 'detected' wasn't clear enough to identify failing
condition)
- Link to v5: https://lore.kernel.org/r/20250513-vsock-vmtest-v5-1-4e75c4a45ceb@gmail.com
Changes in v5:
- make log file a tmpfile (Paolo)
- make sure both default and user defined QEMU gets handled by the dependency check (Paolo)
- increased VM boot up timeout from 1m to 3m for slow hosts (Paolo)
- rename vm_setup -> vm_start (Paolo)
- derive wait_for_listener from selftests/net/net_helper.sh to removes ss usage
- Remove unused 'unset IFS' line (Paolo)
- leave space after variable declarations (Paolo)
- make QEMU_PIDFILE a tmp file (Paolo)
- make everything readonly that is only read (Paolo)
- source ktap_helpers.sh for KSFT_PASS and friends (Paolo)
- don't check for timeout util (Paolo)
- add missing usage string for -q qemu arg
- add tap prefix to SUMMARY line since it isn't part of TAP protocol
- exit with the correct status code based on failure/pass counts
- Link to v4: https://lore.kernel.org/r/20250507-vsock-vmtest-v4-1-6e2a97262cd6@gmail.com
Changes in v4:
- do not use special tab delimiter for help string parsing (Stefano + Paolo)
- fix paths for when installing kselftest and running out-of-tree (Paolo)
- change vng to using running kernel instead of compiled kernel (Paolo)
- use multi-line string for QEMU_OPTS (Stefano)
- change timeout to 300s (Paolo)
- skip if tools are not found and use kselftests status codes (Paolo)
- remove build from vmtest.sh (Paolo)
- change 2222 -> SSH_HOST_PORT (Stefano)
- add tap-format output
- add vmtest.log to gitignore
- check for vsock_test binary and remind user to build it if missing
- create a proper build in makefile
- style fixes
- add ssh, timeout, and pkill to dependency check, just in case
- fix numerical comparison in conditionals
- check qemu pidfile exists before proceeding (avoid wasting time waiting for ssh)
- fix tracking of pass/fail bug
- fix stderr redirection bug
- Link to v3: https://lore.kernel.org/r/20250428-vsock-vmtest-v3-1-181af6163f3e@gmail.com
Changes in v3:
- use common conditional syntax for checking variables
- use return value instead of global rc
- fix typo TEST_HOST_LISTENER_PORT -> TEST_HOST_PORT_LISTENER
- use SIGTERM instead of SIGKILL on cleanup
- use peer-cid=1 for loopback
- change sleep delay times into globals
- fix test_vm_loopback logging
- add test selection in arguments
- make QEMU an argument
- check that vng binary is on path
- use QEMU variable
- change <tab><backslash> to <space><backslash>
- fix hardcoded file paths
- add comment in commit msg about script that vmtest.sh was based off of
- Add tools/testing/selftest/vsock/Makefile for kselftest
- Link to v2: https://lore.kernel.org/r/20250417-vsock-vmtest-v2-1-3901a27331e8@gmail.com
Changes in v2:
- add kernel oops and warnings checker
- change testname variable to use FUNCNAME
- fix spacing in test_vm_server_host_client
- add -s skip build option to vmtest.sh
- add test_vm_loopback
- pass port to vm_wait_for_listener
- fix indentation in vmtest.sh
- add vmci and hyperv to config
- changed whitespace from tabs to spaces in help string
- Link to v1: https://lore.kernel.org/r/20250410-vsock-vmtest-v1-1-f35a81dab98c@gmail.com
---
MAINTAINERS | 1 +
tools/testing/selftests/vsock/.gitignore | 2 +
tools/testing/selftests/vsock/Makefile | 17 ++
tools/testing/selftests/vsock/config | 111 +++++++
tools/testing/selftests/vsock/settings | 1 +
tools/testing/selftests/vsock/vmtest.sh | 487 +++++++++++++++++++++++++++++++
6 files changed, 619 insertions(+)
diff --git a/MAINTAINERS b/MAINTAINERS
index 657a67f9031ef7798c19ac63e6383d4cb18a9e1f..3fbdd7bbfce7196a3cc7db70203317c6bd0e51fd 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -25751,6 +25751,7 @@ F: include/uapi/linux/vm_sockets.h
F: include/uapi/linux/vm_sockets_diag.h
F: include/uapi/linux/vsockmon.h
F: net/vmw_vsock/
+F: tools/testing/selftests/vsock/
F: tools/testing/vsock/
VMALLOC
diff --git a/tools/testing/selftests/vsock/.gitignore b/tools/testing/selftests/vsock/.gitignore
new file mode 100644
index 0000000000000000000000000000000000000000..9c5bf379480f829a14713d5f5dc7d525bc272e84
--- /dev/null
+++ b/tools/testing/selftests/vsock/.gitignore
@@ -0,0 +1,2 @@
+vmtest.log
+vsock_test
diff --git a/tools/testing/selftests/vsock/Makefile b/tools/testing/selftests/vsock/Makefile
new file mode 100644
index 0000000000000000000000000000000000000000..c407c0afd9388ee692d59a95f304182f067596e9
--- /dev/null
+++ b/tools/testing/selftests/vsock/Makefile
@@ -0,0 +1,17 @@
+# SPDX-License-Identifier: GPL-2.0
+
+CURDIR := $(abspath .)
+TOOLSDIR := $(abspath ../../..)
+VSOCK_TEST_DIR := $(TOOLSDIR)/testing/vsock
+VSOCK_TEST_SRCS := $(wildcard $(VSOCK_TEST_DIR)/*.c $(VSOCK_TEST_DIR)/*.h)
+
+$(OUTPUT)/vsock_test: $(VSOCK_TEST_DIR)/vsock_test
+ install -m 755 $< $@
+
+$(VSOCK_TEST_DIR)/vsock_test: $(VSOCK_TEST_SRCS)
+ $(MAKE) -C $(VSOCK_TEST_DIR) vsock_test
+TEST_PROGS += vmtest.sh
+TEST_GEN_FILES := vsock_test
+
+include ../lib.mk
+
diff --git a/tools/testing/selftests/vsock/config b/tools/testing/selftests/vsock/config
new file mode 100644
index 0000000000000000000000000000000000000000..5f0a4f17dfc96c0f4b8decafb01724648495191a
--- /dev/null
+++ b/tools/testing/selftests/vsock/config
@@ -0,0 +1,111 @@
+CONFIG_BLK_DEV_INITRD=y
+CONFIG_BPF=y
+CONFIG_BPF_SYSCALL=y
+CONFIG_BPF_JIT=y
+CONFIG_HAVE_EBPF_JIT=y
+CONFIG_BPF_EVENTS=y
+CONFIG_FTRACE_SYSCALLS=y
+CONFIG_FUNCTION_TRACER=y
+CONFIG_HAVE_DYNAMIC_FTRACE=y
+CONFIG_DYNAMIC_FTRACE=y
+CONFIG_HAVE_KPROBES=y
+CONFIG_KPROBES=y
+CONFIG_KPROBE_EVENTS=y
+CONFIG_ARCH_SUPPORTS_UPROBES=y
+CONFIG_UPROBES=y
+CONFIG_UPROBE_EVENTS=y
+CONFIG_DEBUG_FS=y
+CONFIG_FW_CFG_SYSFS=y
+CONFIG_FW_CFG_SYSFS_CMDLINE=y
+CONFIG_DRM=y
+CONFIG_DRM_VIRTIO_GPU=y
+CONFIG_DRM_VIRTIO_GPU_KMS=y
+CONFIG_DRM_BOCHS=y
+CONFIG_VIRTIO_IOMMU=y
+CONFIG_SOUND=y
+CONFIG_SND=y
+CONFIG_SND_SEQUENCER=y
+CONFIG_SND_PCI=y
+CONFIG_SND_INTEL8X0=y
+CONFIG_SND_HDA_CODEC_REALTEK=y
+CONFIG_SECURITYFS=y
+CONFIG_CGROUP_BPF=y
+CONFIG_SQUASHFS=y
+CONFIG_SQUASHFS_XZ=y
+CONFIG_SQUASHFS_ZSTD=y
+CONFIG_FUSE_FS=y
+CONFIG_VIRTIO_FS=y
+CONFIG_SERIO=y
+CONFIG_PCI=y
+CONFIG_INPUT=y
+CONFIG_INPUT_KEYBOARD=y
+CONFIG_KEYBOARD_ATKBD=y
+CONFIG_SERIAL_8250=y
+CONFIG_SERIAL_8250_CONSOLE=y
+CONFIG_X86_VERBOSE_BOOTUP=y
+CONFIG_VGA_CONSOLE=y
+CONFIG_FB=y
+CONFIG_FB_VESA=y
+CONFIG_FRAMEBUFFER_CONSOLE=y
+CONFIG_RTC_CLASS=y
+CONFIG_RTC_HCTOSYS=y
+CONFIG_RTC_DRV_CMOS=y
+CONFIG_HYPERVISOR_GUEST=y
+CONFIG_PARAVIRT=y
+CONFIG_KVM_GUEST=y
+CONFIG_KVM=y
+CONFIG_KVM_INTEL=y
+CONFIG_KVM_AMD=y
+CONFIG_VSOCKETS=y
+CONFIG_VSOCKETS_DIAG=y
+CONFIG_VSOCKETS_LOOPBACK=y
+CONFIG_VMWARE_VMCI_VSOCKETS=y
+CONFIG_VIRTIO_VSOCKETS=y
+CONFIG_VIRTIO_VSOCKETS_COMMON=y
+CONFIG_HYPERV_VSOCKETS=y
+CONFIG_VMWARE_VMCI=y
+CONFIG_VHOST_VSOCK=y
+CONFIG_HYPERV=y
+CONFIG_UEVENT_HELPER=n
+CONFIG_VIRTIO=y
+CONFIG_VIRTIO_PCI=y
+CONFIG_VIRTIO_MMIO=y
+CONFIG_VIRTIO_BALLOON=y
+CONFIG_NET=y
+CONFIG_NET_CORE=y
+CONFIG_NETDEVICES=y
+CONFIG_NETWORK_FILESYSTEMS=y
+CONFIG_INET=y
+CONFIG_NET_9P=y
+CONFIG_NET_9P_VIRTIO=y
+CONFIG_9P_FS=y
+CONFIG_VIRTIO_NET=y
+CONFIG_CMDLINE_OVERRIDE=n
+CONFIG_BINFMT_SCRIPT=y
+CONFIG_SHMEM=y
+CONFIG_TMPFS=y
+CONFIG_UNIX=y
+CONFIG_MODULE_SIG_FORCE=n
+CONFIG_DEVTMPFS=y
+CONFIG_TTY=y
+CONFIG_VT=y
+CONFIG_UNIX98_PTYS=y
+CONFIG_EARLY_PRINTK=y
+CONFIG_INOTIFY_USER=y
+CONFIG_BLOCK=y
+CONFIG_SCSI_LOWLEVEL=y
+CONFIG_SCSI=y
+CONFIG_SCSI_VIRTIO=y
+CONFIG_BLK_DEV_SD=y
+CONFIG_VIRTIO_CONSOLE=y
+CONFIG_WATCHDOG=y
+CONFIG_WATCHDOG_CORE=y
+CONFIG_I6300ESB_WDT=y
+CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y
+CONFIG_OVERLAY_FS=y
+CONFIG_DAX=y
+CONFIG_DAX_DRIVER=y
+CONFIG_FS_DAX=y
+CONFIG_MEMORY_HOTPLUG=y
+CONFIG_MEMORY_HOTREMOVE=y
+CONFIG_ZONE_DEVICE=y
diff --git a/tools/testing/selftests/vsock/settings b/tools/testing/selftests/vsock/settings
new file mode 100644
index 0000000000000000000000000000000000000000..694d70710ff08ac9fc91e9ecb5dbdadcf051f019
--- /dev/null
+++ b/tools/testing/selftests/vsock/settings
@@ -0,0 +1 @@
+timeout=300
diff --git a/tools/testing/selftests/vsock/vmtest.sh b/tools/testing/selftests/vsock/vmtest.sh
new file mode 100755
index 0000000000000000000000000000000000000000..edacebfc163251eee9cd495eb5e704dc7adc958e
--- /dev/null
+++ b/tools/testing/selftests/vsock/vmtest.sh
@@ -0,0 +1,487 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright (c) 2025 Meta Platforms, Inc. and affiliates
+#
+# Dependencies:
+# * virtme-ng
+# * busybox-static (used by virtme-ng)
+# * qemu (used by virtme-ng)
+
+readonly SCRIPT_DIR="$(cd -P -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd -P)"
+readonly KERNEL_CHECKOUT=$(realpath "${SCRIPT_DIR}"/../../../../)
+
+source "${SCRIPT_DIR}"/../kselftest/ktap_helpers.sh
+
+readonly VSOCK_TEST="${SCRIPT_DIR}"/vsock_test
+readonly TEST_GUEST_PORT=51000
+readonly TEST_HOST_PORT=50000
+readonly TEST_HOST_PORT_LISTENER=50001
+readonly SSH_GUEST_PORT=22
+readonly SSH_HOST_PORT=2222
+readonly VSOCK_CID=1234
+readonly WAIT_PERIOD=3
+readonly WAIT_PERIOD_MAX=60
+readonly WAIT_TOTAL=$(( WAIT_PERIOD * WAIT_PERIOD_MAX ))
+readonly QEMU_PIDFILE=$(mktemp /tmp/qemu_vsock_vmtest_XXXX.pid)
+
+# virtme-ng offers a netdev for ssh when using "--ssh", but we also need a
+# control port forwarded for vsock_test. Because virtme-ng doesn't support
+# adding an additional port to forward to the device created from "--ssh" and
+# virtme-init mistakenly sets identical IPs to the ssh device and additional
+# devices, we instead opt out of using --ssh, add the device manually, and also
+# add the kernel cmdline options that virtme-init uses to setup the interface.
+readonly QEMU_TEST_PORT_FWD="hostfwd=tcp::${TEST_HOST_PORT}-:${TEST_GUEST_PORT}"
+readonly QEMU_SSH_PORT_FWD="hostfwd=tcp::${SSH_HOST_PORT}-:${SSH_GUEST_PORT}"
+readonly QEMU_OPTS="\
+ -netdev user,id=n0,${QEMU_TEST_PORT_FWD},${QEMU_SSH_PORT_FWD} \
+ -device virtio-net-pci,netdev=n0 \
+ -device vhost-vsock-pci,guest-cid=${VSOCK_CID} \
+ --pidfile ${QEMU_PIDFILE} \
+"
+readonly KERNEL_CMDLINE="\
+ virtme.dhcp net.ifnames=0 biosdevname=0 \
+ virtme.ssh virtme_ssh_channel=tcp virtme_ssh_user=$USER \
+"
+readonly LOG=$(mktemp /tmp/vsock_vmtest_XXXX.log)
+readonly TEST_NAMES=(vm_server_host_client vm_client_host_server vm_loopback)
+readonly TEST_DESCS=(
+ "Run vsock_test in server mode on the VM and in client mode on the host."
+ "Run vsock_test in client mode on the VM and in server mode on the host."
+ "Run vsock_test using the loopback transport in the VM."
+)
+
+VERBOSE=0
+
+usage() {
+ local name
+ local desc
+ local i
+
+ echo
+ echo "$0 [OPTIONS] [TEST]..."
+ echo "If no TEST argument is given, all tests will be run."
+ echo
+ echo "Options"
+ echo " -b: build the kernel from the current source tree and use it for guest VMs"
+ echo " -q: set the path to or name of qemu binary"
+ echo " -v: verbose output"
+ echo
+ echo "Available tests"
+
+ for ((i = 0; i < ${#TEST_NAMES[@]}; i++)); do
+ name=${TEST_NAMES[${i}]}
+ desc=${TEST_DESCS[${i}]}
+ printf "\t%-35s%-35s\n" "${name}" "${desc}"
+ done
+ echo
+
+ exit 1
+}
+
+die() {
+ echo "$*" >&2
+ exit "${KSFT_FAIL}"
+}
+
+vm_ssh() {
+ ssh -q -o UserKnownHostsFile=/dev/null -p ${SSH_HOST_PORT} localhost "$@"
+ return $?
+}
+
+cleanup() {
+ if [[ -s "${QEMU_PIDFILE}" ]]; then
+ pkill -SIGTERM -F "${QEMU_PIDFILE}" > /dev/null 2>&1
+ fi
+
+ # If failure occurred during or before qemu start up, then we need
+ # to clean this up ourselves.
+ if [[ -e "${QEMU_PIDFILE}" ]]; then
+ rm "${QEMU_PIDFILE}"
+ fi
+}
+
+check_args() {
+ local found
+
+ for arg in "$@"; do
+ found=0
+ for name in "${TEST_NAMES[@]}"; do
+ if [[ "${name}" = "${arg}" ]]; then
+ found=1
+ break
+ fi
+ done
+
+ if [[ "${found}" -eq 0 ]]; then
+ echo "${arg} is not an available test" >&2
+ usage
+ fi
+ done
+
+ for arg in "$@"; do
+ if ! command -v > /dev/null "test_${arg}"; then
+ echo "Test ${arg} not found" >&2
+ usage
+ fi
+ done
+}
+
+check_deps() {
+ for dep in vng ${QEMU} busybox pkill ssh; do
+ if [[ ! -x $(command -v "${dep}") ]]; then
+ echo -e "skip: dependency ${dep} not found!\n"
+ exit "${KSFT_SKIP}"
+ fi
+ done
+
+ if [[ ! -x $(command -v "${VSOCK_TEST}") ]]; then
+ printf "skip: %s not found!" "${VSOCK_TEST}"
+ printf " Please build the kselftest vsock target.\n"
+ exit "${KSFT_SKIP}"
+ fi
+}
+
+check_vng() {
+ local tested_versions
+ local version
+ local ok
+
+ tested_versions=("1.33" "1.36")
+ version="$(vng --version)"
+
+ ok=0
+ for tv in "${tested_versions[@]}"; do
+ if [[ "${version}" == *"${tv}"* ]]; then
+ ok=1
+ break
+ fi
+ done
+
+ if [[ ! "${ok}" -eq 1 ]]; then
+ printf "warning: vng version '%s' has not been tested and may " "${version}" >&2
+ printf "not function properly.\n\tThe following versions have been tested: " >&2
+ echo "${tested_versions[@]}" >&2
+ fi
+}
+
+handle_build() {
+ if [[ ! "${BUILD}" -eq 1 ]]; then
+ return
+ fi
+
+ if [[ ! -d "${KERNEL_CHECKOUT}" ]]; then
+ echo "-b requires vmtest.sh called from the kernel source tree" >&2
+ exit 1
+ fi
+
+ pushd "${KERNEL_CHECKOUT}" &>/dev/null
+
+ if ! vng --kconfig --config "${SCRIPT_DIR}"/config; then
+ die "failed to generate .config for kernel source tree (${KERNEL_CHECKOUT})"
+ fi
+
+ if ! make -j$(nproc); then
+ die "failed to build kernel from source tree (${KERNEL_CHECKOUT})"
+ fi
+
+ popd &>/dev/null
+}
+
+vm_start() {
+ local logfile=/dev/null
+ local verbose_opt=""
+ local kernel_opt=""
+ local qemu
+
+ qemu=$(command -v "${QEMU}")
+
+ if [[ "${VERBOSE}" -eq 1 ]]; then
+ verbose_opt="--verbose"
+ logfile=/dev/stdout
+ fi
+
+ if [[ "${BUILD}" -eq 1 ]]; then
+ kernel_opt="${KERNEL_CHECKOUT}"
+ fi
+
+ vng \
+ --run \
+ ${kernel_opt} \
+ ${verbose_opt} \
+ --qemu-opts="${QEMU_OPTS}" \
+ --qemu="${qemu}" \
+ --user root \
+ --append "${KERNEL_CMDLINE}" \
+ --rw &> ${logfile} &
+
+ if ! timeout ${WAIT_TOTAL} \
+ bash -c 'while [[ ! -s '"${QEMU_PIDFILE}"' ]]; do sleep 1; done; exit 0'; then
+ die "failed to boot VM"
+ fi
+}
+
+vm_wait_for_ssh() {
+ local i
+
+ i=0
+ while true; do
+ if [[ ${i} -gt ${WAIT_PERIOD_MAX} ]]; then
+ die "Timed out waiting for guest ssh"
+ fi
+ if vm_ssh -- true; then
+ break
+ fi
+ i=$(( i + 1 ))
+ sleep ${WAIT_PERIOD}
+ done
+}
+
+# derived from selftests/net/net_helper.sh
+wait_for_listener()
+{
+ local port=$1
+ local interval=$2
+ local max_intervals=$3
+ local protocol=tcp
+ local pattern
+ local i
+
+ pattern=":$(printf "%04X" "${port}") "
+
+ # for tcp protocol additionally check the socket state
+ [ "${protocol}" = "tcp" ] && pattern="${pattern}0A"
+ for i in $(seq "${max_intervals}"); do
+ if awk '{print $2" "$4}' /proc/net/"${protocol}"* | \
+ grep -q "${pattern}"; then
+ break
+ fi
+ sleep "${interval}"
+ done
+}
+
+vm_wait_for_listener() {
+ local port=$1
+
+ vm_ssh <<EOF
+$(declare -f wait_for_listener)
+wait_for_listener ${port} ${WAIT_PERIOD} ${WAIT_PERIOD_MAX}
+EOF
+}
+
+host_wait_for_listener() {
+ wait_for_listener "${TEST_HOST_PORT_LISTENER}" "${WAIT_PERIOD}" "${WAIT_PERIOD_MAX}"
+}
+
+__log_stdin() {
+ cat | awk '{ printf "%s:\t%s\n","'"${prefix}"'", $0 }'
+}
+
+__log_args() {
+ echo "$*" | awk '{ printf "%s:\t%s\n","'"${prefix}"'", $0 }'
+}
+
+log() {
+ local prefix="$1"
+
+ shift
+ local redirect=
+ if [[ ${VERBOSE} -eq 0 ]]; then
+ redirect=/dev/null
+ else
+ redirect=/dev/stdout
+ fi
+
+ if [[ "$#" -eq 0 ]]; then
+ __log_stdin | tee -a "${LOG}" > ${redirect}
+ else
+ __log_args "$@" | tee -a "${LOG}" > ${redirect}
+ fi
+}
+
+log_setup() {
+ log "setup" "$@"
+}
+
+log_host() {
+ local testname=$1
+
+ shift
+ log "test:${testname}:host" "$@"
+}
+
+log_guest() {
+ local testname=$1
+
+ shift
+ log "test:${testname}:guest" "$@"
+}
+
+test_vm_server_host_client() {
+ local testname="${FUNCNAME[0]#test_}"
+
+ vm_ssh -- "${VSOCK_TEST}" \
+ --mode=server \
+ --control-port="${TEST_GUEST_PORT}" \
+ --peer-cid=2 \
+ 2>&1 | log_guest "${testname}" &
+
+ vm_wait_for_listener "${TEST_GUEST_PORT}"
+
+ ${VSOCK_TEST} \
+ --mode=client \
+ --control-host=127.0.0.1 \
+ --peer-cid="${VSOCK_CID}" \
+ --control-port="${TEST_HOST_PORT}" 2>&1 | log_host "${testname}"
+
+ return $?
+}
+
+test_vm_client_host_server() {
+ local testname="${FUNCNAME[0]#test_}"
+
+ ${VSOCK_TEST} \
+ --mode "server" \
+ --control-port "${TEST_HOST_PORT_LISTENER}" \
+ --peer-cid "${VSOCK_CID}" 2>&1 | log_host "${testname}" &
+
+ host_wait_for_listener
+
+ vm_ssh -- "${VSOCK_TEST}" \
+ --mode=client \
+ --control-host=10.0.2.2 \
+ --peer-cid=2 \
+ --control-port="${TEST_HOST_PORT_LISTENER}" 2>&1 | log_guest "${testname}"
+
+ return $?
+}
+
+test_vm_loopback() {
+ local testname="${FUNCNAME[0]#test_}"
+ local port=60000 # non-forwarded local port
+
+ vm_ssh -- "${VSOCK_TEST}" \
+ --mode=server \
+ --control-port="${port}" \
+ --peer-cid=1 2>&1 | log_guest "${testname}" &
+
+ vm_wait_for_listener "${port}"
+
+ vm_ssh -- "${VSOCK_TEST}" \
+ --mode=client \
+ --control-host="127.0.0.1" \
+ --control-port="${port}" \
+ --peer-cid=1 2>&1 | log_guest "${testname}"
+
+ return $?
+}
+
+run_test() {
+ local host_oops_cnt_before
+ local host_warn_cnt_before
+ local vm_oops_cnt_before
+ local vm_warn_cnt_before
+ local host_oops_cnt_after
+ local host_warn_cnt_after
+ local vm_oops_cnt_after
+ local vm_warn_cnt_after
+ local name
+ local rc
+
+ host_oops_cnt_before=$(dmesg | grep -c -i 'Oops')
+ host_warn_cnt_before=$(dmesg --level=warn | wc -l)
+ vm_oops_cnt_before=$(vm_ssh -- dmesg | grep -c -i 'Oops')
+ vm_warn_cnt_before=$(vm_ssh -- dmesg --level=warn | wc -l)
+
+ name=$(echo "${1}" | awk '{ print $1 }')
+ eval test_"${name}"
+ rc=$?
+
+ host_oops_cnt_after=$(dmesg | grep -i 'Oops' | wc -l)
+ if [[ ${host_oops_cnt_after} -gt ${host_oops_cnt_before} ]]; then
+ echo "FAIL: kernel oops detected on host" | log_host "${name}"
+ rc=$KSFT_FAIL
+ fi
+
+ host_warn_cnt_after=$(dmesg --level=warn | wc -l)
+ if [[ ${host_warn_cnt_after} -gt ${host_warn_cnt_before} ]]; then
+ echo "FAIL: kernel warning detected on host" | log_host "${name}"
+ rc=$KSFT_FAIL
+ fi
+
+ vm_oops_cnt_after=$(vm_ssh -- dmesg | grep -i 'Oops' | wc -l)
+ if [[ ${vm_oops_cnt_after} -gt ${vm_oops_cnt_before} ]]; then
+ echo "FAIL: kernel oops detected on vm" | log_host "${name}"
+ rc=$KSFT_FAIL
+ fi
+
+ vm_warn_cnt_after=$(vm_ssh -- dmesg --level=warn | wc -l)
+ if [[ ${vm_warn_cnt_after} -gt ${vm_warn_cnt_before} ]]; then
+ echo "FAIL: kernel warning detected on vm" | log_host "${name}"
+ rc=$KSFT_FAIL
+ fi
+
+ return "${rc}"
+}
+
+QEMU="qemu-system-$(uname -m)"
+
+while getopts :hvsq:b o
+do
+ case $o in
+ v) VERBOSE=1;;
+ b) BUILD=1;;
+ q) QEMU=$OPTARG;;
+ h|*) usage;;
+ esac
+done
+shift $((OPTIND-1))
+
+trap cleanup EXIT
+
+if [[ ${#} -eq 0 ]]; then
+ ARGS=("${TEST_NAMES[@]}")
+else
+ ARGS=("$@")
+fi
+
+check_args "${ARGS[@]}"
+check_deps
+check_vng
+handle_build
+
+echo "1..${#ARGS[@]}"
+
+log_setup "Booting up VM"
+vm_start
+vm_wait_for_ssh
+log_setup "VM booted up"
+
+cnt_pass=0
+cnt_fail=0
+cnt_skip=0
+cnt_total=0
+for arg in "${ARGS[@]}"; do
+ run_test "${arg}"
+ rc=$?
+ if [[ ${rc} -eq $KSFT_PASS ]]; then
+ cnt_pass=$(( cnt_pass + 1 ))
+ echo "ok ${cnt_total} ${arg}"
+ elif [[ ${rc} -eq $KSFT_SKIP ]]; then
+ cnt_skip=$(( cnt_skip + 1 ))
+ echo "ok ${cnt_total} ${arg} # SKIP"
+ elif [[ ${rc} -eq $KSFT_FAIL ]]; then
+ cnt_fail=$(( cnt_fail + 1 ))
+ echo "not ok ${cnt_total} ${arg} # exit=$rc"
+ fi
+ cnt_total=$(( cnt_total + 1 ))
+done
+
+echo "SUMMARY: PASS=${cnt_pass} SKIP=${cnt_skip} FAIL=${cnt_fail}"
+echo "Log: ${LOG}"
+
+if [ $((cnt_pass + cnt_skip)) -eq ${cnt_total} ]; then
+ exit "$KSFT_PASS"
+else
+ exit "$KSFT_FAIL"
+fi
---
base-commit: 8066e388be48f1ad62b0449dc1d31a25489fa12a
change-id: 20250325-vsock-vmtest-b3a21d2102c2
Best regards,
--
Bobby Eshleman <bobbyeshleman(a)gmail.com>
Regressions found on arm, arm64 and x86_64 building warnings with clang-20
and clang-nightly started from Linux next-20250603
Regressions found on arm, arm64 and x86_64
- selftests/filesystem
Regression Analysis:
- New regression? Yes
- Reproducible? Yes
First seen on the next-20250603
Good: next-20250530
Bad: next-20250603
Test regression: arm arm64 x86_64 clang warning null passed to a
callee that requires a non-null argument [-Wnonnull]
Reported-by: Linux Kernel Functional Testing <lkft(a)linaro.org>
## Build warnings
make[4]: Entering directory '/builds/linux/tools/testing/selftests/filesystems'
CC devpts_pts
CC file_stressor
CC anon_inode_test
anon_inode_test.c:45:37: warning: null passed to a callee that
requires a non-null argument [-Wnonnull]
45 | ASSERT_LT(execveat(fd_context, "", NULL, NULL,
AT_EMPTY_PATH), 0);
| ^~~~
## Source
* Kernel version: 6.15.0-next-20250605
* Git tree: https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next.git
* Git sha: a0bea9e39035edc56a994630e6048c8a191a99d8
* Toolchain: Debian clang version 21.0.0
(++20250529012636+c474f8f2404d-1~exp1~20250529132821.1479)
## Build
* Test log: https://qa-reports.linaro.org/api/testruns/28651387/log_file/
* Build link: https://storage.tuxsuite.com/public/linaro/lkft/builds/2xzM4wMl8SvuLKE3mw3c…
* Kernel config:
https://storage.tuxsuite.com/public/linaro/lkft/builds/2xzM4wMl8SvuLKE3mw3c…
--
Linaro LKFT
https://lkft.linaro.org
The BTF dumper code currently displays arrays of characters as just that -
arrays, with each character formatted individually. Sometimes this is what
makes sense, but it's nice to be able to treat that array as a string.
This change adds a special case to the btf_dump functionality to allow
0-terminated arrays of single-byte integer values to be printed as
character strings. Characters for which isprint() returns false are
printed as hex-escaped values. This is enabled when the new ".emit_strings"
is set to 1 in the btf_dump_type_data_opts structure.
As an example, here's what it looks like to dump the string "hello" using
a few different field values for btf_dump_type_data_opts (.compact = 1):
- .emit_strings = 0, .skip_names = 0: (char[6])['h','e','l','l','o',]
- .emit_strings = 0, .skip_names = 1: ['h','e','l','l','o',]
- .emit_strings = 1, .skip_names = 0: (char[6])"hello"
- .emit_strings = 1, .skip_names = 1: "hello"
Here's the string "h\xff", dumped with .compact = 1 and .skip_names = 1:
- .emit_strings = 0: ['h',-1,]
- .emit_strings = 1: "h\xff"
Signed-off-by: Blake Jones <blakejones(a)google.com>
---
tools/lib/bpf/btf.h | 3 ++-
tools/lib/bpf/btf_dump.c | 55 +++++++++++++++++++++++++++++++++++++++-
2 files changed, 56 insertions(+), 2 deletions(-)
diff --git a/tools/lib/bpf/btf.h b/tools/lib/bpf/btf.h
index 4392451d634b..ccfd905f03df 100644
--- a/tools/lib/bpf/btf.h
+++ b/tools/lib/bpf/btf.h
@@ -326,9 +326,10 @@ struct btf_dump_type_data_opts {
bool compact; /* no newlines/indentation */
bool skip_names; /* skip member/type names */
bool emit_zeroes; /* show 0-valued fields */
+ bool emit_strings; /* print char arrays as strings */
size_t :0;
};
-#define btf_dump_type_data_opts__last_field emit_zeroes
+#define btf_dump_type_data_opts__last_field emit_strings
LIBBPF_API int
btf_dump__dump_type_data(struct btf_dump *d, __u32 id,
diff --git a/tools/lib/bpf/btf_dump.c b/tools/lib/bpf/btf_dump.c
index 460c3e57fadb..7c2f1f13f958 100644
--- a/tools/lib/bpf/btf_dump.c
+++ b/tools/lib/bpf/btf_dump.c
@@ -68,6 +68,7 @@ struct btf_dump_data {
bool compact;
bool skip_names;
bool emit_zeroes;
+ bool emit_strings;
__u8 indent_lvl; /* base indent level */
char indent_str[BTF_DATA_INDENT_STR_LEN];
/* below are used during iteration */
@@ -2028,6 +2029,52 @@ static int btf_dump_var_data(struct btf_dump *d,
return btf_dump_dump_type_data(d, NULL, t, type_id, data, 0, 0);
}
+static int btf_dump_string_data(struct btf_dump *d,
+ const struct btf_type *t,
+ __u32 id,
+ const void *data)
+{
+ const struct btf_array *array = btf_array(t);
+ const char *chars = data;
+ __u32 i;
+
+ /* Make sure it is a NUL-terminated string. */
+ for (i = 0; i < array->nelems; i++) {
+ if ((void *)(chars + i) >= d->typed_dump->data_end)
+ return -E2BIG;
+ if (chars[i] == '\0')
+ break;
+ }
+ if (i == array->nelems) {
+ /* The caller will print this as a regular array. */
+ return -EINVAL;
+ }
+
+ btf_dump_data_pfx(d);
+ btf_dump_printf(d, "\"");
+
+ for (i = 0; i < array->nelems; i++) {
+ char c = chars[i];
+
+ if (c == '\0') {
+ /*
+ * When printing character arrays as strings, NUL bytes
+ * are always treated as string terminators; they are
+ * never printed.
+ */
+ break;
+ }
+ if (isprint(c))
+ btf_dump_printf(d, "%c", c);
+ else
+ btf_dump_printf(d, "\\x%02x", (__u8)c);
+ }
+
+ btf_dump_printf(d, "\"");
+
+ return 0;
+}
+
static int btf_dump_array_data(struct btf_dump *d,
const struct btf_type *t,
__u32 id,
@@ -2055,8 +2102,13 @@ static int btf_dump_array_data(struct btf_dump *d,
* char arrays, so if size is 1 and element is
* printable as a char, we'll do that.
*/
- if (elem_size == 1)
+ if (elem_size == 1) {
+ if (d->typed_dump->emit_strings &&
+ btf_dump_string_data(d, t, id, data) == 0) {
+ return 0;
+ }
d->typed_dump->is_array_char = true;
+ }
}
/* note that we increment depth before calling btf_dump_print() below;
@@ -2544,6 +2596,7 @@ int btf_dump__dump_type_data(struct btf_dump *d, __u32 id,
d->typed_dump->compact = OPTS_GET(opts, compact, false);
d->typed_dump->skip_names = OPTS_GET(opts, skip_names, false);
d->typed_dump->emit_zeroes = OPTS_GET(opts, emit_zeroes, false);
+ d->typed_dump->emit_strings = OPTS_GET(opts, emit_strings, false);
ret = btf_dump_dump_type_data(d, NULL, t, id, data, 0, 0);
--
2.49.0.1204.g71687c7c1d-goog
From: Chia-Yu Chang <chia-yu.chang(a)nokia-bell-labs.com>
Hello,
Please find DUALPI2 iproute2 patch v8.
v8 (09-May-25)
- Update pkt_sched.h with the one in nex-next
- Correct a typo in the comment within pkt_sched.h (ALOK TIWARI <alok.a.tiwari(a)oracle.com>)
- Update manual content in man/man8/tc-dualpi2.8 (ALOK TIWARI <alok.a.tiwari(a)oracle.com>)
- Update tc/q_dualpi2.c to fix missing blank lines and add missing case (ALOK TIWARI <alok.a.tiwari(a)oracle.com>)
v7 (05-May-25)
- Align pkt_sched.h with the v14 version of net-next due to spec modification in tc.yaml
- Reorganize dualpi2_print_opt() to match the order in tc.yaml
- Remove credit-queue in PRINT_JSON
v6 (26-Apr-25)
- Update JSON file output due to spec modification in tc.yaml of net-next
v5 (25-Mar-25)
- Use matches() to replace current strcmp() (Stephen Hemminger <stephen(a)networkplumber.org>)
- Use general parse_percent() for handling scaled percentage values (Stephen Hemminger <stephen(a)networkplumber.org>)
- Add print function for JSON of dualpi2 stats (Stephen Hemminger <stephen(a)networkplumber.org>)
v4 (16-Mar-25)
- Add min_qlen_step to the dualpi2 attribute as the minimum queue length in number of packets in the L-queue to start step marking.
v3 (21-Feb-25)
- Add memlimit to the dualpi2 attribute, and add memory_used, max_memory_used, and memory_limit in dualpi2 stats (Dave Taht <dave.taht(a)gmail.com>)
- Update the manual to align with the latest implementation and clarify the queue naming and default unit
- Use common "get_scaled_alpha_beta" and clean print_opt for Dualpi2
v2 (23-Oct-24)
- Rename get_float in dualpi2 to get_float_min_max in utils.c
- Move get_float from iplink_can.c in utils.c (Stephen Hemminger <stephen(a)networkplumber.org>)
- Add print function for JSON of dualpi2 (Stephen Hemminger <stephen(a)networkplumber.org>)
For more details of DualPI2, please refer IETF RFC9332
(https://datatracker.ietf.org/doc/html/rfc9332).
Best Regards,
Chia-Yu
Chia-Yu Chang (1):
tc: add dualpi2 scheduler module
bash-completion/tc | 11 +-
include/uapi/linux/pkt_sched.h | 68 +++++
include/utils.h | 2 +
ip/iplink_can.c | 14 -
lib/utils.c | 30 ++
man/man8/tc-dualpi2.8 | 249 +++++++++++++++
tc/Makefile | 1 +
tc/q_dualpi2.c | 534 +++++++++++++++++++++++++++++++++
8 files changed, 894 insertions(+), 15 deletions(-)
create mode 100644 man/man8/tc-dualpi2.8
create mode 100644 tc/q_dualpi2.c
--
2.34.1
This series addresses a regression in ethtool flow steering where rules
targeting the default RSS context (context 0) were incorrectly rejected.
The default RSS context always exists but is not stored in the rss_ctx
xarray like additional contexts. The current validation logic was
checking for the existence of context 0 in this array, causing valid
flow steering rules to be rejected.
This prevented configurations such as:
- High priority rules directing specific traffic to the default context
- Low priority catch-all rules directing remaining traffic to additional
contexts
Patch 1 fixes the validation logic to skip the existence check for
context 0.
Patch 2 adds a selftest that verifies this behavior.
Changelog -
v1->v2: https://lore.kernel.org/all/20250225071348.509432-1-gal@nvidia.com/
* Reworded commit message.
* Added a selftest.
Gal Pressman (2):
net: ethtool: Don't check if RSS context exists in case of context 0
selftests: drv-net: rss_ctx: Add test for ntuple rules targeting
default RSS context
net/ethtool/ioctl.c | 3 +-
.../selftests/drivers/net/hw/rss_ctx.py | 59 ++++++++++++++++++-
2 files changed, 60 insertions(+), 2 deletions(-)
--
2.40.1
Cong reported an issue where running 'test_sockmap' in the current
bpf-next tree results in an error [1].
The specific test case that triggered the error is a combined test
involving ktls and bpf_msg_pop_data().
Root Cause:
When sending plaintext data, we initially calculated the corresponding
ciphertext length. However, if we later reduced the plaintext data length
via socket policy, we failed to recalculate the ciphertext length.
This results in transmitting buffers containing uninitialized data during
ciphertext transmission.
This causes uninitialized bytes to be appended after a complete
"Application Data" packet, leading to errors on the receiving end when
parsing TLS record.
This issue has existed for a long time but was only exposed after the
following test code was merged.
commit 47eae080410b ("selftests/bpf: Add more tests for test_txmsg_push_pop in test_sockmap")
Although we already had tests for pop data before this commit, the
pop data length was insufficient (less than 5 bytes). This meant that the
corrupted TLS records with data length <5 bytes were cached without being
parsed, resulting in no error being triggered.
After this fix, all tests pass.
1/ 6 sockmap::txmsg test passthrough:OK
2/ 6 sockmap::txmsg test redirect:OK
3/ 2 sockmap::txmsg test redirect wait send mem:OK
4/ 6 sockmap::txmsg test drop:OK
5/ 6 sockmap::txmsg test ingress redirect:OK
6/ 7 sockmap::txmsg test skb:OK
7/12 sockmap::txmsg test apply:OK
8/12 sockmap::txmsg test cork:OK
9/ 3 sockmap::txmsg test hanging corks:OK
10/11 sockmap::txmsg test push_data:OK
11/17 sockmap::txmsg test pull-data:OK
12/ 9 sockmap::txmsg test pop-data:OK
13/ 6 sockmap::txmsg test push/pop data:OK
14/ 1 sockmap::txmsg test ingress parser:OK
15/ 1 sockmap::txmsg test ingress parser2:OK
16/ 6 sockhash::txmsg test passthrough:OK
17/ 6 sockhash::txmsg test redirect:OK
18/ 2 sockhash::txmsg test redirect wait send mem:OK
19/ 6 sockhash::txmsg test drop:OK
20/ 6 sockhash::txmsg test ingress redirect:OK
21/ 7 sockhash::txmsg test skb:OK
22/12 sockhash::txmsg test apply:OK
23/12 sockhash::txmsg test cork:OK
24/ 3 sockhash::txmsg test hanging corks:OK
25/11 sockhash::txmsg test push_data:OK
26/17 sockhash::txmsg test pull-data:OK
27/ 9 sockhash::txmsg test pop-data:OK
28/ 6 sockhash::txmsg test push/pop data:OK
29/ 1 sockhash::txmsg test ingress parser:OK
30/ 1 sockhash::txmsg test ingress parser2:OK
31/ 6 sockhash:ktls:txmsg test passthrough:OK
32/ 6 sockhash:ktls:txmsg test redirect:OK
33/ 2 sockhash:ktls:txmsg test redirect wait send mem:OK
34/ 6 sockhash:ktls:txmsg test drop:OK
35/ 6 sockhash:ktls:txmsg test ingress redirect:OK
36/ 7 sockhash:ktls:txmsg test skb:OK
37/12 sockhash:ktls:txmsg test apply:OK
38/12 sockhash:ktls:txmsg test cork:OK
39/ 3 sockhash:ktls:txmsg test hanging corks:OK
40/11 sockhash:ktls:txmsg test push_data:OK
41/17 sockhash:ktls:txmsg test pull-data:OK
42/ 9 sockhash:ktls:txmsg test pop-data:OK
43/ 6 sockhash:ktls:txmsg test push/pop data:OK
44/ 1 sockhash:ktls:txmsg test ingress parser:OK
45/ 0 sockhash:ktls:txmsg test ingress parser2:OK
Pass: 45 Fail: 0
[1]: https://lore.kernel.org/bpf/CAM_iQpU7=4xjbefZoxndKoX9gFFMOe7FcWMq5tHBsymbrn…
---
v2 -> v1: Removed WARN_ON() check and added Reviewed-by tag.
https://lore.kernel.org/bpf/20250605145529.3q3aqr6iygpa6xg6@gmail.com/
Jiayuan Chen (2):
bpf,ktls: Fix data corruption when using bpf_msg_pop_data() in ktls
selftests/bpf: Add test to cover ktls with bpf_msg_pop_data
net/tls/tls_sw.c | 13 +++
.../selftests/bpf/prog_tests/sockmap_ktls.c | 91 +++++++++++++++++++
.../selftests/bpf/progs/test_sockmap_ktls.c | 4 +
3 files changed, 108 insertions(+)
--
2.47.1
When running the memfd_secret test run_vmtests.sh unconditionally tries
to confgiure the YAMA LSM's ptrace_scope configuration, leading to an error
if YAMA is not in the running kernel:
# ./run_vmtests.sh: line 432: /proc/sys/kernel/yama/ptrace_scope: No such file or directory
# # ----------------------
# # running ./memfd_secret
# # ----------------------
Check that this file is present before trying to write to it.
The indentation here is a bit odd, and it doesn't seem great that we
configure but don't restore ptrace_scope.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
tools/testing/selftests/mm/run_vmtests.sh | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh
index dddd1dd8af14..33fc7fafa8f9 100755
--- a/tools/testing/selftests/mm/run_vmtests.sh
+++ b/tools/testing/selftests/mm/run_vmtests.sh
@@ -429,7 +429,9 @@ CATEGORY="vma_merge" run_test ./merge
if [ -x ./memfd_secret ]
then
-(echo 0 > /proc/sys/kernel/yama/ptrace_scope 2>&1) | tap_prefix
+if [ -f /proc/sys/kernel/yama/ptrace_scope ]; then
+ (echo 0 > /proc/sys/kernel/yama/ptrace_scope 2>&1) | tap_prefix
+fi
CATEGORY="memfd_secret" run_test ./memfd_secret
fi
---
base-commit: 19272b37aa4f83ca52bdf9c16d5d81bdd1354494
change-id: 20250605-selftest-mm-enable-yama-1541c2d2ddcd
Best regards,
--
Mark Brown <broonie(a)kernel.org>
A collection of non-functional updates from David Hildenbrand's review.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Mark Brown (4):
kselftest/mm: Clarify errors for pipe()
selftests/mm: Convert some cow error reports to ksft_perror()
selftests/mm: Don't compare return values to in cow
selftests/mm: Add messages about test errors to the cow tests
tools/testing/selftests/mm/cow.c | 44 +++++++++++++++++++++++++---------------
1 file changed, 28 insertions(+), 16 deletions(-)
---
base-commit: 19272b37aa4f83ca52bdf9c16d5d81bdd1354494
change-id: 20250603-selftest-mm-cow-tweaks-0199c5132b3e
Best regards,
--
Mark Brown <broonie(a)kernel.org>
Fixes and cleanups for various issues in the vDSO selftests.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
---
Changes in v3:
- Rebase on v6.16-rc1
- Preserve vgetrandom_put_state()
- Pick up vdso_standalone_test_x86 into this series
- Link to v2: https://lore.kernel.org/r/20250505-selftests-vdso-fixes-v2-0-3bc86e42f242@l…
Changes in v2:
- Refer to -Wstrict-prototypes over -Wold-style-prototypes
- Pick up Acks
- Enable fixed warnings in Makefile
- Link to v1: https://lore.kernel.org/r/20250502-selftests-vdso-fixes-v1-0-fb5d640a4f78@l…
---
Thomas Weißschuh (9):
selftests: vDSO: chacha: Correctly skip test if necessary
selftests: vDSO: clock_getres: Drop unused include of err.h
selftests: vDSO: vdso_test_getrandom: Drop unused include of linux/compiler.h
selftests: vDSO: vdso_test_getrandom: Avoid -Wunused
selftests: vDSO: vdso_config: Avoid -Wunused-variables
selftests: vDSO: enable -Wall
selftests: vDSO: vdso_test_correctness: Fix -Wstrict-prototypes
selftests: vDSO: vdso_test_getrandom: Always print TAP header
selftests: vDSO: vdso_standalone_test_x86: Replace source file with symlink
tools/testing/selftests/vDSO/Makefile | 2 +-
tools/testing/selftests/vDSO/vdso_config.h | 2 +
.../selftests/vDSO/vdso_standalone_test_x86.c | 59 +---------------------
tools/testing/selftests/vDSO/vdso_test_chacha.c | 3 +-
.../selftests/vDSO/vdso_test_clock_getres.c | 1 -
.../testing/selftests/vDSO/vdso_test_correctness.c | 2 +-
tools/testing/selftests/vDSO/vdso_test_getrandom.c | 10 ++--
7 files changed, 13 insertions(+), 66 deletions(-)
---
base-commit: 19272b37aa4f83ca52bdf9c16d5d81bdd1354494
change-id: 20250423-selftests-vdso-fixes-d2ce74142359
Best regards,
--
Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
Hi,
While running the nolibc tests I discovered that they build a kernel in
the current directory, including overwriting the existing .config. This
is rather suprising for the selftests build system - it usually wouldn't
do a kernel build at all - and might be annoying for users.
KUnit deals with this by doing it's kernel build in a .kunit directory,
it'd probably be good to do something similar for nolibc.
Thanks,
Mark
Initially netpoll and netconsole were created together, and some
functions are in the wrong file. Seperate netconsole-only functions
in netconsole, avoiding exports.
1. Expose netpoll logging macros in the public header to enable consistent
log formatting across netpoll consumers.
2. Relocate netconsole-specific functions from netpoll to the netconsole
module where they are actually used, reducing unnecessary coupling.
3. Remove unnecessary function exports
4. Rename netpoll parsing functions in netconsole to better reflect their
specific usage.
5. Create a test to check that cmdline works fine. This was in my todo
list since [1], this was a good time to add it here to make sure this
patchset doesn't regress.
PS: The code was split in a way that it is easy to review. When copying
the functions from netpoll to netconsole, I do not change than other
than adding `static`. This will make checkpatch unhappy, but, further
patches will address the issues. It is done this way to make it easy for
reviewers.
Link: https://lore.kernel.org/netdev/Z36TlACdNMwFD7wv@dev-ushankar.dev.purestorag… [1]
Signed-off-by: Breno Leitao <leitao(a)debian.org>
---
Breno Leitao (7):
netpoll: remove __netpoll_cleanup from exported API
netpoll: expose netpoll logging macros in public header
netpoll: relocate netconsole-specific functions to netconsole module
netpoll: move netpoll_print_options to netconsole
netconsole: rename functions to better reflect their purpose
netconsole: improve code style in parser function
selftest: netconsole: add test for cmdline configuration
drivers/net/netconsole.c | 137 ++++++++++++++++++++-
include/linux/netpoll.h | 10 +-
net/core/netpoll.c | 136 +-------------------
tools/testing/selftests/drivers/net/Makefile | 1 +
.../selftests/drivers/net/lib/sh/lib_netcons.sh | 39 +++++-
.../selftests/drivers/net/netcons_cmdline.sh | 52 ++++++++
6 files changed, 228 insertions(+), 147 deletions(-)
---
base-commit: 2c7e4a2663a1ab5a740c59c31991579b6b865a26
change-id: 20250603-rework-c175cad8d22e
Best regards,
--
Breno Leitao <leitao(a)debian.org>
Most of the packetdrill tests have not flaked once last week.
Add the few which did to the XFAIL list.
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
---
CC: shuah(a)kernel.org
CC: willemb(a)google.com
CC: matttbe(a)kernel.org
CC: linux-kselftest(a)vger.kernel.org
Every time I sit down to add more I plan to just XFAIL all of packetdrill
on slow machines, but then I convince myself otherwise. One last time?
---
tools/testing/selftests/net/packetdrill/ksft_runner.sh | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/tools/testing/selftests/net/packetdrill/ksft_runner.sh b/tools/testing/selftests/net/packetdrill/ksft_runner.sh
index ef8b25a606d8..c5b01e1bd4c7 100755
--- a/tools/testing/selftests/net/packetdrill/ksft_runner.sh
+++ b/tools/testing/selftests/net/packetdrill/ksft_runner.sh
@@ -39,11 +39,15 @@ if [[ -n "${KSFT_MACHINE_SLOW}" ]]; then
# xfail tests that are known flaky with dbg config, not fixable.
# still run them for coverage (and expect 100% pass without dbg).
declare -ar xfail_list=(
+ "tcp_blocking_blocking-connect.pkt"
+ "tcp_blocking_blocking-read.pkt"
"tcp_eor_no-coalesce-retrans.pkt"
"tcp_fast_recovery_prr-ss.*.pkt"
+ "tcp_sack_sack-route-refresh-ip-tos.pkt"
"tcp_slow_start_slow-start-after-win-update.pkt"
"tcp_timestamping.*.pkt"
"tcp_user_timeout_user-timeout-probe.pkt"
+ "tcp_zerocopy_cl.*.pkt"
"tcp_zerocopy_epoll_.*.pkt"
"tcp_tcp_info_tcp-info-.*-limited.pkt"
)
--
2.49.0
As titled, adding version file to kselftest installation dir, so the user
of the tarball can know which kernel version the tarball belongs to.
Signed-off-by: Tianyi Cui <1997cui(a)gmail.com>
---
tools/testing/selftests/Makefile | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index a0a6ba47d600..246e9863b45b 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -291,6 +291,12 @@ ifdef INSTALL_PATH
$(MAKE) -s --no-print-directory OUTPUT=$$BUILD_TARGET COLLECTION=$$TARGET \
-C $$TARGET emit_tests >> $(TEST_LIST); \
done;
+ @if git describe HEAD > /dev/null 2>&1; then \
+ git describe HEAD > $(INSTALL_PATH)/VERSION; \
+ printf "Version saved to $(INSTALL_PATH)/VERSION\n"; \
+ else \
+ printf "Unable to get version from git describe\n"; \
+ fi
else
$(error Error: set INSTALL_PATH to use install)
endif
--
2.47.1
During performance analysis of console subsystem latency, I discovered that
netconsole registers console handlers even when no active targets exist.
These orphaned console handlers are invoked on every printk() call, get
the lock, iterate through empty target lists, and consume CPU cycles
without performing any useful work.
This patch series addresses the inefficiency by:
1. Implementing dynamic console registration/unregistration based on target
availability, ensuring console handlers are only active when needed
2. Adding automatic cleanup of unused console registrations when targets
are disabled or removed
3. Extending the selftest suite to cover non-extended console format,
which was previously untested
The optimization reduces printk() overhead by eliminating unnecessary
function calls and list traversals when netconsole targets are not
configured, improving overall system performance during heavy logging
scenarios.
---
Changes in v3:
- Set CON_ENABLED before re-enabling the console, to fix a selftest that
was failing, as reported by Jakub on v2.
- Link to v2: https://lore.kernel.org/r/20250602-netcons_ext-v2-0-ef88d999326d@debian.org
Changes in v2:
- Added selftests to test the new mechanism
- Unregister the console if the last target got disabled
- Sending to net-next instead of net (Jakub)
- Link to v1: https://lore.kernel.org/r/20250528-netcons_ext-v1-1-69f71e404e00@debian.org
---
Breno Leitao (4):
netconsole: Only register console drivers when targets are configured
netconsole: Add automatic console unregistration on target removal
selftests: netconsole: Do not exit from inside the validation function
selftests: netconsole: Add support for basic netconsole target format
drivers/net/netconsole.c | 67 +++++++++++++++++++---
.../selftests/drivers/net/lib/sh/lib_netcons.sh | 27 +++++++--
.../testing/selftests/drivers/net/netcons_basic.sh | 50 ++++++++++------
3 files changed, 112 insertions(+), 32 deletions(-)
---
base-commit: 2c7e4a2663a1ab5a740c59c31991579b6b865a26
change-id: 20250528-netcons_ext-572982619bea
Best regards,
--
Breno Leitao <leitao(a)debian.org>
I had cause to look at the vfork() support for GCS and realised that we
don't have any direct test coverage, this series does so by adding
vfork() to nolibc and then using that in basic-gcs to provide some
simple vfork() coverage.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v2:
- Add replacement of ifdef with if defined() in nolibc since the code
doesn't reflect the coding style.
- Remove check for arch specific vfork().
- Link to v1: https://lore.kernel.org/r/20250609-arm64-gcs-vfork-exit-v1-0-baad0f085747@k…
---
Mark Brown (4):
tools/nolibc: Replace ifdef with if defined() in sys.h
tools/nolibc: Provide vfork()
kselftest/arm64: Add a test for vfork() with GCS
selftests/nolibc: Add coverage of vfork()
tools/include/nolibc/sys.h | 57 +++++++++++++++++-------
tools/testing/selftests/arm64/gcs/basic-gcs.c | 63 +++++++++++++++++++++++++++
tools/testing/selftests/nolibc/nolibc-test.c | 23 ++++++++--
3 files changed, 124 insertions(+), 19 deletions(-)
---
base-commit: 19272b37aa4f83ca52bdf9c16d5d81bdd1354494
change-id: 20250528-arm64-gcs-vfork-exit-4a7daf7652ee
Best regards,
--
Mark Brown <broonie(a)kernel.org>
This started with a patch that enabled `clippy::ptr_as_ptr`. Benno
Lossin suggested I also look into `clippy::ptr_cast_constness` and I
discovered `clippy::as_ptr_cast_mut`. This series now enables all 3
lints. It also enables `clippy::as_underscore` which ensures other
pointer casts weren't missed.
As a later addition, `clippy::cast_lossless` and `clippy::ref_as_ptr`
are also enabled.
This series depends on "rust: retain pointer mut-ness in
`container_of!`"[1].
Link: https://lore.kernel.org/all/20250409-container-of-mutness-v1-1-64f472b94534… [1]
Signed-off-by: Tamir Duberstein <tamird(a)gmail.com>
---
Changes in v10:
- Move fragment from "rust: enable `clippy::ptr_cast_constness` lint" to
"rust: enable `clippy::ptr_as_ptr` lint". (Boqun Feng)
- Replace `(...).into()` with `T::from(...)` where the destination type
isn't obvious in "rust: enable `clippy::cast_lossless` lint". (Boqun
Feng)
- Link to v9: https://lore.kernel.org/r/20250416-ptr-as-ptr-v9-0-18ec29b1b1f3@gmail.com
Changes in v9:
- Replace ref-to-ptr coercion using `let` bindings with
`core::ptr::from_{ref,mut}`. (Boqun Feng).
- Link to v8: https://lore.kernel.org/r/20250409-ptr-as-ptr-v8-0-3738061534ef@gmail.com
Changes in v8:
- Use coercion to go ref -> ptr.
- rustfmt.
- Rebase on v6.15-rc1.
- Extract first commit to its own series as it is shared with other
series.
- Link to v7: https://lore.kernel.org/r/20250325-ptr-as-ptr-v7-0-87ab452147b9@gmail.com
Changes in v7:
- Add patch to enable `clippy::ref_as_ptr`.
- Link to v6: https://lore.kernel.org/r/20250324-ptr-as-ptr-v6-0-49d1b7fd4290@gmail.com
Changes in v6:
- Drop strict provenance patch.
- Fix URLs in doc comments.
- Add patch to enable `clippy::cast_lossless`.
- Rebase on rust-next.
- Link to v5: https://lore.kernel.org/r/20250317-ptr-as-ptr-v5-0-5b5f21fa230a@gmail.com
Changes in v5:
- Use `pointer::addr` in OF. (Boqun Feng)
- Add documentation on stubs. (Benno Lossin)
- Mark stubs `#[inline]`.
- Pick up Alice's RB on a shared commit from
https://lore.kernel.org/all/Z9f-3Aj3_FWBZRrm@google.com/.
- Link to v4: https://lore.kernel.org/r/20250315-ptr-as-ptr-v4-0-b2d72c14dc26@gmail.com
Changes in v4:
- Add missing SoB. (Benno Lossin)
- Use `without_provenance_mut` in alloc. (Boqun Feng)
- Limit strict provenance lints to the `kernel` crate to avoid complex
logic in the build system. This can be revisited on MSRV >= 1.84.0.
- Rebase on rust-next.
- Link to v3: https://lore.kernel.org/r/20250314-ptr-as-ptr-v3-0-e7ba61048f4a@gmail.com
Changes in v3:
- Fixed clippy warning in rust/kernel/firmware.rs. (kernel test robot)
Link: https://lore.kernel.org/all/202503120332.YTCpFEvv-lkp@intel.com/
- s/as u64/as bindings::phys_addr_t/g. (Benno Lossin)
- Use strict provenance APIs and enable lints. (Benno Lossin)
- Link to v2: https://lore.kernel.org/r/20250309-ptr-as-ptr-v2-0-25d60ad922b7@gmail.com
Changes in v2:
- Fixed typo in first commit message.
- Added additional patches, converted to series.
- Link to v1: https://lore.kernel.org/r/20250307-ptr-as-ptr-v1-1-582d06514c98@gmail.com
---
Tamir Duberstein (6):
rust: enable `clippy::ptr_as_ptr` lint
rust: enable `clippy::ptr_cast_constness` lint
rust: enable `clippy::as_ptr_cast_mut` lint
rust: enable `clippy::as_underscore` lint
rust: enable `clippy::cast_lossless` lint
rust: enable `clippy::ref_as_ptr` lint
Makefile | 6 ++++++
drivers/gpu/drm/drm_panic_qr.rs | 2 +-
rust/bindings/lib.rs | 3 +++
rust/kernel/alloc/allocator_test.rs | 2 +-
rust/kernel/alloc/kvec.rs | 4 ++--
rust/kernel/block/mq/operations.rs | 2 +-
rust/kernel/block/mq/request.rs | 6 +++---
rust/kernel/device.rs | 4 ++--
rust/kernel/device_id.rs | 4 ++--
rust/kernel/devres.rs | 19 ++++++++++---------
rust/kernel/dma.rs | 6 +++---
rust/kernel/error.rs | 2 +-
rust/kernel/firmware.rs | 3 ++-
rust/kernel/fs/file.rs | 2 +-
rust/kernel/io.rs | 18 +++++++++---------
rust/kernel/kunit.rs | 11 +++++++----
rust/kernel/list/impl_list_item_mod.rs | 2 +-
rust/kernel/miscdevice.rs | 2 +-
rust/kernel/net/phy.rs | 4 ++--
rust/kernel/of.rs | 6 +++---
rust/kernel/pci.rs | 11 +++++++----
rust/kernel/platform.rs | 4 +++-
rust/kernel/print.rs | 6 +++---
rust/kernel/seq_file.rs | 2 +-
rust/kernel/str.rs | 14 +++++++-------
rust/kernel/sync/poll.rs | 2 +-
rust/kernel/time/hrtimer/pin.rs | 2 +-
rust/kernel/time/hrtimer/pin_mut.rs | 2 +-
rust/kernel/uaccess.rs | 4 ++--
rust/kernel/workqueue.rs | 12 ++++++------
rust/uapi/lib.rs | 3 +++
31 files changed, 96 insertions(+), 74 deletions(-)
---
base-commit: 0af2f6be1b4281385b618cb86ad946eded089ac8
change-id: 20250307-ptr-as-ptr-21b1867fc4d4
prerequisite-change-id: 20250409-container-of-mutness-b153dab4388d:v1
prerequisite-patch-id: 53d5889db599267f87642bb0ae3063c29bc24863
Best regards,
--
Tamir Duberstein <tamird(a)gmail.com>
Some small fixes for arch_timer_edge_cases that I stumbled upon
while debugging failures for this selftest on ampere-one.
Changes since v1:
* determine effective counter width based on suggestions from Marc
Changes since v2:
* new patch to fix xval initialization
I've done tests with this on various machines - no issues during
several hundreds of test runs.
v1: https://lore.kernel.org/kvmarm/20250509143312.34224-1-sebott@redhat.com/
v2: https://lore.kernel.org/kvmarm/20250527142434.25209-1-sebott@redhat.com/
Sebastian Ott (4):
KVM: arm64: selftests: fix help text for arch_timer_edge_cases
KVM: arm64: selftests: fix thread migration in arch_timer_edge_cases
KVM: arm64: selftests: arch_timer_edge_cases - fix xval init
KVM: arm64: selftests: arch_timer_edge_cases - determine effective counter width
.../kvm/arm64/arch_timer_edge_cases.c | 39 ++++++++++++-------
1 file changed, 25 insertions(+), 14 deletions(-)
base-commit: 0ff41df1cb268fc69e703a08a57ee14ae967d0ca
--
2.49.0
From: Chia-Yu Chang <chia-yu.chang(a)nokia-bell-labs.com>
Hello,
Please find the DualPI2 patch v17.
This patch serise adds DualPI Improved with a Square (DualPI2) with following features:
* Supports congestion controls that comply with the Prague requirements in RFC9331 (e.g. TCP-Prague)
* Coupled dual-queue that separates the L4S traffic in a low latency queue (L-queue), without harming remaining traffic that is scheduled in classic queue (C-queue) due to congestion-coupling using PI2 as defined in RFC9332
* Configurable overload strategies
* Use of sojourn time to reliably estimate queue delay
* Supports ECN L4S-identifier (IP.ECN==0b*1) to classify traffic into respective queues
For more details of DualPI2, please refer IETF RFC9332 (https://datatracker.ietf.org/doc/html/rfc9332).
Best regards,
Chia-Yu
---
v17 (25-May-2025, Resent at 10-Jun-2025)
- Replace 0xffffffff with U32_MAX (Paolo Abeni <pabeni(a)redhat.com>)
- Use helper function qdisc_dequeue_internal() and add new helper function skb_apply_step() (Paolo Abeni <pabeni(a)redhat.com>)
- Add s64 casting when calculating the delta of the PI controller (Paolo Abeni <pabeni(a)redhat.com>)
- Change the drop reason into SKB_DROP_REASON_QDISC_CONGESTED for drop_early (Paolo Abeni <pabeni(a)redhat.com>)
- Modify the condition to remove the original skb when enqueuing multiple GSO segments (Paolo Abeni <pabeni(a)redhat.com>)
- Add READ_ONCE() in dualpi2_dump_stat() (Paolo Abeni <pabeni(a)redhat.com>)
- Add comments, brackets, and brackets for readability (Paolo Abeni <pabeni(a)redhat.com>)
v16 (16-MAy-2025)
- Add qdisc_lock() to dualpi2_timer() in dualpi2_timer (Paolo Abeni <pabeni(a)redhat.com>)
- Introduce convert_ns_to_usec() to convert usec to nsec without overflow in #1 (Paolo Abeni <pabeni(a)redhat.com>)
- Update convert_us_tonsec() to convert nsec to usec without overflow in #2 (Paolo Abeni <pabeni(a)redhat.com>)
- Add more descriptions with respect to DualPI2 in the cover ltter and add changelog in each patch (Paolo Abeni <pabeni(a)redhat.com>)
v15 (09-May-2025)
- Add enum of TCA_DUALPI2_ECN_MASK_CLA_ECT to remove potential leakeage in #1 (Simon Horman <horms(a)kernel.org>)
- Fix one typo in comment of #2
- Update tc.yaml in #5 to aligh with the updated enum of pkt_sched.h
v14 (05-May-2025)
- Modify tc.yaml: (1) Replace flags with enum and remove enum-as-flags, (2) Remove credit-queue in xstats, and (3) Change attribute types (Donald Hunter <donald.hun
- Add enum and fix the ordering of variables in pkt_sched.h to align with the modified tc.yaml (Donald Hunter <donald.hunter(a)gmail.com>)
- Add validators for DROP_OVERLOAD, DROP_EARLY, ECN_MASK, and SPLIT_GSO in sch_dualpi2.c (Donald Hunter <donald.hunter(a)gmail.com>)
- Update dualpi2.json to align with the updated variable order in pkt_sched.h
- Reorder patches (Donald Hunter <donald.hunter(a)gmail.com>)
v13 (26-Apr-2025)
- Use dashes in member names to follow YNL conventions in tc.yaml (Donald Hunter <donald.hunter(a)gmail.com>)
- Define enumerations separately for flags of drop-early, drop-overload, ecn-mask, credit-queue in tc.yaml (Donald Hunter <donald.hunter(a)gmail.com>)
- Change the types of split-gso and step-packets into flag in tc.yaml (Donald Hunter <donald.hunter(a)gmail.com>)
- Revert to u32/u8 types for tc-dualpi2-xstats members in tc.yaml (Donald Hunter <donald.hunter(a)gmail.com>)
- Add new test cases in tc-tests/qdiscs/dualpi2.json to cover all dualpi2 parameters (Donald Hunter <donald.hunter(a)gmail.com>)
- Change the type of TCA_DUALPI2_STEP_PACKETS into NLA_FLAG (Donald Hunter <donald.hunter(a)gmail.com>)
v12 (22-Apr-2025)
- Remove anonymous struct in sch_dualpi2.c (Paolo Abeni <pabeni(a)redhat.com>)
- Replace u32/u8 with uint and s32 with int in tc spec document (Paolo Abeni <pabeni(a)redhat.com>)
- Introduce get_memory_limit function to handle potential overflow when multipling limit with MTU (Paolo Abeni <pabeni(a)redhat.com>)
- Double the packet length to further include packet overhead in memory_limit (Paolo Abeni <pabeni(a)redhat.com>)
- Remove the check of qdisc_qlen(sch) when calling qdisc_tree_reduce_backlog (Paolo Abeni <pabeni(a)redhat.com>)
v11 (15-Apr-2025)
- Replace hstimer_init with hstimer_setup in sch_dualpi2.c
v10 (25-Mar-2025)
- Remove leftover include in include/linux/netdevice.h and anonymous struct in sch_dualpi2.c (Paolo Abeni <pabeni(a)redhat.com>)
- Use kfree_skb_reason() and add SKB_DROP_REASON_DUALPI2_STEP_DROP drop reason (Paolo Abeni <pabeni(a)redhat.com>)
- Split sch_dualpi2.c into 3 patches (and overall 5 patches): Struct definition & parsing, Dump stats & configuration, Enqueue/Dequeue (Paolo Abeni <pabeni(a)redhat.com>)
v9 (16-Mar-2025)
- Fix mem_usage error in previous version
- Add min_qlen_step to the dualpi2 attribute as the minimum queue length in number of packets in the L-queue to start step threshold marking.
In previous versions, this value was fixed to 2, so the step threshold was applied to mark packets in the L queue only when the queue length of the L queue was greater than or equal to 2 packets.
This will cause larger queuing delays for L4S traffic at low rates (<20Mbps). So we parameterize it and change the default value to 0.
Comparison of tcp_1down run 'HTB 20Mbit + DUALPI2 + 10ms base delay'
Old versions:
avg median # data pts
Ping (ms) ICMP : 11.55 11.70 ms 350
TCP upload avg : 18.96 N/A Mbits/s 350
TCP upload sum : 18.96 N/A Mbits/s 350
New version (v9):
avg median # data pts
Ping (ms) ICMP : 10.81 10.70 ms 350
TCP upload avg : 18.91 N/A Mbits/s 350
TCP upload sum : 18.91 N/A Mbits/s 350
Comparison of tcp_1down run 'HTB 10Mbit + DUALPI2 + 10ms base delay'
Old versions:
avg median # data pts
Ping (ms) ICMP : 12.61 12.80 ms 350
TCP upload avg : 9.48 N/A Mbits/s 350
TCP upload sum : 9.48 N/A Mbits/s 350
New version (v9):
avg median # data pts
Ping (ms) ICMP : 11.06 10.80 ms 350
TCP upload avg : 9.43 N/A Mbits/s 350
TCP upload sum : 9.43 N/A Mbits/s 350
Comparison of tcp_1down run 'HTB 10Mbit + DUALPI2 + 10ms base delay'
Old versions:
avg median # data pts
Ping (ms) ICMP : 40.86 37.45 ms 350
TCP upload avg : 0.88 N/A Mbits/s 350
TCP upload sum : 0.88 N/A Mbits/s 350
TCP upload::1 : 0.88 0.97 Mbits/s 350
New version (v9):
avg median # data pts
Ping (ms) ICMP : 11.07 10.40 ms 350
TCP upload avg : 0.55 N/A Mbits/s 350
TCP upload sum : 0.55 N/A Mbits/s 350
TCP upload::1 : 0.55 0.59 Mbits/s 350
v8 (11-Mar-2025)
- Fix warning messages in v7
v7 (07-Mar-2025)
- Separate into 3 patches to avoid mixing changes of documentation, selftest, and code. (Cong Wang <xiyou.wangcong(a)gmail.com>)
v6 (04-Mar-2025)
- Add modprobe for dulapi2 in tc-testing script tc-testing/tdc.sh (Jakub Kicinski <kuba(a)kernel.org>)
- Update test cases in dualpi2.json
- Update commit message
v5 (22-Feb-2025)
- A comparison was done between MQ + DUALPI2, MQ + FQ_PIE, MQ + FQ_CODEL:
Unshaped 1gigE with 4 download streams test:
- Summary of tcp_4down run 'MQ + FQ_CODEL':
avg median # data pts
Ping (ms) ICMP : 1.19 1.34 ms 349
TCP download avg : 235.42 N/A Mbits/s 349
TCP download sum : 941.68 N/A Mbits/s 349
TCP download::1 : 235.19 235.39 Mbits/s 349
TCP download::2 : 235.03 235.35 Mbits/s 349
TCP download::3 : 236.89 235.44 Mbits/s 349
TCP download::4 : 234.57 235.19 Mbits/s 349
- Summary of tcp_4down run 'MQ + FQ_PIE'
avg median # data pts
Ping (ms) ICMP : 1.21 1.37 ms 350
TCP download avg : 235.42 N/A Mbits/s 350
TCP download sum : 941.61 N/A Mbits/s 350
TCP download::1 : 232.54 233.13 Mbits/s 350
TCP download::2 : 232.52 232.80 Mbits/s 350
TCP download::3 : 233.14 233.78 Mbits/s 350
TCP download::4 : 243.41 241.48 Mbits/s 350
- Summary of tcp_4down run 'MQ + DUALPI2'
avg median # data pts
Ping (ms) ICMP : 1.19 1.34 ms 349
TCP download avg : 235.42 N/A Mbits/s 349
TCP download sum : 941.68 N/A Mbits/s 349
TCP download::1 : 235.19 235.39 Mbits/s 349
TCP download::2 : 235.03 235.35 Mbits/s 349
TCP download::3 : 236.89 235.44 Mbits/s 349
TCP download::4 : 234.57 235.19 Mbits/s 349
Unshaped 1gigE with 128 download streams test:
- Summary of tcp_128down run 'MQ + FQ_CODEL':
avg median # data pts
Ping (ms) ICMP : 1.88 1.86 ms 350
TCP download avg : 7.39 N/A Mbits/s 350
TCP download sum : 946.47 N/A Mbits/s 350
- Summary of tcp_128down run 'MQ + FQ_PIE':
avg median # data pts
Ping (ms) ICMP : 1.88 1.86 ms 350
TCP download avg : 7.39 N/A Mbits/s 350
TCP download sum : 946.47 N/A Mbits/s 350
- Summary of tcp_128down run 'MQ + DUALPI2':
avg median # data pts
Ping (ms) ICMP : 1.88 1.86 ms 350
TCP download avg : 7.39 N/A Mbits/s 350
TCP download sum : 946.47 N/A Mbits/s 350
Unshaped 10gigE with 4 download streams test:
- Summary of tcp_4down run 'MQ + FQ_CODEL':
avg median # data pts
Ping (ms) ICMP : 0.22 0.23 ms 350
TCP download avg : 2354.08 N/A Mbits/s 350
TCP download sum : 9416.31 N/A Mbits/s 350
TCP download::1 : 2353.65 2352.81 Mbits/s 350
TCP download::2 : 2354.54 2354.21 Mbits/s 350
TCP download::3 : 2353.56 2353.78 Mbits/s 350
TCP download::4 : 2354.56 2354.45 Mbits/s 350
- Summary of tcp_4down run 'MQ + FQ_PIE':
avg median # data pts
Ping (ms) ICMP : 0.20 0.19 ms 350
TCP download avg : 2354.76 N/A Mbits/s 350
TCP download sum : 9419.04 N/A Mbits/s 350
TCP download::1 : 2354.77 2353.89 Mbits/s 350
TCP download::2 : 2353.41 2354.29 Mbits/s 350
TCP download::3 : 2356.18 2354.19 Mbits/s 350
TCP download::4 : 2354.68 2353.15 Mbits/s 350
- Summary of tcp_4down run 'MQ + DUALPI2':
avg median # data pts
Ping (ms) ICMP : 0.24 0.24 ms 350
TCP download avg : 2354.11 N/A Mbits/s 350
TCP download sum : 9416.43 N/A Mbits/s 350
TCP download::1 : 2354.75 2353.93 Mbits/s 350
TCP download::2 : 2353.15 2353.75 Mbits/s 350
TCP download::3 : 2353.49 2353.72 Mbits/s 350
TCP download::4 : 2355.04 2353.73 Mbits/s 350
Unshaped 10gigE with 128 download streams test:
- Summary of tcp_128down run 'MQ + FQ_CODEL':
avg median # data pts
Ping (ms) ICMP : 7.57 8.69 ms 350
TCP download avg : 73.97 N/A Mbits/s 350
TCP download sum : 9467.82 N/A Mbits/s 350
- Summary of tcp_128down run 'MQ + FQ_PIE':
avg median # data pts
Ping (ms) ICMP : 7.82 8.91 ms 350
TCP download avg : 73.97 N/A Mbits/s 350
TCP download sum : 9468.42 N/A Mbits/s 350
- Summary of tcp_128down run 'MQ + DUALPI2':
avg median # data pts
Ping (ms) ICMP : 6.87 7.93 ms 350
TCP download avg : 73.95 N/A Mbits/s 350
TCP download sum : 9465.87 N/A Mbits/s 350
From the results shown above, we see small differences between combinations.
- Update commit message to include results of no_split_gso and split_gso (Dave Taht <dave.taht(a)gmail.com> and Paolo Abeni <pabeni(a)redhat.com>)
- Add memlimit in the dualpi2 attribute, and add memory_used, max_memory_used, memory_limit in dualpi2 stats (Dave Taht <dave.taht(a)gmail.com>)
- Update note in sch_dualpi2.c related to BBRv3 status (Dave Taht <dave.taht(a)gmail.com>)
- Update license identifier (Dave Taht <dave.taht(a)gmail.com>)
- Add selftest in tools/testing/selftests/tc-testing (Cong Wang <xiyou.wangcong(a)gmail.com>)
- Use netlink policies for parameter checks (Jamal Hadi Salim <jhs(a)mojatatu.com>)
- Modify texts & fix typos in Documentation/netlink/specs/tc.yaml (Dave Taht <dave.taht(a)gmail.com>)
- Add descriptions of packet counter statistics and the reset function of sch_dualpi2.c
- Fix step_thresh in packets
- Update code comments in sch_dualpi2.c
v4 (22-Oct-2024)
- Update statement in Kconfig for DualPI2 (Stephen Hemminger <stephen(a)networkplumber.org>)
- Put a blank line after #define in sch_dualpi2.c (Stephen Hemminger <stephen(a)networkplumber.org>)
- Fix line length warning.
v3 (19-Oct-2024)
- Fix compilaiton error
- Update Documentation/netlink/specs/tc.yaml (Jakub Kicinski <kuba(a)kernel.org>)
v2 (18-Oct-2024)
- Add Documentation/netlink/specs/tc.yaml (Jakub Kicinski <kuba(a)kernel.org>)
- Use dualpi2 instead of skb prefix (Jamal Hadi Salim <jhs(a)mojatatu.com>)
- Replace nla_parse_nested_deprecated with nla_parse_nested (Jamal Hadi Salim <jhs(a)mojatatu.com>)
- Fix line length warning
---
Chia-Yu Chang (4):
sched: Struct definition and parsing of dualpi2 qdisc
sched: Dump configuration and statistics of dualpi2 qdisc
selftests/tc-testing: Add selftests for qdisc DualPI2
Documentation: netlink: specs: tc: Add DualPI2 specification
Koen De Schepper (1):
sched: Add enqueue/dequeue of dualpi2 qdisc
Documentation/netlink/specs/tc.yaml | 156 +++
include/net/dropreason-core.h | 6 +
include/uapi/linux/pkt_sched.h | 68 +
net/sched/Kconfig | 12 +
net/sched/Makefile | 1 +
net/sched/sch_dualpi2.c | 1146 +++++++++++++++++
tools/testing/selftests/tc-testing/config | 1 +
.../tc-testing/tc-tests/qdiscs/dualpi2.json | 254 ++++
tools/testing/selftests/tc-testing/tdc.sh | 1 +
9 files changed, 1645 insertions(+)
create mode 100644 net/sched/sch_dualpi2.c
create mode 100644 tools/testing/selftests/tc-testing/tc-tests/qdiscs/dualpi2.json
--
2.34.1
┌────────────┐ ┌───────────────────────────────────┐ ┌────────────────┐
│ │ │ │ │ │
│ │ │ PCI Endpoint │ │ PCI Host │
│ │ │ │ │ │
│ │◄──┤ 1.platform_msi_domain_alloc_irqs()│ │ │
│ │ │ │ │ │
│ MSI ├──►│ 2.write_msi_msg() ├──►├─BAR<n> │
│ Controller │ │ update doorbell register address│ │ │
│ │ │ for BAR │ │ │
│ │ │ │ │ 3. Write BAR<n>│
│ │◄──┼───────────────────────────────────┼───┤ │
│ │ │ │ │ │
│ ├──►│ 4.Irq Handle │ │ │
│ │ │ │ │ │
│ │ │ │ │ │
└────────────┘ └───────────────────────────────────┘ └────────────────┘
This patches based on old https://lore.kernel.org/imx/20221124055036.1630573-1-Frank.Li@nxp.com/
Original patch only target to vntb driver. But actually it is common
method.
This patches add new API to pci-epf-core, so any EP driver can use it.
Previous v2 discussion here.
https://lore.kernel.org/imx/20230911220920.1817033-1-Frank.Li@nxp.com/
Changes in v19:
- irq part already in v6.16-rc1, only missed pcie/dts part
- rebase to v6.16-rc1
- update commit message for patch IMMUTABLE check.
- Link to v18: https://lore.kernel.org/r/20250414-ep-msi-v18-0-f69b49917464@nxp.com
Changes in v18:
- pci-ep.yaml: sort property order, fix maxvalue to 0x7ffff for msi-map-mask and
iommu-map-mask
- Link to v17: https://lore.kernel.org/r/20250407-ep-msi-v17-0-633ab45a31d0@nxp.com
Changes in v17:
- move document part to pci-ep.yaml
- Link to v16: https://lore.kernel.org/r/20250404-ep-msi-v16-0-d4919d68c0d0@nxp.com
Changes in v16:
- remove arm64: dts: imx95-19x19-evk: Add PCIe1 endpoint function overlay file
because there are better patches, which under review.
- Add document for pcie-ep msi-map usage
- other change to see each patch's change log
About IMMUTABLE (No change for this part, tglx provide feedback)
> - This IMMUTABLE thing serves no purpose, because you don't randomly
> plug this end-point block on any MSI controller. They come as part
> of an SoC.
"Yes and no. The problem is that the EP implementation is meant to be a
generic library and while GIC-ITS guarantees immutability of the
address/data pair after setup, there are architectures (x86, loongson,
riscv) where the base MSI controller does not and immutability is only
achieved when interrupt remapping is enabled. The latter can be disabled
at boot-time and then the EP implementation becomes a lottery across
affinity changes.
That was my concern about this library implementation and that's why I
asked for a mechanism to ensure that the underlying irqdomain provides a
immutable address/data pair.
So it does not matter for GIC-ITS, but in the larger picture it matters.
Thanks,
tglx
"
So it does not matter for GIC-ITS, but in the larger picture it matters.
- Link to v15: https://lore.kernel.org/r/20250211-ep-msi-v15-0-bcacc1f2b1a9@nxp.com
Changes in v15:
- rebase to v6.14-rc1
- fix build issue find by kernel test robot
- Link to v14: https://lore.kernel.org/r/20250207-ep-msi-v14-0-9671b136f2b8@nxp.com
Changes in v14:
Marc Zyngier raised concerns about adding DOMAIN_BUS_DEVICE_PCI_EP_MSI. As
a result, the approach has been reverted to the v9 method. However, there
are several improvements:
MSI now supports msi-map in addition to msi-parent.
- The struct device: id is used as the endpoint function (EPF) device
identity to map to the stream ID (sideband information).
- The EPC device tree source (DTS) utilizes msi-map to provide such
information.
- The EPF device's of_node is set to the EPC controller’s node. This
approach is commonly used for multi-function device (MFD) platform child
devices, allowing them to inherit properties from the MFD device’s DTS,
such as reset-cells and gpio-cells. This method is well-suited for the
current case, as the EPF is inherently created/binded to the EPC and
should inherit the EPC’s DTS node properties.
Additionally:
Since the basic IMX95 LUT support has already been merged into the
mainline, a DTS and driver increment patch is added to complete the
solution. The patch is rebased onto the latest linux-next tree and
aligned with the new pcitest framework.
- Link to v13: https://lore.kernel.org/r/20241218-ep-msi-v13-0-646e2192dc24@nxp.com
Changes in v13:
- Change to use DOMAIN_BUS_PCI_DEVICE_EP_MSI
- Change request id as func | vfunc << 3
- Remove IRQ_DOMAIN_MSI_IMMUTABLE
Thomas Gleixner:
I hope capture all your points in review comments. If missed, let me know.
- Link to v12: https://lore.kernel.org/r/20241211-ep-msi-v12-0-33d4532fa520@nxp.com
Changes in v12:
- Change to use IRQ_DOMAIN_MSI_IMMUTABLE and add help function
irq_domain_msi_is_immuatble().
- split PCI: endpoint: pci-ep-msi: Add MSI address/data pair mutable check to 3 patches
- Link to v11: https://lore.kernel.org/r/20241209-ep-msi-v11-0-7434fa8397bd@nxp.com
Changes in v11:
- Change to use MSI_FLAG_MSG_IMMUTABLE
- Link to v10: https://lore.kernel.org/r/20241204-ep-msi-v10-0-87c378dbcd6d@nxp.com
Changes in v10:
Thomas Gleixner:
There are big change in pci-ep-msi.c. I am sure if go on the
corrent path. The key improvement is remove only 1 function devices's
limitation.
I use new patch for imutable check, which relative additional
feature compared to base enablement patch.
- Remove patch Add msi_remove_device_irq_domain() in platform_device_msi_free_irqs_all()
- Add new patch irqchip/gic-v3-its: Avoid overwriting msi_prepare callback if provided by msi_domain_info
- Remove only support 1 endpoint function limiation.
- Create one MSI domain for each endpoint function devices.
- Use "msi-map" in pci ep controler node, instead of of msi-parent. first
argument is
(func_no << 8 | vfunc_no)
- Link to v9: https://lore.kernel.org/r/20241203-ep-msi-v9-0-a60dbc3f15dd@nxp.com
Changes in v9
- Add patch platform-msi: Add msi_remove_device_irq_domain() in platform_device_msi_free_irqs_all()
- Remove patch PCI: endpoint: Add pci_epc_get_fn() API for customizable filtering
- Remove API pci_epf_align_inbound_addr_lo_hi
- Move doorbell_alloc in to doorbell_enable function.
- Link to v8: https://lore.kernel.org/r/20241116-ep-msi-v8-0-6f1f68ffd1bb@nxp.com
Changes in v8:
- update helper function name to pci_epf_align_inbound_addr()
- Link to v7: https://lore.kernel.org/r/20241114-ep-msi-v7-0-d4ac7aafbd2c@nxp.com
Changes in v7:
- Add helper function pci_epf_align_addr();
- Link to v6: https://lore.kernel.org/r/20241112-ep-msi-v6-0-45f9722e3c2a@nxp.com
Changes in v6:
- change doorbell_addr to doorbell_offset
- use round_down()
- add Niklas's test by tag
- rebase to pci/endpoint
- Link to v5: https://lore.kernel.org/r/20241108-ep-msi-v5-0-a14951c0d007@nxp.com
Changes in v5:
- Move request_irq to epf test function driver for more flexiable user case
- Add fixed size bar handler
- Some minor improvememtn to see each patches's changelog.
- Link to v4: https://lore.kernel.org/r/20241031-ep-msi-v4-0-717da2d99b28@nxp.com
Changes in v4:
- Remove patch genirq/msi: Add cleanup guard define for msi_lock_descs()/msi_unlock_descs()
- Use new method to avoid compatible problem.
Add new command DOORBELL_ENABLE and DOORBELL_DISABLE.
pcitest -B send DOORBELL_ENABLE first, EP test function driver try to
remap one of BAR_N (except test register bar) to ITS MSI MMIO space. Old
driver don't support new command, so failure return, not side effect.
After test, DOORBELL_DISABLE command send out to recover original map, so
pcitest bar test can pass as normal.
- Other detail change see each patches's change log
- Link to v3: https://lore.kernel.org/r/20241015-ep-msi-v3-0-cedc89a16c1a@nxp.com
Change from v2 to v3
- Fixed manivannan's comments
- Move common part to pci-ep-msi.c and pci-ep-msi.h
- rebase to 6.12-rc1
- use RevID to distingiush old version
mkdir /sys/kernel/config/pci_ep/functions/pci_epf_test/func1
echo 16 > /sys/kernel/config/pci_ep/functions/pci_epf_test/func1/msi_interrupts
echo 0x080c > /sys/kernel/config/pci_ep/functions/pci_epf_test/func1/deviceid
echo 0x1957 > /sys/kernel/config/pci_ep/functions/pci_epf_test/func1/vendorid
echo 1 > /sys/kernel/config/pci_ep/functions/pci_epf_test/func1/revid
^^^^^^ to enable platform msi support.
ln -s /sys/kernel/config/pci_ep/functions/pci_epf_test/func1 /sys/kernel/config/pci_ep/controllers/4c380000.pcie-ep
- use new device ID, which identify support doorbell to avoid broken
compatility.
Enable doorbell support only for PCI_DEVICE_ID_IMX8_DB, while other devices
keep the same behavior as before.
EP side RC with old driver RC with new driver
PCI_DEVICE_ID_IMX8_DB no probe doorbell enabled
Other device ID doorbell disabled* doorbell disabled*
* Behavior remains unchanged.
Change from v1 to v2
- Add missed patch for endpont/pci-epf-test.c
- Move alloc and free to epc driver from epf.
- Provide general help function for EPC driver to alloc platform msi irq.
- Fixed manivannan's comments.
Signed-off-by: Frank Li <Frank.Li(a)nxp.com>
---
Frank Li (10):
PCI: endpoint: Set ID and of_node for function driver
PCI: endpoint: Add RC-to-EP doorbell support using platform MSI controller
PCI: endpoint: pci-ep-msi: Add MSI address/data pair mutable check
PCI: endpoint: Add pci_epf_align_inbound_addr() helper for address alignment
PCI: endpoint: pci-epf-test: Add doorbell test support
misc: pci_endpoint_test: Add doorbell test case
selftests: pci_endpoint: Add doorbell test case
pci: imx6: Add helper function imx_pcie_add_lut_by_rid()
pci: imx6: Add LUT setting for MSI/IOMMU in Endpoint mode
arm64: dts: imx95: Add msi-map for pci-ep device
arch/arm64/boot/dts/freescale/imx95.dtsi | 1 +
drivers/misc/pci_endpoint_test.c | 82 ++++++++++++
drivers/pci/controller/dwc/pci-imx6.c | 25 ++--
drivers/pci/endpoint/Makefile | 1 +
drivers/pci/endpoint/functions/pci-epf-test.c | 142 +++++++++++++++++++++
drivers/pci/endpoint/pci-ep-msi.c | 90 +++++++++++++
drivers/pci/endpoint/pci-epf-core.c | 48 +++++++
include/linux/pci-ep-msi.h | 28 ++++
include/linux/pci-epf.h | 21 +++
include/uapi/linux/pcitest.h | 1 +
.../selftests/pci_endpoint/pci_endpoint_test.c | 28 ++++
11 files changed, 459 insertions(+), 8 deletions(-)
---
base-commit: 19272b37aa4f83ca52bdf9c16d5d81bdd1354494
change-id: 20241010-ep-msi-8b4cab33b1be
Best regards,
---
Frank Li <Frank.Li(a)nxp.com>
This improves the expressiveness of unprivileged BPF by inserting
speculation barriers instead of rejecting the programs.
The approach was previously presented at LPC'24 [1] and RAID'24 [2].
To mitigate the Spectre v1 (PHT) vulnerability, the kernel rejects
potentially-dangerous unprivileged BPF programs as of
commit 9183671af6db ("bpf: Fix leakage under speculation on mispredicted
branches"). In [2], we have analyzed 364 object files from open source
projects (Linux Samples and Selftests, BCC, Loxilb, Cilium, libbpf
Examples, Parca, and Prevail) and found that this affects 31% to 54% of
programs.
To resolve this in the majority of cases this patchset adds a fall-back
for mitigating Spectre v1 using speculation barriers. The kernel still
optimistically attempts to verify all speculative paths but uses
speculation barriers against v1 when unsafe behavior is detected. This
allows for more programs to be accepted without disabling the BPF
Spectre mitigations (e.g., by setting cpu_mitigations_off()).
For this, it relies on the fact that speculation barriers generally
prevent all later instructions from executing if the speculation was not
correct (not only loads). See patch 7 ("bpf: Fall back to nospec for
Spectre v1") for a detailed description and references to the relevant
vendor documentation (AMD and Intel x86-64, ARM64, and PowerPC).
In [1] we have measured the overhead of this approach relative to having
mitigations off and including the upstream Spectre v4 mitigations. For
event tracing and stack-sampling profilers, we found that mitigations
increase BPF program execution time by 0% to 62%. For the Loxilb network
load balancer, we have measured a 14% slowdown in SCTP performance but
no significant slowdown for TCP. This overhead only applies to programs
that were previously rejected.
I reran the expressiveness-evaluation with v6.14 and made sure the main
results still match those from [1] and [2] (which used v6.5).
Main design decisions are:
* Do not use separate bytecode insns for v1 and v4 barriers (inspired by
Daniel Borkmann's question at LPC). This simplifies the verifier
significantly and has the only downside that performance on PowerPC is
not as high as it could be.
* Allow archs to still disable v1/v4 mitigations separately by setting
bpf_jit_bypass_spec_v1/v4(). This has the benefit that archs can
benefit from improved BPF expressiveness / performance if they are not
vulnerable (e.g., ARM64 for v4 in the kernel).
* Do not remove the empty BPF_NOSPEC implementation for backends for
which it is unknown whether they are vulnerable to Spectre v1.
[1] https://lpc.events/event/18/contributions/1954/ ("Mitigating
Spectre-PHT using Speculation Barriers in Linux eBPF")
[2] https://arxiv.org/pdf/2405.00078 ("VeriFence: Lightweight and
Precise Spectre Defenses for Untrusted Linux Kernel Extensions")
Changes:
* v3 -> v4:
- Remove insn parameter from do_check_insn() and extract
process_bpf_exit_full as a function as requested by Eduard
- Investigate apparent sanitize_check_bounds() bug reported by
Kartikeya (does appear to not be a bug but only confusing code),
sent separate patch to document it and add an assert
- Remove already-merged commit 1 ("selftests/bpf: Fix caps for
__xlated/jited_unpriv")
- Drop former commit 10 ("bpf: Allow nospec-protected var-offset stack
access") as it did not include a test and there are other places
where var-off is rejected. Also, none of the tested real-world
programs used var-off in the paper. Therefore keep the old behavior
for now and potentially prepare a patch that converts all cases
later if required.
- Add link to AMD lfence and PowerPC speculation barrier (ori 31,31,0)
documentation
- Move detailed barrier documentation to commit 7 ("bpf: Fall back to
nospec for Spectre v1")
- Link to v3: https://lore.kernel.org/all/20250501073603.1402960-1-luis.gerhorst@fau.de/
* v2 -> v3:
- Fix
https://lore.kernel.org/oe-kbuild-all/202504212030.IF1SLhz6-lkp@intel.com/
and similar by moving the bpf_jit_bypass_spec_v1/v4() prototypes out
of the #ifdef CONFIG_BPF_SYSCALL. Decided not to move them to
filter.h (where similar bpf_jit_*() prototypes live) as they would
still have to be duplicated in bpf.h to be usable to
bpf_bypass_spec_v1/v4() (unless including filter.h in bpf.h is an
option).
- Fix
https://lore.kernel.org/oe-kbuild-all/202504220035.SoGveGpj-lkp@intel.com/
by moving the variable declarations out of the switch-case.
- Build touched C files with W=2 and bpf config on x86 to check that
there are no other warnings introduced.
- Found 3 more checkpatch warnings that can be fixed without degrading
readability.
- Rebase to bpf-next 2025-05-01
- Link to v2: https://lore.kernel.org/bpf/20250421091802.3234859-1-luis.gerhorst@fau.de/
* v1 -> v2:
- Drop former commits 9 ("bpf: Return PTR_ERR from push_stack()") and 11
("bpf: Fall back to nospec for spec path verification") as suggested
by Alexei. This series therefore no longer changes push_stack() to
return PTR_ERR.
- Add detailed explanation of how lfence works internally and how it
affects the algorithm.
- Add tests checking that nospec instructions are inserted in expected
locations using __xlated_unpriv as suggested by Eduard (also,
include a fix for __xlated_unpriv)
- Add a test for the mitigations from the description of
commit 9183671af6db ("bpf: Fix leakage under speculation on
mispredicted branches")
- Remove unused variables from do_check[_insn]() as suggested by
Eduard.
- Remove INSN_IDX_MODIFIED to improve readability as suggested by
Eduard. This also causes the nospec_result-check to run (and fail)
for jumping-ops. Add a warning to assert that this check must never
succeed in that case.
- Add details on the safety of patch 10 ("bpf: Allow nospec-protected
var-offset stack access") based on the feedback on v1.
- Rebase to bpf-next-250420
- Link to v1: https://lore.kernel.org/all/20250313172127.1098195-1-luis.gerhorst@fau.de/
* RFC -> v1:
- rebase to bpf-next-250313
- tests: mark expected successes/new errors
- add bpt_jit_bypass_spec_v1/v4() to avoid #ifdef in
bpf_bypass_spec_v1/v4()
- ensure that nospec with v1-support is implemented for archs for
which GCC supports speculation barriers, except for MIPS
- arm64: emit speculation barrier
- powerpc: change nospec to include v1 barrier
- discuss potential security (archs that do not impl. BPF nospec) and
performance (only PowerPC) regressions
- Link to RFC: https://lore.kernel.org/bpf/20250224203619.594724-1-luis.gerhorst@fau.de/
Luis Gerhorst (9):
bpf: Move insn if/else into do_check_insn()
bpf: Return -EFAULT on misconfigurations
bpf: Return -EFAULT on internal errors
bpf, arm64, powerpc: Add bpf_jit_bypass_spec_v1/v4()
bpf, arm64, powerpc: Change nospec to include v1 barrier
bpf: Rename sanitize_stack_spill to nospec_result
bpf: Fall back to nospec for Spectre v1
selftests/bpf: Add test for Spectre v1 mitigation
bpf: Fall back to nospec for sanitization-failures
arch/arm64/net/bpf_jit.h | 5 +
arch/arm64/net/bpf_jit_comp.c | 28 +-
arch/powerpc/net/bpf_jit_comp64.c | 80 ++-
include/linux/bpf.h | 11 +-
include/linux/bpf_verifier.h | 3 +-
include/linux/filter.h | 2 +-
kernel/bpf/core.c | 32 +-
kernel/bpf/verifier.c | 633 ++++++++++--------
tools/testing/selftests/bpf/progs/bpf_misc.h | 4 +
.../selftests/bpf/progs/verifier_and.c | 8 +-
.../selftests/bpf/progs/verifier_bounds.c | 66 +-
.../bpf/progs/verifier_bounds_deduction.c | 45 +-
.../selftests/bpf/progs/verifier_map_ptr.c | 20 +-
.../selftests/bpf/progs/verifier_movsx.c | 16 +-
.../selftests/bpf/progs/verifier_unpriv.c | 65 +-
.../bpf/progs/verifier_value_ptr_arith.c | 101 ++-
.../selftests/bpf/verifier/dead_code.c | 3 +-
tools/testing/selftests/bpf/verifier/jmp32.c | 33 +-
tools/testing/selftests/bpf/verifier/jset.c | 10 +-
19 files changed, 755 insertions(+), 410 deletions(-)
base-commit: cd2e103d57e5615f9bb027d772f93b9efd567224
--
2.49.0
The mm selftests are timing out with the current 180-second limit.
Testing shows that run_vmtests.sh takes approximately 11 minutes
(664 seconds) to complete.
Increase the timeout to 900 seconds (15 minutes) to provide sufficient
buffer for the tests to complete successfully.
Signed-off-by: Shivank Garg <shivankg(a)amd.com>
---
tools/testing/selftests/mm/settings | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/mm/settings b/tools/testing/selftests/mm/settings
index a953c96aa16e..e2206265f67c 100644
--- a/tools/testing/selftests/mm/settings
+++ b/tools/testing/selftests/mm/settings
@@ -1 +1 @@
-timeout=180
+timeout=900
--
2.43.0
Hello everyone,
The schedule for the Automated Testing Summit (ATS) 2025 is now live!
You can now explore the full program and speaker list at:
🔗 https://ats25.sched.com/
This year’s ATS will be packed with talks and discussions focused on scaling test infrastructure, improving collaboration across projects, and pushing the boundaries of automation in the Linux ecosystem.
📍 ATS 2025 will take place as a co-located event at the Open Source Summit North America, on June 26th in Denver, CO.
If you haven’t yet registered, you can do so here:
🔗 https://events.linuxfoundation.org/open-source-summit-north-america/feature…
You can attend in person or virtually. We look forward to seeing you there!
Best regards,
The KernelCI Team
--
Gustavo Padovan
Collabora Ltd.
The test file for the IR decoder used single-line comments at the top
to document its purpose and licensing, which is inconsistent with the style
used throughout the Linux kernel.
in this patch i converted the file header to a proper multi-line comment block
(/*) that aligns with standard kernel practices. This improves
readability, consistency across selftests, and ensures the license and
documentation are clearly visible in a familiar format.
No functional changes have been made.
Signed-off-by: Abdelrahman Fekry <Abdelrahmanfekry375(a)gmail.com>
---
tools/testing/selftests/ir/ir_loopback.c | 21 +++++++++++----------
1 file changed, 11 insertions(+), 10 deletions(-)
diff --git a/tools/testing/selftests/ir/ir_loopback.c b/tools/testing/selftests/ir/ir_loopback.c
index f4a15cbdd5ea..2de4a6296f35 100644
--- a/tools/testing/selftests/ir/ir_loopback.c
+++ b/tools/testing/selftests/ir/ir_loopback.c
@@ -1,14 +1,15 @@
// SPDX-License-Identifier: GPL-2.0
-// test ir decoder
-//
-// Copyright (C) 2018 Sean Young <sean(a)mess.org>
-
-// When sending LIRC_MODE_SCANCODE, the IR will be encoded. rc-loopback
-// will send this IR to the receiver side, where we try to read the decoded
-// IR. Decoding happens in a separate kernel thread, so we will need to
-// wait until that is scheduled, hence we use poll to check for read
-// readiness.
-
+/* Copyright (C) 2018 Sean Young <sean(a)mess.org>
+ *
+ * Selftest for IR decoder
+ *
+ *
+ * When sending LIRC_MODE_SCANCODE, the IR will be encoded. rc-loopback
+ * will send this IR to the receiver side, where we try to read the decoded
+ * IR. Decoding happens in a separate kernel thread, so we will need to
+ * wait until that is scheduled, hence we use poll to check for read
+ * readiness.
+*/
#include <linux/lirc.h>
#include <errno.h>
#include <stdio.h>
--
2.25.1
IDT event delivery has a debug hole in which it does not generate #DB
upon returning to userspace before the first userspace instruction is
executed if the Trap Flag (TF) is set.
FRED closes this hole by introducing a software event flag, i.e., bit
17 of the augmented SS: if the bit is set and ERETU would result in
RFLAGS.TF = 1, a single-step trap will be pending upon completion of
ERETU.
However I overlooked properly setting and clearing the bit in different
situations. Thus when FRED is enabled, if the Trap Flag (TF) is set
without an external debugger attached, it can lead to an infinite loop
in the SIGTRAP handler. To avoid this, the software event flag in the
augmented SS must be cleared, ensuring that no single-step trap remains
pending when ERETU completes.
This patch set combines the fix [1] and its corresponding selftest [2]
(requested by Dave Hansen) into one patch set.
[1] https://lore.kernel.org/lkml/20250523050153.3308237-1-xin@zytor.com/
[2] https://lore.kernel.org/lkml/20250530230707.2528916-1-xin@zytor.com/
This patch set is based on tip/x86/urgent branch.
Link to v5 of this patch set:
https://lore.kernel.org/lkml/20250606174528.1004756-1-xin@zytor.com/
Changes in v6:
*) Replace a "sub $128, %rsp" with "add $-128, %rsp" (hpa).
*) Declared loop_count_on_same_ip inside sigtrap() (Sohil).
*) s/sigtrap/SIGTRAP (Sohil).
*) Add TB from Sohil to the first patch.
Xin Li (Intel) (2):
x86/fred/signal: Prevent immediate repeat of single step trap on
return from SIGTRAP handler
selftests/x86: Add a test to detect infinite SIGTRAP handler loop
arch/x86/include/asm/sighandling.h | 22 +++++
arch/x86/kernel/signal_32.c | 4 +
arch/x86/kernel/signal_64.c | 4 +
tools/testing/selftests/x86/Makefile | 2 +-
tools/testing/selftests/x86/sigtrap_loop.c | 101 +++++++++++++++++++++
5 files changed, 132 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/x86/sigtrap_loop.c
base-commit: dd2922dcfaa3296846265e113309e5f7f138839f
--
2.49.0
I had cause to look at the vfork() support for GCS and realised that we
don't have any direct test coverage, this series does so by adding
vfork() to nolibc and then using that in basic-gcs to provide some
simple vfork() coverage.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Mark Brown (2):
tools/nolibc: Provide vfork()
kselftest/arm64: Add a test for vfork() with GCS
tools/include/nolibc/sys.h | 29 ++++++++++++
tools/testing/selftests/arm64/gcs/basic-gcs.c | 63 +++++++++++++++++++++++++++
2 files changed, 92 insertions(+)
---
base-commit: 19272b37aa4f83ca52bdf9c16d5d81bdd1354494
change-id: 20250528-arm64-gcs-vfork-exit-4a7daf7652ee
Best regards,
--
Mark Brown <broonie(a)kernel.org>
Mark Rutland's recent SME fixes updated the SME ABI to reject any
attempt to write FPSIMD register data via the streaming mode SVE
register set but did not update the sve-ptrace test to take account of
this, resulting in spurious failures. Update the test for this, and
also fix another preexisting issue I noticed while looking at this.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v2:
- Rebase onto v6.16-rc1.
- Update fixes tag for patch 1.
- Link to v1: https://lore.kernel.org/r/20250523-kselftest-arm64-ssve-fixups-v1-0-65069a2…
---
Mark Brown (3):
kselftest/arm64: Fix check for setting new VLs in sve-ptrace
kselftest/arm64: Fix test for streaming FPSIMD write in sve-ptrace
kselftest/arm64: Specify SVE data when testing VL set in sve-ptrace
tools/testing/selftests/arm64/fp/sve-ptrace.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
---
base-commit: 19272b37aa4f83ca52bdf9c16d5d81bdd1354494
change-id: 20250523-kselftest-arm64-ssve-fixups-b68ae61c1ebf
Best regards,
--
Mark Brown <broonie(a)kernel.org>
IDT event delivery has a debug hole in which it does not generate #DB
upon returning to userspace before the first userspace instruction is
executed if the Trap Flag (TF) is set.
FRED closes this hole by introducing a software event flag, i.e., bit
17 of the augmented SS: if the bit is set and ERETU would result in
RFLAGS.TF = 1, a single-step trap will be pending upon completion of
ERETU.
However I overlooked properly setting and clearing the bit in different
situations. Thus when FRED is enabled, if the Trap Flag (TF) is set
without an external debugger attached, it can lead to an infinite loop
in the SIGTRAP handler. To avoid this, the software event flag in the
augmented SS must be cleared, ensuring that no single-step trap remains
pending when ERETU completes.
This patch set combines the fix [1] and its corresponding selftest [2]
(requested by Dave Hansen) into one patch set.
[1] https://lore.kernel.org/lkml/20250523050153.3308237-1-xin@zytor.com/
[2] https://lore.kernel.org/lkml/20250530230707.2528916-1-xin@zytor.com/
This patch set is based on tip/x86/urgent branch as of today.
Link to v4 of this patch set:
https://lore.kernel.org/lkml/20250605181020.590459-1-xin@zytor.com/
Changes in v5:
*) Accurately rephrase the shortlog (hpa).
*) Do "sub $-128, %rsp" rather than "add $128, %rsp", which is more
efficient in code size (hpa).
*) Add TB from Sohil.
*) Add Cc: stable(a)vger.kernel.org to all patches.
Xin Li (Intel) (2):
x86/fred/signal: Prevent immediate repeat of single step trap on
return from SIGTRAP handler
selftests/x86: Add a test to detect infinite sigtrap handler loop
arch/x86/include/asm/sighandling.h | 22 +++++
arch/x86/kernel/signal_32.c | 4 +
arch/x86/kernel/signal_64.c | 4 +
tools/testing/selftests/x86/Makefile | 2 +-
tools/testing/selftests/x86/sigtrap_loop.c | 98 ++++++++++++++++++++++
5 files changed, 129 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/x86/sigtrap_loop.c
base-commit: dd2922dcfaa3296846265e113309e5f7f138839f
--
2.49.0
Hello,
this series is a revival of Xu Kuhoai's work to enable larger arguments
count for BPF programs on ARM64 ([1]). His initial series received some
positive feedback, but lacked some specific case handling around
arguments alignment (see AAPCS64 C.14 rule in section 6.8.2, [2]). There
as been another attempt from Puranjay Mohan, which was unfortunately
missing the same thing ([3]). Since there has been some time between
those series and this new one, I chose to send it as a new series
rather than a new revision of the existing series.
To support the increased argument counts and arguments larger than
registers size (eg: structures), the trampoline does the following:
- for bpf programs: arguments are retrieved from both registers and the
function stack, and pushed in the trampoline stack as an array of u64
to generate the programs context. It is then passed by pointer to the
bpf programs
- when the trampoline is in charge of calling the original function: it
restores the registers content, and generates a new stack layout for
the additional arguments that do not fit in registers.
This new attempt is based on Xu's series and aims to handle the
missing alignment concern raised in the reviews discussions. The main
novelties are then around arguments alignments:
- the first commit is exposing some new info in the BTF function model
passed to the JIT compiler to allow it to deduce the needed alignment
when configuring the trampoline stack
- the second commit is taken from Xu's series, and received the
following modifications:
- the calc_aux_args computes an expected alignment for each argument
- the calc_aux_args computes two different stack space sizes: the one
needed to store the bpf programs context, and the original function
stacked arguments (which needs alignment). Those stack sizes are in
bytes instead of "slots"
- when saving/restoring arguments for bpf program or for the original
function, make sure to align the load/store accordingly, when
relevant
- a few typos fixes and some rewording, raised by the review on the
original series
- the last commit introduces some explicit tests that ensure that the
needed alignment is enforced by the trampoline
I marked the series as RFC because it appears that the new tests trigger
some failures in CI on x86 and s390, despite the series not touching any
code related to those architectures. Some very early investigation/gdb
debugging on the x86 side seems to hint that it could be related to the
same missing alignment too (based on section 3.2.3 in [4], and so the
x86 trampoline would need the same alignment handling ?). For s390 it
looks less clear, as all values captured from the bpf test program are
set to 0 in the CI output, and I don't have the proper setup yet to
check the low level details. I am tempted to isolate those new tests
(which were actually useful to spot real issues while tuning the ARM64
trampoline) and add them to the relevant DENYLIST files for x86/s390,
but I guess this is not the right direction, so I would gladly take a
second opinion on this.
[1] https://lore.kernel.org/all/20230917150752.69612-1-xukuohai@huaweicloud.com…
[2] https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#id82
[3] https://lore.kernel.org/bpf/20240705125336.46820-1-puranjay@kernel.org/
[4] https://refspecs.linuxbase.org/elf/x86_64-abi-0.99.pdf
Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore(a)bootlin.com>
---
Alexis Lothoré (eBPF Foundation) (3):
bpf: add struct largest member size in func model
bpf/selftests: add tests to validate proper arguments alignment on ARM64
bpf/selftests: enable tracing tests for ARM64
Xu Kuohai (1):
bpf, arm64: Support up to 12 function arguments
arch/arm64/net/bpf_jit_comp.c | 235 ++++++++++++++++-----
include/linux/bpf.h | 1 +
kernel/bpf/btf.c | 25 +++
tools/testing/selftests/bpf/DENYLIST.aarch64 | 3 -
.../selftests/bpf/prog_tests/tracing_struct.c | 23 ++
tools/testing/selftests/bpf/progs/tracing_struct.c | 10 +-
.../selftests/bpf/progs/tracing_struct_many_args.c | 67 ++++++
.../testing/selftests/bpf/test_kmods/bpf_testmod.c | 50 +++++
8 files changed, 357 insertions(+), 57 deletions(-)
---
base-commit: 91e7eb701b4bc389e7ddfd80ef6e82d1a6d2d368
change-id: 20250220-many_args_arm64-8bd3747e6948
Best regards,
--
Alexis Lothoré, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com
Currently each of the iommu page table formats duplicates all of the logic
to maintain the page table and perform map/unmap/etc operations. There are
several different versions of the algorithms between all the different
formats. The io-pgtable system provides an interface to help isolate the
page table code from the iommu driver, but doesn't provide tools to
implement the common algorithms.
This makes it very hard to improve the state of the pagetable code under
the iommu domains as any proposed improvement needs to alter a large
number of different driver code paths. Combined with a lack of software
based testing this makes improvement in this area very hard.
iommufd wants several new page table operations:
- More efficient map/unmap operations, using iommufd's batching logic
- unmap that returns the physical addresses into a batch as it progresses
- cut that allows splitting areas so large pages can have holes
poked in them dynamically (ie guestmemfd hitless shared/private
transitions)
- More agressive freeing of table memory to avoid waste
- Fragmenting large pages so that dirty tracking can be more granular
- Reassembling large pages so that VMs can run at full IO performance
in migration/dirty tracking error flows
- KHO integration for kernel live upgrade
Together these are algorithmically complex enough to be a very significant
task to go and implement in all the page table formats we support. Just
the "server" focused drivers use almost all the formats (ARMv8 S1&S2 / x86
PAE / AMDv1 / VT-D SS / RISCV)
Instead of doing the duplicated work, this series takes the first step to
consolidate the algorithms into one places. In spirit it is similar to the
work Christoph did a few years back to pull the redundant get_user_pages()
implementations out of the arch code into core MM. This unlocked a great
deal of improvement in that space in the following years. I would like to
see the same benefit in iommu as well.
My first RFC showed a bigger picture with all most all formats and more
algorithms. This series reorganizes that to be narrowly focused on just
enough to convert the AMD driver to use the new mechanism.
kunit tests are provided that allow good testing of the algorithms and all
formats on x86, nothing is arch specific.
AMD is one of the simpler options as the HW is quite uniform with few
different options/bugs while still requiring the complicated contiguous
pages support. The HW also has a very simple range based invalidation
approach that is easy to implement.
The AMD v1 and AMD v2 page table formats are implemented bit for bit
identical to the current code, tested using a compare kunit test that
checks against the io-pgtable version (on github, see below).
Updating the AMD driver to replace the io-pgtable layer with the new stuff
is fairly straightforward now. The layering is fixed up in the new version
so that all the invalidation goes through function pointers.
Several small fixing patches have come out of this as I've been fixing the
problems that the test suite uncovers in the current code, and
implementing the fixed version in iommupt.
On performance, there is a quite wide variety of implementation designs
across all the drivers. Looking at some key performance across
the main formats:
iommu_map():
pgsz ,avg new,old ns, min new,old ns , min % (+ve is better)
2^12, 53,66 , 51,63 , 19.19 (AMDV1)
256*2^12, 386,1909 , 367,1795 , 79.79
256*2^21, 362,1633 , 355,1556 , 77.77
2^12, 56,62 , 52,59 , 11.11 (AMDv2)
256*2^12, 405,1355 , 357,1292 , 72.72
256*2^21, 393,1160 , 358,1114 , 67.67
2^12, 55,65 , 53,62 , 14.14 (VTD second stage)
256*2^12, 391,518 , 332,512 , 35.35
256*2^21, 383,635 , 336,624 , 46.46
2^12, 57,65 , 55,63 , 12.12 (ARM 64 bit)
256*2^12, 380,389 , 361,369 , 2.02
256*2^21, 358,419 , 345,400 , 13.13
iommu_unmap():
pgsz ,avg new,old ns, min new,old ns , min % (+ve is better)
2^12, 69,88 , 65,85 , 23.23 (AMDv1)
256*2^12, 353,6498 , 331,6029 , 94.94
256*2^21, 373,6014 , 360,5706 , 93.93
2^12, 71,72 , 66,69 , 4.04 (AMDv2)
256*2^12, 228,891 , 206,871 , 76.76
256*2^21, 254,721 , 245,711 , 65.65
2^12, 69,87 , 65,82 , 20.20 (VTD second stage)
256*2^12, 210,321 , 200,315 , 36.36
256*2^21, 255,349 , 238,342 , 30.30
2^12, 72,77 , 68,74 , 8.08 (ARM 64 bit)
256*2^12, 521,357 , 447,346 , -29.29
256*2^21, 489,358 , 433,345 , -25.25
* Above numbers include additional patches to remove the iommu_pgsize()
overheads. gcc 13.3.0, i7-12700
This version provides fairly consistent performance across formats. ARM
unmap performance is quite different because this version supports
contiguous pages and uses a very different algorithm for unmapping. Though
why it is so worse compared to AMDv1 I haven't figured out yet.
The per-format commits include a more detailed chart.
There is a second branch:
https://github.com/jgunthorpe/linux/commits/iommu_pt_all
Containing supporting work and future steps:
- ARM short descriptor (32 bit), ARM long descriptor (64 bit) formats
- VT-D second stage format
- DART v1 & v2 format
- Draft of a iommufd 'cut' operation to break down huge pages
- Draft of support for a DMA incoherent HW page table walker
- A compare test that checks the iommupt formats against the iopgtable
interface, including updating AMD to have a working iopgtable and patches
to make VT-D have an iopgtable for testing.
- A performance test to micro-benchmark map and unmap against iogptable
My strategy is to go one by one for the drivers:
- AMD driver conversion
- RISCV page table and driver
- Intel VT-D driver and VTDSS page table
- ARM SMMUv3
And concurrently work on the algorithm side:
- debugfs content dump, like VT-D has
- Cut support
- Increase/Decrease page size support
- map/unmap batching
- KHO
As we make more algorithm improvements the value to convert the drivers
increases.
This is on github: https://github.com/jgunthorpe/linux/commits/iommu_pt
v1:
- AMD driver only, many code changes
RFC: https://lore.kernel.org/all/0-v1-01fa10580981+1d-iommu_pt_jgg@nvidia.com/
Alejandro Jimenez (1):
iommu/amd: Use the generic iommu page table
Jason Gunthorpe (14):
genpt: Generic Page Table base API
genpt: Add Documentation/ files
iommupt: Add the basic structure of the iommu implementation
iommupt: Add the AMD IOMMU v1 page table format
iommupt: Add iova_to_phys op
iommupt: Add unmap_pages op
iommupt: Add map_pages op
iommupt: Add read_and_clear_dirty op
iommupt: Add a kunit test for Generic Page Table
iommupt: Add a mock pagetable format for iommufd selftest to use
iommufd: Change the selftest to use iommupt instead of xarray
iommupt: Add the x86 64 bit page table format
iommu/amd: Remove AMD io_pgtable support
iommupt: Add a kunit test for the IOMMU implementation
.clang-format | 1 +
Documentation/driver-api/generic_pt.rst | 105 ++
Documentation/driver-api/index.rst | 1 +
drivers/iommu/Kconfig | 2 +
drivers/iommu/Makefile | 1 +
drivers/iommu/amd/Kconfig | 5 +-
drivers/iommu/amd/Makefile | 2 +-
drivers/iommu/amd/amd_iommu.h | 1 -
drivers/iommu/amd/amd_iommu_types.h | 109 +-
drivers/iommu/amd/io_pgtable.c | 560 --------
drivers/iommu/amd/io_pgtable_v2.c | 370 ------
drivers/iommu/amd/iommu.c | 493 ++++---
drivers/iommu/generic_pt/.kunitconfig | 13 +
drivers/iommu/generic_pt/Kconfig | 72 ++
drivers/iommu/generic_pt/fmt/Makefile | 26 +
drivers/iommu/generic_pt/fmt/amdv1.h | 407 ++++++
drivers/iommu/generic_pt/fmt/defs_amdv1.h | 21 +
drivers/iommu/generic_pt/fmt/defs_x86_64.h | 21 +
drivers/iommu/generic_pt/fmt/iommu_amdv1.c | 15 +
drivers/iommu/generic_pt/fmt/iommu_mock.c | 10 +
drivers/iommu/generic_pt/fmt/iommu_template.h | 48 +
drivers/iommu/generic_pt/fmt/iommu_x86_64.c | 12 +
drivers/iommu/generic_pt/fmt/x86_64.h | 241 ++++
drivers/iommu/generic_pt/iommu_pt.h | 1146 +++++++++++++++++
drivers/iommu/generic_pt/kunit_generic_pt.h | 721 +++++++++++
drivers/iommu/generic_pt/kunit_iommu.h | 183 +++
drivers/iommu/generic_pt/kunit_iommu_pt.h | 451 +++++++
drivers/iommu/generic_pt/pt_common.h | 351 +++++
drivers/iommu/generic_pt/pt_defs.h | 312 +++++
drivers/iommu/generic_pt/pt_fmt_defaults.h | 193 +++
drivers/iommu/generic_pt/pt_iter.h | 638 +++++++++
drivers/iommu/generic_pt/pt_log2.h | 130 ++
drivers/iommu/io-pgtable.c | 4 -
drivers/iommu/iommufd/Kconfig | 1 +
drivers/iommu/iommufd/iommufd_test.h | 11 +-
drivers/iommu/iommufd/selftest.c | 439 +++----
include/linux/generic_pt/common.h | 166 +++
include/linux/generic_pt/iommu.h | 264 ++++
include/linux/io-pgtable.h | 2 -
tools/testing/selftests/iommu/iommufd.c | 60 +-
tools/testing/selftests/iommu/iommufd_utils.h | 12 +
41 files changed, 6046 insertions(+), 1574 deletions(-)
create mode 100644 Documentation/driver-api/generic_pt.rst
delete mode 100644 drivers/iommu/amd/io_pgtable.c
delete mode 100644 drivers/iommu/amd/io_pgtable_v2.c
create mode 100644 drivers/iommu/generic_pt/.kunitconfig
create mode 100644 drivers/iommu/generic_pt/Kconfig
create mode 100644 drivers/iommu/generic_pt/fmt/Makefile
create mode 100644 drivers/iommu/generic_pt/fmt/amdv1.h
create mode 100644 drivers/iommu/generic_pt/fmt/defs_amdv1.h
create mode 100644 drivers/iommu/generic_pt/fmt/defs_x86_64.h
create mode 100644 drivers/iommu/generic_pt/fmt/iommu_amdv1.c
create mode 100644 drivers/iommu/generic_pt/fmt/iommu_mock.c
create mode 100644 drivers/iommu/generic_pt/fmt/iommu_template.h
create mode 100644 drivers/iommu/generic_pt/fmt/iommu_x86_64.c
create mode 100644 drivers/iommu/generic_pt/fmt/x86_64.h
create mode 100644 drivers/iommu/generic_pt/iommu_pt.h
create mode 100644 drivers/iommu/generic_pt/kunit_generic_pt.h
create mode 100644 drivers/iommu/generic_pt/kunit_iommu.h
create mode 100644 drivers/iommu/generic_pt/kunit_iommu_pt.h
create mode 100644 drivers/iommu/generic_pt/pt_common.h
create mode 100644 drivers/iommu/generic_pt/pt_defs.h
create mode 100644 drivers/iommu/generic_pt/pt_fmt_defaults.h
create mode 100644 drivers/iommu/generic_pt/pt_iter.h
create mode 100644 drivers/iommu/generic_pt/pt_log2.h
create mode 100644 include/linux/generic_pt/common.h
create mode 100644 include/linux/generic_pt/iommu.h
base-commit: db37090502f67e46541e53b91f00bbd565c96bd0
--
2.43.0
Some unit tests intentionally trigger warning backtraces by passing bad
parameters to kernel API functions. Such unit tests typically check the
return value from such calls, not the existence of the warning backtrace.
Such intentionally generated warning backtraces are neither desirable
nor useful for a number of reasons:
- They can result in overlooked real problems.
- A warning that suddenly starts to show up in unit tests needs to be
investigated and has to be marked to be ignored, for example by
adjusting filter scripts. Such filters are ad hoc because there is
no real standard format for warnings. On top of that, such filter
scripts would require constant maintenance.
One option to address the problem would be to add messages such as
"expected warning backtraces start/end here" to the kernel log.
However, that would again require filter scripts, might result in
missing real problematic warning backtraces triggered while the test
is running, and the irrelevant backtrace(s) would still clog the
kernel log.
Solve the problem by providing a means to identify and suppress specific
warning backtraces while executing test code. Support suppressing multiple
backtraces while at the same time limiting changes to generic code to the
absolute minimum.
Overview:
Patch#1 Introduces the suppression infrastructure.
Patch#2 Mitigate the impact at WARN*() sites.
Patch#3 Adds selftests to validate the functionality.
Patch#4 Demonstrates real-world usage in the DRM subsystem.
Patch#5 Documents the new API and usage guidelines.
Design Notes:
The objective is to suppress unwanted WARN*() generated messages.
Although most major architectures share common bug handling via `lib/bug.c`
and `report_bug()`, some minor or legacy architectures still rely on their
own platform-specific handling. This divergence must be considered in any
such feature. Additionally, a key challenge in implementing this feature is
the fragmentation of `WARN*()` messages emission: specific part in the
macro, common with BUG*() part in the exception handler.
As a result, any intervention to suppress the message must occur before the
illegal instruction.
Lessons from the Previous Attempt
In earlier iterations, suppression logic was added inside the
`__report_bug()` function to intercept WARN*() messages not producing
messages in the macro.
To implement the check in the check in the bug handler code, two strategies
were considered:
* Strategy #1: Use `kallsyms` to infer the originating functionid, namely
a pointer to the function. Since in any case, the user interface relies
on function names, they must be translated in addresses at suppression-
time or at check-time.
Assuming to translate at suppression-time, the `kallsyms` subsystem needs
to be used to determine the symbol address from the name, and again to
produce the functionid from `bugaddr`. This approach proved unreliable
due to compiler-induced transformations such as inlining, cloning, and
code fragmentation. Attempts to preventing them is also unconvenient
because several `WARN()` sites are in functions intentionally declared
as `__always_inline`.
* Strategy #2: Store function name `__func__` in `struct bug_entry` in
the `__bug_table`. This implementation was used in the previous version.
However, `__func__` is a compiler-generated symbol, which complicates
relocation and linking in position-independent code. Workarounds such as
storing offsets from `.rodata` or embedding string literals directly into
the table would have significantly either increased complexity or
increase the __bug_table size.
Additionally, architectures not using the unified `BUG()` path would
still require ad-hoc handling. Because current WARN*() message production
strategy, a few WARN*() macros still need a check to suppress the part of
the message produced in the macro itself.
Current Proposal: Check Directly in the `WARN()` Macros.
This avoids the need for function symbol resolution or ELF section
modification.
Suppression is implemented directly in the `WARN*()` macros.
A helper function, `__kunit_is_suppressed_warning()`, is used to determine
whether suppression applies. It is marked as `noinstr`, since some `WARN*()`
sites reside in non-instrumentable sections. As it uses `strcmp`, a
`noinstr` version of `strcmp` was introduced.
The implementation is deliberately simple and avoids architecture-specific
optimizations to preserve portability. Since this mechanism compares
function names and is intended for test usage only, performance is not a
primary concern.
This series is based on the RFC patch and subsequent discussion at
https://patchwork.kernel.org/project/linux-kselftest/patch/02546e59-1afe-4b…
and offers a more comprehensive solution of the problem discussed there.
Changes since RFC:
- Introduced CONFIG_KUNIT_SUPPRESS_BACKTRACE
- Minor cleanups and bug fixes
- Added support for all affected architectures
- Added support for counting suppressed warnings
- Added unit tests using those counters
- Added patch to suppress warning backtraces in dev_addr_lists tests
Changes since v1:
- Rebased to v6.9-rc1
- Added Tested-by:, Acked-by:, and Reviewed-by: tags
[I retained those tags since there have been no functional changes]
- Introduced KUNIT_SUPPRESS_BACKTRACE configuration option, enabled by
default.
Changes since v2:
- Rebased to v6.9-rc2
- Added comments to drm warning suppression explaining why it is needed.
- Added patch to move conditional code in arch/sh/include/asm/bug.h
to avoid kerneldoc warning
- Added architecture maintainers to Cc: for architecture specific patches
- No functional changes
Changes since v3:
- Rebased to v6.14-rc6
- Dropped net: "kunit: Suppress lock warning noise at end of dev_addr_lists tests"
since 3db3b62955cd6d73afde05a17d7e8e106695c3b9
- Added __kunit_ and KUNIT_ prefixes.
- Tested on interessed architectures.
Changes since v4:
- Rebased to v6.15-rc7
- Dropped all code in __report_bug()
- Moved all checks in WARN*() macros.
- Dropped all architecture specific code.
- Made __kunit_is_suppressed_warning nice to noinstr functions.
Alessandro Carminati (2):
bug/kunit: Core support for suppressing warning backtraces
bug/kunit: Suppressing warning backtraces reduced impact on WARN*()
sites
Guenter Roeck (3):
Add unit tests to verify that warning backtrace suppression works.
drm: Suppress intentional warning backtraces in scaling unit tests
kunit: Add documentation for warning backtrace suppression API
Documentation/dev-tools/kunit/usage.rst | 30 ++++++-
drivers/gpu/drm/tests/drm_rect_test.c | 16 ++++
include/asm-generic/bug.h | 48 +++++++----
include/kunit/bug.h | 62 ++++++++++++++
include/kunit/test.h | 1 +
lib/kunit/Kconfig | 9 ++
lib/kunit/Makefile | 9 +-
lib/kunit/backtrace-suppression-test.c | 105 ++++++++++++++++++++++++
lib/kunit/bug.c | 54 ++++++++++++
9 files changed, 316 insertions(+), 18 deletions(-)
create mode 100644 include/kunit/bug.h
create mode 100644 lib/kunit/backtrace-suppression-test.c
create mode 100644 lib/kunit/bug.c
--
2.34.1
The vIOMMU object is designed to represent a slice of an IOMMU HW for its
virtualization features shared with or passed to user space (a VM mostly)
in a way of HW acceleration. This extended the HWPT-based design for more
advanced virtualization feature.
HW QUEUE introduced by this series as a part of the vIOMMU infrastructure
represents a HW accelerated queue/buffer for VM to use exclusively, e.g.
- NVIDIA's Virtual Command Queue
- AMD vIOMMU's Command Buffer, Event Log Buffer, and PPR Log Buffer
each of which allows its IOMMU HW to directly access a queue memory owned
by a guest VM and allows a guest OS to control the HW queue direclty, to
avoid VM Exit overheads to improve the performance.
Introduce IOMMUFD_OBJ_HW_QUEUE and its pairing IOMMUFD_CMD_HW_QUEUE_ALLOC
allowing VMM to forward the IOMMU-specific queue info, such as queue base
address, size, and etc.
Meanwhile, a guest-owned queue needs the guest kernel to control the queue
by reading/writing its consumer and producer indexes, via MMIO acceses to
the hardware MMIO registers. Introduce an mmap infrastructure for iommufd
to support passing through a piece of MMIO region from the host physical
address space to the guest physical address space. The mmap info (offset/
length) used by an mmap syscall must be pre-allocated and returned to the
user space via an output driver-data during an IOMMUFD_CMD_HW_QUEUE_ALLOC
call. Thus, it requires a driver-specific user data support in the vIOMMU
allocation flow.
As a real-world use case, this series implements a HW QUEUE support in the
tegra241-cmdqv driver for VCMDQs on NVIDIA Grace CPU. In another word, it
is also the Tegra CMDQV series Part-2 (user-space support), reworked from
Previous RFCv1:
https://lore.kernel.org/all/cover.1712978212.git.nicolinc@nvidia.com/
This enables the HW accelerated feature for NVIDIA Grace CPU. Compared to
the standard SMMUv3 operating in the nested translation mode trapping CMDQ
for TLBI and ATC_INV commands, this gives a huge performance improvement:
70% to 90% reductions of invalidation time were measured by various DMA
unmap tests running in a guest OS.
// Unmap latencies from "dma_map_benchmark -g @granule -t @threads",
// by toggling "/sys/kernel/debug/iommu/tegra241_cmdqv/bypass_vcmdq"
@granule | @threads | bypass_vcmdq=1 | bypass_vcmdq=0
4KB 1 35.7 us 5.3 us
16KB 1 41.8 us 6.8 us
64KB 1 68.9 us 9.9 us
128KB 1 109.0 us 12.6 us
256KB 1 187.1 us 18.0 us
4KB 2 96.9 us 6.8 us
16KB 2 97.8 us 7.5 us
64KB 2 151.5 us 10.7 us
128KB 2 257.8 us 12.7 us
256KB 2 443.0 us 17.9 us
This is on Github:
https://github.com/nicolinc/iommufd/commits/iommufd_hw_queue-v5
Paring QEMU branch for testing:
https://github.com/nicolinc/qemu/commits/wip/for_iommufd_hw_queue-v5
Changelog
v5
* Rebase on v6.15-rc6
* Add Reviewed-by from Jason and Kevin
* Correct typos in kdoc and update commit logs
* [iommufd] Add a cosmetic fix
* [iommufd] Drop unused num_pfns
* [iommufd] Drop unnecessary check
* [iommufd] Reorder patch sequence
* [iommufd] Use io_remap_pfn_range()
* [iommufd] Use success oriented flow
* [iommufd] Fix max_npages calculation
* [iommufd] Add more selftest coverage
* [iommufd] Drop redundant static_assert
* [iommufd] Fix mmap pfn range validation
* [iommufd] Reject unmap on pinned iovas
* [iommufd] Drop redundant vm_flags_set()
* [iommufd] Drop iommufd_struct_destroy()
* [iommufd] Drop redundant queue iova test
* [iommufd] Use "mmio_addr" and "mmio_pfn"
* [iommufd] Rename to "nesting_parent_iova"
* [iommufd] Make iopt_pin_pages call option
* [iommufd] Add ictx comparison in depend()
* [iommufd] Add iommufd_object_alloc_ucmd()
* [iommufd] Move kcalloc() after validations
* [iommufd] Replace ictx setting with WARN_ON
* [iommufd] Make hw_info's type bidirectional
* [smmu] Add supported_vsmmu_type in impl_ops
* [smmu] Drop impl report in smmu vendor struct
* [tegra] Add IOMMU_HW_INFO_TYPE_TEGRA241_CMDQV
* [tegra] Replace "number of VINTFs" with a note
* [tegra] Drop the redundant lvcmdq pointer setting
* [tegra] Flag IOMMUFD_VIOMMU_FLAG_HW_QUEUE_READS_PA
* [tegra] Use "vintf_alloc_vsid" for vdevice_alloc op
v4
https://lore.kernel.org/all/cover.1746757630.git.nicolinc@nvidia.com/
* Rebase on v6.15-rc5
* Add Reviewed-by from Vasant
* Rename "vQUEUE" to "HW QUEUE"
* Use "offset" and "length" for all mmap-related variables
* [iommufd] Use u64 for guest PA
* [iommufd] Fix typo in uAPI doc
* [iommufd] Rename immap_id to offset
* [iommufd] Drop the partial-size mmap support
* [iommufd] Do not replace WARN_ON with WARN_ON_ONCE
* [iommufd] Use "u64 base_addr" for queue base address
* [iommufd] Use u64 base_pfn/num_pfns for immap structure
* [iommufd] Correct the size passed in to mtree_alloc_range()
* [iommufd] Add IOMMUFD_VIOMMU_FLAG_HW_QUEUE_READS_PA to viommu_ops
v3
https://lore.kernel.org/all/cover.1746139811.git.nicolinc@nvidia.com/
* Add Reviewed-by from Baolu, Pranjal, and Alok
* Revise kdocs, uAPI docs, and commit logs
* Rename "vCMDQ" back to "vQUEUE" for AMD cases
* [tegra] Add tegra241_vcmdq_hw_flush_timeout()
* [tegra] Rename vsmmu_alloc to alloc_vintf_user
* [tegra] Use writel for SID replacement registers
* [tegra] Move mmap removal call to vsmmu_destroy op
* [tegra] Fix revert in tegra241_vintf_alloc_lvcmdq_user()
* [iommufd] Replace "& ~PAGE_MASK" with PAGE_ALIGNED()
* [iommufd] Add an object-type "owner" to immap structure
* [iommufd] Drop the ictx input in the new for-driver APIs
* [iommufd] Add iommufd_vma_ops to keep track of mmap lifecycle
* [iommufd] Add viommu-based iommufd_viommu_alloc/destroy_mmap helpers
* [iommufd] Rename iommufd_ctx_alloc/free_mmap to
_iommufd_alloc/destroy_mmap
v2
https://lore.kernel.org/all/cover.1745646960.git.nicolinc@nvidia.com/
* Add Reviewed-by from Jason
* [smmu] Fix vsmmu initial value
* [smmu] Support impl for hw_info
* [tegra] Rename "slot" to "vsid"
* [tegra] Update kdocs and commit logs
* [tegra] Map/unmap LVCMDQ dynamically
* [tegra] Refcount the previous LVCMDQ
* [tegra] Return -EEXIST if LVCMDQ exists
* [tegra] Simplify VINTF cleanup routine
* [tegra] Use vmid and s2_domain in vsmmu
* [tegra] Rename "mmap_pgoff" to "immap_id"
* [tegra] Add more addr and length validation
* [iommufd] Add more narrative to mmap's kdoc
* [iommufd] Add iommufd_struct_depend/undepend()
* [iommufd] Rename vcmdq_free op to vcmdq_destroy
* [iommufd] Fix bug in iommu_copy_struct_to_user()
* [iommufd] Drop is_io from iommufd_ctx_alloc_mmap()
* [iommufd] Test the queue memory for its contiguity
* [iommufd] Return -ENXIO if address or length fails
* [iommufd] Do not change @min_last in mock_viommu_alloc()
* [iommufd] Generalize TEGRA241_VCMDQ data in core structure
* [iommufd] Add selftest coverage for IOMMUFD_CMD_VCMDQ_ALLOC
* [iommufd] Add iopt_pin_pages() to prevent queue memory from unmapping
v1
https://lore.kernel.org/all/cover.1744353300.git.nicolinc@nvidia.com/
Thanks
Nicolin
Nicolin Chen (29):
iommufd: Apply obvious cosmetic fixes
iommufd: Introduce iommufd_object_alloc_ucmd helper
iommu: Apply the new iommufd_object_alloc_ucmd helper
iommu: Add iommu_copy_struct_to_user helper
iommu: Pass in a driver-level user data structure to viommu_alloc op
iommufd/viommu: Allow driver-specific user data for a vIOMMU object
iommufd/selftest: Support user_data in mock_viommu_alloc
iommufd/selftest: Add coverage for viommu data
iommufd: Do not unmap an owned iopt_area
iommufd: Abstract iopt_pin_pages and iopt_unpin_pages helpers
iommufd/driver: Let iommufd_viommu_alloc helper save ictx to
viommu->ictx
iommufd/viommu: Add driver-allocated vDEVICE support
iommufd/viommu: Introduce IOMMUFD_OBJ_HW_QUEUE and its related struct
iommufd/viommu: Add IOMMUFD_CMD_HW_QUEUE_ALLOC ioctl
iommufd/driver: Add iommufd_hw_queue_depend/undepend() helpers
iommufd/selftest: Add coverage for IOMMUFD_CMD_HW_QUEUE_ALLOC
iommufd: Add mmap interface
iommufd/selftest: Add coverage for the new mmap interface
Documentation: userspace-api: iommufd: Update HW QUEUE
iommu: Allow an input type in hw_info op
iommufd: Allow an input data_type via iommu_hw_info
iommufd/selftest: Update hw_info coverage for an input data_type
iommu/arm-smmu-v3-iommufd: Add vsmmu_alloc impl op
iommu/arm-smmu-v3-iommufd: Add hw_info to impl_ops
iommu/tegra241-cmdqv: Use request_threaded_irq
iommu/tegra241-cmdqv: Simplify deinit flow in
tegra241_cmdqv_remove_vintf()
iommu/tegra241-cmdqv: Do not statically map LVCMDQs
iommu/tegra241-cmdqv: Add user-space use support
iommu/tegra241-cmdqv: Add IOMMU_VEVENTQ_TYPE_TEGRA241_CMDQV support
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 28 +-
drivers/iommu/iommufd/io_pagetable.h | 15 +-
drivers/iommu/iommufd/iommufd_private.h | 41 +-
drivers/iommu/iommufd/iommufd_test.h | 20 +
include/linux/iommu.h | 53 +-
include/linux/iommufd.h | 221 +++++++-
include/uapi/linux/iommufd.h | 150 +++++-
tools/testing/selftests/iommu/iommufd_utils.h | 91 +++-
.../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c | 33 +-
.../iommu/arm/arm-smmu-v3/tegra241-cmdqv.c | 496 +++++++++++++++++-
drivers/iommu/intel/iommu.c | 4 +
drivers/iommu/iommufd/device.c | 137 +----
drivers/iommu/iommufd/driver.c | 97 ++++
drivers/iommu/iommufd/eventq.c | 14 +-
drivers/iommu/iommufd/hw_pagetable.c | 6 +-
drivers/iommu/iommufd/io_pagetable.c | 106 +++-
drivers/iommu/iommufd/iova_bitmap.c | 1 -
drivers/iommu/iommufd/main.c | 80 ++-
drivers/iommu/iommufd/pages.c | 19 +-
drivers/iommu/iommufd/selftest.c | 158 +++++-
drivers/iommu/iommufd/viommu.c | 146 +++++-
tools/testing/selftests/iommu/iommufd.c | 146 +++++-
.../selftests/iommu/iommufd_fail_nth.c | 15 +-
Documentation/userspace-api/iommufd.rst | 12 +
24 files changed, 1794 insertions(+), 295 deletions(-)
--
2.43.0
The bulk of these changes modify the cow and gup_longterm tests to
report unique and stable names for each test, bringing them into line
with the expectations of tooling that works with kselftest. The string
reported as a test result is used by tooling to both deduplicate tests
and track tests between test runs, using the same string for multiple
tests or changing the string depending on test result causes problems
for user interfaces and automation such as bisection.
It was suggested that converting to use kselftest_harness.h would be a
good way of addressing this, however that really wants the set of tests
to run to be known at compile time but both test programs dynamically
enumarate the set of huge page sizes the system supports and test each.
Refactoring to handle this would be even more invasive than these
changes which are large but straightforward and repetitive.
A version of the main gup_longterm cleanup was previously sent
separately, this version factors out the helpers for logging the start
of the test since the cow test looks very similar.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v2:
- Typo fixes.
- Link to v1: https://lore.kernel.org/r/20250522-selftests-mm-cow-dedupe-v1-0-713cee2fdd6…
---
Mark Brown (4):
selftests/mm: Use standard ksft_finished() in cow and gup_longterm
selftests/mm: Add helper for logging test start and results
selftests/mm: Report unique test names for each cow test
selftests/mm: Fix test result reporting in gup_longterm
tools/testing/selftests/mm/cow.c | 340 +++++++++++++++++++-----------
tools/testing/selftests/mm/gup_longterm.c | 158 ++++++++------
tools/testing/selftests/mm/vm_util.h | 20 ++
3 files changed, 334 insertions(+), 184 deletions(-)
---
base-commit: a5806cd506af5a7c19bcd596e4708b5c464bfd21
change-id: 20250521-selftests-mm-cow-dedupe-33dcab034558
Best regards,
--
Mark Brown <broonie(a)kernel.org>
Commit 846742f7e32f ("selftests: drv-net: add a warning for
bkg + shell + terminate") added a warning for bkg() used
with terminate=True. The tso test was missed as we didn't
have it running anywhere in NIPA. Add exit_wait=True, to avoid:
# Warning: combining shell and terminate is risky!
# SIGTERM may not reach the child on zsh/ksh!
getting printed twice for every variant.
Fixes: 0d0f4174f6c8 ("selftests: drv-net: add a simple TSO test")
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
---
CC: shuah(a)kernel.org
CC: willemb(a)google.com
CC: linux-kselftest(a)vger.kernel.org
---
tools/testing/selftests/drivers/net/hw/tso.py | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/drivers/net/hw/tso.py b/tools/testing/selftests/drivers/net/hw/tso.py
index e1ecb92f79d9..150d6db241a0 100755
--- a/tools/testing/selftests/drivers/net/hw/tso.py
+++ b/tools/testing/selftests/drivers/net/hw/tso.py
@@ -39,7 +39,7 @@ from lib.py import bkg, cmd, defer, ethtool, ip, rand_port, wait_port_listen
port = rand_port()
listen_cmd = f"socat -{ipver} -t 2 -u TCP-LISTEN:{port},reuseport /dev/null,ignoreeof"
- with bkg(listen_cmd, host=cfg.remote) as nc:
+ with bkg(listen_cmd, host=cfg.remote, exit_wait=True) as nc:
wait_port_listen(port, host=cfg.remote)
if ipver == "4":
--
2.49.0
Add missing config options for the tso.py test, specifically
to make sure the kernel is built with vxlan and gre tunnels.
I noticed this while adding a TSO-capable device QEMU to the CI.
Previously we only run virtio tests and it doesn't report LSO
stats on the QEMU we have.
Fixes: 0d0f4174f6c8 ("selftests: drv-net: add a simple TSO test")
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
---
v2:
- drop NET_IP_TUNNEL
v1: https://lore.kernel.org/20250602231640.314556-1-kuba@kernel.org
CC: shuah(a)kernel.org
CC: willemb(a)google.com
CC: linux-kselftest(a)vger.kernel.org
---
tools/testing/selftests/drivers/net/hw/config | 5 +++++
1 file changed, 5 insertions(+)
create mode 100644 tools/testing/selftests/drivers/net/hw/config
diff --git a/tools/testing/selftests/drivers/net/hw/config b/tools/testing/selftests/drivers/net/hw/config
new file mode 100644
index 000000000000..88ae719e6f8f
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/hw/config
@@ -0,0 +1,5 @@
+CONFIG_IPV6=y
+CONFIG_IPV6_GRE=y
+CONFIG_NET_IPGRE=y
+CONFIG_NET_IPGRE_DEMUX=y
+CONFIG_VXLAN=y
--
2.49.0
Cong reported an issue where running 'test_sockmap' in the current
bpf-next tree results in an error [1].
The specific test case that triggered the error is a combined test
involving ktls and bpf_msg_pop_data().
Root Cause:
When sending plaintext data, we initially calculated the corresponding
ciphertext length. However, if we later reduced the plaintext data length
via socket policy, we failed to recalculate the ciphertext length.
This results in transmitting buffers containing uninitialized data during
ciphertext transmission.
This causes uninitialized bytes to be appended after a complete
"Application Data" packet, leading to errors on the receiving end when
parsing TLS record.
This issue has existed for a long time but was only exposed after the
following test code was merged.
commit 47eae080410b ("selftests/bpf: Add more tests for test_txmsg_push_pop in test_sockmap")
Although we already had tests for pop data before this commit, the
pop data length was insufficient (less than 5 bytes). This meant that the
corrupted TLS records with data length <5 bytes were cached without being
parsed, resulting in no error being triggered.
After this fix, all tests pass.
1/ 6 sockmap::txmsg test passthrough:OK
2/ 6 sockmap::txmsg test redirect:OK
3/ 2 sockmap::txmsg test redirect wait send mem:OK
4/ 6 sockmap::txmsg test drop:OK
5/ 6 sockmap::txmsg test ingress redirect:OK
6/ 7 sockmap::txmsg test skb:OK
7/12 sockmap::txmsg test apply:OK
8/12 sockmap::txmsg test cork:OK
9/ 3 sockmap::txmsg test hanging corks:OK
10/11 sockmap::txmsg test push_data:OK
11/17 sockmap::txmsg test pull-data:OK
12/ 9 sockmap::txmsg test pop-data:OK
13/ 6 sockmap::txmsg test push/pop data:OK
14/ 1 sockmap::txmsg test ingress parser:OK
15/ 1 sockmap::txmsg test ingress parser2:OK
16/ 6 sockhash::txmsg test passthrough:OK
17/ 6 sockhash::txmsg test redirect:OK
18/ 2 sockhash::txmsg test redirect wait send mem:OK
19/ 6 sockhash::txmsg test drop:OK
20/ 6 sockhash::txmsg test ingress redirect:OK
21/ 7 sockhash::txmsg test skb:OK
22/12 sockhash::txmsg test apply:OK
23/12 sockhash::txmsg test cork:OK
24/ 3 sockhash::txmsg test hanging corks:OK
25/11 sockhash::txmsg test push_data:OK
26/17 sockhash::txmsg test pull-data:OK
27/ 9 sockhash::txmsg test pop-data:OK
28/ 6 sockhash::txmsg test push/pop data:OK
29/ 1 sockhash::txmsg test ingress parser:OK
30/ 1 sockhash::txmsg test ingress parser2:OK
31/ 6 sockhash:ktls:txmsg test passthrough:OK
32/ 6 sockhash:ktls:txmsg test redirect:OK
33/ 2 sockhash:ktls:txmsg test redirect wait send mem:OK
34/ 6 sockhash:ktls:txmsg test drop:OK
35/ 6 sockhash:ktls:txmsg test ingress redirect:OK
36/ 7 sockhash:ktls:txmsg test skb:OK
37/12 sockhash:ktls:txmsg test apply:OK
38/12 sockhash:ktls:txmsg test cork:OK
39/ 3 sockhash:ktls:txmsg test hanging corks:OK
40/11 sockhash:ktls:txmsg test push_data:OK
41/17 sockhash:ktls:txmsg test pull-data:OK
42/ 9 sockhash:ktls:txmsg test pop-data:OK
43/ 6 sockhash:ktls:txmsg test push/pop data:OK
44/ 1 sockhash:ktls:txmsg test ingress parser:OK
45/ 0 sockhash:ktls:txmsg test ingress parser2:OK
Pass: 45 Fail: 0
[1]: https://lore.kernel.org/bpf/CAM_iQpU7=4xjbefZoxndKoX9gFFMOe7FcWMq5tHBsymbrn…
Jiayuan Chen (2):
bpf,ktls: Fix data corruption when using bpf_msg_pop_data() in ktls
selftests/bpf: Add test to cover ktls with bpf_msg_pop_data
net/tls/tls_sw.c | 15 +++
.../selftests/bpf/prog_tests/sockmap_ktls.c | 91 +++++++++++++++++++
.../selftests/bpf/progs/test_sockmap_ktls.c | 4 +
3 files changed, 110 insertions(+)
--
2.47.1
Some small fixes for arch_timer_edge_cases that I stumbled upon
while debugging failures for this selftest on ampere-one.
Changes since v1: modified patch 3 based on suggestions from Marc.
I've done some tests with this on various machines - seems to be all
good, however on ampere-one I now hit this in 10% of the runs:
==== Test Assertion Failure ====
arm64/arch_timer_edge_cases.c:481: timer_get_cntct(timer) >= DEF_CNT + (timer_get_cntfrq() * (uint64_t)(delta_2_ms) / 1000)
pid=166657 tid=166657 errno=4 - Interrupted system call
1 0x0000000000404db3: test_run at arch_timer_edge_cases.c:933
2 0x0000000000401f9f: main at arch_timer_edge_cases.c:1062
3 0x0000ffffaedd625b: ?? ??:0
4 0x0000ffffaedd633b: ?? ??:0
5 0x00000000004020af: _start at ??:?
timer_get_cntct(timer) >= DEF_CNT + msec_to_cycles(delta_2_ms)
This is not new, it was just hidden behind the other failure. I'll
try to figure out what this is about (seems to be independent of
the wait time)..
Sebastian Ott (3):
KVM: arm64: selftests: fix help text for arch_timer_edge_cases
KVM: arm64: selftests: fix thread migration in arch_timer_edge_cases
KVM: arm64: selftests: arch_timer_edge_cases - determine effective counter width
.../kvm/arm64/arch_timer_edge_cases.c | 37 ++++++++++++-------
1 file changed, 24 insertions(+), 13 deletions(-)
base-commit: 0ff41df1cb268fc69e703a08a57ee14ae967d0ca
--
2.49.0
Hello.
Running the mm selftests from the kernel's root directory
on an x86_64 debian machine using:
make defconfig
sudo make kselftest TARGETS=mm
the tests run normally till we reach one which stalls
for 180 seconds and times out according to the following logs:
```
-----------------------------------------------
running ./charge_reserved_hugetlb.sh -cgroup-v2
-----------------------------------------------
CLEANUP DONE
CLEANUP DONE
Test normal case.
private=, populate=, method=0, reserve=
nr hugepages = 10
writing cgroup limit: 20971520
writing reseravation limit: 20971520
Starting:
hugetlb_usage=0
reserved_usage=0
expect_failure is 0
Putting task in cgroup 'hugetlb_cgroup_test'
Method is 0
>>> write_hugetlb_memory.sh: line 22: ./write_to_hugetlbfs: No such file or directory <<<
Waiting for hugetlb memory reservation to reach size 10485760.
0
Waiting for hugetlb memory reservation to reach size 10485760.
0
...
Waiting for hugetlb memory reservation to reach size 10485760.
0
Waiting for hugetlb memory reservation to reach size 10485760.
0
not ok 1 selftests: mm: run_vmtests.sh # TIMEOUT 180 seconds
make[3]: Leaving directory '/linux/tools/testing/selftests/mm'
```
Logs show that the executable "write_to_hugetlbfs" is missing, causing
the test to hang waiting for hugepage reservations.
The executable not found means it was not built by the Make system.
It is mentioned in Makefile:136-142, and only built if ARCH is 64-bit
```
ifneq (,$(filter $(ARCH),arm64 mips64 parisc64 powerpc riscv64 s390x sparc64 x86_64 s390))
TEST_GEN_FILES += va_high_addr_switch
ifneq ($(ARCH),riscv64)
TEST_GEN_FILES += virtual_address_range
endif
TEST_GEN_FILES += write_to_hugetlbfs
endif
```
So, for some reason, the top-level Makefile provides ARCH as x86.
My proposed solution is similar to existing virtual_address_range check
that is to check for the binary, and if it is not found, skip these 2
test cases: charge_reserved_hugetlb.sh and hugetlb_reparenting_test.sh
since they directly and indirectly depend on write_to_hugetlbfs binary.
This is just a workaround, the root issue of different ARCH detection
when running tests from the kernel root directory should still be
addressed. I am not sure how to approach it and open for your suggestions.
Note that this issue does not happen when ran from selftests/mm using
something like
sudo make -C tools/testing/selftests/mm
because then mm/Makefile's ARCH detection runs correctly (x86_64)
Kindly review and share your thoughts.
Signed-off-by: Khaled Elnaggar <khaledelnaggarlinux(a)gmail.com>
---
tools/testing/selftests/mm/run_vmtests.sh | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh
index dddd1dd8af14..cdbcfdb62f8a 100755
--- a/tools/testing/selftests/mm/run_vmtests.sh
+++ b/tools/testing/selftests/mm/run_vmtests.sh
@@ -375,8 +375,13 @@ CATEGORY="process_mrelease" run_test ./mrelease_test
CATEGORY="mremap" run_test ./mremap_test
CATEGORY="hugetlb" run_test ./thuge-gen
+
+# the following depend on write_to_hugetlbfs binary
+if [ -x ./write_to_hugetlbfs ]; then
CATEGORY="hugetlb" run_test ./charge_reserved_hugetlb.sh -cgroup-v2
CATEGORY="hugetlb" run_test ./hugetlb_reparenting_test.sh -cgroup-v2
+fi
+
if $RUN_DESTRUCTIVE; then
nr_hugepages_tmp=$(cat /proc/sys/vm/nr_hugepages)
enable_soft_offline=$(cat /proc/sys/vm/enable_soft_offline)
--
2.47.2
Overview:
This series implements a new PMU scheme on ARM, a partitioned PMU
that exists alongside the existing emulated PMU and may be enabled by
the kernel command line kvm.reserved_host_counters or by the vcpu
ioctl KVM_ARM_PARTITION_PMU. This is a continuation of the RFC posted
earlier this year. [1]
The high level overview and reason for the name is that this
implementation takes advantage of recent CPU features to partition the
PMU counters into a host-reserved set and a guest-reserved set. Guests
are allowed untrapped hardware access to the most frequently used PMU
registers and features for the guest-reserved counters only.
This untrapped hardware access significantly reduces the overhead of
using performance monitoring capabilities such as the `perf` tool
inside a guest VM. Register accesses that aren't trapping to KVM mean
less time spent in the host kernel and more time on the workloads
guests care about. This optimization especially shines during high
`perf` sample rates or large numbers of events that require
multiplexing hardware counters.
Performance:
For example, the following tests were carried out on identical ARM
machines with 10 general purpose counters with identical guest images
run on QEMU, the only difference being my PMU implementation or the
existing one. Some arguments have been simplified here to clarify the
purpose of the test:
1) time perf record -e ${FIFTEEN_HW_EVENTS} -F 1000 -- \
gzip -c tmpfs/random.64M.img >/dev/null
On emulated PMU this command took 4.143s real time with 0.159s system
time. On partitioned PMU this command took 3.139s real time with
0.110s system time, runtime reductions of 24.23% and 30.82%.
2) time perf stat -dd -- \
automated_specint2017.sh
On emulated PMU this benchmark completed in 3789.16s real time with
224.45s system time and a final benchmark score of 4.28. On
partitioned PMU this benchmark completed in 3525.67s real time with
15.98s system time and a final benchmark score of 4.56. That is a
6.95% reduction in runtime, 92.88% reduction in system time, and
6.54% improvement in overall benchmark score.
Seeing these improvements on something as lightweight as perf stat is
remarkable and implies there would have been a much greater
improvement with perf record. I did not test that because I was not
confident it would even finish in a reasonable time on the emulated
PMU
Test 3 was slightly different, I ran the workload in a VM with a
single VCPU pinned to a physical CPU and analyzed from the host where
the physical CPU spent its time using mpstat.
3) perf record -e ${FIFTEEN_HW_EVENTS} -F 4000 -- \
stress-ng --cpu 0 --timeout 30
Over a period of 30s the cpu running with the emulated PMU spent
34.96% of the time in the host kernel and 55.85% of the time in the
guest. The cpu running the partitioned PMU spent 0.97% of its time in
the host kernel and 91.06% of its time in the guest.
Taken together, these tests represent a remarkable performance
improvement for anything perf related using this new PMU
implementation.
Caveats:
Because the most consistent and performant thing to do was untrap
PMCR_EL0, the number of counters visible to the guest via PMCR_EL0.N
is always equal to the value KVM sets for MDCR_EL2.HPMN. Previously
allowed writes to PMCR_EL0.N via {GET,SET}_ONE_REG no longer affect
the guest.
These improvements come at a cost to 7-35 new registers that must be
swapped at every vcpu_load and vcpu_put if the feature is enabled. I
have been informed KVM would like to avoid paying this cost when
possible.
One solution is to make the trapping changes and context swapping lazy
such that the trapping changes and context swapping only take place
after the guest has actually accessed the PMU so guests that never
access the PMU never pay the cost.
This is not done here because it is not crucial to the primary
functionality and I thought review would be more productive as soon as
I had something complete enough for reviewers to easily play with.
However, this or any better ideas are on the table for inclusion in
future re-rolls.
[1] https://lore.kernel.org/kvmarm/20250213180317.3205285-1-coltonlewis@google.…
Colton Lewis (16):
arm64: cpufeature: Add cpucap for HPMN0
arm64: Generate sign macro for sysreg Enums
arm64: cpufeature: Add cpucap for PMICNTR
KVM: arm64: Reorganize PMU functions
KVM: arm64: Introduce method to partition the PMU
perf: arm_pmuv3: Generalize counter bitmasks
perf: arm_pmuv3: Keep out of guest counter partition
KVM: arm64: Set up FGT for Partitioned PMU
KVM: arm64: Writethrough trapped PMEVTYPER register
KVM: arm64: Use physical PMSELR for PMXEVTYPER if partitioned
KVM: arm64: Writethrough trapped PMOVS register
KVM: arm64: Context switch Partitioned PMU guest registers
perf: pmuv3: Handle IRQs for Partitioned PMU guest counters
KVM: arm64: Inject recorded guest interrupts
KVM: arm64: Add ioctl to partition the PMU when supported
KVM: arm64: selftests: Add test case for partitioned PMU
Marc Zyngier (1):
KVM: arm64: Cleanup PMU includes
Documentation/virt/kvm/api.rst | 16 +
arch/arm/include/asm/arm_pmuv3.h | 24 +
arch/arm64/include/asm/arm_pmuv3.h | 36 +-
arch/arm64/include/asm/kvm_host.h | 208 +++++-
arch/arm64/include/asm/kvm_pmu.h | 82 +++
arch/arm64/kernel/cpufeature.c | 15 +
arch/arm64/kvm/Makefile | 2 +-
arch/arm64/kvm/arm.c | 24 +-
arch/arm64/kvm/debug.c | 13 +-
arch/arm64/kvm/hyp/include/hyp/switch.h | 65 +-
arch/arm64/kvm/pmu-emul.c | 629 +----------------
arch/arm64/kvm/pmu-part.c | 358 ++++++++++
arch/arm64/kvm/pmu.c | 630 ++++++++++++++++++
arch/arm64/kvm/sys_regs.c | 54 +-
arch/arm64/tools/cpucaps | 2 +
arch/arm64/tools/gen-sysreg.awk | 1 +
arch/arm64/tools/sysreg | 6 +-
drivers/perf/arm_pmuv3.c | 55 +-
include/kvm/arm_pmu.h | 199 ------
include/linux/perf/arm_pmu.h | 15 +-
include/linux/perf/arm_pmuv3.h | 14 +-
include/uapi/linux/kvm.h | 4 +
tools/include/uapi/linux/kvm.h | 2 +
.../selftests/kvm/arm64/vpmu_counter_access.c | 40 +-
virt/kvm/kvm_main.c | 1 +
25 files changed, 1616 insertions(+), 879 deletions(-)
create mode 100644 arch/arm64/include/asm/kvm_pmu.h
create mode 100644 arch/arm64/kvm/pmu-part.c
delete mode 100644 include/kvm/arm_pmu.h
base-commit: 1b85d923ba8c9e6afaf19e26708411adde94fba8
--
2.49.0.1204.g71687c7c1d-goog
As David suggested, currently we don't have a high level test case to
verify the behavior of rmap. This patch set introduce the verification
on rmap by migration.
Patch 1 is a preparation to move ksm related operation into vm_util.
Patch 2 is the new test case.
Currently it covers following four scenarios:
* anonymous page
* shmem page
* pagecache page
* ksm page
Wei Yang (2):
selftests/mm: put general ksm operation into vm_util
selftests/mm: assert rmap behave as expected
MAINTAINERS | 1 +
tools/testing/selftests/mm/.gitignore | 1 +
tools/testing/selftests/mm/Makefile | 3 +
.../selftests/mm/ksm_functional_tests.c | 76 +--
tools/testing/selftests/mm/rmap.c | 466 ++++++++++++++++++
tools/testing/selftests/mm/run_vmtests.sh | 4 +
tools/testing/selftests/mm/vm_util.c | 71 +++
tools/testing/selftests/mm/vm_util.h | 7 +
8 files changed, 563 insertions(+), 66 deletions(-)
create mode 100644 tools/testing/selftests/mm/rmap.c
--
2.34.1
Some failure modes are handled poorly by kublk. For example, if ublk_drv
is built as a module but not currently loaded into the kernel, ./kublk
add ... just hangs forever. This happens because in this case (and a few
others), the worker process does not notify its parent (via a write to
the shared eventfd) that it has tried and failed to initialize, so the
parent hangs forever. Fix this by ensuring that we always notify the
parent process of any initialization failure, and have the parent print
a (not very descriptive) log line when this happens.
Signed-off-by: Uday Shankar <ushankar(a)purestorage.com>
---
tools/testing/selftests/ublk/kublk.c | 34 +++++++++++++++++++++++-----------
1 file changed, 23 insertions(+), 11 deletions(-)
diff --git a/tools/testing/selftests/ublk/kublk.c b/tools/testing/selftests/ublk/kublk.c
index a98e14e4c245965d817b93843ff9a4011291223b..e2d2042810d4bb472e48a0ed91317d2bdf6e2f2a 100644
--- a/tools/testing/selftests/ublk/kublk.c
+++ b/tools/testing/selftests/ublk/kublk.c
@@ -1112,7 +1112,7 @@ static int __cmd_dev_add(const struct dev_ctx *ctx)
__u64 features;
const struct ublk_tgt_ops *ops;
struct ublksrv_ctrl_dev_info *info;
- struct ublk_dev *dev;
+ struct ublk_dev *dev = NULL;
int dev_id = ctx->dev_id;
int ret, i;
@@ -1120,13 +1120,15 @@ static int __cmd_dev_add(const struct dev_ctx *ctx)
if (!ops) {
ublk_err("%s: no such tgt type, type %s\n",
__func__, tgt_type);
- return -ENODEV;
+ ret = -ENODEV;
+ goto fail;
}
if (nr_queues > UBLK_MAX_QUEUES || depth > UBLK_QUEUE_DEPTH) {
ublk_err("%s: invalid nr_queues or depth queues %u depth %u\n",
__func__, nr_queues, depth);
- return -EINVAL;
+ ret = -EINVAL;
+ goto fail;
}
/* default to 1:1 threads:queues if nthreads is unspecified */
@@ -1136,30 +1138,37 @@ static int __cmd_dev_add(const struct dev_ctx *ctx)
if (nthreads > UBLK_MAX_THREADS) {
ublk_err("%s: %u is too many threads (max %u)\n",
__func__, nthreads, UBLK_MAX_THREADS);
- return -EINVAL;
+ ret = -EINVAL;
+ goto fail;
}
if (nthreads != nr_queues && !ctx->per_io_tasks) {
ublk_err("%s: threads %u must be same as queues %u if "
"not using per_io_tasks\n",
__func__, nthreads, nr_queues);
- return -EINVAL;
+ ret = -EINVAL;
+ goto fail;
}
dev = ublk_ctrl_init();
if (!dev) {
ublk_err("%s: can't alloc dev id %d, type %s\n",
__func__, dev_id, tgt_type);
- return -ENOMEM;
+ ret = -ENOMEM;
+ goto fail;
}
/* kernel doesn't support get_features */
ret = ublk_ctrl_get_features(dev, &features);
- if (ret < 0)
- return -EINVAL;
+ if (ret < 0) {
+ ret = -EINVAL;
+ goto fail;
+ }
- if (!(features & UBLK_F_CMD_IOCTL_ENCODE))
- return -ENOTSUP;
+ if (!(features & UBLK_F_CMD_IOCTL_ENCODE)) {
+ ret = -ENOTSUP;
+ goto fail;
+ }
info = &dev->dev_info;
info->dev_id = ctx->dev_id;
@@ -1200,7 +1209,8 @@ static int __cmd_dev_add(const struct dev_ctx *ctx)
fail:
if (ret < 0)
ublk_send_dev_event(ctx, dev, -1);
- ublk_ctrl_deinit(dev);
+ if (dev)
+ ublk_ctrl_deinit(dev);
return ret;
}
@@ -1262,6 +1272,8 @@ static int cmd_dev_add(struct dev_ctx *ctx)
shmctl(ctx->_shmid, IPC_RMID, NULL);
/* wait for child and detach from it */
wait(NULL);
+ if (exit_code == EXIT_FAILURE)
+ ublk_err("%s: command failed\n", __func__);
exit(exit_code);
} else {
exit(EXIT_FAILURE);
---
base-commit: c09a8b00f850d3ca0af998bff1fac4a3f6d11768
change-id: 20250603-ublk_init_fail-b498905159eb
Best regards,
--
Uday Shankar <ushankar(a)purestorage.com>
well, i checked the script using checkpatch.pl and
it shows that the patch has no warnings or errors
and its ready to be sent
v2:
- fixed multiple trailing whitespace errors and
- the Signed-off-by mismatch
The test file for the IR decoder used single-line comments
at the top to document its purpose and licensing,
which is inconsistent with the style used throughout the
Linux kernel.
In this patch i converted the file header to
a proper multi-line comment block
(/*) that aligns with standard kernel practices.
This improves readability, consistency across selftests,
and ensures the license and documentation are
clearly visible in a familiar format.
No functional changes have been made.
Signed-off-by: Abdelrahman Fekry <abdelrahmanfekry375(a)gmail.com>
---
tools/testing/selftests/ir/ir_loopback.c | 23 +++++++++++++----------
1 file changed, 13 insertions(+), 10 deletions(-)
diff --git a/tools/testing/selftests/ir/ir_loopback.c b/tools/testing/selftests/ir/ir_loopback.c
index f4a15cbdd5ea..c94faa975630 100644
--- a/tools/testing/selftests/ir/ir_loopback.c
+++ b/tools/testing/selftests/ir/ir_loopback.c
@@ -1,14 +1,17 @@
// SPDX-License-Identifier: GPL-2.0
-// test ir decoder
-//
-// Copyright (C) 2018 Sean Young <sean(a)mess.org>
-
-// When sending LIRC_MODE_SCANCODE, the IR will be encoded. rc-loopback
-// will send this IR to the receiver side, where we try to read the decoded
-// IR. Decoding happens in a separate kernel thread, so we will need to
-// wait until that is scheduled, hence we use poll to check for read
-// readiness.
-
+/* Copyright (C) 2018 Sean Young <sean(a)mess.org>
+ *
+ * Selftest for IR decoder
+ *
+ *
+ * When sending LIRC_MODE_SCANCODE, the IR will be encoded.
+ * rc-loopback will send this IR to the receiver side,
+ * where we try to read the decoded IR.
+ * Decoding happens in a separate kernel thread,
+ * so we will need to wait until that is scheduled,
+ * hence we use poll to check for read
+ * readiness.
+ */
#include <linux/lirc.h>
#include <errno.h>
#include <stdio.h>
--
2.25.1
This improves the expressiveness of unprivileged BPF by inserting
speculation barriers instead of rejecting the programs.
The approach was previously presented at LPC'24 [1] and RAID'24 [2].
To mitigate the Spectre v1 (PHT) vulnerability, the kernel rejects
potentially-dangerous unprivileged BPF programs as of
commit 9183671af6db ("bpf: Fix leakage under speculation on mispredicted
branches"). In [2], we have analyzed 364 object files from open source
projects (Linux Samples and Selftests, BCC, Loxilb, Cilium, libbpf
Examples, Parca, and Prevail) and found that this affects 31% to 54% of
programs.
To resolve this in the majority of cases this patchset adds a fall-back
for mitigating Spectre v1 using speculation barriers. The kernel still
optimistically attempts to verify all speculative paths but uses
speculation barriers against v1 when unsafe behavior is detected. This
allows for more programs to be accepted without disabling the BPF
Spectre mitigations (e.g., by setting cpu_mitigations_off()).
For this, it relies on the fact that speculation barriers prevent all
later instructions if the speculation was not correct:
* On x86_64, lfence acts as full speculation barrier, not only as a
load fence [3]:
An LFENCE instruction or a serializing instruction will ensure that
no later instructions execute, even speculatively, until all prior
instructions complete locally. [...] Inserting an LFENCE instruction
after a bounds check prevents later operations from executing before
the bound check completes.
This was experimentally confirmed in [4].
* ARM's SB speculation barrier instruction also affects "any instruction
that appears later in the program order than the barrier" [5].
In [1] we have measured the overhead of this approach relative to having
mitigations off and including the upstream Spectre v4 mitigations. For
event tracing and stack-sampling profilers, we found that mitigations
increase BPF program execution time by 0% to 62%. For the Loxilb network
load balancer, we have measured a 14% slowdown in SCTP performance but
no significant slowdown for TCP. This overhead only applies to programs
that were previously rejected.
I reran the expressiveness-evaluation with v6.14 and made sure the main
results still match those from [1] and [2] (which used v6.5).
Main design decisions are:
* Do not use separate bytecode insns for v1 and v4 barriers (inspired by
Daniel Borkmann's question at LPC). This simplifies the verifier
significantly and has the only downside that performance on PowerPC is
not as high as it could be.
* Allow archs to still disable v1/v4 mitigations separately by setting
bpf_jit_bypass_spec_v1/v4(). This has the benefit that archs can
benefit from improved BPF expressiveness / performance if they are not
vulnerable (e.g., ARM64 for v4 in the kernel).
* Do not remove the empty BPF_NOSPEC implementation for backends for
which it is unknown whether they are vulnerable to Spectre v1.
[1] https://lpc.events/event/18/contributions/1954/ ("Mitigating
Spectre-PHT using Speculation Barriers in Linux eBPF")
[2] https://arxiv.org/pdf/2405.00078 ("VeriFence: Lightweight and
Precise Spectre Defenses for Untrusted Linux Kernel Extensions")
[3] https://www.intel.com/content/www/us/en/developer/articles/technical/softwa…
("Managed Runtime Speculative Execution Side Channel Mitigations")
[4] https://dl.acm.org/doi/pdf/10.1145/3359789.3359837 ("Speculator: a
tool to analyze speculative execution attacks and mitigations" -
Section 4.6 "Stopping Speculative Execution")
[5] https://developer.arm.com/documentation/ddi0597/2020-12/Base-Instructions/S…
("SB - Speculation Barrier - Arm Armv8-A A32/T32 Instruction Set Architecture (2020-12)")
Changes:
* v2 -> v3:
- Fix
https://lore.kernel.org/oe-kbuild-all/202504212030.IF1SLhz6-lkp@intel.com/
and similar by moving the bpf_jit_bypass_spec_v1/v4() prototypes out
of the #ifdef CONFIG_BPF_SYSCALL. Decided not to move them to
filter.h (where similar bpf_jit_*() prototypes live) as they would
still have to be duplicated in bpf.h to be usable to
bpf_bypass_spec_v1/v4() (unless including filter.h in bpf.h is an
option).
- Fix
https://lore.kernel.org/oe-kbuild-all/202504220035.SoGveGpj-lkp@intel.com/
by moving the variable declarations out of the switch-case.
- Build touched C files with W=2 and bpf config on x86 to check that
there are no other warnings introduced.
- Found 3 more checkpatch warnings that can be fixed without degrading
readability.
- Rebase to bpf-next 2025-05-01
- Link to v2: https://lore.kernel.org/bpf/20250421091802.3234859-1-luis.gerhorst@fau.de/
* v1 -> v2:
- Drop former commits 9 ("bpf: Return PTR_ERR from push_stack()") and 11
("bpf: Fall back to nospec for spec path verification") as suggested
by Alexei. This series therefore no longer changes push_stack() to
return PTR_ERR.
- Add detailed explanation of how lfence works internally and how it
affects the algorithm.
- Add tests checking that nospec instructions are inserted in expected
locations using __xlated_unpriv as suggested by Eduard (also,
include a fix for __xlated_unpriv)
- Add a test for the mitigations from the description of
commit 9183671af6db ("bpf: Fix leakage under speculation on
mispredicted branches")
- Remove unused variables from do_check[_insn]() as suggested by
Eduard.
- Remove INSN_IDX_MODIFIED to improve readability as suggested by
Eduard. This also causes the nospec_result-check to run (and fail)
for jumping-ops. Add a warning to assert that this check must never
succeed in that case.
- Add details on the safety of patch 10 ("bpf: Allow nospec-protected
var-offset stack access") based on the feedback on v1.
- Rebase to bpf-next-250420
- Link to v1: https://lore.kernel.org/all/20250313172127.1098195-1-luis.gerhorst@fau.de/
* RFC -> v1:
- rebase to bpf-next-250313
- tests: mark expected successes/new errors
- add bpt_jit_bypass_spec_v1/v4() to avoid #ifdef in
bpf_bypass_spec_v1/v4()
- ensure that nospec with v1-support is implemented for archs for
which GCC supports speculation barriers, except for MIPS
- arm64: emit speculation barrier
- powerpc: change nospec to include v1 barrier
- discuss potential security (archs that do not impl. BPF nospec) and
performance (only PowerPC) regressions
- Link to RFC: https://lore.kernel.org/bpf/20250224203619.594724-1-luis.gerhorst@fau.de/
Luis Gerhorst (11):
selftests/bpf: Fix caps for __xlated/jited_unpriv
bpf: Move insn if/else into do_check_insn()
bpf: Return -EFAULT on misconfigurations
bpf: Return -EFAULT on internal errors
bpf, arm64, powerpc: Add bpf_jit_bypass_spec_v1/v4()
bpf, arm64, powerpc: Change nospec to include v1 barrier
bpf: Rename sanitize_stack_spill to nospec_result
bpf: Fall back to nospec for Spectre v1
selftests/bpf: Add test for Spectre v1 mitigation
bpf: Allow nospec-protected var-offset stack access
bpf: Fall back to nospec for sanitization-failures
arch/arm64/net/bpf_jit.h | 5 +
arch/arm64/net/bpf_jit_comp.c | 28 +-
arch/powerpc/net/bpf_jit_comp64.c | 80 ++-
include/linux/bpf.h | 11 +-
include/linux/bpf_verifier.h | 3 +-
include/linux/filter.h | 2 +-
kernel/bpf/core.c | 32 +-
kernel/bpf/verifier.c | 653 ++++++++++--------
tools/testing/selftests/bpf/progs/bpf_misc.h | 4 +
.../selftests/bpf/progs/verifier_and.c | 8 +-
.../selftests/bpf/progs/verifier_bounds.c | 66 +-
.../bpf/progs/verifier_bounds_deduction.c | 45 +-
.../selftests/bpf/progs/verifier_map_ptr.c | 20 +-
.../selftests/bpf/progs/verifier_movsx.c | 16 +-
.../selftests/bpf/progs/verifier_unpriv.c | 65 +-
.../bpf/progs/verifier_value_ptr_arith.c | 101 ++-
tools/testing/selftests/bpf/test_loader.c | 14 +-
.../selftests/bpf/verifier/dead_code.c | 3 +-
tools/testing/selftests/bpf/verifier/jmp32.c | 33 +-
tools/testing/selftests/bpf/verifier/jset.c | 10 +-
20 files changed, 771 insertions(+), 428 deletions(-)
base-commit: 358b1c0f56ebb6996fcec7dcdcf6bae5dcbc8b6c
--
2.49.0
The BTF dumper code currently displays arrays of characters as just that -
arrays, with each character formatted individually. Sometimes this is what
makes sense, but it's nice to be able to treat that array as a string.
This change adds a special case to the btf_dump functionality to allow
arrays of single-byte integer values to be printed as character strings.
Characters for which isprint() returns false are printed as hex-escaped
values. This is enabled when the new ".emit_strings" is set to 1 in the
btf_dump_type_data_opts structure.
As an example, here's what it looks like to dump the string "hello" using
a few different field values for btf_dump_type_data_opts (.compact = 1):
- .emit_strings = 0, .skip_names = 0: (char[6])['h','e','l','l','o',]
- .emit_strings = 0, .skip_names = 1: ['h','e','l','l','o',]
- .emit_strings = 1, .skip_names = 0: (char[6])"hello"
- .emit_strings = 1, .skip_names = 1: "hello"
Here's the string "h\xff", dumped with .compact = 1 and .skip_names = 1:
- .emit_strings = 0: ['h',-1,]
- .emit_strings = 1: "h\xff"
Signed-off-by: Blake Jones <blakejones(a)google.com>
---
tools/lib/bpf/btf.h | 3 ++-
tools/lib/bpf/btf_dump.c | 44 +++++++++++++++++++++++++++++++++++++++-
2 files changed, 45 insertions(+), 2 deletions(-)
diff --git a/tools/lib/bpf/btf.h b/tools/lib/bpf/btf.h
index 4392451d634b..ccfd905f03df 100644
--- a/tools/lib/bpf/btf.h
+++ b/tools/lib/bpf/btf.h
@@ -326,9 +326,10 @@ struct btf_dump_type_data_opts {
bool compact; /* no newlines/indentation */
bool skip_names; /* skip member/type names */
bool emit_zeroes; /* show 0-valued fields */
+ bool emit_strings; /* print char arrays as strings */
size_t :0;
};
-#define btf_dump_type_data_opts__last_field emit_zeroes
+#define btf_dump_type_data_opts__last_field emit_strings
LIBBPF_API int
btf_dump__dump_type_data(struct btf_dump *d, __u32 id,
diff --git a/tools/lib/bpf/btf_dump.c b/tools/lib/bpf/btf_dump.c
index 460c3e57fadb..336a6646e0fa 100644
--- a/tools/lib/bpf/btf_dump.c
+++ b/tools/lib/bpf/btf_dump.c
@@ -68,6 +68,7 @@ struct btf_dump_data {
bool compact;
bool skip_names;
bool emit_zeroes;
+ bool emit_strings;
__u8 indent_lvl; /* base indent level */
char indent_str[BTF_DATA_INDENT_STR_LEN];
/* below are used during iteration */
@@ -2028,6 +2029,43 @@ static int btf_dump_var_data(struct btf_dump *d,
return btf_dump_dump_type_data(d, NULL, t, type_id, data, 0, 0);
}
+static int btf_dump_string_data(struct btf_dump *d,
+ const struct btf_type *t,
+ __u32 id,
+ const void *data)
+{
+ const struct btf_array *array = btf_array(t);
+ __u32 i;
+
+ btf_dump_data_pfx(d);
+ btf_dump_printf(d, "\"");
+
+ for (i = 0; i < array->nelems; i++, data++) {
+ char c;
+
+ if (data >= d->typed_dump->data_end)
+ return -E2BIG;
+
+ c = *(char *)data;
+ if (c == '\0') {
+ /*
+ * When printing character arrays as strings, NUL bytes
+ * are always treated as string terminators; they are
+ * never printed.
+ */
+ break;
+ }
+ if (isprint(c))
+ btf_dump_printf(d, "%c", c);
+ else
+ btf_dump_printf(d, "\\x%02x", *(__u8 *)data);
+ }
+
+ btf_dump_printf(d, "\"");
+
+ return 0;
+}
+
static int btf_dump_array_data(struct btf_dump *d,
const struct btf_type *t,
__u32 id,
@@ -2055,8 +2093,11 @@ static int btf_dump_array_data(struct btf_dump *d,
* char arrays, so if size is 1 and element is
* printable as a char, we'll do that.
*/
- if (elem_size == 1)
+ if (elem_size == 1) {
+ if (d->typed_dump->emit_strings)
+ return btf_dump_string_data(d, t, id, data);
d->typed_dump->is_array_char = true;
+ }
}
/* note that we increment depth before calling btf_dump_print() below;
@@ -2544,6 +2585,7 @@ int btf_dump__dump_type_data(struct btf_dump *d, __u32 id,
d->typed_dump->compact = OPTS_GET(opts, compact, false);
d->typed_dump->skip_names = OPTS_GET(opts, skip_names, false);
d->typed_dump->emit_zeroes = OPTS_GET(opts, emit_zeroes, false);
+ d->typed_dump->emit_strings = OPTS_GET(opts, emit_strings, false);
ret = btf_dump_dump_type_data(d, NULL, t, id, data, 0, 0);
--
2.49.0.1204.g71687c7c1d-goog
Add missing config options for the tso.py test, specifically
to make sure the kernel is built with vxlan and gre tunnels.
I noticed this while adding a TSO-capable device QEMU to the CI.
Previously we only run virtio tests and it doesn't report LSO
stats on the QEMU we have.
Fixes: 0d0f4174f6c8 ("selftests: drv-net: add a simple TSO test")
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
---
CC: shuah(a)kernel.org
CC: willemb(a)google.com
CC: linux-kselftest(a)vger.kernel.org
---
tools/testing/selftests/drivers/net/hw/config | 6 ++++++
1 file changed, 6 insertions(+)
create mode 100644 tools/testing/selftests/drivers/net/hw/config
diff --git a/tools/testing/selftests/drivers/net/hw/config b/tools/testing/selftests/drivers/net/hw/config
new file mode 100644
index 000000000000..ea4b70d71563
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/hw/config
@@ -0,0 +1,6 @@
+CONFIG_IPV6=y
+CONFIG_IPV6_GRE=y
+CONFIG_NET_IP_TUNNEL=y
+CONFIG_NET_IPGRE=y
+CONFIG_NET_IPGRE_DEMUX=y
+CONFIG_VXLAN=y
--
2.49.0
We have the logic to include net/lib automatically for net related
selftests. However, currently, this logic is only in install target
which means only `make install` will have net/lib included. This commit
adds the logic to all target so that all `make`, `make run_tests` and
`make install` will have net/lib included in net related selftests.
Signed-off-by: Bui Quang Minh <minhquangbui99(a)gmail.com>
---
Changes in v3:
- Don't remove INSTALL_DEP_TARGETS in install target so that net/lib is
copied to INSTALL_PATH
Changes in v2:
- Make the commit message clearer.
tools/testing/selftests/Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index 6aa11cd3db42..339b31e6a6b5 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -205,7 +205,7 @@ export KHDR_INCLUDES
all:
@ret=1; \
- for TARGET in $(TARGETS); do \
+ for TARGET in $(TARGETS) $(INSTALL_DEP_TARGETS); do \
BUILD_TARGET=$$BUILD/$$TARGET; \
mkdir $$BUILD_TARGET -p; \
$(MAKE) OUTPUT=$$BUILD_TARGET -C $$TARGET \
--
2.43.0
This reverts commit a571a9a1b120264e24b41eddf1ac5140131bfa84.
The commit in question breaks kunit for older compilers:
$ gcc --version
gcc (GCC) 11.5.0 20240719 (Red Hat 11.5.0-5)
$ ./tools/testing/kunit/kunit.py run --alltests --json --arch=x86_64
Configuring KUnit Kernel ...
Regenerating .config ...
Populating config with:
$ make ARCH=x86_64 O=.kunit olddefconfig
ERROR:root:Not all Kconfig options selected in kunitconfig were in the generated .config.
This is probably due to unsatisfied dependencies.
Missing: CONFIG_INIT_STACK_ALL_PATTERN=y
Link: https://lore.kernel.org/20250529083811.778bc31b@kernel.org
Fixes: a571a9a1b120 ("kunit: configs: Enable CONFIG_INIT_STACK_ALL_PATTERN in all_tests")
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
---
I'd like to take this in via netdev since it fixes our CI.
We'll send it to Linus next week.
CC: brendan.higgins(a)linux.dev
CC: davidgow(a)google.com
CC: rmoar(a)google.com
CC: broonie(a)kernel.org
CC: rf(a)opensource.cirrus.com
CC: mic(a)digikod.net
CC: skhan(a)linuxfoundation.org
CC: linux-kselftest(a)vger.kernel.org
CC: kunit-dev(a)googlegroups.com
---
tools/testing/kunit/configs/all_tests.config | 1 -
1 file changed, 1 deletion(-)
diff --git a/tools/testing/kunit/configs/all_tests.config b/tools/testing/kunit/configs/all_tests.config
index 48b132cd9d2a..2f093048d985 100644
--- a/tools/testing/kunit/configs/all_tests.config
+++ b/tools/testing/kunit/configs/all_tests.config
@@ -10,7 +10,6 @@ CONFIG_KUNIT_EXAMPLE_TEST=y
CONFIG_KUNIT_ALL_TESTS=y
CONFIG_FORTIFY_SOURCE=y
-CONFIG_INIT_STACK_ALL_PATTERN=y
CONFIG_IIO=y
--
2.49.0
The newly added anon_inode_test test fails to build due to attempting to
include a nonexisting overlayfs/wrapper.h:
anon_inode_test.c:10:10: fatal error: overlayfs/wrappers.h: No such file or directory
10 | #include "overlayfs/wrappers.h"
| ^~~~~~~~~~~~~~~~~~~~~~
This is due to 0bd92b9fe538 ("selftests/filesystems: move wrapper.h out
of overlayfs subdir") which was added in the vfs-6.16.selftests branch
which was based on -rc5 and did not contain the newly added test so once
things were merged into mainline the build started failing - both
parent commits are fine.
Fixes: 3e406741b1989 ("Merge tag 'vfs-6.16-rc1.selftests' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs")
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v2:
- Rebase onto mainline and adjust fixes commit now the two branches got
merged there.
- Link to v1: https://lore.kernel.org/r/20250518-selftests-anon-inode-build-v1-1-71eff818…
---
tools/testing/selftests/filesystems/anon_inode_test.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/filesystems/anon_inode_test.c b/tools/testing/selftests/filesystems/anon_inode_test.c
index e8e0ef1460d2..73e0a4d4fb2f 100644
--- a/tools/testing/selftests/filesystems/anon_inode_test.c
+++ b/tools/testing/selftests/filesystems/anon_inode_test.c
@@ -7,7 +7,7 @@
#include <sys/stat.h>
#include "../kselftest_harness.h"
-#include "overlayfs/wrappers.h"
+#include "wrappers.h"
TEST(anon_inode_no_chown)
{
---
base-commit: f66bc387efbee59978e076ce9bf123ac353b389c
change-id: 20250516-selftests-anon-inode-build-007e206e8422
Best regards,
--
Mark Brown <broonie(a)kernel.org>
When CONFIG_RANDSTRUCT is not enabled, all randstruct tests are skipped.
Move this logic from run-time to config-time, to avoid people building
and running tests that do not do anything.
Fixes: b370f7eacdcfe1dd ("lib/tests: Add randstruct KUnit test")
Signed-off-by: Geert Uytterhoeven <geert(a)linux-m68k.org>
---
FTR, after adding "select HAVE_GCC_PLUGINS" to arch/m68k/Kconfig and
installing gcc-13-plugin-dev-m68k-linux-gnu, I could enable
CONFIG_RANDSTRUCT_FULL and run the tests (all passed!), so probably I
should send a patch to select HAVE_GCC_PLUGINS?
---
lib/Kconfig.debug | 4 ++--
lib/tests/randstruct_kunit.c | 9 ---------
2 files changed, 2 insertions(+), 11 deletions(-)
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index a8b4febad716a4be..407f2ed7fcb3e94c 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -2895,10 +2895,10 @@ config OVERFLOW_KUNIT_TEST
config RANDSTRUCT_KUNIT_TEST
tristate "Test randstruct structure layout randomization at runtime" if !KUNIT_ALL_TESTS
depends on KUNIT
+ depends on RANDSTRUCT
default KUNIT_ALL_TESTS
help
- Builds unit tests for the checking CONFIG_RANDSTRUCT=y, which
- randomizes structure layouts.
+ Builds unit tests for checking structure layout randomization.
config STACKINIT_KUNIT_TEST
tristate "Test level of stack variable initialization" if !KUNIT_ALL_TESTS
diff --git a/lib/tests/randstruct_kunit.c b/lib/tests/randstruct_kunit.c
index f3a2d63c4cfbe7dc..2c95eca76d2411bc 100644
--- a/lib/tests/randstruct_kunit.c
+++ b/lib/tests/randstruct_kunit.c
@@ -305,14 +305,6 @@ static void randstruct_initializers(struct kunit *test)
#undef init_members
}
-static int randstruct_test_init(struct kunit *test)
-{
- if (!IS_ENABLED(CONFIG_RANDSTRUCT))
- kunit_skip(test, "Not built with CONFIG_RANDSTRUCT=y");
-
- return 0;
-}
-
static struct kunit_case randstruct_test_cases[] = {
KUNIT_CASE(randstruct_layout_same),
KUNIT_CASE(randstruct_layout_mixed),
@@ -324,7 +316,6 @@ static struct kunit_case randstruct_test_cases[] = {
static struct kunit_suite randstruct_test_suite = {
.name = "randstruct",
- .init = randstruct_test_init,
.test_cases = randstruct_test_cases,
};
--
2.43.0
When CONFIG_FORTIFY_SOURCE is not enabled, all fortify tests are
skipped. Move this logic from run-time to config-time, to avoid people
building and running tests that do not do anything.
This basically reverts commit 1a78f8cb5daac774 ("fortify: Allow KUnit
test to build without FORTIFY") in v6.9, which was v3 of commit
a9dc8d0442294b42 ("fortify: Allow KUnit test to build without FORTIFY")
in v6.5, which was quickly reverted in commit 5e2956ee46244ffb ("Revert
"fortify: Allow KUnit test to build without FORTIFY"").
Signed-off-by: Geert Uytterhoeven <geert(a)linux-m68k.org>
---
Let's keep on playing whack-a-mole ;-)
---
lib/Kconfig.debug | 1 +
lib/tests/fortify_kunit.c | 8 --------
2 files changed, 1 insertion(+), 8 deletions(-)
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 407f2ed7fcb3e94c..ca5afd192c9fbf51 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -2912,6 +2912,7 @@ config STACKINIT_KUNIT_TEST
config FORTIFY_KUNIT_TEST
tristate "Test fortified str*() and mem*() function internals at runtime" if !KUNIT_ALL_TESTS
depends on KUNIT
+ depends on FORTIFY_SOURCE
default KUNIT_ALL_TESTS
help
Builds unit tests for checking internals of FORTIFY_SOURCE as used
diff --git a/lib/tests/fortify_kunit.c b/lib/tests/fortify_kunit.c
index 29ffc62a71e3f968..10b0e1b12cdc3ae2 100644
--- a/lib/tests/fortify_kunit.c
+++ b/lib/tests/fortify_kunit.c
@@ -48,11 +48,6 @@ void fortify_add_kunit_error(int write);
#include <linux/string.h>
#include <linux/vmalloc.h>
-/* Handle being built without CONFIG_FORTIFY_SOURCE */
-#ifndef __compiletime_strlen
-# define __compiletime_strlen __builtin_strlen
-#endif
-
static struct kunit_resource read_resource;
static struct kunit_resource write_resource;
static int fortify_read_overflows;
@@ -1071,9 +1066,6 @@ static void fortify_test_kmemdup(struct kunit *test)
static int fortify_test_init(struct kunit *test)
{
- if (!IS_ENABLED(CONFIG_FORTIFY_SOURCE))
- kunit_skip(test, "Not built with CONFIG_FORTIFY_SOURCE=y");
-
fortify_read_overflows = 0;
kunit_add_named_resource(test, NULL, NULL, &read_resource,
"fortify_read_overflows",
--
2.43.0
During performance analysis of console subsystem latency, I discovered that
netconsole registers console handlers even when no active targets exist.
These orphaned console handlers are invoked on every printk() call, get
the lock, iterate through empty target lists, and consume CPU cycles
without performing any useful work.
This patch series addresses the inefficiency by:
1. Implementing dynamic console registration/unregistration based on target
availability, ensuring console handlers are only active when needed
2. Adding automatic cleanup of unused console registrations when targets
are disabled or removed
3. Extending the selftest suite to cover non-extended console format,
which was previously untested
The optimization reduces printk() overhead by eliminating unnecessary
function calls and list traversals when netconsole targets are not
configured, improving overall system performance during heavy logging
scenarios.
---
Changes in v2:
- Added selftests to test the new mechanism
- Unregister the console if the last target got disabled
- Sending to net-next instead of net (Jakub)
- Link to v1: https://lore.kernel.org/r/20250528-netcons_ext-v1-1-69f71e404e00@debian.org
---
Breno Leitao (4):
netconsole: Only register console drivers when targets are configured
netconsole: Add automatic console unregistration on target removal
selftests: netconsole: Do not exit from inside the validation function
selftests: netconsole: Add support for basic netconsole target format
drivers/net/netconsole.c | 61 +++++++++++++++++++---
.../selftests/drivers/net/lib/sh/lib_netcons.sh | 27 ++++++++--
.../testing/selftests/drivers/net/netcons_basic.sh | 50 +++++++++++-------
3 files changed, 107 insertions(+), 31 deletions(-)
---
base-commit: 914873bc7df913db988284876c16257e6ab772c6
change-id: 20250528-netcons_ext-572982619bea
Best regards,
--
Breno Leitao <leitao(a)debian.org>
This picks up from Michal Rostecki's work[0]. Per Michal's guidance I
have omitted Co-authored tags, as the end result is quite different.
Link: https://lore.kernel.org/rust-for-linux/20240819153656.28807-2-vadorovsky@pr… [0]
Closes: https://github.com/Rust-for-Linux/linux/issues/1075
Signed-off-by: Tamir Duberstein <tamird(a)gmail.com>
---
Changes in v11:
- Use `quote_spanned!` to avoid `use<'a, T>` and generally reduce manual
token construction.
- Add a commit to simplify `quote_spanned!`.
- Drop first commit in favor of
https://lore.kernel.org/rust-for-linux/20240906164448.2268368-1-paddymills@….
(Miguel Ojeda)
- Correctly handle expressions such as `pr_info!("{a}", a = a = a)`.
(Benno Lossin)
- Avoid dealing with `}}` escapes, which is not needed. (Benno Lossin)
- Revert some unnecessary changes. (Benno Lossin)
- Rename `c_str_avoid_literals!` to `str_to_cstr!`. (Benno Lossin &
Alice Ryhl).
- Link to v10: https://lore.kernel.org/r/20250524-cstr-core-v10-0-6412a94d9d75@gmail.com
Changes in v10:
- Rebase on cbeaa41dfe26b72639141e87183cb23e00d4b0dd.
- Implement Alice's suggestion to use a proc macro to work around orphan
rules otherwise preventing `core::ffi::CStr` to be directly printed
with `{}`.
- Link to v9: https://lore.kernel.org/r/20250317-cstr-core-v9-0-51d6cc522f62@gmail.com
Changes in v9:
- Rebase on rust-next.
- Restore `impl Display for BStr` which exists upstream[1].
- Link: https://doc.rust-lang.org/nightly/std/bstr/struct.ByteStr.html#impl-Display… [1]
- Link to v8: https://lore.kernel.org/r/20250203-cstr-core-v8-0-cb3f26e78686@gmail.com
Changes in v8:
- Move `{from,as}_char_ptr` back to `CStrExt`. This reduces the diff
some.
- Restore `from_bytes_with_nul_unchecked_mut`, `to_cstring`.
- Link to v7: https://lore.kernel.org/r/20250202-cstr-core-v7-0-da1802520438@gmail.com
Changes in v7:
- Rebased on mainline.
- Restore functionality added in commit a321f3ad0a5d ("rust: str: add
{make,to}_{upper,lower}case() to CString").
- Used `diff.algorithm patience` to improve diff readability.
- Link to v6: https://lore.kernel.org/r/20250202-cstr-core-v6-0-8469cd6d29fd@gmail.com
Changes in v6:
- Split the work into several commits for ease of review.
- Restore `{from,as}_char_ptr` to allow building on ARM (see commit
message).
- Add `CStrExt` to `kernel::prelude`. (Alice Ryhl)
- Remove `CStrExt::from_bytes_with_nul_unchecked_mut` and restore
`DerefMut for CString`. (Alice Ryhl)
- Rename and hide `kernel::c_str!` to encourage use of C-String
literals.
- Drop implementation and invocation changes in kunit.rs. (Trevor Gross)
- Drop docs on `Display` impl. (Trevor Gross)
- Rewrite docs in the style of the standard library.
- Restore the `test_cstr_debug` unit tests to demonstrate that the
implementation has changed.
Changes in v5:
- Keep the `test_cstr_display*` unit tests.
Changes in v4:
- Provide the `CStrExt` trait with `display()` method, which returns a
`CStrDisplay` wrapper with `Display` implementation. This addresses
the lack of `Display` implementation for `core::ffi::CStr`.
- Provide `from_bytes_with_nul_unchecked_mut()` method in `CStrExt`,
which might be useful and is going to prevent manual, unsafe casts.
- Fix a typo (s/preffered/prefered/).
Changes in v3:
- Fix the commit message.
- Remove redundant braces in `use`, when only one item is imported.
Changes in v2:
- Do not remove `c_str` macro. While it's preferred to use C-string
literals, there are two cases where `c_str` is helpful:
- When working with macros, which already return a Rust string literal
(e.g. `stringify!`).
- When building macros, where we want to take a Rust string literal as an
argument (for caller's convenience), but still use it as a C-string
internally.
- Use Rust literals as arguments in macros (`new_mutex`, `new_condvar`,
`new_mutex`). Use the `c_str` macro to convert these literals to C-string
literals.
- Use `c_str` in kunit.rs for converting the output of `stringify!` to a
`CStr`.
- Remove `DerefMut` implementation for `CString`.
---
Tamir Duberstein (5):
rust: macros: reduce collections in `quote!` macro
rust: support formatting of foreign types
rust: replace `CStr` with `core::ffi::CStr`
rust: replace `kernel::c_str!` with C-Strings
rust: remove core::ffi::CStr reexport
drivers/block/rnull.rs | 4 +-
drivers/gpu/drm/drm_panic_qr.rs | 5 +-
drivers/gpu/nova-core/driver.rs | 2 +-
drivers/gpu/nova-core/firmware.rs | 2 +-
drivers/net/phy/ax88796b_rust.rs | 8 +-
drivers/net/phy/qt2025.rs | 6 +-
rust/kernel/block/mq.rs | 2 +-
rust/kernel/device.rs | 9 +-
rust/kernel/devres.rs | 2 +-
rust/kernel/driver.rs | 4 +-
rust/kernel/error.rs | 10 +-
rust/kernel/faux.rs | 5 +-
rust/kernel/firmware.rs | 16 +-
rust/kernel/fmt.rs | 77 ++++++
rust/kernel/kunit.rs | 21 +-
rust/kernel/lib.rs | 3 +-
rust/kernel/miscdevice.rs | 5 +-
rust/kernel/net/phy.rs | 12 +-
rust/kernel/of.rs | 5 +-
rust/kernel/pci.rs | 2 +-
rust/kernel/platform.rs | 6 +-
rust/kernel/prelude.rs | 5 +-
rust/kernel/print.rs | 4 +-
rust/kernel/seq_file.rs | 6 +-
rust/kernel/str.rs | 443 ++++++++++-------------------------
rust/kernel/sync.rs | 7 +-
rust/kernel/sync/condvar.rs | 4 +-
rust/kernel/sync/lock.rs | 4 +-
rust/kernel/sync/lock/global.rs | 6 +-
rust/kernel/sync/poll.rs | 1 +
rust/kernel/workqueue.rs | 9 +-
rust/macros/fmt.rs | 99 ++++++++
rust/macros/kunit.rs | 10 +-
rust/macros/lib.rs | 19 ++
rust/macros/module.rs | 2 +-
rust/macros/quote.rs | 111 ++++-----
samples/rust/rust_driver_faux.rs | 4 +-
samples/rust/rust_driver_pci.rs | 4 +-
samples/rust/rust_driver_platform.rs | 4 +-
samples/rust/rust_misc_device.rs | 3 +-
scripts/rustdoc_test_gen.rs | 6 +-
41 files changed, 485 insertions(+), 472 deletions(-)
---
base-commit: 7a17bbc1d952057898cb0739e60665908fbb8c72
change-id: 20250201-cstr-core-d4b9b69120cf
Best regards,
--
Tamir Duberstein <tamird(a)gmail.com>
From: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
[ Upstream commit c2bcc8e9577a35f9cf4707f8bb0b58bce30991aa ]
With -Wmissing-prototypes the compiler will warn about non-static
functions which don't have a prototype defined.
As they are not used from a different compilation unit they don't need to
be defined globally.
Avoid the issue by marking the functions static.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
Reviewed-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Acked-by: Shuah Khan <skhan(a)linuxfoundation.org>
Link: https://lore.kernel.org/r/20250505-nolibc-kselftest-harness-v4-4-ee4dd52571…
Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
**YES** This commit should be backported to stable kernel trees.
**Rationale:** 1. **Legitimate Build Fix**: The commit addresses a real
compiler warning issue (`-Wmissing-prototypes`) that affects build
cleanliness and code quality. Modern build systems increasingly use
stricter warning flags, making this fix valuable for stable trees. 2.
**Zero Functional Risk**: The changes are purely cosmetic from a runtime
perspective. Adding `static` to functions that were already internal has
no impact on functionality, memory layout, or behavior - it only affects
compiler symbol visibility and warnings. 3. **Minimal and Contained**:
The diff is extremely small (4 function signatures with `static` added)
and isolated to the kselftest harness framework. There are no complex
logic changes or cross-subsystem impacts. 4. **Testing Infrastructure
Improvement**: While the kselftest framework isn't critical runtime
code, it's important for kernel testing and validation. Improving build
compliance in testing infrastructure benefits stable kernel maintenance.
5. **Standard Practice**: Compiler warning fixes of this nature (adding
missing `static` keywords) are routinely backported to stable trees as
they represent good coding practices without functional risk. 6.
**Different from Similar Commits**: Unlike the referenced similar
commits (all marked "NO") which involved feature additions, API changes,
or structural modifications, this commit is purely a build compliance
fix with no behavioral changes. The commit meets all stable tree
criteria: it fixes an issue (compiler warnings), has minimal risk (no
functional changes), and improves code quality without introducing new
features or architectural changes. Tools like `kselftest_harness.h:241`,
`kselftest_harness.h:290`, `kselftest_harness.h:970`, and
`kselftest_harness.h:1188` are the specific locations where these low-
risk improvements are made.
tools/testing/selftests/kselftest_harness.h | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/tools/testing/selftests/kselftest_harness.h b/tools/testing/selftests/kselftest_harness.h
index 666c9fde76da9..7c337b4fa054d 100644
--- a/tools/testing/selftests/kselftest_harness.h
+++ b/tools/testing/selftests/kselftest_harness.h
@@ -258,7 +258,7 @@
* A bare "return;" statement may be used to return early.
*/
#define FIXTURE_SETUP(fixture_name) \
- void fixture_name##_setup( \
+ static void fixture_name##_setup( \
struct __test_metadata __attribute__((unused)) *_metadata, \
FIXTURE_DATA(fixture_name) __attribute__((unused)) *self, \
const FIXTURE_VARIANT(fixture_name) \
@@ -307,7 +307,7 @@
__FIXTURE_TEARDOWN(fixture_name)
#define __FIXTURE_TEARDOWN(fixture_name) \
- void fixture_name##_teardown( \
+ static void fixture_name##_teardown( \
struct __test_metadata __attribute__((unused)) *_metadata, \
FIXTURE_DATA(fixture_name) __attribute__((unused)) *self, \
const FIXTURE_VARIANT(fixture_name) \
@@ -987,7 +987,7 @@ static void __timeout_handler(int sig, siginfo_t *info, void *ucontext)
kill(-(t->pid), SIGKILL);
}
-void __wait_for_test(struct __test_metadata *t)
+static void __wait_for_test(struct __test_metadata *t)
{
struct sigaction action = {
.sa_sigaction = __timeout_handler,
@@ -1205,9 +1205,9 @@ static bool test_enabled(int argc, char **argv,
return !has_positive;
}
-void __run_test(struct __fixture_metadata *f,
- struct __fixture_variant_metadata *variant,
- struct __test_metadata *t)
+static void __run_test(struct __fixture_metadata *f,
+ struct __fixture_variant_metadata *variant,
+ struct __test_metadata *t)
{
struct __test_xfail *xfail;
char test_name[1024];
--
2.39.5
From: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
[ Upstream commit c2bcc8e9577a35f9cf4707f8bb0b58bce30991aa ]
With -Wmissing-prototypes the compiler will warn about non-static
functions which don't have a prototype defined.
As they are not used from a different compilation unit they don't need to
be defined globally.
Avoid the issue by marking the functions static.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
Reviewed-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Acked-by: Shuah Khan <skhan(a)linuxfoundation.org>
Link: https://lore.kernel.org/r/20250505-nolibc-kselftest-harness-v4-4-ee4dd52571…
Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
**YES** This commit should be backported to stable kernel trees.
**Rationale:** 1. **Legitimate Build Fix**: The commit addresses a real
compiler warning issue (`-Wmissing-prototypes`) that affects build
cleanliness and code quality. Modern build systems increasingly use
stricter warning flags, making this fix valuable for stable trees. 2.
**Zero Functional Risk**: The changes are purely cosmetic from a runtime
perspective. Adding `static` to functions that were already internal has
no impact on functionality, memory layout, or behavior - it only affects
compiler symbol visibility and warnings. 3. **Minimal and Contained**:
The diff is extremely small (4 function signatures with `static` added)
and isolated to the kselftest harness framework. There are no complex
logic changes or cross-subsystem impacts. 4. **Testing Infrastructure
Improvement**: While the kselftest framework isn't critical runtime
code, it's important for kernel testing and validation. Improving build
compliance in testing infrastructure benefits stable kernel maintenance.
5. **Standard Practice**: Compiler warning fixes of this nature (adding
missing `static` keywords) are routinely backported to stable trees as
they represent good coding practices without functional risk. 6.
**Different from Similar Commits**: Unlike the referenced similar
commits (all marked "NO") which involved feature additions, API changes,
or structural modifications, this commit is purely a build compliance
fix with no behavioral changes. The commit meets all stable tree
criteria: it fixes an issue (compiler warnings), has minimal risk (no
functional changes), and improves code quality without introducing new
features or architectural changes. Tools like `kselftest_harness.h:241`,
`kselftest_harness.h:290`, `kselftest_harness.h:970`, and
`kselftest_harness.h:1188` are the specific locations where these low-
risk improvements are made.
tools/testing/selftests/kselftest_harness.h | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/tools/testing/selftests/kselftest_harness.h b/tools/testing/selftests/kselftest_harness.h
index 666c9fde76da9..7c337b4fa054d 100644
--- a/tools/testing/selftests/kselftest_harness.h
+++ b/tools/testing/selftests/kselftest_harness.h
@@ -258,7 +258,7 @@
* A bare "return;" statement may be used to return early.
*/
#define FIXTURE_SETUP(fixture_name) \
- void fixture_name##_setup( \
+ static void fixture_name##_setup( \
struct __test_metadata __attribute__((unused)) *_metadata, \
FIXTURE_DATA(fixture_name) __attribute__((unused)) *self, \
const FIXTURE_VARIANT(fixture_name) \
@@ -307,7 +307,7 @@
__FIXTURE_TEARDOWN(fixture_name)
#define __FIXTURE_TEARDOWN(fixture_name) \
- void fixture_name##_teardown( \
+ static void fixture_name##_teardown( \
struct __test_metadata __attribute__((unused)) *_metadata, \
FIXTURE_DATA(fixture_name) __attribute__((unused)) *self, \
const FIXTURE_VARIANT(fixture_name) \
@@ -987,7 +987,7 @@ static void __timeout_handler(int sig, siginfo_t *info, void *ucontext)
kill(-(t->pid), SIGKILL);
}
-void __wait_for_test(struct __test_metadata *t)
+static void __wait_for_test(struct __test_metadata *t)
{
struct sigaction action = {
.sa_sigaction = __timeout_handler,
@@ -1205,9 +1205,9 @@ static bool test_enabled(int argc, char **argv,
return !has_positive;
}
-void __run_test(struct __fixture_metadata *f,
- struct __fixture_variant_metadata *variant,
- struct __test_metadata *t)
+static void __run_test(struct __fixture_metadata *f,
+ struct __fixture_variant_metadata *variant,
+ struct __test_metadata *t)
{
struct __test_xfail *xfail;
char test_name[1024];
--
2.39.5
From: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
[ Upstream commit c2bcc8e9577a35f9cf4707f8bb0b58bce30991aa ]
With -Wmissing-prototypes the compiler will warn about non-static
functions which don't have a prototype defined.
As they are not used from a different compilation unit they don't need to
be defined globally.
Avoid the issue by marking the functions static.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
Reviewed-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Acked-by: Shuah Khan <skhan(a)linuxfoundation.org>
Link: https://lore.kernel.org/r/20250505-nolibc-kselftest-harness-v4-4-ee4dd52571…
Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
**YES** This commit should be backported to stable kernel trees.
**Rationale:** 1. **Legitimate Build Fix**: The commit addresses a real
compiler warning issue (`-Wmissing-prototypes`) that affects build
cleanliness and code quality. Modern build systems increasingly use
stricter warning flags, making this fix valuable for stable trees. 2.
**Zero Functional Risk**: The changes are purely cosmetic from a runtime
perspective. Adding `static` to functions that were already internal has
no impact on functionality, memory layout, or behavior - it only affects
compiler symbol visibility and warnings. 3. **Minimal and Contained**:
The diff is extremely small (4 function signatures with `static` added)
and isolated to the kselftest harness framework. There are no complex
logic changes or cross-subsystem impacts. 4. **Testing Infrastructure
Improvement**: While the kselftest framework isn't critical runtime
code, it's important for kernel testing and validation. Improving build
compliance in testing infrastructure benefits stable kernel maintenance.
5. **Standard Practice**: Compiler warning fixes of this nature (adding
missing `static` keywords) are routinely backported to stable trees as
they represent good coding practices without functional risk. 6.
**Different from Similar Commits**: Unlike the referenced similar
commits (all marked "NO") which involved feature additions, API changes,
or structural modifications, this commit is purely a build compliance
fix with no behavioral changes. The commit meets all stable tree
criteria: it fixes an issue (compiler warnings), has minimal risk (no
functional changes), and improves code quality without introducing new
features or architectural changes. Tools like `kselftest_harness.h:241`,
`kselftest_harness.h:290`, `kselftest_harness.h:970`, and
`kselftest_harness.h:1188` are the specific locations where these low-
risk improvements are made.
tools/testing/selftests/kselftest_harness.h | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/tools/testing/selftests/kselftest_harness.h b/tools/testing/selftests/kselftest_harness.h
index 666c9fde76da9..7c337b4fa054d 100644
--- a/tools/testing/selftests/kselftest_harness.h
+++ b/tools/testing/selftests/kselftest_harness.h
@@ -258,7 +258,7 @@
* A bare "return;" statement may be used to return early.
*/
#define FIXTURE_SETUP(fixture_name) \
- void fixture_name##_setup( \
+ static void fixture_name##_setup( \
struct __test_metadata __attribute__((unused)) *_metadata, \
FIXTURE_DATA(fixture_name) __attribute__((unused)) *self, \
const FIXTURE_VARIANT(fixture_name) \
@@ -307,7 +307,7 @@
__FIXTURE_TEARDOWN(fixture_name)
#define __FIXTURE_TEARDOWN(fixture_name) \
- void fixture_name##_teardown( \
+ static void fixture_name##_teardown( \
struct __test_metadata __attribute__((unused)) *_metadata, \
FIXTURE_DATA(fixture_name) __attribute__((unused)) *self, \
const FIXTURE_VARIANT(fixture_name) \
@@ -987,7 +987,7 @@ static void __timeout_handler(int sig, siginfo_t *info, void *ucontext)
kill(-(t->pid), SIGKILL);
}
-void __wait_for_test(struct __test_metadata *t)
+static void __wait_for_test(struct __test_metadata *t)
{
struct sigaction action = {
.sa_sigaction = __timeout_handler,
@@ -1205,9 +1205,9 @@ static bool test_enabled(int argc, char **argv,
return !has_positive;
}
-void __run_test(struct __fixture_metadata *f,
- struct __fixture_variant_metadata *variant,
- struct __test_metadata *t)
+static void __run_test(struct __fixture_metadata *f,
+ struct __fixture_variant_metadata *variant,
+ struct __test_metadata *t)
{
struct __test_xfail *xfail;
char test_name[1024];
--
2.39.5
When CONFIG_DAMON_SYSFS is disabled, the selftests fail with the
following outputs,
not ok 2 selftests: damon: sysfs_update_schemes_tried_regions_wss_estimation.py # exit=1
not ok 3 selftests: damon: damos_quota.py # exit=1
not ok 4 selftests: damon: damos_quota_goal.py # exit=1
not ok 5 selftests: damon: damos_apply_interval.py # exit=1
not ok 6 selftests: damon: damos_tried_regions.py # exit=1
not ok 7 selftests: damon: damon_nr_regions.py # exit=1
not ok 11 selftests: damon: sysfs_update_schemes_tried_regions_hang.py # exit=1
The root cause of this issue is that all the testcases above do not
check the sysfs interface of DAMON whether it exists or not. With this
patch applied, all the testcases above now pass successfully.
Signed-off-by: Enze Li <lienze(a)kylinos.cn>
---
tools/testing/selftests/damon/_damon_sysfs.py | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/tools/testing/selftests/damon/_damon_sysfs.py b/tools/testing/selftests/damon/_damon_sysfs.py
index 6e136dc3df19..cab67addfb00 100644
--- a/tools/testing/selftests/damon/_damon_sysfs.py
+++ b/tools/testing/selftests/damon/_damon_sysfs.py
@@ -15,6 +15,10 @@ if sysfs_root is None:
print('Seems sysfs not mounted?')
exit(ksft_skip)
+if not os.path.exists(sysfs_root):
+ print('Seems DAMON disabled?')
+ exit(ksft_skip)
+
def write_file(path, string):
"Returns error string if failed, or None otherwise"
string = '%s' % string
base-commit: 0f70f5b08a47a3bc1a252e5f451a137cde7c98ce
--
2.43.0
We have the logic to include net/lib automatically for net related
selftests. However, currently, this logic is only in install target
which means only `make install` will have net/lib included. This commit
moves the logic to all target so that all `make`, `make run_tests` and
`make install` will have net/lib included in net related selftests.
Reviewed-by: Jakub Kicinski <kuba(a)kernel.org>
Signed-off-by: Bui Quang Minh <minhquangbui99(a)gmail.com>
---
Changes in v2:
- Make the commit message clearer.
tools/testing/selftests/Makefile | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index 6aa11cd3db42..5b04d83ad9a1 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -205,7 +205,7 @@ export KHDR_INCLUDES
all:
@ret=1; \
- for TARGET in $(TARGETS); do \
+ for TARGET in $(TARGETS) $(INSTALL_DEP_TARGETS); do \
BUILD_TARGET=$$BUILD/$$TARGET; \
mkdir $$BUILD_TARGET -p; \
$(MAKE) OUTPUT=$$BUILD_TARGET -C $$TARGET \
@@ -270,7 +270,7 @@ ifdef INSTALL_PATH
install -m 744 run_kselftest.sh $(INSTALL_PATH)/
rm -f $(TEST_LIST)
@ret=1; \
- for TARGET in $(TARGETS) $(INSTALL_DEP_TARGETS); do \
+ for TARGET in $(TARGETS); do \
BUILD_TARGET=$$BUILD/$$TARGET; \
$(MAKE) OUTPUT=$$BUILD_TARGET -C $$TARGET install \
INSTALL_PATH=$(INSTALL_PATH)/$$TARGET \
--
2.43.0
The vIOMMU object is designed to represent a slice of an IOMMU HW for its
virtualization features shared with or passed to user space (a VM mostly)
in a way of HW acceleration. This extended the HWPT-based design for more
advanced virtualization feature.
HW QUEUE introduced by this series as a part of the vIOMMU infrastructure
represents a HW accelerated queue/buffer for VM to use exclusively, e.g.
- NVIDIA's Virtual Command Queue
- AMD vIOMMU's Command Buffer, Event Log Buffer, and PPR Log Buffer
each of which allows its IOMMU HW to directly access a queue memory owned
by a guest VM and allows a guest OS to control the HW queue direclty, to
avoid VM Exit overheads to improve the performance.
Introduce IOMMUFD_OBJ_HW_QUEUE and its pairing IOMMUFD_CMD_HW_QUEUE_ALLOC
allowing VMM to forward the IOMMU-specific queue info, such as queue base
address, size, and etc.
Meanwhile, a guest-owned queue needs the guest kernel to control the queue
by reading/writing its consumer and producer indexes, via MMIO acceses to
the hardware MMIO registers. Introduce an mmap infrastructure for iommufd
to support passing through a piece of MMIO region from the host physical
address space to the guest physical address space. The mmap info (offset/
length) used by an mmap syscall must be pre-allocated and returned to the
user space via an output driver-data during an IOMMUFD_CMD_HW_QUEUE_ALLOC
call. Thus, it requires a driver-specific user data support in the vIOMMU
allocation flow.
As a real-world use case, this series implements a HW QUEUE support in the
tegra241-cmdqv driver for VCMDQs on NVIDIA Grace CPU. In another word, it
is also the Tegra CMDQV series Part-2 (user-space support), reworked from
Previous RFCv1:
https://lore.kernel.org/all/cover.1712978212.git.nicolinc@nvidia.com/
This enables the HW accelerated feature for NVIDIA Grace CPU. Compared to
the standard SMMUv3 operating in the nested translation mode trapping CMDQ
for TLBI and ATC_INV commands, this gives a huge performance improvement:
70% to 90% reductions of invalidation time were measured by various DMA
unmap tests running in a guest OS.
// Unmap latencies from "dma_map_benchmark -g @granule -t @threads",
// by toggling "/sys/kernel/debug/iommu/tegra241_cmdqv/bypass_vcmdq"
@granule | @threads | bypass_vcmdq=1 | bypass_vcmdq=0
4KB 1 35.7 us 5.3 us
16KB 1 41.8 us 6.8 us
64KB 1 68.9 us 9.9 us
128KB 1 109.0 us 12.6 us
256KB 1 187.1 us 18.0 us
4KB 2 96.9 us 6.8 us
16KB 2 97.8 us 7.5 us
64KB 2 151.5 us 10.7 us
128KB 2 257.8 us 12.7 us
256KB 2 443.0 us 17.9 us
This is on Github:
https://github.com/nicolinc/iommufd/commits/iommufd_hw_queue-v4
Paring QEMU branch for testing:
https://github.com/nicolinc/qemu/commits/wip/for_iommufd_hw_queue-v4
Changelog
v4
* Rebase on v6.15-rc5
* Add Reviewed-by from Vasant
* Rename "vQUEUE" to "HW QUEUE"
* Use "offset" and "length" for all mmap-related variables
* [iommufd] Use u64 for guest PA
* [iommufd] Fix typo in uAPI doc
* [iommufd] Rename immap_id to offset
* [iommufd] Drop the partial-size mmap support
* [iommufd] Do not replace WARN_ON with WARN_ON_ONCE
* [iommufd] Use "u64 base_addr" for queue base address
* [iommufd] Use u64 base_pfn/num_pfns for immap structure
* [iommufd] Correct the size passed in to mtree_alloc_range()
* [iommufd] Add IOMMUFD_VIOMMU_FLAG_HW_QUEUE_READS_PA to viommu_ops
v3
https://lore.kernel.org/all/cover.1746139811.git.nicolinc@nvidia.com/
* Add Reviewed-by from Baolu, Pranjal, and Alok
* Revise kdocs, uAPI docs, and commit logs
* Rename "vCMDQ" back to "vQUEUE" for AMD cases
* [tegra] Add tegra241_vcmdq_hw_flush_timeout()
* [tegra] Rename vsmmu_alloc to alloc_vintf_user
* [tegra] Use writel for SID replacement registers
* [tegra] Move mmap removal call to vsmmu_destroy op
* [tegra] Fix revert in tegra241_vintf_alloc_lvcmdq_user()
* [iommufd] Replace "& ~PAGE_MASK" with PAGE_ALIGNED()
* [iommufd] Add an object-type "owner" to immap structure
* [iommufd] Drop the ictx input in the new for-driver APIs
* [iommufd] Add iommufd_vma_ops to keep track of mmap lifecycle
* [iommufd] Add viommu-based iommufd_viommu_alloc/destroy_mmap helpers
* [iommufd] Rename iommufd_ctx_alloc/free_mmap to
_iommufd_alloc/destroy_mmap
v2
https://lore.kernel.org/all/cover.1745646960.git.nicolinc@nvidia.com/
* Add Reviewed-by from Jason
* [smmu] Fix vsmmu initial value
* [smmu] Support impl for hw_info
* [tegra] Rename "slot" to "vsid"
* [tegra] Update kdocs and commit logs
* [tegra] Map/unmap LVCMDQ dynamically
* [tegra] Refcount the previous LVCMDQ
* [tegra] Return -EEXIST if LVCMDQ exists
* [tegra] Simplify VINTF cleanup routine
* [tegra] Use vmid and s2_domain in vsmmu
* [tegra] Rename "mmap_pgoff" to "immap_id"
* [tegra] Add more addr and length validation
* [iommufd] Add more narrative to mmap's kdoc
* [iommufd] Add iommufd_struct_depend/undepend()
* [iommufd] Rename vcmdq_free op to vcmdq_destroy
* [iommufd] Fix bug in iommu_copy_struct_to_user()
* [iommufd] Drop is_io from iommufd_ctx_alloc_mmap()
* [iommufd] Test the queue memory for its contiguity
* [iommufd] Return -ENXIO if address or length fails
* [iommufd] Do not change @min_last in mock_viommu_alloc()
* [iommufd] Generalize TEGRA241_VCMDQ data in core structure
* [iommufd] Add selftest coverage for IOMMUFD_CMD_VCMDQ_ALLOC
* [iommufd] Add iopt_pin_pages() to prevent queue memory from unmapping
v1
https://lore.kernel.org/all/cover.1744353300.git.nicolinc@nvidia.com/
Thanks
Nicolin
Nicolin Chen (23):
iommufd/viommu: Add driver-allocated vDEVICE support
iommu: Pass in a driver-level user data structure to viommu_alloc op
iommufd/viommu: Allow driver-specific user data for a vIOMMU object
iommu: Add iommu_copy_struct_to_user helper
iommufd/driver: Let iommufd_viommu_alloc helper save ictx to
viommu->ictx
iommufd/driver: Add iommufd_struct_destroy to revert
iommufd_viommu_alloc
iommufd/selftest: Support user_data in mock_viommu_alloc
iommufd/selftest: Add covearge for viommu data
iommufd: Abstract iopt_pin_pages and iopt_unpin_pages helpers
iommufd/viommu: Introduce IOMMUFD_OBJ_HW_QUEUE and its related struct
iommufd/viommu: Add IOMMUFD_CMD_HW_QUEUE_ALLOC ioctl
iommufd/driver: Add iommufd_hw_queue_depend/undepend() helpers
iommufd/selftest: Add coverage for IOMMUFD_CMD_HW_QUEUE_ALLOC
iommufd: Add mmap interface
iommufd/selftest: Add coverage for the new mmap interface
Documentation: userspace-api: iommufd: Update HW QUEUE
iommu/arm-smmu-v3-iommufd: Add vsmmu_alloc impl op
iommu/arm-smmu-v3-iommufd: Support implementation-defined hw_info
iommu/tegra241-cmdqv: Use request_threaded_irq
iommu/tegra241-cmdqv: Simplify deinit flow in
tegra241_cmdqv_remove_vintf()
iommu/tegra241-cmdqv: Do not statically map LVCMDQs
iommu/tegra241-cmdqv: Add user-space use support
iommu/tegra241-cmdqv: Add IOMMU_VEVENTQ_TYPE_TEGRA241_CMDQV support
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 25 +-
drivers/iommu/iommufd/io_pagetable.h | 8 +
drivers/iommu/iommufd/iommufd_private.h | 28 +-
drivers/iommu/iommufd/iommufd_test.h | 20 +
include/linux/iommu.h | 43 +-
include/linux/iommufd.h | 186 ++++++-
include/uapi/linux/iommufd.h | 116 ++++-
tools/testing/selftests/iommu/iommufd_utils.h | 52 +-
.../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c | 43 +-
.../iommu/arm/arm-smmu-v3/tegra241-cmdqv.c | 490 +++++++++++++++++-
drivers/iommu/iommufd/device.c | 117 +----
drivers/iommu/iommufd/driver.c | 94 ++++
drivers/iommu/iommufd/io_pagetable.c | 95 ++++
drivers/iommu/iommufd/main.c | 80 ++-
drivers/iommu/iommufd/selftest.c | 133 ++++-
drivers/iommu/iommufd/viommu.c | 121 ++++-
tools/testing/selftests/iommu/iommufd.c | 97 +++-
.../selftests/iommu/iommufd_fail_nth.c | 11 +-
Documentation/userspace-api/iommufd.rst | 12 +
19 files changed, 1577 insertions(+), 194 deletions(-)
base-commit: 92a09c47464d040866cf2b4cd052bc60555185fb
--
2.43.0
Problem
=======
When host APEI is unable to claim synchronous external abort (SEA)
during stage-2 guest abort, today KVM directly injects an async SError
into the VCPU then resumes it. The injected SError usually results in
unpleasant guest kernel panic.
One of the major situation of guest SEA is when VCPU consumes recoverable
uncorrected memory error (UER), which is not uncommon at all in modern
datacenter servers with large amounts of physical memory. Although SError
and guest panic is sufficient to stop the propagation of corrupted memory
there is still room to recover from memory UER in a more graceful manner.
Proposed Solution
=================
Alternatively KVM can replay the SEA to the faulting VCPU, via existing
KVM_SET_VCPU_EVENTS API. If the memory poison consumption or the fault
that cause SEA is not from guest kernel, the blast radius can be limited
to the consuming or faulting guest userspace process, so the VM can keep
running.
In addition, instead of doing under the hood without involving userspace,
there are benefits to redirect the SEA to VMM:
- VM customers care about the disruptions caused by memory errors, and
VMM usually has the responsibility to start the process of notifying
the customers of memory error events in their VMs. For example some
cloud provider emits a critical log in their observability UI [1], and
provides playbook for customers on how to mitigate disruptions to
their workloads.
- VMM can protect future memory error consumption or faults by unmapping
the poisoned pages from stage-2 page table with KVM userfault [2],
which is more performant than splitting the memslot that contains
the poisoned guest pages.
- VMM can keep track SEA events in the VM. When VMM thinks the status
on the host or the VM is bad enough, e.g. number of distinct SEAs
exceeds a threshold, it can restart the VM on another healthy host.
- Behavior parity with x86 architecture. When machine check exception
(MCE) is caused by VCPU, kernel or KVM signals userspace SIGBUS to
let VMM either recover from the MCE, or terminate itself with VM.
The prior RFC proposes to implement SIGBUS on arm64 as well, but
Marc preferred VCPU exit over signal [3]. However, implementation
aside, returning SEA to VMM is on par with returning MCE to VMM.
Once SEA is redirected to VMM, among other actions, VMM is encouraged
to inject external aborts into the faulting VCPU, which is already
supported by KVM on arm64, although not fully supported by
KVM_SET_VCPU_EVENTS but complemented in this patchset.
New UAPIs
=========
This patchset introduces following userspace-visiable changes to empower
VMM to control what happens next for guest SEA:
- KVM_CAP_ARM_SEA_TO_USER. If userspace enables this new capability at VM
creation, KVM will not inject SError while taking SEA, but VM exit to
userspace.
- KVM_EXIT_ARM_SEA. This is the VM exit reason VMM gets. The details
about the SEA is provided in arm_sea as much as possible, including
ESR value at EL2, if guest virtual and physical addresses (GPA and GVA)
are available and the values if available.
- KVM_CAP_ARM_INJECT_EXT_IABT. VMM today can inject external data abort
to VCPU via KVM_SET_VCPU_EVENTS API. However, in case of instruction
abort, VMM cannot inject it via KVM_SET_VCPU_EVENTS.
KVM_CAP_ARM_INJECT_EXT_IABT is just a natural extend to
KVM_CAP_ARM_INJECT_EXT_DABT that tells VMM KVM_SET_VCPU_EVENTS now
supports external instruction abort.
Patchset utilizes commit 26fbdf369227 ("KVM: arm64: Don't translate
FAR if invalid/unsafe") from [4], available already in kvmarm/next.
[4] makes KVM safely do address translation for HPFAR_EL2, including at
the event of SEA, and indicate if HPFAR_EL2 is valid in NS bit.
This patchset depends on [4] to tell userspace if GPA is valid and
its value if valid.
Patchset is based on commit 68ec8b4e84446 ("Merge branch
kvm-arm64/pkvm-6.16 into kvmarm-master/next")
[1] https://cloud.google.com/solutions/sap/docs/manage-host-errors
[2] https://lpc.events/event/18/contributions/1757/attachments/1442/3073/LPC_%2…
[3] https://lore.kernel.org/kvm/86pljbqqh0.wl-maz@kernel.org
[4] https://lore.kernel.org/all/174369514508.3034362.13165690020799838042.b4-ty…
Jiaqi Yan (5):
KVM: arm64: VM exit to userspace to handle SEA
KVM: arm64: Set FnV for VCPU when FAR_EL2 is invalid
KVM: selftests: Test for KVM_EXIT_ARM_SEA and KVM_CAP_ARM_SEA_TO_USER
KVM: selftests: Test for KVM_CAP_INJECT_EXT_IABT
Documentation: kvm: new uAPI for handling SEA
Raghavendra Rao Ananta (1):
KVM: arm64: Allow userspace to inject external instruction aborts
Documentation/virt/kvm/api.rst | 120 ++++++-
arch/arm64/include/asm/kvm_emulate.h | 12 +
arch/arm64/include/asm/kvm_host.h | 8 +
arch/arm64/include/asm/kvm_ras.h | 21 +-
arch/arm64/include/uapi/asm/kvm.h | 3 +-
arch/arm64/kvm/Makefile | 3 +-
arch/arm64/kvm/arm.c | 6 +
arch/arm64/kvm/guest.c | 13 +-
arch/arm64/kvm/inject_fault.c | 3 +
arch/arm64/kvm/kvm_ras.c | 54 +++
arch/arm64/kvm/mmu.c | 12 +-
include/uapi/linux/kvm.h | 12 +
tools/arch/arm64/include/uapi/asm/kvm.h | 3 +-
tools/testing/selftests/kvm/Makefile.kvm | 2 +
.../testing/selftests/kvm/arm64/inject_iabt.c | 100 ++++++
.../testing/selftests/kvm/arm64/sea_to_user.c | 324 ++++++++++++++++++
tools/testing/selftests/kvm/lib/kvm_util.c | 1 +
17 files changed, 654 insertions(+), 43 deletions(-)
create mode 100644 arch/arm64/kvm/kvm_ras.c
create mode 100644 tools/testing/selftests/kvm/arm64/inject_iabt.c
create mode 100644 tools/testing/selftests/kvm/arm64/sea_to_user.c
--
2.49.0.967.g6a0df3ecc3-goog
Enable CONFIG_INIT_STACK_ALL_PATTERN in all_tests.config. This helps
to detect use of uninitialized local variables.
This option found an uninitialized data bug in the cs_dsp test.
Signed-off-by: Richard Fitzgerald <rf(a)opensource.cirrus.com>
---
tools/testing/kunit/configs/all_tests.config | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/kunit/configs/all_tests.config b/tools/testing/kunit/configs/all_tests.config
index cdd9782f9646..4a60bb71fe72 100644
--- a/tools/testing/kunit/configs/all_tests.config
+++ b/tools/testing/kunit/configs/all_tests.config
@@ -10,6 +10,7 @@ CONFIG_KUNIT_EXAMPLE_TEST=y
CONFIG_KUNIT_ALL_TESTS=y
CONFIG_FORTIFY_SOURCE=y
+CONFIG_INIT_STACK_ALL_PATTERN=y
CONFIG_IIO=y
--
2.39.5
This patch set aims to allow ublk server threads to better balance load
amongst themselves by decoupling server threads from ublk_queues/hctxs,
so that multiple threads can service I/Os that are issued from a single
CPU. This can improve performance for workloads in which ublk server CPU
is a bottleneck, and for which load is issued from CPUs which are not
balanced across ublk_queues/hctxs.
Performance
-----------
First create two ublk devices with:
ublkb0: ./kublk add -t null -q 2 --nthreads 2
ublkb1: ./kublk add -t null -q 2 --nthreads 2 --per_io_tasks
Then run load with:
taskset -c 1 fio/t/io_uring -r5 -p0 /dev/ublkb0: 1.90M IOPS
taskset -c 1 fio/t/io_uring -r5 -p0 /dev/ublkb1: 2.18M IOPS
Since ublkb1 has per-io-tasks, the second command is able to make use of
both ublk server worker threads and therefore has increased max
throughput.
Caveats:
- This testing was done on a system with 2 numa nodes, but the penalty
of having I/O cross a numa (or LLC) boundary in the per_io_tasks case
is quite high. So these numbers were obtained after moving all ublk
server threads and the application threads to CPUs on the same numa
node/LLC.
- One might expect the scaling to be linear - because ublkb1 can make
use of twice as many ublk server threads, it should be able to drive
twice the throughput. However this is not true (the improvement is
~15%), and needs further investigation.
Signed-off-by: Uday Shankar <ushankar(a)purestorage.com>
---
Changes in v8:
- Fix queue_rqs batch dispatch OOPS when dispatching a list of requests
associated to > 1 ublk_queue (Ming Lei, Caleb Sander Mateos)
- Simplify queue_rqs (Caleb Sander Mateos)
- Narrow a couple of types (Ming Lei)
- Add stress test for per io daemons (Ming Lei)
- Link to v7: https://lore.kernel.org/r/20250527-ublk_task_per_io-v7-0-cbdbaf283baa@pures…
Changes in v7:
- Fix queue_rqs batch dispatch for per-io daemons
- Kick round-robin tag allocation changes to a followup
- Add explicit feature flag for per-task daemons (Ming Lei, Caleb Sander
Mateos)
- Move some variable assignments to avoid redundant computation (Caleb
Sander Mateos)
- Switch from storing pointers in ublk_io to computing based on address
with container_of in a couple places (Ming Lei)
- Link to v6: https://lore.kernel.org/r/20250507-ublk_task_per_io-v6-0-a2a298783c01@pures…
Changes in v6:
- Add a feature flag for this feature, called UBLK_F_RR_TAGS (Ming Lei)
- Add test for this feature (Ming Lei)
- Add documentation for this feature (Ming Lei)
- Link to v5: https://lore.kernel.org/r/20250416-ublk_task_per_io-v5-0-9261ad7bff20@pures…
Changes in v5:
- Set io->task before ublk_mark_io_ready (Caleb Sander Mateos)
- Set io->task atomically, read it atomically when needed
- Return 0 on success from command-specific helpers in
__ublk_ch_uring_cmd (Caleb Sander Mateos)
- Rename ublk_handle_need_get_data to ublk_get_data (Caleb Sander
Mateos)
- Link to v4: https://lore.kernel.org/r/20250415-ublk_task_per_io-v4-0-54210b91a46f@pures…
Changes in v4:
- Drop "ublk: properly serialize all FETCH_REQs" since Ming is taking it
in another set
- Prevent data races by marking data structures which should be
read-only in the I/O path as const (Ming Lei)
- Link to v3: https://lore.kernel.org/r/20250410-ublk_task_per_io-v3-0-b811e8f4554a@pures…
Changes in v3:
- Check for UBLK_IO_FLAG_ACTIVE on I/O again after taking lock to ensure
that two concurrent FETCH_REQs on the same I/O can't succeed (Caleb
Sander Mateos)
- Link to v2: https://lore.kernel.org/r/20250408-ublk_task_per_io-v2-0-b97877e6fd50@pures…
Changes in v2:
- Remove changes split into other patches
- To ease error handling/synchronization, associate each I/O (instead of
each queue) to the last task that issues a FETCH_REQ against it. Only
that task is allowed to operate on the I/O.
- Link to v1: https://lore.kernel.org/r/20241002224437.3088981-1-ushankar@purestorage.com
---
Uday Shankar (9):
ublk: have a per-io daemon instead of a per-queue daemon
selftests: ublk: kublk: plumb q_id in io_uring user_data
selftests: ublk: kublk: tie sqe allocation to io instead of queue
selftests: ublk: kublk: lift queue initialization out of thread
selftests: ublk: kublk: move per-thread data out of ublk_queue
selftests: ublk: kublk: decouple ublk_queues from ublk server threads
selftests: ublk: add functional test for per io daemons
selftests: ublk: add stress test for per io daemons
Documentation: ublk: document UBLK_F_PER_IO_DAEMON
Documentation/block/ublk.rst | 35 ++-
drivers/block/ublk_drv.c | 111 +++----
include/uapi/linux/ublk_cmd.h | 9 +
tools/testing/selftests/ublk/Makefile | 2 +
tools/testing/selftests/ublk/fault_inject.c | 4 +-
tools/testing/selftests/ublk/file_backed.c | 20 +-
tools/testing/selftests/ublk/kublk.c | 344 ++++++++++++++-------
tools/testing/selftests/ublk/kublk.h | 73 +++--
tools/testing/selftests/ublk/null.c | 22 +-
tools/testing/selftests/ublk/stripe.c | 17 +-
tools/testing/selftests/ublk/test_common.sh | 5 +
tools/testing/selftests/ublk/test_generic_12.sh | 55 ++++
tools/testing/selftests/ublk/test_stress_06.sh | 36 +++
.../selftests/ublk/trace/count_ios_per_tid.bt | 11 +
14 files changed, 512 insertions(+), 232 deletions(-)
---
base-commit: 533c87e2ed742454957f14d7bef9f48d5a72e72d
change-id: 20250408-ublk_task_per_io-c693cf608d7a
Best regards,
--
Uday Shankar <ushankar(a)purestorage.com>
The anon_inode_test test fails to build due to attempting to include
a nonexisting overlayfs/wrapper.h:
anon_inode_test.c:10:10: fatal error: overlayfs/wrappers.h: No such file or directory
10 | #include "overlayfs/wrappers.h"
| ^~~~~~~~~~~~~~~~~~~~~~
This is due to 0bd92b9fe538 ("selftests/filesystems: move wrapper.h out
of overlayfs subdir") which was added in the vfs-6.16.selftests branch
which was based on -rc5 and does not contain the newly added test so
once things were merged into vfs.all in the build started failing - both
parent commits are fine.
Fixes: feaa00dbff45a ("Merge branch 'vfs-6.16.selftests' into vfs.all")
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
tools/testing/selftests/filesystems/anon_inode_test.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/filesystems/anon_inode_test.c b/tools/testing/selftests/filesystems/anon_inode_test.c
index e8e0ef1460d2..73e0a4d4fb2f 100644
--- a/tools/testing/selftests/filesystems/anon_inode_test.c
+++ b/tools/testing/selftests/filesystems/anon_inode_test.c
@@ -7,7 +7,7 @@
#include <sys/stat.h>
#include "../kselftest_harness.h"
-#include "overlayfs/wrappers.h"
+#include "wrappers.h"
TEST(anon_inode_no_chown)
{
---
base-commit: feaa00dbff45ad9a0dcd04a92f88c745bf880f55
change-id: 20250516-selftests-anon-inode-build-007e206e8422
Best regards,
--
Mark Brown <broonie(a)kernel.org>
This picks up from Michal Rostecki's work[0]. Per Michal's guidance I
have omitted Co-authored tags, as the end result is quite different.
Link: https://lore.kernel.org/rust-for-linux/20240819153656.28807-2-vadorovsky@pr… [0]
Closes: https://github.com/Rust-for-Linux/linux/issues/1075
Signed-off-by: Tamir Duberstein <tamird(a)gmail.com>
---
Changes in v10:
- Rebase on cbeaa41dfe26b72639141e87183cb23e00d4b0dd.
- Implement Alice's suggestion to use a proc macro to work around orphan
rules otherwise preventing `core::ffi::CStr` to be directly printed
with `{}`.
- Link to v9: https://lore.kernel.org/r/20250317-cstr-core-v9-0-51d6cc522f62@gmail.com
Changes in v9:
- Rebase on rust-next.
- Restore `impl Display for BStr` which exists upstream[1].
- Link: https://doc.rust-lang.org/nightly/std/bstr/struct.ByteStr.html#impl-Display… [1]
- Link to v8: https://lore.kernel.org/r/20250203-cstr-core-v8-0-cb3f26e78686@gmail.com
Changes in v8:
- Move `{from,as}_char_ptr` back to `CStrExt`. This reduces the diff
some.
- Restore `from_bytes_with_nul_unchecked_mut`, `to_cstring`.
- Link to v7: https://lore.kernel.org/r/20250202-cstr-core-v7-0-da1802520438@gmail.com
Changes in v7:
- Rebased on mainline.
- Restore functionality added in commit a321f3ad0a5d ("rust: str: add
{make,to}_{upper,lower}case() to CString").
- Used `diff.algorithm patience` to improve diff readability.
- Link to v6: https://lore.kernel.org/r/20250202-cstr-core-v6-0-8469cd6d29fd@gmail.com
Changes in v6:
- Split the work into several commits for ease of review.
- Restore `{from,as}_char_ptr` to allow building on ARM (see commit
message).
- Add `CStrExt` to `kernel::prelude`. (Alice Ryhl)
- Remove `CStrExt::from_bytes_with_nul_unchecked_mut` and restore
`DerefMut for CString`. (Alice Ryhl)
- Rename and hide `kernel::c_str!` to encourage use of C-String
literals.
- Drop implementation and invocation changes in kunit.rs. (Trevor Gross)
- Drop docs on `Display` impl. (Trevor Gross)
- Rewrite docs in the style of the standard library.
- Restore the `test_cstr_debug` unit tests to demonstrate that the
implementation has changed.
Changes in v5:
- Keep the `test_cstr_display*` unit tests.
Changes in v4:
- Provide the `CStrExt` trait with `display()` method, which returns a
`CStrDisplay` wrapper with `Display` implementation. This addresses
the lack of `Display` implementation for `core::ffi::CStr`.
- Provide `from_bytes_with_nul_unchecked_mut()` method in `CStrExt`,
which might be useful and is going to prevent manual, unsafe casts.
- Fix a typo (s/preffered/prefered/).
Changes in v3:
- Fix the commit message.
- Remove redundant braces in `use`, when only one item is imported.
Changes in v2:
- Do not remove `c_str` macro. While it's preferred to use C-string
literals, there are two cases where `c_str` is helpful:
- When working with macros, which already return a Rust string literal
(e.g. `stringify!`).
- When building macros, where we want to take a Rust string literal as an
argument (for caller's convenience), but still use it as a C-string
internally.
- Use Rust literals as arguments in macros (`new_mutex`, `new_condvar`,
`new_mutex`). Use the `c_str` macro to convert these literals to C-string
literals.
- Use `c_str` in kunit.rs for converting the output of `stringify!` to a
`CStr`.
- Remove `DerefMut` implementation for `CString`.
---
Tamir Duberstein (5):
rust: retitle "Example" section as "Examples"
rust: support formatting of foreign types
rust: replace `CStr` with `core::ffi::CStr`
rust: replace `kernel::c_str!` with C-Strings
rust: remove core::ffi::CStr reexport
drivers/block/rnull.rs | 2 +-
drivers/gpu/drm/drm_panic_qr.rs | 5 +-
drivers/gpu/nova-core/driver.rs | 2 +-
drivers/gpu/nova-core/firmware.rs | 2 +-
drivers/net/phy/ax88796b_rust.rs | 8 +-
drivers/net/phy/qt2025.rs | 6 +-
rust/kernel/block/mq.rs | 2 +-
rust/kernel/device.rs | 9 +-
rust/kernel/devres.rs | 2 +-
rust/kernel/driver.rs | 4 +-
rust/kernel/error.rs | 10 +-
rust/kernel/faux.rs | 5 +-
rust/kernel/firmware.rs | 16 +-
rust/kernel/fmt.rs | 77 +++++++
rust/kernel/kunit.rs | 21 +-
rust/kernel/lib.rs | 3 +-
rust/kernel/miscdevice.rs | 5 +-
rust/kernel/net/phy.rs | 12 +-
rust/kernel/of.rs | 5 +-
rust/kernel/pci.rs | 2 +-
rust/kernel/platform.rs | 6 +-
rust/kernel/prelude.rs | 5 +-
rust/kernel/print.rs | 4 +-
rust/kernel/seq_file.rs | 6 +-
rust/kernel/str.rs | 415 ++++++++++-------------------------
rust/kernel/sync.rs | 7 +-
rust/kernel/sync/condvar.rs | 4 +-
rust/kernel/sync/lock.rs | 4 +-
rust/kernel/sync/lock/global.rs | 6 +-
rust/kernel/sync/poll.rs | 1 +
rust/kernel/workqueue.rs | 1 +
rust/macros/fmt.rs | 118 ++++++++++
rust/macros/kunit.rs | 6 +-
rust/macros/lib.rs | 21 +-
rust/macros/module.rs | 2 +-
samples/rust/rust_driver_faux.rs | 4 +-
samples/rust/rust_driver_pci.rs | 4 +-
samples/rust/rust_driver_platform.rs | 4 +-
samples/rust/rust_misc_device.rs | 3 +-
scripts/rustdoc_test_gen.rs | 2 +-
40 files changed, 426 insertions(+), 395 deletions(-)
---
base-commit: cbeaa41dfe26b72639141e87183cb23e00d4b0dd
change-id: 20250201-cstr-core-d4b9b69120cf
Best regards,
--
Tamir Duberstein <tamird(a)gmail.com>
This patch set aims to allow ublk server threads to better balance load
amongst themselves by decoupling server threads from ublk_queues/hctxs,
so that multiple threads can service I/Os that are issued from a single
CPU. This can improve performance for workloads in which ublk server CPU
is a bottleneck, and for which load is issued from CPUs which are not
balanced across ublk_queues/hctxs.
Performance
-----------
First create two ublk devices with:
ublkb0: ./kublk add -t null -q 2 --nthreads 2
ublkb1: ./kublk add -t null -q 2 --nthreads 2 --per_io_tasks
Then run load with:
taskset -c 1 fio/t/io_uring -r5 -p0 /dev/ublkb0: 1.90M IOPS
taskset -c 1 fio/t/io_uring -r5 -p0 /dev/ublkb1: 2.18M IOPS
Since ublkb1 has per-io-tasks, the second command is able to make use of
both ublk server worker threads and therefore has increased max
throughput.
Caveats:
- This testing was done on a system with 2 numa nodes, but the penalty
of having I/O cross a numa (or LLC) boundary in the per_io_tasks case
is quite high. So these numbers were obtained after moving all ublk
server threads and the application threads to CPUs on the same numa
node/LLC.
- One might expect the scaling to be linear - because ublkb1 can make
use of twice as many ublk server threads, it should be able to drive
twice the throughput. However this is not true (the improvement is
~15%), and needs further investigation.
Signed-off-by: Uday Shankar <ushankar(a)purestorage.com>
---
Changes in v7:
- Fix queue_rqs batch dispatch for per-io daemons
- Kick round-robin tag allocation changes to a followup
- Add explicit feature flag for per-task daemons (Ming Lei, Caleb Sander
Mateos)
- Move some variable assignments to avoid redundant computation (Caleb
Sander Mateos)
- Switch from storing pointers in ublk_io to computing based on address
with container_of in a couple places (Ming Lei)
- Link to v6: https://lore.kernel.org/r/20250507-ublk_task_per_io-v6-0-a2a298783c01@pures…
Changes in v6:
- Add a feature flag for this feature, called UBLK_F_RR_TAGS (Ming Lei)
- Add test for this feature (Ming Lei)
- Add documentation for this feature (Ming Lei)
- Link to v5: https://lore.kernel.org/r/20250416-ublk_task_per_io-v5-0-9261ad7bff20@pures…
Changes in v5:
- Set io->task before ublk_mark_io_ready (Caleb Sander Mateos)
- Set io->task atomically, read it atomically when needed
- Return 0 on success from command-specific helpers in
__ublk_ch_uring_cmd (Caleb Sander Mateos)
- Rename ublk_handle_need_get_data to ublk_get_data (Caleb Sander
Mateos)
- Link to v4: https://lore.kernel.org/r/20250415-ublk_task_per_io-v4-0-54210b91a46f@pures…
Changes in v4:
- Drop "ublk: properly serialize all FETCH_REQs" since Ming is taking it
in another set
- Prevent data races by marking data structures which should be
read-only in the I/O path as const (Ming Lei)
- Link to v3: https://lore.kernel.org/r/20250410-ublk_task_per_io-v3-0-b811e8f4554a@pures…
Changes in v3:
- Check for UBLK_IO_FLAG_ACTIVE on I/O again after taking lock to ensure
that two concurrent FETCH_REQs on the same I/O can't succeed (Caleb
Sander Mateos)
- Link to v2: https://lore.kernel.org/r/20250408-ublk_task_per_io-v2-0-b97877e6fd50@pures…
Changes in v2:
- Remove changes split into other patches
- To ease error handling/synchronization, associate each I/O (instead of
each queue) to the last task that issues a FETCH_REQ against it. Only
that task is allowed to operate on the I/O.
- Link to v1: https://lore.kernel.org/r/20241002224437.3088981-1-ushankar@purestorage.com
---
Uday Shankar (8):
ublk: have a per-io daemon instead of a per-queue daemon
selftests: ublk: kublk: plumb q_id in io_uring user_data
selftests: ublk: kublk: tie sqe allocation to io instead of queue
selftests: ublk: kublk: lift queue initialization out of thread
selftests: ublk: kublk: move per-thread data out of ublk_queue
selftests: ublk: kublk: decouple ublk_queues from ublk server threads
selftests: ublk: add test for per io daemons
Documentation: ublk: document UBLK_F_PER_IO_DAEMON
Documentation/block/ublk.rst | 35 ++-
drivers/block/ublk_drv.c | 108 +++----
include/uapi/linux/ublk_cmd.h | 9 +
tools/testing/selftests/ublk/Makefile | 1 +
tools/testing/selftests/ublk/fault_inject.c | 4 +-
tools/testing/selftests/ublk/file_backed.c | 20 +-
tools/testing/selftests/ublk/kublk.c | 345 ++++++++++++++-------
tools/testing/selftests/ublk/kublk.h | 73 +++--
tools/testing/selftests/ublk/null.c | 22 +-
tools/testing/selftests/ublk/stripe.c | 17 +-
tools/testing/selftests/ublk/test_generic_12.sh | 55 ++++
.../selftests/ublk/trace/count_ios_per_tid.bt | 11 +
12 files changed, 470 insertions(+), 230 deletions(-)
---
base-commit: 533c87e2ed742454957f14d7bef9f48d5a72e72d
change-id: 20250408-ublk_task_per_io-c693cf608d7a
Best regards,
--
Uday Shankar <ushankar(a)purestorage.com>
In some application like bpftrace [1], need to get the cwd from the pid.
This patch provides a new kfunc that can get the cwd of the process from
the pid.
[1] https://github.com/bpftrace/bpftrace/issues/3314
Rong Tao (2):
bpf: Add bpf_task_cwd_from_pid() kfunc
selftests/bpf: Add selftests for bpf_task_cwd_from_pid()
kernel/bpf/helpers.c | 45 ++++++++++++++++++
.../selftests/bpf/prog_tests/task_kfunc.c | 3 ++
.../selftests/bpf/progs/task_kfunc_common.h | 1 +
.../selftests/bpf/progs/task_kfunc_success.c | 47 +++++++++++++++++++
4 files changed, 96 insertions(+)
--
2.49.0
This commit adds a new kernel selftest to verify RTNLGRP_IPV4_MCADDR
and RTNLGRP_IPV6_MCADDR notifications. The test works by adding and
removing a dummy interface and then confirming that the system
correctly receives join and removal notifications for the 224.0.0.1
and ff02::1 multicast addresses.
The test relies on the iproute2 version to be 6.13+.
Tested by the following command:
$ vng -v --user root --cpus 16 -- \
make -C tools/testing/selftests TARGETS=net TEST_PROGS=rtnetlink.sh \
TEST_GEN_PROGS="" run_tests
Cc: Maciej Żenczykowski <maze(a)google.com>
Cc: Lorenzo Colitti <lorenzo(a)google.com>
Signed-off-by: Yuyang Huang <yuyanghuang(a)google.com>
---
tools/testing/selftests/net/rtnetlink.sh | 34 ++++++++++++++++++++++++
1 file changed, 34 insertions(+)
diff --git a/tools/testing/selftests/net/rtnetlink.sh b/tools/testing/selftests/net/rtnetlink.sh
index 2e8243a65b50..9dbcaaeaf8cd 100755
--- a/tools/testing/selftests/net/rtnetlink.sh
+++ b/tools/testing/selftests/net/rtnetlink.sh
@@ -21,6 +21,7 @@ ALL_TESTS="
kci_test_vrf
kci_test_encap
kci_test_macsec
+ kci_test_mcast_addr_notification
kci_test_ipsec
kci_test_ipsec_offload
kci_test_fdb_get
@@ -1334,6 +1335,39 @@ kci_test_mngtmpaddr()
return $ret
}
+kci_test_mcast_addr_notification()
+{
+ local tmpfile
+ local monitor_pid
+ local match_result
+
+ tmpfile=$(mktemp)
+
+ ip monitor maddr > $tmpfile &
+ monitor_pid=$!
+ sleep 1
+
+ run_cmd ip link add name test-dummy1 type dummy
+ run_cmd ip link set test-dummy1 up
+ run_cmd ip link del dev test-dummy1
+ sleep 1
+
+ match_result=$(grep -cE "test-dummy1.*(224.0.0.1|ff02::1)" $tmpfile)
+
+ kill $monitor_pid
+ rm $tmpfile
+ # There should be 4 line matches as follows.
+ # 13: test-dummy1 inet6 mcast ff02::1 scope global
+ # 13: test-dummy1 inet mcast 224.0.0.1 scope global
+ # Deleted 13: test-dummy1 inet mcast 224.0.0.1 scope global
+ # Deleted 13: test-dummy1 inet6 mcast ff02::1 scope global
+ if [ $match_result -ne 4 ];then
+ end_test "FAIL: mcast addr notification"
+ return 1
+ fi
+ end_test "PASS: mcast addr notification"
+}
+
kci_test_rtnl()
{
local current_test
--
2.49.0.1204.g71687c7c1d-goog
Let's test some basic functionality using /dev/mem. These tests will
implicitly cover some PAT (Page Attribute Handling) handling on x86.
These tests will only run when /dev/mem access to the first two pages
in physical address space is possible and allowed; otherwise, the tests
are skipped.
On current x86-64 with PAT inside a VM, all tests pass:
TAP version 13
1..6
# Starting 6 tests from 1 test cases.
# RUN pfnmap.madvise_disallowed ...
# OK pfnmap.madvise_disallowed
ok 1 pfnmap.madvise_disallowed
# RUN pfnmap.munmap_split ...
# OK pfnmap.munmap_split
ok 2 pfnmap.munmap_split
# RUN pfnmap.mremap_fixed ...
# OK pfnmap.mremap_fixed
ok 3 pfnmap.mremap_fixed
# RUN pfnmap.mremap_shrink ...
# OK pfnmap.mremap_shrink
ok 4 pfnmap.mremap_shrink
# RUN pfnmap.mremap_expand ...
# OK pfnmap.mremap_expand
ok 5 pfnmap.mremap_expand
# RUN pfnmap.fork ...
# OK pfnmap.fork
ok 6 pfnmap.fork
# PASSED: 6 / 6 tests passed.
# Totals: pass:6 fail:0 xfail:0 xpass:0 skip:0 error:0
However, we are able to trigger:
[ 27.888251] x86/PAT: pfnmap:1790 freeing invalid memtype [mem 0x00000000-0x00000fff]
There are probably more things worth testing in the future, such as
MAP_PRIVATE handling. But this set of tests is sufficient to cover most of
the things we will rework regarding PAT handling.
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Dev Jain <dev.jain(a)arm.com>
Signed-off-by: David Hildenbrand <david(a)redhat.com>
---
Hopefully I didn't miss any review feedback.
v1 -> v2:
* Rewrite using kselftest_harness, which simplifies a lot of things
* Add to .gitignore and run_vmtests.sh
* Register signal handler on demand
* Use volatile trick to force a read (not factoring out FORCE_READ just yet)
* Drop mprotect() test case
* Add some more comments why we test certain things
* Use NULL for mmap() first parameter instead of 0
* Smaller fixes
---
tools/testing/selftests/mm/.gitignore | 1 +
tools/testing/selftests/mm/Makefile | 1 +
tools/testing/selftests/mm/pfnmap.c | 196 ++++++++++++++++++++++
tools/testing/selftests/mm/run_vmtests.sh | 4 +
4 files changed, 202 insertions(+)
create mode 100644 tools/testing/selftests/mm/pfnmap.c
diff --git a/tools/testing/selftests/mm/.gitignore b/tools/testing/selftests/mm/.gitignore
index 91db34941a143..824266982aa36 100644
--- a/tools/testing/selftests/mm/.gitignore
+++ b/tools/testing/selftests/mm/.gitignore
@@ -20,6 +20,7 @@ mremap_test
on-fault-limit
transhuge-stress
pagemap_ioctl
+pfnmap
*.tmp*
protection_keys
protection_keys_32
diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile
index ad4d6043a60f0..ae6f994d3add7 100644
--- a/tools/testing/selftests/mm/Makefile
+++ b/tools/testing/selftests/mm/Makefile
@@ -84,6 +84,7 @@ TEST_GEN_FILES += mremap_test
TEST_GEN_FILES += mseal_test
TEST_GEN_FILES += on-fault-limit
TEST_GEN_FILES += pagemap_ioctl
+TEST_GEN_FILES += pfnmap
TEST_GEN_FILES += thuge-gen
TEST_GEN_FILES += transhuge-stress
TEST_GEN_FILES += uffd-stress
diff --git a/tools/testing/selftests/mm/pfnmap.c b/tools/testing/selftests/mm/pfnmap.c
new file mode 100644
index 0000000000000..8a9d19b6020c7
--- /dev/null
+++ b/tools/testing/selftests/mm/pfnmap.c
@@ -0,0 +1,196 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Basic VM_PFNMAP tests relying on mmap() of '/dev/mem'
+ *
+ * Copyright 2025, Red Hat, Inc.
+ *
+ * Author(s): David Hildenbrand <david(a)redhat.com>
+ */
+#define _GNU_SOURCE
+#include <stdlib.h>
+#include <string.h>
+#include <stdint.h>
+#include <unistd.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <signal.h>
+#include <setjmp.h>
+#include <linux/mman.h>
+#include <sys/mman.h>
+#include <sys/wait.h>
+
+#include "../kselftest_harness.h"
+#include "vm_util.h"
+
+static sigjmp_buf sigjmp_buf_env;
+
+static void signal_handler(int sig)
+{
+ siglongjmp(sigjmp_buf_env, -EFAULT);
+}
+
+static int test_read_access(char *addr, size_t size, size_t pagesize)
+{
+ size_t offs;
+ int ret;
+
+ if (signal(SIGSEGV, signal_handler) == SIG_ERR)
+ return -EINVAL;
+
+ ret = sigsetjmp(sigjmp_buf_env, 1);
+ if (!ret) {
+ for (offs = 0; offs < size; offs += pagesize)
+ /* Force a read that the compiler cannot optimize out. */
+ *((volatile char *)(addr + offs));
+ }
+ if (signal(SIGSEGV, signal_handler) == SIG_ERR)
+ return -EINVAL;
+
+ return ret;
+}
+
+FIXTURE(pfnmap)
+{
+ size_t pagesize;
+ int dev_mem_fd;
+ char *addr1;
+ size_t size1;
+ char *addr2;
+ size_t size2;
+};
+
+FIXTURE_SETUP(pfnmap)
+{
+ self->pagesize = getpagesize();
+
+ self->dev_mem_fd = open("/dev/mem", O_RDONLY);
+ if (self->dev_mem_fd < 0)
+ SKIP(return, "Cannot open '/dev/mem'\n");
+
+ /* We'll require the first two pages throughout our tests ... */
+ self->size1 = self->pagesize * 2;
+ self->addr1 = mmap(NULL, self->size1, PROT_READ, MAP_SHARED,
+ self->dev_mem_fd, 0);
+ if (self->addr1 == MAP_FAILED)
+ SKIP(return, "Cannot mmap '/dev/mem'\n");
+
+ /* ... and want to be able to read from them. */
+ if (test_read_access(self->addr1, self->size1, self->pagesize))
+ SKIP(return, "Cannot read-access mmap'ed '/dev/mem'\n");
+
+ self->size2 = 0;
+ self->addr2 = MAP_FAILED;
+}
+
+FIXTURE_TEARDOWN(pfnmap)
+{
+ if (self->addr2 != MAP_FAILED)
+ munmap(self->addr2, self->size2);
+ if (self->addr1 != MAP_FAILED)
+ munmap(self->addr1, self->size1);
+ if (self->dev_mem_fd >= 0)
+ close(self->dev_mem_fd);
+}
+
+TEST_F(pfnmap, madvise_disallowed)
+{
+ int advices[] = {
+ MADV_DONTNEED,
+ MADV_DONTNEED_LOCKED,
+ MADV_FREE,
+ MADV_WIPEONFORK,
+ MADV_COLD,
+ MADV_PAGEOUT,
+ MADV_POPULATE_READ,
+ MADV_POPULATE_WRITE,
+ };
+ int i;
+
+ /* All these advices must be rejected. */
+ for (i = 0; i < ARRAY_SIZE(advices); i++) {
+ EXPECT_LT(madvise(self->addr1, self->pagesize, advices[i]), 0);
+ EXPECT_EQ(errno, EINVAL);
+ }
+}
+
+TEST_F(pfnmap, munmap_split)
+{
+ /*
+ * Unmap the first page. This munmap() call is not really expected to
+ * fail, but we might be able to trigger other internal issues.
+ */
+ ASSERT_EQ(munmap(self->addr1, self->pagesize), 0);
+
+ /*
+ * Remap the first page while the second page is still mapped. This
+ * makes sure that any PAT tracking on x86 will allow for mmap()'ing
+ * a page again while some parts of the first mmap() are still
+ * around.
+ */
+ self->size2 = self->pagesize;
+ self->addr2 = mmap(NULL, self->pagesize, PROT_READ, MAP_SHARED,
+ self->dev_mem_fd, 0);
+ ASSERT_NE(self->addr2, MAP_FAILED);
+}
+
+TEST_F(pfnmap, mremap_fixed)
+{
+ char *ret;
+
+ /* Reserve a destination area. */
+ self->size2 = self->size1;
+ self->addr2 = mmap(NULL, self->size2, PROT_READ, MAP_ANON | MAP_PRIVATE,
+ -1, 0);
+ ASSERT_NE(self->addr2, MAP_FAILED);
+
+ /* mremap() over our destination. */
+ ret = mremap(self->addr1, self->size1, self->size2,
+ MREMAP_FIXED | MREMAP_MAYMOVE, self->addr2);
+ ASSERT_NE(ret, MAP_FAILED);
+}
+
+TEST_F(pfnmap, mremap_shrink)
+{
+ char *ret;
+
+ /* Shrinking is expected to work. */
+ ret = mremap(self->addr1, self->size1, self->size1 - self->pagesize, 0);
+ ASSERT_NE(ret, MAP_FAILED);
+}
+
+TEST_F(pfnmap, mremap_expand)
+{
+ /*
+ * Growing is not expected to work, and getting it right would
+ * be challenging. So this test primarily serves as an early warning
+ * that something that probably should never work suddenly works.
+ */
+ self->size2 = self->size1 + self->pagesize;
+ self->addr2 = mremap(self->addr1, self->size1, self->size2, MREMAP_MAYMOVE);
+ ASSERT_EQ(self->addr2, MAP_FAILED);
+}
+
+TEST_F(pfnmap, fork)
+{
+ pid_t pid;
+ int ret;
+
+ /* fork() a child and test if the child can access the pages. */
+ pid = fork();
+ ASSERT_GE(pid, 0);
+
+ if (!pid) {
+ EXPECT_EQ(test_read_access(self->addr1, self->size1,
+ self->pagesize), 0);
+ exit(0);
+ }
+
+ wait(&ret);
+ if (WIFEXITED(ret))
+ ret = WEXITSTATUS(ret);
+ else
+ ret = -EINVAL;
+ ASSERT_EQ(ret, 0);
+}
+
+TEST_HARNESS_MAIN
diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh
index 188b125bf1f6b..dddd1dd8af145 100755
--- a/tools/testing/selftests/mm/run_vmtests.sh
+++ b/tools/testing/selftests/mm/run_vmtests.sh
@@ -63,6 +63,8 @@ separated by spaces:
test soft dirty page bit semantics
- pagemap
test pagemap_scan IOCTL
+- pfnmap
+ tests for VM_PFNMAP handling
- cow
test copy-on-write semantics
- thp
@@ -472,6 +474,8 @@ fi
CATEGORY="pagemap" run_test ./pagemap_ioctl
+CATEGORY="pfnmap" run_test ./pfnmap
+
# COW tests
CATEGORY="cow" run_test ./cow
--
2.49.0
Use `kernel::ffi::c_void` instead of `core::ffi::c_void` for consistency
and to centralize abstraction.
Since `kernel::ffi::c_void` is a straightforward re-export of
`core::ffi::c_void`, both are functionally equivalent. However, using
`kernel::ffi::c_void` improves consistency across the kernel's Rust code
and provides a unified reference point in case the definition ever needs
to change, even if such a change is unlikely.
Signed-off-by: Jesung Yang <y.j3ms.n(a)gmail.com>
Link: https://rust-for-linux.zulipchat.com/#narrow/channel/288089/topic/x/near/52…
---
So in sum, I believe it's reasonable to keep the diff unchanged... but
I'm happy to adjust if you'd prefer a different approach.
---
Changes in v2:
- Add "Link" tag to the related discussion on Zulip
- Reword the commit message to clarify `kernel::ffi::c_void` is a re-export
- Link to v1: https://lore.kernel.org/rust-for-linux/20250526162429.1114862-1-y.j3ms.n@gm…
---
rust/kernel/kunit.rs | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/rust/kernel/kunit.rs b/rust/kernel/kunit.rs
index 81833a687b75..bd6fc712dd79 100644
--- a/rust/kernel/kunit.rs
+++ b/rust/kernel/kunit.rs
@@ -6,7 +6,8 @@
//!
//! Reference: <https://docs.kernel.org/dev-tools/kunit/index.html>
-use core::{ffi::c_void, fmt};
+use core::fmt;
+use kernel::ffi::c_void;
/// Prints a KUnit error-level message.
///
base-commit: f4daa80d6be7d3c55ca72a8e560afc4e21f886aa
--
2.39.5
Use a match expression with slice patterns instead of length checks and
indexing. The result is more idiomatic, which is a better example for
future Rust code authors.
Signed-off-by: Tamir Duberstein <tamird(a)gmail.com>
---
scripts/rustdoc_test_gen.rs | 33 +++++++++++++++++----------------
1 file changed, 17 insertions(+), 16 deletions(-)
diff --git a/scripts/rustdoc_test_gen.rs b/scripts/rustdoc_test_gen.rs
index 1ca253594d38..a3dc251221e0 100644
--- a/scripts/rustdoc_test_gen.rs
+++ b/scripts/rustdoc_test_gen.rs
@@ -85,24 +85,25 @@ fn find_candidates(
}
}
- assert!(
- valid_paths.len() > 0,
- "No path candidates found for `{file}`. This is likely a bug in the build system, or some \
- files went away while compiling."
- );
-
- if valid_paths.len() > 1 {
- eprintln!("Several path candidates found:");
- for path in valid_paths {
- eprintln!(" {path:?}");
+ match valid_paths.as_slice() {
+ [] => panic!(
+ "No path candidates found for `{file}`. This is likely a bug in the build system, or \
+ some files went away while compiling."
+ ),
+ [valid_path] => {
+ valid_path.to_str().unwrap()
+ }
+ valid_paths => {
+ eprintln!("Several path candidates found:");
+ for path in valid_paths {
+ eprintln!(" {path:?}");
+ }
+ panic!(
+ "Several path candidates found for `{file}`, please resolve the ambiguity by \
+ renaming a file or folder."
+ );
}
- panic!(
- "Several path candidates found for `{file}`, please resolve the ambiguity by renaming \
- a file or folder."
- );
}
-
- valid_paths[0].to_str().unwrap()
}
fn main() {
---
base-commit: bfc3cd87559bc593bb32bb1482f9cae3308b6398
change-id: 20250527-idiomatic-match-slice-26a79d100e4d
Best regards,
--
Tamir Duberstein <tamird(a)gmail.com>
This commit introduces a new vmtest.sh runner for vsock.
It uses virtme-ng/qemu to run tests in a VM. The tests validate G2H,
H2G, and loopback. The testing tools from tools/testing/vsock/ are
reused. Currently, only vsock_test is used.
VMCI and hyperv support is included in the config file to be built with
the -b option, though not used in the tests.
Only tested on x86.
To run:
$ make -C tools/testing/selftests TARGETS=vsock
$ tools/testing/selftests/vsock/vmtest.sh
or
$ make -C tools/testing/selftests TARGETS=vsock run_tests
Example runs (after make -C tools/testing/selftests TARGETS=vsock):
$ ./tools/testing/selftests/vsock/vmtest.sh
1..3
ok 0 vm_server_host_client
ok 1 vm_client_host_server
ok 2 vm_loopback
SUMMARY: PASS=3 SKIP=0 FAIL=0
Log: /tmp/vsock_vmtest_m7DI.log
$ ./tools/testing/selftests/vsock/vmtest.sh vm_loopback
1..1
ok 0 vm_loopback
SUMMARY: PASS=1 SKIP=0 FAIL=0
Log: /tmp/vsock_vmtest_a1IO.log
$ mkdir -p ~/scratch
$ make -C tools/testing/selftests install TARGETS=vsock INSTALL_PATH=~/scratch
[... omitted ...]
$ cd ~/scratch
$ ./run_kselftest.sh
TAP version 13
1..1
# timeout set to 300
# selftests: vsock: vmtest.sh
# 1..3
# ok 0 vm_server_host_client
# ok 1 vm_client_host_server
# ok 2 vm_loopback
# SUMMARY: PASS=3 SKIP=0 FAIL=0
# Log: /tmp/vsock_vmtest_svEl.log
ok 1 selftests: vsock: vmtest.sh
Future work can include vsock_diag_test.
Because vsock requires a VM to test anything other than loopback, this
patch adds vmtest.sh as a kselftest itself. This is different than other
systems that have a "vmtest.sh", where it is used as a utility script to
spin up a VM to run the selftests as a guest (but isn't hooked into
kselftest).
Signed-off-by: Bobby Eshleman <bobbyeshleman(a)gmail.com>
---
Changes in v9:
- make kselftest build target depend on tools/testing/vsock sources (Stefano)
- add check_vng() for vng version checking (Stefano)
- add virtme_ssh_channel=tcp to kernel cmdline (Stefano)
- Link to v8: https://lore.kernel.org/r/20250522-vsock-vmtest-v8-1-367619bef134@gmail.com
Changes in v8:
- remove NIPA comment from commit msg
- remove tap_* functions and TAP_PREFIX
- add -b for building kernel
- Link to v7: https://lore.kernel.org/r/20250515-vsock-vmtest-v7-1-ba6fa86d6c2c@gmail.com
Changes in v7:
- fix exit code bug when ran is kselftest: use cnt_total instead of KSFT_NUM_TESTS
- updated commit message with updated output
- updated commit message with commands for installing/running as
kselftest
- Link to v6: https://lore.kernel.org/r/20250515-vsock-vmtest-v6-1-9af1cc023900@gmail.com
Changes in v6:
- add make cmd in commit message in vmtest.sh example (Stefano)
- check nonzero size of QEMU_PIDFILE using -s conditional (Stefano)
- display log file path after tests so it is easier to find amongst other random names
- cleanup qemu pidfile if qemu is unable to remove it
- make oops/warning failures more obvious with 'FAIL' prefix in log
(simply saying 'detected' wasn't clear enough to identify failing
condition)
- Link to v5: https://lore.kernel.org/r/20250513-vsock-vmtest-v5-1-4e75c4a45ceb@gmail.com
Changes in v5:
- make log file a tmpfile (Paolo)
- make sure both default and user defined QEMU gets handled by the dependency check (Paolo)
- increased VM boot up timeout from 1m to 3m for slow hosts (Paolo)
- rename vm_setup -> vm_start (Paolo)
- derive wait_for_listener from selftests/net/net_helper.sh to removes ss usage
- Remove unused 'unset IFS' line (Paolo)
- leave space after variable declarations (Paolo)
- make QEMU_PIDFILE a tmp file (Paolo)
- make everything readonly that is only read (Paolo)
- source ktap_helpers.sh for KSFT_PASS and friends (Paolo)
- don't check for timeout util (Paolo)
- add missing usage string for -q qemu arg
- add tap prefix to SUMMARY line since it isn't part of TAP protocol
- exit with the correct status code based on failure/pass counts
- Link to v4: https://lore.kernel.org/r/20250507-vsock-vmtest-v4-1-6e2a97262cd6@gmail.com
Changes in v4:
- do not use special tab delimiter for help string parsing (Stefano + Paolo)
- fix paths for when installing kselftest and running out-of-tree (Paolo)
- change vng to using running kernel instead of compiled kernel (Paolo)
- use multi-line string for QEMU_OPTS (Stefano)
- change timeout to 300s (Paolo)
- skip if tools are not found and use kselftests status codes (Paolo)
- remove build from vmtest.sh (Paolo)
- change 2222 -> SSH_HOST_PORT (Stefano)
- add tap-format output
- add vmtest.log to gitignore
- check for vsock_test binary and remind user to build it if missing
- create a proper build in makefile
- style fixes
- add ssh, timeout, and pkill to dependency check, just in case
- fix numerical comparison in conditionals
- check qemu pidfile exists before proceeding (avoid wasting time waiting for ssh)
- fix tracking of pass/fail bug
- fix stderr redirection bug
- Link to v3: https://lore.kernel.org/r/20250428-vsock-vmtest-v3-1-181af6163f3e@gmail.com
Changes in v3:
- use common conditional syntax for checking variables
- use return value instead of global rc
- fix typo TEST_HOST_LISTENER_PORT -> TEST_HOST_PORT_LISTENER
- use SIGTERM instead of SIGKILL on cleanup
- use peer-cid=1 for loopback
- change sleep delay times into globals
- fix test_vm_loopback logging
- add test selection in arguments
- make QEMU an argument
- check that vng binary is on path
- use QEMU variable
- change <tab><backslash> to <space><backslash>
- fix hardcoded file paths
- add comment in commit msg about script that vmtest.sh was based off of
- Add tools/testing/selftest/vsock/Makefile for kselftest
- Link to v2: https://lore.kernel.org/r/20250417-vsock-vmtest-v2-1-3901a27331e8@gmail.com
Changes in v2:
- add kernel oops and warnings checker
- change testname variable to use FUNCNAME
- fix spacing in test_vm_server_host_client
- add -s skip build option to vmtest.sh
- add test_vm_loopback
- pass port to vm_wait_for_listener
- fix indentation in vmtest.sh
- add vmci and hyperv to config
- changed whitespace from tabs to spaces in help string
- Link to v1: https://lore.kernel.org/r/20250410-vsock-vmtest-v1-1-f35a81dab98c@gmail.com
---
MAINTAINERS | 1 +
tools/testing/selftests/vsock/.gitignore | 2 +
tools/testing/selftests/vsock/Makefile | 17 ++
tools/testing/selftests/vsock/config | 114 ++++++++
tools/testing/selftests/vsock/settings | 1 +
tools/testing/selftests/vsock/vmtest.sh | 487 +++++++++++++++++++++++++++++++
6 files changed, 622 insertions(+)
diff --git a/MAINTAINERS b/MAINTAINERS
index 657a67f9031ef7798c19ac63e6383d4cb18a9e1f..3fbdd7bbfce7196a3cc7db70203317c6bd0e51fd 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -25751,6 +25751,7 @@ F: include/uapi/linux/vm_sockets.h
F: include/uapi/linux/vm_sockets_diag.h
F: include/uapi/linux/vsockmon.h
F: net/vmw_vsock/
+F: tools/testing/selftests/vsock/
F: tools/testing/vsock/
VMALLOC
diff --git a/tools/testing/selftests/vsock/.gitignore b/tools/testing/selftests/vsock/.gitignore
new file mode 100644
index 0000000000000000000000000000000000000000..9c5bf379480f829a14713d5f5dc7d525bc272e84
--- /dev/null
+++ b/tools/testing/selftests/vsock/.gitignore
@@ -0,0 +1,2 @@
+vmtest.log
+vsock_test
diff --git a/tools/testing/selftests/vsock/Makefile b/tools/testing/selftests/vsock/Makefile
new file mode 100644
index 0000000000000000000000000000000000000000..c407c0afd9388ee692d59a95f304182f067596e9
--- /dev/null
+++ b/tools/testing/selftests/vsock/Makefile
@@ -0,0 +1,17 @@
+# SPDX-License-Identifier: GPL-2.0
+
+CURDIR := $(abspath .)
+TOOLSDIR := $(abspath ../../..)
+VSOCK_TEST_DIR := $(TOOLSDIR)/testing/vsock
+VSOCK_TEST_SRCS := $(wildcard $(VSOCK_TEST_DIR)/*.c $(VSOCK_TEST_DIR)/*.h)
+
+$(OUTPUT)/vsock_test: $(VSOCK_TEST_DIR)/vsock_test
+ install -m 755 $< $@
+
+$(VSOCK_TEST_DIR)/vsock_test: $(VSOCK_TEST_SRCS)
+ $(MAKE) -C $(VSOCK_TEST_DIR) vsock_test
+TEST_PROGS += vmtest.sh
+TEST_GEN_FILES := vsock_test
+
+include ../lib.mk
+
diff --git a/tools/testing/selftests/vsock/config b/tools/testing/selftests/vsock/config
new file mode 100644
index 0000000000000000000000000000000000000000..3bffaaf98fff92dc0e3bc1286afa3d8d5d52f4c7
--- /dev/null
+++ b/tools/testing/selftests/vsock/config
@@ -0,0 +1,114 @@
+CONFIG_BLK_DEV_INITRD=y
+CONFIG_BPF=y
+CONFIG_BPF_SYSCALL=y
+CONFIG_BPF_JIT=y
+CONFIG_HAVE_EBPF_JIT=y
+CONFIG_BPF_EVENTS=y
+CONFIG_FTRACE_SYSCALLS=y
+CONFIG_FUNCTION_TRACER=y
+CONFIG_HAVE_DYNAMIC_FTRACE=y
+CONFIG_DYNAMIC_FTRACE=y
+CONFIG_HAVE_KPROBES=y
+CONFIG_KPROBES=y
+CONFIG_KPROBE_EVENTS=y
+CONFIG_ARCH_SUPPORTS_UPROBES=y
+CONFIG_UPROBES=y
+CONFIG_UPROBE_EVENTS=y
+CONFIG_DEBUG_FS=y
+CONFIG_FW_CFG_SYSFS=y
+CONFIG_FW_CFG_SYSFS_CMDLINE=y
+CONFIG_DRM=y
+CONFIG_DRM_VIRTIO_GPU=y
+CONFIG_DRM_VIRTIO_GPU_KMS=y
+CONFIG_DRM_BOCHS=y
+CONFIG_VIRTIO_IOMMU=y
+CONFIG_SOUND=y
+CONFIG_SND=y
+CONFIG_SND_SEQUENCER=y
+CONFIG_SND_PCI=y
+CONFIG_SND_INTEL8X0=y
+CONFIG_SND_HDA_CODEC_REALTEK=y
+CONFIG_SECURITYFS=y
+CONFIG_CGROUP_BPF=y
+CONFIG_SQUASHFS=y
+CONFIG_SQUASHFS_XZ=y
+CONFIG_SQUASHFS_ZSTD=y
+CONFIG_FUSE_FS=y
+CONFIG_SERIO=y
+CONFIG_PCI=y
+CONFIG_INPUT=y
+CONFIG_INPUT_KEYBOARD=y
+CONFIG_KEYBOARD_ATKBD=y
+CONFIG_SERIAL_8250=y
+CONFIG_SERIAL_8250_CONSOLE=y
+CONFIG_X86_VERBOSE_BOOTUP=y
+CONFIG_VGA_CONSOLE=y
+CONFIG_FB=y
+CONFIG_FB_VESA=y
+CONFIG_FRAMEBUFFER_CONSOLE=y
+CONFIG_RTC_CLASS=y
+CONFIG_RTC_HCTOSYS=y
+CONFIG_RTC_DRV_CMOS=y
+CONFIG_HYPERVISOR_GUEST=y
+CONFIG_PARAVIRT=y
+CONFIG_KVM_GUEST=y
+CONFIG_KVM=y
+CONFIG_KVM_INTEL=y
+CONFIG_KVM_AMD=y
+CONFIG_VSOCKETS=y
+CONFIG_VSOCKETS_DIAG=y
+CONFIG_VSOCKETS_LOOPBACK=y
+CONFIG_VMWARE_VMCI_VSOCKETS=y
+CONFIG_VIRTIO_VSOCKETS=y
+CONFIG_VIRTIO_VSOCKETS_COMMON=y
+CONFIG_HYPERV_VSOCKETS=y
+CONFIG_VMWARE_VMCI=y
+CONFIG_VHOST_VSOCK=y
+CONFIG_HYPERV=y
+CONFIG_UEVENT_HELPER=n
+CONFIG_VIRTIO=y
+CONFIG_VIRTIO_PCI=y
+CONFIG_VIRTIO_MMIO=y
+CONFIG_VIRTIO_BALLOON=y
+CONFIG_NET=y
+CONFIG_NET_CORE=y
+CONFIG_NETDEVICES=y
+CONFIG_NETWORK_FILESYSTEMS=y
+CONFIG_INET=y
+CONFIG_NET_9P=y
+CONFIG_NET_9P_VIRTIO=y
+CONFIG_9P_FS=y
+CONFIG_VIRTIO_NET=y
+CONFIG_CMDLINE_OVERRIDE=n
+CONFIG_BINFMT_SCRIPT=y
+CONFIG_SHMEM=y
+CONFIG_TMPFS=y
+CONFIG_UNIX=y
+CONFIG_MODULE_SIG_FORCE=n
+CONFIG_DEVTMPFS=y
+CONFIG_TTY=y
+CONFIG_VT=y
+CONFIG_UNIX98_PTYS=y
+CONFIG_EARLY_PRINTK=y
+CONFIG_INOTIFY_USER=y
+CONFIG_BLOCK=y
+CONFIG_SCSI_LOWLEVEL=y
+CONFIG_SCSI=y
+CONFIG_SCSI_VIRTIO=y
+CONFIG_BLK_DEV_SD=y
+CONFIG_VIRTIO_CONSOLE=y
+CONFIG_WATCHDOG=y
+CONFIG_WATCHDOG_CORE=y
+CONFIG_I6300ESB_WDT=y
+CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y
+CONFIG_OVERLAY_FS=y
+CONFIG_DAX=y
+CONFIG_DAX_DRIVER=y
+CONFIG_FS_DAX=y
+CONFIG_MEMORY_HOTPLUG=y
+CONFIG_MEMORY_HOTREMOVE=y
+CONFIG_ZONE_DEVICE=y
+CONFIG_FUSE_FS=y
+CONFIG_VIRTIO_FS=y
+CONFIG_VSOCKETS=y
+CONFIG_VIRTIO_VSOCKETS=y
diff --git a/tools/testing/selftests/vsock/settings b/tools/testing/selftests/vsock/settings
new file mode 100644
index 0000000000000000000000000000000000000000..694d70710ff08ac9fc91e9ecb5dbdadcf051f019
--- /dev/null
+++ b/tools/testing/selftests/vsock/settings
@@ -0,0 +1 @@
+timeout=300
diff --git a/tools/testing/selftests/vsock/vmtest.sh b/tools/testing/selftests/vsock/vmtest.sh
new file mode 100755
index 0000000000000000000000000000000000000000..edacebfc163251eee9cd495eb5e704dc7adc958e
--- /dev/null
+++ b/tools/testing/selftests/vsock/vmtest.sh
@@ -0,0 +1,487 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright (c) 2025 Meta Platforms, Inc. and affiliates
+#
+# Dependencies:
+# * virtme-ng
+# * busybox-static (used by virtme-ng)
+# * qemu (used by virtme-ng)
+
+readonly SCRIPT_DIR="$(cd -P -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd -P)"
+readonly KERNEL_CHECKOUT=$(realpath "${SCRIPT_DIR}"/../../../../)
+
+source "${SCRIPT_DIR}"/../kselftest/ktap_helpers.sh
+
+readonly VSOCK_TEST="${SCRIPT_DIR}"/vsock_test
+readonly TEST_GUEST_PORT=51000
+readonly TEST_HOST_PORT=50000
+readonly TEST_HOST_PORT_LISTENER=50001
+readonly SSH_GUEST_PORT=22
+readonly SSH_HOST_PORT=2222
+readonly VSOCK_CID=1234
+readonly WAIT_PERIOD=3
+readonly WAIT_PERIOD_MAX=60
+readonly WAIT_TOTAL=$(( WAIT_PERIOD * WAIT_PERIOD_MAX ))
+readonly QEMU_PIDFILE=$(mktemp /tmp/qemu_vsock_vmtest_XXXX.pid)
+
+# virtme-ng offers a netdev for ssh when using "--ssh", but we also need a
+# control port forwarded for vsock_test. Because virtme-ng doesn't support
+# adding an additional port to forward to the device created from "--ssh" and
+# virtme-init mistakenly sets identical IPs to the ssh device and additional
+# devices, we instead opt out of using --ssh, add the device manually, and also
+# add the kernel cmdline options that virtme-init uses to setup the interface.
+readonly QEMU_TEST_PORT_FWD="hostfwd=tcp::${TEST_HOST_PORT}-:${TEST_GUEST_PORT}"
+readonly QEMU_SSH_PORT_FWD="hostfwd=tcp::${SSH_HOST_PORT}-:${SSH_GUEST_PORT}"
+readonly QEMU_OPTS="\
+ -netdev user,id=n0,${QEMU_TEST_PORT_FWD},${QEMU_SSH_PORT_FWD} \
+ -device virtio-net-pci,netdev=n0 \
+ -device vhost-vsock-pci,guest-cid=${VSOCK_CID} \
+ --pidfile ${QEMU_PIDFILE} \
+"
+readonly KERNEL_CMDLINE="\
+ virtme.dhcp net.ifnames=0 biosdevname=0 \
+ virtme.ssh virtme_ssh_channel=tcp virtme_ssh_user=$USER \
+"
+readonly LOG=$(mktemp /tmp/vsock_vmtest_XXXX.log)
+readonly TEST_NAMES=(vm_server_host_client vm_client_host_server vm_loopback)
+readonly TEST_DESCS=(
+ "Run vsock_test in server mode on the VM and in client mode on the host."
+ "Run vsock_test in client mode on the VM and in server mode on the host."
+ "Run vsock_test using the loopback transport in the VM."
+)
+
+VERBOSE=0
+
+usage() {
+ local name
+ local desc
+ local i
+
+ echo
+ echo "$0 [OPTIONS] [TEST]..."
+ echo "If no TEST argument is given, all tests will be run."
+ echo
+ echo "Options"
+ echo " -b: build the kernel from the current source tree and use it for guest VMs"
+ echo " -q: set the path to or name of qemu binary"
+ echo " -v: verbose output"
+ echo
+ echo "Available tests"
+
+ for ((i = 0; i < ${#TEST_NAMES[@]}; i++)); do
+ name=${TEST_NAMES[${i}]}
+ desc=${TEST_DESCS[${i}]}
+ printf "\t%-35s%-35s\n" "${name}" "${desc}"
+ done
+ echo
+
+ exit 1
+}
+
+die() {
+ echo "$*" >&2
+ exit "${KSFT_FAIL}"
+}
+
+vm_ssh() {
+ ssh -q -o UserKnownHostsFile=/dev/null -p ${SSH_HOST_PORT} localhost "$@"
+ return $?
+}
+
+cleanup() {
+ if [[ -s "${QEMU_PIDFILE}" ]]; then
+ pkill -SIGTERM -F "${QEMU_PIDFILE}" > /dev/null 2>&1
+ fi
+
+ # If failure occurred during or before qemu start up, then we need
+ # to clean this up ourselves.
+ if [[ -e "${QEMU_PIDFILE}" ]]; then
+ rm "${QEMU_PIDFILE}"
+ fi
+}
+
+check_args() {
+ local found
+
+ for arg in "$@"; do
+ found=0
+ for name in "${TEST_NAMES[@]}"; do
+ if [[ "${name}" = "${arg}" ]]; then
+ found=1
+ break
+ fi
+ done
+
+ if [[ "${found}" -eq 0 ]]; then
+ echo "${arg} is not an available test" >&2
+ usage
+ fi
+ done
+
+ for arg in "$@"; do
+ if ! command -v > /dev/null "test_${arg}"; then
+ echo "Test ${arg} not found" >&2
+ usage
+ fi
+ done
+}
+
+check_deps() {
+ for dep in vng ${QEMU} busybox pkill ssh; do
+ if [[ ! -x $(command -v "${dep}") ]]; then
+ echo -e "skip: dependency ${dep} not found!\n"
+ exit "${KSFT_SKIP}"
+ fi
+ done
+
+ if [[ ! -x $(command -v "${VSOCK_TEST}") ]]; then
+ printf "skip: %s not found!" "${VSOCK_TEST}"
+ printf " Please build the kselftest vsock target.\n"
+ exit "${KSFT_SKIP}"
+ fi
+}
+
+check_vng() {
+ local tested_versions
+ local version
+ local ok
+
+ tested_versions=("1.33" "1.36")
+ version="$(vng --version)"
+
+ ok=0
+ for tv in "${tested_versions[@]}"; do
+ if [[ "${version}" == *"${tv}"* ]]; then
+ ok=1
+ break
+ fi
+ done
+
+ if [[ ! "${ok}" -eq 1 ]]; then
+ printf "warning: vng version '%s' has not been tested and may " "${version}" >&2
+ printf "not function properly.\n\tThe following versions have been tested: " >&2
+ echo "${tested_versions[@]}" >&2
+ fi
+}
+
+handle_build() {
+ if [[ ! "${BUILD}" -eq 1 ]]; then
+ return
+ fi
+
+ if [[ ! -d "${KERNEL_CHECKOUT}" ]]; then
+ echo "-b requires vmtest.sh called from the kernel source tree" >&2
+ exit 1
+ fi
+
+ pushd "${KERNEL_CHECKOUT}" &>/dev/null
+
+ if ! vng --kconfig --config "${SCRIPT_DIR}"/config; then
+ die "failed to generate .config for kernel source tree (${KERNEL_CHECKOUT})"
+ fi
+
+ if ! make -j$(nproc); then
+ die "failed to build kernel from source tree (${KERNEL_CHECKOUT})"
+ fi
+
+ popd &>/dev/null
+}
+
+vm_start() {
+ local logfile=/dev/null
+ local verbose_opt=""
+ local kernel_opt=""
+ local qemu
+
+ qemu=$(command -v "${QEMU}")
+
+ if [[ "${VERBOSE}" -eq 1 ]]; then
+ verbose_opt="--verbose"
+ logfile=/dev/stdout
+ fi
+
+ if [[ "${BUILD}" -eq 1 ]]; then
+ kernel_opt="${KERNEL_CHECKOUT}"
+ fi
+
+ vng \
+ --run \
+ ${kernel_opt} \
+ ${verbose_opt} \
+ --qemu-opts="${QEMU_OPTS}" \
+ --qemu="${qemu}" \
+ --user root \
+ --append "${KERNEL_CMDLINE}" \
+ --rw &> ${logfile} &
+
+ if ! timeout ${WAIT_TOTAL} \
+ bash -c 'while [[ ! -s '"${QEMU_PIDFILE}"' ]]; do sleep 1; done; exit 0'; then
+ die "failed to boot VM"
+ fi
+}
+
+vm_wait_for_ssh() {
+ local i
+
+ i=0
+ while true; do
+ if [[ ${i} -gt ${WAIT_PERIOD_MAX} ]]; then
+ die "Timed out waiting for guest ssh"
+ fi
+ if vm_ssh -- true; then
+ break
+ fi
+ i=$(( i + 1 ))
+ sleep ${WAIT_PERIOD}
+ done
+}
+
+# derived from selftests/net/net_helper.sh
+wait_for_listener()
+{
+ local port=$1
+ local interval=$2
+ local max_intervals=$3
+ local protocol=tcp
+ local pattern
+ local i
+
+ pattern=":$(printf "%04X" "${port}") "
+
+ # for tcp protocol additionally check the socket state
+ [ "${protocol}" = "tcp" ] && pattern="${pattern}0A"
+ for i in $(seq "${max_intervals}"); do
+ if awk '{print $2" "$4}' /proc/net/"${protocol}"* | \
+ grep -q "${pattern}"; then
+ break
+ fi
+ sleep "${interval}"
+ done
+}
+
+vm_wait_for_listener() {
+ local port=$1
+
+ vm_ssh <<EOF
+$(declare -f wait_for_listener)
+wait_for_listener ${port} ${WAIT_PERIOD} ${WAIT_PERIOD_MAX}
+EOF
+}
+
+host_wait_for_listener() {
+ wait_for_listener "${TEST_HOST_PORT_LISTENER}" "${WAIT_PERIOD}" "${WAIT_PERIOD_MAX}"
+}
+
+__log_stdin() {
+ cat | awk '{ printf "%s:\t%s\n","'"${prefix}"'", $0 }'
+}
+
+__log_args() {
+ echo "$*" | awk '{ printf "%s:\t%s\n","'"${prefix}"'", $0 }'
+}
+
+log() {
+ local prefix="$1"
+
+ shift
+ local redirect=
+ if [[ ${VERBOSE} -eq 0 ]]; then
+ redirect=/dev/null
+ else
+ redirect=/dev/stdout
+ fi
+
+ if [[ "$#" -eq 0 ]]; then
+ __log_stdin | tee -a "${LOG}" > ${redirect}
+ else
+ __log_args "$@" | tee -a "${LOG}" > ${redirect}
+ fi
+}
+
+log_setup() {
+ log "setup" "$@"
+}
+
+log_host() {
+ local testname=$1
+
+ shift
+ log "test:${testname}:host" "$@"
+}
+
+log_guest() {
+ local testname=$1
+
+ shift
+ log "test:${testname}:guest" "$@"
+}
+
+test_vm_server_host_client() {
+ local testname="${FUNCNAME[0]#test_}"
+
+ vm_ssh -- "${VSOCK_TEST}" \
+ --mode=server \
+ --control-port="${TEST_GUEST_PORT}" \
+ --peer-cid=2 \
+ 2>&1 | log_guest "${testname}" &
+
+ vm_wait_for_listener "${TEST_GUEST_PORT}"
+
+ ${VSOCK_TEST} \
+ --mode=client \
+ --control-host=127.0.0.1 \
+ --peer-cid="${VSOCK_CID}" \
+ --control-port="${TEST_HOST_PORT}" 2>&1 | log_host "${testname}"
+
+ return $?
+}
+
+test_vm_client_host_server() {
+ local testname="${FUNCNAME[0]#test_}"
+
+ ${VSOCK_TEST} \
+ --mode "server" \
+ --control-port "${TEST_HOST_PORT_LISTENER}" \
+ --peer-cid "${VSOCK_CID}" 2>&1 | log_host "${testname}" &
+
+ host_wait_for_listener
+
+ vm_ssh -- "${VSOCK_TEST}" \
+ --mode=client \
+ --control-host=10.0.2.2 \
+ --peer-cid=2 \
+ --control-port="${TEST_HOST_PORT_LISTENER}" 2>&1 | log_guest "${testname}"
+
+ return $?
+}
+
+test_vm_loopback() {
+ local testname="${FUNCNAME[0]#test_}"
+ local port=60000 # non-forwarded local port
+
+ vm_ssh -- "${VSOCK_TEST}" \
+ --mode=server \
+ --control-port="${port}" \
+ --peer-cid=1 2>&1 | log_guest "${testname}" &
+
+ vm_wait_for_listener "${port}"
+
+ vm_ssh -- "${VSOCK_TEST}" \
+ --mode=client \
+ --control-host="127.0.0.1" \
+ --control-port="${port}" \
+ --peer-cid=1 2>&1 | log_guest "${testname}"
+
+ return $?
+}
+
+run_test() {
+ local host_oops_cnt_before
+ local host_warn_cnt_before
+ local vm_oops_cnt_before
+ local vm_warn_cnt_before
+ local host_oops_cnt_after
+ local host_warn_cnt_after
+ local vm_oops_cnt_after
+ local vm_warn_cnt_after
+ local name
+ local rc
+
+ host_oops_cnt_before=$(dmesg | grep -c -i 'Oops')
+ host_warn_cnt_before=$(dmesg --level=warn | wc -l)
+ vm_oops_cnt_before=$(vm_ssh -- dmesg | grep -c -i 'Oops')
+ vm_warn_cnt_before=$(vm_ssh -- dmesg --level=warn | wc -l)
+
+ name=$(echo "${1}" | awk '{ print $1 }')
+ eval test_"${name}"
+ rc=$?
+
+ host_oops_cnt_after=$(dmesg | grep -i 'Oops' | wc -l)
+ if [[ ${host_oops_cnt_after} -gt ${host_oops_cnt_before} ]]; then
+ echo "FAIL: kernel oops detected on host" | log_host "${name}"
+ rc=$KSFT_FAIL
+ fi
+
+ host_warn_cnt_after=$(dmesg --level=warn | wc -l)
+ if [[ ${host_warn_cnt_after} -gt ${host_warn_cnt_before} ]]; then
+ echo "FAIL: kernel warning detected on host" | log_host "${name}"
+ rc=$KSFT_FAIL
+ fi
+
+ vm_oops_cnt_after=$(vm_ssh -- dmesg | grep -i 'Oops' | wc -l)
+ if [[ ${vm_oops_cnt_after} -gt ${vm_oops_cnt_before} ]]; then
+ echo "FAIL: kernel oops detected on vm" | log_host "${name}"
+ rc=$KSFT_FAIL
+ fi
+
+ vm_warn_cnt_after=$(vm_ssh -- dmesg --level=warn | wc -l)
+ if [[ ${vm_warn_cnt_after} -gt ${vm_warn_cnt_before} ]]; then
+ echo "FAIL: kernel warning detected on vm" | log_host "${name}"
+ rc=$KSFT_FAIL
+ fi
+
+ return "${rc}"
+}
+
+QEMU="qemu-system-$(uname -m)"
+
+while getopts :hvsq:b o
+do
+ case $o in
+ v) VERBOSE=1;;
+ b) BUILD=1;;
+ q) QEMU=$OPTARG;;
+ h|*) usage;;
+ esac
+done
+shift $((OPTIND-1))
+
+trap cleanup EXIT
+
+if [[ ${#} -eq 0 ]]; then
+ ARGS=("${TEST_NAMES[@]}")
+else
+ ARGS=("$@")
+fi
+
+check_args "${ARGS[@]}"
+check_deps
+check_vng
+handle_build
+
+echo "1..${#ARGS[@]}"
+
+log_setup "Booting up VM"
+vm_start
+vm_wait_for_ssh
+log_setup "VM booted up"
+
+cnt_pass=0
+cnt_fail=0
+cnt_skip=0
+cnt_total=0
+for arg in "${ARGS[@]}"; do
+ run_test "${arg}"
+ rc=$?
+ if [[ ${rc} -eq $KSFT_PASS ]]; then
+ cnt_pass=$(( cnt_pass + 1 ))
+ echo "ok ${cnt_total} ${arg}"
+ elif [[ ${rc} -eq $KSFT_SKIP ]]; then
+ cnt_skip=$(( cnt_skip + 1 ))
+ echo "ok ${cnt_total} ${arg} # SKIP"
+ elif [[ ${rc} -eq $KSFT_FAIL ]]; then
+ cnt_fail=$(( cnt_fail + 1 ))
+ echo "not ok ${cnt_total} ${arg} # exit=$rc"
+ fi
+ cnt_total=$(( cnt_total + 1 ))
+done
+
+echo "SUMMARY: PASS=${cnt_pass} SKIP=${cnt_skip} FAIL=${cnt_fail}"
+echo "Log: ${LOG}"
+
+if [ $((cnt_pass + cnt_skip)) -eq ${cnt_total} ]; then
+ exit "$KSFT_PASS"
+else
+ exit "$KSFT_FAIL"
+fi
---
base-commit: 8066e388be48f1ad62b0449dc1d31a25489fa12a
change-id: 20250325-vsock-vmtest-b3a21d2102c2
Best regards,
--
Bobby Eshleman <bobbyeshleman(a)gmail.com>
Improvements that build on top of the very basic `#[test]` support merged in
v6.15.
They are fairly minimal changes, but they allow us to map `assert*!`s back to
KUnit, plus to add support for test functions that return `Result`s.
In essence, they get our `#[test]`s essentially on par with the documentation
tests.
I also took the chance to convert some host `#[test]`s we had to KUnit in order
to showcase the feature.
Finally, I added documentation that was lacking from the original submission.
I hope this helps.
Miguel Ojeda (7):
rust: kunit: support KUnit-mapped `assert!` macros in `#[test]`s
rust: kunit: support checked `-> Result`s in KUnit `#[test]`s
rust: add `kunit_tests` to the prelude
rust: str: convert `rusttest` tests into KUnit
rust: str: take advantage of the `-> Result` support in KUnit
`#[test]`'s
Documentation: rust: rename `#[test]`s to "`rusttest` host tests"
Documentation: rust: testing: add docs on the new KUnit `#[test]`
tests
Documentation/rust/testing.rst | 80 ++++++++++++++++++++++++++++++++--
init/Kconfig | 3 ++
rust/Makefile | 3 +-
rust/kernel/kunit.rs | 29 ++++++++++--
rust/kernel/prelude.rs | 2 +-
rust/kernel/str.rs | 76 ++++++++++++++------------------
rust/macros/helpers.rs | 16 +++++++
rust/macros/kunit.rs | 31 ++++++++++++-
rust/macros/lib.rs | 6 ++-
9 files changed, 191 insertions(+), 55 deletions(-)
base-commit: b4432656b36e5cc1d50a1f2dc15357543add530e
--
2.49.0
From: Chia-Yu Chang <chia-yu.chang(a)nokia-bell-labs.com>
Hello,
Please find the v7:
v7 (14-May-2025)
- Modify group sizes of tcp_sock_write_txrx and tcp_sock_write_rx in #3 based on pahole results (Paolo Abeni <pabeni(a)redhat.com>)
- Fix the issue in #4 and #5 where the RFC3168 ECN behavior in tcp_ecn_send() is changed (Paolo Abeni <pabeni(a)redhat.com>)
- Modify group size of tcp_sock_write_txrx in #4 and #6 based on pahole results (Paolo Abeni <pabeni(a)redhat.com>)
- Update commit message for #9 to explain the increase in tcp_sock_write_rx group size
- Modify group size of tcp_sock_write_tx in #10 based on pahole results
v6 (09-May-2025)
- Add #3 to utilize exisintg holes of tcp_sock_write_txrx group for later patches (#4, #9, #10) with new u8 members (Paolo Abeni <pabeni(a)redhat.com>)
- Add pahole outcomes before and after commit in #4, #5, #6, #9, #10, #15 (Paolo Abeni <pabeni(a)redhat.com>)
- Define new helper function tcp_send_ack_reflect_ect() for sending ACK with reflected ECT in #5 (Paolo Abeni <pabeni(a)redhat.com>)
- Add comments for function tcp_ecn_rcv_synack() in #5 (Paolo Abeni <pabeni(a)redhat.com>)
- Add enum/define to be used by sysctl_tcp_ecn in #5, sysctl_tcp_ecn_option in #9, and sysctl_tcp_ecn_option_beacon in #10 (Paolo Abeni <pabeni(a)redhat.com>)
- Move accecn_fail_mode and saw_accecn_opt in #5 and #11 to use exisintg holes of tcp_sock (Paolo Abeni <pabeni(a)redhat.com>)
- Change data type of new members of tcp_request_sock and move them to the end of struct in #5 and #11 (Paolo Abeni <pabeni(a)redhat.com>)
- Move new members of tcp_info to the end of struct in #6 (Paolo Abeni <pabeni(a)redhat.com>)
- Merge previous #7 into #9 (Paolo Abeni <pabeni(a)redhat.com>)
- Mask ecnfield with INET_ECN_MASK to remove WARN_ONCE in #9 (Paolo Abeni <pabeni(a)redhat.com>)
- Reduce the indentation levels for reabability in #9 and #10 (Paolo Abeni <pabeni(a)redhat.com>)
- Move delivered_ecn_bytes to the RX group in #9, accecn_opt_tstamp to the TX group in #10, pkts_acked_ewma to the RX group in #15 (Paolo Abeni <pabeni(a)redhat.com>)
- Add changes in Documentation/networking/net_cachelines/tcp_sock.rst for new tcp_sock members in #3, #5, #6, #9, #10, #15
v5 (22-Apr-2025)
- Further fix for 32-bit ARM alignment in tcp.c (Simon Horman <horms(a)kernel.org>)
v4 (18-Apr-2025)
- Fix 32-bit ARM assertion for alignment requirement (Simon Horman <horms(a)kernel.org>)
v3 (14-Apr-2025)
- Fix patch apply issue in v2 (Jakub Kicinski <kuba(a)kernel.org>)
v2 (18-Mar-2025)
- Add one missing patch from the previous AccECN protocol preparation patch series to this patch series.
The full patch series can be found in
https://github.com/L4STeam/linux-net-next/commits/upstream_l4steam/
The Accurate ECN draft can be found in
https://datatracker.ietf.org/doc/html/draft-ietf-tcpm-accurate-ecn-28
Best regards,
Chia-Yu
Chia-Yu Chang (2):
tcp: reorganize tcp_sock_write_txrx group for variables later
tcp: accecn: AccECN option failure handling
Ilpo Järvinen (13):
tcp: reorganize SYN ECN code
tcp: fast path functions later
tcp: AccECN core
tcp: accecn: AccECN negotiation
tcp: accecn: add AccECN rx byte counters
tcp: accecn: AccECN needs to know delivered bytes
tcp: sack option handling improvements
tcp: accecn: AccECN option
tcp: accecn: AccECN option send control
tcp: accecn: AccECN option ceb/cep heuristic
tcp: accecn: AccECN ACE field multi-wrap heuristic
tcp: accecn: try to fit AccECN option with SACK
tcp: try to avoid safer when ACKs are thinned
.../networking/net_cachelines/tcp_sock.rst | 14 +
include/linux/tcp.h | 34 +-
include/net/netns/ipv4.h | 2 +
include/net/tcp.h | 221 ++++++-
include/uapi/linux/tcp.h | 7 +
net/ipv4/syncookies.c | 3 +
net/ipv4/sysctl_net_ipv4.c | 19 +
net/ipv4/tcp.c | 30 +-
net/ipv4/tcp_input.c | 608 +++++++++++++++++-
net/ipv4/tcp_ipv4.c | 7 +-
net/ipv4/tcp_minisocks.c | 92 ++-
net/ipv4/tcp_output.c | 296 ++++++++-
net/ipv6/syncookies.c | 1 +
net/ipv6/tcp_ipv6.c | 1 +
14 files changed, 1237 insertions(+), 98 deletions(-)
--
2.34.1
Make test result message more descriptive and grammatically correct.
Signed-off-by: Brigham Campbell <me(a)brighamcampbell.com>
---
No changes in v3. I'm resending this patch to adjust patch format
suggestions made by Shuah.
tools/testing/selftests/x86/mov_ss_trap.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/x86/mov_ss_trap.c b/tools/testing/selftests/x86/mov_ss_trap.c
index f22cb6b382f9..d80033c0d7eb 100644
--- a/tools/testing/selftests/x86/mov_ss_trap.c
+++ b/tools/testing/selftests/x86/mov_ss_trap.c
@@ -269,6 +269,6 @@ int main()
);
}
- printf("[OK]\tI aten't dead\n");
+ printf("[OK]\tkernel handled MOV SS without crashing test\n");
return 0;
}
--
2.49.0
v2: https://lore.kernel.org/netdev/20250519023517.4062941-1-almasrymina@google.…
Changelog:
- Collect acks and tested-bys (Thanks!)
- Drop the patch that removed ksft_disruptive. That seems to not have
any relation to behavior when test fails.
- Address comments.
---
Minor cleanups to the devmem tcp code, and not-so-minor improvements to
the ksft.
For the cleanups:
- Address comment from Paolo post-merge.
- Fix whitespace.
- Add improvement dropped from Taehee's fix patch.
For the ksft:
- Add support for ipv4 environment.
- Add support for drivers that are limited to 5-tuple flow steering.
- Improve test by sending 1K data instead of just "hello\nworld"
Cc: sdf(a)fomichev.me
Cc: ap420073(a)gmail.com
Cc: praan(a)google.com
Cc: shivajikant(a)google.com
Mina Almasry (8):
net: devmem: move list_add to net_devmem_bind_dmabuf.
page_pool: fix ugly page_pool formatting
net: devmem: preserve sockc_err
net: devmem: ksft: add ipv4 support
net: devmem: ksft: add exit_wait to make rx test pass
net: devmem: ksft: add 5 tuple FS support
net: devmem: ksft: upgrade rx test to send 1K data
net: devmem: ncdevmem: remove unused variable
net/core/devmem.c | 5 +++-
net/core/devmem.h | 5 +++-
net/core/netdev-genl.c | 8 ++-----
net/core/page_pool.c | 4 ++--
net/ipv4/tcp.c | 24 ++++++++-----------
.../selftests/drivers/net/hw/devmem.py | 18 +++++++-------
.../selftests/drivers/net/hw/ncdevmem.c | 16 ++++++++++---
7 files changed, 44 insertions(+), 36 deletions(-)
base-commit: ea15e046263b19e91ffd827645ae5dfa44ebd044
--
2.49.0.1151.ga128411c76-goog
From: Chia-Yu Chang <chia-yu.chang(a)nokia-bell-labs.com>
Hello,
Please find the DualPI2 patch v17.
This patch serise adds DualPI Improved with a Square (DualPI2) with following features:
* Supports congestion controls that comply with the Prague requirements in RFC9331 (e.g. TCP-Prague)
* Coupled dual-queue that separates the L4S traffic in a low latency queue (L-queue), without harming remaining traffic that is scheduled in classic queue (C-queue) due to congestion-coupling using PI2 as defined in RFC9332
* Configurable overload strategies
* Use of sojourn time to reliably estimate queue delay
* Supports ECN L4S-identifier (IP.ECN==0b*1) to classify traffic into respective queues
For more details of DualPI2, please refer IETF RFC9332 (https://datatracker.ietf.org/doc/html/rfc9332).
Best regards,
Chia-Yu
---
v17 (25-May-2025)
- Replace 0xffffffff with U32_MAX (Paolo Abeni <pabeni(a)redhat.com>)
- Use helper function qdisc_dequeue_internal() and add new helper function skb_apply_step() (Paolo Abeni <pabeni(a)redhat.com>)
- Add s64 casting when calculating the delta of the PI controller (Paolo Abeni <pabeni(a)redhat.com>)
- Change the drop reason into SKB_DROP_REASON_QDISC_CONGESTED for drop_early (Paolo Abeni <pabeni(a)redhat.com>)
- Modify the condition to remove the original skb when enqueuing multiple GSO segments (Paolo Abeni <pabeni(a)redhat.com>)
- Add READ_ONCE() in dualpi2_dump_stat() (Paolo Abeni <pabeni(a)redhat.com>)
- Add comments, brackets, and brackets for readability (Paolo Abeni <pabeni(a)redhat.com>)
v16 (16-MAy-2025)
- Add qdisc_lock() to dualpi2_timer() in dualpi2_timer (Paolo Abeni <pabeni(a)redhat.com>)
- Introduce convert_ns_to_usec() to convert usec to nsec without overflow in #1 (Paolo Abeni <pabeni(a)redhat.com>)
- Update convert_us_tonsec() to convert nsec to usec without overflow in #2 (Paolo Abeni <pabeni(a)redhat.com>)
- Add more descriptions with respect to DualPI2 in the cover ltter and add changelog in each patch (Paolo Abeni <pabeni(a)redhat.com>)
v15 (09-May-2025)
- Add enum of TCA_DUALPI2_ECN_MASK_CLA_ECT to remove potential leakeage in #1 (Simon Horman <horms(a)kernel.org>)
- Fix one typo in comment of #2
- Update tc.yaml in #5 to aligh with the updated enum of pkt_sched.h
v14 (05-May-2025)
- Modify tc.yaml: (1) Replace flags with enum and remove enum-as-flags, (2) Remove credit-queue in xstats, and (3) Change attribute types (Donald Hunter <donald.hun
- Add enum and fix the ordering of variables in pkt_sched.h to align with the modified tc.yaml (Donald Hunter <donald.hunter(a)gmail.com>)
- Add validators for DROP_OVERLOAD, DROP_EARLY, ECN_MASK, and SPLIT_GSO in sch_dualpi2.c (Donald Hunter <donald.hunter(a)gmail.com>)
- Update dualpi2.json to align with the updated variable order in pkt_sched.h
- Reorder patches (Donald Hunter <donald.hunter(a)gmail.com>)
v13 (26-Apr-2025)
- Use dashes in member names to follow YNL conventions in tc.yaml (Donald Hunter <donald.hunter(a)gmail.com>)
- Define enumerations separately for flags of drop-early, drop-overload, ecn-mask, credit-queue in tc.yaml (Donald Hunter <donald.hunter(a)gmail.com>)
- Change the types of split-gso and step-packets into flag in tc.yaml (Donald Hunter <donald.hunter(a)gmail.com>)
- Revert to u32/u8 types for tc-dualpi2-xstats members in tc.yaml (Donald Hunter <donald.hunter(a)gmail.com>)
- Add new test cases in tc-tests/qdiscs/dualpi2.json to cover all dualpi2 parameters (Donald Hunter <donald.hunter(a)gmail.com>)
- Change the type of TCA_DUALPI2_STEP_PACKETS into NLA_FLAG (Donald Hunter <donald.hunter(a)gmail.com>)
v12 (22-Apr-2025)
- Remove anonymous struct in sch_dualpi2.c (Paolo Abeni <pabeni(a)redhat.com>)
- Replace u32/u8 with uint and s32 with int in tc spec document (Paolo Abeni <pabeni(a)redhat.com>)
- Introduce get_memory_limit function to handle potential overflow when multipling limit with MTU (Paolo Abeni <pabeni(a)redhat.com>)
- Double the packet length to further include packet overhead in memory_limit (Paolo Abeni <pabeni(a)redhat.com>)
- Remove the check of qdisc_qlen(sch) when calling qdisc_tree_reduce_backlog (Paolo Abeni <pabeni(a)redhat.com>)
v11 (15-Apr-2025)
- Replace hstimer_init with hstimer_setup in sch_dualpi2.c
v10 (25-Mar-2025)
- Remove leftover include in include/linux/netdevice.h and anonymous struct in sch_dualpi2.c (Paolo Abeni <pabeni(a)redhat.com>)
- Use kfree_skb_reason() and add SKB_DROP_REASON_DUALPI2_STEP_DROP drop reason (Paolo Abeni <pabeni(a)redhat.com>)
- Split sch_dualpi2.c into 3 patches (and overall 5 patches): Struct definition & parsing, Dump stats & configuration, Enqueue/Dequeue (Paolo Abeni <pabeni(a)redhat.com>)
v9 (16-Mar-2025)
- Fix mem_usage error in previous version
- Add min_qlen_step to the dualpi2 attribute as the minimum queue length in number of packets in the L-queue to start step threshold marking.
In previous versions, this value was fixed to 2, so the step threshold was applied to mark packets in the L queue only when the queue length of the L queue was greater than or equal to 2 packets.
This will cause larger queuing delays for L4S traffic at low rates (<20Mbps). So we parameterize it and change the default value to 0.
Comparison of tcp_1down run 'HTB 20Mbit + DUALPI2 + 10ms base delay'
Old versions:
avg median # data pts
Ping (ms) ICMP : 11.55 11.70 ms 350
TCP upload avg : 18.96 N/A Mbits/s 350
TCP upload sum : 18.96 N/A Mbits/s 350
New version (v9):
avg median # data pts
Ping (ms) ICMP : 10.81 10.70 ms 350
TCP upload avg : 18.91 N/A Mbits/s 350
TCP upload sum : 18.91 N/A Mbits/s 350
Comparison of tcp_1down run 'HTB 10Mbit + DUALPI2 + 10ms base delay'
Old versions:
avg median # data pts
Ping (ms) ICMP : 12.61 12.80 ms 350
TCP upload avg : 9.48 N/A Mbits/s 350
TCP upload sum : 9.48 N/A Mbits/s 350
New version (v9):
avg median # data pts
Ping (ms) ICMP : 11.06 10.80 ms 350
TCP upload avg : 9.43 N/A Mbits/s 350
TCP upload sum : 9.43 N/A Mbits/s 350
Comparison of tcp_1down run 'HTB 10Mbit + DUALPI2 + 10ms base delay'
Old versions:
avg median # data pts
Ping (ms) ICMP : 40.86 37.45 ms 350
TCP upload avg : 0.88 N/A Mbits/s 350
TCP upload sum : 0.88 N/A Mbits/s 350
TCP upload::1 : 0.88 0.97 Mbits/s 350
New version (v9):
avg median # data pts
Ping (ms) ICMP : 11.07 10.40 ms 350
TCP upload avg : 0.55 N/A Mbits/s 350
TCP upload sum : 0.55 N/A Mbits/s 350
TCP upload::1 : 0.55 0.59 Mbits/s 350
v8 (11-Mar-2025)
- Fix warning messages in v7
v7 (07-Mar-2025)
- Separate into 3 patches to avoid mixing changes of documentation, selftest, and code. (Cong Wang <xiyou.wangcong(a)gmail.com>)
v6 (04-Mar-2025)
- Add modprobe for dulapi2 in tc-testing script tc-testing/tdc.sh (Jakub Kicinski <kuba(a)kernel.org>)
- Update test cases in dualpi2.json
- Update commit message
v5 (22-Feb-2025)
- A comparison was done between MQ + DUALPI2, MQ + FQ_PIE, MQ + FQ_CODEL:
Unshaped 1gigE with 4 download streams test:
- Summary of tcp_4down run 'MQ + FQ_CODEL':
avg median # data pts
Ping (ms) ICMP : 1.19 1.34 ms 349
TCP download avg : 235.42 N/A Mbits/s 349
TCP download sum : 941.68 N/A Mbits/s 349
TCP download::1 : 235.19 235.39 Mbits/s 349
TCP download::2 : 235.03 235.35 Mbits/s 349
TCP download::3 : 236.89 235.44 Mbits/s 349
TCP download::4 : 234.57 235.19 Mbits/s 349
- Summary of tcp_4down run 'MQ + FQ_PIE'
avg median # data pts
Ping (ms) ICMP : 1.21 1.37 ms 350
TCP download avg : 235.42 N/A Mbits/s 350
TCP download sum : 941.61 N/A Mbits/s 350
TCP download::1 : 232.54 233.13 Mbits/s 350
TCP download::2 : 232.52 232.80 Mbits/s 350
TCP download::3 : 233.14 233.78 Mbits/s 350
TCP download::4 : 243.41 241.48 Mbits/s 350
- Summary of tcp_4down run 'MQ + DUALPI2'
avg median # data pts
Ping (ms) ICMP : 1.19 1.34 ms 349
TCP download avg : 235.42 N/A Mbits/s 349
TCP download sum : 941.68 N/A Mbits/s 349
TCP download::1 : 235.19 235.39 Mbits/s 349
TCP download::2 : 235.03 235.35 Mbits/s 349
TCP download::3 : 236.89 235.44 Mbits/s 349
TCP download::4 : 234.57 235.19 Mbits/s 349
Unshaped 1gigE with 128 download streams test:
- Summary of tcp_128down run 'MQ + FQ_CODEL':
avg median # data pts
Ping (ms) ICMP : 1.88 1.86 ms 350
TCP download avg : 7.39 N/A Mbits/s 350
TCP download sum : 946.47 N/A Mbits/s 350
- Summary of tcp_128down run 'MQ + FQ_PIE':
avg median # data pts
Ping (ms) ICMP : 1.88 1.86 ms 350
TCP download avg : 7.39 N/A Mbits/s 350
TCP download sum : 946.47 N/A Mbits/s 350
- Summary of tcp_128down run 'MQ + DUALPI2':
avg median # data pts
Ping (ms) ICMP : 1.88 1.86 ms 350
TCP download avg : 7.39 N/A Mbits/s 350
TCP download sum : 946.47 N/A Mbits/s 350
Unshaped 10gigE with 4 download streams test:
- Summary of tcp_4down run 'MQ + FQ_CODEL':
avg median # data pts
Ping (ms) ICMP : 0.22 0.23 ms 350
TCP download avg : 2354.08 N/A Mbits/s 350
TCP download sum : 9416.31 N/A Mbits/s 350
TCP download::1 : 2353.65 2352.81 Mbits/s 350
TCP download::2 : 2354.54 2354.21 Mbits/s 350
TCP download::3 : 2353.56 2353.78 Mbits/s 350
TCP download::4 : 2354.56 2354.45 Mbits/s 350
- Summary of tcp_4down run 'MQ + FQ_PIE':
avg median # data pts
Ping (ms) ICMP : 0.20 0.19 ms 350
TCP download avg : 2354.76 N/A Mbits/s 350
TCP download sum : 9419.04 N/A Mbits/s 350
TCP download::1 : 2354.77 2353.89 Mbits/s 350
TCP download::2 : 2353.41 2354.29 Mbits/s 350
TCP download::3 : 2356.18 2354.19 Mbits/s 350
TCP download::4 : 2354.68 2353.15 Mbits/s 350
- Summary of tcp_4down run 'MQ + DUALPI2':
avg median # data pts
Ping (ms) ICMP : 0.24 0.24 ms 350
TCP download avg : 2354.11 N/A Mbits/s 350
TCP download sum : 9416.43 N/A Mbits/s 350
TCP download::1 : 2354.75 2353.93 Mbits/s 350
TCP download::2 : 2353.15 2353.75 Mbits/s 350
TCP download::3 : 2353.49 2353.72 Mbits/s 350
TCP download::4 : 2355.04 2353.73 Mbits/s 350
Unshaped 10gigE with 128 download streams test:
- Summary of tcp_128down run 'MQ + FQ_CODEL':
avg median # data pts
Ping (ms) ICMP : 7.57 8.69 ms 350
TCP download avg : 73.97 N/A Mbits/s 350
TCP download sum : 9467.82 N/A Mbits/s 350
- Summary of tcp_128down run 'MQ + FQ_PIE':
avg median # data pts
Ping (ms) ICMP : 7.82 8.91 ms 350
TCP download avg : 73.97 N/A Mbits/s 350
TCP download sum : 9468.42 N/A Mbits/s 350
- Summary of tcp_128down run 'MQ + DUALPI2':
avg median # data pts
Ping (ms) ICMP : 6.87 7.93 ms 350
TCP download avg : 73.95 N/A Mbits/s 350
TCP download sum : 9465.87 N/A Mbits/s 350
From the results shown above, we see small differences between combinations.
- Update commit message to include results of no_split_gso and split_gso (Dave Taht <dave.taht(a)gmail.com> and Paolo Abeni <pabeni(a)redhat.com>)
- Add memlimit in the dualpi2 attribute, and add memory_used, max_memory_used, memory_limit in dualpi2 stats (Dave Taht <dave.taht(a)gmail.com>)
- Update note in sch_dualpi2.c related to BBRv3 status (Dave Taht <dave.taht(a)gmail.com>)
- Update license identifier (Dave Taht <dave.taht(a)gmail.com>)
- Add selftest in tools/testing/selftests/tc-testing (Cong Wang <xiyou.wangcong(a)gmail.com>)
- Use netlink policies for parameter checks (Jamal Hadi Salim <jhs(a)mojatatu.com>)
- Modify texts & fix typos in Documentation/netlink/specs/tc.yaml (Dave Taht <dave.taht(a)gmail.com>)
- Add descriptions of packet counter statistics and the reset function of sch_dualpi2.c
- Fix step_thresh in packets
- Update code comments in sch_dualpi2.c
v4 (22-Oct-2024)
- Update statement in Kconfig for DualPI2 (Stephen Hemminger <stephen(a)networkplumber.org>)
- Put a blank line after #define in sch_dualpi2.c (Stephen Hemminger <stephen(a)networkplumber.org>)
- Fix line length warning.
v3 (19-Oct-2024)
- Fix compilaiton error
- Update Documentation/netlink/specs/tc.yaml (Jakub Kicinski <kuba(a)kernel.org>)
v2 (18-Oct-2024)
- Add Documentation/netlink/specs/tc.yaml (Jakub Kicinski <kuba(a)kernel.org>)
- Use dualpi2 instead of skb prefix (Jamal Hadi Salim <jhs(a)mojatatu.com>)
- Replace nla_parse_nested_deprecated with nla_parse_nested (Jamal Hadi Salim <jhs(a)mojatatu.com>)
- Fix line length warning
---
Chia-Yu Chang (4):
sched: Struct definition and parsing of dualpi2 qdisc
sched: Dump configuration and statistics of dualpi2 qdisc
selftests/tc-testing: Add selftests for qdisc DualPI2
Documentation: netlink: specs: tc: Add DualPI2 specification
Koen De Schepper (1):
sched: Add enqueue/dequeue of dualpi2 qdisc
Documentation/netlink/specs/tc.yaml | 156 +++
include/net/dropreason-core.h | 6 +
include/uapi/linux/pkt_sched.h | 68 +
net/sched/Kconfig | 12 +
net/sched/Makefile | 1 +
net/sched/sch_dualpi2.c | 1146 +++++++++++++++++
tools/testing/selftests/tc-testing/config | 1 +
.../tc-testing/tc-tests/qdiscs/dualpi2.json | 254 ++++
tools/testing/selftests/tc-testing/tdc.sh | 1 +
9 files changed, 1645 insertions(+)
create mode 100644 net/sched/sch_dualpi2.c
create mode 100644 tools/testing/selftests/tc-testing/tc-tests/qdiscs/dualpi2.json
--
2.34.1
Fix several spelling and grammatical mistakes in output messages from
the net selftests to improve readability.
Only the message strings for the test output have been modified. No
changes to the functional logic of the tests have been made.
Signed-off-by: Praveen Balakrishnan <praveen.balakrishnan(a)magd.ox.ac.uk>
---
.../testing/selftests/net/netfilter/conntrack_vrf.sh | 4 ++--
tools/testing/selftests/net/openvswitch/ovs-dpctl.py | 2 +-
tools/testing/selftests/net/rps_default_mask.sh | 12 ++++++------
3 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/tools/testing/selftests/net/netfilter/conntrack_vrf.sh b/tools/testing/selftests/net/netfilter/conntrack_vrf.sh
index e95ecb37c2b1..806d2bfbd6e7 100755
--- a/tools/testing/selftests/net/netfilter/conntrack_vrf.sh
+++ b/tools/testing/selftests/net/netfilter/conntrack_vrf.sh
@@ -236,9 +236,9 @@ EOF
ip netns exec "$ns1" ping -q -w 1 -c 1 "$DUMMYNET".2 > /dev/null
if ip netns exec "$ns0" nft list counter t fibcount | grep -q "packets 1"; then
- echo "PASS: fib lookup returned exepected output interface"
+ echo "PASS: fib lookup returned expected output interface"
else
- echo "FAIL: fib lookup did not return exepected output interface"
+ echo "FAIL: fib lookup did not return expected output interface"
ret=1
return
fi
diff --git a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
index 8a0396bfaf99..b521e0dea506 100644
--- a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
+++ b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
@@ -1877,7 +1877,7 @@ class OvsPacket(GenericNetlinkSocket):
elif msg["cmd"] == OvsPacket.OVS_PACKET_CMD_EXECUTE:
up.execute(msg)
else:
- print("Unkonwn cmd: %d" % msg["cmd"])
+ print("Unknown cmd: %d" % msg["cmd"])
except NetlinkError as ne:
raise ne
diff --git a/tools/testing/selftests/net/rps_default_mask.sh b/tools/testing/selftests/net/rps_default_mask.sh
index 4287a8529890..b200019b3c80 100755
--- a/tools/testing/selftests/net/rps_default_mask.sh
+++ b/tools/testing/selftests/net/rps_default_mask.sh
@@ -54,16 +54,16 @@ cleanup
echo 1 > /proc/sys/net/core/rps_default_mask
setup
-chk_rps "changing rps_default_mask dont affect existing devices" "" lo $INITIAL_RPS_DEFAULT_MASK
+chk_rps "changing rps_default_mask doesn't affect existing devices" "" lo $INITIAL_RPS_DEFAULT_MASK
echo 3 > /proc/sys/net/core/rps_default_mask
-chk_rps "changing rps_default_mask dont affect existing netns" $NETNS lo 0
+chk_rps "changing rps_default_mask doesn't affect existing netns" $NETNS lo 0
ip link add name $VETH type veth peer netns $NETNS name $VETH
ip link set dev $VETH up
ip -n $NETNS link set dev $VETH up
-chk_rps "changing rps_default_mask affect newly created devices" "" $VETH 3
-chk_rps "changing rps_default_mask don't affect newly child netns[II]" $NETNS $VETH 0
+chk_rps "changing rps_default_mask affects newly created devices" "" $VETH 3
+chk_rps "changing rps_default_mask doesn't affect newly child netns[II]" $NETNS $VETH 0
ip link del dev $VETH
ip netns del $NETNS
@@ -72,8 +72,8 @@ chk_rps "rps_default_mask is 0 by default in child netns" "$NETNS" lo 0
ip netns exec $NETNS sysctl -qw net.core.rps_default_mask=1
ip link add name $VETH type veth peer netns $NETNS name $VETH
-chk_rps "changing rps_default_mask in child ns don't affect the main one" "" lo $INITIAL_RPS_DEFAULT_MASK
+chk_rps "changing rps_default_mask in child ns doesn't affect the main one" "" lo $INITIAL_RPS_DEFAULT_MASK
chk_rps "changing rps_default_mask in child ns affects new childns devices" $NETNS $VETH 1
-chk_rps "changing rps_default_mask in child ns don't affect existing devices" $NETNS lo 0
+chk_rps "changing rps_default_mask in child ns doesn't affect existing devices" $NETNS lo 0
exit $ret
--
2.39.5
This commit introduces a new vmtest.sh runner for vsock.
It uses virtme-ng/qemu to run tests in a VM. The tests validate G2H,
H2G, and loopback. The testing tools from tools/testing/vsock/ are
reused. Currently, only vsock_test is used.
VMCI and hyperv support is included in the config file to be built with
the -b option, though not used in the tests.
Only tested on x86.
To run:
$ make -C tools/testing/selftests TARGETS=vsock
$ tools/testing/selftests/vsock/vmtest.sh
or
$ make -C tools/testing/selftests TARGETS=vsock run_tests
Example runs (after make -C tools/testing/selftests TARGETS=vsock):
$ ./tools/testing/selftests/vsock/vmtest.sh
1..3
ok 0 vm_server_host_client
ok 1 vm_client_host_server
ok 2 vm_loopback
SUMMARY: PASS=3 SKIP=0 FAIL=0
Log: /tmp/vsock_vmtest_m7DI.log
$ ./tools/testing/selftests/vsock/vmtest.sh vm_loopback
1..1
ok 0 vm_loopback
SUMMARY: PASS=1 SKIP=0 FAIL=0
Log: /tmp/vsock_vmtest_a1IO.log
$ mkdir -p ~/scratch
$ make -C tools/testing/selftests install TARGETS=vsock INSTALL_PATH=~/scratch
[... omitted ...]
$ cd ~/scratch
$ ./run_kselftest.sh
TAP version 13
1..1
# timeout set to 300
# selftests: vsock: vmtest.sh
# 1..3
# ok 0 vm_server_host_client
# ok 1 vm_client_host_server
# ok 2 vm_loopback
# SUMMARY: PASS=3 SKIP=0 FAIL=0
# Log: /tmp/vsock_vmtest_svEl.log
ok 1 selftests: vsock: vmtest.sh
Future work can include vsock_diag_test.
Because vsock requires a VM to test anything other than loopback, this
patch adds vmtest.sh as a kselftest itself. This is different than other
systems that have a "vmtest.sh", where it is used as a utility script to
spin up a VM to run the selftests as a guest (but isn't hooked into
kselftest).
Signed-off-by: Bobby Eshleman <bobbyeshleman(a)gmail.com>
---
Changes in v8:
- remove NIPA comment from commit msg
- remove tap_* functions and TAP_PREFIX
- add -b for building kernel
- Link to v7: https://lore.kernel.org/r/20250515-vsock-vmtest-v7-1-ba6fa86d6c2c@gmail.com
Changes in v7:
- fix exit code bug when ran is kselftest: use cnt_total instead of KSFT_NUM_TESTS
- updated commit message with updated output
- updated commit message with commands for installing/running as
kselftest
- Link to v6: https://lore.kernel.org/r/20250515-vsock-vmtest-v6-1-9af1cc023900@gmail.com
Changes in v6:
- add make cmd in commit message in vmtest.sh example (Stefano)
- check nonzero size of QEMU_PIDFILE using -s conditional (Stefano)
- display log file path after tests so it is easier to find amongst other random names
- cleanup qemu pidfile if qemu is unable to remove it
- make oops/warning failures more obvious with 'FAIL' prefix in log
(simply saying 'detected' wasn't clear enough to identify failing
condition)
- Link to v5: https://lore.kernel.org/r/20250513-vsock-vmtest-v5-1-4e75c4a45ceb@gmail.com
Changes in v5:
- make log file a tmpfile (Paolo)
- make sure both default and user defined QEMU gets handled by the dependency check (Paolo)
- increased VM boot up timeout from 1m to 3m for slow hosts (Paolo)
- rename vm_setup -> vm_start (Paolo)
- derive wait_for_listener from selftests/net/net_helper.sh to removes ss usage
- Remove unused 'unset IFS' line (Paolo)
- leave space after variable declarations (Paolo)
- make QEMU_PIDFILE a tmp file (Paolo)
- make everything readonly that is only read (Paolo)
- source ktap_helpers.sh for KSFT_PASS and friends (Paolo)
- don't check for timeout util (Paolo)
- add missing usage string for -q qemu arg
- add tap prefix to SUMMARY line since it isn't part of TAP protocol
- exit with the correct status code based on failure/pass counts
- Link to v4: https://lore.kernel.org/r/20250507-vsock-vmtest-v4-1-6e2a97262cd6@gmail.com
Changes in v4:
- do not use special tab delimiter for help string parsing (Stefano + Paolo)
- fix paths for when installing kselftest and running out-of-tree (Paolo)
- change vng to using running kernel instead of compiled kernel (Paolo)
- use multi-line string for QEMU_OPTS (Stefano)
- change timeout to 300s (Paolo)
- skip if tools are not found and use kselftests status codes (Paolo)
- remove build from vmtest.sh (Paolo)
- change 2222 -> SSH_HOST_PORT (Stefano)
- add tap-format output
- add vmtest.log to gitignore
- check for vsock_test binary and remind user to build it if missing
- create a proper build in makefile
- style fixes
- add ssh, timeout, and pkill to dependency check, just in case
- fix numerical comparison in conditionals
- check qemu pidfile exists before proceeding (avoid wasting time waiting for ssh)
- fix tracking of pass/fail bug
- fix stderr redirection bug
- Link to v3: https://lore.kernel.org/r/20250428-vsock-vmtest-v3-1-181af6163f3e@gmail.com
Changes in v3:
- use common conditional syntax for checking variables
- use return value instead of global rc
- fix typo TEST_HOST_LISTENER_PORT -> TEST_HOST_PORT_LISTENER
- use SIGTERM instead of SIGKILL on cleanup
- use peer-cid=1 for loopback
- change sleep delay times into globals
- fix test_vm_loopback logging
- add test selection in arguments
- make QEMU an argument
- check that vng binary is on path
- use QEMU variable
- change <tab><backslash> to <space><backslash>
- fix hardcoded file paths
- add comment in commit msg about script that vmtest.sh was based off of
- Add tools/testing/selftest/vsock/Makefile for kselftest
- Link to v2: https://lore.kernel.org/r/20250417-vsock-vmtest-v2-1-3901a27331e8@gmail.com
Changes in v2:
- add kernel oops and warnings checker
- change testname variable to use FUNCNAME
- fix spacing in test_vm_server_host_client
- add -s skip build option to vmtest.sh
- add test_vm_loopback
- pass port to vm_wait_for_listener
- fix indentation in vmtest.sh
- add vmci and hyperv to config
- changed whitespace from tabs to spaces in help string
- Link to v1: https://lore.kernel.org/r/20250410-vsock-vmtest-v1-1-f35a81dab98c@gmail.com
---
MAINTAINERS | 1 +
tools/testing/selftests/vsock/.gitignore | 2 +
tools/testing/selftests/vsock/Makefile | 16 ++
tools/testing/selftests/vsock/config | 114 ++++++++
tools/testing/selftests/vsock/settings | 1 +
tools/testing/selftests/vsock/vmtest.sh | 460 +++++++++++++++++++++++++++++++
6 files changed, 594 insertions(+)
diff --git a/MAINTAINERS b/MAINTAINERS
index 657a67f9031ef7798c19ac63e6383d4cb18a9e1f..3fbdd7bbfce7196a3cc7db70203317c6bd0e51fd 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -25751,6 +25751,7 @@ F: include/uapi/linux/vm_sockets.h
F: include/uapi/linux/vm_sockets_diag.h
F: include/uapi/linux/vsockmon.h
F: net/vmw_vsock/
+F: tools/testing/selftests/vsock/
F: tools/testing/vsock/
VMALLOC
diff --git a/tools/testing/selftests/vsock/.gitignore b/tools/testing/selftests/vsock/.gitignore
new file mode 100644
index 0000000000000000000000000000000000000000..9c5bf379480f829a14713d5f5dc7d525bc272e84
--- /dev/null
+++ b/tools/testing/selftests/vsock/.gitignore
@@ -0,0 +1,2 @@
+vmtest.log
+vsock_test
diff --git a/tools/testing/selftests/vsock/Makefile b/tools/testing/selftests/vsock/Makefile
new file mode 100644
index 0000000000000000000000000000000000000000..7ab4970e5e8a019be33f96a36f95c00573d7bfcf
--- /dev/null
+++ b/tools/testing/selftests/vsock/Makefile
@@ -0,0 +1,16 @@
+# SPDX-License-Identifier: GPL-2.0
+
+CURDIR := $(abspath .)
+TOOLSDIR := $(abspath ../../..)
+
+$(OUTPUT)/vsock_test: $(TOOLSDIR)/testing/vsock/vsock_test
+ install -m 755 $< $@
+
+$(TOOLSDIR)/testing/vsock/vsock_test:
+ $(MAKE) -C $(TOOLSDIR)/testing/vsock vsock_test
+
+TEST_PROGS += vmtest.sh
+TEST_GEN_FILES := vsock_test
+
+include ../lib.mk
+
diff --git a/tools/testing/selftests/vsock/config b/tools/testing/selftests/vsock/config
new file mode 100644
index 0000000000000000000000000000000000000000..3bffaaf98fff92dc0e3bc1286afa3d8d5d52f4c7
--- /dev/null
+++ b/tools/testing/selftests/vsock/config
@@ -0,0 +1,114 @@
+CONFIG_BLK_DEV_INITRD=y
+CONFIG_BPF=y
+CONFIG_BPF_SYSCALL=y
+CONFIG_BPF_JIT=y
+CONFIG_HAVE_EBPF_JIT=y
+CONFIG_BPF_EVENTS=y
+CONFIG_FTRACE_SYSCALLS=y
+CONFIG_FUNCTION_TRACER=y
+CONFIG_HAVE_DYNAMIC_FTRACE=y
+CONFIG_DYNAMIC_FTRACE=y
+CONFIG_HAVE_KPROBES=y
+CONFIG_KPROBES=y
+CONFIG_KPROBE_EVENTS=y
+CONFIG_ARCH_SUPPORTS_UPROBES=y
+CONFIG_UPROBES=y
+CONFIG_UPROBE_EVENTS=y
+CONFIG_DEBUG_FS=y
+CONFIG_FW_CFG_SYSFS=y
+CONFIG_FW_CFG_SYSFS_CMDLINE=y
+CONFIG_DRM=y
+CONFIG_DRM_VIRTIO_GPU=y
+CONFIG_DRM_VIRTIO_GPU_KMS=y
+CONFIG_DRM_BOCHS=y
+CONFIG_VIRTIO_IOMMU=y
+CONFIG_SOUND=y
+CONFIG_SND=y
+CONFIG_SND_SEQUENCER=y
+CONFIG_SND_PCI=y
+CONFIG_SND_INTEL8X0=y
+CONFIG_SND_HDA_CODEC_REALTEK=y
+CONFIG_SECURITYFS=y
+CONFIG_CGROUP_BPF=y
+CONFIG_SQUASHFS=y
+CONFIG_SQUASHFS_XZ=y
+CONFIG_SQUASHFS_ZSTD=y
+CONFIG_FUSE_FS=y
+CONFIG_SERIO=y
+CONFIG_PCI=y
+CONFIG_INPUT=y
+CONFIG_INPUT_KEYBOARD=y
+CONFIG_KEYBOARD_ATKBD=y
+CONFIG_SERIAL_8250=y
+CONFIG_SERIAL_8250_CONSOLE=y
+CONFIG_X86_VERBOSE_BOOTUP=y
+CONFIG_VGA_CONSOLE=y
+CONFIG_FB=y
+CONFIG_FB_VESA=y
+CONFIG_FRAMEBUFFER_CONSOLE=y
+CONFIG_RTC_CLASS=y
+CONFIG_RTC_HCTOSYS=y
+CONFIG_RTC_DRV_CMOS=y
+CONFIG_HYPERVISOR_GUEST=y
+CONFIG_PARAVIRT=y
+CONFIG_KVM_GUEST=y
+CONFIG_KVM=y
+CONFIG_KVM_INTEL=y
+CONFIG_KVM_AMD=y
+CONFIG_VSOCKETS=y
+CONFIG_VSOCKETS_DIAG=y
+CONFIG_VSOCKETS_LOOPBACK=y
+CONFIG_VMWARE_VMCI_VSOCKETS=y
+CONFIG_VIRTIO_VSOCKETS=y
+CONFIG_VIRTIO_VSOCKETS_COMMON=y
+CONFIG_HYPERV_VSOCKETS=y
+CONFIG_VMWARE_VMCI=y
+CONFIG_VHOST_VSOCK=y
+CONFIG_HYPERV=y
+CONFIG_UEVENT_HELPER=n
+CONFIG_VIRTIO=y
+CONFIG_VIRTIO_PCI=y
+CONFIG_VIRTIO_MMIO=y
+CONFIG_VIRTIO_BALLOON=y
+CONFIG_NET=y
+CONFIG_NET_CORE=y
+CONFIG_NETDEVICES=y
+CONFIG_NETWORK_FILESYSTEMS=y
+CONFIG_INET=y
+CONFIG_NET_9P=y
+CONFIG_NET_9P_VIRTIO=y
+CONFIG_9P_FS=y
+CONFIG_VIRTIO_NET=y
+CONFIG_CMDLINE_OVERRIDE=n
+CONFIG_BINFMT_SCRIPT=y
+CONFIG_SHMEM=y
+CONFIG_TMPFS=y
+CONFIG_UNIX=y
+CONFIG_MODULE_SIG_FORCE=n
+CONFIG_DEVTMPFS=y
+CONFIG_TTY=y
+CONFIG_VT=y
+CONFIG_UNIX98_PTYS=y
+CONFIG_EARLY_PRINTK=y
+CONFIG_INOTIFY_USER=y
+CONFIG_BLOCK=y
+CONFIG_SCSI_LOWLEVEL=y
+CONFIG_SCSI=y
+CONFIG_SCSI_VIRTIO=y
+CONFIG_BLK_DEV_SD=y
+CONFIG_VIRTIO_CONSOLE=y
+CONFIG_WATCHDOG=y
+CONFIG_WATCHDOG_CORE=y
+CONFIG_I6300ESB_WDT=y
+CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y
+CONFIG_OVERLAY_FS=y
+CONFIG_DAX=y
+CONFIG_DAX_DRIVER=y
+CONFIG_FS_DAX=y
+CONFIG_MEMORY_HOTPLUG=y
+CONFIG_MEMORY_HOTREMOVE=y
+CONFIG_ZONE_DEVICE=y
+CONFIG_FUSE_FS=y
+CONFIG_VIRTIO_FS=y
+CONFIG_VSOCKETS=y
+CONFIG_VIRTIO_VSOCKETS=y
diff --git a/tools/testing/selftests/vsock/settings b/tools/testing/selftests/vsock/settings
new file mode 100644
index 0000000000000000000000000000000000000000..694d70710ff08ac9fc91e9ecb5dbdadcf051f019
--- /dev/null
+++ b/tools/testing/selftests/vsock/settings
@@ -0,0 +1 @@
+timeout=300
diff --git a/tools/testing/selftests/vsock/vmtest.sh b/tools/testing/selftests/vsock/vmtest.sh
new file mode 100755
index 0000000000000000000000000000000000000000..d083c444dd8e3a5893f7795ae5e5d90fdede6325
--- /dev/null
+++ b/tools/testing/selftests/vsock/vmtest.sh
@@ -0,0 +1,460 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright (c) 2025 Meta Platforms, Inc. and affiliates
+#
+# Dependencies:
+# * virtme-ng
+# * busybox-static (used by virtme-ng)
+# * qemu (used by virtme-ng)
+
+readonly SCRIPT_DIR="$(cd -P -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd -P)"
+readonly KERNEL_CHECKOUT=$(realpath "${SCRIPT_DIR}"/../../../../)
+
+source "${SCRIPT_DIR}"/../kselftest/ktap_helpers.sh
+
+readonly VSOCK_TEST="${SCRIPT_DIR}"/vsock_test
+readonly TEST_GUEST_PORT=51000
+readonly TEST_HOST_PORT=50000
+readonly TEST_HOST_PORT_LISTENER=50001
+readonly SSH_GUEST_PORT=22
+readonly SSH_HOST_PORT=2222
+readonly VSOCK_CID=1234
+readonly WAIT_PERIOD=3
+readonly WAIT_PERIOD_MAX=60
+readonly WAIT_TOTAL=$(( WAIT_PERIOD * WAIT_PERIOD_MAX ))
+readonly QEMU_PIDFILE=$(mktemp /tmp/qemu_vsock_vmtest_XXXX.pid)
+
+# virtme-ng offers a netdev for ssh when using "--ssh", but we also need a
+# control port forwarded for vsock_test. Because virtme-ng doesn't support
+# adding an additional port to forward to the device created from "--ssh" and
+# virtme-init mistakenly sets identical IPs to the ssh device and additional
+# devices, we instead opt out of using --ssh, add the device manually, and also
+# add the kernel cmdline options that virtme-init uses to setup the interface.
+readonly QEMU_TEST_PORT_FWD="hostfwd=tcp::${TEST_HOST_PORT}-:${TEST_GUEST_PORT}"
+readonly QEMU_SSH_PORT_FWD="hostfwd=tcp::${SSH_HOST_PORT}-:${SSH_GUEST_PORT}"
+readonly QEMU_OPTS="\
+ -netdev user,id=n0,${QEMU_TEST_PORT_FWD},${QEMU_SSH_PORT_FWD} \
+ -device virtio-net-pci,netdev=n0 \
+ -device vhost-vsock-pci,guest-cid=${VSOCK_CID} \
+ --pidfile ${QEMU_PIDFILE} \
+"
+readonly KERNEL_CMDLINE="virtme.dhcp net.ifnames=0 biosdevname=0 virtme.ssh virtme_ssh_user=$USER"
+readonly LOG=$(mktemp /tmp/vsock_vmtest_XXXX.log)
+readonly TEST_NAMES=(vm_server_host_client vm_client_host_server vm_loopback)
+readonly TEST_DESCS=(
+ "Run vsock_test in server mode on the VM and in client mode on the host."
+ "Run vsock_test in client mode on the VM and in server mode on the host."
+ "Run vsock_test using the loopback transport in the VM."
+)
+
+VERBOSE=0
+
+usage() {
+ local name
+ local desc
+ local i
+
+ echo
+ echo "$0 [OPTIONS] [TEST]..."
+ echo "If no TEST argument is given, all tests will be run."
+ echo
+ echo "Options"
+ echo " -b: build the kernel from the current source tree and use it for guest VMs"
+ echo " -q: set the path to or name of qemu binary"
+ echo " -v: verbose output"
+ echo
+ echo "Available tests"
+
+ for ((i = 0; i < ${#TEST_NAMES[@]}; i++)); do
+ name=${TEST_NAMES[${i}]}
+ desc=${TEST_DESCS[${i}]}
+ printf "\t%-35s%-35s\n" "${name}" "${desc}"
+ done
+ echo
+
+ exit 1
+}
+
+die() {
+ echo "$*" >&2
+ exit "${KSFT_FAIL}"
+}
+
+vm_ssh() {
+ ssh -q -o UserKnownHostsFile=/dev/null -p ${SSH_HOST_PORT} localhost "$@"
+ return $?
+}
+
+cleanup() {
+ if [[ -s "${QEMU_PIDFILE}" ]]; then
+ pkill -SIGTERM -F "${QEMU_PIDFILE}" > /dev/null 2>&1
+ fi
+
+ # If failure occurred during or before qemu start up, then we need
+ # to clean this up ourselves.
+ if [[ -e "${QEMU_PIDFILE}" ]]; then
+ rm "${QEMU_PIDFILE}"
+ fi
+}
+
+check_args() {
+ local found
+
+ for arg in "$@"; do
+ found=0
+ for name in "${TEST_NAMES[@]}"; do
+ if [[ "${name}" = "${arg}" ]]; then
+ found=1
+ break
+ fi
+ done
+
+ if [[ "${found}" -eq 0 ]]; then
+ echo "${arg} is not an available test" >&2
+ usage
+ fi
+ done
+
+ for arg in "$@"; do
+ if ! command -v > /dev/null "test_${arg}"; then
+ echo "Test ${arg} not found" >&2
+ usage
+ fi
+ done
+}
+
+check_deps() {
+ for dep in vng ${QEMU} busybox pkill ssh; do
+ if [[ ! -x $(command -v "${dep}") ]]; then
+ echo -e "skip: dependency ${dep} not found!\n"
+ exit "${KSFT_SKIP}"
+ fi
+ done
+
+ if [[ ! -x $(command -v "${VSOCK_TEST}") ]]; then
+ printf "skip: %s not found!" "${VSOCK_TEST}"
+ printf " Please build the kselftest vsock target.\n"
+ exit "${KSFT_SKIP}"
+ fi
+}
+
+handle_build() {
+ if [[ ! "${BUILD}" -eq 1 ]]; then
+ return
+ fi
+
+ if [[ ! -d "${KERNEL_CHECKOUT}" ]]; then
+ echo "-b requires vmtest.sh called from the kernel source tree" >&2
+ exit 1
+ fi
+
+ pushd "${KERNEL_CHECKOUT}" &>/dev/null
+
+ if ! vng --kconfig --config "${SCRIPT_DIR}"/config; then
+ die "failed to generate .config for kernel source tree (${KERNEL_CHECKOUT})"
+ fi
+
+ if ! make -j$(nproc); then
+ die "failed to build kernel from source tree (${KERNEL_CHECKOUT})"
+ fi
+
+ popd &>/dev/null
+}
+
+vm_start() {
+ local logfile=/dev/null
+ local verbose_opt=""
+ local kernel_opt=""
+ local qemu
+
+ qemu=$(command -v "${QEMU}")
+
+ if [[ "${VERBOSE}" -eq 1 ]]; then
+ verbose_opt="--verbose"
+ logfile=/dev/stdout
+ fi
+
+ if [[ "${BUILD}" -eq 1 ]]; then
+ kernel_opt="${KERNEL_CHECKOUT}"
+ fi
+
+ vng \
+ --run \
+ ${kernel_opt} \
+ ${verbose_opt} \
+ --qemu-opts="${QEMU_OPTS}" \
+ --qemu="${qemu}" \
+ --user root \
+ --append "${KERNEL_CMDLINE}" \
+ --rw &> ${logfile} &
+
+ if ! timeout ${WAIT_TOTAL} \
+ bash -c 'while [[ ! -s '"${QEMU_PIDFILE}"' ]]; do sleep 1; done; exit 0'; then
+ die "failed to boot VM"
+ fi
+}
+
+vm_wait_for_ssh() {
+ local i
+
+ i=0
+ while true; do
+ if [[ ${i} -gt ${WAIT_PERIOD_MAX} ]]; then
+ die "Timed out waiting for guest ssh"
+ fi
+ if vm_ssh -- true; then
+ break
+ fi
+ i=$(( i + 1 ))
+ sleep ${WAIT_PERIOD}
+ done
+}
+
+# derived from selftests/net/net_helper.sh
+wait_for_listener()
+{
+ local port=$1
+ local interval=$2
+ local max_intervals=$3
+ local protocol=tcp
+ local pattern
+ local i
+
+ pattern=":$(printf "%04X" "${port}") "
+
+ # for tcp protocol additionally check the socket state
+ [ "${protocol}" = "tcp" ] && pattern="${pattern}0A"
+ for i in $(seq "${max_intervals}"); do
+ if awk '{print $2" "$4}' /proc/net/"${protocol}"* | \
+ grep -q "${pattern}"; then
+ break
+ fi
+ sleep "${interval}"
+ done
+}
+
+vm_wait_for_listener() {
+ local port=$1
+
+ vm_ssh <<EOF
+$(declare -f wait_for_listener)
+wait_for_listener ${port} ${WAIT_PERIOD} ${WAIT_PERIOD_MAX}
+EOF
+}
+
+host_wait_for_listener() {
+ wait_for_listener "${TEST_HOST_PORT_LISTENER}" "${WAIT_PERIOD}" "${WAIT_PERIOD_MAX}"
+}
+
+__log_stdin() {
+ cat | awk '{ printf "%s:\t%s\n","'"${prefix}"'", $0 }'
+}
+
+__log_args() {
+ echo "$*" | awk '{ printf "%s:\t%s\n","'"${prefix}"'", $0 }'
+}
+
+log() {
+ local prefix="$1"
+
+ shift
+ local redirect=
+ if [[ ${VERBOSE} -eq 0 ]]; then
+ redirect=/dev/null
+ else
+ redirect=/dev/stdout
+ fi
+
+ if [[ "$#" -eq 0 ]]; then
+ __log_stdin | tee -a "${LOG}" > ${redirect}
+ else
+ __log_args "$@" | tee -a "${LOG}" > ${redirect}
+ fi
+}
+
+log_setup() {
+ log "setup" "$@"
+}
+
+log_host() {
+ local testname=$1
+
+ shift
+ log "test:${testname}:host" "$@"
+}
+
+log_guest() {
+ local testname=$1
+
+ shift
+ log "test:${testname}:guest" "$@"
+}
+
+test_vm_server_host_client() {
+ local testname="${FUNCNAME[0]#test_}"
+
+ vm_ssh -- "${VSOCK_TEST}" \
+ --mode=server \
+ --control-port="${TEST_GUEST_PORT}" \
+ --peer-cid=2 \
+ 2>&1 | log_guest "${testname}" &
+
+ vm_wait_for_listener "${TEST_GUEST_PORT}"
+
+ ${VSOCK_TEST} \
+ --mode=client \
+ --control-host=127.0.0.1 \
+ --peer-cid="${VSOCK_CID}" \
+ --control-port="${TEST_HOST_PORT}" 2>&1 | log_host "${testname}"
+
+ return $?
+}
+
+test_vm_client_host_server() {
+ local testname="${FUNCNAME[0]#test_}"
+
+ ${VSOCK_TEST} \
+ --mode "server" \
+ --control-port "${TEST_HOST_PORT_LISTENER}" \
+ --peer-cid "${VSOCK_CID}" 2>&1 | log_host "${testname}" &
+
+ host_wait_for_listener
+
+ vm_ssh -- "${VSOCK_TEST}" \
+ --mode=client \
+ --control-host=10.0.2.2 \
+ --peer-cid=2 \
+ --control-port="${TEST_HOST_PORT_LISTENER}" 2>&1 | log_guest "${testname}"
+
+ return $?
+}
+
+test_vm_loopback() {
+ local testname="${FUNCNAME[0]#test_}"
+ local port=60000 # non-forwarded local port
+
+ vm_ssh -- "${VSOCK_TEST}" \
+ --mode=server \
+ --control-port="${port}" \
+ --peer-cid=1 2>&1 | log_guest "${testname}" &
+
+ vm_wait_for_listener "${port}"
+
+ vm_ssh -- "${VSOCK_TEST}" \
+ --mode=client \
+ --control-host="127.0.0.1" \
+ --control-port="${port}" \
+ --peer-cid=1 2>&1 | log_guest "${testname}"
+
+ return $?
+}
+
+run_test() {
+ local host_oops_cnt_before
+ local host_warn_cnt_before
+ local vm_oops_cnt_before
+ local vm_warn_cnt_before
+ local host_oops_cnt_after
+ local host_warn_cnt_after
+ local vm_oops_cnt_after
+ local vm_warn_cnt_after
+ local name
+ local rc
+
+ host_oops_cnt_before=$(dmesg | grep -c -i 'Oops')
+ host_warn_cnt_before=$(dmesg --level=warn | wc -l)
+ vm_oops_cnt_before=$(vm_ssh -- dmesg | grep -c -i 'Oops')
+ vm_warn_cnt_before=$(vm_ssh -- dmesg --level=warn | wc -l)
+
+ name=$(echo "${1}" | awk '{ print $1 }')
+ eval test_"${name}"
+ rc=$?
+
+ host_oops_cnt_after=$(dmesg | grep -i 'Oops' | wc -l)
+ if [[ ${host_oops_cnt_after} -gt ${host_oops_cnt_before} ]]; then
+ echo "FAIL: kernel oops detected on host" | log_host "${name}"
+ rc=$KSFT_FAIL
+ fi
+
+ host_warn_cnt_after=$(dmesg --level=warn | wc -l)
+ if [[ ${host_warn_cnt_after} -gt ${host_warn_cnt_before} ]]; then
+ echo "FAIL: kernel warning detected on host" | log_host "${name}"
+ rc=$KSFT_FAIL
+ fi
+
+ vm_oops_cnt_after=$(vm_ssh -- dmesg | grep -i 'Oops' | wc -l)
+ if [[ ${vm_oops_cnt_after} -gt ${vm_oops_cnt_before} ]]; then
+ echo "FAIL: kernel oops detected on vm" | log_host "${name}"
+ rc=$KSFT_FAIL
+ fi
+
+ vm_warn_cnt_after=$(vm_ssh -- dmesg --level=warn | wc -l)
+ if [[ ${vm_warn_cnt_after} -gt ${vm_warn_cnt_before} ]]; then
+ echo "FAIL: kernel warning detected on vm" | log_host "${name}"
+ rc=$KSFT_FAIL
+ fi
+
+ return "${rc}"
+}
+
+QEMU="qemu-system-$(uname -m)"
+
+while getopts :hvsq:b o
+do
+ case $o in
+ v) VERBOSE=1;;
+ b) BUILD=1;;
+ q) QEMU=$OPTARG;;
+ h|*) usage;;
+ esac
+done
+shift $((OPTIND-1))
+
+trap cleanup EXIT
+
+if [[ ${#} -eq 0 ]]; then
+ ARGS=("${TEST_NAMES[@]}")
+else
+ ARGS=("$@")
+fi
+
+check_args "${ARGS[@]}"
+check_deps
+handle_build
+
+echo "1..${#ARGS[@]}"
+
+log_setup "Booting up VM"
+vm_start
+vm_wait_for_ssh
+log_setup "VM booted up"
+
+cnt_pass=0
+cnt_fail=0
+cnt_skip=0
+cnt_total=0
+for arg in "${ARGS[@]}"; do
+ run_test "${arg}"
+ rc=$?
+ if [[ ${rc} -eq $KSFT_PASS ]]; then
+ cnt_pass=$(( cnt_pass + 1 ))
+ echo "ok ${cnt_total} ${arg}"
+ elif [[ ${rc} -eq $KSFT_SKIP ]]; then
+ cnt_skip=$(( cnt_skip + 1 ))
+ echo "ok ${cnt_total} ${arg} # SKIP"
+ elif [[ ${rc} -eq $KSFT_FAIL ]]; then
+ cnt_fail=$(( cnt_fail + 1 ))
+ echo "not ok ${cnt_total} ${arg} # exit=$rc"
+ fi
+ cnt_total=$(( cnt_total + 1 ))
+done
+
+echo "SUMMARY: PASS=${cnt_pass} SKIP=${cnt_skip} FAIL=${cnt_fail}"
+echo "Log: ${LOG}"
+
+if [ $((cnt_pass + cnt_skip)) -eq ${cnt_total} ]; then
+ exit "$KSFT_PASS"
+else
+ exit "$KSFT_FAIL"
+fi
---
base-commit: 8066e388be48f1ad62b0449dc1d31a25489fa12a
change-id: 20250325-vsock-vmtest-b3a21d2102c2
Best regards,
--
Bobby Eshleman <bobbyeshleman(a)gmail.com>
Hi,
This patch series by Carolina is V10 of the feature that adds rate
management support on traffic classes in devlink and mlx5, see full
description by Carolina below [0].
V10:
- Added netdevsim selftest for tc-bw ops.
- Dropped header: field as it’s unnecessary for local constants in
devlink.yaml.
Regards,
Tariq
[0]
This patch series extends the devlink-rate API to support traffic class
(TC) bandwidth management, enabling more granular control over traffic
shaping and rate limiting across multiple TCs. The API now allows users
to specify bandwidth proportions for different traffic classes in a
single command. This is particularly useful for managing Enhanced
Transmission Selection (ETS) for groups of Virtual Functions (VFs),
allowing precise bandwidth allocation across traffic classes.
Additionally the series refines the QoS handling in net/mlx5 to support
TC arbitration and bandwidth management on vports and rate nodes.
Discussions on traffic class shaping in net-shapers began in V5 [1],
where we discussed with maintainers whether net-shapers should support
traffic classes and how this could be implemented.
Later, after further conversations with Paolo Abeni and Simon Horman,
Cosmin provided an update [2], confirming that net-shapers' tree-based
hierarchy aligns well with traffic classes when treated as distinct
subsets of netdev queues. Since mlx5 enforces a 1:1 mapping between TX
queues and traffic classes, this approach seems feasible, though some
open questions remain regarding queue reconfiguration and certain mlx5
scheduling behaviors.
Building on that discussion, Cosmin has now shared a concrete
implementation plan on the netdev mailing list [3]. The plan, developed
in collaboration with Paolo and Simon, outlines how net-shapers can be
extended to support the same use cases currently covered by
devlink-rate, with the eventual goal of aligning both and simplifying
the shaping infrastructure in the kernel.
This work was presented at Netdev 0x19 in Zagreb [4].
There we presented how TC scheduling is enforced in mlx5 hardware,
which led to discussions on the mailing list.
A summary of how things work:
Classification means labeling a packet with a traffic class based on the
packet's DSCP or VLAN PCP field, then treating packets with different
traffic classes differently during transmit processing.
In a virtualized setup, VFs are untrusted and do not control
classification or shaping. Classification is done by the hardware using
a prio-to-TC mapping set by the hypervisor. VFs only select which send
queue to use and are expected to respect the classification logic by
sending each traffic class on its dedicated queue. As stated in the
net-shapers plan [3], each transmit queue should carry only a single
traffic class. Mixing classes in a single queue can lead to HOL
blocking.
In the mlx5 implementation, if the queue used does not match the
classified traffic class, the hardware moves the queue to the correct TC
scheduler. This movement is not a reclassification; it’s a necessary
enforcement step to ensure traffic class isolation is maintained.
Extend devlink-rate API to support rate management on TCs:
- devlink: Extend the devlink rate API to support traffic class
bandwidth management
Introduce a no-op implementation:
- net/mlx5: Add no-op implementation for setting tc-bw on rate objects
Add support for enabling and disabling TC QoS on vports and nodes:
- net/mlx5: Add support for setting tc-bw on nodes
- net/mlx5: Add traffic class scheduling support for vport QoS
Support for setting tc-bw on rate objects:
- net/mlx5: Manage TC arbiter nodes and implement full support for
tc-bw
[1]
https://lore.kernel.org/netdev/20241204220931.254964-1-tariqt@nvidia.com/
[2]
https://lore.kernel.org/netdev/67df1a562614b553dcab043f347a0d7c5393ff83.cam…
[3]
https://lore.kernel.org/netdev/d9831d0c940a7b77419abe7c7330e822bbfd1cfb.cam…
[4]
https://netdevconf.info/0x19/sessions/talk/optimizing-bandwidth-allocation-…
Carolina Jubran (6):
devlink: Extend devlink rate API with traffic classes bandwidth
management
selftest: netdevsim: Add devlink rate tc-bw test
net/mlx5: Add no-op implementation for setting tc-bw on rate objects
net/mlx5: Add support for setting tc-bw on nodes
net/mlx5: Add traffic class scheduling support for vport QoS
net/mlx5: Manage TC arbiter nodes and implement full support for tc-bw
Documentation/netlink/specs/devlink.yaml | 35 +-
.../networking/devlink/devlink-port.rst | 7 +
.../net/ethernet/mellanox/mlx5/core/devlink.c | 2 +
.../net/ethernet/mellanox/mlx5/core/esw/qos.c | 1007 ++++++++++++++++-
.../net/ethernet/mellanox/mlx5/core/esw/qos.h | 8 +
.../net/ethernet/mellanox/mlx5/core/eswitch.h | 14 +-
drivers/net/netdevsim/dev.c | 43 +
drivers/net/netdevsim/netdevsim.h | 1 +
include/net/devlink.h | 6 +
include/uapi/linux/devlink.h | 9 +
net/devlink/netlink_gen.c | 15 +-
net/devlink/netlink_gen.h | 1 +
net/devlink/rate.c | 127 +++
.../drivers/net/netdevsim/devlink.sh | 51 +
14 files changed, 1289 insertions(+), 37 deletions(-)
base-commit: f685204c57e87d2a88b159c7525426d70ee745c9
--
2.31.1
Use `kernel::ffi::c_void` instead of `core::ffi::c_void` for consistency
and to centralize abstraction.
Since `kernel::ffi::c_void` is a transparent wrapper around
`core::ffi::c_void`, both are functionally equivalent. However, using
`kernel::ffi::c_void` improves consistency across the kernel's Rust code
and provides a unified reference point in case the definition ever needs
to change, even if such a change is unlikely.
Signed-off-by: Jesung Yang <y.j3ms.n(a)gmail.com>
---
rust/kernel/kunit.rs | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/rust/kernel/kunit.rs b/rust/kernel/kunit.rs
index 81833a687b75..bd6fc712dd79 100644
--- a/rust/kernel/kunit.rs
+++ b/rust/kernel/kunit.rs
@@ -6,7 +6,8 @@
//!
//! Reference: <https://docs.kernel.org/dev-tools/kunit/index.html>
-use core::{ffi::c_void, fmt};
+use core::fmt;
+use kernel::ffi::c_void;
/// Prints a KUnit error-level message.
///
--
2.39.5
Hello,
this is the v2 of the many args series for arm64, being itself a revival
of Xu Kuhoai's work to enable larger arguments count for BPF programs on
ARM64 ([1]).
The discussions in v1 shed some light on some issues around specific
cases, for example with functions passing struct on stack with custom
packing/alignment attributes: those cases can not be properly detected
with the current BTF info. So this new revision aims to separate
concerns with a simpler implementation, just accepting additional args
on stack if we can make sure about the alignment constraints (and so,
refusing attachment to functions passing structs on stacks). I then
checked if the specific alignment constraints could be checked with
larger scalar types rather than structs, but it appears that this use
case is in fact rejected at the verifier level (see a9b59159d338 ("bpf:
Do not allow btf_ctx_access with __int128 types")). So in the end the
specific alignment corner cases raised in [1] can not really happen in
the kernel in its current state. This new revision still brings support
for the standard cases as a first step, it will then be possible to
iterate on top of it to add the more specific cases like struct passed
on stack and larger types.
[1] https://lore.kernel.org/all/20230917150752.69612-1-xukuohai@huaweicloud.com…
Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore(a)bootlin.com>
---
Changes in v3:
- switch back -EOPNOTSUPP to -ENOTSUPP
- fix comment style
- group intializations for arg_aux
- remove some unneeded round_up
- Link to v2: https://lore.kernel.org/r/20250522-many_args_arm64-v2-0-d6afdb9cf819@bootli…
Changes in v2:
- remove alignment computation from btf.c
- deduce alignment constraints directly in jit compiler for simple types
- deny attachment to functions with "corner-cases" arguments (ie:
structs on stack)
- remove custom tests, as the corresponding use cases are locked either
by the JIT comp or the verifier
- drop RFC
- Link to v1: https://lore.kernel.org/r/20250411-many_args_arm64-v1-0-0a32fe72339e@bootli…
---
Alexis Lothoré (eBPF Foundation) (1):
selftests/bpf: enable many-args tests for arm64
Xu Kuohai (1):
bpf, arm64: Support up to 12 function arguments
arch/arm64/net/bpf_jit_comp.c | 225 ++++++++++++++++++++-------
tools/testing/selftests/bpf/DENYLIST.aarch64 | 2 -
2 files changed, 171 insertions(+), 56 deletions(-)
---
base-commit: 9435138c069117cd59a4912b5ea2ae44cc2c5ffa
change-id: 20250220-many_args_arm64-8bd3747e6948
Best regards,
--
Alexis Lothoré, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com
Until CONFIG_DMABUF_SYSFS_STATS was added [1] it was only possible to
perform per-buffer accounting with debugfs which is not suitable for
production environments. Eventually we discovered the overhead with
per-buffer sysfs file creation/removal was significantly impacting
allocation and free times, and exacerbated kernfs lock contention. [2]
dma_buf_stats_setup() is responsible for 39% of single-page buffer
creation duration, or 74% of single-page dma_buf_export() duration when
stressing dmabuf allocations and frees.
I prototyped a change from per-buffer to per-exporter statistics with a
RCU protected list of exporter allocations that accommodates most (but
not all) of our use-cases and avoids almost all of the sysfs overhead.
While that adds less overhead than per-buffer sysfs, and less even than
the maintenance of the dmabuf debugfs_list, it's still *additional*
overhead on top of the debugfs_list and doesn't give us per-buffer info.
This series uses the existing dmabuf debugfs_list to implement a BPF
dmabuf iterator, which adds no overhead to buffer allocation/free and
provides per-buffer info. The list has been moved outside of
CONFIG_DEBUG_FS scope so that it is always populated. The BPF program
loaded by userspace that extracts per-buffer information gets to define
its own interface which avoids the lack of ABI stability with debugfs.
This will allow us to replace our use of CONFIG_DMABUF_SYSFS_STATS, and
the plan is to remove it from the kernel after the next longterm stable
release.
[1] https://lore.kernel.org/linux-media/20201210044400.1080308-1-hridya@google.…
[2] https://lore.kernel.org/all/20220516171315.2400578-1-tjmercier@google.com
v1: https://lore.kernel.org/all/20250414225227.3642618-1-tjmercier@google.com
v1 -> v2:
Make the DMA buffer list independent of CONFIG_DEBUG_FS per Christian
König
Add CONFIG_DMA_SHARED_BUFFER check to kernel/bpf/Makefile per kernel
test robot
Use BTF_ID_LIST_SINGLE instead of BTF_ID_LIST_GLOBAL_SINGLE per Song Liu
Fixup comment style, mixing code/declarations, and use ASSERT_OK_FD in
selftest per Song Liu
Add BPF_ITER_RESCHED feature to bpf_dmabuf_reg_info per Alexei
Starovoitov
Add open-coded iterator and selftest per Alexei Starovoitov
Add a second test buffer from the system dmabuf heap to selftests
Use the BPF program we'll use in production for selftest per Alexei
Starovoitov
https://r.android.com/c/platform/system/bpfprogs/+/3616123/2/dmabufIter.chttps://r.android.com/c/platform/system/memory/libmeminfo/+/3614259/1/libdm…
v2: https://lore.kernel.org/all/20250504224149.1033867-1-tjmercier@google.com
v2 -> v3:
Rebase onto bpf-next/master
Move get_next_dmabuf() into drivers/dma-buf/dma-buf.c, along with the
new get_first_dmabuf(). This avoids having to expose the dmabuf list
and mutex to the rest of the kernel, and keeps the dmabuf mutex
operations near each other in the same file. (Christian König)
Add Christian's RB to dma-buf: Rename debugfs symbols
Drop RFC: dma-buf: Remove DMA-BUF statistics
v3: https://lore.kernel.org/all/20250507001036.2278781-1-tjmercier@google.com
v3 -> v4:
Fix selftest BPF program comment style (not kdoc) per Alexei Starovoitov
Fix dma-buf.c kdoc comment style per Alexei Starovoitov
Rename get_first_dmabuf / get_next_dmabuf to dma_buf_iter_begin /
dma_buf_iter_next per Christian König
Add Christian's RB to bpf: Add dmabuf iterator
v4: https://lore.kernel.org/all/20250508182025.2961555-1-tjmercier@google.com
v4 -> v5:
Add Christian's Acks to all patches
Add Song Liu's Acks
Move BTF_ID_LIST_SINGLE and DEFINE_BPF_ITER_FUNC closer to usage per
Song Liu
Fix open-coded iterator comment style per Song Liu
Move iterator termination check to its own subtest per Song Liu
Rework selftest buffer creation per Song Liu
Fix spacing in sanitize_string per BPF CI
v5: https://lore.kernel.org/all/20250512174036.266796-1-tjmercier@google.com
v5 -> v6:
Song Liu:
Init test buffer FDs to -1
Zero-init udmabuf_create for future proofing
Bail early for iterator fd/FILE creation failure
Dereference char ptr to check for NUL in sanitize_string()
Move map insertion from create_test_buffers() to test_dmabuf_iter()
Add ACK to selftests/bpf: Add test for open coded dmabuf_iter
v6: https://lore.kernel.org/all/20250513163601.812317-1-tjmercier@google.com
v6 -> v7:
Zero uninitialized name bytes following the end of name strings per
s390x BPF CI
Reorder sanitize_string bounds checks per Song Liu
Add Song's Ack to: selftests/bpf: Add test for dmabuf_iter
Rebase onto bpf-next/master per BPF CI
T.J. Mercier (5):
dma-buf: Rename debugfs symbols
bpf: Add dmabuf iterator
bpf: Add open coded dmabuf iterator
selftests/bpf: Add test for dmabuf_iter
selftests/bpf: Add test for open coded dmabuf_iter
drivers/dma-buf/dma-buf.c | 98 ++++--
include/linux/dma-buf.h | 4 +-
kernel/bpf/Makefile | 3 +
kernel/bpf/dmabuf_iter.c | 150 +++++++++
kernel/bpf/helpers.c | 5 +
.../testing/selftests/bpf/bpf_experimental.h | 5 +
tools/testing/selftests/bpf/config | 3 +
.../selftests/bpf/prog_tests/dmabuf_iter.c | 285 ++++++++++++++++++
.../testing/selftests/bpf/progs/dmabuf_iter.c | 101 +++++++
9 files changed, 632 insertions(+), 22 deletions(-)
create mode 100644 kernel/bpf/dmabuf_iter.c
create mode 100644 tools/testing/selftests/bpf/prog_tests/dmabuf_iter.c
create mode 100644 tools/testing/selftests/bpf/progs/dmabuf_iter.c
base-commit: 6888a036cfc3d617d0843ecc9bd8504e91fb9de6
--
2.49.0.1151.ga128411c76-goog
Currently testing of userspace and in-kernel API use two different
frameworks. kselftests for the userspace ones and Kunit for the
in-kernel ones. Besides their different scopes, both have different
strengths and limitations:
Kunit:
* Tests are normal kernel code.
* They use the regular kernel toolchain.
* They can be packaged and distributed as modules conveniently.
Kselftests:
* Tests are normal userspace code
* They need a userspace toolchain.
A kernel cross toolchain is likely not enough.
* A fair amout of userland is required to run the tests,
which means a full distro or handcrafted rootfs.
* There is no way to conveniently package and run kselftests with a
given kernel image.
* The kselftests makefiles are not as powerful as regular kbuild.
For example they are missing proper header dependency tracking or more
complex compiler option modifications.
Therefore kunit is much easier to run against different kernel
configurations and architectures.
This series aims to combine kselftests and kunit, avoiding both their
limitations. It works by compiling the userspace kselftests as part of
the regular kernel build, embedding them into the kunit kernel or module
and executing them from there. If the kernel toolchain is not fit to
produce userspace because of a missing libc, the kernel's own nolibc can
be used instead.
The structured TAP output from the kselftest is integrated into the
kunit KTAP output transparently, the kunit parser can parse the combined
logs together.
Further room for improvements:
* Call each test in its completely dedicated namespace
* Handle additional test files besides the test executable through
archives. CPIO, cramfs, etc.
* Compatibility with kselftest_harness.h (in progress)
* Expose the blobs in debugfs
* Provide some convience wrappers around compat userprogs
* Figure out a migration path/coexistence solution for
kunit UAPI and tools/testing/selftests/
Output from the kunit example testcase, note the output of
"example_uapi_tests".
$ ./tools/testing/kunit/kunit.py run --kunitconfig lib/kunit example
...
Running tests with:
$ .kunit/linux kunit.filter_glob=example kunit.enable=1 mem=1G console=tty kunit_shutdown=halt
[11:53:53] ================== example (10 subtests) ===================
[11:53:53] [PASSED] example_simple_test
[11:53:53] [SKIPPED] example_skip_test
[11:53:53] [SKIPPED] example_mark_skipped_test
[11:53:53] [PASSED] example_all_expect_macros_test
[11:53:53] [PASSED] example_static_stub_test
[11:53:53] [PASSED] example_static_stub_using_fn_ptr_test
[11:53:53] [PASSED] example_priv_test
[11:53:53] =================== example_params_test ===================
[11:53:53] [SKIPPED] example value 3
[11:53:53] [PASSED] example value 2
[11:53:53] [PASSED] example value 1
[11:53:53] [SKIPPED] example value 0
[11:53:53] =============== [PASSED] example_params_test ===============
[11:53:53] [PASSED] example_slow_test
[11:53:53] ======================= (4 subtests) =======================
[11:53:53] [PASSED] procfs
[11:53:53] [PASSED] userspace test 2
[11:53:53] [SKIPPED] userspace test 3: some reason
[11:53:53] [PASSED] userspace test 4
[11:53:53] ================ [PASSED] example_uapi_test ================
[11:53:53] ===================== [PASSED] example =====================
[11:53:53] ============================================================
[11:53:53] Testing complete. Ran 16 tests: passed: 11, skipped: 5
[11:53:53] Elapsed time: 67.543s total, 1.823s configuring, 65.655s building, 0.058s running
Based on v6.15-rc1.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
---
Changes in v2:
- Rebase onto v6.15-rc1
- Add documentation and kernel docs
- Resolve invalid kconfig breakages
- Drop already applied patch "kbuild: implement CONFIG_HEADERS_INSTALL for Usermode Linux"
- Drop userprogs CONFIG_WERROR integration, it doesn't need to be part of this series
- Replace patch prefix "kconfig" with "kbuild"
- Rename kunit_uapi_run_executable() to kunit_uapi_run_kselftest()
- Generate private, conflict-free symbols in the blob framework
- Handle kselftest exit codes
- Handle SIGABRT
- Forward output also to kunit debugfs log
- Install a fd=0 stdin filedescriptor
- Link to v1: https://lore.kernel.org/r/20250217-kunit-kselftests-v1-0-42b4524c3b0a@linut…
---
Thomas Weißschuh (11):
kbuild: userprogs: add nolibc support
kbuild: introduce CONFIG_ARCH_HAS_NOLIBC
kbuild: doc: add label for userprogs section
kbuild: introduce blob framework
kunit: tool: Add test for nested test result reporting
kunit: tool: Don't overwrite test status based on subtest counts
kunit: tool: Parse skipped tests from kselftest.h
kunit: Introduce UAPI testing framework
kunit: uapi: Add example for UAPI tests
kunit: uapi: Introduce preinit executable
kunit: uapi: Validate usability of /proc
Documentation/dev-tools/kunit/api/index.rst | 5 +
Documentation/dev-tools/kunit/api/uapi.rst | 12 +
Documentation/kbuild/makefiles.rst | 37 ++-
MAINTAINERS | 2 +
include/kunit/uapi.h | 24 ++
include/linux/blob.h | 32 +++
init/Kconfig | 2 +
lib/kunit/.kunitconfig | 2 +
lib/kunit/Kconfig | 11 +
lib/kunit/Makefile | 18 +-
lib/kunit/kunit-example-test.c | 15 ++
lib/kunit/kunit-example-uapi.c | 56 ++++
lib/kunit/uapi-preinit.c | 65 +++++
lib/kunit/uapi.c | 294 +++++++++++++++++++++
scripts/Makefile.blobs | 19 ++
scripts/Makefile.build | 6 +
scripts/Makefile.clean | 2 +-
scripts/Makefile.userprogs | 16 +-
scripts/blob-wrap.c | 27 ++
tools/include/nolibc/Kconfig.nolibc | 13 +
tools/testing/kunit/kunit_parser.py | 13 +-
tools/testing/kunit/kunit_tool_test.py | 9 +
.../test_is_test_passed-failure-nested.log | 10 +
.../test_data/test_is_test_passed-kselftest.log | 3 +-
24 files changed, 682 insertions(+), 11 deletions(-)
---
base-commit: bf9962cc9ec3ac1dae2bf81b126657c1c49c348a
change-id: 20241015-kunit-kselftests-56273bc40442
Best regards,
--
Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
This patch improves the clarity and grammar of output messages in the acct()
selftest. Minor changes were made to user-facing strings and comments to make
them easier to understand and more consistent with the kselftest style.
Changes include:
- Fixing grammar in printed messages and comments.
- Rewording error and success outputs for clarity and professionalism.
- Making the "root check" message more direct.
These updates improve readability without affecting test logic or behavior.
Signed-off-by: Abdelrahman Fekry <Abdelrahmanfekry375(a)gmail.com>
---
tools/testing/selftests/acct/acct_syscall.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/acct/acct_syscall.c b/tools/testing/selftests/acct/acct_syscall.c
index 87c044fb9293..2c120a527574 100644
--- a/tools/testing/selftests/acct/acct_syscall.c
+++ b/tools/testing/selftests/acct/acct_syscall.c
@@ -22,9 +22,9 @@ int main(void)
ksft_print_header();
ksft_set_plan(1);
- // Check if test is run a root
+ // Check if test is run as root
if (geteuid()) {
- ksft_exit_skip("This test needs root to run!\n");
+ ksft_exit_skip("This test must be run as root!\n");
return 1;
}
@@ -52,7 +52,7 @@ int main(void)
child_pid = fork();
if (child_pid < 0) {
- ksft_test_result_error("Creating a child process to log failed\n");
+ ksft_test_result_error("Failed to create child process for logging\n");
acct(NULL);
return 1;
} else if (child_pid > 0) {
--
2.25.1
The bulk of these changes modify the cow and gup_longterm tests to
report unique and stable names for each test, bringing them into line
with the expectations of tooling that works with kselftest. The string
reported as a test result is used by tooling to both deduplicate tests
and track tests between test runs, using the same string for multiple
tests or changing the string depending on test result causes problems
for user interfaces and automation such as bisection.
It was suggested that converting to use kselftest_harness.h would be a
good way of addressing this, however that really wants the set of tests
to run to be known at compile time but both test programs dynamically
enumarate the set of huge page sizes the system supports and test each.
Refactoring to handle this would be even more invasive than these
changes which are large but straightforward and repetitive.
A version of the main gup_longterm cleanup was previously sent
separately, this version factors out the helpers for logging the start
of the test since the cow test looks very similar.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Mark Brown (4):
selftests/mm: Use standard ksft_finished() in cow and gup_longterm
selftest/mm: Add helper for logging test start and results
selftests/mm: Report unique test names for each cow test
selftests/mm: Fix test result reporting in gup_longterm
tools/testing/selftests/mm/cow.c | 340 +++++++++++++++++++-----------
tools/testing/selftests/mm/gup_longterm.c | 158 ++++++++------
tools/testing/selftests/mm/vm_util.h | 20 ++
3 files changed, 334 insertions(+), 184 deletions(-)
---
base-commit: a5806cd506af5a7c19bcd596e4708b5c464bfd21
change-id: 20250521-selftests-mm-cow-dedupe-33dcab034558
Best regards,
--
Mark Brown <broonie(a)kernel.org>
Hello,
this is the v2 of the many args series for arm64, being itself a revival
of Xu Kuhoai's work to enable larger arguments count for BPF programs on
ARM64 ([1]).
The discussions in v1 shed some light on some issues around specific
cases, for example with functions passing struct on stack with custom
packing/alignment attributes: those cases can not be properly detected
with the current BTF info. So this new revision aims to separate
concerns with a simpler implementation, just accepting additional args
on stack if we can make sure about the alignment constraints (and so,
refusing attachment to functions passing structs on stacks). I then
checked if the specific alignment constraints could be checked with
larger scalar types rather than structs, but it appears that this use
case is in fact rejected at the verifier level (see a9b59159d338 ("bpf:
Do not allow btf_ctx_access with __int128 types")). So in the end the
specific alignment corner cases raised in [1] can not really happen in
the kernel in its current state. This new revision still brings support
for the standard cases as a first step, it will then be possible to
iterate on top of it to add the more specific cases like struct passed
on stack and larger types.
[1] https://lore.kernel.org/all/20230917150752.69612-1-xukuohai@huaweicloud.com…
Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore(a)bootlin.com>
---
Changes in v2:
- remove alignment computation from btf.c
- deduce alignment constraints directly in jit compiler for simple types
- deny attachment to functions with "corner-cases" arguments (ie:
structs on stack)
- remove custom tests, as the corresponding use cases are locked either
by the JIT comp or the verifier
- drop RFC
- Link to v1: https://lore.kernel.org/r/20250411-many_args_arm64-v1-0-0a32fe72339e@bootli…
---
Alexis Lothoré (eBPF Foundation) (1):
selftests/bpf: enable many-args tests for arm64
Xu Kuohai (1):
bpf, arm64: Support up to 12 function arguments
arch/arm64/net/bpf_jit_comp.c | 234 ++++++++++++++++++++-------
tools/testing/selftests/bpf/DENYLIST.aarch64 | 2 -
2 files changed, 180 insertions(+), 56 deletions(-)
---
base-commit: 9435138c069117cd59a4912b5ea2ae44cc2c5ffa
change-id: 20250220-many_args_arm64-8bd3747e6948
Best regards,
--
Alexis Lothoré, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com
Corrects a spelling mistake "memebers" instead of "members" in
tools/testing/selftests/filesystems/mount-notify/mount-notify_test.c
Signed-off-by: Hendrik Hamerlinck <hendrik.hamerlinck(a)hammernet.be>
---
.../selftests/filesystems/mount-notify/mount-notify_test.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/filesystems/mount-notify/mount-notify_test.c b/tools/testing/selftests/filesystems/mount-notify/mount-notify_test.c
index 59a71f22fb11..af2b61224a61 100644
--- a/tools/testing/selftests/filesystems/mount-notify/mount-notify_test.c
+++ b/tools/testing/selftests/filesystems/mount-notify/mount-notify_test.c
@@ -230,7 +230,7 @@ static void verify_mount_ids(struct __test_metadata *const _metadata,
}
}
}
- // Check that all list1 memebers can be found in list2. Together with
+ // Check that all list1 members can be found in list2. Together with
// the above it means that the list1 and list2 represent the same sets.
for (i = 0; i < num; i++) {
for (j = 0; j < num; j++) {
--
2.43.0
This patch set convert the wireguard selftest to nftables, as iptables is
deparated and nftables is the default framework of most releases.
v7: re-post, no update.
v6: fix typo in patch 1/2. Update the description (Phil Sutter)
v5: remove the counter in nft rules and link nft statically (Jason A.
Donenfeld)
v4: no update, just re-send
v3: drop iptables directly (Jason A. Donenfeld)
Also convert to using nft for qemu testing (Jason A. Donenfeld)
v2: use one nft table for testing (Phil Sutter)
Hangbin Liu (2):
wireguard: selftests: convert iptables to nft
wireguard: selftests: update to using nft for qemu test
tools/testing/selftests/wireguard/netns.sh | 29 +++++++++------
.../testing/selftests/wireguard/qemu/Makefile | 36 ++++++++++++++-----
.../selftests/wireguard/qemu/kernel.config | 7 ++--
3 files changed, 49 insertions(+), 23 deletions(-)
--
2.46.0
Hi Linus,
Please pull the following kselftest next update for Linux 6.16-rc1.
-- Fixes
- cpufreq test to not double suspend in rtcwake case.
- compile error in pid_namespace test.
- run_kselftest.sh to use readlink if realpath is not available.
- cpufreq basic read and update testcases.
- ftrace to add poll to a gen_file so test can find it at run-time.
- spelling errors in perf_events test.
diff is attached.
thanks,
-- Shuah
----------------------------------------------------------------
The following changes since commit b4432656b36e5cc1d50a1f2dc15357543add530e:
Linux 6.15-rc4 (2025-04-27 15:19:23 -0700)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest tags/linux_kselftest-next-6.16-rc1
for you to fetch changes up to 1107dc4c5b06188a3fb4897ceb197eb320a52e85:
selftests/run_kselftest.sh: Use readlink if realpath is not available (2025-05-15 16:52:47 -0600)
----------------------------------------------------------------
linux_kselftest-next-6.16-rc1
-- Fixes
- cpufreq test to not double suspend in rtcwake case.
- compile error in pid_namespace test.
- run_kselftest.sh to use readlink if realpath is not available.
- cpufreq basic read and update testcases.
- ftrace to add poll to a gen_file so test can find it at run-time.
- spelling errors in perf_events test.
----------------------------------------------------------------
Ayush Jain (1):
selftests/ftrace: Convert poll to a gen_file
Colin Ian King (1):
selftests/perf_events: Fix spelling mistake "sycnhronize" -> "synchronize"
Nícolas F. R. A. Prado (1):
kselftest: cpufreq: Get rid of double suspend in rtcwake case
Peter Seiderer (1):
selftests: pid_namespace: add missing sys/mount.h include in pid_max.c
Swapnil Sapkal (1):
selftests/cpufreq: Fix cpufreq basic read and update testcases
Thomas Weißschuh (3):
selftests/timens: Print TAP headers
selftests/timens: Make run_tests() functions static
selftests/timens: timerfd: Use correct clockid type in tclock_gettime()
Yosry Ahmed (1):
selftests/run_kselftest.sh: Use readlink if realpath is not available
tools/testing/selftests/cpufreq/cpufreq.sh | 18 +++++++++++++-----
tools/testing/selftests/ftrace/Makefile | 2 +-
tools/testing/selftests/perf_events/watermark_signal.c | 2 +-
tools/testing/selftests/pid_namespace/pid_max.c | 1 +
tools/testing/selftests/run_kselftest.sh | 9 ++++++++-
tools/testing/selftests/timens/clock_nanosleep.c | 4 +++-
tools/testing/selftests/timens/exec.c | 2 ++
tools/testing/selftests/timens/futex.c | 2 ++
tools/testing/selftests/timens/gettime_perf.c | 2 ++
tools/testing/selftests/timens/procfs.c | 2 ++
tools/testing/selftests/timens/timens.c | 2 ++
tools/testing/selftests/timens/timer.c | 4 +++-
tools/testing/selftests/timens/timerfd.c | 6 ++++--
tools/testing/selftests/timens/vfork_exec.c | 2 ++
14 files changed, 46 insertions(+), 12 deletions(-)
----------------------------------------------------------------
Hi Linus,
Please pull the following kunit next update for Linux 6.16-rc1.
- Enables qemu_config for riscv32, sparc 64-bit, PowerPC 32-bit BE and
64-bit LE.
- Enables CONFIG_SPARC32 to clearly differentiate between sparc 32-bit
and 64-bit configurations.
- Enables CONFIG_CPU_BIG_ENDIAN to clearly differentiate between powerpc
LE and BE configurations.
- Adds feature to list available architectures to kunit tool.
- Fixes to bugs and changes to documentation.
diff is attached.
thanks,
-- Shuah
----------------------------------------------------------------
The following changes since commit 8ffd015db85fea3e15a77027fda6c02ced4d2444:
Linux 6.15-rc2 (2025-04-13 11:54:49 -0700)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest tags/linux_kselftest-kunit-6.16-rc1
for you to fetch changes up to 772e50a76ee664e75581624f512df4e45582605a:
kunit: Fix wrong parameter to kunit_deactivate_static_stub() (2025-05-21 09:51:23 -0600)
----------------------------------------------------------------
linux_kselftest-kunit-6.16-rc1
- Enables qemu_config for riscv32, sparc 64-bit, PowerPC 32-bit BE and
64-bit LE.
- Enables CONFIG_SPARC32 to clearly differentiate between sparc 32-bit
and 64-bit configurations.
- Enables CONFIG_CPU_BIG_ENDIAN to clearly differentiate between powerpc
LE and BE configurations.
- Add feature to list available architectures to kunit tool.
- Fixes to bugs and changes to documentation.
----------------------------------------------------------------
David Gow (1):
kunit: qemu_configs: Disable faulting tests on 32-bit SPARC
Kees Cook (1):
kunit: executor: Remove const from kunit_filter_suites() allocation type
Rae Moar (2):
Documentation: kunit: improve example on testing static functions
kunit: tool: add test counts to JSON output
Richard Fitzgerald (1):
kunit: configs: Enable CONFIG_INIT_STACK_ALL_PATTERN in all_tests
Thomas Weißschuh (6):
kunit: qemu_configs: Add riscv32 config
kunit: tool: Implement listing of available architectures
kunit: qemu_configs: powerpc: Explicitly enable CONFIG_CPU_BIG_ENDIAN=y
kunit: qemu_configs: Add PowerPC 32-bit BE and 64-bit LE
kunit: qemu_configs: sparc: Explicitly enable CONFIG_SPARC32=y
kunit: qemu_configs: Add 64-bit SPARC configuration
Tzung-Bi Shih (1):
kunit: Fix wrong parameter to kunit_deactivate_static_stub()
Documentation/dev-tools/kunit/run_wrapper.rst | 2 ++
Documentation/dev-tools/kunit/usage.rst | 38 +++++++++++++++++++++------
lib/kunit/executor.c | 2 +-
lib/kunit/static_stub.c | 2 +-
tools/testing/kunit/configs/all_tests.config | 1 +
tools/testing/kunit/kunit_json.py | 10 +++++++
tools/testing/kunit/kunit_kernel.py | 8 ++++++
tools/testing/kunit/qemu_configs/powerpc.py | 1 +
tools/testing/kunit/qemu_configs/powerpc32.py | 17 ++++++++++++
tools/testing/kunit/qemu_configs/powerpcle.py | 14 ++++++++++
tools/testing/kunit/qemu_configs/riscv32.py | 17 ++++++++++++
tools/testing/kunit/qemu_configs/sparc.py | 2 ++
tools/testing/kunit/qemu_configs/sparc64.py | 16 +++++++++++
13 files changed, 120 insertions(+), 10 deletions(-)
create mode 100644 tools/testing/kunit/qemu_configs/powerpc32.py
create mode 100644 tools/testing/kunit/qemu_configs/powerpcle.py
create mode 100644 tools/testing/kunit/qemu_configs/riscv32.py
create mode 100644 tools/testing/kunit/qemu_configs/sparc64.py
----------------------------------------------------------------
The madv_populate selftest has some repetitive code for several different
cases that it covers, included repeated test names used in ksft_test_result()
reports. This causes problems for automation, the test name is used to both
track the test between runs and distinguish between multiple tests within
the same run. Fix this by tweaking the messages with duplication to be more
specific about the contexts they're in.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
tools/testing/selftests/mm/madv_populate.c | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)
diff --git a/tools/testing/selftests/mm/madv_populate.c b/tools/testing/selftests/mm/madv_populate.c
index ef7d911da13e..b6fabd5c27ed 100644
--- a/tools/testing/selftests/mm/madv_populate.c
+++ b/tools/testing/selftests/mm/madv_populate.c
@@ -172,12 +172,12 @@ static void test_populate_read(void)
if (addr == MAP_FAILED)
ksft_exit_fail_msg("mmap failed\n");
ksft_test_result(range_is_not_populated(addr, SIZE),
- "range initially not populated\n");
+ "read range initially not populated\n");
ret = madvise(addr, SIZE, MADV_POPULATE_READ);
ksft_test_result(!ret, "MADV_POPULATE_READ\n");
ksft_test_result(range_is_populated(addr, SIZE),
- "range is populated\n");
+ "read range is populated\n");
munmap(addr, SIZE);
}
@@ -194,12 +194,12 @@ static void test_populate_write(void)
if (addr == MAP_FAILED)
ksft_exit_fail_msg("mmap failed\n");
ksft_test_result(range_is_not_populated(addr, SIZE),
- "range initially not populated\n");
+ "write range initially not populated\n");
ret = madvise(addr, SIZE, MADV_POPULATE_WRITE);
ksft_test_result(!ret, "MADV_POPULATE_WRITE\n");
ksft_test_result(range_is_populated(addr, SIZE),
- "range is populated\n");
+ "write range is populated\n");
munmap(addr, SIZE);
}
@@ -247,19 +247,19 @@ static void test_softdirty(void)
/* Clear any softdirty bits. */
clear_softdirty();
ksft_test_result(range_is_not_softdirty(addr, SIZE),
- "range is not softdirty\n");
+ "cleared range is not softdirty\n");
/* Populating READ should set softdirty. */
ret = madvise(addr, SIZE, MADV_POPULATE_READ);
- ksft_test_result(!ret, "MADV_POPULATE_READ\n");
+ ksft_test_result(!ret, "softdirty MADV_POPULATE_READ\n");
ksft_test_result(range_is_not_softdirty(addr, SIZE),
- "range is not softdirty\n");
+ "range is not softdirty after MADV_POPULATE_READ\n");
/* Populating WRITE should set softdirty. */
ret = madvise(addr, SIZE, MADV_POPULATE_WRITE);
- ksft_test_result(!ret, "MADV_POPULATE_WRITE\n");
+ ksft_test_result(!ret, "softdirty MADV_POPULATE_WRITE\n");
ksft_test_result(range_is_softdirty(addr, SIZE),
- "range is softdirty\n");
+ "range is softdirty after MADV_POPULATE_WRITE \n");
munmap(addr, SIZE);
}
---
base-commit: a5806cd506af5a7c19bcd596e4708b5c464bfd21
change-id: 20250521-selftests-mm-madv-populate-dedupe-95faf16c3c8f
Best regards,
--
Mark Brown <broonie(a)kernel.org>
Mark Rutland's recent SME fixes updated the SME ABI to reject any
attempt to write FPSIMD register data via the streaming mode SVE
register set but did not update the sve-ptrace test to take account of
this, resulting in spurious failures. Update the test for this, and
also fix another preexisting issue I noticed while looking at this.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Mark Brown (3):
kselftest/arm64: Fix check for setting new VLs in sve-ptrace
kselftest/arm64: Fix test for streaming FPSIMD write in sve-ptrace
kselftest/arm64: Specify SVE data when testing VL set in sve-ptrace
tools/testing/selftests/arm64/fp/sve-ptrace.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
---
base-commit: 1c1abfd151c824698830ee900cc8d9f62e9a5fbb
change-id: 20250523-kselftest-arm64-ssve-fixups-b68ae61c1ebf
Best regards,
--
Mark Brown <broonie(a)kernel.org>
In cpufreq basic selftests, one of the testcases is to read all cpufreq
sysfs files and print the values. This testcase assumes all the cpufreq
sysfs files have read permissions. However certain cpufreq sysfs files
(eg. stats/reset) are write only files and this testcase errors out
when it is not able to read the file.
Similarily, there is one more testcase which reads the cpufreq sysfs
file data and write it back to same file. This testcase also errors out
for sysfs files without read permission.
Fix these testcases by adding proper read permission checks.
Reported-by: Narasimhan V <narasimhan.v(a)amd.com>
Signed-off-by: Swapnil Sapkal <swapnil.sapkal(a)amd.com>
---
tools/testing/selftests/cpufreq/cpufreq.sh | 15 +++++++++++----
1 file changed, 11 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/cpufreq/cpufreq.sh b/tools/testing/selftests/cpufreq/cpufreq.sh
index e350c521b467..3484fa34e8d8 100755
--- a/tools/testing/selftests/cpufreq/cpufreq.sh
+++ b/tools/testing/selftests/cpufreq/cpufreq.sh
@@ -52,7 +52,14 @@ read_cpufreq_files_in_dir()
for file in $files; do
if [ -f $1/$file ]; then
printf "$file:"
- cat $1/$file
+ #file is readable ?
+ local rfile=$(ls -l $1/$file | awk '$1 ~ /^.*r.*/ { print $NF; }')
+
+ if [ ! -z $rfile ]; then
+ cat $1/$file
+ else
+ printf "$file is not readable\n"
+ fi
else
printf "\n"
read_cpufreq_files_in_dir "$1/$file"
@@ -83,10 +90,10 @@ update_cpufreq_files_in_dir()
for file in $files; do
if [ -f $1/$file ]; then
- # file is writable ?
- local wfile=$(ls -l $1/$file | awk '$1 ~ /^.*w.*/ { print $NF; }')
+ # file is readable and writable ?
+ local rwfile=$(ls -l $1/$file | awk '$1 ~ /^.*rw.*/ { print $NF; }')
- if [ ! -z $wfile ]; then
+ if [ ! -z $rwfile ]; then
# scaling_setspeed is a special file and we
# should skip updating it
if [ $file != "scaling_setspeed" ]; then
--
2.43.0
With joint effort from the upstream KVM community, we come up with the
4th version of mediated vPMU for x86. We have made the following changes
on top of the previous RFC v3.
v3 -> v4
- Rebase whole patchset on 6.14-rc3 base.
- Address Peter's comments on Perf part.
- Address Sean's comments on KVM part.
* Change key word "passthrough" to "mediated" in all patches
* Change static enabling to user space dynamic enabling via KVM_CAP_PMU_CAPABILITY.
* Only support GLOBAL_CTRL save/restore with VMCS exec_ctrl, drop the MSR
save/retore list support for GLOBAL_CTRL, thus the support of mediated
vPMU is constrained to SapphireRapids and later CPUs on Intel side.
* Merge some small changes into a single patch.
- Address Sandipan's comment on invalid pmu pointer.
- Add back "eventsel_hw" and "fixed_ctr_ctrl_hw" to avoid to directly
manipulate pmc->eventsel and pmu->fixed_ctr_ctrl.
Testing (Intel side):
- Perf-based legacy vPMU (force emulation on/off)
* Kselftests pmu_counters_test, pmu_event_filter_test and
vmx_pmu_caps_test pass.
* KUT PMU tests pmu, pmu_lbr, pmu_pebs pass.
* Basic perf counting/sampling tests in 3 scenarios, guest-only,
host-only and host-guest coexistence all pass.
- Mediated vPMU (force emulation on/off)
* Kselftests pmu_counters_test, pmu_event_filter_test and
vmx_pmu_caps_test pass.
* KUT PMU tests pmu, pmu_lbr, pmu_pebs pass.
* Basic perf counting/sampling tests in 3 scenarios, guest-only,
host-only and host-guest coexistence all pass.
- Failures. All above tests passed on Intel Granite Rapids as well
except a failure on KUT/pmu_pebs.
* GP counter 0 (0xfffffffffffe): PEBS record (written seq 0)
is verified (including size, counters and cfg).
* The pebs_data_cfg (0xb500000000) doesn't match with the
effective MSR_PEBS_DATA_CFG (0x0).
* This failure has nothing to do with this mediated vPMU patch set. The
failure is caused by Granite Rapids supported timed PEBS which needs
extra support on Qemu and KUT/pmu_pebs. These extra support would be
sent in separate patches later.
Testing (AMD side):
- Kselftests pmu_counters_test, pmu_event_filter_test and
vmx_pmu_caps_test all pass
- legacy guest with KUT/pmu:
* qmeu option: -cpu host, -perfctr-core
* when set force_emulation_prefix=1, passes
* when set force_emulation_prefix=0, passes
- perfmon-v1 guest with KUT/pmu:
* qmeu option: -cpu host, -perfmon-v2
* when set force_emulation_prefix=1, passes
* when set force_emulation_prefix=0, passes
- perfmon-v2 guest with KUT/pmu:
* qmeu option: -cpu host
* when set force_emulation_prefix=1, passes
* when set force_emulation_prefix=0, passes
- perf_fuzzer (perfmon-v2):
* fails with soft lockup in guest in current version.
* culprit could be between 6.13 ~ 6.14-rc3 within KVM
* Series tested on 6.12 and 6.13 without issue.
Note: a QEMU series is needed to run mediated vPMU v4:
- https://lore.kernel.org/all/20250324123712.34096-1-dapeng1.mi@linux.intel.c…
History:
- RFC v3: https://lore.kernel.org/all/20240801045907.4010984-1-mizhang@google.com/
- RFC v2: https://lore.kernel.org/all/20240506053020.3911940-1-mizhang@google.com/
- RFC v1: https://lore.kernel.org/all/20240126085444.324918-1-xiong.y.zhang@linux.int…
Dapeng Mi (18):
KVM: x86/pmu: Introduce enable_mediated_pmu global parameter
KVM: x86/pmu: Check PMU cpuid configuration from user space
KVM: x86: Rename vmx_vmentry/vmexit_ctrl() helpers
KVM: x86/pmu: Add perf_capabilities field in struct kvm_host_values{}
KVM: x86/pmu: Move PMU_CAP_{FW_WRITES,LBR_FMT} into msr-index.h header
KVM: VMX: Add macros to wrap around
{secondary,tertiary}_exec_controls_changebit()
KVM: x86/pmu: Check if mediated vPMU can intercept rdpmc
KVM: x86/pmu/vmx: Save/load guest IA32_PERF_GLOBAL_CTRL with
vm_exit/entry_ctrl
KVM: x86/pmu: Optimize intel/amd_pmu_refresh() helpers
KVM: x86/pmu: Setup PMU MSRs' interception mode
KVM: x86/pmu: Handle PMU MSRs interception and event filtering
KVM: x86/pmu: Switch host/guest PMU context at vm-exit/vm-entry
KVM: x86/pmu: Handle emulated instruction for mediated vPMU
KVM: nVMX: Add macros to simplify nested MSR interception setting
KVM: selftests: Add mediated vPMU supported for pmu tests
KVM: Selftests: Support mediated vPMU for vmx_pmu_caps_test
KVM: Selftests: Fix pmu_counters_test error for mediated vPMU
KVM: x86/pmu: Expose enable_mediated_pmu parameter to user space
Kan Liang (8):
perf: Support get/put mediated PMU interfaces
perf: Skip pmu_ctx based on event_type
perf: Clean up perf ctx time
perf: Add a EVENT_GUEST flag
perf: Add generic exclude_guest support
perf: Add switch_guest_ctx() interface
perf/x86: Support switch_guest_ctx interface
perf/x86/intel: Support PERF_PMU_CAP_MEDIATED_VPMU
Mingwei Zhang (5):
perf/x86: Forbid PMI handler when guest own PMU
perf/x86/core: Plumb mediated PMU capability from x86_pmu to
x86_pmu_cap
KVM: x86/pmu: Exclude PMU MSRs in vmx_get_passthrough_msr_slot()
KVM: x86/pmu: introduce eventsel_hw to prepare for pmu event filtering
KVM: nVMX: Add nested virtualization support for mediated PMU
Sandipan Das (4):
perf/x86/core: Do not set bit width for unavailable counters
KVM: x86/pmu: Add AMD PMU registers to direct access list
KVM: x86/pmu/svm: Set GuestOnly bit and clear HostOnly bit when guest
write to event selectors
perf/x86/amd: Support PERF_PMU_CAP_MEDIATED_VPMU for AMD host
Xiong Zhang (3):
x86/irq: Factor out common code for installing kvm irq handler
perf: core/x86: Register a new vector for KVM GUEST PMI
KVM: x86/pmu: Register KVM_GUEST_PMI_VECTOR handler
arch/x86/events/amd/core.c | 2 +
arch/x86/events/core.c | 40 +-
arch/x86/events/intel/core.c | 5 +
arch/x86/include/asm/hardirq.h | 1 +
arch/x86/include/asm/idtentry.h | 1 +
arch/x86/include/asm/irq.h | 2 +-
arch/x86/include/asm/irq_vectors.h | 5 +-
arch/x86/include/asm/kvm-x86-pmu-ops.h | 2 +
arch/x86/include/asm/kvm_host.h | 10 +
arch/x86/include/asm/msr-index.h | 18 +-
arch/x86/include/asm/perf_event.h | 1 +
arch/x86/include/asm/vmx.h | 1 +
arch/x86/kernel/idt.c | 1 +
arch/x86/kernel/irq.c | 39 +-
arch/x86/kvm/cpuid.c | 15 +
arch/x86/kvm/pmu.c | 254 ++++++++-
arch/x86/kvm/pmu.h | 45 ++
arch/x86/kvm/svm/pmu.c | 148 ++++-
arch/x86/kvm/svm/svm.c | 26 +
arch/x86/kvm/svm/svm.h | 2 +-
arch/x86/kvm/vmx/capabilities.h | 11 +-
arch/x86/kvm/vmx/nested.c | 68 ++-
arch/x86/kvm/vmx/pmu_intel.c | 224 ++++++--
arch/x86/kvm/vmx/vmx.c | 89 +--
arch/x86/kvm/vmx/vmx.h | 11 +-
arch/x86/kvm/x86.c | 63 ++-
arch/x86/kvm/x86.h | 2 +
include/linux/perf_event.h | 47 +-
kernel/events/core.c | 519 ++++++++++++++----
.../beauty/arch/x86/include/asm/irq_vectors.h | 5 +-
.../selftests/kvm/include/kvm_test_harness.h | 13 +
.../testing/selftests/kvm/include/kvm_util.h | 3 +
.../selftests/kvm/include/x86/processor.h | 8 +
tools/testing/selftests/kvm/lib/kvm_util.c | 23 +
.../selftests/kvm/x86/pmu_counters_test.c | 24 +-
.../selftests/kvm/x86/pmu_event_filter_test.c | 8 +-
.../selftests/kvm/x86/vmx_pmu_caps_test.c | 2 +-
37 files changed, 1480 insertions(+), 258 deletions(-)
base-commit: 0ad2507d5d93f39619fc42372c347d6006b64319
--
2.49.0.395.g12beb8f557-goog
Add small grammar fixes in perf events and Real Time Clock tests'
output messages.
Include braces around a single if statement, when there are multiple
statements in the else branch, to align with the kernel coding style.
Signed-off-by: Hanne-Lotta Mäenpää <hannelotta(a)gmail.com>
---
Notes:
v1 -> v2: Improved wording in RTC tests based on feedback from
Alexandre Belloni <alexandre.belloni(a)bootlin.com>
tools/testing/selftests/perf_events/watermark_signal.c | 7 ++++---
tools/testing/selftests/rtc/rtctest.c | 10 +++++-----
2 files changed, 9 insertions(+), 8 deletions(-)
diff --git a/tools/testing/selftests/perf_events/watermark_signal.c b/tools/testing/selftests/perf_events/watermark_signal.c
index 49dc1e831174..6176afd4950b 100644
--- a/tools/testing/selftests/perf_events/watermark_signal.c
+++ b/tools/testing/selftests/perf_events/watermark_signal.c
@@ -65,8 +65,9 @@ TEST(watermark_signal)
child = fork();
EXPECT_GE(child, 0);
- if (child == 0)
+ if (child == 0) {
do_child();
+ }
else if (child < 0) {
perror("fork()");
goto cleanup;
@@ -75,7 +76,7 @@ TEST(watermark_signal)
if (waitpid(child, &child_status, WSTOPPED) != child ||
!(WIFSTOPPED(child_status) && WSTOPSIG(child_status) == SIGSTOP)) {
fprintf(stderr,
- "failed to sycnhronize with child errno=%d status=%x\n",
+ "failed to synchronize with child errno=%d status=%x\n",
errno,
child_status);
goto cleanup;
@@ -84,7 +85,7 @@ TEST(watermark_signal)
fd = syscall(__NR_perf_event_open, &attr, child, -1, -1,
PERF_FLAG_FD_CLOEXEC);
if (fd < 0) {
- fprintf(stderr, "failed opening event %llx\n", attr.config);
+ fprintf(stderr, "failed to setup performance monitoring %llx\n", attr.config);
goto cleanup;
}
diff --git a/tools/testing/selftests/rtc/rtctest.c b/tools/testing/selftests/rtc/rtctest.c
index be175c0e6ae3..930bf0ce4fa6 100644
--- a/tools/testing/selftests/rtc/rtctest.c
+++ b/tools/testing/selftests/rtc/rtctest.c
@@ -138,10 +138,10 @@ TEST_F_TIMEOUT(rtc, date_read_loop, READ_LOOP_DURATION_SEC + 2) {
rtc_read = rtc_time_to_timestamp(&rtc_tm);
/* Time should not go backwards */
ASSERT_LE(prev_rtc_read, rtc_read);
- /* Time should not increase more then 1s at a time */
+ /* Time should not increase more than 1s per read */
ASSERT_GE(prev_rtc_read + 1, rtc_read);
- /* Sleep 11ms to avoid killing / overheating the RTC */
+ /* Sleep 11ms to avoid overheating the RTC */
nanosleep_with_retries(READ_LOOP_SLEEP_MS * 1000000);
prev_rtc_read = rtc_read;
@@ -236,7 +236,7 @@ TEST_F(rtc, alarm_alm_set) {
if (alarm_state == RTC_ALARM_DISABLED)
SKIP(return, "Skipping test since alarms are not supported.");
if (alarm_state == RTC_ALARM_RES_MINUTE)
- SKIP(return, "Skipping test since alarms has only minute granularity.");
+ SKIP(return, "Skipping test since alarm has only minute granularity.");
rc = ioctl(self->fd, RTC_RD_TIME, &tm);
ASSERT_NE(-1, rc);
@@ -306,7 +306,7 @@ TEST_F(rtc, alarm_wkalm_set) {
if (alarm_state == RTC_ALARM_DISABLED)
SKIP(return, "Skipping test since alarms are not supported.");
if (alarm_state == RTC_ALARM_RES_MINUTE)
- SKIP(return, "Skipping test since alarms has only minute granularity.");
+ SKIP(return, "Skipping test since alarm has only minute granularity.");
rc = ioctl(self->fd, RTC_RD_TIME, &alarm.time);
ASSERT_NE(-1, rc);
@@ -502,7 +502,7 @@ int main(int argc, char **argv)
if (access(rtc_file, R_OK) == 0)
ret = test_harness_run(argc, argv);
else
- ksft_exit_skip("[SKIP]: Cannot access rtc file %s - Exiting\n",
+ ksft_exit_skip("Cannot access RTC file %s - exiting\n",
rtc_file);
return ret;
--
2.39.5
This patch corrects minor coding style issues to comply with the Linux kernel coding style:
- Align closing parentheses to match opening ones in printf statements.
- Break long lines to keep them within the 100-column limit.
These changes address warnings reported by checkpatch.pl and do not
affect functionality.
changes in v2 :
- Resubmitted the patch with a properly formatted commit message,
following patch submission guidelines, as suggested by Shuah Khan.
Signed-off-by: Rujra Bhatt <braker.noob.kernel(a)gmail.com>
---
tools/testing/selftests/timers/valid-adjtimex.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/timers/valid-adjtimex.c
b/tools/testing/selftests/timers/valid-adjtimex.c
index 6b7801055ad1..5110f9ee285c 100644
--- a/tools/testing/selftests/timers/valid-adjtimex.c
+++ b/tools/testing/selftests/timers/valid-adjtimex.c
@@ -157,7 +157,7 @@ int validate_freq(void)
if (tx.freq == outofrange_freq[i]) {
printf("[FAIL]\n");
printf("ERROR: out of range value %ld actually set!\n",
- tx.freq);
+ tx.freq);
pass = -1;
goto out;
}
@@ -172,7 +172,7 @@ int validate_freq(void)
if (ret >= 0) {
printf("[FAIL]\n");
printf("Error: No failure on invalid
ADJ_FREQUENCY %ld\n",
- invalid_freq[i]);
+ invalid_freq[i]);
pass = -1;
goto out;
}
@@ -238,7 +238,8 @@ int set_bad_offset(long sec, long usec, int use_nano)
tmx.time.tv_usec = usec;
ret = clock_adjtime(CLOCK_REALTIME, &tmx);
if (ret >= 0) {
- printf("Invalid (sec: %ld usec: %ld) did not fail! ",
tmx.time.tv_sec, tmx.time.tv_usec);
+ printf("Invalid (sec: %ld usec: %ld) did not fail! ",
+ tmx.time.tv_sec, tmx.time.tv_usec);
printf("[FAIL]\n");
return -1;
}
--
2.43.0
With newer compilers the pid_max test is failing to build with the
following build error:
...
pid_max.c: In function ‘pid_max_cb’:
pid_max.c:42:15: error: implicit declaration of function ‘mount’ [-Wimplicit-function-declaration]
42 | ret = mount("", "/", NULL, MS_PRIVATE | MS_REC, 0);
| ^~~~~
pid_max.c:42:36: error: ‘MS_PRIVATE’ undeclared (first use in this function); did you mean ‘MAP_PRIVATE’?
42 | ret = mount("", "/", NULL, MS_PRIVATE | MS_REC, 0);
| ^~~~~~~~~~
| MAP_PRIVATE
pid_max.c:42:36: note: each undeclared identifier is reported only once for each function it appears in
...
The fix seems to be including sys/mount.h which brings in the missing
defines and missing definition mount function.
Signed-off-by: Brahmajit Das <listout(a)listout.xyz>
---
tools/testing/selftests/pid_namespace/pid_max.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/pid_namespace/pid_max.c b/tools/testing/selftests/pid_namespace/pid_max.c
index 51c414faabb0..96f274f0582b 100644
--- a/tools/testing/selftests/pid_namespace/pid_max.c
+++ b/tools/testing/selftests/pid_namespace/pid_max.c
@@ -10,6 +10,7 @@
#include <stdlib.h>
#include <string.h>
#include <syscall.h>
+#include <sys/mount.h>
#include <sys/wait.h>
#include "../kselftest_harness.h"
--
2.49.0
sys_open_tree is once defined in filesystems/overlayfs/wrappers.h so
before we define it in mount_setattr_test.c we should check if has been
previously defined or not. Otherwise it results in the following build
error:
make[1]: Nothing to be done for 'all'.
CC mount_setattr_test
mount_setattr_test.c:176:19: error: redefinition of ‘sys_open_tree’
176 | static inline int sys_open_tree(int dfd, const char *filename, unsigned int flags)
| ^~~~~~~~~~~~~
In file included from mount_setattr_test.c:23:
../filesystems/overlayfs/wrappers.h:59:19: note: previous definition of
‘sys_open_tree’ with type ‘int(int, const char *, unsigned int)’
59 | static inline int sys_open_tree(int dfd, const char *filename, unsigned int flags)
| ^~~~~~~~~~~~~
make[1]: *** [../lib.mk:222: /home/listout/linux/tools/testing/selftests/mount_setattr/mount_setattr_test]
Error 1
Signed-off-by: Brahmajit Das <listout(a)listout.xyz>
---
tools/testing/selftests/mount_setattr/mount_setattr_test.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/tools/testing/selftests/mount_setattr/mount_setattr_test.c b/tools/testing/selftests/mount_setattr/mount_setattr_test.c
index 48a000cabc97..b0798777b822 100644
--- a/tools/testing/selftests/mount_setattr/mount_setattr_test.c
+++ b/tools/testing/selftests/mount_setattr/mount_setattr_test.c
@@ -173,10 +173,13 @@ static inline int sys_mount_setattr(int dfd, const char *path, unsigned int flag
#define AT_RECURSIVE 0x8000 /* Apply to the entire subtree */
#endif
+/* Do not define sys_open_tree if it's already defined in overlayfs/wrappers.h */
+#if !defined(__SELFTEST_OVERLAYFS_WRAPPERS_H__)
static inline int sys_open_tree(int dfd, const char *filename, unsigned int flags)
{
return syscall(__NR_open_tree, dfd, filename, flags);
}
+#endif
static ssize_t write_nointr(int fd, const void *buf, size_t count)
{
--
2.49.0
From: Chia-Yu Chang <chia-yu.chang(a)nokia-bell-labs.com>
Hello,
Please find the DualPI2 patch v16.
v16 (16-MAy-2025)
- Add qdisc_lock() to dualpi2_timer() in dualpi2_timer (Paolo Abeni <pabeni(a)redhat.com>)
- Introduce convert_ns_to_usec() to convert usec to nsec without overflow in #1 (Paolo Abeni <pabeni(a)redhat.com>)
- Update convert_us_tonsec() to convert nsec to usec without overflow in #2 (Paolo Abeni <pabeni(a)redhat.com>)
- Add more descriptions with respect to DualPI2 in the cover ltter and add changelog in each patch (Paolo Abeni <pabeni(a)redhat.com>)
v15 (09-May-2025)
- Add enum of TCA_DUALPI2_ECN_MASK_CLA_ECT to remove potential leakeage in #1 (Simon Horman <horms(a)kernel.org>)
- Fix one typo in comment of #2
- Update tc.yaml in #5 to aligh with the updated enum of pkt_sched.h
v14 (05-May-2025)
- Modify tc.yaml: (1) Replace flags with enum and remove enum-as-flags, (2) Remove credit-queue in xstats, and (3) Change attribute types (Donald Hunter <donald.hun
- Add enum and fix the ordering of variables in pkt_sched.h to align with the modified tc.yaml (Donald Hunter <donald.hunter(a)gmail.com>)
- Add validators for DROP_OVERLOAD, DROP_EARLY, ECN_MASK, and SPLIT_GSO in sch_dualpi2.c (Donald Hunter <donald.hunter(a)gmail.com>)
- Update dualpi2.json to align with the updated variable order in pkt_sched.h
- Reorder patches (Donald Hunter <donald.hunter(a)gmail.com>)
v13 (26-Apr-2025)
- Use dashes in member names to follow YNL conventions in tc.yaml (Donald Hunter <donald.hunter(a)gmail.com>)
- Define enumerations separately for flags of drop-early, drop-overload, ecn-mask, credit-queue in tc.yaml (Donald Hunter <donald.hunter(a)gmail.com>)
- Change the types of split-gso and step-packets into flag in tc.yaml (Donald Hunter <donald.hunter(a)gmail.com>)
- Revert to u32/u8 types for tc-dualpi2-xstats members in tc.yaml (Donald Hunter <donald.hunter(a)gmail.com>)
- Add new test cases in tc-tests/qdiscs/dualpi2.json to cover all dualpi2 parameters (Donald Hunter <donald.hunter(a)gmail.com>)
- Change the type of TCA_DUALPI2_STEP_PACKETS into NLA_FLAG (Donald Hunter <donald.hunter(a)gmail.com>)
v12 (22-Apr-2025)
- Remove anonymous struct in sch_dualpi2.c (Paolo Abeni <pabeni(a)redhat.com>)
- Replace u32/u8 with uint and s32 with int in tc spec document (Paolo Abeni <pabeni(a)redhat.com>)
- Introduce get_memory_limit function to handle potential overflow when multipling limit with MTU (Paolo Abeni <pabeni(a)redhat.com>)
- Double the packet length to further include packet overhead in memory_limit (Paolo Abeni <pabeni(a)redhat.com>)
- Remove the check of qdisc_qlen(sch) when calling qdisc_tree_reduce_backlog (Paolo Abeni <pabeni(a)redhat.com>)
v11 (15-Apr-2025)
- Replace hstimer_init with hstimer_setup in sch_dualpi2.c
v10 (25-Mar-2025)
- Remove leftover include in include/linux/netdevice.h and anonymous struct in sch_dualpi2.c (Paolo Abeni <pabeni(a)redhat.com>)
- Use kfree_skb_reason() and add SKB_DROP_REASON_DUALPI2_STEP_DROP drop reason (Paolo Abeni <pabeni(a)redhat.com>)
- Split sch_dualpi2.c into 3 patches (and overall 5 patches): Struct definition & parsing, Dump stats & configuration, Enqueue/Dequeue (Paolo Abeni <pabeni(a)redhat.com>)
v9 (16-Mar-2025)
- Fix mem_usage error in previous version
- Add min_qlen_step to the dualpi2 attribute as the minimum queue length in number of packets in the L-queue to start step threshold marking.
In previous versions, this value was fixed to 2, so the step threshold was applied to mark packets in the L queue only when the queue length of the L queue was greater than or equal to 2 packets.
This will cause larger queuing delays for L4S traffic at low rates (<20Mbps). So we parameterize it and change the default value to 0.
Comparison of tcp_1down run 'HTB 20Mbit + DUALPI2 + 10ms base delay'
Old versions:
avg median # data pts
Ping (ms) ICMP : 11.55 11.70 ms 350
TCP upload avg : 18.96 N/A Mbits/s 350
TCP upload sum : 18.96 N/A Mbits/s 350
New version (v9):
avg median # data pts
Ping (ms) ICMP : 10.81 10.70 ms 350
TCP upload avg : 18.91 N/A Mbits/s 350
TCP upload sum : 18.91 N/A Mbits/s 350
Comparison of tcp_1down run 'HTB 10Mbit + DUALPI2 + 10ms base delay'
Old versions:
avg median # data pts
Ping (ms) ICMP : 12.61 12.80 ms 350
TCP upload avg : 9.48 N/A Mbits/s 350
TCP upload sum : 9.48 N/A Mbits/s 350
New version (v9):
avg median # data pts
Ping (ms) ICMP : 11.06 10.80 ms 350
TCP upload avg : 9.43 N/A Mbits/s 350
TCP upload sum : 9.43 N/A Mbits/s 350
Comparison of tcp_1down run 'HTB 10Mbit + DUALPI2 + 10ms base delay'
Old versions:
avg median # data pts
Ping (ms) ICMP : 40.86 37.45 ms 350
TCP upload avg : 0.88 N/A Mbits/s 350
TCP upload sum : 0.88 N/A Mbits/s 350
TCP upload::1 : 0.88 0.97 Mbits/s 350
New version (v9):
avg median # data pts
Ping (ms) ICMP : 11.07 10.40 ms 350
TCP upload avg : 0.55 N/A Mbits/s 350
TCP upload sum : 0.55 N/A Mbits/s 350
TCP upload::1 : 0.55 0.59 Mbits/s 350
v8 (11-Mar-2025)
- Fix warning messages in v7
v7 (07-Mar-2025)
- Separate into 3 patches to avoid mixing changes of documentation, selftest, and code. (Cong Wang <xiyou.wangcong(a)gmail.com>)
v6 (04-Mar-2025)
- Add modprobe for dulapi2 in tc-testing script tc-testing/tdc.sh (Jakub Kicinski <kuba(a)kernel.org>)
- Update test cases in dualpi2.json
- Update commit message
v5 (22-Feb-2025)
- A comparison was done between MQ + DUALPI2, MQ + FQ_PIE, MQ + FQ_CODEL:
Unshaped 1gigE with 4 download streams test:
- Summary of tcp_4down run 'MQ + FQ_CODEL':
avg median # data pts
Ping (ms) ICMP : 1.19 1.34 ms 349
TCP download avg : 235.42 N/A Mbits/s 349
TCP download sum : 941.68 N/A Mbits/s 349
TCP download::1 : 235.19 235.39 Mbits/s 349
TCP download::2 : 235.03 235.35 Mbits/s 349
TCP download::3 : 236.89 235.44 Mbits/s 349
TCP download::4 : 234.57 235.19 Mbits/s 349
- Summary of tcp_4down run 'MQ + FQ_PIE'
avg median # data pts
Ping (ms) ICMP : 1.21 1.37 ms 350
TCP download avg : 235.42 N/A Mbits/s 350
TCP download sum : 941.61 N/A Mbits/s 350
TCP download::1 : 232.54 233.13 Mbits/s 350
TCP download::2 : 232.52 232.80 Mbits/s 350
TCP download::3 : 233.14 233.78 Mbits/s 350
TCP download::4 : 243.41 241.48 Mbits/s 350
- Summary of tcp_4down run 'MQ + DUALPI2'
avg median # data pts
Ping (ms) ICMP : 1.19 1.34 ms 349
TCP download avg : 235.42 N/A Mbits/s 349
TCP download sum : 941.68 N/A Mbits/s 349
TCP download::1 : 235.19 235.39 Mbits/s 349
TCP download::2 : 235.03 235.35 Mbits/s 349
TCP download::3 : 236.89 235.44 Mbits/s 349
TCP download::4 : 234.57 235.19 Mbits/s 349
Unshaped 1gigE with 128 download streams test:
- Summary of tcp_128down run 'MQ + FQ_CODEL':
avg median # data pts
Ping (ms) ICMP : 1.88 1.86 ms 350
TCP download avg : 7.39 N/A Mbits/s 350
TCP download sum : 946.47 N/A Mbits/s 350
- Summary of tcp_128down run 'MQ + FQ_PIE':
avg median # data pts
Ping (ms) ICMP : 1.88 1.86 ms 350
TCP download avg : 7.39 N/A Mbits/s 350
TCP download sum : 946.47 N/A Mbits/s 350
- Summary of tcp_128down run 'MQ + DUALPI2':
avg median # data pts
Ping (ms) ICMP : 1.88 1.86 ms 350
TCP download avg : 7.39 N/A Mbits/s 350
TCP download sum : 946.47 N/A Mbits/s 350
Unshaped 10gigE with 4 download streams test:
- Summary of tcp_4down run 'MQ + FQ_CODEL':
avg median # data pts
Ping (ms) ICMP : 0.22 0.23 ms 350
TCP download avg : 2354.08 N/A Mbits/s 350
TCP download sum : 9416.31 N/A Mbits/s 350
TCP download::1 : 2353.65 2352.81 Mbits/s 350
TCP download::2 : 2354.54 2354.21 Mbits/s 350
TCP download::3 : 2353.56 2353.78 Mbits/s 350
TCP download::4 : 2354.56 2354.45 Mbits/s 350
- Summary of tcp_4down run 'MQ + FQ_PIE':
avg median # data pts
Ping (ms) ICMP : 0.20 0.19 ms 350
TCP download avg : 2354.76 N/A Mbits/s 350
TCP download sum : 9419.04 N/A Mbits/s 350
TCP download::1 : 2354.77 2353.89 Mbits/s 350
TCP download::2 : 2353.41 2354.29 Mbits/s 350
TCP download::3 : 2356.18 2354.19 Mbits/s 350
TCP download::4 : 2354.68 2353.15 Mbits/s 350
- Summary of tcp_4down run 'MQ + DUALPI2':
avg median # data pts
Ping (ms) ICMP : 0.24 0.24 ms 350
TCP download avg : 2354.11 N/A Mbits/s 350
TCP download sum : 9416.43 N/A Mbits/s 350
TCP download::1 : 2354.75 2353.93 Mbits/s 350
TCP download::2 : 2353.15 2353.75 Mbits/s 350
TCP download::3 : 2353.49 2353.72 Mbits/s 350
TCP download::4 : 2355.04 2353.73 Mbits/s 350
Unshaped 10gigE with 128 download streams test:
- Summary of tcp_128down run 'MQ + FQ_CODEL':
avg median # data pts
Ping (ms) ICMP : 7.57 8.69 ms 350
TCP download avg : 73.97 N/A Mbits/s 350
TCP download sum : 9467.82 N/A Mbits/s 350
- Summary of tcp_128down run 'MQ + FQ_PIE':
avg median # data pts
Ping (ms) ICMP : 7.82 8.91 ms 350
TCP download avg : 73.97 N/A Mbits/s 350
TCP download sum : 9468.42 N/A Mbits/s 350
- Summary of tcp_128down run 'MQ + DUALPI2':
avg median # data pts
Ping (ms) ICMP : 6.87 7.93 ms 350
TCP download avg : 73.95 N/A Mbits/s 350
TCP download sum : 9465.87 N/A Mbits/s 350
From the results shown above, we see small differences between combinations.
- Update commit message to include results of no_split_gso and split_gso (Dave Taht <dave.taht(a)gmail.com> and Paolo Abeni <pabeni(a)redhat.com>)
- Add memlimit in the dualpi2 attribute, and add memory_used, max_memory_used, memory_limit in dualpi2 stats (Dave Taht <dave.taht(a)gmail.com>)
- Update note in sch_dualpi2.c related to BBRv3 status (Dave Taht <dave.taht(a)gmail.com>)
- Update license identifier (Dave Taht <dave.taht(a)gmail.com>)
- Add selftest in tools/testing/selftests/tc-testing (Cong Wang <xiyou.wangcong(a)gmail.com>)
- Use netlink policies for parameter checks (Jamal Hadi Salim <jhs(a)mojatatu.com>)
- Modify texts & fix typos in Documentation/netlink/specs/tc.yaml (Dave Taht <dave.taht(a)gmail.com>)
- Add descriptions of packet counter statistics and the reset function of sch_dualpi2.c
- Fix step_thresh in packets
- Update code comments in sch_dualpi2.c
v4 (22-Oct-2024)
- Update statement in Kconfig for DualPI2 (Stephen Hemminger <stephen(a)networkplumber.org>)
- Put a blank line after #define in sch_dualpi2.c (Stephen Hemminger <stephen(a)networkplumber.org>)
- Fix line length warning.
v3 (19-Oct-2024)
- Fix compilaiton error
- Update Documentation/netlink/specs/tc.yaml (Jakub Kicinski <kuba(a)kernel.org>)
v2 (18-Oct-2024)
- Add Documentation/netlink/specs/tc.yaml (Jakub Kicinski <kuba(a)kernel.org>)
- Use dualpi2 instead of skb prefix (Jamal Hadi Salim <jhs(a)mojatatu.com>)
- Replace nla_parse_nested_deprecated with nla_parse_nested (Jamal Hadi Salim <jhs(a)mojatatu.com>)
- Fix line length warning
This patch serise adds DualPI Improved with a Square (DualPI2) with following features:
* Supports congestion controls that comply with the Prague requirements in RFC9331 (e.g. TCP-Prague)
* Coupled dual-queue that separates the L4S traffic in a low latency queue (L-queue), without harming remaining traffic that is scheduled in classic queue (C-queue) due to congestion-coupling using PI2 as defined in RFC9332
* Configurable overload strategies
* Use of sojourn time to reliably estimate queue delay
* Supports ECN L4S-identifier (IP.ECN==0b*1) to classify traffic into respective queues
For more details of DualPI2, please refer IETF RFC9332 (https://datatracker.ietf.org/doc/html/rfc9332).
Best regards,
Chia-Yu
Chia-Yu Chang (4):
sched: Struct definition and parsing of dualpi2 qdisc
sched: Dump configuration and statistics of dualpi2 qdisc
selftests/tc-testing: Add selftests for qdisc DualPI2
Documentation: netlink: specs: tc: Add DualPI2 specification
Koen De Schepper (1):
sched: Add enqueue/dequeue of dualpi2 qdisc
Documentation/netlink/specs/tc.yaml | 156 +++
include/net/dropreason-core.h | 6 +
include/uapi/linux/pkt_sched.h | 68 +
net/sched/Kconfig | 12 +
net/sched/Makefile | 1 +
net/sched/sch_dualpi2.c | 1127 +++++++++++++++++
tools/testing/selftests/tc-testing/config | 1 +
.../tc-testing/tc-tests/qdiscs/dualpi2.json | 254 ++++
tools/testing/selftests/tc-testing/tdc.sh | 1 +
9 files changed, 1626 insertions(+)
create mode 100644 net/sched/sch_dualpi2.c
create mode 100644 tools/testing/selftests/tc-testing/tc-tests/qdiscs/dualpi2.json
--
2.34.1
Apparently, this test completes successfully when it completes execution
without either causing a kernel panic or being killed by the kernel.
This new test result message is more descriptive and grammatically
correct.
There are no changes in v2. I'm resending this patch because I addressed
it to the wrong email for Shuah.
Signed-off-by: Brigham Campbell <me(a)brighamcampbell.com>
---
tools/testing/selftests/x86/mov_ss_trap.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/x86/mov_ss_trap.c b/tools/testing/selftests/x86/mov_ss_trap.c
index f22cb6b382f9..d80033c0d7eb 100644
--- a/tools/testing/selftests/x86/mov_ss_trap.c
+++ b/tools/testing/selftests/x86/mov_ss_trap.c
@@ -269,6 +269,6 @@ int main()
);
}
- printf("[OK]\tI aten't dead\n");
+ printf("[OK]\tkernel handled MOV SS without crashing test\n");
return 0;
}
--
2.49.0
Basics and overview
===================
Software with larger attack surfaces (e.g. network facing apps like databases,
browsers or apps relying on browser runtimes) suffer from memory corruption
issues which can be utilized by attackers to bend control flow of the program
to eventually gain control (by making their payload executable). Attackers are
able to perform such attacks by leveraging call-sites which rely on indirect
calls or return sites which rely on obtaining return address from stack memory.
To mitigate such attacks, risc-v extension zicfilp enforces that all indirect
calls must land on a landing pad instruction `lpad` else cpu will raise software
check exception (a new cpu exception cause code on riscv).
Similarly for return flow, risc-v extension zicfiss extends architecture with
- `sspush` instruction to push return address on a shadow stack
- `sspopchk` instruction to pop return address from shadow stack
and compare with input operand (i.e. return address on stack)
- `sspopchk` to raise software check exception if comparision above
was a mismatch
- Protection mechanism using which shadow stack is not writeable via
regular store instructions
More information an details can be found at extensions github repo [1].
Equivalent to landing pad (zicfilp) on x86 is `ENDBRANCH` instruction in Intel
CET [3] and branch target identification (BTI) [4] on arm.
Similarly x86's Intel CET has shadow stack [5] and arm64 has guarded control
stack (GCS) [6] which are very similar to risc-v's zicfiss shadow stack.
x86 and arm64 support for user mode shadow stack is already in mainline.
Kernel awareness for user control flow integrity
================================================
This series picks up Samuel Holland's envcfg changes [2] as well. So if those are
being applied independently, they should be removed from this series.
Enabling:
In order to maintain compatibility and not break anything in user mode, kernel
doesn't enable control flow integrity cpu extensions on binary by default.
Instead exposes a prctl interface to enable, disable and lock the shadow stack
or landing pad feature for a task. This allows userspace (loader) to enumerate
if all objects in its address space are compiled with shadow stack and landing
pad support and accordingly enable the feature. Additionally if a subsequent
`dlopen` happens on a library, user mode can take a decision again to disable
the feature (if incoming library is not compiled with support) OR terminate the
task (if user mode policy is strict to have all objects in address space to be
compiled with control flow integirty cpu feature). prctl to enable shadow stack
results in allocating shadow stack from virtual memory and activating for user
address space. x86 and arm64 are also following same direction due to similar
reason(s).
clone/fork:
On clone and fork, cfi state for task is inherited by child. Shadow stack is
part of virtual memory and is a writeable memory from kernel perspective
(writeable via a restricted set of instructions aka shadow stack instructions)
Thus kernel changes ensure that this memory is converted into read-only when
fork/clone happens and COWed when fault is taken due to sspush, sspopchk or
ssamoswap. In case `CLONE_VM` is specified and shadow stack is to be enabled,
kernel will automatically allocate a shadow stack for that clone call.
map_shadow_stack:
x86 introduced `map_shadow_stack` system call to allow user space to explicitly
map shadow stack memory in its address space. It is useful to allocate shadow
for different contexts managed by a single thread (green threads or contexts)
risc-v implements this system call as well.
signal management:
If shadow stack is enabled for a task, kernel performs an asynchronous control
flow diversion to deliver the signal and eventually expects userspace to issue
sigreturn so that original execution can be resumed. Even though resume context
is prepared by kernel, it is in user space memory and is subject to memory
corruption and corruption bugs can be utilized by attacker in this race window
to perform arbitrary sigreturn and eventually bypass cfi mechanism.
Another issue is how to ensure that cfi related state on sigcontext area is not
trampled by legacy apps or apps compiled with old kernel headers.
In order to mitigate control-flow hijacting, kernel prepares a token and place
it on shadow stack before signal delivery and places address of token in
sigcontext structure. During sigreturn, kernel obtains address of token from
sigcontext struture, reads token from shadow stack and validates it and only
then allow sigreturn to succeed. Compatiblity issue is solved by adopting
dynamic sigcontext management introduced for vector extension. This series
re-factor the code little bit to allow future sigcontext management easy (as
proposed by Andy Chiu from SiFive)
config and compilation:
Introduce a new risc-v config option `CONFIG_RISCV_USER_CFI`. Selecting this
config option picks the kernel support for user control flow integrity. This
optin is presented only if toolchain has shadow stack and landing pad support.
And is on purpose guarded by toolchain support. Reason being that eventually
vDSO also needs to be compiled in with shadow stack and landing pad support.
vDSO compile patches are not included as of now because landing pad labeling
scheme is yet to settle for usermode runtime.
To get more information on kernel interactions with respect to
zicfilp and zicfiss, patch series adds documentation for
`zicfilp` and `zicfiss` in following:
Documentation/arch/riscv/zicfiss.rst
Documentation/arch/riscv/zicfilp.rst
How to test this series
=======================
Toolchain
---------
$ git clone git@github.com:sifive/riscv-gnu-toolchain.git -b cfi-dev
$ riscv-gnu-toolchain/configure --prefix=<path-to-where-to-build> --with-arch=rv64gc_zicfilp_zicfiss --enable-linux --disable-gdb --with-extra-multilib-test="rv64gc_zicfilp_zicfiss-lp64d:-static"
$ make -j$(nproc)
Qemu
----
Get the lastest qemu
$ cd qemu
$ mkdir build
$ cd build
$ ../configure --target-list=riscv64-softmmu
$ make -j$(nproc)
Opensbi
-------
$ git clone git@github.com:deepak0414/opensbi.git -b v6_cfi_spec_split_opensbi
$ make CROSS_COMPILE=<your riscv toolchain> -j$(nproc) PLATFORM=generic
Linux
-----
Running defconfig is fine. CFI is enabled by default if the toolchain
supports it.
$ make ARCH=riscv CROSS_COMPILE=<path-to-cfi-riscv-gnu-toolchain>/build/bin/riscv64-unknown-linux-gnu- -j$(nproc) defconfig
$ make ARCH=riscv CROSS_COMPILE=<path-to-cfi-riscv-gnu-toolchain>/build/bin/riscv64-unknown-linux-gnu- -j$(nproc)
In case you're building your own rootfs using toolchain, please make sure you
pick following patch to ensure that vDSO compiled with lpad and shadow stack.
"arch/riscv: compile vdso with landing pad"
Branch where above patch can be picked
https://github.com/deepak0414/linux-riscv-cfi/tree/vdso_user_cfi_v6.12-rc1
Running
-------
Modify your qemu command to have:
-bios <path-to-cfi-opensbi>/build/platform/generic/firmware/fw_dynamic.bin
-cpu rv64,zicfilp=true,zicfiss=true,zimop=true,zcmop=true
vDSO related Opens (in the flux)
=================================
I am listing these opens for laying out plan and what to expect in future
patch sets. And of course for the sake of discussion.
Shadow stack and landing pad enabling in vDSO
----------------------------------------------
vDSO must have shadow stack and landing pad support compiled in for task
to have shadow stack and landing pad support. This patch series doesn't
enable that (yet). Enabling shadow stack support in vDSO should be
straight forward (intend to do that in next versions of patch set). Enabling
landing pad support in vDSO requires some collaboration with toolchain folks
to follow a single label scheme for all object binaries. This is necessary to
ensure that all indirect call-sites are setting correct label and target landing
pads are decorated with same label scheme.
How many vDSOs
---------------
Shadow stack instructions are carved out of zimop (may be operations) and if CPU
doesn't implement zimop, they're illegal instructions. Kernel could be running on
a CPU which may or may not implement zimop. And thus kernel will have to carry 2
different vDSOs and expose the appropriate one depending on whether CPU implements
zimop or not.
References
==========
[1] - https://github.com/riscv/riscv-cfi
[2] - https://lore.kernel.org/all/20240814081126.956287-1-samuel.holland@sifive.c…
[3] - https://lwn.net/Articles/889475/
[4] - https://developer.arm.com/documentation/109576/0100/Branch-Target-Identific…
[5] - https://www.intel.com/content/dam/develop/external/us/en/documents/catc17-i…
[6] - https://lwn.net/Articles/940403/
To: Thomas Gleixner <tglx(a)linutronix.de>
To: Ingo Molnar <mingo(a)redhat.com>
To: Borislav Petkov <bp(a)alien8.de>
To: Dave Hansen <dave.hansen(a)linux.intel.com>
To: x86(a)kernel.org
To: H. Peter Anvin <hpa(a)zytor.com>
To: Andrew Morton <akpm(a)linux-foundation.org>
To: Liam R. Howlett <Liam.Howlett(a)oracle.com>
To: Vlastimil Babka <vbabka(a)suse.cz>
To: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
To: Paul Walmsley <paul.walmsley(a)sifive.com>
To: Palmer Dabbelt <palmer(a)dabbelt.com>
To: Albert Ou <aou(a)eecs.berkeley.edu>
To: Conor Dooley <conor(a)kernel.org>
To: Rob Herring <robh(a)kernel.org>
To: Krzysztof Kozlowski <krzk+dt(a)kernel.org>
To: Arnd Bergmann <arnd(a)arndb.de>
To: Christian Brauner <brauner(a)kernel.org>
To: Peter Zijlstra <peterz(a)infradead.org>
To: Oleg Nesterov <oleg(a)redhat.com>
To: Eric Biederman <ebiederm(a)xmission.com>
To: Kees Cook <kees(a)kernel.org>
To: Jonathan Corbet <corbet(a)lwn.net>
To: Shuah Khan <shuah(a)kernel.org>
To: Jann Horn <jannh(a)google.com>
To: Conor Dooley <conor+dt(a)kernel.org>
To: Miguel Ojeda <ojeda(a)kernel.org>
To: Alex Gaynor <alex.gaynor(a)gmail.com>
To: Boqun Feng <boqun.feng(a)gmail.com>
To: Gary Guo <gary(a)garyguo.net>
To: Björn Roy Baron <bjorn3_gh(a)protonmail.com>
To: Benno Lossin <benno.lossin(a)proton.me>
To: Andreas Hindborg <a.hindborg(a)kernel.org>
To: Alice Ryhl <aliceryhl(a)google.com>
To: Trevor Gross <tmgross(a)umich.edu>
Cc: linux-kernel(a)vger.kernel.org
Cc: linux-fsdevel(a)vger.kernel.org
Cc: linux-mm(a)kvack.org
Cc: linux-riscv(a)lists.infradead.org
Cc: devicetree(a)vger.kernel.org
Cc: linux-arch(a)vger.kernel.org
Cc: linux-doc(a)vger.kernel.org
Cc: linux-kselftest(a)vger.kernel.org
Cc: alistair.francis(a)wdc.com
Cc: richard.henderson(a)linaro.org
Cc: jim.shu(a)sifive.com
Cc: andybnac(a)gmail.com
Cc: kito.cheng(a)sifive.com
Cc: charlie(a)rivosinc.com
Cc: atishp(a)rivosinc.com
Cc: evan(a)rivosinc.com
Cc: cleger(a)rivosinc.com
Cc: alexghiti(a)rivosinc.com
Cc: samitolvanen(a)google.com
Cc: broonie(a)kernel.org
Cc: rick.p.edgecombe(a)intel.com
Cc: rust-for-linux(a)vger.kernel.org
changelog
---------
v16:
- If FWFT is not implemented or returns error for shadow stack activation, then
no_usercfi is set to disable shadow stack. Although this should be picked up
by extension validation and activation. Fixed this bug for zicfilp and zicfiss
both. Thanks to Charlie Jenkins for reporting this.
- If toolchain doesn't support cfi, cfi kselftest shouldn't build. Suggested by
Charlie Jenkins.
- Default for CONFIG_RISCV_USER_CFI is set to no. Charlie/Atish suggested to
keep it off till we have more hardware availibility with RVA23 profile and
zimop/zcmop implemented. Else this will start breaking people's workflow
- Includes the fix if "!RV64 and !SBI" then definitions for FWFT in
asm-offsets.c error.
v15:
- Toolchain has been updated to include `-fcf-protection` flag. This
exists for x86 as well. Updated kernel patches to compile vDSO and
selftest to compile with `fcf-protection=full` flag.
- selecting CONFIG_RISCV_USERCFI selects CONFIG_RISCV_SBI.
- Patch to enable shadow stack for kernel wasn't hidden behind
CONFIG_RISCV_USERCFI and CONFIG_RISCV_SBI. fixed that.
v14:
- rebased on top of palmer/sbi-v3. Thus dropped clement's FWFT patches
Updated RISCV_ISA_EXT_XXXX in hwcap and hwprobe constants.
- Took Radim's suggestions on bitfields.
- Placed cfi_state at the end of thread_info block so that current situation
is not disturbed with respect to member fields of thread_info in single
cacheline.
v13:
- cpu_supports_shadow_stack/cpu_supports_indirect_br_lp_instr uses
riscv_has_extension_unlikely()
- uses nops(count) to create nop slide
- RISCV_ACQUIRE_BARRIER is not needed in `amo_user_shstk`. Removed it
- changed ternaries to simply use implicit casting to convert to bool.
- kernel command line allows to disable zicfilp and zicfiss independently.
updated kernel-parameters.txt.
- ptrace user abi for cfi uses bitmasks instead of bitfields. Added ptrace
kselftest.
- cosmetic and grammatical changes to documentation.
v12:
- It seems like I had accidently squashed arch agnostic indirect branch
tracking prctl and riscv implementation of those prctls. Split them again.
- set_shstk_status/set_indir_lp_status perform CSR writes only when CPU
support is available. As suggested by Zong Li.
- Some minor clean up in kselftests as suggested by Zong Li.
v11:
- patch "arch/riscv: compile vdso with landing pad" was unconditionally
selecting `_zicfilp` for vDSO compile. fixed that. Changed `lpad 1` to
to `lpad 0`.
v10:
- dropped "mm: helper `is_shadow_stack_vma` to check shadow stack vma". This patch
is not that interesting to this patch series for risc-v. There are instances in
arch directories where VM_SHADOW_STACK flag is anyways used. Dropping this patch
to expedite merging in riscv tree.
- Took suggestions from `Clement` on "riscv: zicfiss / zicfilp enumeration" to
validate presence of cfi based on config.
- Added a patch for vDSO to have `lpad 0`. I had omitted this earlier to make sure
we add single vdso object with cfi enabled. But a vdso object with scheme of
zero labeled landing pad is least common denominator and should work with all
objects of zero labeled as well as function-signature labeled objects.
v9:
- rebased on master (39a803b754d5 fix braino in "9p: fix ->rename_sem exclusion")
- dropped "mm: Introduce ARCH_HAS_USER_SHADOW_STACK" (master has it from arm64/gcs)
- dropped "prctl: arch-agnostic prctl for shadow stack" (master has it from arm64/gcs)
v8:
- rebased on palmer/for-next
- dropped samuel holland's `envcfg` context switch patches.
they are in parlmer/for-next
v7:
- Removed "riscv/Kconfig: enable HAVE_EXIT_THREAD for riscv"
Instead using `deactivate_mm` flow to clean up.
see here for more context
https://lore.kernel.org/all/20230908203655.543765-1-rick.p.edgecombe@intel.…
- Changed the header include in `kselftest`. Hopefully this fixes compile
issue faced by Zong Li at SiFive.
- Cleaned up an orphaned change to `mm/mmap.c` in below patch
"riscv/mm : ensure PROT_WRITE leads to VM_READ | VM_WRITE"
- Lock interfaces for shadow stack and indirect branch tracking expect arg == 0
Any future evolution of this interface should accordingly define how arg should
be setup.
- `mm/map.c` has an instance of using `VM_SHADOW_STACK`. Fixed it to use helper
`is_shadow_stack_vma`.
- Link to v6: https://lore.kernel.org/r/20241008-v5_user_cfi_series-v6-0-60d9fe073f37@riv…
v6:
- Picked up Samuel Holland's changes as is with `envcfg` placed in
`thread` instead of `thread_info`
- fixed unaligned newline escapes in kselftest
- cleaned up messages in kselftest and included test output in commit message
- fixed a bug in clone path reported by Zong Li
- fixed a build issue if CONFIG_RISCV_ISA_V is not selected
(this was introduced due to re-factoring signal context
management code)
v5:
- rebased on v6.12-rc1
- Fixed schema related issues in device tree file
- Fixed some of the documentation related issues in zicfilp/ss.rst
(style issues and added index)
- added `SHADOW_STACK_SET_MARKER` so that implementation can define base
of shadow stack.
- Fixed warnings on definitions added in usercfi.h when
CONFIG_RISCV_USER_CFI is not selected.
- Adopted context header based signal handling as proposed by Andy Chiu
- Added support for enabling kernel mode access to shadow stack using
FWFT
(https://github.com/riscv-non-isa/riscv-sbi-doc/blob/master/src/ext-firmware…)
- Link to v5: https://lore.kernel.org/r/20241001-v5_user_cfi_series-v1-0-3ba65b6e550f@riv…
(Note: I had an issue in my workflow due to which version number wasn't
picked up correctly while sending out patches)
v4:
- rebased on 6.11-rc6
- envcfg: Converged with Samuel Holland's patches for envcfg management on per-
thread basis.
- vma_is_shadow_stack is renamed to is_vma_shadow_stack
- picked up Mark Brown's `ARCH_HAS_USER_SHADOW_STACK` patch
- signal context: using extended context management to maintain compatibility.
- fixed `-Wmissing-prototypes` compiler warnings for prctl functions
- Documentation fixes and amending typos.
- Link to v4: https://lore.kernel.org/all/20240912231650.3740732-1-debug@rivosinc.com/
v3:
- envcfg
logic to pick up base envcfg had a bug where `ENVCFG_CBZE` could have been
picked on per task basis, even though CPU didn't implement it. Fixed in
this series.
- dt-bindings
As suggested, split into separate commit. fixed the messaging that spec is
in public review
- arch_is_shadow_stack change
arch_is_shadow_stack changed to vma_is_shadow_stack
- hwprobe
zicfiss / zicfilp if present will get enumerated in hwprobe
- selftests
As suggested, added object and binary filenames to .gitignore
Selftest binary anyways need to be compiled with cfi enabled compiler which
will make sure that landing pad and shadow stack are enabled. Thus removed
separate enable/disable tests. Cleaned up tests a bit.
- Link to v3: https://lore.kernel.org/lkml/20240403234054.2020347-1-debug@rivosinc.com/
v2:
- Using config `CONFIG_RISCV_USER_CFI`, kernel support for riscv control flow
integrity for user mode programs can be compiled in the kernel.
- Enabling of control flow integrity for user programs is left to user runtime
- This patch series introduces arch agnostic `prctls` to enable shadow stack
and indirect branch tracking. And implements them on riscv.
---
Changes in v16:
- Link to v15: https://lore.kernel.org/r/20250502-v5_user_cfi_series-v15-0-914966471885@ri…
Changes in v15:
- changelog posted just below cover letter
- Link to v14: https://lore.kernel.org/r/20250429-v5_user_cfi_series-v14-0-5239410d012a@ri…
Changes in v14:
- changelog posted just below cover letter
- Link to v13: https://lore.kernel.org/r/20250424-v5_user_cfi_series-v13-0-971437de586a@ri…
Changes in v13:
- changelog posted just below cover letter
- Link to v12: https://lore.kernel.org/r/20250314-v5_user_cfi_series-v12-0-e51202b53138@ri…
Changes in v12:
- changelog posted just below cover letter
- Link to v11: https://lore.kernel.org/r/20250310-v5_user_cfi_series-v11-0-86b36cbfb910@ri…
Changes in v11:
- changelog posted just below cover letter
- Link to v10: https://lore.kernel.org/r/20250210-v5_user_cfi_series-v10-0-163dcfa31c60@ri…
---
Andy Chiu (1):
riscv: signal: abstract header saving for setup_sigcontext
Deepak Gupta (25):
mm: VM_SHADOW_STACK definition for riscv
dt-bindings: riscv: zicfilp and zicfiss in dt-bindings (extensions.yaml)
riscv: zicfiss / zicfilp enumeration
riscv: zicfiss / zicfilp extension csr and bit definitions
riscv: usercfi state for task and save/restore of CSR_SSP on trap entry/exit
riscv/mm : ensure PROT_WRITE leads to VM_READ | VM_WRITE
riscv mm: manufacture shadow stack pte
riscv mmu: teach pte_mkwrite to manufacture shadow stack PTEs
riscv mmu: write protect and shadow stack
riscv/mm: Implement map_shadow_stack() syscall
riscv/shstk: If needed allocate a new shadow stack on clone
riscv: Implements arch agnostic shadow stack prctls
prctl: arch-agnostic prctl for indirect branch tracking
riscv: Implements arch agnostic indirect branch tracking prctls
riscv/traps: Introduce software check exception
riscv/signal: save and restore of shadow stack for signal
riscv/kernel: update __show_regs to print shadow stack register
riscv/ptrace: riscv cfi status and state via ptrace and in core files
riscv/hwprobe: zicfilp / zicfiss enumeration in hwprobe
riscv: kernel command line option to opt out of user cfi
riscv: enable kernel access to shadow stack memory via FWFT sbi call
riscv: create a config for shadow stack and landing pad instr support
riscv: Documentation for landing pad / indirect branch tracking
riscv: Documentation for shadow stack on riscv
kselftest/riscv: kselftest for user mode cfi
Jim Shu (1):
arch/riscv: compile vdso with landing pad
Documentation/admin-guide/kernel-parameters.txt | 8 +
Documentation/arch/riscv/index.rst | 2 +
Documentation/arch/riscv/zicfilp.rst | 115 +++++
Documentation/arch/riscv/zicfiss.rst | 179 +++++++
.../devicetree/bindings/riscv/extensions.yaml | 14 +
arch/riscv/Kconfig | 21 +
arch/riscv/Makefile | 5 +-
arch/riscv/include/asm/asm-prototypes.h | 1 +
arch/riscv/include/asm/assembler.h | 44 ++
arch/riscv/include/asm/cpufeature.h | 12 +
arch/riscv/include/asm/csr.h | 16 +
arch/riscv/include/asm/entry-common.h | 2 +
arch/riscv/include/asm/hwcap.h | 2 +
arch/riscv/include/asm/mman.h | 25 +
arch/riscv/include/asm/mmu_context.h | 7 +
arch/riscv/include/asm/pgtable.h | 30 +-
arch/riscv/include/asm/processor.h | 2 +
arch/riscv/include/asm/thread_info.h | 3 +
arch/riscv/include/asm/usercfi.h | 95 ++++
arch/riscv/include/asm/vector.h | 3 +
arch/riscv/include/uapi/asm/hwprobe.h | 2 +
arch/riscv/include/uapi/asm/ptrace.h | 34 ++
arch/riscv/include/uapi/asm/sigcontext.h | 1 +
arch/riscv/kernel/Makefile | 1 +
arch/riscv/kernel/asm-offsets.c | 10 +
arch/riscv/kernel/cpufeature.c | 27 +
arch/riscv/kernel/entry.S | 33 +-
arch/riscv/kernel/head.S | 27 +
arch/riscv/kernel/process.c | 26 +-
arch/riscv/kernel/ptrace.c | 95 ++++
arch/riscv/kernel/signal.c | 148 +++++-
arch/riscv/kernel/sys_hwprobe.c | 2 +
arch/riscv/kernel/sys_riscv.c | 10 +
arch/riscv/kernel/traps.c | 43 ++
arch/riscv/kernel/usercfi.c | 545 +++++++++++++++++++++
arch/riscv/kernel/vdso/Makefile | 6 +
arch/riscv/kernel/vdso/flush_icache.S | 4 +
arch/riscv/kernel/vdso/getcpu.S | 4 +
arch/riscv/kernel/vdso/rt_sigreturn.S | 4 +
arch/riscv/kernel/vdso/sys_hwprobe.S | 4 +
arch/riscv/mm/init.c | 2 +-
arch/riscv/mm/pgtable.c | 17 +
include/linux/cpu.h | 4 +
include/linux/mm.h | 7 +
include/uapi/linux/elf.h | 2 +
include/uapi/linux/prctl.h | 27 +
kernel/sys.c | 30 ++
tools/testing/selftests/riscv/Makefile | 2 +-
tools/testing/selftests/riscv/cfi/.gitignore | 3 +
tools/testing/selftests/riscv/cfi/Makefile | 16 +
tools/testing/selftests/riscv/cfi/cfi_rv_test.h | 82 ++++
tools/testing/selftests/riscv/cfi/riscv_cfi_test.c | 173 +++++++
tools/testing/selftests/riscv/cfi/shadowstack.c | 385 +++++++++++++++
tools/testing/selftests/riscv/cfi/shadowstack.h | 27 +
54 files changed, 2360 insertions(+), 29 deletions(-)
---
base-commit: 4181f8ad7a1061efed0219951d608d4988302af7
change-id: 20240930-v5_user_cfi_series-3dc332f8f5b2
--
- debug
I'd like to cut down the memory usage of parsing vmlinux BTF in ebpf-go.
With some upcoming changes the library is sitting at 5MiB for a parse.
Most of that memory is simply copying the BTF blob into user space.
By allowing vmlinux BTF to be mmapped read-only into user space I can
cut memory usage by about 75%.
Signed-off-by: Lorenz Bauer <lmb(a)isovalent.com>
---
Changes in v4:
- Go back to remap_pfn_range for aarch64 compat
- Dropped btf_new_no_copy (Andrii)
- Fixed nits in selftests (Andrii)
- Clearer error handling in the mmap handler (Andrii)
- Fixed build on s390
- Link to v3: https://lore.kernel.org/r/20250505-vmlinux-mmap-v3-0-5d53afa060e8@isovalent…
Changes in v3:
- Remove slightly confusing calculation of trailing (Alexei)
- Use vm_insert_page (Alexei)
- Simplified libbpf code
- Link to v2: https://lore.kernel.org/r/20250502-vmlinux-mmap-v2-0-95c271434519@isovalent…
Changes in v2:
- Use btf__new in selftest
- Avoid vm_iomap_memory in btf_vmlinux_mmap
- Add VM_DONTDUMP
- Add support to libbpf
- Link to v1: https://lore.kernel.org/r/20250501-vmlinux-mmap-v1-0-aa2724572598@isovalent…
---
Lorenz Bauer (3):
btf: allow mmap of vmlinux btf
selftests: bpf: add a test for mmapable vmlinux BTF
libbpf: Use mmap to parse vmlinux BTF from sysfs
include/asm-generic/vmlinux.lds.h | 3 +-
kernel/bpf/sysfs_btf.c | 32 ++++++++
tools/lib/bpf/btf.c | 85 ++++++++++++++++++----
tools/testing/selftests/bpf/prog_tests/btf_sysfs.c | 81 +++++++++++++++++++++
4 files changed, 184 insertions(+), 17 deletions(-)
---
base-commit: 7220eabff8cb4af3b93cd021aa853b9f5df2923f
change-id: 20250501-vmlinux-mmap-2ec5563c3ef1
Best regards,
--
Lorenz Bauer <lmb(a)isovalent.com>
I'd like to cut down the memory usage of parsing vmlinux BTF in ebpf-go.
With some upcoming changes the library is sitting at 5MiB for a parse.
Most of that memory is simply copying the BTF blob into user space.
By allowing vmlinux BTF to be mmapped read-only into user space I can
cut memory usage by about 75%.
Signed-off-by: Lorenz Bauer <lmb(a)isovalent.com>
---
Changes in v5:
- Fix error return of btf_parse_raw_mmap (Andrii)
- Link to v4: https://lore.kernel.org/r/20250510-vmlinux-mmap-v4-0-69e424b2a672@isovalent…
Changes in v4:
- Go back to remap_pfn_range for aarch64 compat
- Dropped btf_new_no_copy (Andrii)
- Fixed nits in selftests (Andrii)
- Clearer error handling in the mmap handler (Andrii)
- Fixed build on s390
- Link to v3: https://lore.kernel.org/r/20250505-vmlinux-mmap-v3-0-5d53afa060e8@isovalent…
Changes in v3:
- Remove slightly confusing calculation of trailing (Alexei)
- Use vm_insert_page (Alexei)
- Simplified libbpf code
- Link to v2: https://lore.kernel.org/r/20250502-vmlinux-mmap-v2-0-95c271434519@isovalent…
Changes in v2:
- Use btf__new in selftest
- Avoid vm_iomap_memory in btf_vmlinux_mmap
- Add VM_DONTDUMP
- Add support to libbpf
- Link to v1: https://lore.kernel.org/r/20250501-vmlinux-mmap-v1-0-aa2724572598@isovalent…
---
Lorenz Bauer (3):
btf: allow mmap of vmlinux btf
selftests: bpf: add a test for mmapable vmlinux BTF
libbpf: Use mmap to parse vmlinux BTF from sysfs
include/asm-generic/vmlinux.lds.h | 3 +-
kernel/bpf/sysfs_btf.c | 32 ++++++++
tools/lib/bpf/btf.c | 89 +++++++++++++++++-----
tools/testing/selftests/bpf/prog_tests/btf_sysfs.c | 81 ++++++++++++++++++++
4 files changed, 186 insertions(+), 19 deletions(-)
---
base-commit: 7220eabff8cb4af3b93cd021aa853b9f5df2923f
change-id: 20250501-vmlinux-mmap-2ec5563c3ef1
Best regards,
--
Lorenz Bauer <lmb(a)isovalent.com>
Here is a series from Geliang, adding mptcp_subflow bpf_iter support.
We are working on extending MPTCP with BPF, e.g. to control the path
manager -- in charge of the creation, deletion, and announcements of
subflows (paths) -- and the packet scheduler -- in charge of selecting
which available path the next data will be sent to. These extensions
need to iterate over the list of subflows attached to an MPTCP
connection, and do some specific actions via some new kfunc that will be
added later on.
This preparation work is split in different patches:
- Patch 1: register some "basic" MPTCP kfunc.
- Patch 2: add mptcp_subflow bpf_iter support. Note that previous
versions of this single patch have already been shared to the
BPF mailing list. The changelog has been kept with a comment,
but the version number has been reset to avoid confusions.
- Patch 3: add more MPTCP endpoints in the selftests, in order to create
more than 2 subflows.
- Patch 4: add a very simple test validating mptcp_subflow bpf_iter
support. This test could be written without the new bpf_iter,
but it is there only to make sure this specific feature works
as expected.
- Patch 5: a small fix to drop an unused parameter in the selftests.
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
Changes in v3:
- Previous patches 1, 2 and 5 were no longer needed. (Martin)
- Patch 2: Switch to 'struct sock' and drop unneeded checks. (Martin)
- Patch 4: Adapt the test accordingly.
- Patch 5: New small fix for the selftests.
- Examples and questions for BPF maintainers have been added in Patch 2.
- Link to v2: https://lore.kernel.org/r/20241219-bpf-next-net-mptcp-bpf_iter-subflows-v2-…
Changes in v2:
- Patches 1-2: new ones.
- Patch 3: remove two kfunc, more restrictions. (Martin)
- Patch 4: add BUILD_BUG_ON(), more restrictions. (Martin)
- Patch 7: adaptations due to modifications in patches 1-4.
- Link to v1: https://lore.kernel.org/r/20241108-bpf-next-net-mptcp-bpf_iter-subflows-v1-…
---
Geliang Tang (5):
bpf: Register mptcp common kfunc set
bpf: Add mptcp_subflow bpf_iter
selftests/bpf: More endpoints for endpoint_init
selftests/bpf: Add mptcp_subflow bpf_iter subtest
selftests/bpf: Drop cgroup_fd of run_mptcpify
net/mptcp/bpf.c | 87 +++++++++++++-
tools/testing/selftests/bpf/bpf_experimental.h | 8 ++
tools/testing/selftests/bpf/prog_tests/mptcp.c | 133 +++++++++++++++++++--
tools/testing/selftests/bpf/progs/mptcp_bpf.h | 4 +
.../testing/selftests/bpf/progs/mptcp_bpf_iters.c | 59 +++++++++
5 files changed, 282 insertions(+), 9 deletions(-)
---
base-commit: dad704ebe38642cd405e15b9c51263356391355c
change-id: 20241108-bpf-next-net-mptcp-bpf_iter-subflows-027f6d87770e
Best regards,
--
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Split out all headers which are used by nolibc-test.c.
This makes it easier to port existing applications to nolibc.
Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
---
Thomas Weißschuh (9):
tools/nolibc: move ioctl() to sys/ioctl.h
tools/nolibc: move mount() to sys/mount.h
tools/nolibc: move prctl() to sys/prctl.h
tools/nolibc: move reboot() to sys/reboot.h
tools/nolibc: move getrlimit() and friends to sys/resource.h
tools/nolibc: move makedev() and friends to sys/sysmacros.h
tools/nolibc: move uname() and friends to sys/utsname.h
tools/nolibc: move NULL and offsetof() to sys/stddef.h
selftests/nolibc: drop include guards around standard headers
tools/include/nolibc/Makefile | 8 ++
tools/include/nolibc/nolibc.h | 7 ++
tools/include/nolibc/std.h | 6 +-
tools/include/nolibc/stddef.h | 24 +++++
tools/include/nolibc/sys.h | 136 ---------------------------
tools/include/nolibc/sys/ioctl.h | 29 ++++++
tools/include/nolibc/sys/mount.h | 37 ++++++++
tools/include/nolibc/sys/prctl.h | 36 +++++++
tools/include/nolibc/sys/reboot.h | 34 +++++++
tools/include/nolibc/sys/resource.h | 53 +++++++++++
tools/include/nolibc/sys/sysmacros.h | 20 ++++
tools/include/nolibc/sys/utsname.h | 42 +++++++++
tools/include/nolibc/types.h | 11 ---
tools/testing/selftests/nolibc/nolibc-test.c | 5 -
14 files changed, 291 insertions(+), 157 deletions(-)
---
base-commit: 6a25f787912a73613f12e7eefbebd72ee3d43f85
change-id: 20250515-nolibc-sys-31a4fd76d897
Best regards,
--
Thomas Weißschuh <linux(a)weissschuh.net>
Corrects a spelling mistake "memebers" instead of "members" in
tools/testing/selftests/filesystems/mount-notify/mount-notify_test.c
Signed-off-by: Hendrik Hamerlinck <hendrik.hamerlinck(a)hammernet.be>
---
Changes since v1:
Improved commit message to be consistent with other commit messages.
.../selftests/filesystems/mount-notify/mount-notify_test.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/filesystems/mount-notify/mount-notify_test.c b/tools/testing/selftests/filesystems/mount-notify/mount-notify_test.c
index 59a71f22fb11..af2b61224a61 100644
--- a/tools/testing/selftests/filesystems/mount-notify/mount-notify_test.c
+++ b/tools/testing/selftests/filesystems/mount-notify/mount-notify_test.c
@@ -230,7 +230,7 @@ static void verify_mount_ids(struct __test_metadata *const _metadata,
}
}
}
- // Check that all list1 memebers can be found in list2. Together with
+ // Check that all list1 members can be found in list2. Together with
// the above it means that the list1 and list2 represent the same sets.
for (i = 0; i < num; i++) {
for (j = 0; j < num; j++) {
--
2.43.0
v2- fixed multiple trailing whitespace errors and
the Signed-off-by mismatch
The test file for the IR decoder used single-line comments
at the top to document its purpose and licensing,
which is inconsistent with the style used throughout the
Linux kernel.
In this patch i converted the file header to
a proper multi-line comment block
(/*) that aligns with standard kernel practices.
This improves readability, consistency across selftests,
and ensures the license and documentation are
clearly visible in a familiar format.
No functional changes have been made.
Signed-off-by: Abdelrahman Fekry <abdelrahmanfekry375(a)gmail.com>
---
tools/testing/selftests/ir/ir_loopback.c | 23 +++++++++++++----------
1 file changed, 13 insertions(+), 10 deletions(-)
diff --git a/tools/testing/selftests/ir/ir_loopback.c b/tools/testing/selftests/ir/ir_loopback.c
index f4a15cbdd5ea..c94faa975630 100644
--- a/tools/testing/selftests/ir/ir_loopback.c
+++ b/tools/testing/selftests/ir/ir_loopback.c
@@ -1,14 +1,17 @@
// SPDX-License-Identifier: GPL-2.0
-// test ir decoder
-//
-// Copyright (C) 2018 Sean Young <sean(a)mess.org>
-
-// When sending LIRC_MODE_SCANCODE, the IR will be encoded. rc-loopback
-// will send this IR to the receiver side, where we try to read the decoded
-// IR. Decoding happens in a separate kernel thread, so we will need to
-// wait until that is scheduled, hence we use poll to check for read
-// readiness.
-
+/* Copyright (C) 2018 Sean Young <sean(a)mess.org>
+ *
+ * Selftest for IR decoder
+ *
+ *
+ * When sending LIRC_MODE_SCANCODE, the IR will be encoded.
+ * rc-loopback will send this IR to the receiver side,
+ * where we try to read the decoded IR.
+ * Decoding happens in a separate kernel thread,
+ * so we will need to wait until that is scheduled,
+ * hence we use poll to check for read
+ * readiness.
+ */
#include <linux/lirc.h>
#include <errno.h>
#include <stdio.h>
--
2.25.1
This patch improves the clarity and grammar of output messages in the acct()
selftest. Minor changes were made to user-facing strings and comments to make
them easier to understand and more consistent with the kselftest style.
Changes include:
- Fixing grammar in printed messages and comments.
- Rewording error and success outputs for clarity and professionalism.
- Making the "root check" message more direct.
These updates improve readability without affecting test logic or behavior.
Signed-off-by: Abdelrahman Fekry <abdelrahmanfekry375(a)gmail.com>
---
tools/testing/selftests/acct/acct_syscall.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/acct/acct_syscall.c b/tools/testing/selftests/acct/acct_syscall.c
index 87c044fb9293..2c120a527574 100644
--- a/tools/testing/selftests/acct/acct_syscall.c
+++ b/tools/testing/selftests/acct/acct_syscall.c
@@ -22,9 +22,9 @@ int main(void)
ksft_print_header();
ksft_set_plan(1);
- // Check if test is run a root
+ // Check if test is run as root
if (geteuid()) {
- ksft_exit_skip("This test needs root to run!\n");
+ ksft_exit_skip("This test must be run as root!\n");
return 1;
}
@@ -52,7 +52,7 @@ int main(void)
child_pid = fork();
if (child_pid < 0) {
- ksft_test_result_error("Creating a child process to log failed\n");
+ ksft_test_result_error("Failed to create child process for logging\n");
acct(NULL);
return 1;
} else if (child_pid > 0) {
--
2.25.1
John, this revision introduces one more patch: "selftests/bpf: Introduce
verdict programs for sockmap_redir". I've kept you cross-series Acked-by. I
hope it's ok.
Jiayuan, I haven't heard back from you regarding [*], so I've kept things
unchanged for now. Please let me know what you think.
[*] https://lore.kernel.org/bpf/66bf942f-dfdb-4ce9-bd95-8b734e7afa53@rbox.co/
--
The idea behind this series is to comprehensively test the BPF redirection:
BPF_MAP_TYPE_SOCKMAP,
BPF_MAP_TYPE_SOCKHASH
x
sk_msg-to-egress,
sk_msg-to-ingress,
sk_skb-to-egress,
sk_skb-to-ingress
x
AF_INET, SOCK_STREAM,
AF_INET6, SOCK_STREAM,
AF_INET, SOCK_DGRAM,
AF_INET6, SOCK_DGRAM,
AF_UNIX, SOCK_STREAM,
AF_UNIX, SOCK_DGRAM,
AF_VSOCK, SOCK_STREAM,
AF_VSOCK, SOCK_SEQPACKET
New module is introduced, sockmap_redir: all supported and unsupported
redirect combinations are tested for success and failure respectively. Code
is pretty much stolen/adapted from Jakub Sitnicki's sockmap_redir_matrix.c
[1].
Usage:
$ cd tools/testing/selftests/bpf
$ make
$ sudo ./test_progs -t sockmap_redir
...
Summary: 1/576 PASSED, 0 SKIPPED, 0 FAILED
[1]: https://github.com/jsitnicki/sockmap-redir-matrix/blob/main/sockmap_redir_m…
Changes in v3:
- Drop unrelated changes; sockmap_listen, test_sockmap_listen, doc
- Collect tags [Jakub, John]
- Introduce BPF verdict programs especially for sockmap_redir [Jiayuan]
- Link to v2: https://lore.kernel.org/r/20250411-selftests-sockmap-redir-v2-0-5f9b018d670…
Changes in v2:
- Verify that the unsupported redirect combos do fail [Jakub]
- Dedup tests in sockmap_listen
- Cosmetic changes and code reordering
- Link to v1: https://lore.kernel.org/bpf/42939687-20f9-4a45-b7c2-342a0e11a014@rbox.co/
Suggested-by: Jakub Sitnicki <jakub(a)cloudflare.com>
Signed-off-by: Michal Luczaj <mhal(a)rbox.co>
---
Michal Luczaj (8):
selftests/bpf: Support af_unix SOCK_DGRAM socket pair creation
selftests/bpf: Add socket_kind_to_str() to socket_helpers
selftests/bpf: Add u32()/u64() to sockmap_helpers
selftests/bpf: Introduce verdict programs for sockmap_redir
selftests/bpf: Add selftest for sockmap/hashmap redirection
selftests/bpf: sockmap_listen cleanup: Drop af_vsock redir tests
selftests/bpf: sockmap_listen cleanup: Drop af_unix redir tests
selftests/bpf: sockmap_listen cleanup: Drop af_inet SOCK_DGRAM redir tests
.../selftests/bpf/prog_tests/socket_helpers.h | 84 +++-
.../selftests/bpf/prog_tests/sockmap_helpers.h | 25 +-
.../selftests/bpf/prog_tests/sockmap_listen.c | 457 --------------------
.../selftests/bpf/prog_tests/sockmap_redir.c | 465 +++++++++++++++++++++
.../selftests/bpf/progs/test_sockmap_redir.c | 68 +++
5 files changed, 623 insertions(+), 476 deletions(-)
---
base-commit: d0445d7dd3fd9b15af7564c38d7aa3cbc29778ee
change-id: 20240922-selftests-sockmap-redir-5d839396c75e
Best regards,
--
Michal Luczaj <mhal(a)rbox.co>
Improved the clarity and grammar in the header comment of nanosleep.c
for better readability and consistency with kernel documentation style.
Signed-off-by: Rahul Kumar <rk0006818(a)gmail.com>
---
tools/testing/selftests/timers/nanosleep.c | 22 +++++++++++-----------
1 file changed, 11 insertions(+), 11 deletions(-)
diff --git a/tools/testing/selftests/timers/nanosleep.c b/tools/testing/selftests/timers/nanosleep.c
index 252c6308c569..84adf8a4ab5d 100644
--- a/tools/testing/selftests/timers/nanosleep.c
+++ b/tools/testing/selftests/timers/nanosleep.c
@@ -1,12 +1,12 @@
-/* Make sure timers don't return early
- * by: john stultz (johnstul(a)us.ibm.com)
- * John Stultz (john.stultz(a)linaro.org)
- * (C) Copyright IBM 2012
- * (C) Copyright Linaro 2013 2015
- * Licensed under the GPLv2
+ /*
+ * Ensure timers do not return early.
+ * Author: John Stultz (john.stultz(a)linaro.org)
+ * Copyright (C) IBM 2012
+ * Copyright (C) Linaro 2013, 2015
+ * Licensed under the GPLv2
*
- * To build:
- * $ gcc nanosleep.c -o nanosleep -lrt
+ * To build:
+ * $ gcc nanosleep.c -o nanosleep -lrt
*
* This program is free software: you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
@@ -61,7 +61,7 @@ char *clockstring(int clockid)
case CLOCK_TAI:
return "CLOCK_TAI";
};
- return "UNKNOWN_CLOCKID";
+ return "UNKNOWN_CLOCKID"; // Could not identify clockid
}
/* returns 1 if a <= b, 0 otherwise */
@@ -90,7 +90,7 @@ int nanosleep_test(int clockid, long long ns)
{
struct timespec now, target, rel;
- /* First check abs time */
+ /* First, check absolute time using clock_nanosleep with TIMER_ABSTIME */
if (clock_gettime(clockid, &now))
return UNSUPPORTED;
target = timespec_add(now, ns);
@@ -102,7 +102,7 @@ int nanosleep_test(int clockid, long long ns)
if (!in_order(target, now))
return -1;
- /* Second check reltime */
+ /* Then, test relative time sleep */
clock_gettime(clockid, &now);
rel.tv_sec = 0;
rel.tv_nsec = 0;
--
2.43.0
This patch adds a new robust_list() syscall. The current syscall
can't be expanded to cover the following use case, so a new one is
needed. This new syscall allows users to set multiple robust lists per
process and to have either 32bit or 64bit pointers in the list.
* Use case
FEX-Emu[1] is an application that runs x86 and x86-64 binaries on an
AArch64 Linux host. One of the tasks of FEX-Emu is to translate syscalls
from one platform to another. Existing set_robust_list() can't be easily
translated because of two limitations:
1) x86 apps can have 32bit pointers robust lists. For a x86-64 kernel
this is not a problem, because of the compat entry point. But there's
no such compat entry point for AArch64, so the kernel would do the
pointer arithmetic wrongly. Is also unviable to userspace to keep
track every addition/removal to the robust list and keep a 64bit
version of it somewhere else to feed the kernel. Thus, the new
interface has an option of telling the kernel if the list is filled
with 32bit or 64bit pointers.
2) Apps can set just one robust list (in theory, x86-64 can set two if
they also use the compat entry point). That means that when a x86 app
asks FEX-Emu to call set_robust_list(), FEX have two options: to
overwrite their own robust list pointer and make the app robust, or
to ignore the app robust list and keep the emulator robust. The new
interface allows for multiple robust lists per application, solving
this.
* Interface
This is the proposed interface:
long set_robust_list2(void *head, int index, unsigned int flags)
`head` is the head of the userspace struct robust_list_head, just as old
set_robust_list(). It needs to be a void pointer since it can point to a normal
robust_list_head or a compat_robust_list_head.
`flags` can be used for defining the list type:
enum robust_list_type {
ROBUST_LIST_32BIT,
ROBUST_LIST_64BIT,
};
`index` is the index in the internal robust_list's linked list (the naming
starts to get confusing, I reckon). If `index == -1`, that means that user wants
to set a new robust_list, and the kernel will append it in the end of the list,
assign a new index and return this index to the user. If `index >= 0`, that
means that user wants to re-set `*head` of an already existing list (similarly
to what happens when you call set_robust_list() twice with different `*head`).
If `index` is out of range, or it points to a non-existing robust_list, or if
the internal list is full, an error is returned.
* Implementation
The implementation re-uses most of the existing robust list interface as
possible. The new task_struct member `struct list_head robust_list2` is just a
linked list where new lists are appended as the user requests more lists, and by
futex_cleanup(), the kernel walks through the internal list feeding
exit_robust_list() with the robust_list's.
This implementation supports up to 10 lists (defined at ROBUST_LISTS_PER_TASK),
but it was an arbitrary number for this RFC. For the described use case above, 4
should be enough, I'm not sure which should be the limit.
It doesn't support list removal (should it support?). It doesn't have a proper
get_robust_list2() yet as well, but I can add it in a next revision. We could
also have a generic robust_list() syscall that can be used to set/get and be
controlled by flags.
The new interface has a `unsigned int flags` argument, making it
extensible for future use cases as well.
It refuses unaligned `head` addresses. It doesn't have a limit for elements in a
single list (like ROBUST_LIST_LIMIT), it destroys the list as it is parsed to be
safe against circular lists.
* Testing
This patcheset has a selftest patch that expands this one:
https://lore.kernel.org/lkml/20250212131123.37431-1-andrealmeid@igalia.com/
Also, FEX-Emu added support for this interface to validate it:
https://github.com/FEX-Emu/FEX/pull/3966
Feedback is very welcomed!
Thanks,
André
[1] https://github.com/FEX-Emu/FEX
Changelog:
- Rebased on top of new futex work (private hash)
v4: https://lore.kernel.org/lkml/20250225183531.682556-1-andrealmeid@igalia.com/
- Refuse unaligned head pointers
- Ignore ROBUST_LIST_LIMIT for lists created with this interface and make it
robust against circular lists
- Fix a get_robust_list() syscall bug for getting the list from another thread
- Adapt selftest to use the new interface
v3: https://lore.kernel.org/lkml/20241217174958.477692-1-andrealmeid@igalia.com/
- Old syscall set_robust_list() adds new head to the internal linked list of
robust lists pointers, instead of having a field just for them. Remove
tsk->robust_list and use only tsk->robust_list2
v2: https://lore.kernel.org/lkml/20241101162147.284993-1-andrealmeid@igalia.com/
- Added a patch to properly deal with exit_robust_list() in 64bit vs 32bit
- Wired-up syscall for all archs
- Added more of the cover letter to the commit message
v1: https://lore.kernel.org/lkml/20241024145735.162090-1-andrealmeid@igalia.com/
---
André Almeida (7):
selftests/futex: Add ASSERT_ macros
selftests/futex: Create test for robust list
futex: Use explicit sizes for compat_exit_robust_list
futex: Create set_robust_list2
futex: Wire up set_robust_list2 syscall
futex: Remove the limit of elements for sys_set_robust_list2 lists
selftests: futex: Expand robust list test for the new interface
arch/alpha/kernel/syscalls/syscall.tbl | 1 +
arch/arm/tools/syscall.tbl | 1 +
arch/m68k/kernel/syscalls/syscall.tbl | 1 +
arch/microblaze/kernel/syscalls/syscall.tbl | 1 +
arch/mips/kernel/syscalls/syscall_n32.tbl | 1 +
arch/mips/kernel/syscalls/syscall_n64.tbl | 1 +
arch/mips/kernel/syscalls/syscall_o32.tbl | 1 +
arch/parisc/kernel/syscalls/syscall.tbl | 1 +
arch/powerpc/kernel/syscalls/syscall.tbl | 1 +
arch/s390/kernel/syscalls/syscall.tbl | 1 +
arch/sh/kernel/syscalls/syscall.tbl | 1 +
arch/sparc/kernel/syscalls/syscall.tbl | 1 +
arch/x86/entry/syscalls/syscall_32.tbl | 1 +
arch/x86/entry/syscalls/syscall_64.tbl | 1 +
arch/xtensa/kernel/syscalls/syscall.tbl | 1 +
include/linux/compat.h | 12 +-
include/linux/futex.h | 16 +-
include/linux/sched.h | 5 +-
include/uapi/asm-generic/unistd.h | 2 +
include/uapi/linux/futex.h | 24 +
kernel/futex/core.c | 165 ++++-
kernel/futex/futex.h | 5 +
kernel/futex/syscalls.c | 85 ++-
kernel/sys_ni.c | 1 +
scripts/syscall.tbl | 1 +
.../testing/selftests/futex/functional/.gitignore | 1 +
tools/testing/selftests/futex/functional/Makefile | 3 +-
.../selftests/futex/functional/robust_list.c | 706 +++++++++++++++++++++
tools/testing/selftests/futex/include/logging.h | 38 ++
29 files changed, 1026 insertions(+), 53 deletions(-)
---
base-commit: 3ee84e3dd88e39b55b534e17a7b9a181f1d46809
change-id: 20250225-tonyk-robust_futex-60adeedac695
Best regards,
--
André Almeida <andrealmeid(a)igalia.com>
There is a spelling mistake in a ksft_test_result message. Fix it.
Signed-off-by: Colin Ian King <colin.i.king(a)gmail.com>
---
tools/testing/selftests/futex/functional/futex_priv_hash.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/futex/functional/futex_priv_hash.c b/tools/testing/selftests/futex/functional/futex_priv_hash.c
index 2dca18fefedc..0213eb0bb4af 100644
--- a/tools/testing/selftests/futex/functional/futex_priv_hash.c
+++ b/tools/testing/selftests/futex/functional/futex_priv_hash.c
@@ -242,7 +242,7 @@ int main(int argc, char *argv[])
join_max_threads();
ret = futex_hash_slots_get();
- ksft_test_result(ret == 2, "No more auto-resize after manaul setting, got %d\n",
+ ksft_test_result(ret == 2, "No more auto-resize after manual setting, got %d\n",
ret);
futex_hash_slots_set_must_fail(1 << 29, 0);
--
2.49.0
The SBI Firmware Feature extension allows the S-mode to request some
specific features (either hardware or software) to be enabled. This
series uses this extension to request misaligned access exception
delegation to S-mode in order to let the kernel handle it. It also adds
support for the KVM FWFT SBI extension based on the misaligned access
handling infrastructure.
FWFT SBI extension is part of the SBI V3.0 specifications [1]. It can be
tested using the qemu provided at [2] which contains the series from
[3]. Upstream kvm-unit-tests can be used inside kvm to tests the correct
delegation of misaligned exceptions. Upstream OpenSBI can be used.
Note: Since SBI V3.0 is not yet ratified, FWFT extension API is split
between interface only and implementation, allowing to pick only the
interface which do not have hard dependencies on SBI.
The tests can be run using the kselftest from series [4].
$ qemu-system-riscv64 \
-cpu rv64,trap-misaligned-access=true,v=true \
-M virt \
-m 1024M \
-bios fw_dynamic.bin \
-kernel Image
...
# ./misaligned
TAP version 13
1..23
# Starting 23 tests from 1 test cases.
# RUN global.gp_load_lh ...
# OK global.gp_load_lh
ok 1 global.gp_load_lh
# RUN global.gp_load_lhu ...
# OK global.gp_load_lhu
ok 2 global.gp_load_lhu
# RUN global.gp_load_lw ...
# OK global.gp_load_lw
ok 3 global.gp_load_lw
# RUN global.gp_load_lwu ...
# OK global.gp_load_lwu
ok 4 global.gp_load_lwu
# RUN global.gp_load_ld ...
# OK global.gp_load_ld
ok 5 global.gp_load_ld
# RUN global.gp_load_c_lw ...
# OK global.gp_load_c_lw
ok 6 global.gp_load_c_lw
# RUN global.gp_load_c_ld ...
# OK global.gp_load_c_ld
ok 7 global.gp_load_c_ld
# RUN global.gp_load_c_ldsp ...
# OK global.gp_load_c_ldsp
ok 8 global.gp_load_c_ldsp
# RUN global.gp_load_sh ...
# OK global.gp_load_sh
ok 9 global.gp_load_sh
# RUN global.gp_load_sw ...
# OK global.gp_load_sw
ok 10 global.gp_load_sw
# RUN global.gp_load_sd ...
# OK global.gp_load_sd
ok 11 global.gp_load_sd
# RUN global.gp_load_c_sw ...
# OK global.gp_load_c_sw
ok 12 global.gp_load_c_sw
# RUN global.gp_load_c_sd ...
# OK global.gp_load_c_sd
ok 13 global.gp_load_c_sd
# RUN global.gp_load_c_sdsp ...
# OK global.gp_load_c_sdsp
ok 14 global.gp_load_c_sdsp
# RUN global.fpu_load_flw ...
# OK global.fpu_load_flw
ok 15 global.fpu_load_flw
# RUN global.fpu_load_fld ...
# OK global.fpu_load_fld
ok 16 global.fpu_load_fld
# RUN global.fpu_load_c_fld ...
# OK global.fpu_load_c_fld
ok 17 global.fpu_load_c_fld
# RUN global.fpu_load_c_fldsp ...
# OK global.fpu_load_c_fldsp
ok 18 global.fpu_load_c_fldsp
# RUN global.fpu_store_fsw ...
# OK global.fpu_store_fsw
ok 19 global.fpu_store_fsw
# RUN global.fpu_store_fsd ...
# OK global.fpu_store_fsd
ok 20 global.fpu_store_fsd
# RUN global.fpu_store_c_fsd ...
# OK global.fpu_store_c_fsd
ok 21 global.fpu_store_c_fsd
# RUN global.fpu_store_c_fsdsp ...
# OK global.fpu_store_c_fsdsp
ok 22 global.fpu_store_c_fsdsp
# RUN global.gen_sigbus ...
[12797.988647] misaligned[618]: unhandled signal 7 code 0x1 at 0x0000000000014dc0 in misaligned[4dc0,10000+76000]
[12797.988990] CPU: 0 UID: 0 PID: 618 Comm: misaligned Not tainted 6.13.0-rc6-00008-g4ec4468967c9-dirty #51
[12797.989169] Hardware name: riscv-virtio,qemu (DT)
[12797.989264] epc : 0000000000014dc0 ra : 0000000000014d00 sp : 00007fffe165d100
[12797.989407] gp : 000000000008f6e8 tp : 0000000000095760 t0 : 0000000000000008
[12797.989544] t1 : 00000000000965d8 t2 : 000000000008e830 s0 : 00007fffe165d160
[12797.989692] s1 : 000000000000001a a0 : 0000000000000000 a1 : 0000000000000002
[12797.989831] a2 : 0000000000000000 a3 : 0000000000000000 a4 : ffffffffdeadbeef
[12797.989964] a5 : 000000000008ef61 a6 : 626769735f6e0000 a7 : fffffffffffff000
[12797.990094] s2 : 0000000000000001 s3 : 00007fffe165d838 s4 : 00007fffe165d848
[12797.990238] s5 : 000000000000001a s6 : 0000000000010442 s7 : 0000000000010200
[12797.990391] s8 : 000000000000003a s9 : 0000000000094508 s10: 0000000000000000
[12797.990526] s11: 0000555567460668 t3 : 00007fffe165d070 t4 : 00000000000965d0
[12797.990656] t5 : fefefefefefefeff t6 : 0000000000000073
[12797.990756] status: 0000000200004020 badaddr: 000000000008ef61 cause: 0000000000000006
[12797.990911] Code: 8793 8791 3423 fcf4 3783 fc84 c737 dead 0713 eef7 (c398) 0001
# OK global.gen_sigbus
ok 23 global.gen_sigbus
# PASSED: 23 / 23 tests passed.
# Totals: pass:23 fail:0 xfail:0 xpass:0 skip:0 error:0
With kvm-tools:
# lkvm run -k sbi.flat -m 128
Info: # lkvm run -k sbi.flat -m 128 -c 1 --name guest-97
Info: Removed ghost socket file "/root/.lkvm//guest-97.sock".
##########################################################################
# kvm-unit-tests
##########################################################################
... [test messages elided]
PASS: sbi: fwft: FWFT extension probing no error
PASS: sbi: fwft: get/set reserved feature 0x6 error == SBI_ERR_DENIED
PASS: sbi: fwft: get/set reserved feature 0x3fffffff error == SBI_ERR_DENIED
PASS: sbi: fwft: get/set reserved feature 0x80000000 error == SBI_ERR_DENIED
PASS: sbi: fwft: get/set reserved feature 0xbfffffff error == SBI_ERR_DENIED
PASS: sbi: fwft: misaligned_deleg: Get misaligned deleg feature no error
PASS: sbi: fwft: misaligned_deleg: Set misaligned deleg feature invalid value error
PASS: sbi: fwft: misaligned_deleg: Set misaligned deleg feature invalid value error
PASS: sbi: fwft: misaligned_deleg: Set misaligned deleg feature value no error
PASS: sbi: fwft: misaligned_deleg: Set misaligned deleg feature value 0
PASS: sbi: fwft: misaligned_deleg: Set misaligned deleg feature value no error
PASS: sbi: fwft: misaligned_deleg: Set misaligned deleg feature value 1
PASS: sbi: fwft: misaligned_deleg: Verify misaligned load exception trap in supervisor
SUMMARY: 50 tests, 2 unexpected failures, 12 skipped
This series is available at [5].
Link: https://github.com/riscv-non-isa/riscv-sbi-doc/releases/download/vv3.0-rc2/… [1]
Link: https://github.com/rivosinc/qemu/tree/dev/cleger/misaligned [2]
Link: https://lore.kernel.org/all/20241211211933.198792-3-fkonrad@amd.com/T/ [3]
Link: https://lore.kernel.org/linux-riscv/20250414123543.1615478-1-cleger@rivosin… [4]
Link: https://github.com/rivosinc/linux/tree/dev/cleger/fwft [5]
---
V7:
- Fix ifdefery build problems
- Move sbi_fwft_is_supported with fwft_set_req struct
- Added Atish Reviewed-by
- Updated KVM vcpu cfg hedeleg value in set_delegation
- Changed SBI ETIME error mapping to ETIMEDOUT
- Fixed a few typo reported by Alok
V6:
- Rename FWFT interface to remove "_local"
- Fix test for MEDELEG values in KVM FWFT support
- Add __init for unaligned_access_init()
- Rebased on master
V5:
- Return ERANGE as mapping for SBI_ERR_BAD_RANGE
- Removed unused sbi_fwft_get()
- Fix kernel for sbi_fwft_local_set_cpumask()
- Fix indentation for sbi_fwft_local_set()
- Remove spurious space in kvm_sbi_fwft_ops.
- Rebased on origin/master
- Remove fixes commits and sent them as a separate series [4]
V4:
- Check SBI version 3.0 instead of 2.0 for FWFT presence
- Use long for kvm_sbi_fwft operation return value
- Init KVM sbi extension even if default_disabled
- Remove revert_on_fail parameter for sbi_fwft_feature_set().
- Fix comments for sbi_fwft_set/get()
- Only handle local features (there are no globals yet in the spec)
- Add new SBI errors to sbi_err_map_linux_errno()
V3:
- Added comment about kvm sbi fwft supported/set/get callback
requirements
- Move struct kvm_sbi_fwft_feature in kvm_sbi_fwft.c
- Add a FWFT interface
V2:
- Added Kselftest for misaligned testing
- Added get_user() usage instead of __get_user()
- Reenable interrupt when possible in misaligned access handling
- Document that riscv supports unaligned-traps
- Fix KVM extension state when an init function is present
- Rework SBI misaligned accesses trap delegation code
- Added support for CPU hotplugging
- Added KVM SBI reset callback
- Added reset for KVM SBI FWFT lock
- Return SBI_ERR_DENIED_LOCKED when LOCK flag is set
Clément Léger (14):
riscv: sbi: add Firmware Feature (FWFT) SBI extensions definitions
riscv: sbi: remove useless parenthesis
riscv: sbi: add new SBI error mappings
riscv: sbi: add FWFT extension interface
riscv: sbi: add SBI FWFT extension calls
riscv: misaligned: request misaligned exception from SBI
riscv: misaligned: use on_each_cpu() for scalar misaligned access
probing
riscv: misaligned: use correct CONFIG_ ifdef for
misaligned_access_speed
riscv: misaligned: move emulated access uniformity check in a function
riscv: misaligned: add a function to check misalign trap delegability
RISC-V: KVM: add SBI extension init()/deinit() functions
RISC-V: KVM: add SBI extension reset callback
RISC-V: KVM: add support for FWFT SBI extension
RISC-V: KVM: add support for SBI_FWFT_MISALIGNED_DELEG
arch/riscv/include/asm/cpufeature.h | 12 +-
arch/riscv/include/asm/kvm_host.h | 5 +-
arch/riscv/include/asm/kvm_vcpu_sbi.h | 12 +
arch/riscv/include/asm/kvm_vcpu_sbi_fwft.h | 29 +++
arch/riscv/include/asm/sbi.h | 60 +++++
arch/riscv/include/uapi/asm/kvm.h | 1 +
arch/riscv/kernel/sbi.c | 81 ++++++-
arch/riscv/kernel/traps_misaligned.c | 112 ++++++++-
arch/riscv/kernel/unaligned_access_speed.c | 8 +-
arch/riscv/kvm/Makefile | 1 +
arch/riscv/kvm/vcpu.c | 4 +-
arch/riscv/kvm/vcpu_sbi.c | 54 +++++
arch/riscv/kvm/vcpu_sbi_fwft.c | 257 +++++++++++++++++++++
arch/riscv/kvm/vcpu_sbi_sta.c | 3 +-
14 files changed, 620 insertions(+), 19 deletions(-)
create mode 100644 arch/riscv/include/asm/kvm_vcpu_sbi_fwft.h
create mode 100644 arch/riscv/kvm/vcpu_sbi_fwft.c
--
2.49.0
This patch set convert the wireguard selftest to nftables, as iptables is
deparated and nftables is the default framework of most releases.
v6: fix typo in patch 1/2. Update the description (Phil Sutter)
v5: remove the counter in nft rules and link nft statically (Jason A. Donenfeld)
v4: no update, just re-send
v3: drop iptables directly (Jason A. Donenfeld)
Also convert to using nft for qemu testing (Jason A. Donenfeld)
v2: use one nft table for testing (Phil Sutter)
Hangbin Liu (2):
wireguard: selftests: convert iptables to nft
wireguard: selftests: update to using nft for qemu test
tools/testing/selftests/wireguard/netns.sh | 29 +++++++++------
.../testing/selftests/wireguard/qemu/Makefile | 36 ++++++++++++++-----
.../selftests/wireguard/qemu/kernel.config | 7 ++--
3 files changed, 49 insertions(+), 23 deletions(-)
--
2.46.0