From: Geliang Tang <tanggeliang(a)kylinos.cn>
v4:
- add more patches using make_sockaddr and get_socket_local_port
helpers.
v3:
- address comments of Martin and Eduard in v2. (thanks)
- move "int type" to the first argument of start_server_addr and
connect_to_addr.
- add start_server_addr_opts.
- using "sockaddr_storage" instead of "sockaddr".
- move start_server_setsockopt patches out of this series.
v2:
- update patch 6 only, fix errors reported by CI.
This patchset uses public helpers start_server_* and connect_to_* defined
in network_helpers.c to drop duplicate code.
Geliang Tang (14):
selftests/bpf: Update arguments of connect_to_addr
selftests/bpf: Add start_server_addr* helpers
selftests/bpf: Use start_server_addr in cls_redirect
selftests/bpf: Use connect_to_addr in cls_redirect
selftests/bpf: Use start_server_addr in sk_assign
selftests/bpf: Use connect_to_addr in sk_assign
selftests/bpf: Use get_socket_local_port in sk_assign
selftests/bpf: Use make_sockaddr in sk_assign
selftests/bpf: Use log_err in network_helpers
selftests/bpf: Use start_server_addr in test_sock_addr
selftests/bpf: Use connect_to_addr in test_sock_addr
selftests/bpf: Use make_sockaddr in test_sock_addr
selftests/bpf: Use make_sockaddr in test_sock
selftests/bpf: Use make_sockaddr in ip_check_defrag
tools/testing/selftests/bpf/Makefile | 4 +-
tools/testing/selftests/bpf/network_helpers.c | 37 ++++-
tools/testing/selftests/bpf/network_helpers.h | 6 +-
.../selftests/bpf/prog_tests/cls_redirect.c | 38 +----
.../selftests/bpf/prog_tests/empty_skb.c | 2 +
.../bpf/prog_tests/ip_check_defrag.c | 20 +--
.../selftests/bpf/prog_tests/sk_assign.c | 92 ++----------
.../selftests/bpf/prog_tests/sock_addr.c | 6 +-
.../selftests/bpf/prog_tests/tc_redirect.c | 2 +-
.../selftests/bpf/prog_tests/test_tunnel.c | 4 +
.../selftests/bpf/prog_tests/xdp_metadata.c | 16 +++
tools/testing/selftests/bpf/test_sock.c | 28 +---
tools/testing/selftests/bpf/test_sock_addr.c | 136 +++---------------
13 files changed, 104 insertions(+), 287 deletions(-)
--
2.40.1
This patch series ended up much larger than expected, please bear with
me! The goal here is to support vendor extensions, starting at probing
the device tree and ending with reporting to userspace.
The main design objective was to allow vendors to operate independently
of each other. This has been achieved by delegating vendor extensions to
a new struct "hart_isa_vendor" which is a counterpart to "hart_isa".
Each vendor will have their own list of extensions they support. Each
vendor will have a "namespace" to themselves which is set at the key
values of 0x8000 - 0x8080. It is up to the vendor's disgression how they
wish to allocate keys in the range for their vendor extensions.
Reporting to userspace follows a similar story, leveraging the hwprobe
syscall. There is a new hwprobe key RISCV_HWPROBE_KEY_VENDOR_EXT_0 that
is used to request supported vendor extensions. The vendor extension
keys are disambiguated by the vendor associated with the cpumask passed
into hwprobe. The entire 64-bit key space is available to each vendor.
On to the xtheadvector specific code. xtheadvector is a custom extension
that is based upon riscv vector version 0.7.1 [1]. All of the vector
routines have been modified to support this alternative vector version
based upon whether xtheadvector was determined to be supported at boot.
I have tested this with an Allwinner Nezha board. I ran into issues
booting the board on 6.9-rc1 so I applied these patches to 6.8. There
are a couple of minor merge conflicts that do arrise when doing that, so
please let me know if you have been able to boot this board with a 6.9
kernel. I used SkiffOS [2] to manage building the image, but upgraded
the U-Boot version to Samuel Holland's more up-to-date version [3] and
changed out the device tree used by U-Boot with the device trees that
are present in upstream linux and this series. Thank you Samuel for all
of the work you did to make this task possible.
To test the integration, I used the riscv vector kselftests. I modified
the test cases to be able to more easily extend them, and then added a
xtheadvector target that works by calling hwprobe and swapping out the
vector asm if needed.
[1] https://github.com/T-head-Semi/thead-extension-spec/blob/95358cb2cca9489361…
[2] https://github.com/skiffos/SkiffOS/tree/master/configs/allwinner/nezha
[3] https://github.com/smaeul/u-boot/commit/2e89b706f5c956a70c989cd31665f1429e9…
Signed-off-by: Charlie Jenkins <charlie(a)rivosinc.com>
---
Changes in v2:
- Added commit hash to xtheadvector
- Simplified riscv,isa vector removal fix to not mess with the DT
riscv,vendorid
- Moved riscv,vendorid parsing into a different patch and cache the
value to be used by alternative patching
- Reduce riscv,vendorid missing severity to "info"
- Separate vendor extension list to vendor files
- xtheadvector no longer puts v in the elf_hwcap
- Only patch vendor extension if all harts are associated with the same
vendor. This is the best chance the kernel has for working properly if
there are multiple vendors.
- Split hwprobe vendor keys out into vendor file
- Add attribution for Heiko's patches
- Link to v1: https://lore.kernel.org/r/20240411-dev-charlie-support_thead_vector_6_9-v1-…
---
Charlie Jenkins (16):
riscv: cpufeature: Fix thead vector hwcap removal
dt-bindings: riscv: Add xtheadvector ISA extension description
dt-bindings: riscv: Add vendorid
riscv: dts: allwinner: Add xtheadvector to the D1/D1s devicetree
riscv: Fix extension subset checking
riscv: Extend cpufeature.c to detect vendor extensions
riscv: Introduce vendor variants of extension helpers
riscv: drivers: Convert xandespmu to use the vendor extension framework
riscv: uaccess: Add alternative for xtheadvector uaccess
riscv: csr: Add CSR encodings for VCSR_VXRM/VCSR_VXSAT
riscv: Create xtheadvector file
riscv: vector: Support xtheadvector save/restore
riscv: hwprobe: Add vendor extension probing
riscv: hwprobe: Document vendor extensions and xtheadvector extension
selftests: riscv: Fix vector tests
selftests: riscv: Support xtheadvector in vector tests
Heiko Stuebner (1):
RISC-V: define the elements of the VCSR vector CSR
Documentation/arch/riscv/hwprobe.rst | 12 +
Documentation/devicetree/bindings/riscv/cpus.yaml | 5 +
.../devicetree/bindings/riscv/extensions.yaml | 10 +
arch/riscv/Kconfig | 2 +
arch/riscv/Kconfig.vendor | 11 +
arch/riscv/boot/dts/allwinner/sun20i-d1s.dtsi | 3 +-
arch/riscv/errata/sifive/errata.c | 2 +
arch/riscv/errata/thead/errata.c | 2 +
arch/riscv/include/asm/cpufeature.h | 170 +++++++++---
arch/riscv/include/asm/csr.h | 13 +
arch/riscv/include/asm/hwcap.h | 27 +-
arch/riscv/include/asm/hwprobe.h | 7 +-
arch/riscv/include/asm/sbi.h | 2 +
arch/riscv/include/asm/switch_to.h | 2 +-
arch/riscv/include/asm/vector.h | 246 +++++++++++++----
arch/riscv/include/asm/vendor_extensions.h | 18 ++
arch/riscv/include/asm/xtheadvector.h | 25 ++
arch/riscv/include/uapi/asm/hwprobe.h | 11 +-
arch/riscv/include/uapi/asm/vendor/thead.h | 3 +
arch/riscv/kernel/Makefile | 2 +
arch/riscv/kernel/cpu.c | 36 ++-
arch/riscv/kernel/cpufeature.c | 204 ++++++++++----
arch/riscv/kernel/kernel_mode_vector.c | 8 +-
arch/riscv/kernel/process.c | 4 +-
arch/riscv/kernel/signal.c | 6 +-
arch/riscv/kernel/sys_hwprobe.c | 54 +++-
arch/riscv/kernel/vector.c | 35 ++-
arch/riscv/kernel/vendor_extensions.c | 36 +++
arch/riscv/kernel/vendor_extensions/Makefile | 4 +
.../kernel/vendor_extensions/andes_extensions.c | 13 +
.../kernel/vendor_extensions/thead_extensions.c | 13 +
arch/riscv/lib/uaccess.S | 2 +
drivers/perf/riscv_pmu_sbi.c | 7 +-
tools/testing/selftests/riscv/vector/.gitignore | 3 +-
tools/testing/selftests/riscv/vector/Makefile | 17 +-
.../selftests/riscv/vector/v_exec_initval_nolibc.c | 93 +++++++
tools/testing/selftests/riscv/vector/v_helpers.c | 74 ++++++
tools/testing/selftests/riscv/vector/v_helpers.h | 7 +
tools/testing/selftests/riscv/vector/v_initval.c | 22 ++
.../selftests/riscv/vector/v_initval_nolibc.c | 68 -----
.../selftests/riscv/vector/vstate_exec_nolibc.c | 20 +-
.../testing/selftests/riscv/vector/vstate_prctl.c | 295 ++++++++++++---------
42 files changed, 1226 insertions(+), 368 deletions(-)
---
base-commit: 4cece764965020c22cff7665b18a012006359095
change-id: 20240411-dev-charlie-support_thead_vector_6_9-1591fc2a431d
--
- Charlie
This patch series ended up much larger than expected, please bear with
me! The goal here is to support vendor extensions, starting at probing
the device tree and ending with reporting to userspace.
The main design objective was to allow vendors to operate independently
of each other. This has been achieved by delegating vendor extensions to
a new struct "hart_isa_vendor" which is a counterpart to "hart_isa".
Each vendor will have their own list of extensions they support. Each
vendor will have a "namespace" to themselves which is set at the key
values of 0x8000 - 0x8080. It is up to the vendor's disgression how they
wish to allocate keys in the range for their vendor extensions.
Reporting to userspace follows a similar story, leveraging the hwprobe
syscall. There is a new hwprobe key RISCV_HWPROBE_KEY_VENDOR_EXT_0 that
is used to request supported vendor extensions. The vendor extension
keys are disambiguated by the vendor associated with the cpumask passed
into hwprobe. The entire 64-bit key space is available to each vendor.
On to the xtheadvector specific code. xtheadvector is a custom extension
that is based upon riscv vector version 0.7.1 [1]. All of the vector
routines have been modified to support this alternative vector version
based upon whether xtheadvector was determined to be supported at boot.
I have tested this with an Allwinner Nezha board. I ran into issues
booting the board on 6.9-rc1 so I applied these patches to 6.8. There
are a couple of minor merge conflicts that do arrise when doing that, so
please let me know if you have been able to boot this board with a 6.9
kernel. I used SkiffOS [2] to manage building the image, but upgraded
the U-Boot version to Samuel Holland's more up-to-date version [3] and
changed out the device tree used by U-Boot with the device trees that
are present in upstream linux and this series. Thank you Samuel for all
of the work you did to make this task possible.
To test the integration, I used the riscv vector kselftests. I modified
the test cases to be able to more easily extend them, and then added a
xtheadvector target that works by calling hwprobe and swapping out the
vector asm if needed.
[1] https://github.com/T-head-Semi/thead-extension-spec/blob/95358cb2cca9489361…
[2] https://github.com/skiffos/SkiffOS/tree/master/configs/allwinner/nezha
[3] https://github.com/smaeul/u-boot/commit/2e89b706f5c956a70c989cd31665f1429e9…
Signed-off-by: Charlie Jenkins <charlie(a)rivosinc.com>
---
Charlie Jenkins (18):
dt-bindings: riscv: Add vendorid and archid
riscv: cpufeature: Fix thead vector hwcap removal
dt-bindings: riscv: Add xtheadvector ISA extension description
riscv: dts: allwinner: Add xtheadvector to the D1/D1s devicetree
riscv: Fix extension subset checking
riscv: Extend cpufeature.c to detect vendor extensions
riscv: Optimize riscv_cpu_isa_extension_(un)likely()
riscv: Introduce vendor variants of extension helpers
riscv: uaccess: Add alternative for xtheadvector uaccess
riscv: csr: Add CSR encodings for VCSR_VXRM/VCSR_VXSAT
riscv: Create xtheadvector file
riscv: vector: Support xtheadvector save/restore
riscv: hwprobe: Disambiguate vector and xtheadvector in hwprobe
riscv: hwcap: Add v to hwcap if xtheadvector enabled
riscv: hwprobe: Add vendor extension probing
riscv: hwprobe: Document vendor extensions and xtheadvector extension
selftests: riscv: Fix vector tests
selftests: riscv: Support xtheadvector in vector tests
Heiko Stuebner (1):
RISC-V: define the elements of the VCSR vector CSR
Documentation/arch/riscv/hwprobe.rst | 12 +
Documentation/devicetree/bindings/riscv/cpus.yaml | 11 +
.../devicetree/bindings/riscv/extensions.yaml | 9 +
arch/riscv/boot/dts/allwinner/sun20i-d1s.dtsi | 4 +-
arch/riscv/include/asm/cpufeature.h | 143 +++++++---
arch/riscv/include/asm/csr.h | 13 +
arch/riscv/include/asm/hwcap.h | 23 ++
arch/riscv/include/asm/hwprobe.h | 4 +-
arch/riscv/include/asm/sbi.h | 2 +
arch/riscv/include/asm/vector.h | 228 ++++++++++++----
arch/riscv/include/asm/xtheadvector.h | 25 ++
arch/riscv/include/uapi/asm/hwprobe.h | 10 +-
arch/riscv/kernel/cpu.c | 20 ++
arch/riscv/kernel/cpufeature.c | 264 +++++++++++++++---
arch/riscv/kernel/kernel_mode_vector.c | 4 +-
arch/riscv/kernel/sys_hwprobe.c | 59 ++++-
arch/riscv/kernel/vector.c | 22 +-
arch/riscv/lib/uaccess.S | 1 +
tools/testing/selftests/riscv/vector/.gitignore | 3 +-
tools/testing/selftests/riscv/vector/Makefile | 17 +-
.../selftests/riscv/vector/v_exec_initval_nolibc.c | 93 +++++++
tools/testing/selftests/riscv/vector/v_helpers.c | 66 +++++
tools/testing/selftests/riscv/vector/v_helpers.h | 7 +
tools/testing/selftests/riscv/vector/v_initval.c | 22 ++
.../selftests/riscv/vector/v_initval_nolibc.c | 68 -----
.../selftests/riscv/vector/vstate_exec_nolibc.c | 20 +-
.../testing/selftests/riscv/vector/vstate_prctl.c | 295 ++++++++++++---------
27 files changed, 1114 insertions(+), 331 deletions(-)
---
base-commit: 4cece764965020c22cff7665b18a012006359095
change-id: 20240411-dev-charlie-support_thead_vector_6_9-1591fc2a431d
--
- Charlie
Add support for (yet again) more RVA23U64 missing extensions. Add
support for Zcmop, Zca, Zcf, Zcd and Zcb extensions isa string parsing,
hwprobe and kvm support. Zce, Zcmt and Zcmp extensions have been left
out since they target microcontrollers/embedded CPUs and are not needed
by RVA23U64
This series is based on the Zimop one [1].
Link: https://lore.kernel.org/linux-riscv/20240404103254.1752834-1-cleger@rivosin… [1]
Clément Léger (10):
dt-bindings: riscv: add Zca, Zcf, Zcd and Zcb ISA extension
description
riscv: add ISA parsing for Zca, Zcf, Zcd and Zcb
riscv: hwprobe: export Zca, Zcf, Zcd and Zcb ISA extensions
RISC-V: KVM: Allow Zca, Zcf, Zcd and Zcb extensions for Guest/VM
KVM: riscv: selftests: Add some Zc* extensions to get-reg-list test
dt-bindings: riscv: add Zcmop ISA extension description
riscv: add ISA extension parsing for Zcmop
riscv: hwprobe: export Zcmop ISA extension
RISC-V: KVM: Allow Zcmop extension for Guest/VM
KVM: riscv: selftests: Add Zcmop extension to get-reg-list test
Documentation/arch/riscv/hwprobe.rst | 24 ++++++++++++
.../devicetree/bindings/riscv/extensions.yaml | 37 +++++++++++++++++++
arch/riscv/include/asm/hwcap.h | 5 +++
arch/riscv/include/uapi/asm/hwprobe.h | 5 +++
arch/riscv/include/uapi/asm/kvm.h | 5 +++
arch/riscv/kernel/cpufeature.c | 5 +++
arch/riscv/kernel/sys_hwprobe.c | 5 +++
arch/riscv/kvm/vcpu_onereg.c | 10 +++++
.../selftests/kvm/riscv/get-reg-list.c | 20 ++++++++++
9 files changed, 116 insertions(+)
--
2.43.0
Fix the error reported by clang:
In file included from mdwe_test.c:17:
./../kselftest_harness.h:1169:2: error: call to undeclared function
'asprintf'; ISO C99 and later do not support implicit function
declarations [-Wimplicit-function-declaration]
1169 | asprintf(&test_name, "%s%s%s.%s", f->name,
| ^
1 warning generated.
The gcc reports it as warning:
In file included from mdwe_test.c:17:
../kselftest_harness.h: In function ‘__run_test’:
../kselftest_harness.h:1169:9: warning: implicit declaration of function
‘asprintf’; did you mean ‘vsprintf’? [-Wimplicit-function-declaration]
1169 | asprintf(&test_name, "%s%s%s.%s", f->name,
| ^~~~~~~~
| vsprintf
Fix this by setting _GNU_SOURCE macro needed to get exposure to the
asprintf().
Signed-off-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
---
tools/testing/selftests/mm/mdwe_test.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/mm/mdwe_test.c b/tools/testing/selftests/mm/mdwe_test.c
index 200bedcdc32e9..1e01d3ddc11c5 100644
--- a/tools/testing/selftests/mm/mdwe_test.c
+++ b/tools/testing/selftests/mm/mdwe_test.c
@@ -7,6 +7,7 @@
#include <linux/mman.h>
#include <linux/prctl.h>
+#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <sys/auxv.h>
--
2.39.2
If libcpupower is not properly installed somehow, the cpupower tool
cannot work, and cpu min_freq and max_freq are not correctly assigned,
but the code can still keep going and gives an "ok" result at last,
which seems not to be the expected behavior of this test.
tools/testing/selftests/intel_pstate# make run_tests
TAP version 13
1..1
# selftests: intel_pstate: run.sh
# cpupower: error while loading shared libraries: libcpupower.so.1: cannot open shared object file: No such file or directory
# ./run.sh: line 90: / 1000: syntax error: operand expected (error token is "/ 1000")
# cpupower: error while loading shared libraries: libcpupower.so.1: cannot open shared object file: No such file or directory
# ./run.sh: line 92: / 1000: syntax error: operand expected (error token is "/ 1000")
# ========================================================================
# The marketing frequency of the cpu is 3400 MHz
# The maximum frequency of the cpu is MHz
# The minimum frequency of the cpu is MHz
# Target Actual Difference MSR(0x199) max_perf_pct
ok 1 selftests: intel_pstate: run.sh
Fix this by adding null checks as well as [ $var -eq $var ] checks to
confirm that both min_freq and max_freq are valid integers. The fixed
result will have a "# SKIP" suffix and looks like:
tools/testing/selftests/intel_pstate# make run_tests
TAP version 13
1..1
# selftests: intel_pstate: run.sh
# cpupower: error while loading shared libraries: libcpupower.so.1: cannot open shared object file: No such file or directory
# ./run.sh: line 90: / 1000: syntax error: operand expected (error token is "/ 1000")
# cpupower: error while loading shared libraries: libcpupower.so.1: cannot open shared object file: No such file or directory
# ./run.sh: line 92: / 1000: syntax error: operand expected (error token is "/ 1000")
# Cannot get cpu frequency info
ok 1 selftests: intel_pstate: run.sh # SKIP
Reported-by: kernel test robot <lkp(a)intel.com>
Signed-off-by: Yujie Liu <yujie.liu(a)intel.com>
---
tools/testing/selftests/intel_pstate/run.sh | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/tools/testing/selftests/intel_pstate/run.sh b/tools/testing/selftests/intel_pstate/run.sh
index e7008f614ad7..e81758cd1fb5 100755
--- a/tools/testing/selftests/intel_pstate/run.sh
+++ b/tools/testing/selftests/intel_pstate/run.sh
@@ -91,6 +91,11 @@ min_freq=$(($_min_freq / 1000))
_max_freq=$(cpupower frequency-info -l | tail -1 | awk ' { print $2 } ')
max_freq=$(($_max_freq / 1000))
+{ [ $min_freq ] && [ $min_freq -eq $min_freq ] &&
+ [ $max_freq ] && [ $max_freq -eq $max_freq ]; } || {
+ echo "Cannot get cpu frequency info"
+ exit $ksft_skip
+}
[ $EVALUATE_ONLY -eq 0 ] && for freq in `seq $max_freq -100 $min_freq`
do
--
2.34.1
From: Jeff Xu <jeffxu(a)chromium.org>
This patch is a follow up to the comments [1] on test code during
mseal discussion. This is style only change to the selftest code, not to
test code logic.
Please apply on top of mseal v10 patch [2]
[1]
https://lore.kernel.org/all/e1744539-a843-468a-9101-ce7a08669394@collabora.…
[2]
https://lore.kernel.org/all/20240415163527.626541-1-jeffxu@chromium.org/
Thanks
Jeff Xu (1):
selftest mm/mseal: style change
tools/testing/selftests/mm/mseal_test.c | 124 +++++++++++++++++-------
tools/testing/selftests/mm/seal_elf.c | 3 -
2 files changed, 91 insertions(+), 36 deletions(-)
--
2.44.0.683.g7961c838ac-goog
This series introduces the selftests/arm directory, which tests 32 and 64-bit
kernel compatibility with 32-bit ELFs running on the Aarch platform.
The need for this bucket of tests is that 32 bit applications built on legacy
ARM architecture must not break on the new Aarch64 platforms and the 64-bit
kernel. The kernel must emulate the data structures, system calls and the
registers according to Aarch32, when running a 32-bit process; this directory
fills that testing requirement.
One may find similarity between this directory and selftests/arm64; it is
advisable to refer to that since a lot has been copied from there itself.
The mm directory includes a test for checking 4GB limit of the virtual
address space of a process.
The signal directory contains two tests, following a common theme: mangle
with arm_cpsr, dumped by the kernel to user space while invoking the signal
handler; kernel must spot this illegal attempt and terminate the program by
SEGV.
The elf directory includes a test for checking the 32-bit status of the ELF.
The series has been tested on 6.9.0-rc2, on Aarch64 platform. Testing remains
to be done on Aaarch32.
Dev Jain (4):
selftests/arm: Add mm test
selftests/arm: Add signal tests
selftests/arm: Add elf test
selftests: Add build infrastructure along with README
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/arm/Makefile | 57 ++++
tools/testing/selftests/arm/README | 31 +++
tools/testing/selftests/arm/elf/Makefile | 6 +
tools/testing/selftests/arm/elf/parse_elf.c | 75 +++++
tools/testing/selftests/arm/mm/Makefile | 6 +
tools/testing/selftests/arm/mm/compat_va.c | 94 +++++++
tools/testing/selftests/arm/signal/Makefile | 30 ++
.../selftests/arm/signal/test_signals.c | 27 ++
.../selftests/arm/signal/test_signals.h | 74 +++++
.../selftests/arm/signal/test_signals_utils.c | 257 ++++++++++++++++++
.../selftests/arm/signal/test_signals_utils.h | 128 +++++++++
.../signal/testcases/mangle_cpsr_aif_bits.c | 33 +++
.../mangle_cpsr_invalid_compat_toggle.c | 29 ++
14 files changed, 848 insertions(+)
create mode 100644 tools/testing/selftests/arm/Makefile
create mode 100644 tools/testing/selftests/arm/README
create mode 100644 tools/testing/selftests/arm/elf/Makefile
create mode 100644 tools/testing/selftests/arm/elf/parse_elf.c
create mode 100644 tools/testing/selftests/arm/mm/Makefile
create mode 100644 tools/testing/selftests/arm/mm/compat_va.c
create mode 100644 tools/testing/selftests/arm/signal/Makefile
create mode 100644 tools/testing/selftests/arm/signal/test_signals.c
create mode 100644 tools/testing/selftests/arm/signal/test_signals.h
create mode 100644 tools/testing/selftests/arm/signal/test_signals_utils.c
create mode 100644 tools/testing/selftests/arm/signal/test_signals_utils.h
create mode 100644 tools/testing/selftests/arm/signal/testcases/mangle_cpsr_aif_bits.c
create mode 100644 tools/testing/selftests/arm/signal/testcases/mangle_cpsr_invalid_compat_toggle.c
--
2.39.2
Hi!
Implement support for tests which require access to a remote system /
endpoint which can generate traffic.
This series concludes the "groundwork" for upstream driver tests.
I wanted to support the three models which came up in discussions:
- SW testing with netdevsim
- "local" testing with two ports on the same system in a loopback
- "remote" testing via SSH
so there is a tiny bit of an abstraction which wraps up how "remote"
commands are executed. Otherwise hopefully there's nothing surprising.
I'm only adding a ping test. I had a bigger one written but I was
worried we'll get into discussing the details of the test itself
and how I chose to hack up netdevsim, instead of the test infra...
So that test will be a follow up :)
v2:
- rename endpoint -> remote
- use 2001:db8:: v6 prefix
- add a note about persistent SSH connections
- add the kernel config
v1: https://lore.kernel.org/all/20240412233705.1066444-1-kuba@kernel.org
Jakub Kicinski (6):
selftests: drv-net: add stdout to the command failed exception
selftests: drv-net: add config for netdevsim
selftests: drv-net: define endpoint structures
selftests: drv-net: factor out parsing of the env
selftests: drv-net: construct environment for running tests which
require an endpoint
selftests: drv-net: add a trivial ping test
tools/testing/selftests/drivers/net/Makefile | 5 +-
.../testing/selftests/drivers/net/README.rst | 33 ++++
tools/testing/selftests/drivers/net/config | 2 +
.../selftests/drivers/net/lib/py/__init__.py | 1 +
.../selftests/drivers/net/lib/py/env.py | 141 +++++++++++++++---
.../selftests/drivers/net/lib/py/remote.py | 13 ++
.../drivers/net/lib/py/remote_netns.py | 15 ++
.../drivers/net/lib/py/remote_ssh.py | 34 +++++
tools/testing/selftests/drivers/net/ping.py | 32 ++++
.../testing/selftests/net/lib/py/__init__.py | 1 +
tools/testing/selftests/net/lib/py/netns.py | 31 ++++
tools/testing/selftests/net/lib/py/utils.py | 22 +--
12 files changed, 301 insertions(+), 29 deletions(-)
create mode 100644 tools/testing/selftests/drivers/net/config
create mode 100644 tools/testing/selftests/drivers/net/lib/py/remote.py
create mode 100644 tools/testing/selftests/drivers/net/lib/py/remote_netns.py
create mode 100644 tools/testing/selftests/drivers/net/lib/py/remote_ssh.py
create mode 100755 tools/testing/selftests/drivers/net/ping.py
create mode 100644 tools/testing/selftests/net/lib/py/netns.py
--
2.44.0
Let the compilers (clang) know that this function would just call
exit() and would never return. It is needed to avoid false positive
static analysis errors. All similar functions calling exit()
unconditionally have been marked as __noreturn.
Signed-off-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
---
This patch has been suggested [1] and tested on top of the following
patches:
- f07041728422 ("selftests: add ksft_exit_fail_perror()") which is
in kselftest tree already
- ("kselftest: Mark functions that unconditionally call exit() as
__noreturn") would appear in tip/timers/urgent
[1] https://lore.kernel.org/all/8254ab4d-9cb6-402e-80dd-d9ec70d77de5@linuxfound…
---
tools/testing/selftests/kselftest.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/kselftest.h b/tools/testing/selftests/kselftest.h
index 050c5fd018400..b43a7a7ca4b40 100644
--- a/tools/testing/selftests/kselftest.h
+++ b/tools/testing/selftests/kselftest.h
@@ -372,7 +372,7 @@ static inline __printf(1, 2) int ksft_exit_fail_msg(const char *msg, ...)
exit(KSFT_FAIL);
}
-static inline void ksft_exit_fail_perror(const char *msg)
+static inline __noreturn void ksft_exit_fail_perror(const char *msg)
{
#ifndef NOLIBC
ksft_exit_fail_msg("%s: %s (%d)\n", msg, strerror(errno), errno);
--
2.39.2
Currently, the migration worker delays 1-10 us, assuming that one
KVM_RUN iteration only takes a few microseconds. But if C-state exit
latencies are large enough, for example, hundreds or even thousands
of microseconds on server CPUs, it may happen that it's not able to
bring the target CPU out of C-state before the migration worker starts
to migrate it to the next CPU.
If the system workload is light, most CPUs could be at a certain level
of C-state, which may result in less successful migrations and fail the
migration/KVM_RUN ratio sanity check.
This patch adds a command line option to skip the sanity check in
this case.
Additionally, seems it's reasonable to randomize the length of usleep(),
other than delay in a fixed pattern.
V2:
- removed the busy loop implementation
- add the new "-s" option
Signed-off-by: Zide Chen <zide.chen(a)intel.com>
---
tools/testing/selftests/kvm/rseq_test.c | 37 +++++++++++++++++++++++--
1 file changed, 34 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/kvm/rseq_test.c b/tools/testing/selftests/kvm/rseq_test.c
index 28f97fb52044..515cfa32a925 100644
--- a/tools/testing/selftests/kvm/rseq_test.c
+++ b/tools/testing/selftests/kvm/rseq_test.c
@@ -150,7 +150,7 @@ static void *migration_worker(void *__rseq_tid)
* Use usleep() for simplicity and to avoid unnecessary kernel
* dependencies.
*/
- usleep((i % 10) + 1);
+ usleep((rand() % 10) + 1);
}
done = true;
return NULL;
@@ -186,12 +186,35 @@ static void calc_min_max_cpu(void)
"Only one usable CPU, task migration not possible");
}
+static void usage(const char *name)
+{
+ puts("");
+ printf("usage: %s [-h] [-s]\n", name);
+ printf(" -s: skip the sanity check for successful KVM_RUN.\n");
+ puts("");
+ exit(0);
+}
+
int main(int argc, char *argv[])
{
int r, i, snapshot;
struct kvm_vm *vm;
struct kvm_vcpu *vcpu;
u32 cpu, rseq_cpu;
+ bool skip_sanity_check = false;
+ int opt;
+
+ while ((opt = getopt(argc, argv, "sh")) != -1) {
+ switch (opt) {
+ case 's':
+ skip_sanity_check = true;
+ break;
+ case 'h':
+ default:
+ usage(argv[0]);
+ break;
+ }
+ }
r = sched_getaffinity(0, sizeof(possible_mask), &possible_mask);
TEST_ASSERT(!r, "sched_getaffinity failed, errno = %d (%s)", errno,
@@ -199,6 +222,8 @@ int main(int argc, char *argv[])
calc_min_max_cpu();
+ srand(time(NULL));
+
r = rseq_register_current_thread();
TEST_ASSERT(!r, "rseq_register_current_thread failed, errno = %d (%s)",
errno, strerror(errno));
@@ -254,9 +279,15 @@ int main(int argc, char *argv[])
* getcpu() to stabilize. A 2:1 migration:KVM_RUN ratio is a fairly
* conservative ratio on x86-64, which can do _more_ KVM_RUNs than
* migrations given the 1us+ delay in the migration task.
+ *
+ * Another reason why it may have small migration:KVM_RUN ratio is that,
+ * on systems with large C-state exit latency, it may happen quite often
+ * that the scheduler is not able to wake up the target CPU before the
+ * vCPU thread is scheduled to another CPU.
*/
- TEST_ASSERT(i > (NR_TASK_MIGRATIONS / 2),
- "Only performed %d KVM_RUNs, task stalled too much?", i);
+ TEST_ASSERT(skip_sanity_check || i > (NR_TASK_MIGRATIONS / 2),
+ "Only performed %d KVM_RUNs, task stalled too much? "
+ "Try to turn off C-states or run it with the -s option", i);
pthread_join(migration_thread, NULL);
--
2.34.1
This is a loose follow-up to the Kernel CI patchset posted recently. It
contains various fixes that were supposed to be part of said patchset, but
didn't fit due to its size. The latter 4 patches were written independently
of the CI effort, but again didn't fit in their intended patchsets.
- Patch #1 unifies code of two very similar looking functions, busywait()
and slowwait().
- Patch #2 adds sanity checks around the setting of NETIFS, which carries
list of interfaces to run on.
- Patch #3 changes bail_on_lldpad() to SKIP instead of FAILing.
- Patches #4 to #7 fix issues in selftests.
- Patches #8 to #10 add topology diagrams to several selftests.
This should have been part of the mlxsw leg of NH group stats patches,
but again, it did not fit in due to size.
Danielle Ratson (1):
selftests: mlxsw: ethtool_lanes: Wait for lanes parameter dump
explicitly
Petr Machata (9):
selftests: net: Unify code of busywait() and slowwait()
selftests: forwarding: lib.sh: Validate NETIFS
selftests: forwarding: bail_on_lldpad() should SKIP
selftests: drivers: hw: Fix ethtool_rmon
selftests: drivers: hw: ethtool.sh: Adjust output
selftests: drivers: hw: Include tc_common.sh in hw_stats_l3
selftests: forwarding: router_mpath_nh: Add a diagram
selftests: forwarding: router_mpath_nh_res: Add a diagram
selftests: forwarding: router_nh: Add a diagram
.../selftests/drivers/net/hw/ethtool.sh | 15 +++---
.../selftests/drivers/net/hw/ethtool_rmon.sh | 1 +
.../selftests/drivers/net/hw/hw_stats_l3.sh | 1 +
.../drivers/net/hw/hw_stats_l3_gre.sh | 1 +
.../drivers/net/mlxsw/ethtool_lanes.sh | 14 +++---
tools/testing/selftests/net/forwarding/lib.sh | 49 +++++++++----------
.../net/forwarding/router_mpath_nh.sh | 35 +++++++++++++
.../net/forwarding/router_mpath_nh_res.sh | 35 +++++++++++++
.../selftests/net/forwarding/router_nh.sh | 14 ++++++
tools/testing/selftests/net/lib.sh | 16 ++++--
10 files changed, 137 insertions(+), 44 deletions(-)
--
2.43.0
This series implements SBI PMU improvements done in SBI v2.0[1] i.e. PMU snapshot
and fw_read_hi() functions.
SBI v2.0 introduced PMU snapshot feature which allows the SBI implementation
to provide counter information (i.e. values/overflow status) via a shared
memory between the SBI implementation and supervisor OS. This allows to minimize
the number of traps in when perf being used inside a kvm guest as it relies on
SBI PMU + trap/emulation of the counters.
The current set of ratified RISC-V specification also doesn't allow scountovf
to be trap/emulated by the hypervisor. The SBI PMU snapshot bridges the gap
in ISA as well and enables perf sampling in the guest. However, LCOFI in the
guest only works via IRQ filtering in AIA specification. That's why, AIA
has to be enabled in the hardware (at least the Ssaia extension) in order to
use the sampling support in the perf.
Here are the patch wise implementation details.
PATCH 1,4,7,8,9,10,11,15 : Generic cleanups/improvements.
PATCH 2,3,14 : FW_READ_HI function implementation
PATCH 5-6: Add PMU snapshot feature in sbi pmu driver
PATCH 12-13: KVM implementation for snapshot and sampling in kvm guests
PATCH 16-17: Generic improvements for kvm selftests
PATCH 18-22: KVM selftests for SBI PMU extension
The series is based on v6.9-rc1 and is available at:
https://github.com/atishp04/linux/tree/kvm_pmu_snapshot_v6
The kvmtool patch is also available at:
https://github.com/atishp04/kvmtool/tree/sscofpmf
It also requires Ssaia ISA extension to be present in the hardware in order to
get perf sampling support in the guest. In Qemu virt machine, it can be done
by the following config.
```
-cpu rv64,sscofpmf=true,x-ssaia=true
```
There is no other dependencies on AIA apart from that. Thus, Ssaia must be disabled
for the guest if AIA patches are not available. Here is the example command.
```
./lkvm-static run -m 256 -c2 --console serial -p "console=ttyS0 earlycon" --disable-ssaia -k ./Image --debug
```
The series has been tested only in Qemu.
Here is the snippet of the perf running inside a kvm guest.
===================================================
$ perf record -e cycles -e instructions perf bench sched messaging -g 5
...
$ Running 'sched/messaging' benchmark:
...
[ 45.928723] perf_duration_warn: 2 callbacks suppressed
[ 45.929000] perf: interrupt took too long (484426 > 483186), lowering kernel.perf_event_max_sample_rate to 250
$ 20 sender and receiver processes per group
$ 5 groups == 200 processes run
Total time: 14.220 [sec]
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.117 MB perf.data (1942 samples) ]
$ perf report --stdio
$ To display the perf.data header info, please use --header/--header-only optio>
$
$
$ Total Lost Samples: 0
$
$ Samples: 943 of event 'cycles'
$ Event count (approx.): 5128976844
$
$ Overhead Command Shared Object Symbol >
$ ........ ............... ........................... .....................>
$
7.59% sched-messaging [kernel.kallsyms] [k] memcpy
5.48% sched-messaging [kernel.kallsyms] [k] percpu_counter_ad>
5.24% sched-messaging [kernel.kallsyms] [k] __sbi_rfence_v02_>
4.00% sched-messaging [kernel.kallsyms] [k] _raw_spin_unlock_>
3.79% sched-messaging [kernel.kallsyms] [k] set_pte_range
3.72% sched-messaging [kernel.kallsyms] [k] next_uptodate_fol>
3.46% sched-messaging [kernel.kallsyms] [k] filemap_map_pages
3.31% sched-messaging [kernel.kallsyms] [k] handle_mm_fault
3.20% sched-messaging [kernel.kallsyms] [k] finish_task_switc>
3.16% sched-messaging [kernel.kallsyms] [k] clear_page
3.03% sched-messaging [kernel.kallsyms] [k] mtree_range_walk
2.42% sched-messaging [kernel.kallsyms] [k] flush_icache_pte
===================================================
[1] https://github.com/riscv-non-isa/riscv-sbi-doc
Changes from v5->v6:
1. Added a patch for command line option for the sbi pmu tests.
2. Removed redundant prints and restructure the code little bit.
3. Added a patch for computing the sbi minor version correctly.
4. Addressed all other comments on v5.
Changes from v4->v5:
1. Moved sbi related definitions to its own header file from processor.h
2. Added few helper functions for selftests.
3. Improved firmware counter read and RV32 start/stop functions.
4. Converted all the shifting operations to use BIT macro
5. Addressed all other comments on v4.
Changes from v3->v4:
1. Added selftests.
2. Fixed an issue to clear the interrupt pending bits.
3. Fixed the counter index in snapshot memory start function.
Changes from v2->v3:
1. Fixed a patchwork warning on patch6.
2. Fixed a comment formatting & nit fix in PATCH 3 & 5.
3. Moved the hvien update and sscofpmf enabling to PATCH 9 from PATCH 8.
Changes from v1->v2:
1. Fixed warning/errors from patchwork CI.
2. Rebased on top of kvm-next.
3. Added Acked-by tags.
Changes from RFC->v1:
1. Addressed all the comments on RFC series.
2. Removed PATCH2 and merged into later patches.
3. Added 2 more patches for minor fixes.
4. Fixed KVM boot issue without Ssaia and made sscofpmf in guest dependent on
Ssaia in the host.
Atish Patra (24):
RISC-V: Fix the typo in Scountovf CSR name
RISC-V: Add FIRMWARE_READ_HI definition
drivers/perf: riscv: Read upper bits of a firmware counter
drivers/perf: riscv: Use BIT macro for shifting operations
RISC-V: Add SBI PMU snapshot definitions
RISC-V: KVM: Rename the SBI_STA_SHMEM_DISABLE to a generic name
RISC-V: Use the minor version mask while computing sbi version
drivers/perf: riscv: Implement SBI PMU snapshot function
drivers/perf: riscv: Fix counter mask iteration for RV32
RISC-V: KVM: Fix the initial sample period value
RISC-V: KVM: No need to update the counter value during reset
RISC-V: KVM: No need to exit to the user space if perf event failed
RISC-V: KVM: Implement SBI PMU Snapshot feature
RISC-V: KVM: Add perf sampling support for guests
RISC-V: KVM: Support 64 bit firmware counters on RV32
RISC-V: KVM: Improve firmware counter read function
KVM: riscv: selftests: Move sbi definitions to its own header file
KVM: riscv: selftests: Add helper functions for extension checks
KVM: riscv: selftests: Add Sscofpmf to get-reg-list test
KVM: riscv: selftests: Add SBI PMU extension definitions
KVM: riscv: selftests: Add SBI PMU selftest
KVM: riscv: selftests: Add a test for PMU snapshot functionality
KVM: riscv: selftests: Add a test for counter overflow
KVM: riscv: selftests: Add commandline option for SBI PMU test
arch/riscv/include/asm/csr.h | 5 +-
arch/riscv/include/asm/kvm_vcpu_pmu.h | 16 +-
arch/riscv/include/asm/sbi.h | 38 +-
arch/riscv/include/uapi/asm/kvm.h | 1 +
arch/riscv/kernel/paravirt.c | 6 +-
arch/riscv/kvm/aia.c | 5 +
arch/riscv/kvm/vcpu.c | 15 +-
arch/riscv/kvm/vcpu_onereg.c | 6 +
arch/riscv/kvm/vcpu_pmu.c | 260 ++++++-
arch/riscv/kvm/vcpu_sbi_pmu.c | 17 +-
arch/riscv/kvm/vcpu_sbi_sta.c | 4 +-
drivers/perf/riscv_pmu.c | 1 +
drivers/perf/riscv_pmu_sbi.c | 272 ++++++-
include/linux/perf/riscv_pmu.h | 6 +
tools/testing/selftests/kvm/Makefile | 1 +
.../selftests/kvm/include/riscv/processor.h | 49 +-
.../testing/selftests/kvm/include/riscv/sbi.h | 141 ++++
.../selftests/kvm/include/riscv/ucall.h | 1 +
.../selftests/kvm/lib/riscv/processor.c | 12 +
.../testing/selftests/kvm/riscv/arch_timer.c | 2 +-
.../selftests/kvm/riscv/get-reg-list.c | 4 +
.../selftests/kvm/riscv/sbi_pmu_test.c | 685 ++++++++++++++++++
tools/testing/selftests/kvm/steal_time.c | 4 +-
23 files changed, 1437 insertions(+), 114 deletions(-)
create mode 100644 tools/testing/selftests/kvm/include/riscv/sbi.h
create mode 100644 tools/testing/selftests/kvm/riscv/sbi_pmu_test.c
--
2.34.1
Hi!
Implement support for tests which require access to a remote system /
endpoint which can generate traffic.
This series concludes the "groundwork" for upstream driver tests.
I wanted to support the three models which came up in discussions:
- SW testing with netdevsim
- "local" testing with two ports on the same system in a loopback
- "remote" testing via SSH
so there is a tiny bit of an abstraction which wraps up how "remote"
commands are executed. Otherwise hopefully there's nothing surprising.
I'm only adding a ping test. I had a bigger one written but I was
worried we'll get into discussing the details of the test itself
and how I chose to hack up netdevsim, instead of the test infra...
So that test will be a follow up :)
---
TBH, this series is on top of the one I posted in the morning:
https://lore.kernel.org/all/20240412141436.828666-1-kuba@kernel.org/
but it applies cleanly, and all it needs is the ifindex definition
in netdevsim. Testing with real HW works fine even without the other
series.
Jakub Kicinski (5):
selftests: drv-net: define endpoint structures
selftests: drv-net: add stdout to the command failed exception
selftests: drv-net: factor out parsing of the env
selftests: drv-net: construct environment for running tests which
require an endpoint
selftests: drv-net: add a trivial ping test
tools/testing/selftests/drivers/net/Makefile | 4 +-
.../testing/selftests/drivers/net/README.rst | 31 ++++
.../selftests/drivers/net/lib/py/__init__.py | 1 +
.../selftests/drivers/net/lib/py/endpoint.py | 13 ++
.../selftests/drivers/net/lib/py/env.py | 136 +++++++++++++++---
.../selftests/drivers/net/lib/py/ep_netns.py | 15 ++
.../selftests/drivers/net/lib/py/ep_ssh.py | 34 +++++
tools/testing/selftests/drivers/net/ping.py | 32 +++++
.../testing/selftests/net/lib/py/__init__.py | 1 +
tools/testing/selftests/net/lib/py/netns.py | 31 ++++
tools/testing/selftests/net/lib/py/utils.py | 22 +--
11 files changed, 291 insertions(+), 29 deletions(-)
create mode 100644 tools/testing/selftests/drivers/net/lib/py/endpoint.py
create mode 100644 tools/testing/selftests/drivers/net/lib/py/ep_netns.py
create mode 100644 tools/testing/selftests/drivers/net/lib/py/ep_ssh.py
create mode 100755 tools/testing/selftests/drivers/net/ping.py
create mode 100644 tools/testing/selftests/net/lib/py/netns.py
--
2.44.0
Hi Linus,
Please pull the following kselftest fixes update for Linux 6.9-rc5.
This kselftest fixes update for Linux 6.9-rc5 consists of a fix to
kselftest harness to prevent infinite loop triggered in an assert
in FIXTURE_TEARDOWN and a fix to a problem seen in being able to stop
subsystem-enable tests when sched events are being traced.
diff is attached.
thanks,
-- Shuah
----------------------------------------------------------------
The following changes since commit 224fe424c356cb5c8f451eca4127f32099a6f764:
selftests: dmabuf-heap: add config file for the test (2024-03-29 13:57:14 -0600)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest tags/linux_kselftest-fixes-6.9-rc5
for you to fetch changes up to 72d7cb5c190befbb095bae7737e71560ec0fcaa6:
selftests/harness: Prevent infinite loop due to Assert in FIXTURE_TEARDOWN (2024-04-04 10:50:53 -0600)
----------------------------------------------------------------
linux_kselftest-fixes-6.9-rc5
This kselftest fixes update for Linux 6.9-rc5 consists of a fix to
kselftest harness to prevent infinite loop triggered in an assert
in FIXTURE_TEARDOWN and a fix to a problem seen in being able to stop
subsystem-enable tests when sched events are being traced.
----------------------------------------------------------------
Shengyu Li (1):
selftests/harness: Prevent infinite loop due to Assert in FIXTURE_TEARDOWN
Yuanhe Shu (1):
selftests/ftrace: Limit length in subsystem-enable tests
tools/testing/selftests/ftrace/test.d/event/subsystem-enable.tc | 6 +++---
tools/testing/selftests/kselftest_harness.h | 5 ++++-
2 files changed, 7 insertions(+), 4 deletions(-)
----------------------------------------------------------------
Add a basic test for page pool netlink reporting.
v2:
- pass args as *args (patch 3)
- improve the test and add busy wait helper (patch 6)
v1: https://lore.kernel.org/all/20240411012815.174400-1-kuba@kernel.org/
Jakub Kicinski (6):
net: netdevsim: add some fake page pool use
tools: ynl: don't return None for dumps
selftests: net: print report check location in python tests
selftests: net: print full exception on failure
selftests: net: support use of NetdevSimDev under "with" in python
selftests: net: exercise page pool reporting via netlink
drivers/net/netdevsim/netdev.c | 93 ++++++++++++++++++++++
drivers/net/netdevsim/netdevsim.h | 4 +
tools/net/ynl/lib/ynl.py | 4 +-
tools/testing/selftests/net/lib/py/ksft.py | 41 +++++++---
tools/testing/selftests/net/lib/py/nsim.py | 16 +++-
tools/testing/selftests/net/nl_netdev.py | 76 +++++++++++++++++-
6 files changed, 218 insertions(+), 16 deletions(-)
--
2.44.0
Add FAULT_INJECTION_DEBUG_FS and FAILSLAB configurations which are
needed by iommufd_fail_nth test.
Signed-off-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
---
While building and running these tests on x86, defconfig had these
configs enabled. But ARM64's defconfig doesn't enable these configs.
Hence the config options are being added explicitly in this patch.
---
tools/testing/selftests/iommu/config | 2 ++
1 file changed, 2 insertions(+)
diff --git a/tools/testing/selftests/iommu/config b/tools/testing/selftests/iommu/config
index 110d73917615d..02a2a1b267c1e 100644
--- a/tools/testing/selftests/iommu/config
+++ b/tools/testing/selftests/iommu/config
@@ -1,3 +1,5 @@
CONFIG_IOMMUFD=y
+CONFIG_FAULT_INJECTION_DEBUG_FS=y
CONFIG_FAULT_INJECTION=y
CONFIG_IOMMUFD_TEST=y
+CONFIG_FAILSLAB=y
--
2.39.2
From: Roberto Sassu <roberto.sassu(a)huawei.com>
Integrity detection and protection has long been a desirable feature, to
reach a large user base and mitigate the risk of flaws in the software
and attacks.
However, while solutions exist, they struggle to reach the large user
base, due to requiring higher than desired constraints on performance,
flexibility and configurability, that only security conscious people are
willing to accept.
This is where the new digest_cache LSM comes into play, it offers
additional support for new and existing integrity solutions, to make
them faster and easier to deploy.
The full documentation with the motivation and the solution details can be
found in patch 14.
The IMA integration patch set will be introduced separately. Also a PoC
based on the current version of IPE can be provided.
v3:
- Rewrite documentation, and remove the installation instructions since
they are now included in the README of digest-cache-tools
- Add digest cache event notifier
- Drop digest_cache_was_reset(), and send instead to asynchronous
notifications
- Fix digest_cache LSM Kconfig style issues (suggested by Randy Dunlap)
- Propagate digest cache reset to directory entries
- Destroy per directory entry mutex
- Introduce RESET_USER bit, to clear the dig_user pointer on
set/removexattr
- Replace 'file content' with 'file data' (suggested by Mimi)
- Introduce per digest cache mutex and replace verif_data_lock spinlock
- Track changes of security.digest_list xattr
- Stop tracking file_open and use file_release instead also for file writes
- Add error messages in digest_cache_create()
- Load/unload testing kernel module automatically during execution of test
- Add tests for digest cache event notifier
- Add test for ftruncate()
- Remove DIGEST_CACHE_RESET_PREFETCH_BUF command in test and clear the
buffer on read instead
v2:
- Include the TLV parser in this patch set (from user asymmetric keys and
signatures)
- Move from IMA and make an independent LSM
- Remove IMA-specific stuff from this patch set
- Add per algorithm hash table
- Expect all digest lists to be in the same directory and allow changing
the default directory
- Support digest lookup on directories, when there is no
security.digest_list xattr
- Add seq num to digest list file name, to impose ordering on directory
iteration
- Add a new data type DIGEST_LIST_ENTRY_DATA for the nested data in the
tlv digest list format
- Add the concept of verification data attached to digest caches
- Add the reset mechanism to track changes on digest lists and directory
containing the digest lists
- Add kernel selftests
v1:
- Add documentation in Documentation/security/integrity-digest-cache.rst
- Pass the mask of IMA actions to digest_cache_alloc()
- Add a reference count to the digest cache
- Remove the path parameter from digest_cache_get(), and rely on the
reference count to avoid the digest cache disappearing while being used
- Rename the dentry_to_check parameter of digest_cache_get() to dentry
- Rename digest_cache_get() to digest_cache_new() and add
digest_cache_get() to set the digest cache in the iint of the inode for
which the digest cache was requested
- Add dig_owner and dig_user to the iint, to distinguish from which inode
the digest cache was created from, and which is using it; consequently it
makes the digest cache usable to measure/appraise other digest caches
(support not yet enabled)
- Add dig_owner_mutex and dig_user_mutex to serialize accesses to dig_owner
and dig_user until they are initialized
- Enforce strong synchronization and make the contenders wait until
dig_owner and dig_user are assigned to the iint the first time
- Move checking IMA actions on the digest list earlier, and fail if no
action were performed (digest cache not usable)
- Remove digest_cache_put(), not needed anymore with the introduction of
the reference count
- Fail immediately in digest_cache_lookup() if the digest algorithm is
not set in the digest cache
- Use 64 bit mask for IMA actions on the digest list instead of 8 bit
- Return NULL in the inline version of digest_cache_get()
- Use list_add_tail() instead of list_add() in the iterator
- Copy the digest list path to a separate buffer in digest_cache_iter_dir()
- Use digest list parsers verified with Frama-C
- Explicitly disable (for now) the possibility in the IMA policy to use the
digest cache to measure/appraise other digest lists
- Replace exit(<value>) with return <value> in manage_digest_lists.c
Roberto Sassu (14):
lib: Add TLV parser
security: Introduce the digest_cache LSM
digest_cache: Add securityfs interface
digest_cache: Add hash tables and operations
digest_cache: Populate the digest cache from a digest list
digest_cache: Parse tlv digest lists
digest_cache: Parse rpm digest lists
digest_cache: Add management of verification data
digest_cache: Add support for directories
digest cache: Prefetch digest lists if requested
digest_cache: Reset digest cache on file/directory change
digest_cache: Notify digest cache events
selftests/digest_cache: Add selftests for digest_cache LSM
docs: Add documentation of the digest_cache LSM
Documentation/security/digest_cache.rst | 763 ++++++++++++++++
Documentation/security/index.rst | 1 +
MAINTAINERS | 16 +
include/linux/digest_cache.h | 117 +++
include/linux/kernel_read_file.h | 1 +
include/linux/tlv_parser.h | 28 +
include/uapi/linux/lsm.h | 1 +
include/uapi/linux/tlv_digest_list.h | 72 ++
include/uapi/linux/tlv_parser.h | 59 ++
include/uapi/linux/xattr.h | 6 +
lib/Kconfig | 3 +
lib/Makefile | 3 +
lib/tlv_parser.c | 214 +++++
lib/tlv_parser.h | 17 +
security/Kconfig | 11 +-
security/Makefile | 1 +
security/digest_cache/Kconfig | 33 +
security/digest_cache/Makefile | 11 +
security/digest_cache/dir.c | 252 ++++++
security/digest_cache/htable.c | 268 ++++++
security/digest_cache/internal.h | 290 +++++++
security/digest_cache/main.c | 570 ++++++++++++
security/digest_cache/modsig.c | 66 ++
security/digest_cache/notifier.c | 135 +++
security/digest_cache/parsers/parsers.h | 15 +
security/digest_cache/parsers/rpm.c | 223 +++++
security/digest_cache/parsers/tlv.c | 299 +++++++
security/digest_cache/populate.c | 163 ++++
security/digest_cache/reset.c | 235 +++++
security/digest_cache/secfs.c | 87 ++
security/digest_cache/verif.c | 119 +++
security/security.c | 3 +-
tools/testing/selftests/Makefile | 1 +
.../testing/selftests/digest_cache/.gitignore | 3 +
tools/testing/selftests/digest_cache/Makefile | 24 +
.../testing/selftests/digest_cache/all_test.c | 815 ++++++++++++++++++
tools/testing/selftests/digest_cache/common.c | 78 ++
tools/testing/selftests/digest_cache/common.h | 135 +++
.../selftests/digest_cache/common_user.c | 47 +
.../selftests/digest_cache/common_user.h | 17 +
tools/testing/selftests/digest_cache/config | 1 +
.../selftests/digest_cache/generators.c | 248 ++++++
.../selftests/digest_cache/generators.h | 19 +
.../selftests/digest_cache/testmod/Makefile | 16 +
.../selftests/digest_cache/testmod/kern.c | 564 ++++++++++++
.../selftests/lsm/lsm_list_modules_test.c | 3 +
46 files changed, 6047 insertions(+), 6 deletions(-)
create mode 100644 Documentation/security/digest_cache.rst
create mode 100644 include/linux/digest_cache.h
create mode 100644 include/linux/tlv_parser.h
create mode 100644 include/uapi/linux/tlv_digest_list.h
create mode 100644 include/uapi/linux/tlv_parser.h
create mode 100644 lib/tlv_parser.c
create mode 100644 lib/tlv_parser.h
create mode 100644 security/digest_cache/Kconfig
create mode 100644 security/digest_cache/Makefile
create mode 100644 security/digest_cache/dir.c
create mode 100644 security/digest_cache/htable.c
create mode 100644 security/digest_cache/internal.h
create mode 100644 security/digest_cache/main.c
create mode 100644 security/digest_cache/modsig.c
create mode 100644 security/digest_cache/notifier.c
create mode 100644 security/digest_cache/parsers/parsers.h
create mode 100644 security/digest_cache/parsers/rpm.c
create mode 100644 security/digest_cache/parsers/tlv.c
create mode 100644 security/digest_cache/populate.c
create mode 100644 security/digest_cache/reset.c
create mode 100644 security/digest_cache/secfs.c
create mode 100644 security/digest_cache/verif.c
create mode 100644 tools/testing/selftests/digest_cache/.gitignore
create mode 100644 tools/testing/selftests/digest_cache/Makefile
create mode 100644 tools/testing/selftests/digest_cache/all_test.c
create mode 100644 tools/testing/selftests/digest_cache/common.c
create mode 100644 tools/testing/selftests/digest_cache/common.h
create mode 100644 tools/testing/selftests/digest_cache/common_user.c
create mode 100644 tools/testing/selftests/digest_cache/common_user.h
create mode 100644 tools/testing/selftests/digest_cache/config
create mode 100644 tools/testing/selftests/digest_cache/generators.c
create mode 100644 tools/testing/selftests/digest_cache/generators.h
create mode 100644 tools/testing/selftests/digest_cache/testmod/Makefile
create mode 100644 tools/testing/selftests/digest_cache/testmod/kern.c
--
2.34.1
By default, HLT instruction executed by guest is intercepted by hypervisor.
However, KVM_CAP_X86_DISABLE_EXITS capability can be used to not intercept
HLT by setting KVM_X86_DISABLE_EXITS_HLT.
By default, vms are created with in-kernel APIC support in KVM selftests.
VM needs to be created without in-kernel APIC support for this test, so
that HLT will exit to userspace. To do so, __vm_create() is modified to not
call KVM_CREATE_IRQCHIP ioctl while creating vm.
Add a test case to test KVM_X86_DISABLE_EXITS_HLT functionality.
Patch 1, 2 -> Preparatory patches to add the KVM_X86_DISABLE_EXITS_HLT test
case
Patch 3 -> Adds a test case for KVM_X86_DISABLE_EXITS_HLT
Testing done:
Tested KVM_X86_DISABLE_EXITS_HLT test case on AMD and Intel machines.
v1 -> v2
- Extended @shape to allow creation of VM without in-kernel APIC support
(Andrew Jones)
- Changed the test case based on Andrew's comments.
- Few more changes to the test case to pass the address of the flag on
which guest waits to execute HLT.
Manali Shukla (3):
KVM: selftests: Add safe_halt() and cli() helpers to common code
KVM: selftests: Extend @shape to allow creation of VM without
in-kernel APIC
KVM: selftests: Add a test case for KVM_X86_DISABLE_EXITS_HLT
tools/testing/selftests/kvm/Makefile | 1 +
.../selftests/kvm/include/kvm_util_base.h | 17 ++-
.../selftests/kvm/include/x86_64/processor.h | 17 +++
tools/testing/selftests/kvm/lib/kvm_util.c | 1 +
.../selftests/kvm/lib/x86_64/processor.c | 4 +-
.../kvm/x86_64/halt_disable_exit_test.c | 120 ++++++++++++++++++
6 files changed, 158 insertions(+), 2 deletions(-)
create mode 100644 tools/testing/selftests/kvm/x86_64/halt_disable_exit_test.c
base-commit: e9da6f08edb0bd4c621165496778d77a222e1174
--
2.34.1
From: Geliang Tang <tanggeliang(a)kylinos.cn>
v3:
- address comments of Martin and Eduard in v2. (thanks)
- move "int type" to the first argument of start_server_addr and
connect_to_addr.
- add start_server_addr_opts.
- using "sockaddr_storage" instead of "sockaddr".
- move start_server_setsockopt patches out of this series.
v2:
- update patch 6 only, fix errors reported by CI.
This patchset uses public helpers start_server_* and connect_to_* defined
in network_helpers.c to drop duplicate code.
Geliang Tang (9):
selftests/bpf: Update arguments of connect_to_addr
selftests/bpf: Add start_server_addr* helpers
selftests/bpf: Use start_server_addr in cls_redirect
selftests/bpf: Use connect_to_addr in cls_redirect
selftests/bpf: Use start_server_addr in sk_assign
selftests/bpf: Use connect_to_addr in sk_assign
selftests/bpf: Use log_err in network_helpers
selftests/bpf: Use start_server_addr in test_sock_addr
selftests/bpf: Use connect_to_addr in test_sock_addr
tools/testing/selftests/bpf/Makefile | 3 +-
tools/testing/selftests/bpf/network_helpers.c | 37 ++++++++--
tools/testing/selftests/bpf/network_helpers.h | 6 +-
.../selftests/bpf/prog_tests/cls_redirect.c | 38 +---------
.../selftests/bpf/prog_tests/empty_skb.c | 2 +
.../bpf/prog_tests/ip_check_defrag.c | 2 +
.../selftests/bpf/prog_tests/sk_assign.c | 59 ++-------------
.../selftests/bpf/prog_tests/sock_addr.c | 6 +-
.../selftests/bpf/prog_tests/tc_redirect.c | 2 +-
.../selftests/bpf/prog_tests/test_tunnel.c | 4 +
.../selftests/bpf/prog_tests/xdp_metadata.c | 16 ++++
tools/testing/selftests/bpf/test_sock_addr.c | 74 ++-----------------
12 files changed, 81 insertions(+), 168 deletions(-)
--
2.40.1
Hello,
kernel test robot noticed "kunit.VCAP_API_DebugFS_Testsuite.vcap_api_show_admin_raw_test.fail" on:
commit: 93533996100c60ea6d4342c454752c0eb1e4b6b1 ("kunit: Handle test faults")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
[test failed on linux-next/master 9ed46da14b9b9b2ad4edb3b0c545b6dbe5c00d39]
in testcase: kunit
version:
with following parameters:
group: group-03
compiler: gcc-13
test machine: 16 threads 1 sockets Intel(R) Xeon(R) CPU D-1541 @ 2.10GHz (Broadwell-DE) with 48G memory
(please refer to attached dmesg/kmsg for entire log/backtrace)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang(a)intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202404151340.5b152d96-lkp@intel.com
[ 206.153880][ T2978] # vcap_api_show_admin_raw_test: EXPECTATION FAILED at drivers/net/ethernet/microchip/vcap/vcap_api_debugfs_kunit.c:377
[ 206.153880][ T2978] Expected test_expected == test_pr_buffer[0], but
[ 206.153880][ T2978] test_expected == " addr: 786, X6 rule, keysets: VCAP_KFS_MAC_ETYPE
[ 206.153880][ T2978] "
[ 206.153880][ T2978] test_pr_buffer[0] == ""
[ 206.159902][ T1] not ok 2 vcap_api_show_admin_raw_test
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240415/202404151340.5b152d96-lkp@…
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
From: Dmitry Vyukov <dvyukov(a)google.com>
POSIX timers using the CLOCK_PROCESS_CPUTIME_ID clock prefer the main
thread of a thread group for signal delivery. However, this has a
significant downside: it requires waking up a potentially idle thread.
Instead, prefer to deliver signals to the current thread (in the same
thread group) if SIGEV_THREAD_ID is not set by the user. This does not
change guaranteed semantics, since POSIX process CPU time timers have
never guaranteed that signal delivery is to a specific thread (without
SIGEV_THREAD_ID set).
The effect is that we no longer wake up potentially idle threads, and
the kernel is no longer biased towards delivering the timer signal to
any particular thread (which better distributes the timer signals esp.
when multiple timers fire concurrently).
Signed-off-by: Dmitry Vyukov <dvyukov(a)google.com>
Suggested-by: Oleg Nesterov <oleg(a)redhat.com>
Reviewed-by: Oleg Nesterov <oleg(a)redhat.com>
Signed-off-by: Marco Elver <elver(a)google.com>
---
v6:
- Split test from this patch.
- Update wording on what this patch aims to improve.
v5:
- Rebased onto v6.2.
v4:
- Restructured checks in send_sigqueue() as suggested.
v3:
- Switched to the completely different implementation (much simpler)
based on the Oleg's idea.
RFC v2:
- Added additional Cc as Thomas asked.
---
kernel/signal.c | 25 ++++++++++++++++++++++---
1 file changed, 22 insertions(+), 3 deletions(-)
diff --git a/kernel/signal.c b/kernel/signal.c
index 8cb28f1df294..605445fa27d4 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1003,8 +1003,7 @@ static void complete_signal(int sig, struct task_struct *p, enum pid_type type)
/*
* Now find a thread we can wake up to take the signal off the queue.
*
- * If the main thread wants the signal, it gets first crack.
- * Probably the least surprising to the average bear.
+ * Try the suggested task first (may or may not be the main thread).
*/
if (wants_signal(sig, p))
t = p;
@@ -1970,8 +1969,23 @@ int send_sigqueue(struct sigqueue *q, struct pid *pid, enum pid_type type)
ret = -1;
rcu_read_lock();
+ /*
+ * This function is used by POSIX timers to deliver a timer signal.
+ * Where type is PIDTYPE_PID (such as for timers with SIGEV_THREAD_ID
+ * set), the signal must be delivered to the specific thread (queues
+ * into t->pending).
+ *
+ * Where type is not PIDTYPE_PID, signals must just be delivered to the
+ * current process. In this case, prefer to deliver to current if it is
+ * in the same thread group as the target, as it avoids unnecessarily
+ * waking up a potentially idle task.
+ */
t = pid_task(pid, type);
- if (!t || !likely(lock_task_sighand(t, &flags)))
+ if (!t)
+ goto ret;
+ if (type != PIDTYPE_PID && same_thread_group(t, current))
+ t = current;
+ if (!likely(lock_task_sighand(t, &flags)))
goto ret;
ret = 1; /* the signal is ignored */
@@ -1993,6 +2007,11 @@ int send_sigqueue(struct sigqueue *q, struct pid *pid, enum pid_type type)
q->info.si_overrun = 0;
signalfd_notify(t, sig);
+ /*
+ * If the type is not PIDTYPE_PID, we just use shared_pending, which
+ * won't guarantee that the specified task will receive the signal, but
+ * is sufficient if t==current in the common case.
+ */
pending = (type != PIDTYPE_PID) ? &t->signal->shared_pending : &t->pending;
list_add_tail(&q->list, &pending->list);
sigaddset(&pending->signal, sig);
--
2.40.0.rc1.284.g88254d51c5-goog
KUnit's try-catch infrastructure now uses vfork_done, which is always
set to a valid completion when a kthread is created, but which is set to
NULL once the thread terminates. This creates a race condition, where
the kthread exits before we can wait on it.
Keep a copy of vfork_done, which is taken before we wake_up_process()
and so valid, and wait on that instead.
Fixes: 4de2a8e4cca4 ("kunit: Handle test faults")
Reported-by: Linux Kernel Functional Testing <lkft(a)linaro.org>
Closes: https://lore.kernel.org/lkml/20240410102710.35911-1-naresh.kamboju@linaro.o…
Tested-by: Linux Kernel Functional Testing <lkft(a)linaro.org>
Acked-by: Mickaël Salaün <mic(a)digikod.net>
Signed-off-by: David Gow <davidgow(a)google.com>
---
lib/kunit/try-catch.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/lib/kunit/try-catch.c b/lib/kunit/try-catch.c
index fa687278ccc9..6bbe0025b079 100644
--- a/lib/kunit/try-catch.c
+++ b/lib/kunit/try-catch.c
@@ -63,6 +63,7 @@ void kunit_try_catch_run(struct kunit_try_catch *try_catch, void *context)
{
struct kunit *test = try_catch->test;
struct task_struct *task_struct;
+ struct completion *task_done;
int exit_code, time_remaining;
try_catch->context = context;
@@ -75,13 +76,16 @@ void kunit_try_catch_run(struct kunit_try_catch *try_catch, void *context)
return;
}
get_task_struct(task_struct);
- wake_up_process(task_struct);
/*
* As for a vfork(2), task_struct->vfork_done (pointing to the
* underlying kthread->exited) can be used to wait for the end of a
- * kernel thread.
+ * kernel thread. It is set to NULL when the thread exits, so we
+ * keep a copy here.
*/
- time_remaining = wait_for_completion_timeout(task_struct->vfork_done,
+ task_done = task_struct->vfork_done;
+ wake_up_process(task_struct);
+
+ time_remaining = wait_for_completion_timeout(task_done,
kunit_test_timeout());
if (time_remaining == 0) {
try_catch->try_result = -ETIMEDOUT;
--
2.44.0.683.g7961c838ac-goog
Currently, the migration worker delays 1-10 us, assuming that one
KVM_RUN iteration only takes a few microseconds. But if C-state exit
latencies are large enough, for example, hundreds or even thousands
of microseconds on server CPUs, it may happen that it's not able to
bring the target CPU out of C-state before the migration worker starts
to migrate it to the next CPU.
If the system workload is light, most CPUs could be at a certain level
of C-state, and the vCPU thread may waste milliseconds before it can
actually migrate to a new CPU.
Thus, the tests may be inefficient in such systems, and in some cases
it may fail the migration/KVM_RUN ratio sanity check.
Since we are not able to turn off the cpuidle sub-system in run time,
this patch creates an idle thread on every CPU to prevent them from
entering C-states.
Additionally, seems it's reasonable to randomize the length of usleep(),
other than delay in a fixed pattern.
Signed-off-by: Zide Chen <zide.chen(a)intel.com>
---
tools/testing/selftests/kvm/rseq_test.c | 76 ++++++++++++++++++++++---
1 file changed, 69 insertions(+), 7 deletions(-)
diff --git a/tools/testing/selftests/kvm/rseq_test.c b/tools/testing/selftests/kvm/rseq_test.c
index 28f97fb52044..d6e8b851d29e 100644
--- a/tools/testing/selftests/kvm/rseq_test.c
+++ b/tools/testing/selftests/kvm/rseq_test.c
@@ -11,6 +11,7 @@
#include <syscall.h>
#include <sys/ioctl.h>
#include <sys/sysinfo.h>
+#include <sys/resource.h>
#include <asm/barrier.h>
#include <linux/atomic.h>
#include <linux/rseq.h>
@@ -29,9 +30,10 @@
#define NR_TASK_MIGRATIONS 100000
static pthread_t migration_thread;
+static pthread_t *idle_threads;
static cpu_set_t possible_mask;
-static int min_cpu, max_cpu;
-static bool done;
+static int min_cpu, max_cpu, nproc;
+static volatile bool done;
static atomic_t seq_cnt;
@@ -150,7 +152,7 @@ static void *migration_worker(void *__rseq_tid)
* Use usleep() for simplicity and to avoid unnecessary kernel
* dependencies.
*/
- usleep((i % 10) + 1);
+ usleep((rand() % 10) + 1);
}
done = true;
return NULL;
@@ -158,7 +160,7 @@ static void *migration_worker(void *__rseq_tid)
static void calc_min_max_cpu(void)
{
- int i, cnt, nproc;
+ int i, cnt;
TEST_REQUIRE(CPU_COUNT(&possible_mask) >= 2);
@@ -186,6 +188,61 @@ static void calc_min_max_cpu(void)
"Only one usable CPU, task migration not possible");
}
+static void *idle_thread_fn(void *__idle_cpu)
+{
+ int r, cpu = (int)(unsigned long)__idle_cpu;
+ cpu_set_t allowed_mask;
+
+ CPU_ZERO(&allowed_mask);
+ CPU_SET(cpu, &allowed_mask);
+
+ r = sched_setaffinity(0, sizeof(allowed_mask), &allowed_mask);
+ TEST_ASSERT(!r, "sched_setaffinity failed, errno = %d (%s)",
+ errno, strerror(errno));
+
+ /* lowest priority, trying to prevent it from entering C-states */
+ r = setpriority(PRIO_PROCESS, 0, 19);
+ TEST_ASSERT(!r, "setpriority failed, errno = %d (%s)",
+ errno, strerror(errno));
+
+ while(!done);
+
+ return NULL;
+}
+
+static void spawn_threads(void)
+{
+ int cpu;
+
+ /* Run a dummy thread on every CPU */
+ for (cpu = min_cpu; cpu <= max_cpu; cpu++) {
+ if (!CPU_ISSET(cpu, &possible_mask))
+ continue;
+
+ pthread_create(&idle_threads[cpu], NULL, idle_thread_fn,
+ (void *)(unsigned long)cpu);
+ }
+
+ pthread_create(&migration_thread, NULL, migration_worker,
+ (void *)(unsigned long)syscall(SYS_gettid));
+}
+
+static void join_threads(void)
+{
+ int cpu;
+
+ pthread_join(migration_thread, NULL);
+
+ for (cpu = min_cpu; cpu <= max_cpu; cpu++) {
+ if (!CPU_ISSET(cpu, &possible_mask))
+ continue;
+
+ pthread_join(idle_threads[cpu], NULL);
+ }
+
+ free(idle_threads);
+}
+
int main(int argc, char *argv[])
{
int r, i, snapshot;
@@ -199,6 +256,12 @@ int main(int argc, char *argv[])
calc_min_max_cpu();
+ srand(time(NULL));
+
+ idle_threads = malloc(sizeof(pthread_t) * nproc);
+ TEST_ASSERT(idle_threads, "malloc failed, errno = %d (%s)", errno,
+ strerror(errno));
+
r = rseq_register_current_thread();
TEST_ASSERT(!r, "rseq_register_current_thread failed, errno = %d (%s)",
errno, strerror(errno));
@@ -210,8 +273,7 @@ int main(int argc, char *argv[])
*/
vm = vm_create_with_one_vcpu(&vcpu, guest_code);
- pthread_create(&migration_thread, NULL, migration_worker,
- (void *)(unsigned long)syscall(SYS_gettid));
+ spawn_threads();
for (i = 0; !done; i++) {
vcpu_run(vcpu);
@@ -258,7 +320,7 @@ int main(int argc, char *argv[])
TEST_ASSERT(i > (NR_TASK_MIGRATIONS / 2),
"Only performed %d KVM_RUNs, task stalled too much?", i);
- pthread_join(migration_thread, NULL);
+ join_threads();
kvm_vm_free(vm);
--
2.34.1
From: Geliang Tang <tanggeliang(a)kylinos.cn>
v2:
- update patch 6 only, fix errors reported by CI.
This patchset uses public helpers start_server_* and connect_to_* defined
in network_helpers.c to drop duplicate code.
Geliang Tang (14):
selftests/bpf: Add start_server_addr helper
selftests/bpf: Use start_server_addr in cls_redirect
selftests/bpf: Use connect_to_addr in cls_redirect
selftests/bpf: Use start_server_addr in sk_assign
selftests/bpf: Use connect_to_addr in sk_assign
selftests/bpf: Use log_err in network_helpers
selftests/bpf: Use start_server_addr in test_sock_addr
selftests/bpf: Use connect_to_addr in test_sock_addr
selftests/bpf: Add function pointer for __start_server
selftests/bpf: Add start_server_setsockopt helper
selftests/bpf: Use start_server_setsockopt in sockopt_inherit
selftests/bpf: Use connect_to_fd in sockopt_inherit
selftests/bpf: Use start_server_* in test_tcp_check_syncookie
selftests/bpf: Use connect_to_addr in test_tcp_check_syncookie
tools/testing/selftests/bpf/Makefile | 4 +-
tools/testing/selftests/bpf/network_helpers.c | 50 ++++++++----
tools/testing/selftests/bpf/network_helpers.h | 4 +
.../selftests/bpf/prog_tests/cls_redirect.c | 38 +---------
.../selftests/bpf/prog_tests/sk_assign.c | 53 +------------
.../bpf/prog_tests/sockopt_inherit.c | 64 ++++------------
tools/testing/selftests/bpf/test_sock_addr.c | 74 ++----------------
.../bpf/test_tcp_check_syncookie_user.c | 76 +++----------------
8 files changed, 83 insertions(+), 280 deletions(-)
--
2.40.1
The series consists of two parts:
- pids.events rework (originally v2, patches 1-6,
- migration charging, patches 7-9.
The changes are independent in principle, I stacked them for (my)
convenience and because they both deserve RFC:
1) Changed semantics of v2 pids.events
- similar change was proposed for memory.swap.events:max [1]
2) Migration charging is obsolete concept
How are the new events supposed to be useful?
- pids.events.local:max
- tells that cgroup's limit is hit (too tight?)
- pids.events.local:max.imposed
- tells that cgroup's workload was restricted (generalization of
'cgroup: fork rejected by pids controller in %s' message)
- pids.events:*
- "only" directs top-down search to cgroups of interest
The migration charging is motivated by apparenty surprising
pids.current > pids.max
because supervised processes are forked in supervisor's cgroup (more
details in commit cgroup/pids: Enforce pids.max on task migrations too)
Changes from v2 (https://lore.kernel.org/r/20200205134426.10570-1-mkoutny@suse.com)
- implemented pids.events.local (Tejun)
- added migration charging
[1] https://lore.kernel.org/r/20230202155626.1829121-1-hannes@cmpxchg.org/
Michal Koutný (9):
cgroup/pids: Remove superfluous zeroing
cgroup/pids: Separate semantics of pids.events related to pids.max
cgroup/pids: Make event counters hierarchical
cgroup/pids: Add pids.events.local
selftests: cgroup: Lexicographic order in Makefile
selftests: cgroup: Add basic tests for pids controller
cgroup/pids: Replace uncharge/charge pair with a single function
cgroup/pids: Enforce pids.max on task migrations
selftests: cgroup: Add tests pids controller
Documentation/admin-guide/cgroup-v1/pids.rst | 3 +-
Documentation/admin-guide/cgroup-v2.rst | 22 +-
include/linux/cgroup-defs.h | 7 +-
kernel/cgroup/cgroup.c | 16 +-
kernel/cgroup/pids.c | 206 +++++++++----
tools/testing/selftests/cgroup/Makefile | 25 +-
tools/testing/selftests/cgroup/test_pids.c | 302 +++++++++++++++++++
7 files changed, 514 insertions(+), 67 deletions(-)
create mode 100644 tools/testing/selftests/cgroup/test_pids.c
base-commit: 026e680b0a08a62b1d948e5a8ca78700bfac0e6e
--
2.44.0
After commit 6d029c25b71f ("selftests/timers/posix_timers: Reimplement
check_timer_distribution()"), clang warns:
tools/testing/selftests/timers/../kselftest.h:398:6: warning: variable 'major' is used uninitialized whenever '||' condition is true [-Wsometimes-uninitialized]
398 | if (uname(&info) || sscanf(info.release, "%u.%u.", &major, &minor) != 2)
| ^~~~~~~~~~~~
tools/testing/selftests/timers/../kselftest.h:401:9: note: uninitialized use occurs here
401 | return major > min_major || (major == min_major && minor >= min_minor);
| ^~~~~
tools/testing/selftests/timers/../kselftest.h:398:6: note: remove the '||' if its condition is always false
398 | if (uname(&info) || sscanf(info.release, "%u.%u.", &major, &minor) != 2)
| ^~~~~~~~~~~~~~~
tools/testing/selftests/timers/../kselftest.h:395:20: note: initialize the variable 'major' to silence this warning
395 | unsigned int major, minor;
| ^
| = 0
This is a false positive because if uname() fails, ksft_exit_fail_msg()
will be called, which unconditionally calls exit(), a noreturn function.
However, clang does not know that ksft_exit_fail_msg() will call exit()
at the point in the pipeline that the warning is emitted because
inlining has not occurred, so it assumes control flow will resume
normally after ksft_exit_fail_msg() is called.
Make it clear to clang that all of the functions that call exit()
unconditionally in kselftest.h are noreturn transitively by marking them
explicitly with '__attribute__((__noreturn__))', which clears up the
warning above and any future warnings that may appear for the same
reason.
Fixes: 6d029c25b71f ("selftests/timers/posix_timers: Reimplement check_timer_distribution()")
Reported-by: John Stultz <jstultz(a)google.com>
Closes: https://lore.kernel.org/all/20240410232637.4135564-2-jstultz@google.com/
Signed-off-by: Nathan Chancellor <nathan(a)kernel.org>
---
I have based this change on timers/urgent, as the commit that introduces
this particular warning is there and it is marked for stable, even
though this appears to be a generic kselftest issue. I think it makes
the most sense for this change to go via timers/urgent with Shuah's ack.
While __noreturn with a return type other than 'void' does not make much
sense semantically, there are many places that these functions are used
as the return value for other functions such as main(), so I did not
change the return type of these functions from 'int' to 'void' to
minimize the necessary changes for a backport (it is an existing issue
anyways).
I see there is another instance of this problem that will need to be
addressed in -next, introduced by commit f07041728422 ("selftests: add
ksft_exit_fail_perror()").
---
tools/testing/selftests/kselftest.h | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/tools/testing/selftests/kselftest.h b/tools/testing/selftests/kselftest.h
index 973b18e156b2..0591974b57e0 100644
--- a/tools/testing/selftests/kselftest.h
+++ b/tools/testing/selftests/kselftest.h
@@ -80,6 +80,9 @@
#define KSFT_XPASS 3
#define KSFT_SKIP 4
+#ifndef __noreturn
+#define __noreturn __attribute__((__noreturn__))
+#endif
#define __printf(a, b) __attribute__((format(printf, a, b)))
/* counters */
@@ -300,13 +303,13 @@ void ksft_test_result_code(int exit_code, const char *test_name,
va_end(args);
}
-static inline int ksft_exit_pass(void)
+static inline __noreturn int ksft_exit_pass(void)
{
ksft_print_cnts();
exit(KSFT_PASS);
}
-static inline int ksft_exit_fail(void)
+static inline __noreturn int ksft_exit_fail(void)
{
ksft_print_cnts();
exit(KSFT_FAIL);
@@ -333,7 +336,7 @@ static inline int ksft_exit_fail(void)
ksft_cnt.ksft_xfail + \
ksft_cnt.ksft_xskip)
-static inline __printf(1, 2) int ksft_exit_fail_msg(const char *msg, ...)
+static inline __noreturn __printf(1, 2) int ksft_exit_fail_msg(const char *msg, ...)
{
int saved_errno = errno;
va_list args;
@@ -348,19 +351,19 @@ static inline __printf(1, 2) int ksft_exit_fail_msg(const char *msg, ...)
exit(KSFT_FAIL);
}
-static inline int ksft_exit_xfail(void)
+static inline __noreturn int ksft_exit_xfail(void)
{
ksft_print_cnts();
exit(KSFT_XFAIL);
}
-static inline int ksft_exit_xpass(void)
+static inline __noreturn int ksft_exit_xpass(void)
{
ksft_print_cnts();
exit(KSFT_XPASS);
}
-static inline __printf(1, 2) int ksft_exit_skip(const char *msg, ...)
+static inline __noreturn __printf(1, 2) int ksft_exit_skip(const char *msg, ...)
{
int saved_errno = errno;
va_list args;
---
base-commit: 076361362122a6d8a4c45f172ced5576b2d4a50d
change-id: 20240411-mark-kselftest-exit-funcs-noreturn-17d8ff729a7a
Best regards,
--
Nathan Chancellor <nathan(a)kernel.org>
This series fixes a bug in the complete phase of UDP in GRO, in which
socket lookup fails due to using network_header when parsing encapsulated
packets. The fix is to pass p_off parameter in *_gro_complete.
Next, the fields network_offset and inner_network_offset are added to
napi_gro_cb, and are both set during the receive phase of GRO. This is then
leveraged in the next commit to remove flush_id state from napi_gro_cb, and
stateful code in {ipv6,inet}_gro_receive which may be unnecessarily
complicated due to encapsulation support in GRO.
In addition, udpgro_fwd selftest is adjusted to include the socket lookup
case for vxlan. This selftest will test its supposed functionality once
local bind support is merged (https://lore.kernel.org/netdev/df300a49-7811-4126-a56a-a77100c8841b@gmail.c…).
v5 -> v6:
- Write inner_network_offset in vxlan, geneve and ipsec
- Ignore is_atomic when DF=0
- v5:
https://lore.kernel.org/all/20240408141720.98832-1-richardbgobert@gmail.com/
v4 -> v5:
- Add 1st commit - flush id checks in udp_gro_receive segment which can be
backported by itself
- Add TCP measurements for the 5th commit
- Add flush id tests to ensure flush id logic is preserved in GRO
- Simplify gro_inet_flush by removing a branch
- v4:
https://lore.kernel.org/all/20240325182543.87683-1-richardbgobert@gmail.com/
v3 -> v4:
- Fix code comment and commit message typos
- v3:
https://lore.kernel.org/all/f939c84a-2322-4393-a5b0-9b1e0be8ed8e@gmail.com/
v2 -> v3:
- Use napi_gro_cb instead of skb->{offset}
- v2:
https://lore.kernel.org/all/2ce1600b-e733-448b-91ac-9d0ae2b866a4@gmail.com/
v1 -> v2:
- Pass p_off in *_gro_complete to fix UDP bug
- Remove more conditionals and memory fetches from inet_gro_flush
- v1:
https://lore.kernel.org/netdev/e1d22505-c5f8-4c02-a997-64248480338b@gmail.c…
Richard Gobert (6):
net: gro: add flush check in udp_gro_receive_segment
net: gro: add p_off param in *_gro_complete
selftests/net: add local address bind in vxlan selftest
net: gro: add {inner_}network_offset to napi_gro_cb
net: gro: move L3 flush checks to tcp_gro_receive and udp_gro_receive_segment
selftests/net: add flush id selftests
drivers/net/geneve.c | 8 +-
drivers/net/vxlan/vxlan_core.c | 12 +-
include/linux/etherdevice.h | 2 +-
include/linux/netdevice.h | 3 +-
include/linux/udp.h | 2 +-
include/net/gro.h | 93 ++++++++++++--
include/net/inet_common.h | 2 +-
include/net/tcp.h | 6 +-
include/net/udp.h | 8 +-
include/net/udp_tunnel.h | 2 +-
net/8021q/vlan_core.c | 6 +-
net/core/gro.c | 7 +-
net/ethernet/eth.c | 5 +-
net/ipv4/af_inet.c | 54 +-------
net/ipv4/fou_core.c | 9 +-
net/ipv4/gre_offload.c | 6 +-
net/ipv4/tcp_offload.c | 22 +---
net/ipv4/udp.c | 3 +-
net/ipv4/udp_offload.c | 31 +++--
net/ipv6/ip6_offload.c | 41 +++---
net/ipv6/tcpv6_offload.c | 7 +-
net/ipv6/udp.c | 3 +-
net/ipv6/udp_offload.c | 13 +-
tools/testing/selftests/net/gro.c | 144 ++++++++++++++++++++++
tools/testing/selftests/net/udpgro_fwd.sh | 10 +-
25 files changed, 338 insertions(+), 161 deletions(-)
--
2.36.1
Add a basic test for page pool netlink reporting.
Jakub Kicinski (6):
net: netdevsim: add some fake page pool use
tools: ynl: don't return None for dumps
selftests: net: print report check location in python tests
selftests: net: print full exception on failure
selftests: net: support use of NetdevSimDev under "with" in python
selftests: net: exercise page pool reporting via netlink
drivers/net/netdevsim/netdev.c | 93 ++++++++++++++++++++++
drivers/net/netdevsim/netdevsim.h | 4 +
tools/net/ynl/lib/ynl.py | 4 +-
tools/testing/selftests/net/lib/py/ksft.py | 29 ++++---
tools/testing/selftests/net/lib/py/nsim.py | 22 ++++-
tools/testing/selftests/net/nl_netdev.py | 79 +++++++++++++++++-
6 files changed, 215 insertions(+), 16 deletions(-)
--
2.44.0
New version of the sleepable bpf_timer code.
I'm posting this as this is the result of the previous review, so we can
have a baseline to compare to.
The plan is now to introduce a new user API struct bpf_wq, as the timer
API working on softIRQ seems to be quite far away from a wq.
For reference, the use cases I have in mind:
---
Basically, I need to be able to defer a HID-BPF program for the
following reasons (from the aforementioned patch):
1. defer an event:
Sometimes we receive an out of proximity event, but the device can not
be trusted enough, and we need to ensure that we won't receive another
one in the following n milliseconds. So we need to wait those n
milliseconds, and eventually re-inject that event in the stack.
2. inject new events in reaction to one given event:
We might want to transform one given event into several. This is the
case for macro keys where a single key press is supposed to send
a sequence of key presses. But this could also be used to patch a
faulty behavior, if a device forgets to send a release event.
3. communicate with the device in reaction to one event:
We might want to communicate back to the device after a given event.
For example a device might send us an event saying that it came back
from sleeping state and needs to be re-initialized.
Currently we can achieve that by keeping a userspace program around,
raise a bpf event, and let that userspace program inject the events and
commands.
However, we are just keeping that program alive as a daemon for just
scheduling commands. There is no logic in it, so it doesn't really justify
an actual userspace wakeup. So a kernel workqueue seems simpler to handle.
bpf_timers are currently running in a soft IRQ context, this patch
series implements a sleppable context for them.
Cheers,
Benjamin
To: Alexei Starovoitov <ast(a)kernel.org>
To: Daniel Borkmann <daniel(a)iogearbox.net>
To: Andrii Nakryiko <andrii(a)kernel.org>
To: Martin KaFai Lau <martin.lau(a)linux.dev>
To: Eduard Zingerman <eddyz87(a)gmail.com>
To: Song Liu <song(a)kernel.org>
To: Yonghong Song <yonghong.song(a)linux.dev>
To: John Fastabend <john.fastabend(a)gmail.com>
To: KP Singh <kpsingh(a)kernel.org>
To: Stanislav Fomichev <sdf(a)google.com>
To: Hao Luo <haoluo(a)google.com>
To: Jiri Olsa <jolsa(a)kernel.org>
To: Mykola Lysenko <mykolal(a)fb.com>
To: Shuah Khan <shuah(a)kernel.org>
Cc: Benjamin Tissoires <bentiss(a)kernel.org>
Cc: <bpf(a)vger.kernel.org>
Cc: <linux-kernel(a)vger.kernel.org>
Cc: <linux-kselftest(a)vger.kernel.org>
---
Changes in v6:
- Use of a workqueue to clean up sleepable timers
- integrated Kumar's patch instead of mine
- Link to v5: https://lore.kernel.org/r/20240322-hid-bpf-sleepable-v5-0-179c7b59eaaa@kern…
Changes in v5:
- took various reviews into account
- rewrote the tests to be separated to not have a uggly include
- Link to v4: https://lore.kernel.org/r/20240315-hid-bpf-sleepable-v4-0-5658f2540564@kern…
Changes in v4:
- dropped the HID changes, they can go independently from bpf-core
- addressed Alexei's and Eduard's remarks
- added selftests
- Link to v3: https://lore.kernel.org/r/20240221-hid-bpf-sleepable-v3-0-1fb378ca6301@kern…
Changes in v3:
- fixed the crash from v2
- changed the API to have only BPF_F_TIMER_SLEEPABLE for
bpf_timer_start()
- split the new kfuncs/verifier patch into several sub-patches, for
easier reviews
- Link to v2: https://lore.kernel.org/r/20240214-hid-bpf-sleepable-v2-0-5756b054724d@kern…
Changes in v2:
- make use of bpf_timer (and dropped the custom HID handling)
- implemented bpf_timer_set_sleepable_cb as a kfunc
- still not implemented global subprogs
- no sleepable bpf_timer selftests yet
- Link to v1: https://lore.kernel.org/r/20240209-hid-bpf-sleepable-v1-0-4cc895b5adbd@kern…
---
Benjamin Tissoires (5):
bpf/helpers: introduce sleepable bpf_timers
bpf/helpers: introduce bpf_timer_set_sleepable_cb() kfunc
bpf/helpers: mark the callback of bpf_timer_set_sleepable_cb() as sleepable
tools: sync include/uapi/linux/bpf.h
selftests/bpf: add sleepable timer tests
Kumar Kartikeya Dwivedi (1):
bpf: Add support for KF_ARG_PTR_TO_TIMER
include/linux/bpf_verifier.h | 1 +
include/uapi/linux/bpf.h | 13 ++
kernel/bpf/helpers.c | 202 ++++++++++++++++---
kernel/bpf/verifier.c | 98 +++++++++-
tools/include/uapi/linux/bpf.h | 20 +-
tools/testing/selftests/bpf/bpf_experimental.h | 5 +
.../selftests/bpf/bpf_testmod/bpf_testmod.c | 5 +
.../selftests/bpf/bpf_testmod/bpf_testmod_kfunc.h | 1 +
tools/testing/selftests/bpf/prog_tests/timer.c | 34 ++++
.../testing/selftests/bpf/progs/timer_sleepable.c | 213 +++++++++++++++++++++
10 files changed, 553 insertions(+), 39 deletions(-)
---
base-commit: 61df575632d6b39213f47810c441bddbd87c3606
change-id: 20240205-hid-bpf-sleepable-c01260fd91c4
Best regards,
--
Benjamin Tissoires <bentiss(a)kernel.org>
This patch series introduces a new char misc driver, /dev/ntsync, which is used
to implement Windows NT synchronization primitives.
== Background ==
The Wine project emulates the Windows API in user space. One particular part of
that API, namely the NT synchronization primitives, have historically been
implemented via RPC to a dedicated "kernel" process. However, more recent
applications use these APIs more strenuously, and the overhead of RPC has become
a bottleneck.
The NT synchronization APIs are too complex to implement on top of existing
primitives without sacrificing correctness. Certain operations, such as
NtPulseEvent() or the "wait-for-all" mode of NtWaitForMultipleObjects(), require
direct control over the underlying wait queue, and implementing a wait queue
sufficiently robust for Wine in user space is not possible. This proposed
driver, therefore, implements the problematic interfaces directly in the Linux
kernel.
This driver was presented at Linux Plumbers Conference 2023. For those further
interested in the history of synchronization in Wine and past attempts to solve
this problem in user space, a recording of the presentation can be viewed here:
https://www.youtube.com/watch?v=NjU4nyWyhU8
== Performance ==
The gain in performance varies wildly depending on the application in question
and the user's hardware. For some games NT synchronization is not a bottleneck
and no change can be observed, but for others frame rate improvements of 50 to
150 percent are not atypical. The following table lists frame rate measurements
from a variety of games on a variety of hardware, taken by users Dmitry
Skvortsov, FuzzyQuils, OnMars, and myself:
Game Upstream ntsync improvement
===========================================================================
Anger Foot 69 99 43%
Call of Juarez 99.8 224.1 125%
Dirt 3 110.6 860.7 678%
Forza Horizon 5 108 160 48%
Lara Croft: Temple of Osiris 141 326 131%
Metro 2033 164.4 199.2 21%
Resident Evil 2 26 77 196%
The Crew 26 51 96%
Tiny Tina's Wonderlands 130 360 177%
Total War Saga: Troy 109 146 34%
===========================================================================
== Patches ==
The intended semantics of the patches are broadly intended to match those of the
corresponding Windows functions. For those not already familiar with the Windows
functions (or their undocumented behaviour), patch 31/31 provides a detailed
specification, and individual patches also include a brief description of the
API they are implementing.
The patches making use of this driver in Wine can be retrieved or browsed here:
https://repo.or.cz/wine/zf.git/shortlog/refs/heads/ntsync5
== Implementation ==
Some aspects of the implementation may deserve particular comment:
* In the interest of performance, each object is governed only by a single
spinlock. However, NTSYNC_IOC_WAIT_ALL requires that the state of multiple
objects be changed as a single atomic operation. In order to achieve this, we
first take a device-wide lock ("wait_all_lock") any time we are going to lock
more than one object at a time.
The maximum number of objects that can be used in a vectored wait, and
therefore the maximum that can be locked simultaneously, is 64. This number is
NT's own limit.
The acquisition of multiple spinlocks will degrade performance. This is a
conscious choice, however. Wait-for-all is known to be a very rare operation
in practice, especially with counts that approach the maximum, and it is the
intent of the ntsync driver to optimize wait-for-any at the expense of
wait-for-all as much as possible.
* NT mutexes are tied to their threads on an OS level, and the kernel includes
builtin support for "robust" mutexes. In order to keep the ntsync driver
self-contained and avoid touching more code than necessary, it does not hook
into task exit nor use pids.
Instead, the user space emulator is expected to manage thread IDs and pass
them as an argument to any relevant functions; this is the "owner" field of
ntsync_wait_args and ntsync_mutex_args.
When the emulator detects that a thread dies, it should therefore call
NTSYNC_IOC_MUTEX_KILL on any open mutexes.
* ntsync is module-capable mostly because there was nothing preventing it, and
because it aided development. It is not a hard requirement, though.
== Previous versions ==
Changes from v2:
* Check the result of fget() for NULL.
* Squash patch 31 (introducing the NTSYNC_WAIT_REALTIME flag) into patch 4, per
Arnd Bergmann.
* Use atomic_try_cmpxchg() instead of atomic_cmpxchg(), per off-list review from
Uros Bizjak.
* Link to v2: https://lore.kernel.org/lkml/20240219223833.95710-1-zfigura@codeweavers.com/
* Link to v1: https://lore.kernel.org/lkml/20240214233645.9273-1-zfigura@codeweavers.com/
* Link to RFC v2: https://lore.kernel.org/lkml/20240131021356.10322-1-zfigura@codeweavers.com/
* Link to RFC v1: https://lore.kernel.org/lkml/20240124004028.16826-1-zfigura@codeweavers.com/
Elizabeth Figura (30):
ntsync: Introduce the ntsync driver and character device.
ntsync: Introduce NTSYNC_IOC_CREATE_SEM.
ntsync: Introduce NTSYNC_IOC_SEM_POST.
ntsync: Introduce NTSYNC_IOC_WAIT_ANY.
ntsync: Introduce NTSYNC_IOC_WAIT_ALL.
ntsync: Introduce NTSYNC_IOC_CREATE_MUTEX.
ntsync: Introduce NTSYNC_IOC_MUTEX_UNLOCK.
ntsync: Introduce NTSYNC_IOC_MUTEX_KILL.
ntsync: Introduce NTSYNC_IOC_CREATE_EVENT.
ntsync: Introduce NTSYNC_IOC_EVENT_SET.
ntsync: Introduce NTSYNC_IOC_EVENT_RESET.
ntsync: Introduce NTSYNC_IOC_EVENT_PULSE.
ntsync: Introduce NTSYNC_IOC_SEM_READ.
ntsync: Introduce NTSYNC_IOC_MUTEX_READ.
ntsync: Introduce NTSYNC_IOC_EVENT_READ.
ntsync: Introduce alertable waits.
selftests: ntsync: Add some tests for semaphore state.
selftests: ntsync: Add some tests for mutex state.
selftests: ntsync: Add some tests for NTSYNC_IOC_WAIT_ANY.
selftests: ntsync: Add some tests for NTSYNC_IOC_WAIT_ALL.
selftests: ntsync: Add some tests for wakeup signaling with
WINESYNC_IOC_WAIT_ANY.
selftests: ntsync: Add some tests for wakeup signaling with
WINESYNC_IOC_WAIT_ALL.
selftests: ntsync: Add some tests for manual-reset event state.
selftests: ntsync: Add some tests for auto-reset event state.
selftests: ntsync: Add some tests for wakeup signaling with events.
selftests: ntsync: Add tests for alertable waits.
selftests: ntsync: Add some tests for wakeup signaling via alerts.
selftests: ntsync: Add a stress test for contended waits.
maintainers: Add an entry for ntsync.
docs: ntsync: Add documentation for the ntsync uAPI.
Documentation/userspace-api/index.rst | 1 +
.../userspace-api/ioctl/ioctl-number.rst | 2 +
Documentation/userspace-api/ntsync.rst | 399 +++++
MAINTAINERS | 9 +
drivers/misc/Kconfig | 11 +
drivers/misc/Makefile | 1 +
drivers/misc/ntsync.c | 1166 ++++++++++++++
include/uapi/linux/ntsync.h | 62 +
tools/testing/selftests/Makefile | 1 +
.../testing/selftests/drivers/ntsync/Makefile | 8 +
tools/testing/selftests/drivers/ntsync/config | 1 +
.../testing/selftests/drivers/ntsync/ntsync.c | 1407 +++++++++++++++++
12 files changed, 3068 insertions(+)
create mode 100644 Documentation/userspace-api/ntsync.rst
create mode 100644 drivers/misc/ntsync.c
create mode 100644 include/uapi/linux/ntsync.h
create mode 100644 tools/testing/selftests/drivers/ntsync/Makefile
create mode 100644 tools/testing/selftests/drivers/ntsync/config
create mode 100644 tools/testing/selftests/drivers/ntsync/ntsync.c
base-commit: 4cece764965020c22cff7665b18a012006359095
--
2.43.0
When logging an error from calling waitpid() on the child we print a
misleading error message saying that the error we report was returned by
the chilld. Fix this to say the error is from waitpid().
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
tools/testing/selftests/clone3/clone3.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/clone3/clone3.c b/tools/testing/selftests/clone3/clone3.c
index 3c9bf0cd82a8..eb108727c35c 100644
--- a/tools/testing/selftests/clone3/clone3.c
+++ b/tools/testing/selftests/clone3/clone3.c
@@ -95,7 +95,7 @@ static int call_clone3(uint64_t flags, size_t size, enum test_mode test_mode)
getpid(), pid);
if (waitpid(-1, &status, __WALL) < 0) {
- ksft_print_msg("Child returned %s\n", strerror(errno));
+ ksft_print_msg("waitpid() returned %s\n", strerror(errno));
return -errno;
}
if (WEXITSTATUS(status))
---
base-commit: 8cb4a9a82b21623dbb4b3051dd30d98356cf95bc
change-id: 20240405-kselftest-clone3-waitpid-68c4833cf5ff
Best regards,
--
Mark Brown <broonie(a)kernel.org>
When the child exits during the clone3() selftest we use WEXITSTATUS() to
get the exit status from the process without first checking WIFEXITED() to
see if the result will be valid. This can lead to incorrect results, for
example if the child exits due to signal. Add a WIFEXTED() check and report
any non-standard exit as a failure, using EXIT_FAILURE as the exit status
for call_clone3() since we otherwise report 0 or negative errnos.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
tools/testing/selftests/clone3/clone3.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/tools/testing/selftests/clone3/clone3.c b/tools/testing/selftests/clone3/clone3.c
index 3c9bf0cd82a8..0e0e5dfa97c6 100644
--- a/tools/testing/selftests/clone3/clone3.c
+++ b/tools/testing/selftests/clone3/clone3.c
@@ -98,6 +98,11 @@ static int call_clone3(uint64_t flags, size_t size, enum test_mode test_mode)
ksft_print_msg("Child returned %s\n", strerror(errno));
return -errno;
}
+ if (!WIFEXITED(status)) {
+ ksft_print_msg("Child did not exit normally, status 0x%x\n",
+ status);
+ return EXIT_FAILURE;
+ }
if (WEXITSTATUS(status))
return WEXITSTATUS(status);
---
base-commit: 39cd87c4eb2b893354f3b850f916353f2658ae6f
change-id: 20240405-kselftest-clone3-signal-1edb1a3f5473
Best regards,
--
Mark Brown <broonie(a)kernel.org>
From: Geliang Tang <tanggeliang(a)kylinos.cn>
v5:
- address Martin's comments for v4 (thanks).
- update patch 2, use 'return err' instead of 'return -1/0'.
- drop patch 3 in v4.
v4:
- fix a bug in v3, it should be 'if (err)', not 'if (!err)'.
- move "selftests/bpf: Use log_err in network_helpers" out of this
series.
v3:
- add two more patches.
- use log_err instead of ASSERT in v3.
- let send_recv_data return int as Martin suggested.
v2:
Address Martin's comments for v1 (thanks.)
- drop patch 1, "export send_byte helper".
- drop "WRITE_ONCE(arg.stop, 0)".
- rebased.
send_recv_data will be re-used in MPTCP bpf tests, but not included
in this set because it depends on other patches that have not been
in the bpf-next yet. It will be sent as another set soon.
Geliang Tang (2):
selftests/bpf: Add struct send_recv_arg
selftests/bpf: Export send_recv_data helper
tools/testing/selftests/bpf/network_helpers.c | 96 +++++++++++++++++++
tools/testing/selftests/bpf/network_helpers.h | 1 +
.../selftests/bpf/prog_tests/bpf_tcp_ca.c | 71 +-------------
3 files changed, 98 insertions(+), 70 deletions(-)
--
2.40.1
These patches from Geliang add support for the "last time" field in
MPTCP Info, and verify that the counters look valid.
Patch 1 adds these counters: last_data_sent, last_data_recv and
last_ack_recv. They are available in the MPTCP Info, so exposed via
getsockopt(MPTCP_INFO) and the Netlink Diag interface.
Patch 2 adds a test in diag.sh MPTCP selftest, to check that the
counters have moved by at least 250ms, after having waited twice that
time.
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
Changes in v2:
- Only patch 1/2 has been modified following Eric's suggestion, see the
individual changelog for more details.
- Link to v1: https://lore.kernel.org/r/20240405-upstream-net-next-20240405-mptcp-last-ti…
---
Geliang Tang (2):
mptcp: add last time fields in mptcp_info
selftests: mptcp: test last time mptcp_info
include/uapi/linux/mptcp.h | 4 +++
net/mptcp/options.c | 1 +
net/mptcp/protocol.c | 7 ++++
net/mptcp/protocol.h | 3 ++
net/mptcp/sockopt.c | 16 +++++++---
tools/testing/selftests/net/mptcp/diag.sh | 53 +++++++++++++++++++++++++++++++
6 files changed, 79 insertions(+), 5 deletions(-)
---
base-commit: 2ecd487b670fcbb1ad4893fff1af4aafdecb6023
change-id: 20240405-upstream-net-next-20240405-mptcp-last-time-info-9b03618e08f1
Best regards,
--
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
This patch series adds support to freeze the task cgroup hierarchy
that is on a default cgroup v2 without going through kernfs interface.
For some cases we want to freeze the cgroup of a task based on some
signals, doing so from bpf is better than user space which could be
too late.
Planned users of this feature are: tetragon and systemd when freezing
a cgroup hierarchy that could be a K8s pod, container, system service
or a user session.
Patch 1: cgroup: add cgroup_freeze_no_kn() to freeze a cgroup from bpf
Patch 2: bpf: add bpf_task_freeze_cgroup() to freeze the cgroup of a task
Patch 3: selftests/bpf: add selftest for bpf_task_freeze_cgroup
include/linux/cgroup.h | 2 ++
kernel/bpf/helpers.c | 31 ++++
kernel/cgroup/cgroup.c | 67 ++++++++
tools/testing/selftests/bpf/prog_tests/task_freeze_cgroup.c | 165 +++++++++++++++++++++
tools/testing/selftests/bpf/progs/test_task_freeze_cgroup.c | 110 ++++++++++++++
5 files changed, 375 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/task_freeze_cgroup.c
create mode 100644 tools/testing/selftests/bpf/progs/test_task_freeze_cgroup.c
--
2.34.1
This series implements SBI PMU improvements done in SBI v2.0[1] i.e. PMU snapshot
and fw_read_hi() functions.
SBI v2.0 introduced PMU snapshot feature which allows the SBI implementation
to provide counter information (i.e. values/overflow status) via a shared
memory between the SBI implementation and supervisor OS. This allows to minimize
the number of traps in when perf being used inside a kvm guest as it relies on
SBI PMU + trap/emulation of the counters.
The current set of ratified RISC-V specification also doesn't allow scountovf
to be trap/emulated by the hypervisor. The SBI PMU snapshot bridges the gap
in ISA as well and enables perf sampling in the guest. However, LCOFI in the
guest only works via IRQ filtering in AIA specification. That's why, AIA
has to be enabled in the hardware (at least the Ssaia extension) in order to
use the sampling support in the perf.
Here are the patch wise implementation details.
PATCH 1,4,7,8,9,10,11,15 : Generic cleanups/improvements.
PATCH 2,3,14 : FW_READ_HI function implementation
PATCH 5-6: Add PMU snapshot feature in sbi pmu driver
PATCH 12-13: KVM implementation for snapshot and sampling in kvm guests
PATCH 16-17: Generic improvements for kvm selftests
PATCH 18-22: KVM selftests for SBI PMU extension
The series is based on v6.9-rc1 and is available at:
https://github.com/atishp04/linux/tree/kvm_pmu_snapshot_v5
The kvmtool patch is also available at:
https://github.com/atishp04/kvmtool/tree/sscofpmf
It also requires Ssaia ISA extension to be present in the hardware in order to
get perf sampling support in the guest. In Qemu virt machine, it can be done
by the following config.
```
-cpu rv64,sscofpmf=true,x-ssaia=true
```
There is no other dependencies on AIA apart from that. Thus, Ssaia must be disabled
for the guest if AIA patches are not available. Here is the example command.
```
./lkvm-static run -m 256 -c2 --console serial -p "console=ttyS0 earlycon" --disable-ssaia -k ./Image --debug
```
The series has been tested only in Qemu.
Here is the snippet of the perf running inside a kvm guest.
===================================================
$ perf record -e cycles -e instructions perf bench sched messaging -g 5
...
$ Running 'sched/messaging' benchmark:
...
[ 45.928723] perf_duration_warn: 2 callbacks suppressed
[ 45.929000] perf: interrupt took too long (484426 > 483186), lowering kernel.perf_event_max_sample_rate to 250
$ 20 sender and receiver processes per group
$ 5 groups == 200 processes run
Total time: 14.220 [sec]
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.117 MB perf.data (1942 samples) ]
$ perf report --stdio
$ To display the perf.data header info, please use --header/--header-only optio>
$
$
$ Total Lost Samples: 0
$
$ Samples: 943 of event 'cycles'
$ Event count (approx.): 5128976844
$
$ Overhead Command Shared Object Symbol >
$ ........ ............... ........................... .....................>
$
7.59% sched-messaging [kernel.kallsyms] [k] memcpy
5.48% sched-messaging [kernel.kallsyms] [k] percpu_counter_ad>
5.24% sched-messaging [kernel.kallsyms] [k] __sbi_rfence_v02_>
4.00% sched-messaging [kernel.kallsyms] [k] _raw_spin_unlock_>
3.79% sched-messaging [kernel.kallsyms] [k] set_pte_range
3.72% sched-messaging [kernel.kallsyms] [k] next_uptodate_fol>
3.46% sched-messaging [kernel.kallsyms] [k] filemap_map_pages
3.31% sched-messaging [kernel.kallsyms] [k] handle_mm_fault
3.20% sched-messaging [kernel.kallsyms] [k] finish_task_switc>
3.16% sched-messaging [kernel.kallsyms] [k] clear_page
3.03% sched-messaging [kernel.kallsyms] [k] mtree_range_walk
2.42% sched-messaging [kernel.kallsyms] [k] flush_icache_pte
===================================================
[1] https://github.com/riscv-non-isa/riscv-sbi-doc
Changes from v4->v5:
1. Moved sbi related definitions to its own header file from processor.h
2. Added few helper functions for selftests.
3. Improved firmware counter read and RV32 start/stop functions.
4. Converted all the shifting operations to use BIT macro
5. Addressed all other comments on v4.
Changes from v3->v4:
1. Added selftests.
2. Fixed an issue to clear the interrupt pending bits.
3. Fixed the counter index in snapshot memory start function.
Changes from v2->v3:
1. Fixed a patchwork warning on patch6.
2. Fixed a comment formatting & nit fix in PATCH 3 & 5.
3. Moved the hvien update and sscofpmf enabling to PATCH 9 from PATCH 8.
Changes from v1->v2:
1. Fixed warning/errors from patchwork CI.
2. Rebased on top of kvm-next.
3. Added Acked-by tags.
Changes from RFC->v1:
1. Addressed all the comments on RFC series.
2. Removed PATCH2 and merged into later patches.
3. Added 2 more patches for minor fixes.
4. Fixed KVM boot issue without Ssaia and made sscofpmf in guest dependent on
Ssaia in the host.
Atish Patra (22):
RISC-V: Fix the typo in Scountovf CSR name
RISC-V: Add FIRMWARE_READ_HI definition
drivers/perf: riscv: Read upper bits of a firmware counter
drivers/perf: riscv: Use BIT macro for shifting operations
RISC-V: Add SBI PMU snapshot definitions
drivers/perf: riscv: Implement SBI PMU snapshot function
drivers/perf: riscv: Fix counter mask iteration for RV32
RISC-V: KVM: Fix the initial sample period value
RISC-V: KVM: Rename the SBI_STA_SHMEM_DISABLE to a generic name
RISC-V: KVM: No need to update the counter value during reset
RISC-V: KVM: No need to exit to the user space if perf event failed
RISC-V: KVM: Implement SBI PMU Snapshot feature
RISC-V: KVM: Add perf sampling support for guests
RISC-V: KVM: Support 64 bit firmware counters on RV32
RISC-V: KVM: Improve firmware counter read function
KVM: riscv: selftests: Move sbi definitions to its own header file
KVM: riscv: selftests: Add helper functions for extension checks
KVM: riscv: selftests: Add Sscofpmf to get-reg-list test
KVM: riscv: selftests: Add SBI PMU extension definitions
KVM: riscv: selftests: Add SBI PMU selftest
KVM: riscv: selftests: Add a test for PMU snapshot functionality
KVM: riscv: selftests: Add a test for counter overflow
arch/riscv/include/asm/csr.h | 5 +-
arch/riscv/include/asm/kvm_vcpu_pmu.h | 16 +-
arch/riscv/include/asm/sbi.h | 34 +-
arch/riscv/include/uapi/asm/kvm.h | 1 +
arch/riscv/kernel/paravirt.c | 6 +-
arch/riscv/kvm/aia.c | 5 +
arch/riscv/kvm/vcpu.c | 15 +-
arch/riscv/kvm/vcpu_onereg.c | 5 +
arch/riscv/kvm/vcpu_pmu.c | 260 +++++++-
arch/riscv/kvm/vcpu_sbi_pmu.c | 17 +-
arch/riscv/kvm/vcpu_sbi_sta.c | 4 +-
drivers/perf/riscv_pmu.c | 1 +
drivers/perf/riscv_pmu_sbi.c | 264 +++++++-
include/linux/perf/riscv_pmu.h | 6 +
tools/testing/selftests/kvm/Makefile | 1 +
.../selftests/kvm/include/riscv/processor.h | 49 +-
.../testing/selftests/kvm/include/riscv/sbi.h | 141 +++++
.../selftests/kvm/include/riscv/ucall.h | 1 +
.../selftests/kvm/lib/riscv/processor.c | 12 +
.../testing/selftests/kvm/riscv/arch_timer.c | 2 +-
.../selftests/kvm/riscv/get-reg-list.c | 4 +
.../selftests/kvm/riscv/sbi_pmu_test.c | 581 ++++++++++++++++++
tools/testing/selftests/kvm/steal_time.c | 4 +-
23 files changed, 1322 insertions(+), 112 deletions(-)
create mode 100644 tools/testing/selftests/kvm/include/riscv/sbi.h
create mode 100644 tools/testing/selftests/kvm/riscv/sbi_pmu_test.c
--
2.34.1
This series implements SBI PMU improvements done in SBI v2.0[1] i.e. PMU snapshot
and fw_read_hi() functions.
SBI v2.0 introduced PMU snapshot feature which allows the SBI implementation
to provide counter information (i.e. values/overflow status) via a shared
memory between the SBI implementation and supervisor OS. This allows to minimize
the number of traps in when perf being used inside a kvm guest as it relies on
SBI PMU + trap/emulation of the counters.
The current set of ratified RISC-V specification also doesn't allow scountovf
to be trap/emulated by the hypervisor. The SBI PMU snapshot bridges the gap
in ISA as well and enables perf sampling in the guest. However, LCOFI in the
guest only works via IRQ filtering in AIA specification. That's why, AIA
has to be enabled in the hardware (at least the Ssaia extension) in order to
use the sampling support in the perf.
Here are the patch wise implementation details.
PATCH 1,6,7 : Generic cleanups/improvements.
PATCH 2,3,10 : FW_READ_HI function implementation
PATCH 4-5: Add PMU snapshot feature in sbi pmu driver
PATCH 6-7: KVM implementation for snapshot and sampling in kvm guests
PATCH 11-15: KVM selftests for SBI PMU extension
The series is based on kvm-next and is available at:
https://github.com/atishp04/linux/tree/kvm_pmu_snapshot_v4
The series is based on kvm-riscv/queue branch + fixes suggested on the following
series
https://patchwork.kernel.org/project/kvm/cover/cover.1705916069.git.haibo1.…
The kvmtool patch is also available at:
https://github.com/atishp04/kvmtool/tree/sscofpmf
It also requires Ssaia ISA extension to be present in the hardware in order to
get perf sampling support in the guest. In Qemu virt machine, it can be done
by the following config.
```
-cpu rv64,sscofpmf=true,x-ssaia=true
```
There is no other dependencies on AIA apart from that. Thus, Ssaia must be disabled
for the guest if AIA patches are not available. Here is the example command.
```
./lkvm-static run -m 256 -c2 --console serial -p "console=ttyS0 earlycon" --disable-ssaia -k ./Image --debug
```
The series has been tested only in Qemu.
Here is the snippet of the perf running inside a kvm guest.
===================================================
$ perf record -e cycles -e instructions perf bench sched messaging -g 5
...
$ Running 'sched/messaging' benchmark:
...
[ 45.928723] perf_duration_warn: 2 callbacks suppressed
[ 45.929000] perf: interrupt took too long (484426 > 483186), lowering kernel.perf_event_max_sample_rate to 250
$ 20 sender and receiver processes per group
$ 5 groups == 200 processes run
Total time: 14.220 [sec]
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.117 MB perf.data (1942 samples) ]
$ perf report --stdio
$ To display the perf.data header info, please use --header/--header-only optio>
$
$
$ Total Lost Samples: 0
$
$ Samples: 943 of event 'cycles'
$ Event count (approx.): 5128976844
$
$ Overhead Command Shared Object Symbol >
$ ........ ............... ........................... .....................>
$
7.59% sched-messaging [kernel.kallsyms] [k] memcpy
5.48% sched-messaging [kernel.kallsyms] [k] percpu_counter_ad>
5.24% sched-messaging [kernel.kallsyms] [k] __sbi_rfence_v02_>
4.00% sched-messaging [kernel.kallsyms] [k] _raw_spin_unlock_>
3.79% sched-messaging [kernel.kallsyms] [k] set_pte_range
3.72% sched-messaging [kernel.kallsyms] [k] next_uptodate_fol>
3.46% sched-messaging [kernel.kallsyms] [k] filemap_map_pages
3.31% sched-messaging [kernel.kallsyms] [k] handle_mm_fault
3.20% sched-messaging [kernel.kallsyms] [k] finish_task_switc>
3.16% sched-messaging [kernel.kallsyms] [k] clear_page
3.03% sched-messaging [kernel.kallsyms] [k] mtree_range_walk
2.42% sched-messaging [kernel.kallsyms] [k] flush_icache_pte
===================================================
[1] https://github.com/riscv-non-isa/riscv-sbi-doc
Changes from v3->v4:
1. Added selftests.
2. Fixed an issue to clear the interrupt pending bits.
3. Fixed the counter index in snapshot memory start function.
Changes from v2->v3:
1. Fixed a patchwork warning on patch6.
2. Fixed a comment formatting & nit fix in PATCH 3 & 5.
3. Moved the hvien update and sscofpmf enabling to PATCH 9 from PATCH 8.
Changes from v1->v2:
1. Fixed warning/errors from patchwork CI.
2. Rebased on top of kvm-next.
3. Added Acked-by tags.
Changes from RFC->v1:
1. Addressed all the comments on RFC series.
2. Removed PATCH2 and merged into later patches.
3. Added 2 more patches for minor fixes.
4. Fixed KVM boot issue without Ssaia and made sscofpmf in guest dependent on
Ssaia in the host.
Atish Patra (15):
RISC-V: Fix the typo in Scountovf CSR name
RISC-V: Add FIRMWARE_READ_HI definition
drivers/perf: riscv: Read upper bits of a firmware counter
RISC-V: Add SBI PMU snapshot definitions
drivers/perf: riscv: Implement SBI PMU snapshot function
RISC-V: KVM: No need to update the counter value during reset
RISC-V: KVM: No need to exit to the user space if perf event failed
RISC-V: KVM: Implement SBI PMU Snapshot feature
RISC-V: KVM: Add perf sampling support for guests
RISC-V: KVM: Support 64 bit firmware counters on RV32
KVM: riscv: selftests: Add Sscofpmf to get-reg-list test
KVM: riscv: selftests: Add SBI PMU extension definitions
KVM: riscv: selftests: Add SBI PMU selftest
KVM: riscv: selftests: Add a test for PMU snapshot functionality
KVM: riscv: selftests: Add a test for counter overflow
arch/riscv/include/asm/csr.h | 5 +-
arch/riscv/include/asm/errata_list.h | 2 +-
arch/riscv/include/asm/kvm_vcpu_pmu.h | 14 +-
arch/riscv/include/asm/sbi.h | 12 +
arch/riscv/include/uapi/asm/kvm.h | 1 +
arch/riscv/kvm/aia.c | 5 +
arch/riscv/kvm/vcpu.c | 14 +-
arch/riscv/kvm/vcpu_onereg.c | 9 +-
arch/riscv/kvm/vcpu_pmu.c | 247 +++++++-
arch/riscv/kvm/vcpu_sbi_pmu.c | 15 +-
drivers/perf/riscv_pmu.c | 1 +
drivers/perf/riscv_pmu_sbi.c | 229 ++++++-
include/linux/perf/riscv_pmu.h | 6 +
tools/testing/selftests/kvm/Makefile | 1 +
.../selftests/kvm/include/riscv/processor.h | 92 +++
.../selftests/kvm/lib/riscv/processor.c | 12 +
.../selftests/kvm/riscv/get-reg-list.c | 4 +
tools/testing/selftests/kvm/riscv/sbi_pmu.c | 588 ++++++++++++++++++
18 files changed, 1212 insertions(+), 45 deletions(-)
create mode 100644 tools/testing/selftests/kvm/riscv/sbi_pmu.c
--
2.34.1
This series implements support for the 2023 dpISA extensions in KVM
guests, it was previously posted as part of a series with the host
support but that has now been merged so only the KVM portions remain.
Most of these extensions add only new instructions so the guest support
consists of adding the relevant ID registers, masking out other features
like the 2023 MTE extensions.
FEAT_FPMR introduces a new system register FPMR to the floating point
state which we enable guest access to and context switch when the ID
registers indicate that it is supported. Currently we implement
visibility for FPMR with a fpmr_visibility() function as for other
system registers, I will separately look into adding support for
specifying this in the struct sys_reg_desc.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v6:
- Rebase onto v6.9-rc1.
- The host portions of the series were merged so only the KVM guest
support remains.
- Link to v5: https://lore.kernel.org/r/20240306-arm64-2023-dpisa-v5-0-c568edc8ed7f@kerne…
Changes in v5:
- Rebase onto v6.8-rc3.
- Use u64 rather than unsigned long for storing FPMR.
- Temporarily drop KVM guest support due to issues with KVM being a
moving target.
- Link to v4: https://lore.kernel.org/r/20240122-arm64-2023-dpisa-v4-0-776e094861df@kerne…
Changes in v4:
- Rebase onto v6.8-rc1.
- Move KVM support to the end of the series.
- Link to v3: https://lore.kernel.org/r/20231205-arm64-2023-dpisa-v3-0-dbcbcd867a7f@kerne…
Changes in v3:
- Rebase onto v6.7-rc3.
- Hook up traps for FPMR in emulate-nested.c.
- Link to v2: https://lore.kernel.org/r/20231114-arm64-2023-dpisa-v2-0-47251894f6a8@kerne…
Changes in v2:
- Rebase onto v6.7-rc1.
- Link to v1: https://lore.kernel.org/r/20231026-arm64-2023-dpisa-v1-0-8470dd989bb2@kerne…
---
Mark Brown (5):
KVM: arm64: Share all userspace hardened thread data with the hypervisor
KVM: arm64: Add newly allocated ID registers to register descriptions
KVM: arm64: Support FEAT_FPMR for guests
KVM: arm64: selftests: Document feature registers added in 2023 extensions
KVM: arm64: selftests: Teach get-reg-list about FPMR
arch/arm64/include/asm/kvm_host.h | 6 ++++--
arch/arm64/include/asm/processor.h | 2 +-
arch/arm64/kvm/emulate-nested.c | 9 ++++++++
arch/arm64/kvm/fpsimd.c | 15 +++++++-------
arch/arm64/kvm/hyp/include/hyp/switch.h | 9 ++++++--
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 4 ++--
arch/arm64/kvm/sys_regs.c | 24 +++++++++++++++++++---
tools/testing/selftests/kvm/aarch64/get-reg-list.c | 11 ++++++++--
8 files changed, 60 insertions(+), 20 deletions(-)
---
base-commit: 4cece764965020c22cff7665b18a012006359095
change-id: 20231003-arm64-2023-dpisa-2f3d25746474
Best regards,
--
Mark Brown <broonie(a)kernel.org>
The test_offload.py test fits in networking and bpf equally
well. We started adding more Python tests in networking
and some of the code in test_offload.py can be reused,
so move it to networking. Looks like it bit rotted over
time and some fixes are needed.
Admittedly more code could be extracted but I only had
the time for a minor cleanup :(
Jakub Kicinski (4):
selftests: move bpf-offload test from bpf to net
selftests: net: bpf_offload: wait for maps
selftests: net: declare section names for bpf_offload
selftests: net: reuse common code in bpf_offload
tools/testing/selftests/bpf/Makefile | 1 -
tools/testing/selftests/net/Makefile | 11 +-
.../test_offload.py => net/bpf_offload.py} | 138 +++++-------------
tools/testing/selftests/net/lib/py/nsim.py | 9 +-
.../sample_map_ret0.bpf.c} | 2 +-
.../sample_ret0.c => net/sample_ret0.bpf.c} | 3 +
6 files changed, 57 insertions(+), 107 deletions(-)
rename tools/testing/selftests/{bpf/test_offload.py => net/bpf_offload.py} (93%)
rename tools/testing/selftests/{bpf/progs/sample_map_ret0.c => net/sample_map_ret0.bpf.c} (96%)
rename tools/testing/selftests/{bpf/progs/sample_ret0.c => net/sample_ret0.bpf.c} (70%)
--
2.44.0
New version of the sleepable bpf_timer code, without BPF changes, as
they can now go through the HID tree independantly:
https://lore.kernel.org/all/20240315-hid-bpf-sleepable-v4-0-5658f2540564@ke…
For reference, the use cases I have in mind:
---
Basically, I need to be able to defer a HID-BPF program for the
following reasons (from the aforementioned patch):
1. defer an event:
Sometimes we receive an out of proximity event, but the device can not
be trusted enough, and we need to ensure that we won't receive another
one in the following n milliseconds. So we need to wait those n
milliseconds, and eventually re-inject that event in the stack.
2. inject new events in reaction to one given event:
We might want to transform one given event into several. This is the
case for macro keys where a single key press is supposed to send
a sequence of key presses. But this could also be used to patch a
faulty behavior, if a device forgets to send a release event.
3. communicate with the device in reaction to one event:
We might want to communicate back to the device after a given event.
For example a device might send us an event saying that it came back
from sleeping state and needs to be re-initialized.
Currently we can achieve that by keeping a userspace program around,
raise a bpf event, and let that userspace program inject the events and
commands.
However, we are just keeping that program alive as a daemon for just
scheduling commands. There is no logic in it, so it doesn't really justify
an actual userspace wakeup. So a kernel workqueue seems simpler to handle.
bpf_timers are currently running in a soft IRQ context, this patch
series implements a sleppable context for them.
Cheers,
Benjamin
To: Jiri Kosina <jikos(a)kernel.org>
To: Benjamin Tissoires <benjamin.tissoires(a)redhat.com>
To: Jonathan Corbet <corbet(a)lwn.net>
To: Shuah Khan <shuah(a)kernel.org>
Cc: Benjamin Tissoires <bentiss(a)kernel.org>
Cc: <linux-input(a)vger.kernel.org>
Cc: <linux-kernel(a)vger.kernel.org>
Cc: <bpf(a)vger.kernel.org>
Cc: <linux-doc(a)vger.kernel.org>
Cc: <linux-kselftest(a)vger.kernel.org>
---
Changes in v4:
- dropped the BPF changes, they can go independently in bpf-core
- dropped the HID-BPF integration tests with the sleppable timers,
I'll re-add them once both series (this and sleepable timers) are
merged
- Link to v3: https://lore.kernel.org/r/20240221-hid-bpf-sleepable-v3-0-1fb378ca6301@kern…
Changes in v3:
- fixed the crash from v2
- changed the API to have only BPF_F_TIMER_SLEEPABLE for
bpf_timer_start()
- split the new kfuncs/verifier patch into several sub-patches, for
easier reviews
- Link to v2: https://lore.kernel.org/r/20240214-hid-bpf-sleepable-v2-0-5756b054724d@kern…
Changes in v2:
- make use of bpf_timer (and dropped the custom HID handling)
- implemented bpf_timer_set_sleepable_cb as a kfunc
- still not implemented global subprogs
- no sleepable bpf_timer selftests yet
- Link to v1: https://lore.kernel.org/r/20240209-hid-bpf-sleepable-v1-0-4cc895b5adbd@kern…
---
Benjamin Tissoires (7):
HID: bpf/dispatch: regroup kfuncs definitions
HID: bpf: export hid_hw_output_report as a BPF kfunc
selftests/hid: add KASAN to the VM tests
selftests/hid: Add test for hid_bpf_hw_output_report
HID: bpf: allow to inject HID event from BPF
selftests/hid: add tests for hid_bpf_input_report
HID: bpf: allow to use bpf_timer_set_sleepable_cb() in tracing callbacks.
Documentation/hid/hid-bpf.rst | 2 +-
drivers/hid/bpf/hid_bpf_dispatch.c | 226 ++++++++++++++-------
drivers/hid/hid-core.c | 2 +
include/linux/hid_bpf.h | 3 +
tools/testing/selftests/hid/config.common | 1 +
tools/testing/selftests/hid/hid_bpf.c | 112 +++++++++-
tools/testing/selftests/hid/progs/hid.c | 46 +++++
.../testing/selftests/hid/progs/hid_bpf_helpers.h | 6 +
8 files changed, 324 insertions(+), 74 deletions(-)
---
base-commit: 3e78a6c0d3e02e4cf881dc84c5127e9990f939d6
change-id: 20240314-b4-hid-bpf-new-funcs-ecf05d0ef870
Best regards,
--
Benjamin Tissoires <bentiss(a)kernel.org>
Currently, the sud_test expects the emulated syscall to return the
emulated syscall number. This assumption only works on architectures
were the syscall calling convention use the same register for syscall
number/syscall return value. This is not the case for RISC-V and thus
the return value must be also emulated using the provided ucontext.
Signed-off-by: Clément Léger <cleger(a)rivosinc.com>
Reviewed-by: Palmer Dabbelt <palmer(a)rivosinc.com>
Acked-by: Palmer Dabbelt <palmer(a)rivosinc.com>
---
Changes in V2:
- Changes comment to be more explicit
- Use A7 syscall arg rather than hardcoding MAGIC_SYSCALL_1
---
.../selftests/syscall_user_dispatch/sud_test.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/tools/testing/selftests/syscall_user_dispatch/sud_test.c b/tools/testing/selftests/syscall_user_dispatch/sud_test.c
index b5d592d4099e..d975a6767329 100644
--- a/tools/testing/selftests/syscall_user_dispatch/sud_test.c
+++ b/tools/testing/selftests/syscall_user_dispatch/sud_test.c
@@ -158,6 +158,20 @@ static void handle_sigsys(int sig, siginfo_t *info, void *ucontext)
/* In preparation for sigreturn. */
SYSCALL_DISPATCH_OFF(glob_sel);
+
+ /*
+ * The tests for argument handling assume that `syscall(x) == x`. This
+ * is a NOP on x86 because the syscall number is passed in %rax, which
+ * happens to also be the function ABI return register. Other
+ * architectures may need to swizzle the arguments around.
+ */
+#if defined(__riscv)
+/* REG_A7 is not defined in libc headers */
+# define REG_A7 (REG_A0 + 7)
+
+ ((ucontext_t *)ucontext)->uc_mcontext.__gregs[REG_A0] =
+ ((ucontext_t *)ucontext)->uc_mcontext.__gregs[REG_A7];
+#endif
}
TEST(dispatch_and_return)
--
2.43.0
This series fixes a bug in the complete phase of UDP in GRO, in which
socket lookup fails due to using network_header when parsing encapsulated
packets. The fix is to pass p_off parameter in *_gro_complete.
Next, the fields network_offset and inner_network_offset are added to
napi_gro_cb, and are both set during the receive phase of GRO. This is then
leveraged in the next commit to remove flush_id state from napi_gro_cb, and
stateful code in {ipv6,inet}_gro_receive which may be unnecessarily
complicated due to encapsulation support in GRO.
In addition, udpgro_fwd selftest is adjusted to include the socket lookup
case for vxlan. This selftest will test its supposed functionality once
local bind support is merged (https://lore.kernel.org/netdev/df300a49-7811-4126-a56a-a77100c8841b@gmail.c…).
v4 -> v5:
- Add 1st commit - flush id checks in udp_gro_receive segment which can be
backported by itself
- Add TCP measurements for the 5th commit
- Add flush id tests to ensure flush id logic is preserved in GRO
- Simplify gro_inet_flush by removing a branch
- v4:
https://lore.kernel.org/all/20240325182543.87683-1-richardbgobert@gmail.com/
v3 -> v4:
- Fix code comment and commit message typos
- v3:
https://lore.kernel.org/all/f939c84a-2322-4393-a5b0-9b1e0be8ed8e@gmail.com/
v2 -> v3:
- Use napi_gro_cb instead of skb->{offset}
- v2:
https://lore.kernel.org/all/2ce1600b-e733-448b-91ac-9d0ae2b866a4@gmail.com/
v1 -> v2:
- Pass p_off in *_gro_complete to fix UDP bug
- Remove more conditionals and memory fetches from inet_gro_flush
- v1:
https://lore.kernel.org/netdev/e1d22505-c5f8-4c02-a997-64248480338b@gmail.c…
Richard Gobert (6):
net: gro: add flush check in udp_gro_receive_segment
net: gro: add p_off param in *_gro_complete
selftests/net: add local address bind in vxlan selftest
net: gro: add {inner_}network_offset to napi_gro_cb
net: gro: move L3 flush checks to tcp_gro_receive
selftests/net: add flush id selftests
drivers/net/geneve.c | 7 +-
drivers/net/vxlan/vxlan_core.c | 11 +-
include/linux/etherdevice.h | 2 +-
include/linux/netdevice.h | 3 +-
include/linux/udp.h | 2 +-
include/net/gro.h | 93 ++++++++++++--
include/net/inet_common.h | 2 +-
include/net/tcp.h | 6 +-
include/net/udp.h | 8 +-
include/net/udp_tunnel.h | 2 +-
net/8021q/vlan_core.c | 6 +-
net/core/gro.c | 7 +-
net/ethernet/eth.c | 5 +-
net/ipv4/af_inet.c | 54 +-------
net/ipv4/fou_core.c | 9 +-
net/ipv4/gre_offload.c | 6 +-
net/ipv4/tcp_offload.c | 22 +---
net/ipv4/udp.c | 3 +-
net/ipv4/udp_offload.c | 31 +++--
net/ipv6/ip6_offload.c | 41 +++---
net/ipv6/tcpv6_offload.c | 7 +-
net/ipv6/udp.c | 3 +-
net/ipv6/udp_offload.c | 13 +-
tools/testing/selftests/net/gro.c | 144 ++++++++++++++++++++++
tools/testing/selftests/net/udpgro_fwd.sh | 10 +-
25 files changed, 336 insertions(+), 161 deletions(-)
--
2.36.1
From: Geliang Tang <tanggeliang(a)kylinos.cn>
This patchset uses public helpers start_server_* and connect_to_* defined
in network_helpers.c to drop duplicate code.
Geliang Tang (14):
selftests/bpf: Add start_server_addr helper
selftests/bpf: Use start_server_addr in cls_redirect
selftests/bpf: Use connect_to_addr in cls_redirect
selftests/bpf: Use start_server_addr in sk_assign
selftests/bpf: Use connect_to_addr in sk_assign
selftests/bpf: Use log_err in network_helpers
selftests/bpf: Use start_server_addr in test_sock_addr
selftests/bpf: Use connect_to_addr in test_sock_addr
selftests/bpf: Add function pointer for __start_server
selftests/bpf: Add start_server_setsockopt helper
selftests/bpf: Use start_server_setsockopt in sockopt_inherit
selftests/bpf: Use connect_to_fd in sockopt_inherit
selftests/bpf: Use start_server_* in test_tcp_check_syncookie
selftests/bpf: Use connect_to_addr in test_tcp_check_syncookie
tools/testing/selftests/bpf/Makefile | 4 +-
tools/testing/selftests/bpf/network_helpers.c | 53 +++++++++----
tools/testing/selftests/bpf/network_helpers.h | 4 +
.../selftests/bpf/prog_tests/cls_redirect.c | 38 +---------
.../selftests/bpf/prog_tests/sk_assign.c | 53 +------------
.../bpf/prog_tests/sockopt_inherit.c | 64 ++++------------
tools/testing/selftests/bpf/test_sock_addr.c | 74 ++----------------
.../bpf/test_tcp_check_syncookie_user.c | 76 +++----------------
8 files changed, 85 insertions(+), 281 deletions(-)
--
2.40.1
This patch series provides workingset reporting of user pages in
lruvecs, of which coldness can be tracked by accessed bits and fd
references. However, the concept of workingset applies generically to
all types of memory, which could be kernel slab caches, discardable
userspace caches (databases), or CXL.mem. Therefore, data sources might
come from slab shrinkers, device drivers, or the userspace. IMO, the
kernel should provide a set of workingset interfaces that should be
generic enough to accommodate the various use cases, and be extensible
to potential future use cases. The current proposed interfaces are not
sufficient in that regard, but I would like to start somewhere, solicit
feedback, and iterate.
Use cases
==========
Job scheduling
For data center machines, workingset information allows the job scheduler
to right-size each job and land more jobs on the same host or NUMA node,
and in the case of a job with increasing workingset, policy decisions
can be made to migrate other jobs off the host/NUMA node, or oom-kill
the misbehaving job. If the job shape is very different from the machine
shape, knowing the workingset per-node can also help inform page
allocation policies.
Proactive reclaim
Workingset information allows the a container manager to proactively
reclaim memory while not impacting a job's performance. While PSI may
provide a reactive measure of when a proactive reclaim has reclaimed too
much, workingset reporting enables the policy to be more accurate and
flexible.
Ballooning (similar to proactive reclaim)
While this patch series does not extend the virtio-balloon device,
balloon policies benefit from workingset to more precisely determine
the size of the memory balloon. On desktops/laptops/mobile devices where
memory is scarce and overcommitted, the balloon sizing in multiple VMs
running on the same device can be orchestrated with workingset reports
from each one.
Promotion/Demotion
Similar to proactive reclaim, a workingset report enables demotion to a
slower tier of memory.
For promotion, the workingset report interfaces need to be extended to
report hotness and gather hotness information from the devices[1].
[1]
https://www.opencompute.org/documents/ocp-cms-hotness-tracking-requirements…
Sysfs and Cgroup Interfaces
==========
The interfaces are detailed in the patches that introduce them. The main
idea here is we break down the workingset per-node per-memcg into time
intervals (ms), e.g.
1000 anon=137368 file=24530
20000 anon=34342 file=0
30000 anon=353232 file=333608
40000 anon=407198 file=206052
9223372036854775807 anon=4925624 file=892892
I realize this does not generalize well to hotness information, but I
lack the intuition for an abstraction that presents hotness in a useful
way. Based on a recent proposal for move_phys_pages[2], it seems like
userspace tiering software would like to move specific physical pages,
instead of informing the kernel "move x number of hot pages to y
device". Please advise.
[2]
https://lore.kernel.org/lkml/20240319172609.332900-1-gregory.price@memverge…
Implementation
==========
Currently, the reporting of user pages is based off of MGLRU, and
therefore requires CONFIG_LRU_GEN=y. We would benefit from more MGLRU
generations for a more fine-grained workingset report. I will make the
generation count configurable in the next version. The workingset
reporting mechanism is gated behind CONFIG_WORKINGSET_REPORT, and the
aging thread is behind CONFIG_WORKINGSET_REPORT_AGING.
--
Changes from RFC v2 -> RFC v3:
- Update to v6.8
- Added an aging kernel thread (gated behind config)
- Added basic selftests for sysfs interface files
- Track swapped out pages for reaccesses
- Refactoring and cleanup
- Dropped the virtio-balloon extension to make things manageable
Changes from RFC v1 -> RFC v2:
- Refactored the patchs into smaller pieces
- Renamed interfaces and functions from wss to wsr (Working Set Reporting)
- Fixed build errors when CONFIG_WSR is not set
- Changed working_set_num_bins to u8 for virtio-balloon
- Added support for per-NUMA node reporting for virtio-balloon
[rfc v1]
https://lore.kernel.org/linux-mm/20230509185419.1088297-1-yuanchu@google.co…
[rfc v2]
https://lore.kernel.org/linux-mm/20230621180454.973862-1-yuanchu@google.com/
Yuanchu Xie (8):
mm: multi-gen LRU: ignore non-leaf pmd_young for force_scan=true
mm: aggregate working set information into histograms
mm: use refresh interval to rate-limit workingset report aggregation
mm: report workingset during memory pressure driven scanning
mm: extend working set reporting to memcgs
mm: add per-memcg reaccess histogram
mm: add kernel aging thread for workingset reporting
mm: test system-wide workingset reporting
drivers/base/node.c | 3 +
include/linux/memcontrol.h | 5 +
include/linux/mmzone.h | 4 +
include/linux/workingset_report.h | 107 +++
mm/Kconfig | 15 +
mm/Makefile | 2 +
mm/internal.h | 45 ++
mm/memcontrol.c | 386 ++++++++-
mm/mmzone.c | 2 +
mm/vmscan.c | 95 ++-
mm/workingset.c | 9 +-
mm/workingset_report.c | 757 ++++++++++++++++++
mm/workingset_report_aging.c | 127 +++
tools/testing/selftests/mm/.gitignore | 1 +
tools/testing/selftests/mm/Makefile | 3 +
.../testing/selftests/mm/workingset_report.c | 315 ++++++++
.../testing/selftests/mm/workingset_report.h | 37 +
.../selftests/mm/workingset_report_test.c | 328 ++++++++
18 files changed, 2231 insertions(+), 10 deletions(-)
create mode 100644 include/linux/workingset_report.h
create mode 100644 mm/workingset_report.c
create mode 100644 mm/workingset_report_aging.c
create mode 100644 tools/testing/selftests/mm/workingset_report.c
create mode 100644 tools/testing/selftests/mm/workingset_report.h
create mode 100644 tools/testing/selftests/mm/workingset_report_test.c
--
2.44.0.396.g6e790dbe36-goog
From: Geliang Tang <tanggeliang(a)kylinos.cn>
v3:
- add two more patches.
- use log_err instead of ASSERT in v3.
- let send_recv_data return int as Martin suggested.
v2:
Address Martin's comments for v1 (thanks.)
- drop patch 1, "export send_byte helper".
- drop "WRITE_ONCE(arg.stop, 0)".
- rebased.
send_recv_data will be re-used in MPTCP bpf tests, but not included
in this set because it depends on other patches that have not been
in the bpf-next yet. It will be sent as another set soon.
Geliang Tang (4):
selftests/bpf: Use log_err in network_helpers
selftests/bpf: Add struct send_recv_arg
selftests/bpf: Export send_recv_data helper
selftests/bpf: Support nonblock for send_recv_data
tools/testing/selftests/bpf/network_helpers.c | 125 +++++++++++++++++-
tools/testing/selftests/bpf/network_helpers.h | 1 +
.../selftests/bpf/prog_tests/bpf_tcp_ca.c | 71 +---------
3 files changed, 121 insertions(+), 76 deletions(-)
--
2.40.1
Hi,
This patch series teaches KUnit to handle kthread faults as errors, and
it brings a few related fixes and improvements.
Shuah, everything should be OK now, could you please merge this series?
All these tests pass (on top of v6.8):
./tools/testing/kunit/kunit.py run --alltests
./tools/testing/kunit/kunit.py run --alltests --arch x86_64
./tools/testing/kunit/kunit.py run --alltests --arch arm64 \
--cross_compile=aarch64-linux-gnu-
I also built and ran KUnit tests as a kernel module.
A new test case check NULL pointer dereference, which wasn't possible
before.
This is useful to test current kernel self-protection mechanisms or
future ones such as Heki: https://github.com/heki-linux
Previous versions:
v3: https://lore.kernel.org/r/20240319104857.70783-1-mic@digikod.net
v2: https://lore.kernel.org/r/20240301194037.532117-1-mic@digikod.net
v1: https://lore.kernel.org/r/20240229170409.365386-1-mic@digikod.net
Regards,
Mickaël Salaün (7):
kunit: Handle thread creation error
kunit: Fix kthread reference
kunit: Fix timeout message
kunit: Handle test faults
kunit: Fix KUNIT_SUCCESS() calls in iov_iter tests
kunit: Print last test location on fault
kunit: Add tests for fault
include/kunit/test.h | 24 ++++++++++++++++++---
include/kunit/try-catch.h | 3 ---
kernel/kthread.c | 1 +
lib/kunit/kunit-test.c | 45 ++++++++++++++++++++++++++++++++++++++-
lib/kunit/try-catch.c | 38 ++++++++++++++++++++++-----------
lib/kunit_iov_iter.c | 18 ++++++++--------
6 files changed, 101 insertions(+), 28 deletions(-)
base-commit: e8f897f4afef0031fe618a8e94127a0934896aba
--
2.44.0
riscv-debug-spec [1] Chapter 5: Sdtrig extension introduces
trigger CSRs which can cause a breakpoint exception,
entry into Debug Mode, or a trace action without having to
execute a special instruction.
The focus in the following patches is on the two CSRs from
the Sdtrig extension: hcontext and scontext.
These two CSRs are optional according to the spec, apart from
the Smstateen extension [2], which has bit 57 to control the
accessbility of the hcontext/scontext CSRs.
We also introduce dt-binding in the CPU DTS for the existence of
the CSRs in situations where the Smstaten extension is not available.
The hcontext/scontext CSRs can help to raise triggers with the
textra32/textra64 CSRs set up correctly. (Chapter 5.7.17/ 5.7.18 [1])
Therefore, as part of Linux awareness debugging.
We propose the scontext CSR be filled by the Linux PID,
And the hcontext CSR be filled with a self-maintained Guest OS ID.
The reason for using the self-maintained Guest OS ID instead of VMID is
that VMID might change over time, and the user setting up the trigger
might enter the previous value, invoking the wrong VM for debugging.
The tests have been done on QEMU with Sdtrig CSRs
(mcontext/hcontext/scontext implemented) [3] boot on virt machine
and also run the Guest OS as virt machine with KVM enabled,
the two hcontext/scontext CSRs can be written correctly.
This patch series is based on v6.9-rc1.
Link: https://github.com/riscv/riscv-debug-spec/releases/download/ar20231208/risc… [1]
Link: https://github.com/riscvarchive/riscv-state-enable/releases/download/v1.0.0… [2]
Link: https://github.com/sifive/qemu/tree/dev/maxh/sdtrig_ISA [3]
Signed-off-by: Max Hsu <max.hsu(a)sifive.com>
---
Max Hsu (7):
dt-bindings: riscv: Add Sdtrig ISA extension
dt-bindings: riscv: Add Sdtrig optional CSRs existence on DT
riscv: Add ISA extension parsing for Sdtrig
riscv: Add Sdtrig CSRs definition, Smstateen bit to access Sdtrig CSRs
riscv: cpufeature: Add Sdtrig optional CSRs checks
riscv: suspend: add Smstateen CSRs save/restore
riscv: Add task switch support for scontext CSR
Yong-Xuan Wang (4):
riscv: KVM: Add Sdtrig Extension Support for Guest/VM
riscv: KVM: Add scontext to ONE_REG
riscv: KVM: Add hcontext support
KVM: riscv: selftests: Add Sdtrig Extension to get-reg-list test
Documentation/devicetree/bindings/riscv/cpus.yaml | 18 +++
.../devicetree/bindings/riscv/extensions.yaml | 7 +
arch/riscv/include/asm/csr.h | 6 +
arch/riscv/include/asm/hwcap.h | 1 +
arch/riscv/include/asm/kvm_host.h | 14 ++
arch/riscv/include/asm/kvm_vcpu_debug.h | 24 +++
arch/riscv/include/asm/suspend.h | 7 +
arch/riscv/include/asm/switch_to.h | 15 ++
arch/riscv/include/uapi/asm/kvm.h | 9 ++
arch/riscv/kernel/cpufeature.c | 162 +++++++++++++++++++++
arch/riscv/kernel/suspend.c | 25 ++++
arch/riscv/kvm/Makefile | 1 +
arch/riscv/kvm/main.c | 4 +
arch/riscv/kvm/vcpu.c | 8 +
arch/riscv/kvm/vcpu_debug.c | 107 ++++++++++++++
arch/riscv/kvm/vcpu_onereg.c | 63 +++++++-
arch/riscv/kvm/vm.c | 4 +
tools/testing/selftests/kvm/riscv/get-reg-list.c | 27 ++++
18 files changed, 500 insertions(+), 2 deletions(-)
---
base-commit: 317c7bc0ef035d8ebfc3e55c5dde0566fd5fb171
change-id: 20240329-dev-maxh-lin-452-6-9-c6209e6db67f
Best regards,
--
Max Hsu <max.hsu(a)sifive.com>
From: Geliang Tang <tanggeliang(a)kylinos.cn>
Address Martin's comments for v1 (thanks.)
- drop patch 1, "export send_byte helper".
- drop "WRITE_ONCE(arg.stop, 0)".
- rebased.
send_recv_data will be re-used in MPTCP bpf tests, but not included
in this set because it depends on other patches that have not been
in the bpf-next yet. It will be sent as another set soon.
Geliang Tang (2):
selftests/bpf: Add struct send_recv_arg
selftests/bpf: Export send_recv_data helper
tools/testing/selftests/bpf/network_helpers.c | 85 +++++++++++++++++++
tools/testing/selftests/bpf/network_helpers.h | 1 +
.../selftests/bpf/prog_tests/bpf_tcp_ca.c | 71 +---------------
3 files changed, 87 insertions(+), 70 deletions(-)
--
2.40.1
From: Jason Xing <kernelxing(a)tencent.com>
The output goes like this if I make samples/bpf:
...warning: no previous prototype for ‘get_cgroup_id_from_path’...
Make this function static could solve the warning problem since
no one outside of the file calls it.
Signed-off-by: Jason Xing <kernelxing(a)tencent.com>
---
tools/testing/selftests/bpf/cgroup_helpers.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/cgroup_helpers.c b/tools/testing/selftests/bpf/cgroup_helpers.c
index 19be9c63d5e8..f2952a65dcc2 100644
--- a/tools/testing/selftests/bpf/cgroup_helpers.c
+++ b/tools/testing/selftests/bpf/cgroup_helpers.c
@@ -429,7 +429,7 @@ int create_and_get_cgroup(const char *relative_path)
* which is an invalid cgroup id.
* If there is a failure, it prints the error to stderr.
*/
-unsigned long long get_cgroup_id_from_path(const char *cgroup_workdir)
+static unsigned long long get_cgroup_id_from_path(const char *cgroup_workdir)
{
int dirfd, err, flags, mount_id, fhsize;
union {
--
2.37.3
v2:
- Found that rebuild_sched_domains() has external callers, revert its
change and introduce rebuild_sched_domains_cpuslocked() instead.
As discussed in the LKML thread [1], the asynchronous nature of cpuset
hotplug handling code is causing problem with RCU testing. With recent
changes in the way locking is being handled in the cpuset code, it is
now possible to make the cpuset hotplug code synchronous again without
major changes.
This series enables the hotplug code to call directly into cpuset hotplug
core without indirection with the exception of the special case of v1
cpuset becoming empty still being handled indirectly with workqueue.
A new simple test case was also written to test this special v1 cpuset
case. The test_cpuset_prs.sh script was also run with LOCKDEP on to
verify that there is no regression.
[1] https://lore.kernel.org/lkml/ZgYikMb5kZ7rxPp6@slm.duckdns.org/
Waiman Long (2):
cgroup/cpuset: Make cpuset hotplug processing synchronous
cgroup/cpuset: Add test_cpuset_v1_hp.sh
include/linux/cpuset.h | 3 -
kernel/cgroup/cpuset.c | 141 +++++++-----------
kernel/cpu.c | 48 ------
kernel/power/process.c | 2 -
tools/testing/selftests/cgroup/Makefile | 2 +-
.../selftests/cgroup/test_cpuset_v1_hp.sh | 46 ++++++
6 files changed, 103 insertions(+), 139 deletions(-)
create mode 100755 tools/testing/selftests/cgroup/test_cpuset_v1_hp.sh
--
2.39.3
RFC v6:
=======
Major Changes:
--------------
This revision largely rebases on top of net-next and addresses the little
feedback RFCv5 received.
The series remains in RFC because the queue-API ndos defined in this
series are not yet implemented. I have a GVE implementation I carry out
of tree for my testing. A upstreamable GVE implementation is in the
works. Aside from that, in my estimation all the patches are ready for
review/merge. Please do take a look.
As usual the full devmem TCP changes including the full GVE driver
implementation is here:
https://github.com/mina/linux/commits/tcpdevmem-v6/
This version also comes with some performance data recorded in the cover
letter (see below changelog).
Detailed changelog:
- Rebased on top of the merged netmem_ref changes.
- Converted skb->dmabuf to skb->readable (Pavel). Pavel's original
suggestion was to remove the skb->dmabuf flag entirely, but when I
looked into it closely, I found the issue that if we remove the flag
we have to dereference the shinfo(skb) pointer to obtain the first
frag to tell whether an skb is readable or not. This can cause a
performance regression if it dirties the cache line when the
shinfo(skb) was not really needed. Instead, I converted the skb->dmabuf
flag into a generic skb->readable flag which can be re-used by io_uring
0-copy RX.
- Squashed a few locking optimizations from Eric Dumazet in the RX path
and the DEVMEM_DONTNEED setsockopt.
- Expanded the tests a bit. Added validation for invalid scenarios and
added some more coverage.
Perf - page-pool benchmark:
---------------------------
bench_page_pool_simple.ko tests with and without these changes:
https://pastebin.com/raw/ncHDwAbn
AFAIK the number that really matters in the perf tests is the
'tasklet_page_pool01_fast_path Per elem'. This one measures at about 8
cycles without the changes but there is some 1 cycle noise in some
results.
With the patches this regresses to 9 cycles with the changes but there
is 1 cycle noise occasionally running this test repeatedly.
Lastly I tried disable the static_branch_unlikely() in
netmem_is_net_iov() check. To my surprise disabling the
static_branch_unlikely() check reduces the fast path back to 8 cycles,
but the 1 cycle noise remains.
Perf - Devmem TCP benchmark:
---------------------
189/200gbps bi-directional throughput with RX devmem TCP and regular TCP
TX i.e. ~95% line rate.
Major changes in RFC v5:
========================
1. Rebased on top of 'Abstract page from net stack' series and used the
new netmem type to refer to LSB set pointers instead of re-using
struct page.
2. Downgraded this series back to RFC and called it RFC v5. This is
because this series is now dependent on 'Abstract page from net
stack'[1] and the queue API. Both are removed from the series to
reduce the patch # and those bits are fairly independent or
pre-requisite work.
3. Reworked the page_pool devmem support to use netmem and for some
more unified handling.
4. Reworked the reference counting of net_iov (renamed from
page_pool_iov) to use pp_ref_count for refcounting.
The full changes including the dependent series and GVE page pool
support is here:
https://github.com/mina/linux/commits/tcpdevmem-rfcv5/
[1] https://patchwork.kernel.org/project/netdevbpf/list/?series=810774
Major changes in v1:
====================
1. Implemented MVP queue API ndos to remove the userspace-visible
driver reset.
2. Fixed issues in the napi_pp_put_page() devmem frag unref path.
3. Removed RFC tag.
Many smaller addressed comments across all the patches (patches have
individual change log).
Full tree including the rest of the GVE driver changes:
https://github.com/mina/linux/commits/tcpdevmem-v1
Changes in RFC v3:
==================
1. Pulled in the memory-provider dependency from Jakub's RFC[1] to make the
series reviewable and mergeable.
2. Implemented multi-rx-queue binding which was a todo in v2.
3. Fix to cmsg handling.
The sticking point in RFC v2[2] was the device reset required to refill
the device rx-queues after the dmabuf bind/unbind. The solution
suggested as I understand is a subset of the per-queue management ops
Jakub suggested or similar:
https://lore.kernel.org/netdev/20230815171638.4c057dcd@kernel.org/
This is not addressed in this revision, because:
1. This point was discussed at netconf & netdev and there is openness to
using the current approach of requiring a device reset.
2. Implementing individual queue resetting seems to be difficult for my
test bed with GVE. My prototype to test this ran into issues with the
rx-queues not coming back up properly if reset individually. At the
moment I'm unsure if it's a mistake in the POC or a genuine issue in
the virtualization stack behind GVE, which currently doesn't test
individual rx-queue restart.
3. Our usecases are not bothered by requiring a device reset to refill
the buffer queues, and we'd like to support NICs that run into this
limitation with resetting individual queues.
My thought is that drivers that have trouble with per-queue configs can
use the support in this series, while drivers that support new netdev
ops to reset individual queues can automatically reset the queue as
part of the dma-buf bind/unbind.
The same approach with device resets is presented again for consideration
with other sticking points addressed.
This proposal includes the rx devmem path only proposed for merge. For a
snapshot of my entire tree which includes the GVE POC page pool support &
device memory support:
https://github.com/torvalds/linux/compare/master...mina:linux:tcpdevmem-v3
[1] https://lore.kernel.org/netdev/f8270765-a27b-6ccf-33ea-cda097168d79@redhat.…
[2] https://lore.kernel.org/netdev/CAHS8izOVJGJH5WF68OsRWFKJid1_huzzUK+hpKbLcL4…
Changes in RFC v2:
==================
The sticking point in RFC v1[1] was the dma-buf pages approach we used to
deliver the device memory to the TCP stack. RFC v2 is a proof-of-concept
that attempts to resolve this by implementing scatterlist support in the
networking stack, such that we can import the dma-buf scatterlist
directly. This is the approach proposed at a high level here[2].
Detailed changes:
1. Replaced dma-buf pages approach with importing scatterlist into the
page pool.
2. Replace the dma-buf pages centric API with a netlink API.
3. Removed the TX path implementation - there is no issue with
implementing the TX path with scatterlist approach, but leaving
out the TX path makes it easier to review.
4. Functionality is tested with this proposal, but I have not conducted
perf testing yet. I'm not sure there are regressions, but I removed
perf claims from the cover letter until they can be re-confirmed.
5. Added Signed-off-by: contributors to the implementation.
6. Fixed some bugs with the RX path since RFC v1.
Any feedback welcome, but specifically the biggest pending questions
needing feedback IMO are:
1. Feedback on the scatterlist-based approach in general.
2. Netlink API (Patch 1 & 2).
3. Approach to handle all the drivers that expect to receive pages from
the page pool (Patch 6).
[1] https://lore.kernel.org/netdev/dfe4bae7-13a0-3c5d-d671-f61b375cb0b4@gmail.c…
[2] https://lore.kernel.org/netdev/CAHS8izPm6XRS54LdCDZVd0C75tA1zHSu6jLVO8nzTLX…
==================
* TL;DR:
Device memory TCP (devmem TCP) is a proposal for transferring data to and/or
from device memory efficiently, without bouncing the data to a host memory
buffer.
* Problem:
A large amount of data transfers have device memory as the source and/or
destination. Accelerators drastically increased the volume of such transfers.
Some examples include:
- ML accelerators transferring large amounts of training data from storage into
GPU/TPU memory. In some cases ML training setup time can be as long as 50% of
TPU compute time, improving data transfer throughput & efficiency can help
improving GPU/TPU utilization.
- Distributed training, where ML accelerators, such as GPUs on different hosts,
exchange data among them.
- Distributed raw block storage applications transfer large amounts of data with
remote SSDs, much of this data does not require host processing.
Today, the majority of the Device-to-Device data transfers the network are
implemented as the following low level operations: Device-to-Host copy,
Host-to-Host network transfer, and Host-to-Device copy.
The implementation is suboptimal, especially for bulk data transfers, and can
put significant strains on system resources, such as host memory bandwidth,
PCIe bandwidth, etc. One important reason behind the current state is the
kernel’s lack of semantics to express device to network transfers.
* Proposal:
In this patch series we attempt to optimize this use case by implementing
socket APIs that enable the user to:
1. send device memory across the network directly, and
2. receive incoming network packets directly into device memory.
Packet _payloads_ go directly from the NIC to device memory for receive and from
device memory to NIC for transmit.
Packet _headers_ go to/from host memory and are processed by the TCP/IP stack
normally. The NIC _must_ support header split to achieve this.
Advantages:
- Alleviate host memory bandwidth pressure, compared to existing
network-transfer + device-copy semantics.
- Alleviate PCIe BW pressure, by limiting data transfer to the lowest level
of the PCIe tree, compared to traditional path which sends data through the
root complex.
* Patch overview:
** Part 1: netlink API
Gives user ability to bind dma-buf to an RX queue.
** Part 2: scatterlist support
Currently the standard for device memory sharing is DMABUF, which doesn't
generate struct pages. On the other hand, networking stack (skbs, drivers, and
page pool) operate on pages. We have 2 options:
1. Generate struct pages for dmabuf device memory, or,
2. Modify the networking stack to process scatterlist.
Approach #1 was attempted in RFC v1. RFC v2 implements approach #2.
** part 3: page pool support
We piggy back on page pool memory providers proposal:
https://github.com/kuba-moo/linux/tree/pp-providers
It allows the page pool to define a memory provider that provides the
page allocation and freeing. It helps abstract most of the device memory
TCP changes from the driver.
** part 4: support for unreadable skb frags
Page pool iovs are not accessible by the host; we implement changes
throughput the networking stack to correctly handle skbs with unreadable
frags.
** Part 5: recvmsg() APIs
We define user APIs for the user to send and receive device memory.
Not included with this series is the GVE devmem TCP support, just to
simplify the review. Code available here if desired:
https://github.com/mina/linux/tree/tcpdevmem
This series is built on top of net-next with Jakub's pp-providers changes
cherry-picked.
* NIC dependencies:
1. (strict) Devmem TCP require the NIC to support header split, i.e. the
capability to split incoming packets into a header + payload and to put
each into a separate buffer. Devmem TCP works by using device memory
for the packet payload, and host memory for the packet headers.
2. (optional) Devmem TCP works better with flow steering support & RSS support,
i.e. the NIC's ability to steer flows into certain rx queues. This allows the
sysadmin to enable devmem TCP on a subset of the rx queues, and steer
devmem TCP traffic onto these queues and non devmem TCP elsewhere.
The NIC I have access to with these properties is the GVE with DQO support
running in Google Cloud, but any NIC that supports these features would suffice.
I may be able to help reviewers bring up devmem TCP on their NICs.
* Testing:
The series includes a udmabuf kselftest that show a simple use case of
devmem TCP and validates the entire data path end to end without
a dependency on a specific dmabuf provider.
** Test Setup
Kernel: net-next with this series and memory provider API cherry-picked
locally.
Hardware: Google Cloud A3 VMs.
NIC: GVE with header split & RSS & flow steering support.
Cc: Pavel Begunkov <asml.silence(a)gmail.com>
Cc: David Wei <dw(a)davidwei.uk>
Cc: Jason Gunthorpe <jgg(a)ziepe.ca>
Cc: Yunsheng Lin <linyunsheng(a)huawei.com>
Cc: Shailend Chand <shailend(a)google.com>
Cc: Harshitha Ramamurthy <hramamurthy(a)google.com>
Cc: Shakeel Butt <shakeelb(a)google.com>
Cc: Jeroen de Borst <jeroendb(a)google.com>
Cc: Praveen Kaligineedi <pkaligineedi(a)google.com>
Jakub Kicinski (1):
net: page_pool: create hooks for custom page providers
Mina Almasry (14):
queue_api: define queue api
net: page_pool: factor out page_pool recycle check
net: netdev netlink api to bind dma-buf to a net device
netdev: support binding dma-buf to netdevice
netdev: netdevice devmem allocator
page_pool: convert to use netmem
page_pool: devmem support
memory-provider: dmabuf devmem memory provider
net: support non paged skb frags
net: add support for skbs with unreadable frags
tcp: RX path for devmem TCP
net: add SO_DEVMEM_DONTNEED setsockopt to release RX frags
net: add devmem TCP documentation
selftests: add ncdevmem, netcat for devmem TCP
Documentation/netlink/specs/netdev.yaml | 52 +++
Documentation/networking/devmem.rst | 271 ++++++++++++
Documentation/networking/index.rst | 1 +
arch/alpha/include/uapi/asm/socket.h | 6 +
arch/mips/include/uapi/asm/socket.h | 6 +
arch/parisc/include/uapi/asm/socket.h | 6 +
arch/sparc/include/uapi/asm/socket.h | 6 +
include/linux/netdevice.h | 24 ++
include/linux/skbuff.h | 67 ++-
include/linux/socket.h | 1 +
include/net/devmem.h | 127 ++++++
include/net/netdev_rx_queue.h | 1 +
include/net/netmem.h | 234 +++++++++-
include/net/page_pool/helpers.h | 154 +++++--
include/net/page_pool/types.h | 28 +-
include/net/sock.h | 2 +
include/net/tcp.h | 5 +-
include/trace/events/page_pool.h | 29 +-
include/uapi/asm-generic/socket.h | 6 +
include/uapi/linux/netdev.h | 19 +
include/uapi/linux/uio.h | 14 +
net/bpf/test_run.c | 5 +-
net/core/Makefile | 2 +-
net/core/datagram.c | 6 +
net/core/dev.c | 6 +-
net/core/devmem.c | 413 ++++++++++++++++++
net/core/gro.c | 7 +-
net/core/netdev-genl-gen.c | 19 +
net/core/netdev-genl-gen.h | 2 +
net/core/netdev-genl.c | 123 ++++++
net/core/page_pool.c | 362 +++++++++-------
net/core/skbuff.c | 110 ++++-
net/core/sock.c | 61 +++
net/ipv4/tcp.c | 257 ++++++++++-
net/ipv4/tcp_input.c | 13 +-
net/ipv4/tcp_ipv4.c | 9 +
net/ipv4/tcp_output.c | 5 +-
net/packet/af_packet.c | 4 +-
tools/include/uapi/linux/netdev.h | 19 +
tools/testing/selftests/net/.gitignore | 1 +
tools/testing/selftests/net/Makefile | 5 +
tools/testing/selftests/net/ncdevmem.c | 546 ++++++++++++++++++++++++
42 files changed, 2764 insertions(+), 270 deletions(-)
create mode 100644 Documentation/networking/devmem.rst
create mode 100644 include/net/devmem.h
create mode 100644 net/core/devmem.c
create mode 100644 tools/testing/selftests/net/ncdevmem.c
--
2.44.0.rc1.240.g4c46232300-goog
Here are some patches from Geliang, doing different cleanups, and
supporting 'ip mptcp' in more MPTCP selftests.
Patch 1 checks that TC is available in selftests requiring it.
Patch 2 adds 'ms' units in TC commands, to avoid confusions.
Patches 3-9 are some prerequisites for patch 10: some export code from
mptcp_join.sh to mptcp_lib.sh, to be re-used in pm_netlink.sh,
mptcp_sockopt.sh and simult_flows.sh ; and others add helpers to
pm_netlink.sh to easily support both 'ip mptcp' and 'pm_nl_ctl' tools to
interact with the in-kernel MPTCP path-manager.
Patch 10 adds a '-i' parameter in mptcp_sockopt.sh, pm_netlink.sh, and
simult_flows.sh to use 'ip mptcp' tool instead of 'pm_nl_ctl'.
Patch 11 fixes some ShellCheck warnings in pm_netlink.sh, in order to
drop a ShellCheck's 'disable' instruction.
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
Geliang Tang (11):
selftests: mptcp: add tc check for check_tools
selftests: mptcp: add ms units for tc-netem delay
selftests: mptcp: export ip_mptcp to mptcp_lib
selftests: mptcp: netlink: add 'limits' helpers
selftests: mptcp: add {get,format}_endpoint(s) helpers
selftests: mptcp: netlink: add change_address helper
selftests: mptcp: join: update endpoint ops
selftests: mptcp: export pm_nl endpoint ops
selftests: mptcp: use pm_nl endpoint ops
selftests: mptcp: ip_mptcp option for more scripts
selftests: mptcp: netlink: drop disable=SC2086
tools/testing/selftests/net/mptcp/mptcp_connect.sh | 2 +-
tools/testing/selftests/net/mptcp/mptcp_join.sh | 155 ++----------
tools/testing/selftests/net/mptcp/mptcp_lib.sh | 135 ++++++++++
tools/testing/selftests/net/mptcp/mptcp_sockopt.sh | 34 ++-
tools/testing/selftests/net/mptcp/pm_netlink.sh | 281 +++++++++++++--------
tools/testing/selftests/net/mptcp/simult_flows.sh | 20 +-
6 files changed, 375 insertions(+), 252 deletions(-)
---
base-commit: d76c740b2eaaddc5fc3a8b21eaec5b6b11e8c3f5
change-id: 20240405-upstream-net-next-20240405-mptcp-selftests-refactoring-f5ed9780df8e
Best regards,
--
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Currently the options for writing networking tests are C, bash or
some mix of the two. YAML/Netlink gives us the ability to easily
interface with Netlink in higher level laguages. In particular,
there is a Python library already available in tree, under tools/net.
Add the scaffolding which allows writing tests using this library.
The "scaffolding" is needed because the library lives under
tools/net and uses YAML files from under Documentation/.
So we need a small amount of glue code to find those things
and add them to TEST_FILES.
This series adds both a basic SW sanity test and driver
test which can be run against netdevsim or a real device.
When I develop core code I usually test with netdevsim,
then a real device, and then a backport to Meta's kernel.
Because of the lack of integration, until now I had
to throw away the (YNL-based) test script and netdevsim code.
Running tests in tree directly:
$ ./tools/testing/selftests/net/nl_netdev.py
KTAP version 1
1..2
ok 1 nl_netdev.empty_check
ok 2 nl_netdev.lo_check
# Totals: pass:2 fail:0 xfail:0 xpass:0 skip:0 error:0
in tree via make:
$ make -C tools/testing/selftests/ TARGETS=net \
TEST_PROGS=nl_netdev.py TEST_GEN_PROGS="" run_tests
[ ... ]
and installed externally, all seem to work:
$ make -C tools/testing/selftests/ TARGETS=net \
install INSTALL_PATH=/tmp/ksft-net
$ /tmp/ksft-net/run_kselftest.sh -t net:nl_netdev.py
[ ... ]
For driver tests I followed the lead of net/forwarding and
get the device name from env and/or a config file.
v3:
- fix up netdevsim C
- various small nits in other patches (see changelog in patches)
v2: https://lore.kernel.org/all/20240403023426.1762996-1-kuba@kernel.org/
- don't add to TARGETS, create a deperate variable with deps
- support and use with
- support and use passing arguments to tests
v1: https://lore.kernel.org/all/20240402010520.1209517-1-kuba@kernel.org/
Jakub Kicinski (5):
selftests: net: add scaffolding for Netlink tests in Python
selftests: nl_netdev: add a trivial Netlink netdev test
netdevsim: report stats by default, like a real device
selftests: drivers: add scaffolding for Netlink tests in Python
testing: net-drv: add a driver test for stats reporting
drivers/net/netdevsim/ethtool.c | 11 ++
drivers/net/netdevsim/netdev.c | 49 ++++++++
tools/testing/selftests/Makefile | 10 +-
tools/testing/selftests/drivers/net/Makefile | 7 ++
.../testing/selftests/drivers/net/README.rst | 30 +++++
.../selftests/drivers/net/lib/py/__init__.py | 17 +++
.../selftests/drivers/net/lib/py/env.py | 52 ++++++++
tools/testing/selftests/drivers/net/stats.py | 86 +++++++++++++
tools/testing/selftests/net/Makefile | 1 +
tools/testing/selftests/net/lib/Makefile | 8 ++
.../testing/selftests/net/lib/py/__init__.py | 7 ++
tools/testing/selftests/net/lib/py/consts.py | 9 ++
tools/testing/selftests/net/lib/py/ksft.py | 96 +++++++++++++++
tools/testing/selftests/net/lib/py/nsim.py | 115 ++++++++++++++++++
tools/testing/selftests/net/lib/py/utils.py | 47 +++++++
tools/testing/selftests/net/lib/py/ynl.py | 49 ++++++++
tools/testing/selftests/net/nl_netdev.py | 24 ++++
17 files changed, 617 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/drivers/net/Makefile
create mode 100644 tools/testing/selftests/drivers/net/README.rst
create mode 100644 tools/testing/selftests/drivers/net/lib/py/__init__.py
create mode 100644 tools/testing/selftests/drivers/net/lib/py/env.py
create mode 100755 tools/testing/selftests/drivers/net/stats.py
create mode 100644 tools/testing/selftests/net/lib/Makefile
create mode 100644 tools/testing/selftests/net/lib/py/__init__.py
create mode 100644 tools/testing/selftests/net/lib/py/consts.py
create mode 100644 tools/testing/selftests/net/lib/py/ksft.py
create mode 100644 tools/testing/selftests/net/lib/py/nsim.py
create mode 100644 tools/testing/selftests/net/lib/py/utils.py
create mode 100644 tools/testing/selftests/net/lib/py/ynl.py
create mode 100755 tools/testing/selftests/net/nl_netdev.py
--
2.44.0
From: Mark Rutland <mark.rutland(a)arm.com>
[ Upstream commit 8ecab2e64572f1aecdfc5a8feae748abda6e3347 ]
The event filter function test has been failing in our internal test
farm:
| # not ok 33 event filter function - test event filtering on functions
Running the test in verbose mode indicates that this is because the test
erroneously determines that kmem_cache_free() is the most common caller
of kmem_cache_free():
# # + cut -d: -f3 trace
# # + sed s/call_site=([^+]*)+0x.*/1/
# # + sort
# # + uniq -c
# # + sort
# # + tail -n 1
# # + sed s/^[ 0-9]*//
# # + target_func=kmem_cache_free
... and as kmem_cache_free() doesn't call itself, setting this as the
filter function for kmem_cache_free() results in no hits, and
consequently the test fails:
# # + grep kmem_cache_free trace
# # + grep kmem_cache_free
# # + wc -l
# # + hitcnt=0
# # + grep kmem_cache_free trace
# # + grep -v kmem_cache_free
# # + wc -l
# # + misscnt=0
# # + [ 0 -eq 0 ]
# # + exit_fail
This seems to be because the system in question has tasks with ':' in
their name (which a number of kernel worker threads have). These show up
in the trace, e.g.
test:.sh-1299 [004] ..... 2886.040608: kmem_cache_free: call_site=putname+0xa4/0xc8 ptr=000000000f4d22f4 name=names_cache
... and so when we try to extact the call_site with:
cut -d: -f3 trace | sed 's/call_site=\([^+]*\)+0x.*/\1/'
... the 'cut' command will extrace the column containing
'kmem_cache_free' rather than the column containing 'call_site=...', and
the 'sed' command will leave this unchanged. Consequently, the test will
decide to use 'kmem_cache_free' as the filter function, resulting in the
failure seen above.
Fix this by matching the 'call_site=<func>' part specifically to extract
the function name.
Signed-off-by: Mark Rutland <mark.rutland(a)arm.com>
Reported-by: Aishwarya TCV <aishwarya.tcv(a)arm.com>
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Steven Rostedt <rostedt(a)goodmis.org>
Cc: linux-kernel(a)vger.kernel.org
Cc: linux-kselftest(a)vger.kernel.org
Cc: linux-trace-kernel(a)vger.kernel.org
Acked-by: Masami Hiramatsu (Google) <mhiramat(a)kernel.org>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
.../selftests/ftrace/test.d/filter/event-filter-function.tc | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc b/tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc
index 2de7c61d1ae30..3f74c09c56b62 100644
--- a/tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc
+++ b/tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc
@@ -24,7 +24,7 @@ echo 0 > events/enable
echo "Get the most frequently calling function"
sample_events
-target_func=`cut -d: -f3 trace | sed 's/call_site=\([^+]*\)+0x.*/\1/' | sort | uniq -c | sort | tail -n 1 | sed 's/^[ 0-9]*//'`
+target_func=`cat trace | grep -o 'call_site=\([^+]*\)' | sed 's/call_site=//' | sort | uniq -c | sort | tail -n 1 | sed 's/^[ 0-9]*//'`
if [ -z "$target_func" ]; then
exit_fail
fi
--
2.43.0
From: Mark Rutland <mark.rutland(a)arm.com>
[ Upstream commit 8ecab2e64572f1aecdfc5a8feae748abda6e3347 ]
The event filter function test has been failing in our internal test
farm:
| # not ok 33 event filter function - test event filtering on functions
Running the test in verbose mode indicates that this is because the test
erroneously determines that kmem_cache_free() is the most common caller
of kmem_cache_free():
# # + cut -d: -f3 trace
# # + sed s/call_site=([^+]*)+0x.*/1/
# # + sort
# # + uniq -c
# # + sort
# # + tail -n 1
# # + sed s/^[ 0-9]*//
# # + target_func=kmem_cache_free
... and as kmem_cache_free() doesn't call itself, setting this as the
filter function for kmem_cache_free() results in no hits, and
consequently the test fails:
# # + grep kmem_cache_free trace
# # + grep kmem_cache_free
# # + wc -l
# # + hitcnt=0
# # + grep kmem_cache_free trace
# # + grep -v kmem_cache_free
# # + wc -l
# # + misscnt=0
# # + [ 0 -eq 0 ]
# # + exit_fail
This seems to be because the system in question has tasks with ':' in
their name (which a number of kernel worker threads have). These show up
in the trace, e.g.
test:.sh-1299 [004] ..... 2886.040608: kmem_cache_free: call_site=putname+0xa4/0xc8 ptr=000000000f4d22f4 name=names_cache
... and so when we try to extact the call_site with:
cut -d: -f3 trace | sed 's/call_site=\([^+]*\)+0x.*/\1/'
... the 'cut' command will extrace the column containing
'kmem_cache_free' rather than the column containing 'call_site=...', and
the 'sed' command will leave this unchanged. Consequently, the test will
decide to use 'kmem_cache_free' as the filter function, resulting in the
failure seen above.
Fix this by matching the 'call_site=<func>' part specifically to extract
the function name.
Signed-off-by: Mark Rutland <mark.rutland(a)arm.com>
Reported-by: Aishwarya TCV <aishwarya.tcv(a)arm.com>
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Steven Rostedt <rostedt(a)goodmis.org>
Cc: linux-kernel(a)vger.kernel.org
Cc: linux-kselftest(a)vger.kernel.org
Cc: linux-trace-kernel(a)vger.kernel.org
Acked-by: Masami Hiramatsu (Google) <mhiramat(a)kernel.org>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
.../selftests/ftrace/test.d/filter/event-filter-function.tc | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc b/tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc
index 2de7c61d1ae30..3f74c09c56b62 100644
--- a/tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc
+++ b/tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc
@@ -24,7 +24,7 @@ echo 0 > events/enable
echo "Get the most frequently calling function"
sample_events
-target_func=`cut -d: -f3 trace | sed 's/call_site=\([^+]*\)+0x.*/\1/' | sort | uniq -c | sort | tail -n 1 | sed 's/^[ 0-9]*//'`
+target_func=`cat trace | grep -o 'call_site=\([^+]*\)' | sed 's/call_site=//' | sort | uniq -c | sort | tail -n 1 | sed 's/^[ 0-9]*//'`
if [ -z "$target_func" ]; then
exit_fail
fi
--
2.43.0
From: Geliang Tang <tanggeliang(a)kylinos.cn>
Incorrect arguments are passed to fcntl() in test_sockmap.c when invoking
it to set file status flags. If O_NONBLOCK is used as 2nd argument and
passed into fcntl, -EINVAL will be returned (See do_fcntl() in fs/fcntl.c).
The correct approach is to use F_SETFL as 2nd argument, and O_NONBLOCK as
3rd one.
Fixes: 16962b2404ac ("bpf: sockmap, add selftests")
Signed-off-by: Geliang Tang <tanggeliang(a)kylinos.cn>
---
tools/testing/selftests/bpf/test_sockmap.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/test_sockmap.c b/tools/testing/selftests/bpf/test_sockmap.c
index 024a0faafb3b..34d6a1e6f664 100644
--- a/tools/testing/selftests/bpf/test_sockmap.c
+++ b/tools/testing/selftests/bpf/test_sockmap.c
@@ -603,7 +603,7 @@ static int msg_loop(int fd, int iov_count, int iov_length, int cnt,
struct timeval timeout;
fd_set w;
- fcntl(fd, fd_flags);
+ fcntl(fd, F_SETFL, fd_flags);
/* Account for pop bytes noting each iteration of apply will
* call msg_pop_data helper so we need to account for this
* by calculating the number of apply iterations. Note user
--
2.40.1
Introduce ring__consume_n() and ring_buffer__consume_n() API to
partially consume items from one (or more) ringbuffer(s).
This can be useful, for example, to consume just a single item or when
we need to copy multiple items to a limited user-space buffer from the
ringbuffer callback.
Practical example (where this API can be used):
https://github.com/sched-ext/scx/blob/b7c06b9ed9f72cad83c31e39e9c4e2cfd8683…
See also:
https://lore.kernel.org/lkml/20240310154726.734289-1-andrea.righi@canonical…
v4:
- add a selftest to test the new API
- open a new 1.5.0 cycle
v3:
- rename ring__consume_max() -> ring__consume_n() and
ring_buffer__consume_max() -> ring_buffer__consume_n()
- add new API to a new 1.5.0 cycle
- fixed minor nits / comments
v2:
- introduce a new API instead of changing the callback's retcode
behavior
Andrea Righi (4):
libbpf: Start v1.5 development cycle
libbpf: ringbuf: allow to consume up to a certain amount of items
libbpf: Add ring__consume_n / ring_buffer__consume_n
selftests/bpf: Add tests for ring__consume_n and ring_buffer__consume_n
tools/lib/bpf/libbpf.h | 12 +++++
tools/lib/bpf/libbpf.map | 6 +++
tools/lib/bpf/libbpf_version.h | 2 +-
tools/lib/bpf/ringbuf.c | 59 ++++++++++++++++++++----
tools/testing/selftests/bpf/prog_tests/ringbuf.c | 8 ++++
5 files changed, 76 insertions(+), 11 deletions(-)
This patchset allows for io_uring zerocopy to support REQ_F_CQE_SKIP,
skipping the normal completion notification, but not the zerocopy buffer
release notification.
This patchset also includes a test to test these changes, and a patch to
mini_liburing to enable io_uring_peek_cqe, which is needed for the test.
Oliver Crumrine (3):
io_uring: Add REQ_F_CQE_SKIP support for io_uring zerocopy
io_uring: Add io_uring_peek_cqe to mini_liburing
io_uring: Support IOSQE_CQE_SKIP_SUCCESS in io_uring zerocopy test
io_uring/net.c | 6 +--
tools/include/io_uring/mini_liburing.h | 18 +++++++++
.../selftests/net/io_uring_zerocopy_tx.c | 37 +++++++++++++++++--
.../selftests/net/io_uring_zerocopy_tx.sh | 7 +++-
4 files changed, 59 insertions(+), 10 deletions(-)
--
2.44.0
This series aims to improve the usability of the ftrace selftests when
running as part of the kselftest runner, mainly for use with automated
systems. It fixes the output of verbose mode when run in KTAP output
mode and then enables verbose mode by default when invoked from the
kselftest runner so that the diagnostic information is there by default
when run in automated systems.
I've split this into two patches in case there is a concern with one
part but not the other, especially given the verbosity of the verbose
output when it triggers.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Mark Brown (2):
tracing/selftests: Support log output when generating KTAP output
tracing/selftests: Default to verbose mode when running in kselftest
tools/testing/selftests/ftrace/ftracetest | 8 +++++++-
tools/testing/selftests/ftrace/ftracetest-ktap | 2 +-
2 files changed, 8 insertions(+), 2 deletions(-)
---
base-commit: 4cece764965020c22cff7665b18a012006359095
change-id: 20240319-kselftest-ftrace-ktap-verbose-72e37957e213
Best regards,
--
Mark Brown <broonie(a)kernel.org>
Earlier commit fc8b2a619469378 ("net: more strict VIRTIO_NET_HDR_GSO_UDP_L4 validation")
added check of potential number of UDP segment vs UDP_MAX_SEGMENTS
in linux/virtio_net.h.
After this change certification test of USO guest-to-guest
transmit on Windows driver for virtio-net device fails,
for example with packet size of ~64K and mss of 536 bytes.
In general the USO should not be more restrictive than TSO.
Indeed, in case of unreasonably small mss a lot of segments
can cause queue overflow and packet loss on the destination.
Limit of 128 segments is good for any practical purpose,
with minimal meaningful mss of 536 the maximal UDP packet will
be divided to ~120 segments.
Signed-off-by: Yuri Benditovich <yuri.benditovich(a)daynix.com>
---
include/linux/udp.h | 2 +-
tools/testing/selftests/net/udpgso.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/include/linux/udp.h b/include/linux/udp.h
index 3748e82b627b..7e75ccdf25fe 100644
--- a/include/linux/udp.h
+++ b/include/linux/udp.h
@@ -108,7 +108,7 @@ struct udp_sock {
#define udp_assign_bit(nr, sk, val) \
assign_bit(UDP_FLAGS_##nr, &udp_sk(sk)->udp_flags, val)
-#define UDP_MAX_SEGMENTS (1 << 6UL)
+#define UDP_MAX_SEGMENTS (1 << 7UL)
#define udp_sk(ptr) container_of_const(ptr, struct udp_sock, inet.sk)
diff --git a/tools/testing/selftests/net/udpgso.c b/tools/testing/selftests/net/udpgso.c
index 1d975bf52af3..85b3baa3f7f3 100644
--- a/tools/testing/selftests/net/udpgso.c
+++ b/tools/testing/selftests/net/udpgso.c
@@ -34,7 +34,7 @@
#endif
#ifndef UDP_MAX_SEGMENTS
-#define UDP_MAX_SEGMENTS (1 << 6UL)
+#define UDP_MAX_SEGMENTS (1 << 7UL)
#endif
#define CONST_MTU_TEST 1500
--
2.34.3
"Bail out! " is not descriptive. It rather should be: "Failed: " and
then this added prefix doesn't need to be added everywhere. Usually in
the logs, we are searching for "Failed" or "Error" instead of "Bail
out" so it must be replace.
Remove Error/Failed prefixes from all usages as well.
Muhammad Usama Anjum (2):
selftests: Replace "Bail out" with "Error"
selftests: Remove Error/Failed prefix from ksft_exit_fail*() usages
tools/testing/selftests/exec/load_address.c | 8 +-
.../testing/selftests/exec/recursion-depth.c | 10 +-
tools/testing/selftests/kselftest.h | 2 +-
.../selftests/mm/map_fixed_noreplace.c | 24 +--
tools/testing/selftests/mm/map_populate.c | 2 +-
tools/testing/selftests/mm/mremap_dontunmap.c | 2 +-
tools/testing/selftests/mm/pagemap_ioctl.c | 166 +++++++++---------
.../selftests/mm/split_huge_page_test.c | 2 +-
8 files changed, 108 insertions(+), 108 deletions(-)
--
2.39.2
New version of the sleepable bpf_timer code, without the HID changes, as
they can now go through the HID tree indepandantly.
For reference, the use cases I have in mind:
---
Basically, I need to be able to defer a HID-BPF program for the
following reasons (from the aforementioned patch):
1. defer an event:
Sometimes we receive an out of proximity event, but the device can not
be trusted enough, and we need to ensure that we won't receive another
one in the following n milliseconds. So we need to wait those n
milliseconds, and eventually re-inject that event in the stack.
2. inject new events in reaction to one given event:
We might want to transform one given event into several. This is the
case for macro keys where a single key press is supposed to send
a sequence of key presses. But this could also be used to patch a
faulty behavior, if a device forgets to send a release event.
3. communicate with the device in reaction to one event:
We might want to communicate back to the device after a given event.
For example a device might send us an event saying that it came back
from sleeping state and needs to be re-initialized.
Currently we can achieve that by keeping a userspace program around,
raise a bpf event, and let that userspace program inject the events and
commands.
However, we are just keeping that program alive as a daemon for just
scheduling commands. There is no logic in it, so it doesn't really justify
an actual userspace wakeup. So a kernel workqueue seems simpler to handle.
bpf_timers are currently running in a soft IRQ context, this patch
series implements a sleppable context for them.
Cheers,
Benjamin
To: Alexei Starovoitov <ast(a)kernel.org>
To: Daniel Borkmann <daniel(a)iogearbox.net>
To: Andrii Nakryiko <andrii(a)kernel.org>
To: Martin KaFai Lau <martin.lau(a)linux.dev>
To: Eduard Zingerman <eddyz87(a)gmail.com>
To: Song Liu <song(a)kernel.org>
To: Yonghong Song <yonghong.song(a)linux.dev>
To: John Fastabend <john.fastabend(a)gmail.com>
To: KP Singh <kpsingh(a)kernel.org>
To: Stanislav Fomichev <sdf(a)google.com>
To: Hao Luo <haoluo(a)google.com>
To: Jiri Olsa <jolsa(a)kernel.org>
To: Mykola Lysenko <mykolal(a)fb.com>
To: Shuah Khan <shuah(a)kernel.org>
Cc: Benjamin Tissoires <bentiss(a)kernel.org>
Cc: <bpf(a)vger.kernel.org>
Cc: <linux-kernel(a)vger.kernel.org>
Cc: <linux-kselftest(a)vger.kernel.org>
---
Changes in v5:
- took various reviews into account
- rewrote the tests to be separated to not have a uggly include
- Link to v4: https://lore.kernel.org/r/20240315-hid-bpf-sleepable-v4-0-5658f2540564@kern…
Changes in v4:
- dropped the HID changes, they can go independently from bpf-core
- addressed Alexei's and Eduard's remarks
- added selftests
- Link to v3: https://lore.kernel.org/r/20240221-hid-bpf-sleepable-v3-0-1fb378ca6301@kern…
Changes in v3:
- fixed the crash from v2
- changed the API to have only BPF_F_TIMER_SLEEPABLE for
bpf_timer_start()
- split the new kfuncs/verifier patch into several sub-patches, for
easier reviews
- Link to v2: https://lore.kernel.org/r/20240214-hid-bpf-sleepable-v2-0-5756b054724d@kern…
Changes in v2:
- make use of bpf_timer (and dropped the custom HID handling)
- implemented bpf_timer_set_sleepable_cb as a kfunc
- still not implemented global subprogs
- no sleepable bpf_timer selftests yet
- Link to v1: https://lore.kernel.org/r/20240209-hid-bpf-sleepable-v1-0-4cc895b5adbd@kern…
---
Benjamin Tissoires (6):
bpf/helpers: introduce sleepable bpf_timers
bpf/verifier: add bpf_timer as a kfunc capable type
bpf/helpers: introduce bpf_timer_set_sleepable_cb() kfunc
bpf/helpers: mark the callback of bpf_timer_set_sleepable_cb() as sleepable
tools: sync include/uapi/linux/bpf.h
selftests/bpf: add sleepable timer tests
include/linux/bpf_verifier.h | 1 +
include/uapi/linux/bpf.h | 4 +
kernel/bpf/helpers.c | 132 ++++++++++++++-
kernel/bpf/verifier.c | 96 ++++++++++-
tools/include/uapi/linux/bpf.h | 4 +
tools/testing/selftests/bpf/bpf_experimental.h | 5 +
.../selftests/bpf/bpf_testmod/bpf_testmod.c | 5 +
.../selftests/bpf/bpf_testmod/bpf_testmod_kfunc.h | 1 +
tools/testing/selftests/bpf/prog_tests/timer.c | 34 ++++
.../testing/selftests/bpf/progs/timer_sleepable.c | 185 +++++++++++++++++++++
10 files changed, 458 insertions(+), 9 deletions(-)
---
base-commit: 9187210eee7d87eea37b45ea93454a88681894a4
change-id: 20240205-hid-bpf-sleepable-c01260fd91c4
Best regards,
--
Benjamin Tissoires <bentiss(a)kernel.org>
These patches from Geliang add support for the "last time" field in
MPTCP Info, and verify that the counters look valid.
Patch 1 adds these counters: last_data_sent, last_data_recv and
last_ack_recv. They are available in the MPTCP Info, so exposed via
getsockopt(MPTCP_INFO) and the Netlink Diag interface.
Patch 2 adds a test in diag.sh MPTCP selftest, to check that the
counters have moved by at least 250ms, after having waited twice that
time.
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
Geliang Tang (2):
mptcp: add last time fields in mptcp_info
selftests: mptcp: test last time mptcp_info
include/uapi/linux/mptcp.h | 4 +++
net/mptcp/options.c | 1 +
net/mptcp/protocol.c | 7 ++++
net/mptcp/protocol.h | 3 ++
net/mptcp/sockopt.c | 4 +++
tools/testing/selftests/net/mptcp/diag.sh | 53 +++++++++++++++++++++++++++++++
6 files changed, 72 insertions(+)
---
base-commit: d76c740b2eaaddc5fc3a8b21eaec5b6b11e8c3f5
change-id: 20240405-upstream-net-next-20240405-mptcp-last-time-info-9b03618e08f1
Best regards,
--
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Currently the options for writing networking tests are C, bash or
some mix of the two. YAML/Netlink gives us the ability to easily
interface with Netlink in higher level laguages. In particular,
there is a Python library already available in tree, under tools/net.
Add the scaffolding which allows writing tests using this library.
The "scaffolding" is needed because the library lives under
tools/net and uses YAML files from under Documentation/.
So we need a small amount of glue code to find those things
and add them to TEST_FILES.
This series adds both a basic SW sanity test and driver
test which can be run against netdevsim or a real device.
When I develop core code I usually test with netdevsim,
then a real device, and then a backport to Meta's kernel.
Because of the lack of integration, until now I had
to throw away the (YNL-based) test script and netdevsim code.
Running tests in tree directly:
$ ./tools/testing/selftests/net/nl_netdev.py
KTAP version 1
1..2
ok 1 nl_netdev.empty_check
ok 2 nl_netdev.lo_check
# Totals: pass:2 fail:0 xfail:0 xpass:0 skip:0 error:0
in tree via make:
$ make -C tools/testing/selftests/ TARGETS=net \
TEST_PROGS=nl_netdev.py TEST_GEN_PROGS="" run_tests
[ ... ]
and installed externally, all seem to work:
$ make -C tools/testing/selftests/ TARGETS=net \
install INSTALL_PATH=/tmp/ksft-net
$ /tmp/ksft-net/run_kselftest.sh -t net:nl_netdev.py
[ ... ]
For driver tests I followed the lead of net/forwarding and
get the device name from env and/or a config file.
v2: (see patches for minor changes)
- don't add to TARGETS, create a deperate variable with deps
- support and use with
- support and use passing arguments to tests
v1: https://lore.kernel.org/all/20240402010520.1209517-1-kuba@kernel.org/
Jakub Kicinski (7):
netlink: specs: define ethtool header flags
tools: ynl: copy netlink error to NlError
selftests: net: add scaffolding for Netlink tests in Python
selftests: nl_netdev: add a trivial Netlink netdev test
netdevsim: report stats by default, like a real device
selftests: drivers: add scaffolding for Netlink tests in Python
testing: net-drv: add a driver test for stats reporting
Documentation/netlink/specs/ethtool.yaml | 6 +
drivers/net/netdevsim/ethtool.c | 11 ++
drivers/net/netdevsim/netdev.c | 45 +++++++
tools/net/ynl/lib/ynl.py | 3 +-
tools/testing/selftests/Makefile | 10 +-
tools/testing/selftests/drivers/net/Makefile | 7 ++
.../testing/selftests/drivers/net/README.rst | 30 +++++
.../selftests/drivers/net/lib/py/__init__.py | 17 +++
.../selftests/drivers/net/lib/py/env.py | 52 ++++++++
tools/testing/selftests/drivers/net/stats.py | 86 +++++++++++++
tools/testing/selftests/net/Makefile | 1 +
tools/testing/selftests/net/lib/Makefile | 8 ++
.../testing/selftests/net/lib/py/__init__.py | 7 ++
tools/testing/selftests/net/lib/py/consts.py | 9 ++
tools/testing/selftests/net/lib/py/ksft.py | 96 ++++++++++++++
tools/testing/selftests/net/lib/py/nsim.py | 118 ++++++++++++++++++
tools/testing/selftests/net/lib/py/utils.py | 47 +++++++
tools/testing/selftests/net/lib/py/ynl.py | 49 ++++++++
tools/testing/selftests/net/nl_netdev.py | 24 ++++
19 files changed, 624 insertions(+), 2 deletions(-)
create mode 100644 tools/testing/selftests/drivers/net/Makefile
create mode 100644 tools/testing/selftests/drivers/net/README.rst
create mode 100644 tools/testing/selftests/drivers/net/lib/py/__init__.py
create mode 100644 tools/testing/selftests/drivers/net/lib/py/env.py
create mode 100755 tools/testing/selftests/drivers/net/stats.py
create mode 100644 tools/testing/selftests/net/lib/Makefile
create mode 100644 tools/testing/selftests/net/lib/py/__init__.py
create mode 100644 tools/testing/selftests/net/lib/py/consts.py
create mode 100644 tools/testing/selftests/net/lib/py/ksft.py
create mode 100644 tools/testing/selftests/net/lib/py/nsim.py
create mode 100644 tools/testing/selftests/net/lib/py/utils.py
create mode 100644 tools/testing/selftests/net/lib/py/ynl.py
create mode 100755 tools/testing/selftests/net/nl_netdev.py
--
2.44.0
This patchset allows for io_uring zerocopy to support REQ_F_CQE_SKIP,
skipping the normal completion notification, but not the zerocopy buffer
release notification.
This patchset also includes a test to test these changes, and a patch to
mini_liburing to enable io_uring_peek_cqe, which is needed for the test.
Oliver Crumrine (3):
io_uring: Add REQ_F_CQE_SKIP support for io_uring zerocopy
io_uring: Add io_uring_peek_cqe to mini_liburing
io_uring: Support IOSQE_CQE_SKIP_SUCCESS in io_uring zerocopy test
io_uring/net.c | 6 +--
tools/include/io_uring/mini_liburing.h | 18 +++++++++
.../selftests/net/io_uring_zerocopy_tx.c | 37 +++++++++++++++++--
.../selftests/net/io_uring_zerocopy_tx.sh | 7 +++-
4 files changed, 59 insertions(+), 10 deletions(-)
--
2.44.0
Hi,
As mentioned in each patch, this implements the solution that we discussed in
December 2023, in [1]. This turned out to be very clean and easy. It should also
be quite easy to maintain.
This should also make Peter Zijlstra happy, because it directly addresses the
root cause of his "NAK NAK NAK" reply [2]. :)
I haven't done much build testing, because selftests are not so easy to build
with a cross-compiler. So it's just tested on x86 64-bit so far.
[1] https://lore.kernel.org/all/783a4178-1dec-4e30-989a-5174b8176b09@redhat.com/
[2] https://lore.kernel.org/lkml/20231103121652.GA6217@noisy.programming.kicks-…
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Mark Brown <broonie(a)kernel.org>
Cc: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Cc: Suren Baghdasaryan <surenb(a)google.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
John Hubbard (2):
selftests: break the dependency upon local header files
selftests/mm: fix additional build errors for selftests
tools/include/uapi/linux/memfd.h | 39 +++
tools/include/uapi/linux/userfaultfd.h | 386 +++++++++++++++++++++++++
tools/testing/selftests/lib.mk | 9 +
tools/testing/selftests/mm/Makefile | 2 +-
4 files changed, 435 insertions(+), 1 deletion(-)
create mode 100644 tools/include/uapi/linux/memfd.h
create mode 100644 tools/include/uapi/linux/userfaultfd.h
base-commit: 98560e9019851bf55b8a4073978a623a3bcf98c0
--
2.44.0
In this series, ksft_exit_fail_perror() is being added which is helper
function on top of ksft_exit_fail_msg(). It prints errno and its string
form always. After writing and porting several kselftests, I've found
out that most of times ksft_exit_fail_msg() isn't useful if errno value
isn't printed. The ksft_exit_fail_perror() provides a convenient way to
always print errno when its used.
Muhammad Usama Anjum (2):
selftests: add ksft_exit_fail_perror()
selftests: exec: Use new ksft_exit_fail_perror() helper
tools/testing/selftests/exec/recursion-depth.c | 10 +++++-----
tools/testing/selftests/kselftest.h | 14 ++++++++++++++
2 files changed, 19 insertions(+), 5 deletions(-)
--
2.39.2
The comment on top of the file is used by many developers to glance over
all the available functions. Add the recently added ksft_perror() to it.
Signed-off-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
---
tools/testing/selftests/kselftest.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/kselftest.h b/tools/testing/selftests/kselftest.h
index 7d650a06ca359..159bf8e314fa3 100644
--- a/tools/testing/selftests/kselftest.h
+++ b/tools/testing/selftests/kselftest.h
@@ -16,6 +16,7 @@
* For each test, report any progress, debugging, etc with:
*
* ksft_print_msg(fmt, ...);
+ * ksft_perror(msg);
*
* and finally report the pass/fail/skip/xfail state of the test with one of:
*
--
2.39.2
Hi,
(This is verified on the second test box.)
In the most recent 6.8.0 release of torvalds tree kernel with selftest configs on,
process ./iommufd appears to consume 99% of a CPU core for quote a while in an
endless loop:
root 59502 8816 0 Mar11 pts/2 00:00:00 make OUTPUT=/home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests/iommu -C iommu run_tests O=/home/marvin/linux/kernel/linux_torvalds
root 59503 59502 0 Mar11 pts/2 00:00:00 /bin/sh -c BASE_DIR="/home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests"; . /home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests/kselftest/runner.sh; if [ "X" != "X" ]; then per_test_logging=1; fi; run_many /home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests/iommu/iommufd /home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests/iommu/iommufd_fail_nth
root 59516 59503 0 Mar11 pts/2 00:00:00 /bin/sh -c BASE_DIR="/home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests"; . /home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests/kselftest/runner.sh; if [ "X" != "X" ]; then per_test_logging=1; fi; run_many /home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests/iommu/iommufd /home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests/iommu/iommufd_fail_nth
root 59517 59516 0 Mar11 pts/2 00:00:00 /bin/sh -c BASE_DIR="/home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests"; . /home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests/kselftest/runner.sh; if [ "X" != "X" ]; then per_test_logging=1; fi; run_many /home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests/iommu/iommufd /home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests/iommu/iommufd_fail_nth
root 59518 59517 0 Mar11 pts/2 00:00:00 /bin/sh -c BASE_DIR="/home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests"; . /home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests/kselftest/runner.sh; if [ "X" != "X" ]; then per_test_logging=1; fi; run_many /home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests/iommu/iommufd /home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests/iommu/iommufd_fail_nth
root 59522 59518 0 Mar11 pts/2 00:00:00 /bin/sh -c BASE_DIR="/home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests"; . /home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests/kselftest/runner.sh; if [ "X" != "X" ]; then per_test_logging=1; fi; run_many /home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests/iommu/iommufd /home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests/iommu/iommufd_fail_nth
root 59523 59522 0 Mar11 pts/2 00:00:00 perl /home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests/kselftest/prefix.pl
root 59635 2367 99 Mar11 pts/2 11:28:03 ./iommufd
root@stargazer:/home/marvin# strace -p 59635
ioctl(5, _IOC(_IOC_NONE, 0x3b, 0xa0, 0), 0x7ffdd9eebc00) = 0
ioctl(5, _IOC(_IOC_NONE, 0x3b, 0xa0, 0), 0x7ffdd9eebc00) = 0
ioctl(5, _IOC(_IOC_NONE, 0x3b, 0xa0, 0), 0x7ffdd9eebc00) = 0
ioctl(5, _IOC(_IOC_NONE, 0x3b, 0xa0, 0), 0x7ffdd9eebc00) = 0
ioctl(5, _IOC(_IOC_NONE, 0x3b, 0xa0, 0), 0x7ffdd9eebc00) = 0
ioctl(5, _IOC(_IOC_NONE, 0x3b, 0xa0, 0), 0x7ffdd9eebc00) = 0
ioctl(5, _IOC(_IOC_NONE, 0x3b, 0xa0, 0), 0x7ffdd9eebc00) = 0
ioctl(5, _IOC(_IOC_NONE, 0x3b, 0xa0, 0), 0x7ffdd9eebc00) = 0
ioctl(5, _IOC(_IOC_NONE, 0x3b, 0xa0, 0), 0x7ffdd9eebc00) = 0
ioctl(5, _IOC(_IOC_NONE, 0x3b, 0xa0, 0), 0x7ffdd9eebc00) = 0
ioctl(5, _IOC(_IOC_NONE, 0x3b, 0xa0, 0), 0x7ffdd9eebc00) = 0
ioctl(5, _IOC(_IOC_NONE, 0x3b, 0xa0, 0), 0x7ffdd9eebc00) = 0
ioctl(5, _IOC(_IOC_NONE, 0x3b, 0xa0, 0), 0x7ffdd9eebc00) = 0
ioctl(5, _IOC(_IOC_NONE, 0x3b, 0xa0, 0), 0x7ffdd9eebc00) = 0
ioctl(5, _IOC(_IOC_NONE, 0x3b, 0xa0, 0), 0x7ffdd9eebc00) = 0
ioctl(5, _IOC(_IOC_NONE, 0x3b, 0xa0, 0), 0x7ffdd9eebc00) = 0
ioctl(5, _IOC(_IOC_NONE, 0x3b, 0xa0, 0), 0x7ffdd9eebc00) = 0
.
.
.
Please find attached config. It is the vanilla kernel, the build suite marked it "dirty"
because of the modifications to the selftests (adding debug option, mostly).
The kseltest output is:
make[3]: Entering directory '/home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests/iommu'
TAP version 13
1..2
# timeout set to 45
# selftests: iommu: iommufd
# TAP version 13
# 1..180
# # Starting 180 tests from 18 test cases.
# # RUN iommufd.simple_close ...
# # OK iommufd.simple_close
# ok 1 iommufd.simple_close
# # RUN iommufd.cmd_fail ...
# # OK iommufd.cmd_fail
# ok 2 iommufd.cmd_fail
# # RUN iommufd.cmd_length ...
# # OK iommufd.cmd_length
# ok 3 iommufd.cmd_length
# # RUN iommufd.cmd_ex_fail ...
# # OK iommufd.cmd_ex_fail
# ok 4 iommufd.cmd_ex_fail
# # RUN iommufd.global_options ...
# # OK iommufd.global_options
# ok 5 iommufd.global_options
# # RUN iommufd.simple_ioctls ...
# # OK iommufd.simple_ioctls
# ok 6 iommufd.simple_ioctls
# # RUN iommufd.unmap_cmd ...
# # OK iommufd.unmap_cmd
# ok 7 iommufd.unmap_cmd
# # RUN iommufd.map_cmd ...
# # OK iommufd.map_cmd
# ok 8 iommufd.map_cmd
# # RUN iommufd.info_cmd ...
# # OK iommufd.info_cmd
# ok 9 iommufd.info_cmd
# # RUN iommufd.set_iommu_cmd ...
# # OK iommufd.set_iommu_cmd
# ok 10 iommufd.set_iommu_cmd
# # RUN iommufd.vfio_ioas ...
# # OK iommufd.vfio_ioas
# ok 11 iommufd.vfio_ioas
# # RUN iommufd_ioas.no_domain.ioas_auto_destroy ...
# # OK iommufd_ioas.no_domain.ioas_auto_destroy
# ok 12 iommufd_ioas.no_domain.ioas_auto_destroy
# # RUN iommufd_ioas.no_domain.ioas_destroy ...
# # OK iommufd_ioas.no_domain.ioas_destroy
# ok 13 iommufd_ioas.no_domain.ioas_destroy
# # RUN iommufd_ioas.no_domain.alloc_hwpt_nested ...
# # OK iommufd_ioas.no_domain.alloc_hwpt_nested
# ok 14 iommufd_ioas.no_domain.alloc_hwpt_nested
# # RUN iommufd_ioas.no_domain.hwpt_attach ...
# # iommufd.c:541:hwpt_attach:Expected 2 (2) == errno (22)
# # hwpt_attach: Test failed at step #6
# # FAIL iommufd_ioas.no_domain.hwpt_attach
# not ok 15 iommufd_ioas.no_domain.hwpt_attach
# # RUN iommufd_ioas.no_domain.ioas_area_destroy ...
# # OK iommufd_ioas.no_domain.ioas_area_destroy
# ok 16 iommufd_ioas.no_domain.ioas_area_destroy
# # RUN iommufd_ioas.no_domain.ioas_area_auto_destroy ...
# # OK iommufd_ioas.no_domain.ioas_area_auto_destroy
# ok 17 iommufd_ioas.no_domain.ioas_area_auto_destroy
# # RUN iommufd_ioas.no_domain.get_hw_info ...
# # OK iommufd_ioas.no_domain.get_hw_info
# ok 18 iommufd_ioas.no_domain.get_hw_info
# # RUN iommufd_ioas.no_domain.area ...
# # OK iommufd_ioas.no_domain.area
# ok 19 iommufd_ioas.no_domain.area
# # RUN iommufd_ioas.no_domain.unmap_fully_contained_areas ...
# # OK iommufd_ioas.no_domain.unmap_fully_contained_areas
# ok 20 iommufd_ioas.no_domain.unmap_fully_contained_areas
# # RUN iommufd_ioas.no_domain.area_auto_iova ...
# # OK iommufd_ioas.no_domain.area_auto_iova
# ok 21 iommufd_ioas.no_domain.area_auto_iova
# # RUN iommufd_ioas.no_domain.area_allowed ...
# # OK iommufd_ioas.no_domain.area_allowed
# ok 22 iommufd_ioas.no_domain.area_allowed
# # RUN iommufd_ioas.no_domain.copy_area ...
# # OK iommufd_ioas.no_domain.copy_area
# ok 23 iommufd_ioas.no_domain.copy_area
# # RUN iommufd_ioas.no_domain.iova_ranges ...
# # OK iommufd_ioas.no_domain.iova_ranges
# ok 24 iommufd_ioas.no_domain.iova_ranges
# # RUN iommufd_ioas.no_domain.access_domain_destory ...
# # iommufd.c:916:access_domain_destory:Expected MAP_FAILED (18446744073709551615) != buf (18446744073709551615)
# # access_domain_destory: Test terminated by timeout
# # FAIL iommufd_ioas.no_domain.access_domain_destory
# not ok 25 iommufd_ioas.no_domain.access_domain_destory
# # RUN iommufd_ioas.no_domain.access_pin ...
# # iommufd.c:991:access_pin:Expected 0 (0) == _test_cmd_mock_domain(self->fd, self->ioas_id, &mock_stdev_id, &mock_hwpt_id, ((void *)0)) (-1)
The failing assert seems to be here:
987 /* Add/remove a domain with a user */
988 ASSERT_EQ(0, ioctl(self->fd,
989 _IOMMU_TEST_CMD(IOMMU_TEST_OP_ACCESS_PAGES),
990 &access_cmd));
→ 991 test_cmd_mock_domain(self->ioas_id, &mock_stdev_id,
992 &mock_hwpt_id, NULL);
993 check_map_cmd.id = mock_hwpt_id;
994 ASSERT_EQ(0, ioctl(self->fd,
995 _IOMMU_TEST_CMD(IOMMU_TEST_OP_MD_CHECK_MAP),
996 &check_map_cmd));
For those of you who still do not have a clue what went wrong (like myself), I am trying
to generate a reproducer.
attempt to tap gdb on the running ./iommufd gave this:
root@defiant:/home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests/iommu# gdb ./iommufd --pid 63963
GNU gdb (Ubuntu 12.1-0ubuntu1~22.04.1) 12.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./iommufd...
Attaching to program: /home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests/iommu/iommufd, process 63963
Reading symbols from /usr/libexec/coreutils/libstdbuf.so...
(No debugging symbols found in /usr/libexec/coreutils/libstdbuf.so)
Reading symbols from /lib/x86_64-linux-gnu/libc.so.6...
Reading symbols from /usr/lib/debug/.build-id/c2/89da5071a3399de893d2af81d6a30c62646e1e.debug...
Reading symbols from /lib64/ld-linux-x86-64.so.2...
Reading symbols from /usr/lib/debug/.build-id/15/921ea631d9f36502d20459c43e5c85b7d6ab76.debug...
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
__GI___ioctl (fd=5, request=request@entry=15264) at ../sysdeps/unix/sysv/linux/ioctl.c:36
36 ../sysdeps/unix/sysv/linux/ioctl.c: No such file or directory.
(gdb) bt
#0 __GI___ioctl (fd=5, request=request@entry=15264) at ../sysdeps/unix/sysv/linux/ioctl.c:36
#1 0x000057ed23d1f1ae in _test_ioctl_set_temp_memory_limit (limit=65536, fd=<optimized out>) at /home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests/iommu/iommufd_utils.h:585
#2 iommufd_ioas_teardown (_metadata=_metadata@entry=0x57ed23d42860 <_iommufd_ioas_access_domain_destory_object>, self=self@entry=0x7ffdcdce5ef0, variant=<optimized out>) at iommufd.c:229
#3 0x000057ed23d23b7f in wrapper_iommufd_ioas_access_domain_destory (_metadata=0x57ed23d42860 <_iommufd_ioas_access_domain_destory_object>, variant=0x57ed23d43860 <_iommufd_ioas_mock_domain_object>) at iommufd.c:902
#4 0x000057ed23d1bfe9 in __run_test (f=f@entry=0x57ed23d438a0 <_iommufd_ioas_fixture_object>, variant=variant@entry=0x57ed23d43860 <_iommufd_ioas_mock_domain_object>,
t=t@entry=0x57ed23d42860 <_iommufd_ioas_access_domain_destory_object>) at ../kselftest_harness.h:1134
#5 0x000057ed23d12146 in test_harness_run (argv=0x7ffdcdce61a8, argc=1) at ../kselftest_harness.h:1199
#6 main (argc=1, argv=0x7ffdcdce61a8) at iommufd.c:2349
(gdb) list iommufd.c:2349
2344 &unmap_cmd));
2345 }
2346 }
2347 }
2348
2349 TEST_HARNESS_MAIN
(gdb)
Hope this helps someone.
Best regards,
Mirsad Todorovac
This patch addresses an issue in the selftests/harness where an assertion within FIXTURE_TEARDOWN could trigger an infinite loop. The problem arises because the teardown procedure is meant to execute once, but the presence of failing assertions (ASSERT_EQ(0, 1)) leads to repeated attempts to execute teardown due to the long jump mechanism used by the harness for handling assertions.
To resolve this, the patch ensures that the teardown process runs only once, regardless of assertion outcomes, preventing the infinite loop and allowing tests to fail.
Signed-off-by: Shengyu Li <shengyu.li.evgeny(a)gmail.com>
---
tools/testing/selftests/kselftest_harness.h | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/kselftest_harness.h b/tools/testing/selftests/kselftest_harness.h
index 4fd735e48ee7..230d62884885 100644
--- a/tools/testing/selftests/kselftest_harness.h
+++ b/tools/testing/selftests/kselftest_harness.h
@@ -383,6 +383,7 @@
FIXTURE_DATA(fixture_name) self; \
pid_t child = 1; \
int status = 0; \
+ bool jmp = false; \
memset(&self, 0, sizeof(FIXTURE_DATA(fixture_name))); \
if (setjmp(_metadata->env) == 0) { \
/* Use the same _metadata. */ \
@@ -399,8 +400,10 @@
_metadata->exit_code = KSFT_FAIL; \
} \
} \
+ else \
+ jmp = true; \
if (child == 0) { \
- if (_metadata->setup_completed && !_metadata->teardown_parent) \
+ if (_metadata->setup_completed && !_metadata->teardown_parent && !jmp) \
fixture_name##_teardown(_metadata, &self, variant->data); \
_exit(0); \
} \
--
2.25.1
Hi,
We have caught bugs in kselftest suites on linux-next and on stable-RCs etc
when using clang. There are two types of bugs (logs with clang-17 are
attached.):
As usually people use GCC, there are GCC-specific flags added to the
Makefiles that clang doesn't recognize. For example:
* clang: error: argument unused during compilation: '-pie'
[-Werror,-Wunused-command-line-argument]
* clang: error: unknown argument '-static-libasan'; did you mean
'-static-libsan'?
* clang: error: cannot specify -o when generating multiple output files
Clang has best static analysis tools. It is reporting static errors. For
example:
* test_execve.c:121:13: warning: variable 'have_outer_privilege' is used
uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized]
* test_execve.c:121:9: note: remove the 'if' if its condition is always true
* test_memcontrol.c:727:6: warning: variable 'fd' is used uninitialized
whenever 'if' condition is true [-Wsometimes-uninitialized]
We have found these issues through our new KernelCI system when enabling
kselftest and clang there. The new system dashboard is a WIP, so It is not
the web dashboard you are used-to with in KernelCI. We can show you ways of
pulling the data if you are interest into.
Unless the above is some sort of false-positive or misconfiguration, it
would be great to support clang for kselftests. What we can do from our
side is that clang kselftests builds should be enabled on KernelCI to find
and fix the errors. What is your stance about this?
Thanks,
Usama
Currently the options for writing networking tests are C, bash or
some mix of the two. YAML/Netlink gives us the ability to easily
interface with Netlink in higher level laguages. In particular,
there is a Python library already available in tree, under tools/net.
Add the scaffolding which allows writing tests using this library.
The "scaffolding" is needed because the library lives under
tools/net and uses YAML files from under Documentation/.
So we need a small amount of glue code to find those things
and add them to TEST_FILES.
This series adds both a basic SW sanity test and driver
test which can be run against netdevsim or a real device.
When I develop core code I usually test with netdevsim,
then a real device, and then a backport to Meta's kernel.
Because of the lack of integration, until now I had
to throw away the (YNL-based) test script and netdevsim code.
Running tests in tree directly:
$ ./tools/testing/selftests/net/nl_netdev.py
KTAP version 1
1..2
ok 1 nl_netdev.empty_check
ok 2 nl_netdev.lo_check
# Totals: pass:2 fail:0 xfail:0 xpass:0 skip:0 error:0
in tree via make:
$ make -C tools/testing/selftests/ TARGETS=net \
TEST_PROGS=nl_netdev.py TEST_GEN_PROGS="" run_tests
[ ... ]
and installed externally, all seem to work:
$ make -C tools/testing/selftests/ TARGETS=net \
install INSTALL_PATH=/tmp/ksft-net
$ /tmp/ksft-net/run_kselftest.sh -t net:nl_netdev.py
[ ... ]
For driver tests I followed the lead of net/forwarding and
get the device name from env and/or a config file.
Jakub Kicinski (7):
netlink: specs: define ethtool header flags
tools: ynl: copy netlink error to NlError
selftests: net: add scaffolding for Netlink tests in Python
selftests: nl_netdev: add a trivial Netlink netdev test
netdevsim: report stats by default, like a real device
selftests: drivers: add scaffolding for Netlink tests in Python
testing: net-drv: add a driver test for stats reporting
Documentation/netlink/specs/ethtool.yaml | 5 +
drivers/net/netdevsim/ethtool.c | 11 ++
drivers/net/netdevsim/netdev.c | 45 +++++++
tools/net/ynl/lib/ynl.py | 3 +-
tools/testing/selftests/Makefile | 8 ++
tools/testing/selftests/drivers/net/Makefile | 7 ++
.../testing/selftests/drivers/net/README.rst | 30 +++++
.../selftests/drivers/net/lib/py/__init__.py | 17 +++
.../selftests/drivers/net/lib/py/env.py | 41 ++++++
tools/testing/selftests/drivers/net/stats.py | 85 +++++++++++++
tools/testing/selftests/net/Makefile | 1 +
tools/testing/selftests/net/lib/Makefile | 8 ++
.../testing/selftests/net/lib/py/__init__.py | 7 ++
tools/testing/selftests/net/lib/py/consts.py | 9 ++
tools/testing/selftests/net/lib/py/ksft.py | 96 ++++++++++++++
tools/testing/selftests/net/lib/py/nsim.py | 118 ++++++++++++++++++
tools/testing/selftests/net/lib/py/utils.py | 47 +++++++
tools/testing/selftests/net/lib/py/ynl.py | 49 ++++++++
tools/testing/selftests/net/nl_netdev.py | 24 ++++
19 files changed, 610 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/drivers/net/Makefile
create mode 100644 tools/testing/selftests/drivers/net/README.rst
create mode 100644 tools/testing/selftests/drivers/net/lib/py/__init__.py
create mode 100644 tools/testing/selftests/drivers/net/lib/py/env.py
create mode 100755 tools/testing/selftests/drivers/net/stats.py
create mode 100644 tools/testing/selftests/net/lib/Makefile
create mode 100644 tools/testing/selftests/net/lib/py/__init__.py
create mode 100644 tools/testing/selftests/net/lib/py/consts.py
create mode 100644 tools/testing/selftests/net/lib/py/ksft.py
create mode 100644 tools/testing/selftests/net/lib/py/nsim.py
create mode 100644 tools/testing/selftests/net/lib/py/utils.py
create mode 100644 tools/testing/selftests/net/lib/py/ynl.py
create mode 100755 tools/testing/selftests/net/nl_netdev.py
--
2.44.0
Add specification for test metadata to the KTAP v2 spec.
KTAP v1 only specifies the output format of very basic test information:
test result and test name. Any additional test information either gets
added to general diagnostic data or is not included in the output at all.
The purpose of KTAP metadata is to create a framework to include and
easily identify additional important test information in KTAP.
KTAP metadata could include any test information that is pertinent for
user interaction before or after the running of the test. For example,
the test file path or the test speed.
Since this includes a large variety of information, this specification
will recognize notable types of KTAP metadata to ensure consistent format
across test frameworks. See the full list of types in the specification.
Example of KTAP Metadata:
KTAP version 2
#:ktap_test: main
#:ktap_arch: uml
1..1
KTAP version 2
#:ktap_test: suite_1
#:ktap_subsystem: example
#:ktap_test_file: lib/test.c
1..2
ok 1 test_1
#:ktap_test: test_2
#:ktap_speed: very_slow
# test_2 has begun
#:custom_is_flaky: true
ok 2 test_2
# suite_1 has passed
ok 1 suite_1
The changes to the KTAP specification outline the format, location, and
different types of metadata.
Reviewed-by: Kees Cook <keescook(a)chromium.org>
Reviewed-by: David Gow <davidgow(a)google.com>
Signed-off-by: Rae Moar <rmoar(a)google.com>
---
Note this version is in reponse to comments made off the list asking for
more explanation on inheritance and edge cases.
Changes since v3:
- Add two metadata ktap_config and ktap_id
- Add section on metadata inheritance
- Add edge case examples
Documentation/dev-tools/ktap.rst | 248 ++++++++++++++++++++++++++++++-
1 file changed, 244 insertions(+), 4 deletions(-)
diff --git a/Documentation/dev-tools/ktap.rst b/Documentation/dev-tools/ktap.rst
index ff77f4aaa6ef..55bc43cd5aea 100644
--- a/Documentation/dev-tools/ktap.rst
+++ b/Documentation/dev-tools/ktap.rst
@@ -17,19 +17,22 @@ KTAP test results describe a series of tests (which may be nested: i.e., test
can have subtests), each of which can contain both diagnostic data -- e.g., log
lines -- and a final result. The test structure and results are
machine-readable, whereas the diagnostic data is unstructured and is there to
-aid human debugging.
+aid human debugging. Since version 2, tests can also contain metadata which
+consists of important supplemental test information and can be
+machine-readable.
+
+KTAP output is built from five different types of lines:
-KTAP output is built from four different types of lines:
- Version lines
- Plan lines
- Test case result lines
- Diagnostic lines
+- Metadata lines
In general, valid KTAP output should also form valid TAP output, but some
information, in particular nested test results, may be lost. Also note that
there is a stagnant draft specification for TAP14, KTAP diverges from this in
-a couple of places (notably the "Subtest" header), which are described where
-relevant later in this document.
+a couple of places, which are described where relevant later in this document.
Version lines
-------------
@@ -166,6 +169,237 @@ even if they do not start with a "#": this is to capture any other useful
kernel output which may help debug the test. It is nevertheless recommended
that tests always prefix any diagnostic output they have with a "#" character.
+KTAP metadata lines
+-------------------
+
+KTAP metadata lines are used to include and easily identify important
+supplemental test information in KTAP. These lines may appear similar to
+diagnostic lines. The format of metadata lines is below:
+
+.. code-block:: none
+
+ #:<prefix>_<metadata type>: <metadata value>
+
+The <prefix> indicates where to find the specification for the type of
+metadata, such as the name of a test framework or "ktap" to indicate this
+specification. The list of currently approved prefixes and where to find the
+documentation of the metadata types is below. Note any metadata type that does
+not use a prefix from the list below must use the prefix "custom".
+
+Current List of Approved Prefixes:
+
+- ``ktap``: See Types of KTAP Metadata below for the list of metadata types.
+
+The format of <metadata type> and <value> varies based on the type. See the
+individual specification. For "custom" types the <metadata type> can be any
+string excluding ":", spaces, or newline characters and the <value> can be any
+string.
+
+**Location:**
+
+The first KTAP metadata line for a test must be "#:ktap_test: <test name>",
+which acts as a header to associate metadata with the correct test. Metadata
+for the main KTAP level uses the test name "main". A test's metadata ends
+with a "ktap_test" line for a different test.
+
+For test cases, the location of the metadata is between the prior test result
+line and the current test result line. For test suites, the location of the
+metadata is between the suite's version line and test plan line. For the main
+level, the location of the metadata is between the main version line and main
+test plan line. See the example below.
+
+Note that a test case's metadata is inline with the test's result line. Whereas
+a suite's metadata is inline with the suite's version line and thus will be
+more indented than the suite's result line. Additionally, metadata for the main
+level is inline with the main version line.
+
+KTAP metadata for a test does not need to be contiguous. For example, a kernel
+warning or other diagnostic output could interrupt metadata lines. However, it
+is recommended to keep a test's metadata lines in the correct location and
+together when possible, as this improves readability.
+
+**Example of KTAP metadata:**
+
+::
+
+ KTAP version 2
+ #:ktap_test: main
+ #:ktap_arch: uml
+ 1..1
+ KTAP version 2
+ #:ktap_test: suite_1
+ #:ktap_subsystem: example
+ #:ktap_test_file: lib/test.c
+ 1..2
+ # WARNING: test_1 skipped
+ ok 1 test_1 # SKIP
+ #:ktap_test: test_2
+ #:ktap_speed: very_slow
+ # test_2 has begun
+ #:custom_is_flaky: true
+ ok 2 test_2
+ # suite_1 passed
+ ok 1 suite_1
+
+In this example, the tests are running on UML. The test suite "suite_1" is part
+of the subsystem "example" and belongs to the file "lib/test.c". It has
+two subtests, "test_1" and "test_2". The subtest "test_2" has a speed of
+"very_slow" and has been marked with a custom KTAP metadata type called
+"custom_is_flaky" with the value of "true".
+
+**Inheritance of KTAP metadata**
+
+Tests can inherit KTAP metadata. A child test inherits all the parent test's
+KTAP metadata except for directly opposing metadata. For example, if a suite
+has a property of "#:ktap_speed: slow", all child test cases are also marked as
+slow. However, if one of the test cases has metadata of "#:ktap_speed:
+very_slow" then that test case would be marked as very_slow instead and not
+slow.
+
+Note if a test case inherits metadata it does not need to appear as a line in
+the KTAP. Using the example above, not every test case would have the line
+"#:ktap_speed: slow" in their metadata.
+
+**Edge Case Examples of KTAP metadata**
+
+Here are a few edge case examples of KTAP metadata. The first example shows
+metadata in the wrong location.
+
+::
+
+ KTAP version 2
+ 1..1
+ KTAP version 2
+ #:ktap_test: suite_1
+ 1..3
+ ok 1 test_1
+ #:ktap_test: test_2
+ #:ktap_speed: very_slow
+ ok 2 test_2
+ #:ktap_duration: 1.342s
+ #:ktap_test: test_3
+ #:ktap_speed: slow
+ ok 3 test_3
+ ok 1 suite_1
+
+In this example, the metadata "#:ktap_duration: 1.342s" is in the wrong
+location. It was meant to belong to test_2 but was printed late. The location
+of this metadata is not recommended. However, it is allowed because the line is
+still below "#:ktap_test: test_2" and above any other ktap_test lines.
+
+This second example shows metadata in the correct location but without the
+proper header.
+
+::
+
+ KTAP version 2
+ 1..1
+ KTAP version 2
+ #:ktap_test: suite_1
+ 1..2
+ not ok 1 test_1
+ #:ktap_speed: very_slow
+ ok 2 test_2
+ ok 1 suite_1
+
+In this example, the metadata "#:ktap_speed: very_slow" is meant to belong to
+test_2. It is in the correct location but does not fall below a ktap_test line
+for test_2. Instead this metadata might be mistaken for belonging to suite_1
+because it does fall under the ktap_test line for suite_1. This lack of header
+is not allowed.
+
+**Types of KTAP Metadata:**
+
+This is the current list of KTAP metadata types recognized in this
+specification. Note that all of these metadata types are optional (except for
+ktap_test as the KTAP metadata header).
+
+- ``ktap_test``: Name of test (used as header of KTAP metadata). This should
+ match the test name printed in the test result line: "ok 1 [test_name]".
+
+- ``ktap_module``: Name of the module containing the test
+
+- ``ktap_subsystem``: Name of the subsystem being tested
+
+- ``ktap_start_time``: Time tests started in ISO8601 format
+
+ - Example: "#:ktap_start_time: 2024-01-09T13:09:01.990000+00:00"
+
+- ``ktap_duration``: Time taken (in seconds) to execute the test
+
+ - Example: "#:ktap_duration: 10.154s"
+
+- ``ktap_speed``: Category of how fast test runs: "normal", "slow", or
+ "very_slow"
+
+- ``ktap_test_file``: Path to source file containing the test. This metadata
+ line can be repeated if the test is spread across multiple files.
+
+ - Example: "#:ktap_test_file: lib/test.c"
+
+- ``ktap_generated_file``: Description of and path to file generated during
+ test execution. This could be a core dump, generated filesystem image, some
+ form of visual output (for graphics drivers), etc. This metadata line can be
+ repeated to attach multiple files to the test. Note use ktap_log_file or
+ ktap_error_file instead of this type if more applicable.
+
+ - Example: "#:ktap_generated_file: Core dump: /var/lib/systemd/coredump/hello.core"
+
+- ``ktap_log_file``: Path to file containing kernel log test output
+
+ - Example: "#:ktap_log_file: /sys/kernel/debugfs/kunit/example/results"
+
+- ``ktap_error_file``: Path to file containing context for test failure or
+ error. This could include the difference between optimal test output and
+ actual test output.
+
+ - Example: "#:ktap_error_file: fs/results/example.out.bad"
+
+- ``ktap_results_url``: Link to webpage describing this test run and its
+ results
+
+ - Example: "#:ktap_results_url: https://kcidb.kernelci.org/hello"
+
+- ``ktap_arch``: Architecture used during test run
+
+ - Example: "#:ktap_arch: x86_64"
+
+- ``ktap_compiler``: Compiler used during test run
+
+ - Example: "#:ktap_compiler: gcc (GCC) 10.1.1 20200507 (Red Hat 10.1.1-1)"
+
+- ``ktap_respository_url``: Link to git repository of the checked out code.
+
+ - Example: "#:ktap_respository_url: https://github.com/torvalds/linux.git"
+
+- ``ktap_git_branch``: Name of git branch of checked out code
+
+ - Example: "#:ktap_git_branch: kselftest/kunit"
+
+- ``ktap_kernel_version``: Version of Linux Kernel being used during test run
+
+ - Example: "#:ktap_kernel_version: 6.7-rc1"
+
+- ``ktap_config``: Config name and value. This does not necessarly need to be
+ restricted to Kconfig.
+
+ - Example: "#:ktap_config: CONFIG_SYSFS=y"
+
+- ``ktap_id``: Description of ID and ID value. This is an open-ended metadata
+ used for IDs, such as checkout id or test run id.
+
+ - Example: "#:ktap_id: Test run id: 14e782"
+
+- ``ktap_commit_hash``: The full git commit hash of the checked out base code.
+
+ - Example: "#:ktap_commit_hash: 064725faf8ec2e6e36d51e22d3b86d2707f0f47f"
+
+**Other Metadata Types:**
+
+There can also be KTAP metadata that is not included in the recognized list
+above. This metadata must be prefixed with the test framework, ie. "kselftest",
+or with the prefix "custom". For example, "# custom_batch: 20".
+
Unknown lines
-------------
@@ -206,6 +440,7 @@ An example of a test with two nested subtests:
KTAP version 2
1..1
KTAP version 2
+ #:ktap_test: example
1..2
ok 1 test_1
not ok 2 test_2
@@ -219,6 +454,7 @@ An example format with multiple levels of nested testing:
KTAP version 2
1..2
KTAP version 2
+ #:ktap_test: example_test_1
1..2
KTAP version 2
1..2
@@ -254,6 +490,7 @@ Example KTAP output
KTAP version 2
1..1
KTAP version 2
+ #:ktap_test: main_test
1..3
KTAP version 2
1..1
@@ -261,11 +498,14 @@ Example KTAP output
ok 1 test_1
ok 1 example_test_1
KTAP version 2
+ #:ktap_test: example_test_2
+ #:ktap_speed: slow
1..2
ok 1 test_1 # SKIP test_1 skipped
ok 2 test_2
ok 2 example_test_2
KTAP version 2
+ #:ktap_test: example_test_3
1..3
ok 1 test_1
# test_2: FAIL
base-commit: 906f02e42adfbd5ae70d328ee71656ecb602aaf5
--
2.44.0.478.gd926399ef9-goog
The commit e5ed6c922537 ("KVM: selftests: Fix a semaphore imbalance in
the dirty ring logging test") backported the fix from v6.8 to stable
v6.1. However, since the patch uses 'TEST_ASSERT_EQ()', which doesn't
exist on v6.1, the following build error is seen:
dirty_log_test.c:775:2: error: call to undeclared function
'TEST_ASSERT_EQ'; ISO C99 and later do not support implicit function
declarations [-Wimplicit-function-declaration]
TEST_ASSERT_EQ(sem_val, 0);
^
1 error generated.
Replace the macro with its equivalent, 'ASSERT_EQ()' to fix the issue.
Fixes: e5ed6c922537 ("KVM: selftests: Fix a semaphore imbalance in the dirty ring logging test")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Raghavendra Rao Ananta <rananta(a)google.com>
---
tools/testing/selftests/kvm/dirty_log_test.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/kvm/dirty_log_test.c b/tools/testing/selftests/kvm/dirty_log_test.c
index ec40a33c29fd..711b9e4d86aa 100644
--- a/tools/testing/selftests/kvm/dirty_log_test.c
+++ b/tools/testing/selftests/kvm/dirty_log_test.c
@@ -772,9 +772,9 @@ static void run_test(enum vm_guest_mode mode, void *arg)
* verification of all iterations.
*/
sem_getvalue(&sem_vcpu_stop, &sem_val);
- TEST_ASSERT_EQ(sem_val, 0);
+ ASSERT_EQ(sem_val, 0);
sem_getvalue(&sem_vcpu_cont, &sem_val);
- TEST_ASSERT_EQ(sem_val, 0);
+ ASSERT_EQ(sem_val, 0);
pthread_create(&vcpu_thread, NULL, vcpu_worker, vcpu);
base-commit: e5cd595e23c1a075359a337c0e5c3a4f2dc28dd1
--
2.44.0.478.gd926399ef9-goog
The commit e5ed6c922537 ("KVM: selftests: Fix a semaphore imbalance in
the dirty ring logging test") backported the fix from v6.8 to stable
v6.1. However, since the patch uses 'TEST_ASSERT_EQ()', which doesn't
exist on v6.1, the following build error is seen:
dirty_log_test.c:775:2: error: call to undeclared function
'TEST_ASSERT_EQ'; ISO C99 and later do not support implicit function
declarations [-Wimplicit-function-declaration]
TEST_ASSERT_EQ(sem_val, 0);
^
1 error generated.
Replace the macro with its equivalent, 'ASSERT_EQ()' to fix the issue.
Fixes: e5ed6c922537 ("KVM: selftests: Fix a semaphore imbalance in the dirty ring logging test")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Raghavendra Rao Ananta <rananta(a)google.com>
Change-Id: I52c2c28d962e482bb4f40f285229a2465ed59d7e
---
tools/testing/selftests/kvm/dirty_log_test.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/kvm/dirty_log_test.c b/tools/testing/selftests/kvm/dirty_log_test.c
index ec40a33c29fd..711b9e4d86aa 100644
--- a/tools/testing/selftests/kvm/dirty_log_test.c
+++ b/tools/testing/selftests/kvm/dirty_log_test.c
@@ -772,9 +772,9 @@ static void run_test(enum vm_guest_mode mode, void *arg)
* verification of all iterations.
*/
sem_getvalue(&sem_vcpu_stop, &sem_val);
- TEST_ASSERT_EQ(sem_val, 0);
+ ASSERT_EQ(sem_val, 0);
sem_getvalue(&sem_vcpu_cont, &sem_val);
- TEST_ASSERT_EQ(sem_val, 0);
+ ASSERT_EQ(sem_val, 0);
pthread_create(&vcpu_thread, NULL, vcpu_worker, vcpu);
base-commit: e5cd595e23c1a075359a337c0e5c3a4f2dc28dd1
--
2.44.0.478.gd926399ef9-goog
As discussed in the LKML thread [1], the asynchronous nature of cpuset
hotplug handling code is causing problem with RCU testing. With recent
changes in the way locking is being handled in the cpuset code, it is
now possible to make the cpuset hotplug code synchronous again without
major changes.
This series enables the hotplug code to call directly into cpuset hotplug
core without indirection with the exception of the special case of v1
cpuset becoming empty still being handled indirectly with workqueue.
A new simple test case was also written to test this special v1 cpuset
case. The test_cpuset_prs.sh script was also run with LOCKDEP on to
verify that there is no regression.
[1] https://lore.kernel.org/lkml/ZgYikMb5kZ7rxPp6@slm.duckdns.org/
Waiman Long (2):
cgroup/cpuset: Make cpuset hotplug processing synchronous
cgroup/cpuset: Add test_cpuset_v1_hp.sh
include/linux/cpuset.h | 3 -
kernel/cgroup/cpuset.c | 131 +++++++-----------
kernel/cpu.c | 48 -------
kernel/power/process.c | 2 -
tools/testing/selftests/cgroup/Makefile | 2 +-
.../selftests/cgroup/test_cpuset_v1_hp.sh | 40 ++++++
6 files changed, 88 insertions(+), 138 deletions(-)
create mode 100755 tools/testing/selftests/cgroup/test_cpuset_v1_hp.sh
--
2.39.3
In a follow up to these patches,
- commit 0bdf399342c5("net: Avoid address overwrite in kernel_connect")
- commit 86a7e0b69bd5("net: prevent rewrite of msg_name in sock_sendmsg()")
- commit c889a99a21bf("net: prevent address rewrite in kernel_bind()")
- commit 01b2885d9415("net: Save and restore msg_namelen in sock_sendmsg")
this patch series introduces BPF selftests that test the interaction
between BPF sockaddr hooks and socket operations in kernel space. It
focuses on regression test coverage to ensure that these operations do not
overwrite their address parameter and also provides some sanity checks
around kernel_getpeername() and kernel_getsockname().
It introduces two new components: a kernel module called sock_addr_testmod
and a new test program called sock_addr_kern which is loosely modeled after
and adapted from the old-style bpf/test_sock_addr.c selftest. When loaded,
the kernel module will perform some socket operation in kernel space. The
kernel module accepts five parameters controlling which socket operation
will be performed and its inputs:
MODULE_PARM_DESC(ip, "IPv4/IPv6/Unix address to use for socket operation");
MODULE_PARM_DESC(port, "Port number to use for socket operation");
MODULE_PARM_DESC(af, "Address family (AF_INET, AF_INET6, or AF_UNIX)");
MODULE_PARM_DESC(type, "Socket type (SOCK_STREAM or SOCK_DGRAM)");
MODULE_PARM_DESC(op, "Socket operation (BIND=0, CONNECT=1, SENDMSG=2)");
On module init, the socket operation is performed and results of are
exposed through debugfs.
- /sys/kernel/debug/sock_addr_testmod/success
Indicates success or failure of the operation.
- /sys/kernel/debug/sock_addr_testmod/addr
The value of the address parameter after the operation.
- /sys/kernel/debug/sock_addr_testmod/sock_name
The value of kernel_getsockname() after the socket operation (if relevant).
- /sys/kernel/debug/sock_addr_testmod/peer_name
The value of kernel_getpeername(() after the socket operation (if relevant).
The sock_addr_kern test program loads and unloads the kernel module to
drive kernel socket operations, reads the results from debugfs, makes sure
that the operation did not overwrite the address, and any result from
kernel_getpeername() or kernel_getsockname() were as expected.
== Patches ==
- Patch 1 introduces sock_addr_testmod and functions necessary for the test
program to load and unload the module.
- Patches 2-6 transform existing test helpers and introduce new test helpers
to enable the sock_addr_kern test program.
- Patch 7 implements the sock_addr_kern test program.
- Patch 8 fixes the sock_addr bind test program to work for big endian
architectures such as s390x.
Jordan Rife (8):
selftests/bpf: Introduce sock_addr_testmod
selftests/bpf: Add module load helpers
selftests/bpf: Factor out cmp_addr
selftests/bpf: Add recv_msg_from_client to network helpers
selftests/bpf: Factor out load_path and defines from test_sock_addr
selftests/bpf: Add setup/cleanup subcommands
selftests/bpf: Add sock_addr_kern prog_test
selftests/bpf: Fix bind program for big endian systems
tools/testing/selftests/bpf/Makefile | 46 +-
tools/testing/selftests/bpf/network_helpers.c | 65 ++
tools/testing/selftests/bpf/network_helpers.h | 5 +
.../selftests/bpf/prog_tests/sock_addr.c | 34 -
.../selftests/bpf/prog_tests/sock_addr_kern.c | 631 ++++++++++++++++++
.../testing/selftests/bpf/progs/bind4_prog.c | 18 +-
.../testing/selftests/bpf/progs/bind6_prog.c | 18 +-
tools/testing/selftests/bpf/progs/bind_prog.h | 19 +
.../testing/selftests/bpf/sock_addr_helpers.c | 46 ++
.../testing/selftests/bpf/sock_addr_helpers.h | 44 ++
.../bpf/sock_addr_testmod/.gitignore | 6 +
.../selftests/bpf/sock_addr_testmod/Makefile | 20 +
.../bpf/sock_addr_testmod/sock_addr_testmod.c | 256 +++++++
tools/testing/selftests/bpf/test_sock_addr.c | 76 +--
tools/testing/selftests/bpf/test_sock_addr.sh | 10 +-
tools/testing/selftests/bpf/testing_helpers.c | 44 +-
tools/testing/selftests/bpf/testing_helpers.h | 2 +
17 files changed, 1196 insertions(+), 144 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/sock_addr_kern.c
create mode 100644 tools/testing/selftests/bpf/progs/bind_prog.h
create mode 100644 tools/testing/selftests/bpf/sock_addr_helpers.c
create mode 100644 tools/testing/selftests/bpf/sock_addr_helpers.h
create mode 100644 tools/testing/selftests/bpf/sock_addr_testmod/.gitignore
create mode 100644 tools/testing/selftests/bpf/sock_addr_testmod/Makefile
create mode 100644 tools/testing/selftests/bpf/sock_addr_testmod/sock_addr_testmod.c
--
2.44.0.478.gd926399ef9-goog
There is a spelling mistake in a ksft_test_result_skip message. Fix it.
Signed-off-by: Colin Ian King <colin.i.king(a)gmail.com>
---
tools/testing/selftests/mm/ksm_functional_tests.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/mm/ksm_functional_tests.c b/tools/testing/selftests/mm/ksm_functional_tests.c
index 2d277620fad2..db845dca8d19 100644
--- a/tools/testing/selftests/mm/ksm_functional_tests.c
+++ b/tools/testing/selftests/mm/ksm_functional_tests.c
@@ -502,7 +502,7 @@ static void test_child_ksm_err(int status)
else if (status == -2)
ksft_test_result_fail("Merge in child failed\n");
else if (status == -3)
- ksft_test_result_skip("Merge in child skiped\n");
+ ksft_test_result_skip("Merge in child skipped\n");
}
/* Verify that prctl ksm flag is inherited. */
--
2.39.2
Here are two fixes related to MPTCP.
The first patch fixes when the MPTcpExtMPCapableFallbackACK MIB counter
is modified: it should only be incremented when a connection was using
MPTCP options, but then a fallback to TCP has been done. This patch also
checks the counter is not incremented by mistake during the connect
selftests. This counter was wrongly incremented since its introduction
in v5.7.
The second patch fixes a wrong parsing of the 'dev' endpoint options in
the selftests: the wrong variable was used. This option was not used
before, but it is going to be soon. This issue is visible since v5.18.
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
Davide Caratti (1):
mptcp: don't account accept() of non-MPC client as fallback to TCP
Geliang Tang (1):
selftests: mptcp: join: fix dev in check_endpoint
net/mptcp/protocol.c | 2 --
net/mptcp/subflow.c | 2 ++
tools/testing/selftests/net/mptcp/mptcp_connect.sh | 9 +++++++++
tools/testing/selftests/net/mptcp/mptcp_join.sh | 4 +++-
4 files changed, 14 insertions(+), 3 deletions(-)
---
base-commit: 0ba80d96585662299d4ea4624043759ce9015421
change-id: 20240329-upstream-net-20240329-fallback-mib-b0fec9c6189b
Best regards,
--
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
The netdev CI runs in a VM and captures serial, so stdout and
stderr get combined. Because there's a missing new line in
stderr the test ends up corrupting KTAP:
# Successok 1 selftests: net: reuseaddr_conflict
which should have been:
# Success
ok 1 selftests: net: reuseaddr_conflict
Fixes: 422d8dc6fd3a ("selftest: add a reuseaddr test")
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
---
Low risk and seems worth backporting to stable, hence the fixes tag.
CC: shuah(a)kernel.org
CC: jbacik(a)fb.com
CC: linux-kselftest(a)vger.kernel.org
---
tools/testing/selftests/net/reuseaddr_conflict.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/reuseaddr_conflict.c b/tools/testing/selftests/net/reuseaddr_conflict.c
index 7c5b12664b03..bfb07dc49518 100644
--- a/tools/testing/selftests/net/reuseaddr_conflict.c
+++ b/tools/testing/selftests/net/reuseaddr_conflict.c
@@ -109,6 +109,6 @@ int main(void)
fd1 = open_port(0, 1);
if (fd1 >= 0)
error(1, 0, "Was allowed to create an ipv4 reuseport on an already bound non-reuseport socket with no ipv6");
- fprintf(stderr, "Success");
+ fprintf(stderr, "Success\n");
return 0;
}
--
2.44.0
I had sent RFC patchset early this year (January) [7] to enable CPU assisted
control-flow integrity for usermode on riscv. Since then I've been able to do
more testing of the changes. As part of testing effort, compiled a rootfs with
shadow stack and landing pad enabled (libraries and binaries) and booted to
shell. As part of long running tests, I have been able to run some spec 2006
benchmarks [8] (here link is provided only for list of benchmarks that were
tested for long running tests, excel sheet provided here actually is for some
static stats like code size growth on spec binaries). Thus converting from RFC
to regular patchset.
Securing control-flow integrity for usermode requires following
- Securing forward control flow : All callsites must reach
reach a target that they actually intend to reach.
- Securing backward control flow : All function returns must
return to location where they were called from.
This patch series use riscv cpu extension `zicfilp` [2] to secure forward
control flow and `zicfiss` [2] to secure backward control flow. `zicfilp`
enforces that all indirect calls or jmps must land on a landing pad instr
and label embedded in landing pad instr must match a value programmed in
`x7` register (at callsite via compiler). `zicfiss` introduces shadow stack
which can only be writeable via shadow stack instructions (sspush and
ssamoswap) and thus can't be tampered with via inadvertent stores. More
details about extension can be read from [2] and there are details in
documentation as well (in this patch series).
Using config `CONFIG_RISCV_USER_CFI`, kernel support for riscv control flow
integrity for user mode programs can be compiled in the kernel.
Enabling of control flow integrity for user programs is left to user runtime
(specifically expected from dynamic loader). There has been a lot of earlier
discussion on the enabling topic around x86 shadow stack enabling [3, 4, 5] and
overall consensus had been to let dynamic loader (or usermode) to decide for
enabling the feature.
This patch series introduces arch agnostic `prctls` to enable shadow stack
and indirect branch tracking. And implements them on riscv. arm64 is expected
to implement shadow stack part of these arch agnostic `prctls` [6]
Changes since last time
***********************
Spec changes
------------
- Forward cfi spec has become much simpler. `lpad` instruction is pseudo for
`auipc rd, <20bit_imm>`. `lpad` checks x7 against 20bit embedded in instr.
Thus label width is 20bit.
- Shadow stack management instructions are reduced to
sspush - to push x1/x5 on shadow stack
sspopchk - pops from shadow stack and comapres with x1/x5.
ssamoswap - atomically swap value on shadow stack.
rdssp - reads current shadow stack pointer
- Shadow stack accesses on readonly memory always raise AMO/store page fault.
`sspopchk` is load but if underlying page is readonly, it'll raise a store
page fault. It simplifies hardware and kernel for COW handling for shadow
stack pages.
- riscv defines a new exception type `software check exception` and control flow
violations raise software check exception.
- enabling controls for shadow stack and landing are in xenvcfg CSR and controls
lower privilege mode enabling. As an example senvcfg controls enabling for U and
menvcfg controls enabling for S mode.
core mm shadow stack enabling
-----------------------------
Shadow stack for x86 usermode are now in mainline and thus this patch
series builds on top of that for arch-agnostic mm related changes. Big
thanks and shout out to Rick Edgecombe for that.
selftests
---------
Created some minimal selftests to test the patch series.
[1] - https://lore.kernel.org/lkml/20230213045351.3945824-1-debug@rivosinc.com/
[2] - https://github.com/riscv/riscv-cfi
[3] - https://lore.kernel.org/lkml/ZWHcBq0bJ+15eeKs@finisterre.sirena.org.uk/T/#m…
[4] - https://lore.kernel.org/all/20220130211838.8382-1-rick.p.edgecombe@intel.co…
[5] - https://lore.kernel.org/lkml/CAHk-=wgP5mk3poVeejw16Asbid0ghDt4okHnWaWKLBkRh…
[6] - https://lore.kernel.org/linux-mm/20231122-arm64-gcs-v7-2-201c483bd775@kerne…
[7] - https://lore.kernel.org/lkml/20240125062739.1339782-1-debug@rivosinc.com/
[8] - https://docs.google.com/spreadsheets/d/1_cHGH4ctNVvFRiS7hW9dEGKtXLAJ3aX4Z_i…
Deepak Gupta (26):
riscv: envcfg save and restore on task switching
riscv: define default value for envcfg
riscv/Kconfig: enable HAVE_EXIT_THREAD for riscv
riscv: zicfiss/zicfilp enumeration
riscv: zicfiss/zicfilp extension csr and bit definitions
riscv: usercfi state for task and save/restore of CSR_SSP on trap
entry/exit
mm: Define VM_SHADOW_STACK for RISC-V
mm: abstract shadow stack vma behind `arch_is_shadow_stack`
riscv/mm : ensure PROT_WRITE leads to VM_READ | VM_WRITE
riscv mm: manufacture shadow stack pte
riscv mmu: teach pte_mkwrite to manufacture shadow stack PTEs
riscv mmu: write protect and shadow stack
riscv/mm: Implement map_shadow_stack() syscall
riscv/shstk: If needed allocate a new shadow stack on clone
prctl: arch-agnostic prtcl for indirect branch tracking
riscv: Implements arch agnostic shadow stack prctls
riscv: Implements arch argnostic indirect branch tracking prctls
riscv/kernel: update __show_regs to print shadow stack register
riscv/traps: Introduce software check exception
riscv sigcontext: adding cfi state field in sigcontext
riscv signal: Save and restore of shadow stack for signal
riscv/ptrace: riscv cfi status and state via ptrace and in core files
riscv: create a config for shadow stack and landing pad instr support
riscv: Documentation for landing pad / indirect branch tracking
riscv: Documentation for shadow stack on riscv
kselftest/riscv: kselftest for user mode cfi
Mark Brown (1):
prctl: arch-agnostic prctl for shadow stack
Documentation/arch/riscv/zicfilp.rst | 104 ++++
Documentation/arch/riscv/zicfiss.rst | 169 ++++++
.../devicetree/bindings/riscv/extensions.yaml | 10 +
arch/riscv/Kconfig | 19 +
arch/riscv/include/asm/asm-prototypes.h | 1 +
arch/riscv/include/asm/cpufeature.h | 13 +
arch/riscv/include/asm/csr.h | 18 +
arch/riscv/include/asm/hwcap.h | 2 +
arch/riscv/include/asm/mman.h | 24 +
arch/riscv/include/asm/pgtable.h | 32 +-
arch/riscv/include/asm/processor.h | 2 +
arch/riscv/include/asm/switch_to.h | 10 +
arch/riscv/include/asm/thread_info.h | 4 +
arch/riscv/include/asm/usercfi.h | 118 ++++
arch/riscv/include/uapi/asm/ptrace.h | 18 +
arch/riscv/include/uapi/asm/sigcontext.h | 5 +
arch/riscv/kernel/Makefile | 2 +
arch/riscv/kernel/asm-offsets.c | 4 +
arch/riscv/kernel/cpufeature.c | 2 +
arch/riscv/kernel/entry.S | 29 +
arch/riscv/kernel/process.c | 35 +-
arch/riscv/kernel/ptrace.c | 83 +++
arch/riscv/kernel/signal.c | 45 ++
arch/riscv/kernel/sys_riscv.c | 11 +
arch/riscv/kernel/traps.c | 38 ++
arch/riscv/kernel/usercfi.c | 510 ++++++++++++++++++
arch/riscv/mm/init.c | 2 +-
arch/riscv/mm/pgtable.c | 21 +
include/linux/mm.h | 35 +-
include/uapi/asm-generic/mman.h | 1 +
include/uapi/linux/elf.h | 1 +
include/uapi/linux/prctl.h | 49 ++
kernel/sys.c | 60 +++
mm/gup.c | 5 +-
mm/internal.h | 2 +-
mm/mmap.c | 1 +
tools/testing/selftests/riscv/Makefile | 2 +-
tools/testing/selftests/riscv/cfi/Makefile | 10 +
.../testing/selftests/riscv/cfi/cfi_rv_test.h | 85 +++
.../selftests/riscv/cfi/riscv_cfi_test.c | 91 ++++
.../testing/selftests/riscv/cfi/shadowstack.c | 376 +++++++++++++
.../testing/selftests/riscv/cfi/shadowstack.h | 39 ++
42 files changed, 2077 insertions(+), 11 deletions(-)
create mode 100644 Documentation/arch/riscv/zicfilp.rst
create mode 100644 Documentation/arch/riscv/zicfiss.rst
create mode 100644 arch/riscv/include/asm/mman.h
create mode 100644 arch/riscv/include/asm/usercfi.h
create mode 100644 arch/riscv/kernel/usercfi.c
create mode 100644 tools/testing/selftests/riscv/cfi/Makefile
create mode 100644 tools/testing/selftests/riscv/cfi/cfi_rv_test.h
create mode 100644 tools/testing/selftests/riscv/cfi/riscv_cfi_test.c
create mode 100644 tools/testing/selftests/riscv/cfi/shadowstack.c
create mode 100644 tools/testing/selftests/riscv/cfi/shadowstack.h
--
2.43.2
For now, the BPF program of type BPF_PROG_TYPE_TRACING is not allowed to
be attached to multiple hooks, and we have to create a BPF program for
each kernel function, for which we want to trace, even through all the
program have the same (or similar) logic. This can consume extra memory,
and make the program loading slow if we have plenty of kernel function to
trace.
In this series, we add the support to allow attaching a tracing BPF
program to multi hooks, which is similar to BPF_TRACE_KPROBE_MULTI.
In the 1st patch, we add the support to record index of the accessed
function args of the target for tracing program. Meanwhile, we add the
function btf_check_func_part_match() to compare the accessed function args
of two function prototype. This function will be used in the next commit.
In the 2nd patch, we refactor the struct modules_array to ptr_array, as
we need similar function to hold the target btf, target program and kernel
modules that we reference to in the following commit.
In the 3rd patch, we introduce the struct bpf_tramp_link_conn to be the
bridge between bpf_link and trampoline, as the releation between bpf_link
and trampoline is not one-to-one anymore.
In the 4th patch, we add the struct bpf_tramp_multi_link and
bpf_trampoline_multi_{link,unlink}_prog for multi-link of trampoline.
In the 5th patch, we add target btf to the function args of
bpf_check_attach_target(), then the caller can specify the btf to check.
The 6th patch is the main part to add multi-link supporting for tracing.
For now, only the following attach type is supported:
BPF_TRACE_FENTRY_MULTI
BPF_TRACE_FEXIT_MULTI
BPF_MODIFY_RETURN_MULTI
The attach type of BPF_TRACE_RAW_TP has different link type, so we skip
this part in this series for now.
In the 7th and 8th patches, we add multi-link supporting of tracing to
libbpf. Note that we don't free btfs that we load after the bpf programs
are loaded into the kernel now if any programs of type tracing multi-link
existing, as we need to lookup the btf types during attaching.
In the 9th patch, we add the testcases for this series.
Changes since v1:
- According to the advice of Alexei, introduce multi-link for tracing
instead of attaching a tracing program to multiple trampolines with
creating multi instance of bpf_link.
Menglong Dong (9):
bpf: tracing: add support to record and check the accessed args
bpf: refactor the modules_array to ptr_array
bpf: trampoline: introduce struct bpf_tramp_link_conn
bpf: trampoline: introduce bpf_tramp_multi_link
bpf: verifier: add btf to the function args of bpf_check_attach_target
bpf: tracing: add multi-link support
libbpf: don't free btf if program of multi-link tracing existing
libbpf: add support for the multi-link of tracing
selftests/bpf: add testcases for multi-link of tracing
arch/arm64/net/bpf_jit_comp.c | 4 +-
arch/riscv/net/bpf_jit_comp64.c | 4 +-
arch/s390/net/bpf_jit_comp.c | 4 +-
arch/x86/net/bpf_jit_comp.c | 4 +-
include/linux/bpf.h | 51 ++-
include/linux/bpf_verifier.h | 1 +
include/uapi/linux/bpf.h | 10 +
kernel/bpf/bpf_struct_ops.c | 2 +-
kernel/bpf/btf.c | 113 ++++-
kernel/bpf/syscall.c | 425 +++++++++++++++++-
kernel/bpf/trampoline.c | 97 +++-
kernel/bpf/verifier.c | 24 +-
kernel/trace/bpf_trace.c | 48 +-
net/bpf/test_run.c | 3 +
net/core/bpf_sk_storage.c | 2 +
tools/bpf/bpftool/common.c | 3 +
tools/include/uapi/linux/bpf.h | 10 +
tools/lib/bpf/bpf.c | 10 +
tools/lib/bpf/bpf.h | 6 +
tools/lib/bpf/libbpf.c | 215 ++++++++-
tools/lib/bpf/libbpf.h | 16 +
tools/lib/bpf/libbpf.map | 2 +
.../selftests/bpf/bpf_testmod/bpf_testmod.c | 49 ++
.../bpf/prog_tests/tracing_multi_link.c | 153 +++++++
.../selftests/bpf/progs/tracing_multi_test.c | 209 +++++++++
25 files changed, 1366 insertions(+), 99 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/tracing_multi_link.c
create mode 100644 tools/testing/selftests/bpf/progs/tracing_multi_test.c
--
2.39.2
By default, HLT instruction executed by guest is intercepted by hypervisor.
However, KVM_CAP_X86_DISABLE_EXITS capability can be used to not intercept
HLT by setting KVM_X86_DISABLE_EXITS_HLT.
By default, vms are created with in-kernel APIC support in KVM selftests.
VM needs to be created without in-kernel APIC support for this test, so
that HLT will exit to userspace. To do so, __vm_create() is modified to not
call KVM_CREATE_IRQCHIP ioctl while creating vm.
Add a test case to test KVM_X86_DISABLE_EXITS_HLT functionality.
Patch 1, 2 -> Preparatory patches to add the KVM_X86_DISABLE_EXITS_HLT test
case
Patch 3 -> Adds a test case for KVM_X86_DISABLE_EXITS_HLT
Testing done:
Tested KVM_X86_DISABLE_EXITS_HLT test case on AMD and Intel machines.
Manali Shukla (3):
KVM: selftests: Add safe_halt() and cli() helpers to common code
KVM: selftests: Change __vm_create() to create a vm without in-kernel
APIC
KVM: selftests: Add a test case for KVM_X86_DISABLE_EXITS_HLT
tools/testing/selftests/kvm/Makefile | 1 +
tools/testing/selftests/kvm/dirty_log_test.c | 2 +-
.../selftests/kvm/include/kvm_util_base.h | 4 +-
.../selftests/kvm/include/x86_64/processor.h | 17 +++
tools/testing/selftests/kvm/lib/kvm_util.c | 11 +-
.../kvm/x86_64/halt_disable_exit_test.c | 113 ++++++++++++++++++
.../kvm/x86_64/ucna_injection_test.c | 2 +-
7 files changed, 143 insertions(+), 7 deletions(-)
create mode 100644 tools/testing/selftests/kvm/x86_64/halt_disable_exit_test.c
base-commit: e9da6f08edb0bd4c621165496778d77a222e1174
--
2.34.1
The mremap_test, in a worst case controlled by the -t flag, does a for loop
iteration in orders of GB. Without compromising on the stdout report, the aim
is to reduce this time.
A pre-filled random buffer is allocated based on the seed, replacing repetitive
rand() calls. The byte pattern in the memory locations is set through memcpy()
from the random buffer.
Replacing the loop for printing the mismatch index to stdout, employ an
efficient algorithm by breaking the comparison into chunks, use the highly
optimized memcmp() library function, and when a mismatch does occur, only
then do a brute force iteration.
Also, use sscanf() to parse /proc/self/maps for consistency across files.
Execution time results (x86 system):
./mremap_test
Original: 3 seconds
After change: 0.8 seconds
./mremap_test -t100
Original: 17 seconds
After change: 2 seconds
./mremap_test -t0 (worst case):
Original: 9:40 minutes
After change: 45 seconds
Dev Jain (3):
selftests/mm: mremap_test: Optimize using pre-filled random array and
memcpy
selftests/mm: mremap_test: Optimize execution time from minutes to
seconds using chunkwise memcmp
selftests/mm: mremap_test: Use sscanf to parse /proc/self/maps
tools/testing/selftests/mm/mremap_test.c | 204 +++++++++++++++++------
1 file changed, 153 insertions(+), 51 deletions(-)
--
2.34.1
Hi Linus,
Please pull the following kselftest fixes update for Linux 6.9rc2.
This kselftest fixes update for Linux 6.9-rc2 consists of fixes
to seccomp and ftrace tests and a change to add config file for
dmabuf-heap test to increase coverage.
diff is attached.
thanks,
-- Shuah
----------------------------------------------------------------
The following changes since commit 4cece764965020c22cff7665b18a012006359095:
Linux 6.9-rc1 (2024-03-24 14:10:05 -0700)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest tags/linux_kselftest-fixes-6.9-rc2
for you to fetch changes up to 224fe424c356cb5c8f451eca4127f32099a6f764:
selftests: dmabuf-heap: add config file for the test (2024-03-29 13:57:14 -0600)
----------------------------------------------------------------
linux_kselftest-fixes-6.9-rc2
This kselftest fixes update for Linux 6.9-rc2 consists of fixes
to seccomp and ftrace tests and a change to add config file for
dmabuf-heap test to increase coverage.
----------------------------------------------------------------
Mark Brown (1):
selftests/seccomp: Try to fit runtime of benchmark into timeout
Mark Rutland (1):
selftests/ftrace: Fix event filter target_func selection
Muhammad Usama Anjum (1):
selftests: dmabuf-heap: add config file for the test
tools/testing/selftests/dmabuf-heaps/config | 3 +++
tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc | 2 +-
tools/testing/selftests/seccomp/settings | 2 +-
3 files changed, 5 insertions(+), 2 deletions(-)
create mode 100644 tools/testing/selftests/dmabuf-heaps/config
----------------------------------------------------------------
Hi Linus,
Please pull the following kunit fixes update for Linux 6.9rc2.
This kunit update for Linux 6.9-rc2 consists of one urgent fix for
--alltests build failure related to renaming of CONFIG_DAMON_DBGFS
to DAMON_DBGFS_DEPRECATED to the missing config option.
This is one of the two fixes to --alltests breakage. The other
one is in the following PR from net:
https://lore.kernel.org/netdev/20240328143117.26574-1-pabeni@redhat.com
diff is attached.
thanks,
-- Shuah
----------------------------------------------------------------
The following changes since commit 4cece764965020c22cff7665b18a012006359095:
Linux 6.9-rc1 (2024-03-24 14:10:05 -0700)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest tags/linux_kselftest-kunit-fixes-6.9-rc2
for you to fetch changes up to cfedfb24c9ddee2bf1641545f6e9b6a02b924aee:
kunit: configs: Enable CONFIG_DAMON_DBGFS_DEPRECATED for --alltests (2024-03-28 11:47:30 -0600)
----------------------------------------------------------------
linux_kselftest-kunit-fixes-6.9-rc2
This kunit update for Linux 6.9-rc2 consists of one urgent fix for
--alltests build failure related to renaming of CONFIG_DAMON_DBGFS
to DAMON_DBGFS_DEPRECATED to the missing config option.
----------------------------------------------------------------
David Gow (1):
kunit: configs: Enable CONFIG_DAMON_DBGFS_DEPRECATED for --alltests
tools/testing/kunit/configs/all_tests.config | 1 +
1 file changed, 1 insertion(+)
----------------------------------------------------------------
The test results reported for the clone3_set_tid tests interact poorly with
automation for running kselftest since the reported test names include TIDs
dynamically allocated at runtime. A lot of automation for running kselftest
will compare runs by looking at the test name to identify if the same test
is being run so changing names make it look like the testsuite has been
updated to include new tests. This makes the results display less clearly
and breaks cases like bisection.
Address this by providing a brief description of the tests and logging that
along with the stable parameters for the test currently logged. The TIDs
are already logged separately in existing logging except for the final test
which has a new log message added. We also tweak the formatting of the
logging of expected/actual values for clarity.
There are still issues with the logging of skipped tests (many are simply
not logged at all when skipped and all are logged with different names) but
these are less disruptive since the skips are all based on not being run as
root, a condition likely to be stable for a given test system.
Acked-by: Christian Brauner <brauner(a)kernel.org>
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v3:
- Rebase onto v6.9-rc1.
- This is the second release I've posted this for with no changes or
review comments.
- Link to v2: https://lore.kernel.org/r/20240122-kselftest-clone3-set-tid-v2-1-72af5d7dba…
Changes in v2:
- Rebase onto v6.8-rc1.
- Link to v1: https://lore.kernel.org/r/20231115-kselftest-clone3-set-tid-v1-1-c1932591c4…
---
tools/testing/selftests/clone3/clone3_set_tid.c | 117 ++++++++++++++----------
1 file changed, 69 insertions(+), 48 deletions(-)
diff --git a/tools/testing/selftests/clone3/clone3_set_tid.c b/tools/testing/selftests/clone3/clone3_set_tid.c
index ed785afb6077..9ae38733cb6e 100644
--- a/tools/testing/selftests/clone3/clone3_set_tid.c
+++ b/tools/testing/selftests/clone3/clone3_set_tid.c
@@ -114,7 +114,8 @@ static int call_clone3_set_tid(pid_t *set_tid,
return WEXITSTATUS(status);
}
-static void test_clone3_set_tid(pid_t *set_tid,
+static void test_clone3_set_tid(const char *desc,
+ pid_t *set_tid,
size_t set_tid_size,
int flags,
int expected,
@@ -129,17 +130,13 @@ static void test_clone3_set_tid(pid_t *set_tid,
ret = call_clone3_set_tid(set_tid, set_tid_size, flags, expected_pid,
wait_for_it);
ksft_print_msg(
- "[%d] clone3() with CLONE_SET_TID %d says :%d - expected %d\n",
+ "[%d] clone3() with CLONE_SET_TID %d says: %d - expected %d\n",
getpid(), set_tid[0], ret, expected);
- if (ret != expected)
- ksft_test_result_fail(
- "[%d] Result (%d) is different than expected (%d)\n",
- getpid(), ret, expected);
- else
- ksft_test_result_pass(
- "[%d] Result (%d) matches expectation (%d)\n",
- getpid(), ret, expected);
+
+ ksft_test_result(ret == expected, "%s with %d TIDs and flags 0x%x\n",
+ desc, set_tid_size, flags);
}
+
int main(int argc, char *argv[])
{
FILE *f;
@@ -172,73 +169,91 @@ int main(int argc, char *argv[])
/* Try invalid settings */
memset(&set_tid, 0, sizeof(set_tid));
- test_clone3_set_tid(set_tid, MAX_PID_NS_LEVEL + 1, 0, -EINVAL, 0, 0);
+ test_clone3_set_tid("invalid size, 0 TID",
+ set_tid, MAX_PID_NS_LEVEL + 1, 0, -EINVAL, 0, 0);
- test_clone3_set_tid(set_tid, MAX_PID_NS_LEVEL * 2, 0, -EINVAL, 0, 0);
+ test_clone3_set_tid("invalid size, 0 TID",
+ set_tid, MAX_PID_NS_LEVEL * 2, 0, -EINVAL, 0, 0);
- test_clone3_set_tid(set_tid, MAX_PID_NS_LEVEL * 2 + 1, 0,
- -EINVAL, 0, 0);
+ test_clone3_set_tid("invalid size, 0 TID",
+ set_tid, MAX_PID_NS_LEVEL * 2 + 1, 0,
+ -EINVAL, 0, 0);
- test_clone3_set_tid(set_tid, MAX_PID_NS_LEVEL * 42, 0, -EINVAL, 0, 0);
+ test_clone3_set_tid("invalid size, 0 TID",
+ set_tid, MAX_PID_NS_LEVEL * 42, 0, -EINVAL, 0, 0);
/*
* This can actually work if this test running in a MAX_PID_NS_LEVEL - 1
* nested PID namespace.
*/
- test_clone3_set_tid(set_tid, MAX_PID_NS_LEVEL - 1, 0, -EINVAL, 0, 0);
+ test_clone3_set_tid("invalid size, 0 TID",
+ set_tid, MAX_PID_NS_LEVEL - 1, 0, -EINVAL, 0, 0);
memset(&set_tid, 0xff, sizeof(set_tid));
- test_clone3_set_tid(set_tid, MAX_PID_NS_LEVEL + 1, 0, -EINVAL, 0, 0);
+ test_clone3_set_tid("invalid size, TID all 1s",
+ set_tid, MAX_PID_NS_LEVEL + 1, 0, -EINVAL, 0, 0);
- test_clone3_set_tid(set_tid, MAX_PID_NS_LEVEL * 2, 0, -EINVAL, 0, 0);
+ test_clone3_set_tid("invalid size, TID all 1s",
+ set_tid, MAX_PID_NS_LEVEL * 2, 0, -EINVAL, 0, 0);
- test_clone3_set_tid(set_tid, MAX_PID_NS_LEVEL * 2 + 1, 0,
- -EINVAL, 0, 0);
+ test_clone3_set_tid("invalid size, TID all 1s",
+ set_tid, MAX_PID_NS_LEVEL * 2 + 1, 0,
+ -EINVAL, 0, 0);
- test_clone3_set_tid(set_tid, MAX_PID_NS_LEVEL * 42, 0, -EINVAL, 0, 0);
+ test_clone3_set_tid("invalid size, TID all 1s",
+ set_tid, MAX_PID_NS_LEVEL * 42, 0, -EINVAL, 0, 0);
/*
* This can actually work if this test running in a MAX_PID_NS_LEVEL - 1
* nested PID namespace.
*/
- test_clone3_set_tid(set_tid, MAX_PID_NS_LEVEL - 1, 0, -EINVAL, 0, 0);
+ test_clone3_set_tid("invalid size, TID all 1s",
+ set_tid, MAX_PID_NS_LEVEL - 1, 0, -EINVAL, 0, 0);
memset(&set_tid, 0, sizeof(set_tid));
/* Try with an invalid PID */
set_tid[0] = 0;
- test_clone3_set_tid(set_tid, 1, 0, -EINVAL, 0, 0);
+ test_clone3_set_tid("valid size, 0 TID",
+ set_tid, 1, 0, -EINVAL, 0, 0);
set_tid[0] = -1;
- test_clone3_set_tid(set_tid, 1, 0, -EINVAL, 0, 0);
+ test_clone3_set_tid("valid size, -1 TID",
+ set_tid, 1, 0, -EINVAL, 0, 0);
/* Claim that the set_tid array actually contains 2 elements. */
- test_clone3_set_tid(set_tid, 2, 0, -EINVAL, 0, 0);
+ test_clone3_set_tid("2 TIDs, -1 and 0",
+ set_tid, 2, 0, -EINVAL, 0, 0);
/* Try it in a new PID namespace */
if (uid == 0)
- test_clone3_set_tid(set_tid, 1, CLONE_NEWPID, -EINVAL, 0, 0);
+ test_clone3_set_tid("valid size, -1 TID",
+ set_tid, 1, CLONE_NEWPID, -EINVAL, 0, 0);
else
ksft_test_result_skip("Clone3() with set_tid requires root\n");
/* Try with a valid PID (1) this should return -EEXIST. */
set_tid[0] = 1;
if (uid == 0)
- test_clone3_set_tid(set_tid, 1, 0, -EEXIST, 0, 0);
+ test_clone3_set_tid("duplicate PID 1",
+ set_tid, 1, 0, -EEXIST, 0, 0);
else
ksft_test_result_skip("Clone3() with set_tid requires root\n");
/* Try it in a new PID namespace */
if (uid == 0)
- test_clone3_set_tid(set_tid, 1, CLONE_NEWPID, 0, 0, 0);
+ test_clone3_set_tid("duplicate PID 1",
+ set_tid, 1, CLONE_NEWPID, 0, 0, 0);
else
ksft_test_result_skip("Clone3() with set_tid requires root\n");
/* pid_max should fail everywhere */
set_tid[0] = pid_max;
- test_clone3_set_tid(set_tid, 1, 0, -EINVAL, 0, 0);
+ test_clone3_set_tid("set TID to maximum",
+ set_tid, 1, 0, -EINVAL, 0, 0);
if (uid == 0)
- test_clone3_set_tid(set_tid, 1, CLONE_NEWPID, -EINVAL, 0, 0);
+ test_clone3_set_tid("set TID to maximum",
+ set_tid, 1, CLONE_NEWPID, -EINVAL, 0, 0);
else
ksft_test_result_skip("Clone3() with set_tid requires root\n");
@@ -262,10 +277,12 @@ int main(int argc, char *argv[])
/* After the child has finished, its PID should be free. */
set_tid[0] = pid;
- test_clone3_set_tid(set_tid, 1, 0, 0, 0, 0);
+ test_clone3_set_tid("reallocate child TID",
+ set_tid, 1, 0, 0, 0, 0);
/* This should fail as there is no PID 1 in that namespace */
- test_clone3_set_tid(set_tid, 1, CLONE_NEWPID, -EINVAL, 0, 0);
+ test_clone3_set_tid("duplicate child TID",
+ set_tid, 1, CLONE_NEWPID, -EINVAL, 0, 0);
/*
* Creating a process with PID 1 in the newly created most nested
@@ -274,7 +291,8 @@ int main(int argc, char *argv[])
*/
set_tid[0] = 1;
set_tid[1] = pid;
- test_clone3_set_tid(set_tid, 2, CLONE_NEWPID, 0, pid, 0);
+ test_clone3_set_tid("create PID 1 in new NS",
+ set_tid, 2, CLONE_NEWPID, 0, pid, 0);
ksft_print_msg("unshare PID namespace\n");
if (unshare(CLONE_NEWPID) == -1)
@@ -284,7 +302,8 @@ int main(int argc, char *argv[])
set_tid[0] = pid;
/* This should fail as there is no PID 1 in that namespace */
- test_clone3_set_tid(set_tid, 1, 0, -EINVAL, 0, 0);
+ test_clone3_set_tid("duplicate PID 1",
+ set_tid, 1, 0, -EINVAL, 0, 0);
/* Let's create a PID 1 */
ns_pid = fork();
@@ -295,21 +314,25 @@ int main(int argc, char *argv[])
*/
set_tid[0] = 43;
set_tid[1] = -1;
- test_clone3_set_tid(set_tid, 2, 0, -EINVAL, 0, 0);
+ test_clone3_set_tid("check leak on invalid TID -1",
+ set_tid, 2, 0, -EINVAL, 0, 0);
set_tid[0] = 43;
set_tid[1] = pid;
- test_clone3_set_tid(set_tid, 2, 0, 0, 43, 0);
+ test_clone3_set_tid("check leak on invalid specific TID",
+ set_tid, 2, 0, 0, 43, 0);
ksft_print_msg("Child in PID namespace has PID %d\n", getpid());
set_tid[0] = 2;
- test_clone3_set_tid(set_tid, 1, 0, 0, 2, 0);
+ test_clone3_set_tid("create PID 2 in child NS",
+ set_tid, 1, 0, 0, 2, 0);
set_tid[0] = 1;
set_tid[1] = -1;
set_tid[2] = pid;
/* This should fail as there is invalid PID at level '1'. */
- test_clone3_set_tid(set_tid, 3, CLONE_NEWPID, -EINVAL, 0, 0);
+ test_clone3_set_tid("fail due to invalid TID at level 1",
+ set_tid, 3, CLONE_NEWPID, -EINVAL, 0, 0);
set_tid[0] = 1;
set_tid[1] = 42;
@@ -319,13 +342,15 @@ int main(int argc, char *argv[])
* namespaces. Again assuming this is running in the host's
* PID namespace. Not yet nested.
*/
- test_clone3_set_tid(set_tid, 4, CLONE_NEWPID, -EINVAL, 0, 0);
+ test_clone3_set_tid("fail due to too few active PID NSs",
+ set_tid, 4, CLONE_NEWPID, -EINVAL, 0, 0);
/*
* This should work and from the parent we should see
* something like 'NSpid: pid 42 1'.
*/
- test_clone3_set_tid(set_tid, 3, CLONE_NEWPID, 0, 42, true);
+ test_clone3_set_tid("verify that we have 3 PID NSs",
+ set_tid, 3, CLONE_NEWPID, 0, 42, true);
child_exit(ksft_cnt.ksft_fail);
}
@@ -380,14 +405,10 @@ int main(int argc, char *argv[])
ksft_cnt.ksft_pass += 6 - (ksft_cnt.ksft_fail - WEXITSTATUS(status));
ksft_cnt.ksft_fail = WEXITSTATUS(status);
- if (ns3 == pid && ns2 == 42 && ns1 == 1)
- ksft_test_result_pass(
- "PIDs in all namespaces as expected (%d,%d,%d)\n",
- ns3, ns2, ns1);
- else
- ksft_test_result_fail(
- "PIDs in all namespaces not as expected (%d,%d,%d)\n",
- ns3, ns2, ns1);
+ ksft_print_msg("Expecting PIDs %d, 42, 1\n", pid);
+ ksft_print_msg("Have PIDs in namespaces: %d, %d, %d\n", ns3, ns2, ns1);
+ ksft_test_result(ns3 == pid && ns2 == 42 && ns1 == 1,
+ "PIDs in all namespaces as expected\n");
out:
ret = 0;
---
base-commit: 4cece764965020c22cff7665b18a012006359095
change-id: 20231114-kselftest-clone3-set-tid-c0c35111f18f
Best regards,
--
Mark Brown <broonie(a)kernel.org>
Some unit tests intentionally trigger warning backtraces by passing bad
parameters to kernel API functions. Such unit tests typically check the
return value from such calls, not the existence of the warning backtrace.
Such intentionally generated warning backtraces are neither desirable
nor useful for a number of reasons.
- They can result in overlooked real problems.
- A warning that suddenly starts to show up in unit tests needs to be
investigated and has to be marked to be ignored, for example by
adjusting filter scripts. Such filters are ad-hoc because there is
no real standard format for warnings. On top of that, such filter
scripts would require constant maintenance.
One option to address problem would be to add messages such as "expected
warning backtraces start / end here" to the kernel log. However, that
would again require filter scripts, it might result in missing real
problematic warning backtraces triggered while the test is running, and
the irrelevant backtrace(s) would still clog the kernel log.
Solve the problem by providing a means to identify and suppress specific
warning backtraces while executing test code. Support suppressing multiple
backtraces while at the same time limiting changes to generic code to the
absolute minimum. Architecture specific changes are kept at minimum by
retaining function names only if both CONFIG_DEBUG_BUGVERBOSE and
CONFIG_KUNIT are enabled.
The first patch of the series introduces the necessary infrastructure.
The second patch introduces support for counting suppressed backtraces.
This capability is used in patch three to implement unit tests.
Patch four documents the new API.
The next two patches add support for suppressing backtraces in drm_rect
and dev_addr_lists unit tests. These patches are intended to serve as
examples for the use of the functionality introduced with this series.
The remaining patches implement the necessary changes for all
architectures with GENERIC_BUG support.
With CONFIG_KUNIT enabled, image size increase with this series applied is
approximately 1%. The image size increase (and with it the functionality
introduced by this series) can be avoided by disabling
CONFIG_KUNIT_SUPPRESS_BACKTRACE.
This series is based on the RFC patch and subsequent discussion at
https://patchwork.kernel.org/project/linux-kselftest/patch/02546e59-1afe-4b…
and offers a more comprehensive solution of the problem discussed there.
Design note:
Function pointers are only added to the __bug_table section if both
CONFIG_KUNIT_SUPPRESS_BACKTRACE and CONFIG_DEBUG_BUGVERBOSE are enabled
to avoid image size increases if CONFIG_KUNIT is disabled. There would be
some benefits to adding those pointers all the time (reduced complexity,
ability to display function names in BUG/WARNING messages). That change,
if desired, can be made later.
Checkpatch note:
Remaining checkpatch errors and warnings were deliberately ignored.
Some are triggered by matching coding style or by comments interpreted
as code, others by assembler macros which are disliked by checkpatch.
Suggestions for improvements are welcome.
Changes since RFC:
- Introduced CONFIG_KUNIT_SUPPRESS_BACKTRACE
- Minor cleanups and bug fixes
- Added support for all affected architectures
- Added support for counting suppressed warnings
- Added unit tests using those counters
- Added patch to suppress warning backtraces in dev_addr_lists tests
Changes since v1:
- Rebased to v6.9-rc1
- Added Tested-by:, Acked-by:, and Reviewed-by: tags
[I retained those tags since there have been no functional changes]
- Introduced KUNIT_SUPPRESS_BACKTRACE configuration option, enabled by
default.
Dear Linux folks,
Running the selftests in a QEMU q35 VM, they hang.
```
$ make kselftest
[…]
ok 6 selftests: pidfd: pidfd_getfd_test
# timeout set to 45
# selftests: pidfd: pidfd_setns_test
# TAP version 13
# 1..7
# # Starting 7 tests from 2 test cases.
# # RUN global.setns_einval ...
# # OK global.setns_einval
# ok 1 global.setns_einval
# # RUN current_nsset.invalid_flags ...
# # pidfd_setns_test.c:161:invalid_flags:Expected self->child_pid_exited
(0) > 0 (0)
# # OK current_nsset.invalid_flags
# ok 2 current_nsset.invalid_flags
# # RUN current_nsset.pidfd_exited_child ...
# # pidfd_setns_test.c:161:pidfd_exited_child:Expected
self->child_pid_exited (0) > 0 (0)
# # OK current_nsset.pidfd_exited_child
# ok 3 current_nsset.pidfd_exited_child
# # RUN current_nsset.pidfd_incremental_setns ...
# # pidfd_setns_test.c:161:pidfd_incremental_setns:Expected
self->child_pid_exited (0) > 0 (0)
# # pidfd_setns_test.c:408:pidfd_incremental_setns:Managed to correctly
setns to user namespace of 31570 via pidfd 20
# # pidfd_setns_test.c:408:pidfd_incremental_setns:Managed to correctly
setns to mnt namespace of 31570 via pidfd 20
# # pidfd_setns_test.c:408:pidfd_incremental_setns:Managed to correctly
setns to pid namespace of 31570 via pidfd 20
# # pidfd_setns_test.c:408:pidfd_incremental_setns:Managed to correctly
setns to uts namespace of 31570 via pidfd 20
# # pidfd_setns_test.c:408:pidfd_incremental_setns:Managed to correctly
setns to ipc namespace of 31570 via pidfd 20
# # pidfd_setns_test.c:408:pidfd_incremental_setns:Managed to correctly
setns to net namespace of 31570 via pidfd 20
# # pidfd_setns_test.c:408:pidfd_incremental_setns:Managed to correctly
setns to cgroup namespace of 31570 via pidfd 20
# # pidfd_setns_test.c:408:pidfd_incremental_setns:Managed to correctly
setns to pid_for_children namespace of 31570 via pidfd 20
# # pidfd_setns_test.c:391:pidfd_incremental_setns:Expected
setns(self->child_pidfd1, info->flag) (-1) == 0 (0)
# # pidfd_setns_test.c:392:pidfd_incremental_setns:Too many users -
Failed to setns to time namespace of 31570 via pidfd 20
# # pidfd_incremental_setns: Test terminated by timeout
# # FAIL current_nsset.pidfd_incremental_setns
# not ok 4 current_nsset.pidfd_incremental_setns
# # RUN current_nsset.nsfd_incremental_setns ...
# # pidfd_setns_test.c:161:nsfd_incremental_setns:Expected
self->child_pid_exited (0) > 0 (0)
# # pidfd_setns_test.c:444:nsfd_incremental_setns:Managed to correctly
setns to user namespace of 31577 via nsfd 19
# # pidfd_setns_test.c:444:nsfd_incremental_setns:Managed to correctly
setns to mnt namespace of 31577 via nsfd 24
# # pidfd_setns_test.c:444:nsfd_incremental_setns:Managed to correctly
setns to pid namespace of 31577 via nsfd 27
# # pidfd_setns_test.c:444:nsfd_incremental_setns:Managed to correctly
setns to uts namespace of 31577 via nsfd 30
# # pidfd_setns_test.c:444:nsfd_incremental_setns:Managed to correctly
setns to ipc namespace of 31577 via nsfd 33
# # pidfd_setns_test.c:444:nsfd_incremental_setns:Managed to correctly
setns to net namespace of 31577 via nsfd 36
# # pidfd_setns_test.c:444:nsfd_incremental_setns:Managed to correctly
setns to cgroup namespace of 31577 via nsfd 39
# # pidfd_setns_test.c:444:nsfd_incremental_setns:Managed to correctly
setns to pid_for_children namespace of 31577 via nsfd 42
# # pidfd_setns_test.c:427:nsfd_incremental_setns:Expected
setns(self->child_nsfds1[i], info->flag) (-1) == 0 (0)
# # pidfd_setns_test.c:428:nsfd_incremental_setns:Too many users -
Failed to setns to time namespace of 31577 via nsfd 45
```
Ctrl + c gets me back to the shell prompt.
```
$ ps auxf
[…]
build 31574 0.0 0.0 2528 1024 pts/0 D 09:57 0:00
./pidfd_setns_test
build 31575 99.9 0.0 2528 1024 pts/0 R 09:57 37:37 \_
./pidfd_setns_test
build 31576 0.0 0.0 0 0 pts/0 Z 09:57 0:00
\_ [pidfd_setns_tes] <defunct>
build 31577 0.0 0.0 0 0 pts/0 Z 09:57 0:00
\_ [pidfd_setns_tes] <defunct>
build 31578 0.0 0.0 2528 512 pts/0 S 09:57 0:00
\_ ./pidfd_setns_test
```
During shutdown, the system waits for that process:
[42733.704347] systemd-shutdown[1]: Waiting for process: 31578
(pidfd_setns_tes)
This is commit 8d025e2092e2 with one patch on top:
$ git log --no-decorate --oneline -2 a2ce022afcbb
a2ce022afcbb [PATCH] kbuild: Disable KCSAN for autogenerated
*.mod.c intermediaries
8d025e2092e2 Merge tag 'erofs-for-6.9-rc2-fixes' of
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
KCSAN is not enabled though.
Kind regards,
Paul
As discussed on the bi-weekly call on Jan 30, and in mailing around
kernel CI effort, some changes are desirable in the suite of forwarding
selftests the better to work with the CI tooling. Namely:
- The forwarding selftests use a configuration file where names of
interfaces are defined and various variables can be overridden. There
is also forwarding.config.sample that users can use as a template to
refer to when creating the config file. What happens a fair bit is
that users either do not know about this at all, or simply forget, and
are confused by cryptic failures about interfaces that cannot be
created.
In patches #1 - #3 have lib.sh just be the single source of truth with
regards to which variables exist. That includes the topology variables
which were previously only in the sample file, and any "tweak
variables", such as what tools to use, sleep times, etc.
forwarding.config.sample then becomes just a placeholder with a couple
examples. Unless specific HW should be exercised, or specific tools
used, the defaults are usually just fine.
- Several net/forwarding/ selftests (and one net/ one) cannot be run on
veth pairs, they need an actual HW interface to run on. They are
generic in the sense that any capable HW should pass them, which is
why they have been put to net/forwarding/ as opposed to drivers/net/,
but they do not generalize to veth. The fact that these tests are in
net/forwarding/, but still complaining when run, is confusing.
In patches #4 - #6 move these tests to a new directory
drivers/net/hw.
- The following patches extend the codebase to handle well test results
other than pass and fail.
Patch #7 is preparatory. It converts several log_test_skip to XFAIL,
so that tests do not spuriously end up returning non-0 when they
are not supposed to.
In patches #8 - #10, introduce some missing ksft constants, then support
having those constants in RET, and then finally in EXIT_STATUS.
- The traffic scheduler tests generate a large amount of network traffic
to test the behavior of the scheduler. This demands a relatively
high-performance computer. On slow machines, such as with a debugging
kernel, the test would spuriously fail.
It can still be useful to "go through the motions" though, to possibly
catch bugs in setup of the scheduler graph and passing packets around.
Thus we still want to run the tests, just with lowered demands.
To that end, in patches #11 - #12, introduce an environment variable
KSFT_MACHINE_SLOW, with obvious meaning. Tests can then make checks
more lenient, such as mark failures as XFAIL. A helper, xfail_on_slow,
is provided to mark performance-sensitive parts of the selftest.
- In patch #13, use a similar mechanism to mark a NH group stats
selftest to XFAIL HW stats tests when run on VETH pairs.
- All these changes complicate the hitherto straightforward logging and
checking logic, so in patch #14, add a selftest that checks this
functionality in lib.sh.
v1 (vs. an RFC circulated through linux-kselftest):
- Patch #9:
- Clarify intended usage by s/set_ret/ret_set_ksft_status/,
s/nret/ksft_status/
Petr Machata (14):
selftests: net: libs: Change variable fallback syntax
selftests: forwarding.config.sample: Move overrides to lib.sh
selftests: forwarding: README: Document customization
selftests: forwarding: ipip_lib: Do not import lib.sh
selftests: forwarding: Move several selftests
selftests: forwarding: Ditch skip_on_veth()
selftests: forwarding: Change inappropriate log_test_skip() calls
selftests: lib: Define more kselftest exit codes
selftests: forwarding: Have RET track kselftest framework constants
selftests: forwarding: Convert log_test() to recognize RET values
selftests: forwarding: Support for performance sensitive tests
selftests: forwarding: Mark performance-sensitive tests
selftests: forwarding: router_mpath_nh_lib: Don't skip, xfail on veth
selftests: forwarding: Add a test for testing lib.sh functionality
.../testing/selftests/drivers/net/hw/Makefile | 25 ++
.../net/hw}/devlink_port_split.py | 0
.../forwarding => drivers/net/hw}/ethtool.sh | 5 +-
.../net/hw}/ethtool_extended_state.sh | 5 +-
.../net/hw}/ethtool_lib.sh | 0
.../net/hw}/ethtool_mm.sh | 3 +-
.../net/hw}/ethtool_rmon.sh | 7 +-
.../net/hw}/hw_stats_l3.sh | 19 +-
.../net/hw}/hw_stats_l3_gre.sh | 7 +-
.../forwarding => drivers/net/hw}/loopback.sh | 5 +-
.../testing/selftests/drivers/net/hw/settings | 1 +
.../selftests/drivers/net/mlxsw/mlxsw_lib.sh | 2 +-
.../net/mlxsw/spectrum-2/resource_scale.sh | 1 -
.../net/mlxsw/spectrum/resource_scale.sh | 1 -
tools/testing/selftests/net/Makefile | 1 -
.../testing/selftests/net/forwarding/Makefile | 9 +-
tools/testing/selftests/net/forwarding/README | 33 +++
.../net/forwarding/forwarding.config.sample | 53 ++--
.../selftests/net/forwarding/ipip_lib.sh | 1 -
tools/testing/selftests/net/forwarding/lib.sh | 255 +++++++++++++-----
.../selftests/net/forwarding/lib_sh_test.sh | 208 ++++++++++++++
.../net/forwarding/router_mpath_nh_lib.sh | 12 +-
.../selftests/net/forwarding/sch_ets_tests.sh | 19 +-
.../selftests/net/forwarding/sch_red.sh | 10 +-
.../selftests/net/forwarding/sch_tbf_core.sh | 2 +-
.../selftests/net/forwarding/tc_common.sh | 2 +-
.../selftests/net/forwarding/tc_tunnel_key.sh | 2 -
tools/testing/selftests/net/lib.sh | 48 +++-
28 files changed, 565 insertions(+), 171 deletions(-)
create mode 100644 tools/testing/selftests/drivers/net/hw/Makefile
rename tools/testing/selftests/{net => drivers/net/hw}/devlink_port_split.py (100%)
rename tools/testing/selftests/{net/forwarding => drivers/net/hw}/ethtool.sh (98%)
rename tools/testing/selftests/{net/forwarding => drivers/net/hw}/ethtool_extended_state.sh (96%)
rename tools/testing/selftests/{net/forwarding => drivers/net/hw}/ethtool_lib.sh (100%)
rename tools/testing/selftests/{net/forwarding => drivers/net/hw}/ethtool_mm.sh (99%)
rename tools/testing/selftests/{net/forwarding => drivers/net/hw}/ethtool_rmon.sh (92%)
rename tools/testing/selftests/{net/forwarding => drivers/net/hw}/hw_stats_l3.sh (96%)
rename tools/testing/selftests/{net/forwarding => drivers/net/hw}/hw_stats_l3_gre.sh (92%)
rename tools/testing/selftests/{net/forwarding => drivers/net/hw}/loopback.sh (92%)
create mode 100644 tools/testing/selftests/drivers/net/hw/settings
create mode 100755 tools/testing/selftests/net/forwarding/lib_sh_test.sh
--
2.43.0
Hi,
Commit 28b3df1fe6ba2cb4 ("kunit: add wireless unit tests") which I can't
seem to find on lore breaks full kunit runs on non-UML builds and is now
present in mainline. If I run:
./tools/testing/kunit/kunit.py run --alltests --cross_compile x86_64-linux-gnu- --arch x86_64
on a clean tree then I get:
[15:09:20] Configuring KUnit Kernel ...
Generating .config ...
Populating config with:
$ make ARCH=x86_64 O=.kunit olddefconfig CROSS_COMPILE=x86_64-linux-gnu-
ERROR:root:Not all Kconfig options selected in kunitconfig were in the generated .config.
This is probably due to unsatisfied dependencies.
Missing: CONFIG_IWLWIFI=y, CONFIG_WLAN_VENDOR_INTEL=y
UML works fine, but other real architectures (eg, arm64) seem similarly
broken. I've not looked properly yet, I'm a bit confused given that
there's not even any dependencies for WLAN_VENDOR_INTEL and it's not
mentoned in the defconfig.
Thanks,
Mark
RFC v7:
=======
Major Changes:
--------------
This revision largely rebases on top of net-next and addresses the feedback
RFCv6 received from folks, namely Jakub, Yunsheng, Arnd, David, & Pavel.
The series remains in RFC because the queue-API ndos defined in this
series are not yet implemented. I have a GVE implementation I carry out
of tree for my testing. A upstreamable GVE implementation is in the
works. Aside from that, in my estimation all the patches are ready for
review/merge. Please do take a look.
As usual the full devmem TCP changes including the full GVE driver
implementation is here:
https://github.com/mina/linux/commits/tcpdevmem-v7/
Detailed changelog:
- Use admin-perm in netlink API.
- Addressed feedback from Jakub with regards to netlink API
implementation.
- Renamed devmem.c functions to something more appropriate for that
file.
- Improve the performance seen through the page_pool benchmark.
- Fix the value definition of all the SO_DEVMEM_* uapi.
- Various fixes to documentation.
Perf - page-pool benchmark:
---------------------------
Improved performance of bench_page_pool_simple.ko tests compared to v6:
https://pastebin.com/raw/v5dYRg8L
net-next base: 8 cycle fast path.
RFC v6: 10 cycle fast path.
RFC v7: 9 cycle fast path.
RFC v7 with CONFIG_DMA_SHARED_BUFFER disabled: 8 cycle fast path,
same as baseline.
Perf - Devmem TCP benchmark:
---------------------
Perf is about the same regardless of the changes in v7, namely the
removal of the static_branch_unlikely to improve the page_pool benchmark
performance:
189/200gbps bi-directional throughput with RX devmem TCP and regular TCP
TX i.e. ~95% line rate.
RFC v6:
=======
Major Changes:
--------------
This revision largely rebases on top of net-next and addresses the little
feedback RFCv5 received.
The series remains in RFC because the queue-API ndos defined in this
series are not yet implemented. I have a GVE implementation I carry out
of tree for my testing. A upstreamable GVE implementation is in the
works. Aside from that, in my estimation all the patches are ready for
review/merge. Please do take a look.
As usual the full devmem TCP changes including the full GVE driver
implementation is here:
https://github.com/mina/linux/commits/tcpdevmem-v6/
This version also comes with some performance data recorded in the cover
letter (see below changelog).
Detailed changelog:
- Rebased on top of the merged netmem_ref changes.
- Converted skb->dmabuf to skb->readable (Pavel). Pavel's original
suggestion was to remove the skb->dmabuf flag entirely, but when I
looked into it closely, I found the issue that if we remove the flag
we have to dereference the shinfo(skb) pointer to obtain the first
frag to tell whether an skb is readable or not. This can cause a
performance regression if it dirties the cache line when the
shinfo(skb) was not really needed. Instead, I converted the skb->dmabuf
flag into a generic skb->readable flag which can be re-used by io_uring
0-copy RX.
- Squashed a few locking optimizations from Eric Dumazet in the RX path
and the DEVMEM_DONTNEED setsockopt.
- Expanded the tests a bit. Added validation for invalid scenarios and
added some more coverage.
Perf - page-pool benchmark:
---------------------------
bench_page_pool_simple.ko tests with and without these changes:
https://pastebin.com/raw/ncHDwAbn
AFAIK the number that really matters in the perf tests is the
'tasklet_page_pool01_fast_path Per elem'. This one measures at about 8
cycles without the changes but there is some 1 cycle noise in some
results.
With the patches this regresses to 9 cycles with the changes but there
is 1 cycle noise occasionally running this test repeatedly.
Lastly I tried disable the static_branch_unlikely() in
netmem_is_net_iov() check. To my surprise disabling the
static_branch_unlikely() check reduces the fast path back to 8 cycles,
but the 1 cycle noise remains.
Perf - Devmem TCP benchmark:
---------------------
189/200gbps bi-directional throughput with RX devmem TCP and regular TCP
TX i.e. ~95% line rate.
Major changes in RFC v5:
========================
1. Rebased on top of 'Abstract page from net stack' series and used the
new netmem type to refer to LSB set pointers instead of re-using
struct page.
2. Downgraded this series back to RFC and called it RFC v5. This is
because this series is now dependent on 'Abstract page from net
stack'[1] and the queue API. Both are removed from the series to
reduce the patch # and those bits are fairly independent or
pre-requisite work.
3. Reworked the page_pool devmem support to use netmem and for some
more unified handling.
4. Reworked the reference counting of net_iov (renamed from
page_pool_iov) to use pp_ref_count for refcounting.
The full changes including the dependent series and GVE page pool
support is here:
https://github.com/mina/linux/commits/tcpdevmem-rfcv5/
[1] https://patchwork.kernel.org/project/netdevbpf/list/?series=810774
Major changes in v1:
====================
1. Implemented MVP queue API ndos to remove the userspace-visible
driver reset.
2. Fixed issues in the napi_pp_put_page() devmem frag unref path.
3. Removed RFC tag.
Many smaller addressed comments across all the patches (patches have
individual change log).
Full tree including the rest of the GVE driver changes:
https://github.com/mina/linux/commits/tcpdevmem-v1
Changes in RFC v3:
==================
1. Pulled in the memory-provider dependency from Jakub's RFC[1] to make the
series reviewable and mergeable.
2. Implemented multi-rx-queue binding which was a todo in v2.
3. Fix to cmsg handling.
The sticking point in RFC v2[2] was the device reset required to refill
the device rx-queues after the dmabuf bind/unbind. The solution
suggested as I understand is a subset of the per-queue management ops
Jakub suggested or similar:
https://lore.kernel.org/netdev/20230815171638.4c057dcd@kernel.org/
This is not addressed in this revision, because:
1. This point was discussed at netconf & netdev and there is openness to
using the current approach of requiring a device reset.
2. Implementing individual queue resetting seems to be difficult for my
test bed with GVE. My prototype to test this ran into issues with the
rx-queues not coming back up properly if reset individually. At the
moment I'm unsure if it's a mistake in the POC or a genuine issue in
the virtualization stack behind GVE, which currently doesn't test
individual rx-queue restart.
3. Our usecases are not bothered by requiring a device reset to refill
the buffer queues, and we'd like to support NICs that run into this
limitation with resetting individual queues.
My thought is that drivers that have trouble with per-queue configs can
use the support in this series, while drivers that support new netdev
ops to reset individual queues can automatically reset the queue as
part of the dma-buf bind/unbind.
The same approach with device resets is presented again for consideration
with other sticking points addressed.
This proposal includes the rx devmem path only proposed for merge. For a
snapshot of my entire tree which includes the GVE POC page pool support &
device memory support:
https://github.com/torvalds/linux/compare/master...mina:linux:tcpdevmem-v3
[1] https://lore.kernel.org/netdev/f8270765-a27b-6ccf-33ea-cda097168d79@redhat.…
[2] https://lore.kernel.org/netdev/CAHS8izOVJGJH5WF68OsRWFKJid1_huzzUK+hpKbLcL4…
Changes in RFC v2:
==================
The sticking point in RFC v1[1] was the dma-buf pages approach we used to
deliver the device memory to the TCP stack. RFC v2 is a proof-of-concept
that attempts to resolve this by implementing scatterlist support in the
networking stack, such that we can import the dma-buf scatterlist
directly. This is the approach proposed at a high level here[2].
Detailed changes:
1. Replaced dma-buf pages approach with importing scatterlist into the
page pool.
2. Replace the dma-buf pages centric API with a netlink API.
3. Removed the TX path implementation - there is no issue with
implementing the TX path with scatterlist approach, but leaving
out the TX path makes it easier to review.
4. Functionality is tested with this proposal, but I have not conducted
perf testing yet. I'm not sure there are regressions, but I removed
perf claims from the cover letter until they can be re-confirmed.
5. Added Signed-off-by: contributors to the implementation.
6. Fixed some bugs with the RX path since RFC v1.
Any feedback welcome, but specifically the biggest pending questions
needing feedback IMO are:
1. Feedback on the scatterlist-based approach in general.
2. Netlink API (Patch 1 & 2).
3. Approach to handle all the drivers that expect to receive pages from
the page pool (Patch 6).
[1] https://lore.kernel.org/netdev/dfe4bae7-13a0-3c5d-d671-f61b375cb0b4@gmail.c…
[2] https://lore.kernel.org/netdev/CAHS8izPm6XRS54LdCDZVd0C75tA1zHSu6jLVO8nzTLX…
==================
* TL;DR:
Device memory TCP (devmem TCP) is a proposal for transferring data to and/or
from device memory efficiently, without bouncing the data to a host memory
buffer.
* Problem:
A large amount of data transfers have device memory as the source and/or
destination. Accelerators drastically increased the volume of such transfers.
Some examples include:
- ML accelerators transferring large amounts of training data from storage into
GPU/TPU memory. In some cases ML training setup time can be as long as 50% of
TPU compute time, improving data transfer throughput & efficiency can help
improving GPU/TPU utilization.
- Distributed training, where ML accelerators, such as GPUs on different hosts,
exchange data among them.
- Distributed raw block storage applications transfer large amounts of data with
remote SSDs, much of this data does not require host processing.
Today, the majority of the Device-to-Device data transfers the network are
implemented as the following low level operations: Device-to-Host copy,
Host-to-Host network transfer, and Host-to-Device copy.
The implementation is suboptimal, especially for bulk data transfers, and can
put significant strains on system resources, such as host memory bandwidth,
PCIe bandwidth, etc. One important reason behind the current state is the
kernel’s lack of semantics to express device to network transfers.
* Proposal:
In this patch series we attempt to optimize this use case by implementing
socket APIs that enable the user to:
1. send device memory across the network directly, and
2. receive incoming network packets directly into device memory.
Packet _payloads_ go directly from the NIC to device memory for receive and from
device memory to NIC for transmit.
Packet _headers_ go to/from host memory and are processed by the TCP/IP stack
normally. The NIC _must_ support header split to achieve this.
Advantages:
- Alleviate host memory bandwidth pressure, compared to existing
network-transfer + device-copy semantics.
- Alleviate PCIe BW pressure, by limiting data transfer to the lowest level
of the PCIe tree, compared to traditional path which sends data through the
root complex.
* Patch overview:
** Part 1: netlink API
Gives user ability to bind dma-buf to an RX queue.
** Part 2: scatterlist support
Currently the standard for device memory sharing is DMABUF, which doesn't
generate struct pages. On the other hand, networking stack (skbs, drivers, and
page pool) operate on pages. We have 2 options:
1. Generate struct pages for dmabuf device memory, or,
2. Modify the networking stack to process scatterlist.
Approach #1 was attempted in RFC v1. RFC v2 implements approach #2.
** part 3: page pool support
We piggy back on page pool memory providers proposal:
https://github.com/kuba-moo/linux/tree/pp-providers
It allows the page pool to define a memory provider that provides the
page allocation and freeing. It helps abstract most of the device memory
TCP changes from the driver.
** part 4: support for unreadable skb frags
Page pool iovs are not accessible by the host; we implement changes
throughput the networking stack to correctly handle skbs with unreadable
frags.
** Part 5: recvmsg() APIs
We define user APIs for the user to send and receive device memory.
Not included with this series is the GVE devmem TCP support, just to
simplify the review. Code available here if desired:
https://github.com/mina/linux/tree/tcpdevmem
This series is built on top of net-next with Jakub's pp-providers changes
cherry-picked.
* NIC dependencies:
1. (strict) Devmem TCP require the NIC to support header split, i.e. the
capability to split incoming packets into a header + payload and to put
each into a separate buffer. Devmem TCP works by using device memory
for the packet payload, and host memory for the packet headers.
2. (optional) Devmem TCP works better with flow steering support & RSS support,
i.e. the NIC's ability to steer flows into certain rx queues. This allows the
sysadmin to enable devmem TCP on a subset of the rx queues, and steer
devmem TCP traffic onto these queues and non devmem TCP elsewhere.
The NIC I have access to with these properties is the GVE with DQO support
running in Google Cloud, but any NIC that supports these features would suffice.
I may be able to help reviewers bring up devmem TCP on their NICs.
* Testing:
The series includes a udmabuf kselftest that show a simple use case of
devmem TCP and validates the entire data path end to end without
a dependency on a specific dmabuf provider.
** Test Setup
Kernel: net-next with this series and memory provider API cherry-picked
locally.
Hardware: Google Cloud A3 VMs.
NIC: GVE with header split & RSS & flow steering support.
Cc: Pavel Begunkov <asml.silence(a)gmail.com>
Cc: David Wei <dw(a)davidwei.uk>
Cc: Jason Gunthorpe <jgg(a)ziepe.ca>
Cc: Yunsheng Lin <linyunsheng(a)huawei.com>
Cc: Shailend Chand <shailend(a)google.com>
Cc: Harshitha Ramamurthy <hramamurthy(a)google.com>
Cc: Shakeel Butt <shakeel.butt(a)linux.dev>
Cc: Jeroen de Borst <jeroendb(a)google.com>
Cc: Praveen Kaligineedi <pkaligineedi(a)google.com>
Jakub Kicinski (1):
net: page_pool: create hooks for custom page providers
Mina Almasry (13):
queue_api: define queue api
net: netdev netlink api to bind dma-buf to a net device
netdev: support binding dma-buf to netdevice
netdev: netdevice devmem allocator
page_pool: convert to use netmem
page_pool: devmem support
memory-provider: dmabuf devmem memory provider
net: support non paged skb frags
net: add support for skbs with unreadable frags
tcp: RX path for devmem TCP
net: add SO_DEVMEM_DONTNEED setsockopt to release RX frags
net: add devmem TCP documentation
selftests: add ncdevmem, netcat for devmem TCP
Documentation/netlink/specs/netdev.yaml | 57 +++
Documentation/networking/devmem.rst | 256 +++++++++++
Documentation/networking/index.rst | 1 +
arch/alpha/include/uapi/asm/socket.h | 6 +
arch/mips/include/uapi/asm/socket.h | 6 +
arch/parisc/include/uapi/asm/socket.h | 6 +
arch/sparc/include/uapi/asm/socket.h | 6 +
include/linux/netdevice.h | 3 +
include/linux/skbuff.h | 73 +++-
include/linux/socket.h | 1 +
include/net/devmem.h | 124 ++++++
include/net/netdev_queues.h | 27 ++
include/net/netdev_rx_queue.h | 2 +
include/net/netmem.h | 234 +++++++++-
include/net/page_pool/helpers.h | 155 +++++--
include/net/page_pool/types.h | 33 +-
include/net/sock.h | 2 +
include/net/tcp.h | 5 +-
include/trace/events/page_pool.h | 29 +-
include/uapi/asm-generic/socket.h | 6 +
include/uapi/linux/netdev.h | 19 +
include/uapi/linux/uio.h | 17 +
net/bpf/test_run.c | 5 +-
net/core/Makefile | 2 +-
net/core/datagram.c | 6 +
net/core/dev.c | 6 +-
net/core/devmem.c | 425 ++++++++++++++++++
net/core/gro.c | 8 +-
net/core/netdev-genl-gen.c | 23 +
net/core/netdev-genl-gen.h | 6 +
net/core/netdev-genl.c | 107 +++++
net/core/page_pool.c | 364 +++++++++-------
net/core/skbuff.c | 110 ++++-
net/core/sock.c | 61 +++
net/ipv4/esp4.c | 2 +-
net/ipv4/tcp.c | 254 ++++++++++-
net/ipv4/tcp_input.c | 13 +-
net/ipv4/tcp_ipv4.c | 9 +
net/ipv4/tcp_minisocks.c | 2 +
net/ipv4/tcp_output.c | 5 +-
net/ipv6/esp6.c | 2 +-
net/packet/af_packet.c | 4 +-
tools/include/uapi/linux/netdev.h | 19 +
tools/testing/selftests/net/.gitignore | 1 +
tools/testing/selftests/net/Makefile | 5 +
tools/testing/selftests/net/ncdevmem.c | 546 ++++++++++++++++++++++++
46 files changed, 2776 insertions(+), 277 deletions(-)
create mode 100644 Documentation/networking/devmem.rst
create mode 100644 include/net/devmem.h
create mode 100644 net/core/devmem.c
create mode 100644 tools/testing/selftests/net/ncdevmem.c
--
2.44.0.396.g6e790dbe36-goog
Cleaning up after tests is implemented separately for individual tests
and called at the end of each test execution. Since these functions are
very similar and a more generalized test framework was introduced a
function pointer in the resctrl_test struct can be used to reduce the
amount of function calls.
These functions are also all called in the ctrl-c handler because the
handler isn't aware which test is currently running. Since the handler
is implemented with a sigaction no function parameters can be passed
there but information about what test is currently running can be passed
with a global variable.
Series applies cleanly on top of kselftests/next.
Changelog v5:
- Rebase onto kselftests/next.
- Add Reinette's reviewed-by tag.
Changelog v4:
- Check current_test pointer and reset it in signal unregistering.
- Move cleanup call to test_cleanup function.
Changelog v3:
- Make current_test static.
- Add callback NULL check to the ctrl-c handler.
Changelog v2:
- Make current_test a const pointer limited in scope to resctrl_val
file.
- Remove tests_cleanup from resctrl.h.
- Cleanup 'goto out' path and labels in individual test functions.
Older versions of this series:
[v1] https://lore.kernel.org/all/cover.1708434017.git.maciej.wieczor-retman@inte…
[v2] https://lore.kernel.org/all/cover.1708596015.git.maciej.wieczor-retman@inte…
[v3] https://lore.kernel.org/all/cover.1708599491.git.maciej.wieczor-retman@inte…
[v4] https://lore.kernel.org/all/cover.1708949785.git.maciej.wieczor-retman@inte…
Maciej Wieczor-Retman (3):
selftests/resctrl: Add cleanup function to test framework
selftests/resctrl: Simplify cleanup in ctrl-c handler
selftests/resctrl: Move cleanups out of individual tests
tools/testing/selftests/resctrl/cat_test.c | 8 +++-----
tools/testing/selftests/resctrl/cmt_test.c | 4 ++--
tools/testing/selftests/resctrl/mba_test.c | 8 +++-----
tools/testing/selftests/resctrl/mbm_test.c | 8 +++-----
tools/testing/selftests/resctrl/resctrl.h | 9 +++------
.../testing/selftests/resctrl/resctrl_tests.c | 20 +++++++------------
tools/testing/selftests/resctrl/resctrl_val.c | 8 ++++++--
7 files changed, 27 insertions(+), 38 deletions(-)
--
2.43.2
This is required, as CONFIG_DAMON_DEBUGFS is enabled, and --alltests UML
builds will fail due to the missing config option otherwise.
Fixes: f4cba4bf6777 ("mm/damon: rename CONFIG_DAMON_DBGFS to DAMON_DBGFS_DEPRECATED")
Signed-off-by: David Gow <davidgow(a)google.com>
---
This is breaking all UML alltests builds, so we'd like to fix it sooner
rather than later. SeongJae, would you rather take this yourself, or can
we push it alongside any other KUnit fixes?
Johannes: Does this conflict with the CONFIG_NETDEVICES / CONFIG_WLAN
fixes to all_tests.config? I'd assume not, but I'm happy to take them
via KUnit if you'd prefer anyway.
Thanks,
-- David
---
tools/testing/kunit/configs/all_tests.config | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/kunit/configs/all_tests.config b/tools/testing/kunit/configs/all_tests.config
index aa5ec149f96c..f388742cf266 100644
--- a/tools/testing/kunit/configs/all_tests.config
+++ b/tools/testing/kunit/configs/all_tests.config
@@ -38,6 +38,7 @@ CONFIG_DAMON_VADDR=y
CONFIG_DAMON_PADDR=y
CONFIG_DEBUG_FS=y
CONFIG_DAMON_DBGFS=y
+CONFIG_DAMON_DBGFS_DEPRECATED=y
CONFIG_REGMAP_BUILD=y
--
2.44.0.396.g6e790dbe36-goog
This series fixes a bug in the complete phase of UDP in GRO, in which
socket lookup fails due to using network_header when parsing encapsulated
packets. The fix is to pass p_off parameter in *_gro_complete.
Next, the fields network_offset and inner_network_offset are added to
napi_gro_cb, and are both set during the receive phase of GRO. This is then
leveraged in the next commit to remove flush_id state from napi_gro_cb, and
stateful code in {ipv6,inet}_gro_receive which may be unnecessarily
complicated due to encapsulation support in GRO.
In addition, udpgro_fwd selftest is adjusted to include the socket lookup
case for vxlan. This selftest will test its supposed functionality once
local bind support is merged (https://lore.kernel.org/netdev/df300a49-7811-4126-a56a-a77100c8841b@gmail.c…).
v3 -> v4:
- Fix code comment and commit message typos
- v3:
https://lore.kernel.org/all/f939c84a-2322-4393-a5b0-9b1e0be8ed8e@gmail.com/
v2 -> v3:
- Use napi_gro_cb instead of skb->{offset}
- v2:
https://lore.kernel.org/netdev/2ce1600b-e733-448b-91ac-9d0ae2b866a4@gmail.c…
v1 -> v2:
- Pass p_off in *_gro_complete to fix UDP bug
- Remove more conditionals and memory fetches from inet_gro_flush
- v1:
https://lore.kernel.org/netdev/e1d22505-c5f8-4c02-a997-64248480338b@gmail.c…
Richard Gobert (4):
net: gro: add p_off param in *_gro_complete
selftests/net: add local address bind in vxlan selftest
net: gro: add {inner_}network_offset to napi_gro_cb
net: gro: move L3 flush checks to tcp_gro_receive
drivers/net/geneve.c | 7 +-
drivers/net/vxlan/vxlan_core.c | 11 ++--
include/linux/etherdevice.h | 2 +-
include/linux/netdevice.h | 3 +-
include/linux/udp.h | 2 +-
include/net/gro.h | 36 +++++++----
include/net/inet_common.h | 2 +-
include/net/tcp.h | 6 +-
include/net/udp.h | 8 +--
include/net/udp_tunnel.h | 2 +-
net/8021q/vlan_core.c | 6 +-
net/core/gro.c | 6 +-
net/ethernet/eth.c | 5 +-
net/ipv4/af_inet.c | 49 ++------------
net/ipv4/fou_core.c | 9 +--
net/ipv4/gre_offload.c | 6 +-
net/ipv4/tcp_offload.c | 79 ++++++++++++++++++-----
net/ipv4/udp.c | 3 +-
net/ipv4/udp_offload.c | 26 ++++----
net/ipv6/ip6_offload.c | 41 +++++-------
net/ipv6/tcpv6_offload.c | 7 +-
net/ipv6/udp.c | 3 +-
net/ipv6/udp_offload.c | 13 ++--
tools/testing/selftests/net/udpgro_fwd.sh | 10 ++-
24 files changed, 187 insertions(+), 155 deletions(-)
--
2.36.1
The longest running netdevsim test, nexthop.sh, currently takes
5 min to finish. Around 260s to be exact, and 310s on a debug kernel.
The default timeout in selftest is 45sec, so we need an explicit
config. Give ourselves some headroom and use 10min.
Commit under Fixes isn't really to "blame" but prior to that
netdevsim tests weren't integrated with kselftest infra
so blaming the tests themselves doesn't seem right, either.
Fixes: 8ff25dac88f6 ("netdevsim: add Makefile for selftests")
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
---
CC: shuah(a)kernel.org
CC: dw(a)davidwei.uk
CC: linux-kselftest(a)vger.kernel.org
---
tools/testing/selftests/drivers/net/netdevsim/settings | 1 +
1 file changed, 1 insertion(+)
create mode 100644 tools/testing/selftests/drivers/net/netdevsim/settings
diff --git a/tools/testing/selftests/drivers/net/netdevsim/settings b/tools/testing/selftests/drivers/net/netdevsim/settings
new file mode 100644
index 000000000000..a62d2fa1275c
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/netdevsim/settings
@@ -0,0 +1 @@
+timeout=600
--
2.44.0
This cleans up the output of the tty_tstamp_update selftest to play a
bit more nicely with automated systems parsing the test output.
To do this I've also added a new helper ksft_test_result() which takes a
KSFT_ code as a report, this is something I've wanted on other occasions
but restructured things to avoid needing it. This time I figured I'd
just add it since it keeps coming up.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Mark Brown (2):
kselftest: Add mechanism for reporting a KSFT_ result code
kselftest/tty: Report a consistent test name for the one test we run
tools/testing/selftests/kselftest.h | 22 ++++++++++++
tools/testing/selftests/tty/tty_tstamp_update.c | 48 +++++++++++++++++--------
2 files changed, 55 insertions(+), 15 deletions(-)
---
base-commit: 54be6c6c5ae8e0d93a6c4641cb7528eb0b6ba478
change-id: 20240305-kselftest-tty-tname-5411444ce037
Best regards,
--
Mark Brown <broonie(a)kernel.org>
The seccomp benchmark runs five scenarios, one calibration run with no
seccomp filters enabled then four further runs each adding a filter. The
calibration run times itself for 15s and then each additional run executes
for the same number of times.
Currently the seccomp tests, including the benchmark, run with an extended
120s timeout but this is not sufficient to robustly run the tests on a lot
of platforms. Sample timings from some recent runs:
Platform Run 1 Run 2 Run 3 Run 4
--------- ----- ----- ----- -----
PowerEdge R200 16.6s 16.6s 31.6s 37.4s
BBB (arm) 20.4s 20.4s 54.5s
Synquacer (arm64) 20.7s 23.7s 40.3s
The x86 runs from the PowerEdge are quite marginal and routinely fail, for
the successful run reported here the timed portions of the run are at
117.2s leaving less than 3s of margin which is frequently breached. The
added overhead of adding filters on the other platforms is such that there
is no prospect of their runs fitting into the 120s timeout, especially
on 32 bit arm where there is no BPF JIT.
While we could lower the time we calibrate for I'm also already seeing the
currently completing runs reporting issues with the per filter overheads
not matching expectations:
Let's instead raise the timeout to 180s which is only a 50% increase on the
current timeout which is itself not *too* large given that there's only two
tests in this suite.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v2:
- Rebase onto v6.9-rc1.
- Link to v1: https://lore.kernel.org/r/20231219-b4-kselftest-seccomp-benchmark-timeout-v…
---
tools/testing/selftests/seccomp/settings | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/seccomp/settings b/tools/testing/selftests/seccomp/settings
index 6091b45d226b..a953c96aa16e 100644
--- a/tools/testing/selftests/seccomp/settings
+++ b/tools/testing/selftests/seccomp/settings
@@ -1 +1 @@
-timeout=120
+timeout=180
---
base-commit: 4cece764965020c22cff7665b18a012006359095
change-id: 20231219-b4-kselftest-seccomp-benchmark-timeout-05b66e7d29d1
Best regards,
--
Mark Brown <broonie(a)kernel.org>
The event filter function test has been failing in our internal test
farm:
| # not ok 33 event filter function - test event filtering on functions
Running the test in verbose mode indicates that this is because the test
erroneously determines that kmem_cache_free() is the most common caller
of kmem_cache_free():
# # + cut -d: -f3 trace
# # + sed s/call_site=([^+]*)+0x.*/1/
# # + sort
# # + uniq -c
# # + sort
# # + tail -n 1
# # + sed s/^[ 0-9]*//
# # + target_func=kmem_cache_free
... and as kmem_cache_free() doesn't call itself, setting this as the
filter function for kmem_cache_free() results in no hits, and
consequently the test fails:
# # + grep kmem_cache_free trace
# # + grep kmem_cache_free
# # + wc -l
# # + hitcnt=0
# # + grep kmem_cache_free trace
# # + grep -v kmem_cache_free
# # + wc -l
# # + misscnt=0
# # + [ 0 -eq 0 ]
# # + exit_fail
This seems to be because the system in question has tasks with ':' in
their name (which a number of kernel worker threads have). These show up
in the trace, e.g.
test:.sh-1299 [004] ..... 2886.040608: kmem_cache_free: call_site=putname+0xa4/0xc8 ptr=000000000f4d22f4 name=names_cache
... and so when we try to extact the call_site with:
cut -d: -f3 trace | sed 's/call_site=\([^+]*\)+0x.*/\1/'
... the 'cut' command will extrace the column containing
'kmem_cache_free' rather than the column containing 'call_site=...', and
the 'sed' command will leave this unchanged. Consequently, the test will
decide to use 'kmem_cache_free' as the filter function, resulting in the
failure seen above.
Fix this by matching the 'call_site=<func>' part specifically to extract
the function name.
Signed-off-by: Mark Rutland <mark.rutland(a)arm.com>
Reported-by: Aishwarya TCV <aishwarya.tcv(a)arm.com>
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Steven Rostedt <rostedt(a)goodmis.org>
Cc: linux-kernel(a)vger.kernel.org
Cc: linux-kselftest(a)vger.kernel.org
Cc: linux-trace-kernel(a)vger.kernel.org
---
.../selftests/ftrace/test.d/filter/event-filter-function.tc | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc b/tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc
index 2de7c61d1ae3..3f74c09c56b6 100644
--- a/tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc
+++ b/tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc
@@ -24,7 +24,7 @@ echo 0 > events/enable
echo "Get the most frequently calling function"
sample_events
-target_func=`cut -d: -f3 trace | sed 's/call_site=\([^+]*\)+0x.*/\1/' | sort | uniq -c | sort | tail -n 1 | sed 's/^[ 0-9]*//'`
+target_func=`cat trace | grep -o 'call_site=\([^+]*\)' | sed 's/call_site=//' | sort | uniq -c | sort | tail -n 1 | sed 's/^[ 0-9]*//'`
if [ -z "$target_func" ]; then
exit_fail
fi
--
2.39.2
As discussed on the bi-weekly call on Jan 30, and in mailing around
kernel CI effort, some changes are desirable in the suite of forwarding
selftests the better to work with the CI tooling. Namely:
- The forwarding selftests use a configuration file where names of
interfaces are defined and various variables can be overridden. There
is also forwarding.config.sample that users can use as a template to
refer to when creating the config file. What happens a fair bit is
that users either do not know about this at all, or simply forget, and
are confused by cryptic failures about interfaces that cannot be
created.
In patches #1 - #3 have lib.sh just be the single source of truth with
regards to which variables exist. That includes the topology variables
which were previously only in the sample file, and any "tweak
variables", such as what tools to use, sleep times, etc.
forwarding.config.sample then becomes just a placeholder with a couple
examples. Unless specific HW should be exercised, or specific tools
used, the defaults are usually just fine.
- Several net/forwarding/ selftests (and one net/ one) cannot be run on
veth pairs, they need an actual HW interface to run on. They are
generic in the sense that any capable HW should pass them, which is
why they have been put to net/forwarding/ as opposed to drivers/net/,
but they do not generalize to veth. The fact that these tests are in
net/forwarding/, but still complaining when run, is confusing.
In patches #4 - #6 move these tests to a new directory
drivers/net/hw.
- The following patches extend the codebase to handle well test results
other than pass and fail.
Patch #7 is preparatory. It converts several log_test_skip to XFAIL,
so that tests do not spuriously end up returning non-0 when they
are not supposed to.
In patches #8 - #10, introduce some missing ksft constants, then support
having those constants in RET, and then finally in EXIT_STATUS.
- The traffic scheduler tests generate a large amount of network traffic
to test the behavior of the scheduler. This demands a relatively
high-performance computer. On slow machines, such as with a debugging
kernel, the test would spuriously fail.
It can still be useful to "go through the motions" though, to possibly
catch bugs in setup of the scheduler graph and passing packets around.
Thus we still want to run the tests, just with lowered demands.
To that end, in patches #11 - #12, introduce an environment variable
KSFT_MACHINE_SLOW, with obvious meaning. Tests can then make checks
more lenient, such as mark failures as XFAIL. A helper, xfail_on_slow,
is provided to mark performance-sensitive parts of the selftest.
- In patch #13, use a similar mechanism to mark a NH group stats
selftest to XFAIL HW stats tests when run on VETH pairs.
- All these changes complicate the hitherto straightforward logging and
checking logic, so in patch #14, add a selftest that checks this
functionality in lib.sh.
Petr Machata (14):
selftests: net: libs: Change variable fallback syntax
selftests: forwarding.config.sample: Move overrides to lib.sh
selftests: forwarding: README: Document customization
selftests: forwarding: ipip_lib: Do not import lib.sh
selftests: forwarding: Move several selftests
selftests: forwarding: Ditch skip_on_veth()
selftests: forwarding: Change inappropriate log_test_skip() calls
selftests: lib: Define more kselftest exit codes
selftests: forwarding: Have RET track kselftest framework constants
selftests: forwarding: Convert log_test() to recognize RET values
selftests: forwarding: Support for performance sensitive tests
selftests: forwarding: Mark performance-sensitive tests
selftests: forwarding: router_mpath_nh_lib: Don't skip, xfail on veth
selftests: forwarding: Add a test for testing lib.sh functionality
.../testing/selftests/drivers/net/hw/Makefile | 25 ++
.../net/hw}/devlink_port_split.py | 0
.../forwarding => drivers/net/hw}/ethtool.sh | 5 +-
.../net/hw}/ethtool_extended_state.sh | 5 +-
.../net/hw}/ethtool_lib.sh | 0
.../net/hw}/ethtool_mm.sh | 3 +-
.../net/hw}/ethtool_rmon.sh | 7 +-
.../net/hw}/hw_stats_l3.sh | 19 +-
.../net/hw}/hw_stats_l3_gre.sh | 7 +-
.../forwarding => drivers/net/hw}/loopback.sh | 5 +-
.../testing/selftests/drivers/net/hw/settings | 1 +
.../selftests/drivers/net/mlxsw/mlxsw_lib.sh | 2 +-
.../net/mlxsw/spectrum-2/resource_scale.sh | 1 -
.../net/mlxsw/spectrum/resource_scale.sh | 1 -
tools/testing/selftests/net/Makefile | 1 -
.../testing/selftests/net/forwarding/Makefile | 9 +-
tools/testing/selftests/net/forwarding/README | 33 +++
.../net/forwarding/forwarding.config.sample | 53 ++--
.../selftests/net/forwarding/ipip_lib.sh | 1 -
tools/testing/selftests/net/forwarding/lib.sh | 255 +++++++++++++-----
.../selftests/net/forwarding/lib_sh_test.sh | 208 ++++++++++++++
.../net/forwarding/router_mpath_nh_lib.sh | 12 +-
.../selftests/net/forwarding/sch_ets_tests.sh | 19 +-
.../selftests/net/forwarding/sch_red.sh | 10 +-
.../selftests/net/forwarding/sch_tbf_core.sh | 2 +-
.../selftests/net/forwarding/tc_common.sh | 2 +-
.../selftests/net/forwarding/tc_tunnel_key.sh | 2 -
tools/testing/selftests/net/lib.sh | 48 +++-
28 files changed, 565 insertions(+), 171 deletions(-)
create mode 100644 tools/testing/selftests/drivers/net/hw/Makefile
rename tools/testing/selftests/{net => drivers/net/hw}/devlink_port_split.py (100%)
rename tools/testing/selftests/{net/forwarding => drivers/net/hw}/ethtool.sh (98%)
rename tools/testing/selftests/{net/forwarding => drivers/net/hw}/ethtool_extended_state.sh (96%)
rename tools/testing/selftests/{net/forwarding => drivers/net/hw}/ethtool_lib.sh (100%)
rename tools/testing/selftests/{net/forwarding => drivers/net/hw}/ethtool_mm.sh (99%)
rename tools/testing/selftests/{net/forwarding => drivers/net/hw}/ethtool_rmon.sh (92%)
rename tools/testing/selftests/{net/forwarding => drivers/net/hw}/hw_stats_l3.sh (96%)
rename tools/testing/selftests/{net/forwarding => drivers/net/hw}/hw_stats_l3_gre.sh (92%)
rename tools/testing/selftests/{net/forwarding => drivers/net/hw}/loopback.sh (92%)
create mode 100644 tools/testing/selftests/drivers/net/hw/settings
create mode 100755 tools/testing/selftests/net/forwarding/lib_sh_test.sh
--
2.43.0
The test is inspired by the pmu_event_filter_test which implemented by x86. On
the arm64 platform, there is the same ability to set the pmu_event_filter
through the KVM_ARM_VCPU_PMU_V3_FILTER attribute. So add the test for arm64.
The series first create the helper function which can be used
for the vpmu related tests. Then, it implement the test.
Changelog:
----------
v5->v6:
- Rebased to v6.9-rc1.
- Collect RB.
- Add multiple filter test.
v4->v5:
- Rebased to v6.8-rc6.
- Refactor the helper function, make it fine-grained and easy to be used.
- Namimg improvements.
- Use the kvm_device_attr_set() helper.
- Make the test descriptor array readable and clean.
- Delete the patch which moves the pmu related helper to vpmu.h.
- Remove the kvm_supports_pmu_event_filter() function since nobody will run
this on a old kernel.
v3->v4:
- Rebased to the v6.8-rc2.
v2->v3:
- Check the pmceid in guest code instead of pmu event count since different
hardware may have different event count result, check pmceid makes it stable
on different platform. [Eric]
- Some typo fixed and commit message improved.
v1->v2:
- Improve the commit message. [Eric]
- Fix the bug in [enable|disable]_counter. [Raghavendra & Marc]
- Add the check if kvm has attr KVM_ARM_VCPU_PMU_V3_FILTER.
- Add if host pmu support the test event throught pmceid0.
- Split the test_invalid_filter() to another patch. [Eric]
v1: https://lore.kernel.org/all/20231123063750.2176250-1-shahuang@redhat.com/
v2: https://lore.kernel.org/all/20231129072712.2667337-1-shahuang@redhat.com/
v3: https://lore.kernel.org/all/20240116060129.55473-1-shahuang@redhat.com/
v4: https://lore.kernel.org/all/20240202025659.5065-1-shahuang@redhat.com/
v5: https://lore.kernel.org/all/20240229065625.114207-1-shahuang@redhat.com/
Shaoqin Huang (3):
KVM: selftests: aarch64: Add helper function for the vpmu vcpu
creation
KVM: selftests: aarch64: Introduce pmu_event_filter_test
KVM: selftests: aarch64: Add invalid filter test in
pmu_event_filter_test
tools/testing/selftests/kvm/Makefile | 1 +
.../kvm/aarch64/pmu_event_filter_test.c | 336 ++++++++++++++++++
.../kvm/aarch64/vpmu_counter_access.c | 33 +-
.../selftests/kvm/include/aarch64/vpmu.h | 28 ++
4 files changed, 372 insertions(+), 26 deletions(-)
create mode 100644 tools/testing/selftests/kvm/aarch64/pmu_event_filter_test.c
create mode 100644 tools/testing/selftests/kvm/include/aarch64/vpmu.h
--
2.40.1
The test is inspired by the pmu_event_filter_test which implemented by x86. On
the arm64 platform, there is the same ability to set the pmu_event_filter
through the KVM_ARM_VCPU_PMU_V3_FILTER attribute. So add the test for arm64.
The series first create the helper function which can be used
for the vpmu related tests. Then, it implement the test.
Changelog:
----------
v4->v5:
- Rebased to v6.8-rc6.
- Refactor the helper function, make it fine-grained and easy to be used.
- Namimg improvements.
- Use the kvm_device_attr_set() helper.
- Make the test descriptor array readable and clean.
- Delete the patch which moves the pmu related helper to vpmu.h.
- Remove the kvm_supports_pmu_event_filter() function since nobody will run
this on a old kernel.
v3->v4:
- Rebased to the v6.8-rc2.
v2->v3:
- Check the pmceid in guest code instead of pmu event count since different
hardware may have different event count result, check pmceid makes it stable
on different platform. [Eric]
- Some typo fixed and commit message improved.
v1->v2:
- Improve the commit message. [Eric]
- Fix the bug in [enable|disable]_counter. [Raghavendra & Marc]
- Add the check if kvm has attr KVM_ARM_VCPU_PMU_V3_FILTER.
- Add if host pmu support the test event throught pmceid0.
- Split the test_invalid_filter() to another patch. [Eric]
v1: https://lore.kernel.org/all/20231123063750.2176250-1-shahuang@redhat.com/
v2: https://lore.kernel.org/all/20231129072712.2667337-1-shahuang@redhat.com/
v3: https://lore.kernel.org/all/20240116060129.55473-1-shahuang@redhat.com/
v4: https://lore.kernel.org/all/20240202025659.5065-1-shahuang@redhat.com/
Shaoqin Huang (3):
KVM: selftests: aarch64: Add helper function for the vpmu vcpu
creation
KVM: selftests: aarch64: Introduce pmu_event_filter_test
KVM: selftests: aarch64: Add invalid filter test in
pmu_event_filter_test
tools/testing/selftests/kvm/Makefile | 1 +
.../kvm/aarch64/pmu_event_filter_test.c | 325 ++++++++++++++++++
.../kvm/aarch64/vpmu_counter_access.c | 33 +-
.../selftests/kvm/include/aarch64/vpmu.h | 29 ++
4 files changed, 362 insertions(+), 26 deletions(-)
create mode 100644 tools/testing/selftests/kvm/aarch64/pmu_event_filter_test.c
create mode 100644 tools/testing/selftests/kvm/include/aarch64/vpmu.h
--
2.40.1
Hi all,
This series does a number of cleanups into resctrl_val() and
generalizes it by removing test name specific handling from the
function.
One of the changes improves MBA/MBM measurement by narrowing down the
period the resctrl FS derived memory bandwidth numbers are measured
over. My feel is it didn't cause noticeable difference into the numbers
because they're generally good anyway except for the small number of
outliers. To see the impact on outliers, I'd need to setup a test to
run large number of replications and do a statistical analysis, which
I've not spent my time on. Even without the statistical analysis, the
new way to measure seems obviously better and makes sense even if I
cannot see a major improvement with the setup I'm using.
This series has some conflicts with SNC series from Maciej and also
with the MBA/MBM series from Babu.
--
i.
v2:
- Resolved conflicts with kselftest/next
- Spaces -> tabs correction
Ilpo Järvinen (13):
selftests/resctrl: Convert get_mem_bw_imc() fd close to for loop
selftests/resctrl: Calculate resctrl FS derived mem bw over sleep(1)
only
selftests/resctrl: Consolidate get_domain_id() into resctrl_val()
selftests/resctrl: Use correct type for pids
selftests/resctrl: Cleanup bm_pid and ppid usage & limit scope
selftests/resctrl: Rename measure_vals() to measure_mem_bw_vals() &
document
selftests/resctrl: Add ->measure() callback to resctrl_val_param
selftests/resctrl: Add ->init() callback into resctrl_val_param
selftests/resctrl: Simplify bandwidth report type handling
selftests/resctrl: Make some strings passed to resctrlfs functions
const
selftests/resctrl: Convert ctrlgrp & mongrp to pointers
selftests/resctrl: Remove mongrp from MBA test
selftests/resctrl: Remove test name comparing from
write_bm_pid_to_resctrl()
tools/testing/selftests/resctrl/cache.c | 6 +-
tools/testing/selftests/resctrl/cat_test.c | 5 +-
tools/testing/selftests/resctrl/cmt_test.c | 21 +-
tools/testing/selftests/resctrl/mba_test.c | 34 ++-
tools/testing/selftests/resctrl/mbm_test.c | 33 ++-
tools/testing/selftests/resctrl/resctrl.h | 48 ++--
tools/testing/selftests/resctrl/resctrl_val.c | 269 ++++++------------
tools/testing/selftests/resctrl/resctrlfs.c | 55 ++--
8 files changed, 224 insertions(+), 247 deletions(-)
--
2.39.2
Currently, VA exhaustion is being checked by passing a hint to mmap() and
expecting it to fail. This patch makes a stricter test by successful
write() calls from /proc/self/maps to a dump file, confirming that a
free chunk is indeed not available.
Changes in v2:
- Replace SZ_1GB with MAP_CHUNK_SIZE, tidy-up
Signed-off-by: Dev Jain <dev.jain(a)arm.com>
---
Merge dependency: https://lore.kernel.org/all/20240314122250.68534-1-dev.jain@arm.com/
.../selftests/mm/virtual_address_range.c | 66 +++++++++++++++++++
1 file changed, 66 insertions(+)
diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
index 7bcf8d48256a..050e997e3be2 100644
--- a/tools/testing/selftests/mm/virtual_address_range.c
+++ b/tools/testing/selftests/mm/virtual_address_range.c
@@ -12,6 +12,8 @@
#include <errno.h>
#include <sys/mman.h>
#include <sys/time.h>
+#include <fcntl.h>
+
#include "../kselftest.h"
/*
@@ -93,6 +95,66 @@ static int validate_lower_address_hint(void)
return 1;
}
+static int validate_complete_va_space(void)
+{
+ unsigned long start_addr, end_addr, prev_end_addr;
+ char line[400];
+ char prot[6];
+ FILE *file;
+ int fd;
+
+ fd = open("va_dump", O_CREAT | O_WRONLY, 0600);
+ unlink("va_dump");
+ if (fd < 0) {
+ ksft_test_result_skip("cannot create or open dump file\n");
+ ksft_finished();
+ }
+
+ file = fopen("/proc/self/maps", "r");
+ if (file == NULL)
+ ksft_exit_fail_msg("cannot open /proc/self/maps\n");
+
+ prev_end_addr = 0;
+ while (fgets(line, sizeof(line), file)) {
+ unsigned long hop;
+
+ if (sscanf(line, "%lx-%lx %s[rwxp-]",
+ &start_addr, &end_addr, prot) != 3)
+ ksft_exit_fail_msg("cannot parse /proc/self/maps\n");
+
+ /* end of userspace mappings; ignore vsyscall mapping */
+ if (start_addr & (1UL << 63))
+ return 0;
+
+ /* /proc/self/maps must have gaps less than MAP_CHUNK_SIZE */
+ if (start_addr - prev_end_addr >= MAP_CHUNK_SIZE)
+ return 1;
+
+ prev_end_addr = end_addr;
+
+ if (prot[0] != 'r')
+ continue;
+
+ /*
+ * Confirm whether MAP_CHUNK_SIZE chunk can be found or not.
+ * If write succeeds, no need to check MAP_CHUNK_SIZE - 1
+ * addresses after that. If the address was not held by this
+ * process, write would fail with errno set to EFAULT.
+ * Anyways, if write returns anything apart from 1, exit the
+ * program since that would mean a bug in /proc/self/maps.
+ */
+ hop = 0;
+ while (start_addr + hop < end_addr) {
+ if (write(fd, (void *)(start_addr + hop), 1) != 1)
+ return 1;
+ lseek(fd, 0, SEEK_SET);
+
+ hop += MAP_CHUNK_SIZE;
+ }
+ }
+ return 0;
+}
+
int main(int argc, char *argv[])
{
char *ptr[NR_CHUNKS_LOW];
@@ -135,6 +197,10 @@ int main(int argc, char *argv[])
validate_addr(hptr[i], 1);
}
hchunks = i;
+ if (validate_complete_va_space()) {
+ ksft_test_result_fail("BUG in mmap() or /proc/self/maps\n");
+ ksft_finished();
+ }
for (i = 0; i < lchunks; i++)
munmap(ptr[i], MAP_CHUNK_SIZE);
--
2.34.1
Currently, VA exhaustion is being checked by passing a hint to mmap() and
expecting it to fail. This patch makes a stricter test by successful write()
calls from /proc/self/maps to a dump file, confirming that a free chunk is
indeed not available.
Signed-off-by: Dev Jain <dev.jain(a)arm.com>
---
Merge dependency: https://lore.kernel.org/all/20240314122250.68534-1-dev.jain@arm.com/
.../selftests/mm/virtual_address_range.c | 69 +++++++++++++++++++
1 file changed, 69 insertions(+)
diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
index 7bcf8d48256a..31063613dfd9 100644
--- a/tools/testing/selftests/mm/virtual_address_range.c
+++ b/tools/testing/selftests/mm/virtual_address_range.c
@@ -12,6 +12,8 @@
#include <errno.h>
#include <sys/mman.h>
#include <sys/time.h>
+#include <fcntl.h>
+
#include "../kselftest.h"
/*
@@ -93,6 +95,69 @@ static int validate_lower_address_hint(void)
return 1;
}
+static int validate_complete_va_space(void)
+{
+ unsigned long start_addr, end_addr, prev_end_addr;
+ char line[400];
+ char prot[6];
+ FILE *file;
+ int fd;
+
+ fd = open("va_dump", O_CREAT | O_WRONLY, 0600);
+ unlink("va_dump");
+ if (fd < 0) {
+ ksft_test_result_skip("cannot create or open dump file\n");
+ ksft_finished();
+ }
+
+ file = fopen("/proc/self/maps", "r");
+ if (file == NULL)
+ ksft_exit_fail_msg("cannot open /proc/self/maps\n");
+
+ prev_end_addr = 0;
+ while (fgets(line, sizeof(line), file)) {
+ unsigned long hop;
+ int ret;
+
+ ret = sscanf(line, "%lx-%lx %s[rwxp-]",
+ &start_addr, &end_addr, prot);
+ if (ret != 3)
+ ksft_exit_fail_msg("sscanf failed, cannot parse\n");
+
+ /* end of userspace mappings; ignore vsyscall mapping */
+ if (start_addr & (1UL << 63))
+ return 0;
+
+ /* /proc/self/maps must have gaps less than 1GB only */
+ if (start_addr - prev_end_addr >= SZ_1GB)
+ return 1;
+
+ prev_end_addr = end_addr;
+
+ if (prot[0] != 'r')
+ continue;
+
+ /*
+ * Confirm whether MAP_CHUNK_SIZE chunk can be found or not.
+ * If write succeeds, no need to check MAP_CHUNK_SIZE - 1
+ * addresses after that. If the address was not held by this
+ * process, write would fail with errno set to EFAULT.
+ * Anyways, if write returns anything apart from 1, exit the
+ * program since that would mean a bug in /proc/self/maps.
+ */
+ hop = 0;
+ while (start_addr + hop < end_addr) {
+ if (write(fd, (void *)(start_addr + hop), 1) != 1)
+ return 1;
+ else
+ lseek(fd, 0, SEEK_SET);
+
+ hop += MAP_CHUNK_SIZE;
+ }
+ }
+ return 0;
+}
+
int main(int argc, char *argv[])
{
char *ptr[NR_CHUNKS_LOW];
@@ -135,6 +200,10 @@ int main(int argc, char *argv[])
validate_addr(hptr[i], 1);
}
hchunks = i;
+ if (validate_complete_va_space()) {
+ ksft_test_result_fail("BUG in mmap() or /proc/self/maps\n");
+ ksft_finished();
+ }
for (i = 0; i < lchunks; i++)
munmap(ptr[i], MAP_CHUNK_SIZE);
--
2.34.1
The upcoming new Idle HLT Intercept feature allows for the HLT
instruction execution by a vCPU to be intercepted by the hypervisor
only if there are no pending V_INTR and V_NMI events for the vCPU.
When the vCPU is expected to service the pending V_INTR and V_NMI
events, the Idle HLT intercept won’t trigger. The feature allows the
hypervisor to determine if the vCPU is actually idle and reduces
wasteful VMEXITs.
Presence of the Idle HLT Intercept feature is indicated via CPUID
function Fn8000_000A_EDX[30].
Document for the Idle HLT intercept feature will be available in the
next version of "AMD64 Architecture Programmer’s Manual".
Testing Done:
Added a selftest to test the Idle HLT intercept functionality.
Tested SEV and SEV-ES guest for the Idle HLT intercept functionality.
Manali Shukla (5):
x86/cpufeatures: Add CPUID feature bit for Idle HLT intercept
KVM: SVM: Add Idle HLT intercept support
tools: Add KVM exit reason for the Idle HLT
selftests: Add an interface to read the data of named vcpu stat
selftests: KVM: SVM: Add Idle HLT intercept test
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/svm.h | 1 +
arch/x86/include/uapi/asm/svm.h | 2 +
arch/x86/kvm/svm/svm.c | 11 +-
tools/arch/x86/include/uapi/asm/svm.h | 2 +
tools/testing/selftests/kvm/Makefile | 1 +
.../selftests/kvm/include/kvm_util_base.h | 11 ++
tools/testing/selftests/kvm/lib/kvm_util.c | 41 ++++++
.../selftests/kvm/x86_64/svm_idlehlt_test.c | 119 ++++++++++++++++++
9 files changed, 186 insertions(+), 3 deletions(-)
create mode 100644 tools/testing/selftests/kvm/x86_64/svm_idlehlt_test.c
base-commit: fdd58834d132046149699b88a27a0db26829f4fb
--
2.34.1
Hi,
While running kselftest on vanilla torvalds tree kernel commit v6.8-11167-g4438a810f396,
the test suite reported a number of errors.
I was using the latest iproute2-next suite on an Ubuntu 22.04 LTS box.
# Tests passed: 558
# Tests failed: 84
not ok 90 selftests: net: test_vxlan_mdb.sh # exit=1
495:# TEST: Destination IP - match [FAIL]
496:# TEST: Destination IP - no match [FAIL]
497:# TEST: Default destination port - match [FAIL]
498:# TEST: Default destination port - no match [FAIL]
499:# TEST: Non-default destination port - match [FAIL]
500:# TEST: Non-default destination port - no match [FAIL]
501:# TEST: Default destination VNI - match [FAIL]
502:# TEST: Default destination VNI - no match [FAIL]
503:# TEST: Non-default destination VNI - match [FAIL]
504:# TEST: Non-default destination VNI - no match [FAIL]
521:# TEST: Destination IP - match [FAIL]
522:# TEST: Destination IP - no match [FAIL]
523:# TEST: Default destination port - match [FAIL]
524:# TEST: Default destination port - no match [FAIL]
525:# TEST: Non-default destination port - match [FAIL]
526:# TEST: Non-default destination port - no match [FAIL]
527:# TEST: Default destination VNI - match [FAIL]
528:# TEST: Default destination VNI - no match [FAIL]
529:# TEST: Non-default destination VNI - match [FAIL]
530:# TEST: Non-default destination VNI - no match [FAIL]
549:# TEST: Forward valid source - first VTEP [FAIL]
550:# TEST: Forward valid source - second VTEP [FAIL]
551:# TEST: Block excluded source after removal - first VTEP [FAIL]
552:# TEST: Block excluded source after removal - second VTEP [FAIL]
553:# TEST: Forward valid source after removal - first VTEP [FAIL]
554:# TEST: Forward valid source after removal - second VTEP [FAIL]
571:# TEST: Forward valid source - first VTEP [FAIL]
572:# TEST: Forward valid source - second VTEP [FAIL]
573:# TEST: Block excluded source after removal - first VTEP [FAIL]
574:# TEST: Block excluded source after removal - second VTEP [FAIL]
575:# TEST: Forward valid source after removal - first VTEP [FAIL]
576:# TEST: Forward valid source after removal - second VTEP [FAIL]
593:# TEST: Forward valid source - first VTEP [FAIL]
594:# TEST: Forward valid source - second VTEP [FAIL]
595:# TEST: Block excluded source after removal - first VTEP [FAIL]
596:# TEST: Block excluded source after removal - second VTEP [FAIL]
597:# TEST: Forward valid source after removal - first VTEP [FAIL]
598:# TEST: Forward valid source after removal - second VTEP [FAIL]
615:# TEST: Forward valid source - first VTEP [FAIL]
616:# TEST: Forward valid source - second VTEP [FAIL]
617:# TEST: Block excluded source after removal - first VTEP [FAIL]
618:# TEST: Block excluded source after removal - second VTEP [FAIL]
619:# TEST: Forward valid source after removal - first VTEP [FAIL]
620:# TEST: Forward valid source after removal - second VTEP [FAIL]
636:# TEST: Forward valid source [FAIL]
637:# TEST: Receive of valid source after removal from group [FAIL]
648:# TEST: Forward valid source [FAIL]
649:# TEST: Receive of valid source after removal from group [FAIL]
660:# TEST: Forward valid source [FAIL]
661:# TEST: Receive of valid source after removal from group [FAIL]
672:# TEST: Forward valid source [FAIL]
673:# TEST: Receive of valid source after removal from group [FAIL]
683:# TEST: Egress VNI translation - PVID configured [FAIL]
684:# TEST: Egress VNI translation - no PVID configured [FAIL]
685:# TEST: Egress VNI translation - PVID reconfigured [FAIL]
695:# TEST: Egress VNI translation - PVID configured [FAIL]
696:# TEST: Egress VNI translation - no PVID configured [FAIL]
697:# TEST: Egress VNI translation - PVID reconfigured [FAIL]
707:# TEST: Registered IPv4 multicast - first VTEP [FAIL]
709:# TEST: Unregistered IPv4 multicast - first VTEP [FAIL]
710:# TEST: Unregistered IPv4 multicast - second VTEP [FAIL]
711:# TEST: Link-local IPv4 multicast - first VTEP [FAIL]
712:# TEST: Link-local IPv4 multicast - second VTEP [FAIL]
713:# TEST: Registered IPv4 multicast with a unicast MAC - first VTEP [FAIL]
714:# TEST: Registered IPv4 multicast with a unicast MAC - second VTEP [FAIL]
715:# TEST: Registered IPv4 multicast with a broadcast MAC - first VTEP [FAIL]
716:# TEST: Registered IPv4 multicast with a broadcast MAC - second VTEP [FAIL]
734:# TEST: Registered IPv4 multicast - first VTEP [FAIL]
736:# TEST: Unregistered IPv4 multicast - first VTEP [FAIL]
737:# TEST: Unregistered IPv4 multicast - second VTEP [FAIL]
738:# TEST: Link-local IPv4 multicast - first VTEP [FAIL]
739:# TEST: Link-local IPv4 multicast - second VTEP [FAIL]
740:# TEST: Registered IPv4 multicast with a unicast MAC - first VTEP [FAIL]
741:# TEST: Registered IPv4 multicast with a unicast MAC - second VTEP [FAIL]
742:# TEST: Registered IPv4 multicast with a broadcast MAC - first VTEP [FAIL]
743:# TEST: Registered IPv4 multicast with a broadcast MAC - second VTEP [FAIL]
761:# TEST: IP multicast - first VTEP [FAIL]
763:# TEST: Broadcast - first VTEP [FAIL]
765:# TEST: IP multicast after removal - first VTEP [FAIL]
766:# TEST: IP multicast after removal - second VTEP [FAIL]
779:# TEST: IP multicast - first VTEP [FAIL]
781:# TEST: Broadcast - first VTEP [FAIL]
783:# TEST: IP multicast after removal - first VTEP [FAIL]
784:# TEST: IP multicast after removal - second VTEP [FAIL]
The problem is present at least since 6.8-rc7.
Please find attached the config and the full output of test_vxlan_mdb.sh.
Hope this helps.
Best regards,
Mirsad Todorovac
New version of the sleepable bpf_timer code, without the HID changes, as
they can now go through the HID tree independantly.
For reference, the use cases I have in mind:
---
Basically, I need to be able to defer a HID-BPF program for the
following reasons (from the aforementioned patch):
1. defer an event:
Sometimes we receive an out of proximity event, but the device can not
be trusted enough, and we need to ensure that we won't receive another
one in the following n milliseconds. So we need to wait those n
milliseconds, and eventually re-inject that event in the stack.
2. inject new events in reaction to one given event:
We might want to transform one given event into several. This is the
case for macro keys where a single key press is supposed to send
a sequence of key presses. But this could also be used to patch a
faulty behavior, if a device forgets to send a release event.
3. communicate with the device in reaction to one event:
We might want to communicate back to the device after a given event.
For example a device might send us an event saying that it came back
from sleeping state and needs to be re-initialized.
Currently we can achieve that by keeping a userspace program around,
raise a bpf event, and let that userspace program inject the events and
commands.
However, we are just keeping that program alive as a daemon for just
scheduling commands. There is no logic in it, so it doesn't really justify
an actual userspace wakeup. So a kernel workqueue seems simpler to handle.
bpf_timers are currently running in a soft IRQ context, this patch
series implements a sleppable context for them.
Cheers,
Benjamin
To: Alexei Starovoitov <ast(a)kernel.org>
To: Daniel Borkmann <daniel(a)iogearbox.net>
To: Andrii Nakryiko <andrii(a)kernel.org>
To: Martin KaFai Lau <martin.lau(a)linux.dev>
To: Eduard Zingerman <eddyz87(a)gmail.com>
To: Song Liu <song(a)kernel.org>
To: Yonghong Song <yonghong.song(a)linux.dev>
To: John Fastabend <john.fastabend(a)gmail.com>
To: KP Singh <kpsingh(a)kernel.org>
To: Stanislav Fomichev <sdf(a)google.com>
To: Hao Luo <haoluo(a)google.com>
To: Jiri Olsa <jolsa(a)kernel.org>
To: Mykola Lysenko <mykolal(a)fb.com>
To: Shuah Khan <shuah(a)kernel.org>
Cc: Benjamin Tissoires <bentiss(a)kernel.org>
Cc: <bpf(a)vger.kernel.org>
Cc: <linux-kernel(a)vger.kernel.org>
Cc: <linux-kselftest(a)vger.kernel.org>
---
Changes in v4:
- dropped the HID changes, they can go independently from bpf-core
- addressed Alexei's and Eduard's remarks
- added selftests
- Link to v3: https://lore.kernel.org/r/20240221-hid-bpf-sleepable-v3-0-1fb378ca6301@kern…
Changes in v3:
- fixed the crash from v2
- changed the API to have only BPF_F_TIMER_SLEEPABLE for
bpf_timer_start()
- split the new kfuncs/verifier patch into several sub-patches, for
easier reviews
- Link to v2: https://lore.kernel.org/r/20240214-hid-bpf-sleepable-v2-0-5756b054724d@kern…
Changes in v2:
- make use of bpf_timer (and dropped the custom HID handling)
- implemented bpf_timer_set_sleepable_cb as a kfunc
- still not implemented global subprogs
- no sleepable bpf_timer selftests yet
- Link to v1: https://lore.kernel.org/r/20240209-hid-bpf-sleepable-v1-0-4cc895b5adbd@kern…
---
Benjamin Tissoires (6):
bpf/helpers: introduce sleepable bpf_timers
bpf/verifier: add bpf_timer as a kfunc capable type
bpf/helpers: introduce bpf_timer_set_sleepable_cb() kfunc
bpf/helpers: mark the callback of bpf_timer_set_sleepable_cb() as sleepable
tools: sync include/uapi/linux/bpf.h
selftests/bpf: add sleepable timer tests
include/linux/bpf_verifier.h | 1 +
include/uapi/linux/bpf.h | 4 +
kernel/bpf/helpers.c | 132 ++++++++++++++++++++-
kernel/bpf/verifier.c | 92 +++++++++++++-
tools/include/uapi/linux/bpf.h | 4 +
tools/testing/selftests/bpf/bpf_experimental.h | 4 +
.../selftests/bpf/bpf_testmod/bpf_testmod.c | 5 +
.../selftests/bpf/bpf_testmod/bpf_testmod_kfunc.h | 1 +
tools/testing/selftests/bpf/prog_tests/timer.c | 1 +
tools/testing/selftests/bpf/progs/timer.c | 40 ++++++-
tools/testing/selftests/bpf/progs/timer_failure.c | 114 +++++++++++++++++-
11 files changed, 387 insertions(+), 11 deletions(-)
---
base-commit: 9187210eee7d87eea37b45ea93454a88681894a4
change-id: 20240205-hid-bpf-sleepable-c01260fd91c4
Best regards,
--
Benjamin Tissoires <bentiss(a)kernel.org>
PASID (Process Address Space ID) is a PCIe extension to tag the DMA
transactions out of a physical device, and most modern IOMMU hardware
have supported PASID granular address translation. So a PASID-capable
device can be attached to multiple hwpts (a.k.a. domains), each attachment
is tagged with a pasid.
This series first adds a missing iommu API to replace domain for a pasid,
then adds iommufd APIs for device drivers to attach/replace/detach pasid
to/from hwpt per userspace's request, and adds selftest to validate the
iommufd APIs.
pasid attach/replace is mandatory on Intel VT-d given the PASID table
locates in the physical address space hence must be managed by the kernel,
both for supporting vSVA and coming SIOV. But it's optional on ARM/AMD
which allow configuring the PASID/CD table either in host physical address
space or nested on top of an GPA address space. This series only add VT-d
support as the minimal requirement.
Complete code can be found in below link:
https://github.com/yiliu1765/iommufd/tree/iommufd_pasid
Change log:
v1:
- Implemnet iommu_replace_device_pasid() to fall back to the original domain
if this replacement failed (Kevin)
- Add check in do_attach() to check corressponding attach_fn per the pasid value.
rfc: https://lore.kernel.org/linux-iommu/20230926092651.17041-1-yi.l.liu@intel.c…
Regards,
Yi Liu
Kevin Tian (1):
iommufd: Support attach/replace hwpt per pasid
Lu Baolu (2):
iommu: Introduce a replace API for device pasid
iommu/vt-d: Add set_dev_pasid callback for nested domain
Yi Liu (5):
iommufd: replace attach_fn with a structure
iommufd/selftest: Add set_dev_pasid and remove_dev_pasid in mock iommu
iommufd/selftest: Add a helper to get test device
iommufd/selftest: Add test ops to test pasid attach/detach
iommufd/selftest: Add coverage for iommufd pasid attach/detach
drivers/iommu/intel/nested.c | 47 +++++
drivers/iommu/iommu-priv.h | 2 +
drivers/iommu/iommu.c | 82 ++++++--
drivers/iommu/iommufd/Makefile | 1 +
drivers/iommu/iommufd/device.c | 50 +++--
drivers/iommu/iommufd/iommufd_private.h | 23 +++
drivers/iommu/iommufd/iommufd_test.h | 24 +++
drivers/iommu/iommufd/pasid.c | 138 ++++++++++++++
drivers/iommu/iommufd/selftest.c | 176 ++++++++++++++++--
include/linux/iommufd.h | 6 +
tools/testing/selftests/iommu/iommufd.c | 172 +++++++++++++++++
.../selftests/iommu/iommufd_fail_nth.c | 28 ++-
tools/testing/selftests/iommu/iommufd_utils.h | 78 ++++++++
13 files changed, 785 insertions(+), 42 deletions(-)
create mode 100644 drivers/iommu/iommufd/pasid.c
--
2.34.1
This patch enhances the BPF helpers by adding a kfunc to retrieve the
cgroup v2 of a task, addressing a previous limitation where only
bpf_task_get_cgroup1 was available for cgroup v1. The new kfunc is
particularly useful for scenarios where obtaining the cgroup ID of a
task other than the "current" one is necessary, which the existing
bpf_get_current_cgroup_id helper cannot accommodate. A specific use
case at Netflix involved the sched_switch tracepoint, where we had to
get the cgroup IDs of both the prev and next tasks.
The bpf_task_get_cgroup kfunc acquires and returns a reference to a
task's default cgroup, ensuring thread-safe access by correctly
implementing RCU read locking and unlocking. It leverages the existing
cgroup.h helper, and cgroup_tryget to safely acquire a reference to it.
Signed-off-by: Jose Fernandez <josef(a)netflix.com>
Reviewed-by: Tycho Andersen <tycho(a)tycho.pizza>
Acked-by: Yonghong Song <yonghong.song(a)linux.dev>
Acked-by: Stanislav Fomichev <sdf(a)google.com>
---
V2 -> V3: No changes
V1 -> V2: Return a pointer to the cgroup instead of the cgroup ID
kernel/bpf/helpers.c | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index a89587859571..bbd19d5eedb6 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -2266,6 +2266,31 @@ bpf_task_get_cgroup1(struct task_struct *task, int hierarchy_id)
return NULL;
return cgrp;
}
+
+/**
+ * bpf_task_get_cgroup - Acquire a reference to the default cgroup of a task.
+ * @task: The target task
+ *
+ * This function returns the task's default cgroup, primarily
+ * designed for use with cgroup v2. In cgroup v1, the concept of default
+ * cgroup varies by subsystem, and while this function will work with
+ * cgroup v1, it's recommended to use bpf_task_get_cgroup1 instead.
+ * A cgroup returned by this kfunc which is not subsequently stored in a
+ * map, must be released by calling bpf_cgroup_release().
+ *
+ * Return: On success, the cgroup is returned. On failure, NULL is returned.
+ */
+__bpf_kfunc struct cgroup *bpf_task_get_cgroup(struct task_struct *task)
+{
+ struct cgroup *cgrp;
+
+ rcu_read_lock();
+ cgrp = task_dfl_cgroup(task);
+ if (!cgroup_tryget(cgrp))
+ cgrp = NULL;
+ rcu_read_unlock();
+ return cgrp;
+}
#endif /* CONFIG_CGROUPS */
/**
@@ -2573,6 +2598,7 @@ BTF_ID_FLAGS(func, bpf_cgroup_ancestor, KF_ACQUIRE | KF_RCU | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_cgroup_from_id, KF_ACQUIRE | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_task_under_cgroup, KF_RCU)
BTF_ID_FLAGS(func, bpf_task_get_cgroup1, KF_ACQUIRE | KF_RCU | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_task_get_cgroup, KF_ACQUIRE | KF_RCU | KF_RET_NULL)
#endif
BTF_ID_FLAGS(func, bpf_task_from_pid, KF_ACQUIRE | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_throw)
base-commit: c733239f8f530872a1f80d8c45dcafbaff368737
--
2.40.1
On riscv, mmap currently returns an address from the largest address
space that can fit entirely inside of the hint address. This makes it
such that the hint address is almost never returned. This patch raises
the mappable area up to and including the hint address. This allows mmap
to often return the hint address, which allows a performance improvement
over searching for a valid address as well as making the behavior more
similar to other architectures.
Note that a previous patch introduced stronger semantics compared to
other architectures for riscv mmap. On riscv, mmap will not use bits in
the upper bits of the virtual address depending on the hint address. On
other architectures, a random address is returned in the address space
requested. On all architectures the hint address will be returned if it
is available. This allows riscv applications to configure how many bits
in the virtual address should be left empty. This has the two benefits
of being able to request address spaces that are smaller than the
default and doesn't require the application to know the page table
layout of riscv.
Signed-off-by: Charlie Jenkins <charlie(a)rivosinc.com>
---
Changes in v2:
- Add back forgotten "mmap_end = STACK_TOP_MAX"
- Link to v1: https://lore.kernel.org/r/20240129-use_mmap_hint_address-v1-0-4c74da813ba1@…
---
Charlie Jenkins (3):
riscv: mm: Use hint address in mmap if available
selftests: riscv: Generalize mm selftests
docs: riscv: Define behavior of mmap
Documentation/arch/riscv/vm-layout.rst | 16 ++--
arch/riscv/include/asm/processor.h | 22 +++---
tools/testing/selftests/riscv/mm/mmap_bottomup.c | 20 +----
tools/testing/selftests/riscv/mm/mmap_default.c | 20 +----
tools/testing/selftests/riscv/mm/mmap_test.h | 93 +++++++++++++-----------
5 files changed, 67 insertions(+), 104 deletions(-)
---
base-commit: 556e2d17cae620d549c5474b1ece053430cd50bc
change-id: 20240119-use_mmap_hint_address-f9f4b1b6f5f1
--
- Charlie
v3:
- pull forward to v6.8
- style and small fixups recommended by jcameron
- update syscall number (will do all archs when RFC tag drops)
- update for new folio code
- added OCP link to device-tracked address hotness proposal
- kept void* over __u64 simply because it integrates cleanly with
existing migration code. If there's strong opinions, I can refactor.
This patch set is a proposal for a syscall analogous to move_pages,
that migrates pages between NUMA nodes using physical addressing.
The intent is to better enable user-land system-wide memory tiering
as CXL devices begin to provide memory resources on the PCIe bus.
For example, user-land software which is making decisions based on
data sources which expose physical address information no longer
must convert that information to virtual addressing to act upon it
(see background for info on how physical addresses are acquired).
The syscall requires CAP_SYS_ADMIN, since physical address source
information is typically protected by the same (or CAP_SYS_NICE).
This patch set broken into 3 patches:
1) refactor of existing migration code for code reuse
2) The sys_move_phys_pages system call.
3) ktest of the syscall
The sys_move_phys_pages system call validates the page may be
migrated by checking migratable-status of each vma mapping the page,
and the intersection of cpuset policies each vma's task.
Background:
Userspace job schedulers, memory managers, and tiering software
solutions depend on page migration syscalls to reallocate resources
across NUMA nodes. Currently, these calls enable movement of memory
associated with a specific PID. Moves can be requested in coarse,
process-sized strokes (as with migrate_pages), and on specific virtual
pages (via move_pages).
However, a number of profiling mechanisms provide system-wide information
that would benefit from a physical-addressing version move_pages.
There are presently at least 4 ways userland can acquire physical
address information for use with this interface, and 1 hardware offload
mechanism being proposed by opencompute.
1) /proc/pid/pagemap: can be used to do page table translations.
This is only really useful for testing, and the ktest was
written using this functionality.
2) X86: IBS (AMD) and PEBS (Intel) can be configured to return physical
and/or vitual address information.
3) zoneinfo: /proc/zoneinfo exposes the start PFN of zones
4) /sys/kernel/mm/page_idle: A way to query whether a PFN is idle.
So long as the page size is known, this can be used to identify
system-wide idle pages that could be migrated to lower tiers.
https://docs.kernel.org/admin-guide/mm/idle_page_tracking.html
5) CXL Offloaded Hotness Monitoring (Proposed): a CXL memory device
may provide hot/cold information about its memory. For example,
it may report the hottest device addresses (0-based) or a physical
address (if it has access to decoders for convert bases).
DPA can be cheaply converted to HPA by combining it with data
exposed by /sys/bus/cxl/ information (region address bases).
See: https://www.opencompute.org/documents/ocp-cms-hotness-tracking-requirements…
Information from these sources facilitates systemwide resource management,
but with the limitations of migrate_pages and move_pages applying to
individual tasks, their outputs must be converted back to virtual addresses
and re-associated with specific PIDs.
Doing this reverse-translation outside of the kernel requires considerable
space and compute, and it will have to be performed again by the existing
system calls. Much of this work can be avoided if the pages can be
migrated directly with physical memory addressing.
Gregory Price (3):
mm/migrate: refactor add_page_for_migration for code re-use
mm/migrate: Create move_phys_pages syscall
ktest: sys_move_phys_pages ktest
arch/x86/entry/syscalls/syscall_32.tbl | 1 +
arch/x86/entry/syscalls/syscall_64.tbl | 1 +
include/linux/syscalls.h | 5 +
include/uapi/asm-generic/unistd.h | 8 +-
kernel/sys_ni.c | 1 +
mm/migrate.c | 288 ++++++++++++++++++++----
tools/include/uapi/asm-generic/unistd.h | 8 +-
tools/testing/selftests/mm/migration.c | 99 ++++++++
8 files changed, 370 insertions(+), 41 deletions(-)
--
2.39.1
In some systems, the netcat server can incur in delay to start listening.
When this happens, the test can randomly fail in various points.
This is an example error message:
# ip gre none gso
# encap 192.168.1.1 to 192.168.1.2, type gre, mac none len 2000
# test basic connectivity
# Ncat: Connection refused.
The issue stems from a race condition between the netcat client and server.
The test author had addressed this problem by implementing a sleep, which
I have removed in this patch.
This patch introduces a function capable of sleeping for up to two seconds.
However, it can terminate the waiting period early if the port is reported
to be listening.
Signed-off-by: Alessandro Carminati (Red Hat) <alessandro.carminati(a)gmail.com>
---
tools/testing/selftests/bpf/test_tc_tunnel.sh | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/test_tc_tunnel.sh b/tools/testing/selftests/bpf/test_tc_tunnel.sh
index 910044f08908..7989ec608454 100755
--- a/tools/testing/selftests/bpf/test_tc_tunnel.sh
+++ b/tools/testing/selftests/bpf/test_tc_tunnel.sh
@@ -72,7 +72,6 @@ cleanup() {
server_listen() {
ip netns exec "${ns2}" nc "${netcat_opt}" -l "${port}" > "${outfile}" &
server_pid=$!
- sleep 0.2
}
client_connect() {
@@ -93,6 +92,16 @@ verify_data() {
fi
}
+wait_for_port() {
+ for i in $(seq 20); do
+ if ip netns exec "${ns2}" ss ${2:--4}OHntl | grep -q "$1"; then
+ return 0
+ fi
+ sleep 0.1
+ done
+ return 1
+}
+
set -e
# no arguments: automated test, run all
@@ -193,6 +202,7 @@ setup
# basic communication works
echo "test basic connectivity"
server_listen
+wait_for_port ${port} ${netcat_opt}
client_connect
verify_data
@@ -204,6 +214,7 @@ ip netns exec "${ns1}" tc filter add dev veth1 egress \
section "encap_${tuntype}_${mac}"
echo "test bpf encap without decap (expect failure)"
server_listen
+wait_for_port ${port} ${netcat_opt}
! client_connect
if [[ "$tuntype" =~ "udp" ]]; then
--
2.34.1
Sub-Numa Clustering (SNC) allows splitting CPU cores, caches and memory
into multiple NUMA nodes. When enabled, NUMA-aware applications can
achieve better performance on bigger server platforms.
The series adding SNC support to the kernel is currently in review [1]
but the selftests for resctrl need NUMA-aware adjustments to use these
changes. Issues with resctrl selftests not working properly with SNC
enabled were originally reported by Shaopeng Tan [2][3] and the
following series resolves them.
The main concept currently missing from resctrl selftests is that while
resctrl tracks memory accesses on a single NUMA node (which normally is
the same as the CPU socket) on machines with SNC enabled memory accesses
can leak outside of the local NUMA node and into other NUMA nodes on the
same socket. In that case resctrl could report a diminished value in one
of its monitoring technologies: Cache Monitoring Technology (CMT) or
Memory Bandwidth Monitoring (MBM) .
Implemented solutions for both CMT and MBM follow the same idea which is
to simply sum values reported by different NUMA nodes for a single
Resource Monitoring ID (RMID).
Series was tested on Ice Lake server platforms with SNC disabled, SNC-2
and SNC-4. The tests were also ran with and without kernel support for
SNC.
Series applies cleanly on kselftest/next.
[1] https://lore.kernel.org/all/20240228112215.8044-tony.luck@intel.com/
[2] https://lore.kernel.org/all/TYAPR01MB6330B9B17686EF426D2C3F308B25A@TYAPR01M…
[3] https://lore.kernel.org/lkml/TYAPR01MB6330A4EB3633B791939EA45E8B39A@TYAPR01…
Maciej Wieczor-Retman (4):
selftests/resctrl: Adjust effective L3 cache size with SNC enabled
selftests/resctrl: SNC support for CMT
selftests/resctrl: SNC support for MBM
selftests/resctrl: Adjust SNC support messages
tools/testing/selftests/resctrl/cache.c | 17 ++-
tools/testing/selftests/resctrl/cat_test.c | 2 +-
tools/testing/selftests/resctrl/cmt_test.c | 6 +-
tools/testing/selftests/resctrl/mba_test.c | 3 +-
tools/testing/selftests/resctrl/mbm_test.c | 4 +-
tools/testing/selftests/resctrl/resctrl.h | 13 +-
tools/testing/selftests/resctrl/resctrl_val.c | 46 ++++---
tools/testing/selftests/resctrl/resctrlfs.c | 128 +++++++++++++++++-
8 files changed, 185 insertions(+), 34 deletions(-)
--
2.44.0
Hi,
With the commit v6.8-11167-g4438a810f396 in vanilla torvalds tree, there seem to be problems with
the icmp_redirect.sh tests.
The iproute2-next tools were used, commit 7a6d30c95da9.
# timeout set to 3600
# selftests: net: icmp_redirect.sh
#
# ###########################################################################
# Legacy routing
# ###########################################################################
#
# TEST: IPv4: redirect exception [FAIL]
# TEST: IPv6: redirect exception [ OK ]
# TEST: IPv4: redirect exception plus mtu [FAIL]
# TEST: IPv6: redirect exception plus mtu [ OK ]
# TEST: IPv4: routing reset [ OK ]
# TEST: IPv6: routing reset [ OK ]
# TEST: IPv4: mtu exception [ OK ]
# TEST: IPv6: mtu exception [ OK ]
# TEST: IPv4: mtu exception plus redirect [FAIL]
# TEST: IPv6: mtu exception plus redirect [ OK ]
#
# ###########################################################################
# Legacy routing with VRF
# ###########################################################################
#
# TEST: IPv4: redirect exception [FAIL]
# TEST: IPv6: redirect exception [ OK ]
# TEST: IPv4: redirect exception plus mtu [FAIL]
# TEST: IPv6: redirect exception plus mtu [ OK ]
# TEST: IPv4: routing reset [ OK ]
# TEST: IPv6: routing reset [ OK ]
# TEST: IPv4: mtu exception [ OK ]
# TEST: IPv6: mtu exception [ OK ]
# TEST: IPv4: mtu exception plus redirect [FAIL]
# TEST: IPv6: mtu exception plus redirect [ OK ]
#
# ###########################################################################
# Routing with nexthop objects
# ###########################################################################
#
# TEST: IPv4: redirect exception [FAIL]
# TEST: IPv6: redirect exception [ OK ]
# TEST: IPv4: redirect exception plus mtu [FAIL]
# TEST: IPv6: redirect exception plus mtu [ OK ]
# TEST: IPv4: routing reset [ OK ]
# TEST: IPv6: routing reset [ OK ]
# TEST: IPv4: mtu exception [ OK ]
# TEST: IPv6: mtu exception [ OK ]
# TEST: IPv4: mtu exception plus redirect [FAIL]
# TEST: IPv6: mtu exception plus redirect [ OK ]
#
# ###########################################################################
# Routing with nexthop objects and VRF
# ###########################################################################
#
# TEST: IPv4: redirect exception [FAIL]
# TEST: IPv6: redirect exception [ OK ]
# TEST: IPv4: redirect exception plus mtu [FAIL]
# TEST: IPv6: redirect exception plus mtu [ OK ]
# TEST: IPv4: routing reset [ OK ]
# TEST: IPv6: routing reset [ OK ]
# TEST: IPv4: mtu exception [ OK ]
# TEST: IPv6: mtu exception [ OK ]
# TEST: IPv4: mtu exception plus redirect [FAIL]
# TEST: IPv6: mtu exception plus redirect [ OK ]
#
# Tests passed: 28
# Tests failed: 12
# Tests xfailed: 0
not ok 45 selftests: net: icmp_redirect.sh # exit=1
These errors are not introduced with this commit, but were already present at least in 6.8-rc7.
Hope this helps.
Best regards,
Mirsad Todorovac
This patch enhances the BPF helpers by adding a kfunc to retrieve the
cgroup v2 of a task, addressing a previous limitation where only
bpf_task_get_cgroup1 was available for cgroup v1. The new kfunc is
particularly useful for scenarios where obtaining the cgroup ID of a
task other than the "current" one is necessary, which the existing
bpf_get_current_cgroup_id helper cannot accommodate. A specific use
case at Netflix involved the sched_switch tracepoint, where we had to
get the cgroup IDs of both the prev and next tasks.
The bpf_task_get_cgroup kfunc acquires and returns a reference to a
task's default cgroup, ensuring thread-safe access by correctly
implementing RCU read locking and unlocking. It leverages the existing
cgroup.h helper, and cgroup_tryget to safely acquire a reference to it.
Signed-off-by: Jose Fernandez <josef(a)netflix.com>
Reviewed-by: Tycho Andersen <tycho(a)tycho.pizza>
---
V1 -> V2: Return a pointer to the cgroup instead of the cgroup ID
kernel/bpf/helpers.c | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index a89587859571..bbd19d5eedb6 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -2266,6 +2266,31 @@ bpf_task_get_cgroup1(struct task_struct *task, int hierarchy_id)
return NULL;
return cgrp;
}
+
+/**
+ * bpf_task_get_cgroup - Acquire a reference to the default cgroup of a task.
+ * @task: The target task
+ *
+ * This function returns the task's default cgroup, primarily
+ * designed for use with cgroup v2. In cgroup v1, the concept of default
+ * cgroup varies by subsystem, and while this function will work with
+ * cgroup v1, it's recommended to use bpf_task_get_cgroup1 instead.
+ * A cgroup returned by this kfunc which is not subsequently stored in a
+ * map, must be released by calling bpf_cgroup_release().
+ *
+ * Return: On success, the cgroup is returned. On failure, NULL is returned.
+ */
+__bpf_kfunc struct cgroup *bpf_task_get_cgroup(struct task_struct *task)
+{
+ struct cgroup *cgrp;
+
+ rcu_read_lock();
+ cgrp = task_dfl_cgroup(task);
+ if (!cgroup_tryget(cgrp))
+ cgrp = NULL;
+ rcu_read_unlock();
+ return cgrp;
+}
#endif /* CONFIG_CGROUPS */
/**
@@ -2573,6 +2598,7 @@ BTF_ID_FLAGS(func, bpf_cgroup_ancestor, KF_ACQUIRE | KF_RCU | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_cgroup_from_id, KF_ACQUIRE | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_task_under_cgroup, KF_RCU)
BTF_ID_FLAGS(func, bpf_task_get_cgroup1, KF_ACQUIRE | KF_RCU | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_task_get_cgroup, KF_ACQUIRE | KF_RCU | KF_RET_NULL)
#endif
BTF_ID_FLAGS(func, bpf_task_from_pid, KF_ACQUIRE | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_throw)
base-commit: 4c8644f86c854c214aaabbcc24a27fa4c7e6a951
--
2.40.1
Add ability to parse multiple files. Additionally add the
ability to parse all results in the KUnit debugfs repository.
How to parse multiple files:
./tools/testing/kunit/kunit.py parse results.log results2.log
How to parse all files in directory:
./tools/testing/kunit/kunit.py parse directory_path/*
How to parse KUnit debugfs repository:
./tools/testing/kunit/kunit.py parse debugfs
For each file, the parser outputs the file name, results, and test
summary. At the end of all parsing, the parser outputs a total summary
line.
This feature can be easily tested on the tools/testing/kunit/test_data/
directory.
Signed-off-by: Rae Moar <rmoar(a)google.com>
---
Changes since v3:
- Changing from input() to stdin
- Add checking for non-regular files
- Spacing fix
- Small printing fix
tools/testing/kunit/kunit.py | 54 +++++++++++++++++++++++++-----------
1 file changed, 38 insertions(+), 16 deletions(-)
diff --git a/tools/testing/kunit/kunit.py b/tools/testing/kunit/kunit.py
index bc74088c458a..641b8ca83e3e 100755
--- a/tools/testing/kunit/kunit.py
+++ b/tools/testing/kunit/kunit.py
@@ -511,19 +511,40 @@ def exec_handler(cli_args: argparse.Namespace) -> None:
def parse_handler(cli_args: argparse.Namespace) -> None:
- if cli_args.file is None:
- sys.stdin.reconfigure(errors='backslashreplace') # type: ignore
- kunit_output = sys.stdin # type: Iterable[str]
- else:
- with open(cli_args.file, 'r', errors='backslashreplace') as f:
- kunit_output = f.read().splitlines()
- # We know nothing about how the result was created!
- metadata = kunit_json.Metadata()
- request = KunitParseRequest(raw_output=cli_args.raw_output,
- json=cli_args.json)
- result, _ = parse_tests(request, metadata, kunit_output)
- if result.status != KunitStatus.SUCCESS:
- sys.exit(1)
+ parsed_files = cli_args.files # type: List[str]
+ total_test = kunit_parser.Test()
+ total_test.status = kunit_parser.TestStatus.SUCCESS
+ if not parsed_files:
+ parsed_files.append('/dev/stdin')
+ elif len(parsed_files) == 1 and parsed_files[0] == "debugfs":
+ parsed_files.pop()
+ for (root, _, files) in os.walk("/sys/kernel/debug/kunit"):
+ parsed_files.extend(os.path.join(root, f) for f in files if f == "results")
+ if not parsed_files:
+ print("No files found.")
+
+ for file in parsed_files:
+ if os.path.isdir(file):
+ print(f'Ignoring directory "{file}"')
+ elif os.path.exists(file):
+ print(file)
+ with open(file, 'r', errors='backslashreplace') as f:
+ kunit_output = f.read().splitlines()
+ # We know nothing about how the result was created!
+ metadata = kunit_json.Metadata()
+ request = KunitParseRequest(raw_output=cli_args.raw_output,
+ json=cli_args.json)
+ _, test = parse_tests(request, metadata, kunit_output)
+ total_test.subtests.append(test)
+ else:
+ print(f'Could not find "{file}"')
+
+ if len(parsed_files) > 1: # if more than one file was parsed output total summary
+ print('All files parsed.')
+ if not request.raw_output:
+ stdout.print_with_timestamp(kunit_parser.DIVIDER)
+ kunit_parser.bubble_up_test_results(total_test)
+ kunit_parser.print_summary_line(total_test)
subcommand_handlers_map = {
@@ -569,9 +590,10 @@ def main(argv: Sequence[str]) -> None:
help='Parses KUnit results from a file, '
'and parses formatted results.')
add_parse_opts(parse_parser)
- parse_parser.add_argument('file',
- help='Specifies the file to read results from.',
- type=str, nargs='?', metavar='input_file')
+ parse_parser.add_argument('files',
+ help='List of file paths to read results from or keyword'
+ '"debugfs" to read all results from the debugfs directory.',
+ type=str, nargs='*', metavar='input_files')
cli_args = parser.parse_args(massage_argv(argv))
base-commit: 806cb2270237ce2ec672a407d66cee17a07d3aa2
--
2.44.0.291.gc1ea87d7ee-goog
Add ability to parse multiple files. Additionally add the
ability to parse all results in the KUnit debugfs repository.
How to parse multiple files:
./tools/testing/kunit/kunit.py parse results.log results2.log
How to parse all files in directory:
./tools/testing/kunit/kunit.py parse directory_path/*
How to parse KUnit debugfs repository:
./tools/testing/kunit/kunit.py parse debugfs
For each file, the parser outputs the file name, results, and test
summary. At the end of all parsing, the parser outputs a total summary
line.
This feature can be easily tested on the tools/testing/kunit/test_data/
directory.
Signed-off-by: Rae Moar <rmoar(a)google.com>
---
Changes since v2:
- Fixed bug with input from command line. I changed this to use
input(). Daniel, let me know if this works for you.
- Add more specific warning messages
tools/testing/kunit/kunit.py | 56 +++++++++++++++++++++++++-----------
1 file changed, 40 insertions(+), 16 deletions(-)
diff --git a/tools/testing/kunit/kunit.py b/tools/testing/kunit/kunit.py
index bc74088c458a..1aa3d736d80c 100755
--- a/tools/testing/kunit/kunit.py
+++ b/tools/testing/kunit/kunit.py
@@ -511,19 +511,42 @@ def exec_handler(cli_args: argparse.Namespace) -> None:
def parse_handler(cli_args: argparse.Namespace) -> None:
- if cli_args.file is None:
- sys.stdin.reconfigure(errors='backslashreplace') # type: ignore
- kunit_output = sys.stdin # type: Iterable[str]
- else:
- with open(cli_args.file, 'r', errors='backslashreplace') as f:
- kunit_output = f.read().splitlines()
- # We know nothing about how the result was created!
- metadata = kunit_json.Metadata()
- request = KunitParseRequest(raw_output=cli_args.raw_output,
- json=cli_args.json)
- result, _ = parse_tests(request, metadata, kunit_output)
- if result.status != KunitStatus.SUCCESS:
- sys.exit(1)
+ parsed_files = cli_args.files # type: List[str]
+ total_test = kunit_parser.Test()
+ total_test.status = kunit_parser.TestStatus.SUCCESS
+ if not parsed_files:
+ parsed_files.append(input("File path: "))
+
+ if parsed_files[0] == "debugfs" and len(parsed_files) == 1:
+ parsed_files.pop()
+ for (root, _, files) in os.walk("/sys/kernel/debug/kunit"):
+ parsed_files.extend(os.path.join(root, f) for f in files if f == "results")
+
+ if not parsed_files:
+ print("No files found.")
+
+ for file in parsed_files:
+ if os.path.isfile(file):
+ print(file)
+ with open(file, 'r', errors='backslashreplace') as f:
+ kunit_output = f.read().splitlines()
+ # We know nothing about how the result was created!
+ metadata = kunit_json.Metadata()
+ request = KunitParseRequest(raw_output=cli_args.raw_output,
+ json=cli_args.json)
+ _, test = parse_tests(request, metadata, kunit_output)
+ total_test.subtests.append(test)
+ elif os.path.isdir(file):
+ print("Ignoring directory ", file)
+ else:
+ print("Could not find ", file)
+
+ if len(parsed_files) > 1: # if more than one file was parsed output total summary
+ print('All files parsed.')
+ if not request.raw_output:
+ stdout.print_with_timestamp(kunit_parser.DIVIDER)
+ kunit_parser.bubble_up_test_results(total_test)
+ kunit_parser.print_summary_line(total_test)
subcommand_handlers_map = {
@@ -569,9 +592,10 @@ def main(argv: Sequence[str]) -> None:
help='Parses KUnit results from a file, '
'and parses formatted results.')
add_parse_opts(parse_parser)
- parse_parser.add_argument('file',
- help='Specifies the file to read results from.',
- type=str, nargs='?', metavar='input_file')
+ parse_parser.add_argument('files',
+ help='List of file paths to read results from or keyword'
+ '"debugfs" to read all results from the debugfs directory.',
+ type=str, nargs='*', metavar='input_files')
cli_args = parser.parse_args(massage_argv(argv))
base-commit: 806cb2270237ce2ec672a407d66cee17a07d3aa2
--
2.44.0.278.ge034bb2e1d-goog
Add missing flags argument to open(2) call with O_CREAT.
Some tests fail to compile if _FORTIFY_SOURCE is defined (to any valid
value) (together with -O), resulting in similar error messages such as:
In file included from /usr/include/fcntl.h:342,
from gup_test.c:1:
In function 'open',
inlined from 'main' at gup_test.c:206:10:
/usr/include/bits/fcntl2.h:50:11: error: call to '__open_missing_mode' declared with attribute error: open with O_CREAT or O_TMPFILE in second argument needs 3 arguments
50 | __open_missing_mode ();
| ^~~~~~~~~~~~~~~~~~~~~~
_FORTIFY_SOURCE is enabled by default in some distributions, so the
tests are not built by default and are skipped.
open(2) man-page warns about missing flags argument: "if it is not
supplied, some arbitrary bytes from the stack will be applied as the
file mode."
Fixes: aeb85ed4f41a ("tools/testing/selftests/vm/gup_benchmark.c: allow user specified file")
Fixes: fbe37501b252 ("mm: huge_memory: debugfs for file-backed THP split")
Fixes: c942f5bd17b3 ("selftests: soft-dirty: add test for mprotect")
Cc: Keith Busch <kbusch(a)kernel.org>
Cc: Zi Yan <ziy(a)nvidia.com>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Nadav Amit <nadav.amit(a)gmail.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Vitaly Chikunov <vt(a)altlinux.org>
---
tools/testing/selftests/mm/gup_test.c | 2 +-
tools/testing/selftests/mm/soft-dirty.c | 2 +-
tools/testing/selftests/mm/split_huge_page_test.c | 2 +-
3 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/mm/gup_test.c b/tools/testing/selftests/mm/gup_test.c
index cbe99594d319..18a49c70d4c6 100644
--- a/tools/testing/selftests/mm/gup_test.c
+++ b/tools/testing/selftests/mm/gup_test.c
@@ -203,7 +203,7 @@ int main(int argc, char **argv)
ksft_print_header();
ksft_set_plan(nthreads);
- filed = open(file, O_RDWR|O_CREAT);
+ filed = open(file, O_RDWR|O_CREAT, 0664);
if (filed < 0)
ksft_exit_fail_msg("Unable to open %s: %s\n", file, strerror(errno));
diff --git a/tools/testing/selftests/mm/soft-dirty.c b/tools/testing/selftests/mm/soft-dirty.c
index cc5f144430d4..7dbfa53d93a0 100644
--- a/tools/testing/selftests/mm/soft-dirty.c
+++ b/tools/testing/selftests/mm/soft-dirty.c
@@ -137,7 +137,7 @@ static void test_mprotect(int pagemap_fd, int pagesize, bool anon)
if (!map)
ksft_exit_fail_msg("anon mmap failed\n");
} else {
- test_fd = open(fname, O_RDWR | O_CREAT);
+ test_fd = open(fname, O_RDWR | O_CREAT, 0664);
if (test_fd < 0) {
ksft_test_result_skip("Test %s open() file failed\n", __func__);
return;
diff --git a/tools/testing/selftests/mm/split_huge_page_test.c b/tools/testing/selftests/mm/split_huge_page_test.c
index 856662d2f87a..6c988bd2f335 100644
--- a/tools/testing/selftests/mm/split_huge_page_test.c
+++ b/tools/testing/selftests/mm/split_huge_page_test.c
@@ -223,7 +223,7 @@ void split_file_backed_thp(void)
ksft_exit_fail_msg("Fail to create file-backed THP split testing file\n");
}
- fd = open(testfile, O_CREAT|O_WRONLY);
+ fd = open(testfile, O_CREAT|O_WRONLY, 0664);
if (fd == -1) {
ksft_perror("Cannot open testing file");
goto cleanup;
--
2.42.1