When logging an error from calling waitpid() on the child we print a
misleading error message saying that the error we report was returned by
the chilld. Fix this to say the error is from waitpid().
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
tools/testing/selftests/clone3/clone3.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/clone3/clone3.c b/tools/testing/selftests/clone3/clone3.c
index 3c9bf0cd82a8..eb108727c35c 100644
--- a/tools/testing/selftests/clone3/clone3.c
+++ b/tools/testing/selftests/clone3/clone3.c
@@ -95,7 +95,7 @@ static int call_clone3(uint64_t flags, size_t size, enum test_mode test_mode)
getpid(), pid);
if (waitpid(-1, &status, __WALL) < 0) {
- ksft_print_msg("Child returned %s\n", strerror(errno));
+ ksft_print_msg("waitpid() returned %s\n", strerror(errno));
return -errno;
}
if (WEXITSTATUS(status))
---
base-commit: 8cb4a9a82b21623dbb4b3051dd30d98356cf95bc
change-id: 20240405-kselftest-clone3-waitpid-68c4833cf5ff
Best regards,
--
Mark Brown <broonie(a)kernel.org>
When the child exits during the clone3() selftest we use WEXITSTATUS() to
get the exit status from the process without first checking WIFEXITED() to
see if the result will be valid. This can lead to incorrect results, for
example if the child exits due to signal. Add a WIFEXTED() check and report
any non-standard exit as a failure, using EXIT_FAILURE as the exit status
for call_clone3() since we otherwise report 0 or negative errnos.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
tools/testing/selftests/clone3/clone3.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/tools/testing/selftests/clone3/clone3.c b/tools/testing/selftests/clone3/clone3.c
index 3c9bf0cd82a8..0e0e5dfa97c6 100644
--- a/tools/testing/selftests/clone3/clone3.c
+++ b/tools/testing/selftests/clone3/clone3.c
@@ -98,6 +98,11 @@ static int call_clone3(uint64_t flags, size_t size, enum test_mode test_mode)
ksft_print_msg("Child returned %s\n", strerror(errno));
return -errno;
}
+ if (!WIFEXITED(status)) {
+ ksft_print_msg("Child did not exit normally, status 0x%x\n",
+ status);
+ return EXIT_FAILURE;
+ }
if (WEXITSTATUS(status))
return WEXITSTATUS(status);
---
base-commit: 39cd87c4eb2b893354f3b850f916353f2658ae6f
change-id: 20240405-kselftest-clone3-signal-1edb1a3f5473
Best regards,
--
Mark Brown <broonie(a)kernel.org>
From: Geliang Tang <tanggeliang(a)kylinos.cn>
v5:
- address Martin's comments for v4 (thanks).
- update patch 2, use 'return err' instead of 'return -1/0'.
- drop patch 3 in v4.
v4:
- fix a bug in v3, it should be 'if (err)', not 'if (!err)'.
- move "selftests/bpf: Use log_err in network_helpers" out of this
series.
v3:
- add two more patches.
- use log_err instead of ASSERT in v3.
- let send_recv_data return int as Martin suggested.
v2:
Address Martin's comments for v1 (thanks.)
- drop patch 1, "export send_byte helper".
- drop "WRITE_ONCE(arg.stop, 0)".
- rebased.
send_recv_data will be re-used in MPTCP bpf tests, but not included
in this set because it depends on other patches that have not been
in the bpf-next yet. It will be sent as another set soon.
Geliang Tang (2):
selftests/bpf: Add struct send_recv_arg
selftests/bpf: Export send_recv_data helper
tools/testing/selftests/bpf/network_helpers.c | 96 +++++++++++++++++++
tools/testing/selftests/bpf/network_helpers.h | 1 +
.../selftests/bpf/prog_tests/bpf_tcp_ca.c | 71 +-------------
3 files changed, 98 insertions(+), 70 deletions(-)
--
2.40.1
These patches from Geliang add support for the "last time" field in
MPTCP Info, and verify that the counters look valid.
Patch 1 adds these counters: last_data_sent, last_data_recv and
last_ack_recv. They are available in the MPTCP Info, so exposed via
getsockopt(MPTCP_INFO) and the Netlink Diag interface.
Patch 2 adds a test in diag.sh MPTCP selftest, to check that the
counters have moved by at least 250ms, after having waited twice that
time.
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
Changes in v2:
- Only patch 1/2 has been modified following Eric's suggestion, see the
individual changelog for more details.
- Link to v1: https://lore.kernel.org/r/20240405-upstream-net-next-20240405-mptcp-last-ti…
---
Geliang Tang (2):
mptcp: add last time fields in mptcp_info
selftests: mptcp: test last time mptcp_info
include/uapi/linux/mptcp.h | 4 +++
net/mptcp/options.c | 1 +
net/mptcp/protocol.c | 7 ++++
net/mptcp/protocol.h | 3 ++
net/mptcp/sockopt.c | 16 +++++++---
tools/testing/selftests/net/mptcp/diag.sh | 53 +++++++++++++++++++++++++++++++
6 files changed, 79 insertions(+), 5 deletions(-)
---
base-commit: 2ecd487b670fcbb1ad4893fff1af4aafdecb6023
change-id: 20240405-upstream-net-next-20240405-mptcp-last-time-info-9b03618e08f1
Best regards,
--
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
This patch series adds support to freeze the task cgroup hierarchy
that is on a default cgroup v2 without going through kernfs interface.
For some cases we want to freeze the cgroup of a task based on some
signals, doing so from bpf is better than user space which could be
too late.
Planned users of this feature are: tetragon and systemd when freezing
a cgroup hierarchy that could be a K8s pod, container, system service
or a user session.
Patch 1: cgroup: add cgroup_freeze_no_kn() to freeze a cgroup from bpf
Patch 2: bpf: add bpf_task_freeze_cgroup() to freeze the cgroup of a task
Patch 3: selftests/bpf: add selftest for bpf_task_freeze_cgroup
include/linux/cgroup.h | 2 ++
kernel/bpf/helpers.c | 31 ++++
kernel/cgroup/cgroup.c | 67 ++++++++
tools/testing/selftests/bpf/prog_tests/task_freeze_cgroup.c | 165 +++++++++++++++++++++
tools/testing/selftests/bpf/progs/test_task_freeze_cgroup.c | 110 ++++++++++++++
5 files changed, 375 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/task_freeze_cgroup.c
create mode 100644 tools/testing/selftests/bpf/progs/test_task_freeze_cgroup.c
--
2.34.1
This series implements SBI PMU improvements done in SBI v2.0[1] i.e. PMU snapshot
and fw_read_hi() functions.
SBI v2.0 introduced PMU snapshot feature which allows the SBI implementation
to provide counter information (i.e. values/overflow status) via a shared
memory between the SBI implementation and supervisor OS. This allows to minimize
the number of traps in when perf being used inside a kvm guest as it relies on
SBI PMU + trap/emulation of the counters.
The current set of ratified RISC-V specification also doesn't allow scountovf
to be trap/emulated by the hypervisor. The SBI PMU snapshot bridges the gap
in ISA as well and enables perf sampling in the guest. However, LCOFI in the
guest only works via IRQ filtering in AIA specification. That's why, AIA
has to be enabled in the hardware (at least the Ssaia extension) in order to
use the sampling support in the perf.
Here are the patch wise implementation details.
PATCH 1,4,7,8,9,10,11,15 : Generic cleanups/improvements.
PATCH 2,3,14 : FW_READ_HI function implementation
PATCH 5-6: Add PMU snapshot feature in sbi pmu driver
PATCH 12-13: KVM implementation for snapshot and sampling in kvm guests
PATCH 16-17: Generic improvements for kvm selftests
PATCH 18-22: KVM selftests for SBI PMU extension
The series is based on v6.9-rc1 and is available at:
https://github.com/atishp04/linux/tree/kvm_pmu_snapshot_v5
The kvmtool patch is also available at:
https://github.com/atishp04/kvmtool/tree/sscofpmf
It also requires Ssaia ISA extension to be present in the hardware in order to
get perf sampling support in the guest. In Qemu virt machine, it can be done
by the following config.
```
-cpu rv64,sscofpmf=true,x-ssaia=true
```
There is no other dependencies on AIA apart from that. Thus, Ssaia must be disabled
for the guest if AIA patches are not available. Here is the example command.
```
./lkvm-static run -m 256 -c2 --console serial -p "console=ttyS0 earlycon" --disable-ssaia -k ./Image --debug
```
The series has been tested only in Qemu.
Here is the snippet of the perf running inside a kvm guest.
===================================================
$ perf record -e cycles -e instructions perf bench sched messaging -g 5
...
$ Running 'sched/messaging' benchmark:
...
[ 45.928723] perf_duration_warn: 2 callbacks suppressed
[ 45.929000] perf: interrupt took too long (484426 > 483186), lowering kernel.perf_event_max_sample_rate to 250
$ 20 sender and receiver processes per group
$ 5 groups == 200 processes run
Total time: 14.220 [sec]
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.117 MB perf.data (1942 samples) ]
$ perf report --stdio
$ To display the perf.data header info, please use --header/--header-only optio>
$
$
$ Total Lost Samples: 0
$
$ Samples: 943 of event 'cycles'
$ Event count (approx.): 5128976844
$
$ Overhead Command Shared Object Symbol >
$ ........ ............... ........................... .....................>
$
7.59% sched-messaging [kernel.kallsyms] [k] memcpy
5.48% sched-messaging [kernel.kallsyms] [k] percpu_counter_ad>
5.24% sched-messaging [kernel.kallsyms] [k] __sbi_rfence_v02_>
4.00% sched-messaging [kernel.kallsyms] [k] _raw_spin_unlock_>
3.79% sched-messaging [kernel.kallsyms] [k] set_pte_range
3.72% sched-messaging [kernel.kallsyms] [k] next_uptodate_fol>
3.46% sched-messaging [kernel.kallsyms] [k] filemap_map_pages
3.31% sched-messaging [kernel.kallsyms] [k] handle_mm_fault
3.20% sched-messaging [kernel.kallsyms] [k] finish_task_switc>
3.16% sched-messaging [kernel.kallsyms] [k] clear_page
3.03% sched-messaging [kernel.kallsyms] [k] mtree_range_walk
2.42% sched-messaging [kernel.kallsyms] [k] flush_icache_pte
===================================================
[1] https://github.com/riscv-non-isa/riscv-sbi-doc
Changes from v4->v5:
1. Moved sbi related definitions to its own header file from processor.h
2. Added few helper functions for selftests.
3. Improved firmware counter read and RV32 start/stop functions.
4. Converted all the shifting operations to use BIT macro
5. Addressed all other comments on v4.
Changes from v3->v4:
1. Added selftests.
2. Fixed an issue to clear the interrupt pending bits.
3. Fixed the counter index in snapshot memory start function.
Changes from v2->v3:
1. Fixed a patchwork warning on patch6.
2. Fixed a comment formatting & nit fix in PATCH 3 & 5.
3. Moved the hvien update and sscofpmf enabling to PATCH 9 from PATCH 8.
Changes from v1->v2:
1. Fixed warning/errors from patchwork CI.
2. Rebased on top of kvm-next.
3. Added Acked-by tags.
Changes from RFC->v1:
1. Addressed all the comments on RFC series.
2. Removed PATCH2 and merged into later patches.
3. Added 2 more patches for minor fixes.
4. Fixed KVM boot issue without Ssaia and made sscofpmf in guest dependent on
Ssaia in the host.
Atish Patra (22):
RISC-V: Fix the typo in Scountovf CSR name
RISC-V: Add FIRMWARE_READ_HI definition
drivers/perf: riscv: Read upper bits of a firmware counter
drivers/perf: riscv: Use BIT macro for shifting operations
RISC-V: Add SBI PMU snapshot definitions
drivers/perf: riscv: Implement SBI PMU snapshot function
drivers/perf: riscv: Fix counter mask iteration for RV32
RISC-V: KVM: Fix the initial sample period value
RISC-V: KVM: Rename the SBI_STA_SHMEM_DISABLE to a generic name
RISC-V: KVM: No need to update the counter value during reset
RISC-V: KVM: No need to exit to the user space if perf event failed
RISC-V: KVM: Implement SBI PMU Snapshot feature
RISC-V: KVM: Add perf sampling support for guests
RISC-V: KVM: Support 64 bit firmware counters on RV32
RISC-V: KVM: Improve firmware counter read function
KVM: riscv: selftests: Move sbi definitions to its own header file
KVM: riscv: selftests: Add helper functions for extension checks
KVM: riscv: selftests: Add Sscofpmf to get-reg-list test
KVM: riscv: selftests: Add SBI PMU extension definitions
KVM: riscv: selftests: Add SBI PMU selftest
KVM: riscv: selftests: Add a test for PMU snapshot functionality
KVM: riscv: selftests: Add a test for counter overflow
arch/riscv/include/asm/csr.h | 5 +-
arch/riscv/include/asm/kvm_vcpu_pmu.h | 16 +-
arch/riscv/include/asm/sbi.h | 34 +-
arch/riscv/include/uapi/asm/kvm.h | 1 +
arch/riscv/kernel/paravirt.c | 6 +-
arch/riscv/kvm/aia.c | 5 +
arch/riscv/kvm/vcpu.c | 15 +-
arch/riscv/kvm/vcpu_onereg.c | 5 +
arch/riscv/kvm/vcpu_pmu.c | 260 +++++++-
arch/riscv/kvm/vcpu_sbi_pmu.c | 17 +-
arch/riscv/kvm/vcpu_sbi_sta.c | 4 +-
drivers/perf/riscv_pmu.c | 1 +
drivers/perf/riscv_pmu_sbi.c | 264 +++++++-
include/linux/perf/riscv_pmu.h | 6 +
tools/testing/selftests/kvm/Makefile | 1 +
.../selftests/kvm/include/riscv/processor.h | 49 +-
.../testing/selftests/kvm/include/riscv/sbi.h | 141 +++++
.../selftests/kvm/include/riscv/ucall.h | 1 +
.../selftests/kvm/lib/riscv/processor.c | 12 +
.../testing/selftests/kvm/riscv/arch_timer.c | 2 +-
.../selftests/kvm/riscv/get-reg-list.c | 4 +
.../selftests/kvm/riscv/sbi_pmu_test.c | 581 ++++++++++++++++++
tools/testing/selftests/kvm/steal_time.c | 4 +-
23 files changed, 1322 insertions(+), 112 deletions(-)
create mode 100644 tools/testing/selftests/kvm/include/riscv/sbi.h
create mode 100644 tools/testing/selftests/kvm/riscv/sbi_pmu_test.c
--
2.34.1
This series implements SBI PMU improvements done in SBI v2.0[1] i.e. PMU snapshot
and fw_read_hi() functions.
SBI v2.0 introduced PMU snapshot feature which allows the SBI implementation
to provide counter information (i.e. values/overflow status) via a shared
memory between the SBI implementation and supervisor OS. This allows to minimize
the number of traps in when perf being used inside a kvm guest as it relies on
SBI PMU + trap/emulation of the counters.
The current set of ratified RISC-V specification also doesn't allow scountovf
to be trap/emulated by the hypervisor. The SBI PMU snapshot bridges the gap
in ISA as well and enables perf sampling in the guest. However, LCOFI in the
guest only works via IRQ filtering in AIA specification. That's why, AIA
has to be enabled in the hardware (at least the Ssaia extension) in order to
use the sampling support in the perf.
Here are the patch wise implementation details.
PATCH 1,6,7 : Generic cleanups/improvements.
PATCH 2,3,10 : FW_READ_HI function implementation
PATCH 4-5: Add PMU snapshot feature in sbi pmu driver
PATCH 6-7: KVM implementation for snapshot and sampling in kvm guests
PATCH 11-15: KVM selftests for SBI PMU extension
The series is based on kvm-next and is available at:
https://github.com/atishp04/linux/tree/kvm_pmu_snapshot_v4
The series is based on kvm-riscv/queue branch + fixes suggested on the following
series
https://patchwork.kernel.org/project/kvm/cover/cover.1705916069.git.haibo1.…
The kvmtool patch is also available at:
https://github.com/atishp04/kvmtool/tree/sscofpmf
It also requires Ssaia ISA extension to be present in the hardware in order to
get perf sampling support in the guest. In Qemu virt machine, it can be done
by the following config.
```
-cpu rv64,sscofpmf=true,x-ssaia=true
```
There is no other dependencies on AIA apart from that. Thus, Ssaia must be disabled
for the guest if AIA patches are not available. Here is the example command.
```
./lkvm-static run -m 256 -c2 --console serial -p "console=ttyS0 earlycon" --disable-ssaia -k ./Image --debug
```
The series has been tested only in Qemu.
Here is the snippet of the perf running inside a kvm guest.
===================================================
$ perf record -e cycles -e instructions perf bench sched messaging -g 5
...
$ Running 'sched/messaging' benchmark:
...
[ 45.928723] perf_duration_warn: 2 callbacks suppressed
[ 45.929000] perf: interrupt took too long (484426 > 483186), lowering kernel.perf_event_max_sample_rate to 250
$ 20 sender and receiver processes per group
$ 5 groups == 200 processes run
Total time: 14.220 [sec]
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.117 MB perf.data (1942 samples) ]
$ perf report --stdio
$ To display the perf.data header info, please use --header/--header-only optio>
$
$
$ Total Lost Samples: 0
$
$ Samples: 943 of event 'cycles'
$ Event count (approx.): 5128976844
$
$ Overhead Command Shared Object Symbol >
$ ........ ............... ........................... .....................>
$
7.59% sched-messaging [kernel.kallsyms] [k] memcpy
5.48% sched-messaging [kernel.kallsyms] [k] percpu_counter_ad>
5.24% sched-messaging [kernel.kallsyms] [k] __sbi_rfence_v02_>
4.00% sched-messaging [kernel.kallsyms] [k] _raw_spin_unlock_>
3.79% sched-messaging [kernel.kallsyms] [k] set_pte_range
3.72% sched-messaging [kernel.kallsyms] [k] next_uptodate_fol>
3.46% sched-messaging [kernel.kallsyms] [k] filemap_map_pages
3.31% sched-messaging [kernel.kallsyms] [k] handle_mm_fault
3.20% sched-messaging [kernel.kallsyms] [k] finish_task_switc>
3.16% sched-messaging [kernel.kallsyms] [k] clear_page
3.03% sched-messaging [kernel.kallsyms] [k] mtree_range_walk
2.42% sched-messaging [kernel.kallsyms] [k] flush_icache_pte
===================================================
[1] https://github.com/riscv-non-isa/riscv-sbi-doc
Changes from v3->v4:
1. Added selftests.
2. Fixed an issue to clear the interrupt pending bits.
3. Fixed the counter index in snapshot memory start function.
Changes from v2->v3:
1. Fixed a patchwork warning on patch6.
2. Fixed a comment formatting & nit fix in PATCH 3 & 5.
3. Moved the hvien update and sscofpmf enabling to PATCH 9 from PATCH 8.
Changes from v1->v2:
1. Fixed warning/errors from patchwork CI.
2. Rebased on top of kvm-next.
3. Added Acked-by tags.
Changes from RFC->v1:
1. Addressed all the comments on RFC series.
2. Removed PATCH2 and merged into later patches.
3. Added 2 more patches for minor fixes.
4. Fixed KVM boot issue without Ssaia and made sscofpmf in guest dependent on
Ssaia in the host.
Atish Patra (15):
RISC-V: Fix the typo in Scountovf CSR name
RISC-V: Add FIRMWARE_READ_HI definition
drivers/perf: riscv: Read upper bits of a firmware counter
RISC-V: Add SBI PMU snapshot definitions
drivers/perf: riscv: Implement SBI PMU snapshot function
RISC-V: KVM: No need to update the counter value during reset
RISC-V: KVM: No need to exit to the user space if perf event failed
RISC-V: KVM: Implement SBI PMU Snapshot feature
RISC-V: KVM: Add perf sampling support for guests
RISC-V: KVM: Support 64 bit firmware counters on RV32
KVM: riscv: selftests: Add Sscofpmf to get-reg-list test
KVM: riscv: selftests: Add SBI PMU extension definitions
KVM: riscv: selftests: Add SBI PMU selftest
KVM: riscv: selftests: Add a test for PMU snapshot functionality
KVM: riscv: selftests: Add a test for counter overflow
arch/riscv/include/asm/csr.h | 5 +-
arch/riscv/include/asm/errata_list.h | 2 +-
arch/riscv/include/asm/kvm_vcpu_pmu.h | 14 +-
arch/riscv/include/asm/sbi.h | 12 +
arch/riscv/include/uapi/asm/kvm.h | 1 +
arch/riscv/kvm/aia.c | 5 +
arch/riscv/kvm/vcpu.c | 14 +-
arch/riscv/kvm/vcpu_onereg.c | 9 +-
arch/riscv/kvm/vcpu_pmu.c | 247 +++++++-
arch/riscv/kvm/vcpu_sbi_pmu.c | 15 +-
drivers/perf/riscv_pmu.c | 1 +
drivers/perf/riscv_pmu_sbi.c | 229 ++++++-
include/linux/perf/riscv_pmu.h | 6 +
tools/testing/selftests/kvm/Makefile | 1 +
.../selftests/kvm/include/riscv/processor.h | 92 +++
.../selftests/kvm/lib/riscv/processor.c | 12 +
.../selftests/kvm/riscv/get-reg-list.c | 4 +
tools/testing/selftests/kvm/riscv/sbi_pmu.c | 588 ++++++++++++++++++
18 files changed, 1212 insertions(+), 45 deletions(-)
create mode 100644 tools/testing/selftests/kvm/riscv/sbi_pmu.c
--
2.34.1
This series implements support for the 2023 dpISA extensions in KVM
guests, it was previously posted as part of a series with the host
support but that has now been merged so only the KVM portions remain.
Most of these extensions add only new instructions so the guest support
consists of adding the relevant ID registers, masking out other features
like the 2023 MTE extensions.
FEAT_FPMR introduces a new system register FPMR to the floating point
state which we enable guest access to and context switch when the ID
registers indicate that it is supported. Currently we implement
visibility for FPMR with a fpmr_visibility() function as for other
system registers, I will separately look into adding support for
specifying this in the struct sys_reg_desc.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v6:
- Rebase onto v6.9-rc1.
- The host portions of the series were merged so only the KVM guest
support remains.
- Link to v5: https://lore.kernel.org/r/20240306-arm64-2023-dpisa-v5-0-c568edc8ed7f@kerne…
Changes in v5:
- Rebase onto v6.8-rc3.
- Use u64 rather than unsigned long for storing FPMR.
- Temporarily drop KVM guest support due to issues with KVM being a
moving target.
- Link to v4: https://lore.kernel.org/r/20240122-arm64-2023-dpisa-v4-0-776e094861df@kerne…
Changes in v4:
- Rebase onto v6.8-rc1.
- Move KVM support to the end of the series.
- Link to v3: https://lore.kernel.org/r/20231205-arm64-2023-dpisa-v3-0-dbcbcd867a7f@kerne…
Changes in v3:
- Rebase onto v6.7-rc3.
- Hook up traps for FPMR in emulate-nested.c.
- Link to v2: https://lore.kernel.org/r/20231114-arm64-2023-dpisa-v2-0-47251894f6a8@kerne…
Changes in v2:
- Rebase onto v6.7-rc1.
- Link to v1: https://lore.kernel.org/r/20231026-arm64-2023-dpisa-v1-0-8470dd989bb2@kerne…
---
Mark Brown (5):
KVM: arm64: Share all userspace hardened thread data with the hypervisor
KVM: arm64: Add newly allocated ID registers to register descriptions
KVM: arm64: Support FEAT_FPMR for guests
KVM: arm64: selftests: Document feature registers added in 2023 extensions
KVM: arm64: selftests: Teach get-reg-list about FPMR
arch/arm64/include/asm/kvm_host.h | 6 ++++--
arch/arm64/include/asm/processor.h | 2 +-
arch/arm64/kvm/emulate-nested.c | 9 ++++++++
arch/arm64/kvm/fpsimd.c | 15 +++++++-------
arch/arm64/kvm/hyp/include/hyp/switch.h | 9 ++++++--
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 4 ++--
arch/arm64/kvm/sys_regs.c | 24 +++++++++++++++++++---
tools/testing/selftests/kvm/aarch64/get-reg-list.c | 11 ++++++++--
8 files changed, 60 insertions(+), 20 deletions(-)
---
base-commit: 4cece764965020c22cff7665b18a012006359095
change-id: 20231003-arm64-2023-dpisa-2f3d25746474
Best regards,
--
Mark Brown <broonie(a)kernel.org>
The test_offload.py test fits in networking and bpf equally
well. We started adding more Python tests in networking
and some of the code in test_offload.py can be reused,
so move it to networking. Looks like it bit rotted over
time and some fixes are needed.
Admittedly more code could be extracted but I only had
the time for a minor cleanup :(
Jakub Kicinski (4):
selftests: move bpf-offload test from bpf to net
selftests: net: bpf_offload: wait for maps
selftests: net: declare section names for bpf_offload
selftests: net: reuse common code in bpf_offload
tools/testing/selftests/bpf/Makefile | 1 -
tools/testing/selftests/net/Makefile | 11 +-
.../test_offload.py => net/bpf_offload.py} | 138 +++++-------------
tools/testing/selftests/net/lib/py/nsim.py | 9 +-
.../sample_map_ret0.bpf.c} | 2 +-
.../sample_ret0.c => net/sample_ret0.bpf.c} | 3 +
6 files changed, 57 insertions(+), 107 deletions(-)
rename tools/testing/selftests/{bpf/test_offload.py => net/bpf_offload.py} (93%)
rename tools/testing/selftests/{bpf/progs/sample_map_ret0.c => net/sample_map_ret0.bpf.c} (96%)
rename tools/testing/selftests/{bpf/progs/sample_ret0.c => net/sample_ret0.bpf.c} (70%)
--
2.44.0
New version of the sleepable bpf_timer code, without BPF changes, as
they can now go through the HID tree independantly:
https://lore.kernel.org/all/20240315-hid-bpf-sleepable-v4-0-5658f2540564@ke…
For reference, the use cases I have in mind:
---
Basically, I need to be able to defer a HID-BPF program for the
following reasons (from the aforementioned patch):
1. defer an event:
Sometimes we receive an out of proximity event, but the device can not
be trusted enough, and we need to ensure that we won't receive another
one in the following n milliseconds. So we need to wait those n
milliseconds, and eventually re-inject that event in the stack.
2. inject new events in reaction to one given event:
We might want to transform one given event into several. This is the
case for macro keys where a single key press is supposed to send
a sequence of key presses. But this could also be used to patch a
faulty behavior, if a device forgets to send a release event.
3. communicate with the device in reaction to one event:
We might want to communicate back to the device after a given event.
For example a device might send us an event saying that it came back
from sleeping state and needs to be re-initialized.
Currently we can achieve that by keeping a userspace program around,
raise a bpf event, and let that userspace program inject the events and
commands.
However, we are just keeping that program alive as a daemon for just
scheduling commands. There is no logic in it, so it doesn't really justify
an actual userspace wakeup. So a kernel workqueue seems simpler to handle.
bpf_timers are currently running in a soft IRQ context, this patch
series implements a sleppable context for them.
Cheers,
Benjamin
To: Jiri Kosina <jikos(a)kernel.org>
To: Benjamin Tissoires <benjamin.tissoires(a)redhat.com>
To: Jonathan Corbet <corbet(a)lwn.net>
To: Shuah Khan <shuah(a)kernel.org>
Cc: Benjamin Tissoires <bentiss(a)kernel.org>
Cc: <linux-input(a)vger.kernel.org>
Cc: <linux-kernel(a)vger.kernel.org>
Cc: <bpf(a)vger.kernel.org>
Cc: <linux-doc(a)vger.kernel.org>
Cc: <linux-kselftest(a)vger.kernel.org>
---
Changes in v4:
- dropped the BPF changes, they can go independently in bpf-core
- dropped the HID-BPF integration tests with the sleppable timers,
I'll re-add them once both series (this and sleepable timers) are
merged
- Link to v3: https://lore.kernel.org/r/20240221-hid-bpf-sleepable-v3-0-1fb378ca6301@kern…
Changes in v3:
- fixed the crash from v2
- changed the API to have only BPF_F_TIMER_SLEEPABLE for
bpf_timer_start()
- split the new kfuncs/verifier patch into several sub-patches, for
easier reviews
- Link to v2: https://lore.kernel.org/r/20240214-hid-bpf-sleepable-v2-0-5756b054724d@kern…
Changes in v2:
- make use of bpf_timer (and dropped the custom HID handling)
- implemented bpf_timer_set_sleepable_cb as a kfunc
- still not implemented global subprogs
- no sleepable bpf_timer selftests yet
- Link to v1: https://lore.kernel.org/r/20240209-hid-bpf-sleepable-v1-0-4cc895b5adbd@kern…
---
Benjamin Tissoires (7):
HID: bpf/dispatch: regroup kfuncs definitions
HID: bpf: export hid_hw_output_report as a BPF kfunc
selftests/hid: add KASAN to the VM tests
selftests/hid: Add test for hid_bpf_hw_output_report
HID: bpf: allow to inject HID event from BPF
selftests/hid: add tests for hid_bpf_input_report
HID: bpf: allow to use bpf_timer_set_sleepable_cb() in tracing callbacks.
Documentation/hid/hid-bpf.rst | 2 +-
drivers/hid/bpf/hid_bpf_dispatch.c | 226 ++++++++++++++-------
drivers/hid/hid-core.c | 2 +
include/linux/hid_bpf.h | 3 +
tools/testing/selftests/hid/config.common | 1 +
tools/testing/selftests/hid/hid_bpf.c | 112 +++++++++-
tools/testing/selftests/hid/progs/hid.c | 46 +++++
.../testing/selftests/hid/progs/hid_bpf_helpers.h | 6 +
8 files changed, 324 insertions(+), 74 deletions(-)
---
base-commit: 3e78a6c0d3e02e4cf881dc84c5127e9990f939d6
change-id: 20240314-b4-hid-bpf-new-funcs-ecf05d0ef870
Best regards,
--
Benjamin Tissoires <bentiss(a)kernel.org>
Currently, the sud_test expects the emulated syscall to return the
emulated syscall number. This assumption only works on architectures
were the syscall calling convention use the same register for syscall
number/syscall return value. This is not the case for RISC-V and thus
the return value must be also emulated using the provided ucontext.
Signed-off-by: Clément Léger <cleger(a)rivosinc.com>
Reviewed-by: Palmer Dabbelt <palmer(a)rivosinc.com>
Acked-by: Palmer Dabbelt <palmer(a)rivosinc.com>
---
Changes in V2:
- Changes comment to be more explicit
- Use A7 syscall arg rather than hardcoding MAGIC_SYSCALL_1
---
.../selftests/syscall_user_dispatch/sud_test.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/tools/testing/selftests/syscall_user_dispatch/sud_test.c b/tools/testing/selftests/syscall_user_dispatch/sud_test.c
index b5d592d4099e..d975a6767329 100644
--- a/tools/testing/selftests/syscall_user_dispatch/sud_test.c
+++ b/tools/testing/selftests/syscall_user_dispatch/sud_test.c
@@ -158,6 +158,20 @@ static void handle_sigsys(int sig, siginfo_t *info, void *ucontext)
/* In preparation for sigreturn. */
SYSCALL_DISPATCH_OFF(glob_sel);
+
+ /*
+ * The tests for argument handling assume that `syscall(x) == x`. This
+ * is a NOP on x86 because the syscall number is passed in %rax, which
+ * happens to also be the function ABI return register. Other
+ * architectures may need to swizzle the arguments around.
+ */
+#if defined(__riscv)
+/* REG_A7 is not defined in libc headers */
+# define REG_A7 (REG_A0 + 7)
+
+ ((ucontext_t *)ucontext)->uc_mcontext.__gregs[REG_A0] =
+ ((ucontext_t *)ucontext)->uc_mcontext.__gregs[REG_A7];
+#endif
}
TEST(dispatch_and_return)
--
2.43.0
This series fixes a bug in the complete phase of UDP in GRO, in which
socket lookup fails due to using network_header when parsing encapsulated
packets. The fix is to pass p_off parameter in *_gro_complete.
Next, the fields network_offset and inner_network_offset are added to
napi_gro_cb, and are both set during the receive phase of GRO. This is then
leveraged in the next commit to remove flush_id state from napi_gro_cb, and
stateful code in {ipv6,inet}_gro_receive which may be unnecessarily
complicated due to encapsulation support in GRO.
In addition, udpgro_fwd selftest is adjusted to include the socket lookup
case for vxlan. This selftest will test its supposed functionality once
local bind support is merged (https://lore.kernel.org/netdev/df300a49-7811-4126-a56a-a77100c8841b@gmail.c…).
v4 -> v5:
- Add 1st commit - flush id checks in udp_gro_receive segment which can be
backported by itself
- Add TCP measurements for the 5th commit
- Add flush id tests to ensure flush id logic is preserved in GRO
- Simplify gro_inet_flush by removing a branch
- v4:
https://lore.kernel.org/all/20240325182543.87683-1-richardbgobert@gmail.com/
v3 -> v4:
- Fix code comment and commit message typos
- v3:
https://lore.kernel.org/all/f939c84a-2322-4393-a5b0-9b1e0be8ed8e@gmail.com/
v2 -> v3:
- Use napi_gro_cb instead of skb->{offset}
- v2:
https://lore.kernel.org/all/2ce1600b-e733-448b-91ac-9d0ae2b866a4@gmail.com/
v1 -> v2:
- Pass p_off in *_gro_complete to fix UDP bug
- Remove more conditionals and memory fetches from inet_gro_flush
- v1:
https://lore.kernel.org/netdev/e1d22505-c5f8-4c02-a997-64248480338b@gmail.c…
Richard Gobert (6):
net: gro: add flush check in udp_gro_receive_segment
net: gro: add p_off param in *_gro_complete
selftests/net: add local address bind in vxlan selftest
net: gro: add {inner_}network_offset to napi_gro_cb
net: gro: move L3 flush checks to tcp_gro_receive
selftests/net: add flush id selftests
drivers/net/geneve.c | 7 +-
drivers/net/vxlan/vxlan_core.c | 11 +-
include/linux/etherdevice.h | 2 +-
include/linux/netdevice.h | 3 +-
include/linux/udp.h | 2 +-
include/net/gro.h | 93 ++++++++++++--
include/net/inet_common.h | 2 +-
include/net/tcp.h | 6 +-
include/net/udp.h | 8 +-
include/net/udp_tunnel.h | 2 +-
net/8021q/vlan_core.c | 6 +-
net/core/gro.c | 7 +-
net/ethernet/eth.c | 5 +-
net/ipv4/af_inet.c | 54 +-------
net/ipv4/fou_core.c | 9 +-
net/ipv4/gre_offload.c | 6 +-
net/ipv4/tcp_offload.c | 22 +---
net/ipv4/udp.c | 3 +-
net/ipv4/udp_offload.c | 31 +++--
net/ipv6/ip6_offload.c | 41 +++---
net/ipv6/tcpv6_offload.c | 7 +-
net/ipv6/udp.c | 3 +-
net/ipv6/udp_offload.c | 13 +-
tools/testing/selftests/net/gro.c | 144 ++++++++++++++++++++++
tools/testing/selftests/net/udpgro_fwd.sh | 10 +-
25 files changed, 336 insertions(+), 161 deletions(-)
--
2.36.1
From: Geliang Tang <tanggeliang(a)kylinos.cn>
This patchset uses public helpers start_server_* and connect_to_* defined
in network_helpers.c to drop duplicate code.
Geliang Tang (14):
selftests/bpf: Add start_server_addr helper
selftests/bpf: Use start_server_addr in cls_redirect
selftests/bpf: Use connect_to_addr in cls_redirect
selftests/bpf: Use start_server_addr in sk_assign
selftests/bpf: Use connect_to_addr in sk_assign
selftests/bpf: Use log_err in network_helpers
selftests/bpf: Use start_server_addr in test_sock_addr
selftests/bpf: Use connect_to_addr in test_sock_addr
selftests/bpf: Add function pointer for __start_server
selftests/bpf: Add start_server_setsockopt helper
selftests/bpf: Use start_server_setsockopt in sockopt_inherit
selftests/bpf: Use connect_to_fd in sockopt_inherit
selftests/bpf: Use start_server_* in test_tcp_check_syncookie
selftests/bpf: Use connect_to_addr in test_tcp_check_syncookie
tools/testing/selftests/bpf/Makefile | 4 +-
tools/testing/selftests/bpf/network_helpers.c | 53 +++++++++----
tools/testing/selftests/bpf/network_helpers.h | 4 +
.../selftests/bpf/prog_tests/cls_redirect.c | 38 +---------
.../selftests/bpf/prog_tests/sk_assign.c | 53 +------------
.../bpf/prog_tests/sockopt_inherit.c | 64 ++++------------
tools/testing/selftests/bpf/test_sock_addr.c | 74 ++----------------
.../bpf/test_tcp_check_syncookie_user.c | 76 +++----------------
8 files changed, 85 insertions(+), 281 deletions(-)
--
2.40.1
This patch series provides workingset reporting of user pages in
lruvecs, of which coldness can be tracked by accessed bits and fd
references. However, the concept of workingset applies generically to
all types of memory, which could be kernel slab caches, discardable
userspace caches (databases), or CXL.mem. Therefore, data sources might
come from slab shrinkers, device drivers, or the userspace. IMO, the
kernel should provide a set of workingset interfaces that should be
generic enough to accommodate the various use cases, and be extensible
to potential future use cases. The current proposed interfaces are not
sufficient in that regard, but I would like to start somewhere, solicit
feedback, and iterate.
Use cases
==========
Job scheduling
For data center machines, workingset information allows the job scheduler
to right-size each job and land more jobs on the same host or NUMA node,
and in the case of a job with increasing workingset, policy decisions
can be made to migrate other jobs off the host/NUMA node, or oom-kill
the misbehaving job. If the job shape is very different from the machine
shape, knowing the workingset per-node can also help inform page
allocation policies.
Proactive reclaim
Workingset information allows the a container manager to proactively
reclaim memory while not impacting a job's performance. While PSI may
provide a reactive measure of when a proactive reclaim has reclaimed too
much, workingset reporting enables the policy to be more accurate and
flexible.
Ballooning (similar to proactive reclaim)
While this patch series does not extend the virtio-balloon device,
balloon policies benefit from workingset to more precisely determine
the size of the memory balloon. On desktops/laptops/mobile devices where
memory is scarce and overcommitted, the balloon sizing in multiple VMs
running on the same device can be orchestrated with workingset reports
from each one.
Promotion/Demotion
Similar to proactive reclaim, a workingset report enables demotion to a
slower tier of memory.
For promotion, the workingset report interfaces need to be extended to
report hotness and gather hotness information from the devices[1].
[1]
https://www.opencompute.org/documents/ocp-cms-hotness-tracking-requirements…
Sysfs and Cgroup Interfaces
==========
The interfaces are detailed in the patches that introduce them. The main
idea here is we break down the workingset per-node per-memcg into time
intervals (ms), e.g.
1000 anon=137368 file=24530
20000 anon=34342 file=0
30000 anon=353232 file=333608
40000 anon=407198 file=206052
9223372036854775807 anon=4925624 file=892892
I realize this does not generalize well to hotness information, but I
lack the intuition for an abstraction that presents hotness in a useful
way. Based on a recent proposal for move_phys_pages[2], it seems like
userspace tiering software would like to move specific physical pages,
instead of informing the kernel "move x number of hot pages to y
device". Please advise.
[2]
https://lore.kernel.org/lkml/20240319172609.332900-1-gregory.price@memverge…
Implementation
==========
Currently, the reporting of user pages is based off of MGLRU, and
therefore requires CONFIG_LRU_GEN=y. We would benefit from more MGLRU
generations for a more fine-grained workingset report. I will make the
generation count configurable in the next version. The workingset
reporting mechanism is gated behind CONFIG_WORKINGSET_REPORT, and the
aging thread is behind CONFIG_WORKINGSET_REPORT_AGING.
--
Changes from RFC v2 -> RFC v3:
- Update to v6.8
- Added an aging kernel thread (gated behind config)
- Added basic selftests for sysfs interface files
- Track swapped out pages for reaccesses
- Refactoring and cleanup
- Dropped the virtio-balloon extension to make things manageable
Changes from RFC v1 -> RFC v2:
- Refactored the patchs into smaller pieces
- Renamed interfaces and functions from wss to wsr (Working Set Reporting)
- Fixed build errors when CONFIG_WSR is not set
- Changed working_set_num_bins to u8 for virtio-balloon
- Added support for per-NUMA node reporting for virtio-balloon
[rfc v1]
https://lore.kernel.org/linux-mm/20230509185419.1088297-1-yuanchu@google.co…
[rfc v2]
https://lore.kernel.org/linux-mm/20230621180454.973862-1-yuanchu@google.com/
Yuanchu Xie (8):
mm: multi-gen LRU: ignore non-leaf pmd_young for force_scan=true
mm: aggregate working set information into histograms
mm: use refresh interval to rate-limit workingset report aggregation
mm: report workingset during memory pressure driven scanning
mm: extend working set reporting to memcgs
mm: add per-memcg reaccess histogram
mm: add kernel aging thread for workingset reporting
mm: test system-wide workingset reporting
drivers/base/node.c | 3 +
include/linux/memcontrol.h | 5 +
include/linux/mmzone.h | 4 +
include/linux/workingset_report.h | 107 +++
mm/Kconfig | 15 +
mm/Makefile | 2 +
mm/internal.h | 45 ++
mm/memcontrol.c | 386 ++++++++-
mm/mmzone.c | 2 +
mm/vmscan.c | 95 ++-
mm/workingset.c | 9 +-
mm/workingset_report.c | 757 ++++++++++++++++++
mm/workingset_report_aging.c | 127 +++
tools/testing/selftests/mm/.gitignore | 1 +
tools/testing/selftests/mm/Makefile | 3 +
.../testing/selftests/mm/workingset_report.c | 315 ++++++++
.../testing/selftests/mm/workingset_report.h | 37 +
.../selftests/mm/workingset_report_test.c | 328 ++++++++
18 files changed, 2231 insertions(+), 10 deletions(-)
create mode 100644 include/linux/workingset_report.h
create mode 100644 mm/workingset_report.c
create mode 100644 mm/workingset_report_aging.c
create mode 100644 tools/testing/selftests/mm/workingset_report.c
create mode 100644 tools/testing/selftests/mm/workingset_report.h
create mode 100644 tools/testing/selftests/mm/workingset_report_test.c
--
2.44.0.396.g6e790dbe36-goog
From: Geliang Tang <tanggeliang(a)kylinos.cn>
v3:
- add two more patches.
- use log_err instead of ASSERT in v3.
- let send_recv_data return int as Martin suggested.
v2:
Address Martin's comments for v1 (thanks.)
- drop patch 1, "export send_byte helper".
- drop "WRITE_ONCE(arg.stop, 0)".
- rebased.
send_recv_data will be re-used in MPTCP bpf tests, but not included
in this set because it depends on other patches that have not been
in the bpf-next yet. It will be sent as another set soon.
Geliang Tang (4):
selftests/bpf: Use log_err in network_helpers
selftests/bpf: Add struct send_recv_arg
selftests/bpf: Export send_recv_data helper
selftests/bpf: Support nonblock for send_recv_data
tools/testing/selftests/bpf/network_helpers.c | 125 +++++++++++++++++-
tools/testing/selftests/bpf/network_helpers.h | 1 +
.../selftests/bpf/prog_tests/bpf_tcp_ca.c | 71 +---------
3 files changed, 121 insertions(+), 76 deletions(-)
--
2.40.1
Hi,
This patch series teaches KUnit to handle kthread faults as errors, and
it brings a few related fixes and improvements.
Shuah, everything should be OK now, could you please merge this series?
All these tests pass (on top of v6.8):
./tools/testing/kunit/kunit.py run --alltests
./tools/testing/kunit/kunit.py run --alltests --arch x86_64
./tools/testing/kunit/kunit.py run --alltests --arch arm64 \
--cross_compile=aarch64-linux-gnu-
I also built and ran KUnit tests as a kernel module.
A new test case check NULL pointer dereference, which wasn't possible
before.
This is useful to test current kernel self-protection mechanisms or
future ones such as Heki: https://github.com/heki-linux
Previous versions:
v3: https://lore.kernel.org/r/20240319104857.70783-1-mic@digikod.net
v2: https://lore.kernel.org/r/20240301194037.532117-1-mic@digikod.net
v1: https://lore.kernel.org/r/20240229170409.365386-1-mic@digikod.net
Regards,
Mickaël Salaün (7):
kunit: Handle thread creation error
kunit: Fix kthread reference
kunit: Fix timeout message
kunit: Handle test faults
kunit: Fix KUNIT_SUCCESS() calls in iov_iter tests
kunit: Print last test location on fault
kunit: Add tests for fault
include/kunit/test.h | 24 ++++++++++++++++++---
include/kunit/try-catch.h | 3 ---
kernel/kthread.c | 1 +
lib/kunit/kunit-test.c | 45 ++++++++++++++++++++++++++++++++++++++-
lib/kunit/try-catch.c | 38 ++++++++++++++++++++++-----------
lib/kunit_iov_iter.c | 18 ++++++++--------
6 files changed, 101 insertions(+), 28 deletions(-)
base-commit: e8f897f4afef0031fe618a8e94127a0934896aba
--
2.44.0
riscv-debug-spec [1] Chapter 5: Sdtrig extension introduces
trigger CSRs which can cause a breakpoint exception,
entry into Debug Mode, or a trace action without having to
execute a special instruction.
The focus in the following patches is on the two CSRs from
the Sdtrig extension: hcontext and scontext.
These two CSRs are optional according to the spec, apart from
the Smstateen extension [2], which has bit 57 to control the
accessbility of the hcontext/scontext CSRs.
We also introduce dt-binding in the CPU DTS for the existence of
the CSRs in situations where the Smstaten extension is not available.
The hcontext/scontext CSRs can help to raise triggers with the
textra32/textra64 CSRs set up correctly. (Chapter 5.7.17/ 5.7.18 [1])
Therefore, as part of Linux awareness debugging.
We propose the scontext CSR be filled by the Linux PID,
And the hcontext CSR be filled with a self-maintained Guest OS ID.
The reason for using the self-maintained Guest OS ID instead of VMID is
that VMID might change over time, and the user setting up the trigger
might enter the previous value, invoking the wrong VM for debugging.
The tests have been done on QEMU with Sdtrig CSRs
(mcontext/hcontext/scontext implemented) [3] boot on virt machine
and also run the Guest OS as virt machine with KVM enabled,
the two hcontext/scontext CSRs can be written correctly.
This patch series is based on v6.9-rc1.
Link: https://github.com/riscv/riscv-debug-spec/releases/download/ar20231208/risc… [1]
Link: https://github.com/riscvarchive/riscv-state-enable/releases/download/v1.0.0… [2]
Link: https://github.com/sifive/qemu/tree/dev/maxh/sdtrig_ISA [3]
Signed-off-by: Max Hsu <max.hsu(a)sifive.com>
---
Max Hsu (7):
dt-bindings: riscv: Add Sdtrig ISA extension
dt-bindings: riscv: Add Sdtrig optional CSRs existence on DT
riscv: Add ISA extension parsing for Sdtrig
riscv: Add Sdtrig CSRs definition, Smstateen bit to access Sdtrig CSRs
riscv: cpufeature: Add Sdtrig optional CSRs checks
riscv: suspend: add Smstateen CSRs save/restore
riscv: Add task switch support for scontext CSR
Yong-Xuan Wang (4):
riscv: KVM: Add Sdtrig Extension Support for Guest/VM
riscv: KVM: Add scontext to ONE_REG
riscv: KVM: Add hcontext support
KVM: riscv: selftests: Add Sdtrig Extension to get-reg-list test
Documentation/devicetree/bindings/riscv/cpus.yaml | 18 +++
.../devicetree/bindings/riscv/extensions.yaml | 7 +
arch/riscv/include/asm/csr.h | 6 +
arch/riscv/include/asm/hwcap.h | 1 +
arch/riscv/include/asm/kvm_host.h | 14 ++
arch/riscv/include/asm/kvm_vcpu_debug.h | 24 +++
arch/riscv/include/asm/suspend.h | 7 +
arch/riscv/include/asm/switch_to.h | 15 ++
arch/riscv/include/uapi/asm/kvm.h | 9 ++
arch/riscv/kernel/cpufeature.c | 162 +++++++++++++++++++++
arch/riscv/kernel/suspend.c | 25 ++++
arch/riscv/kvm/Makefile | 1 +
arch/riscv/kvm/main.c | 4 +
arch/riscv/kvm/vcpu.c | 8 +
arch/riscv/kvm/vcpu_debug.c | 107 ++++++++++++++
arch/riscv/kvm/vcpu_onereg.c | 63 +++++++-
arch/riscv/kvm/vm.c | 4 +
tools/testing/selftests/kvm/riscv/get-reg-list.c | 27 ++++
18 files changed, 500 insertions(+), 2 deletions(-)
---
base-commit: 317c7bc0ef035d8ebfc3e55c5dde0566fd5fb171
change-id: 20240329-dev-maxh-lin-452-6-9-c6209e6db67f
Best regards,
--
Max Hsu <max.hsu(a)sifive.com>
From: Geliang Tang <tanggeliang(a)kylinos.cn>
Address Martin's comments for v1 (thanks.)
- drop patch 1, "export send_byte helper".
- drop "WRITE_ONCE(arg.stop, 0)".
- rebased.
send_recv_data will be re-used in MPTCP bpf tests, but not included
in this set because it depends on other patches that have not been
in the bpf-next yet. It will be sent as another set soon.
Geliang Tang (2):
selftests/bpf: Add struct send_recv_arg
selftests/bpf: Export send_recv_data helper
tools/testing/selftests/bpf/network_helpers.c | 85 +++++++++++++++++++
tools/testing/selftests/bpf/network_helpers.h | 1 +
.../selftests/bpf/prog_tests/bpf_tcp_ca.c | 71 +---------------
3 files changed, 87 insertions(+), 70 deletions(-)
--
2.40.1
From: Jason Xing <kernelxing(a)tencent.com>
The output goes like this if I make samples/bpf:
...warning: no previous prototype for ‘get_cgroup_id_from_path’...
Make this function static could solve the warning problem since
no one outside of the file calls it.
Signed-off-by: Jason Xing <kernelxing(a)tencent.com>
---
tools/testing/selftests/bpf/cgroup_helpers.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/cgroup_helpers.c b/tools/testing/selftests/bpf/cgroup_helpers.c
index 19be9c63d5e8..f2952a65dcc2 100644
--- a/tools/testing/selftests/bpf/cgroup_helpers.c
+++ b/tools/testing/selftests/bpf/cgroup_helpers.c
@@ -429,7 +429,7 @@ int create_and_get_cgroup(const char *relative_path)
* which is an invalid cgroup id.
* If there is a failure, it prints the error to stderr.
*/
-unsigned long long get_cgroup_id_from_path(const char *cgroup_workdir)
+static unsigned long long get_cgroup_id_from_path(const char *cgroup_workdir)
{
int dirfd, err, flags, mount_id, fhsize;
union {
--
2.37.3
v2:
- Found that rebuild_sched_domains() has external callers, revert its
change and introduce rebuild_sched_domains_cpuslocked() instead.
As discussed in the LKML thread [1], the asynchronous nature of cpuset
hotplug handling code is causing problem with RCU testing. With recent
changes in the way locking is being handled in the cpuset code, it is
now possible to make the cpuset hotplug code synchronous again without
major changes.
This series enables the hotplug code to call directly into cpuset hotplug
core without indirection with the exception of the special case of v1
cpuset becoming empty still being handled indirectly with workqueue.
A new simple test case was also written to test this special v1 cpuset
case. The test_cpuset_prs.sh script was also run with LOCKDEP on to
verify that there is no regression.
[1] https://lore.kernel.org/lkml/ZgYikMb5kZ7rxPp6@slm.duckdns.org/
Waiman Long (2):
cgroup/cpuset: Make cpuset hotplug processing synchronous
cgroup/cpuset: Add test_cpuset_v1_hp.sh
include/linux/cpuset.h | 3 -
kernel/cgroup/cpuset.c | 141 +++++++-----------
kernel/cpu.c | 48 ------
kernel/power/process.c | 2 -
tools/testing/selftests/cgroup/Makefile | 2 +-
.../selftests/cgroup/test_cpuset_v1_hp.sh | 46 ++++++
6 files changed, 103 insertions(+), 139 deletions(-)
create mode 100755 tools/testing/selftests/cgroup/test_cpuset_v1_hp.sh
--
2.39.3
RFC v6:
=======
Major Changes:
--------------
This revision largely rebases on top of net-next and addresses the little
feedback RFCv5 received.
The series remains in RFC because the queue-API ndos defined in this
series are not yet implemented. I have a GVE implementation I carry out
of tree for my testing. A upstreamable GVE implementation is in the
works. Aside from that, in my estimation all the patches are ready for
review/merge. Please do take a look.
As usual the full devmem TCP changes including the full GVE driver
implementation is here:
https://github.com/mina/linux/commits/tcpdevmem-v6/
This version also comes with some performance data recorded in the cover
letter (see below changelog).
Detailed changelog:
- Rebased on top of the merged netmem_ref changes.
- Converted skb->dmabuf to skb->readable (Pavel). Pavel's original
suggestion was to remove the skb->dmabuf flag entirely, but when I
looked into it closely, I found the issue that if we remove the flag
we have to dereference the shinfo(skb) pointer to obtain the first
frag to tell whether an skb is readable or not. This can cause a
performance regression if it dirties the cache line when the
shinfo(skb) was not really needed. Instead, I converted the skb->dmabuf
flag into a generic skb->readable flag which can be re-used by io_uring
0-copy RX.
- Squashed a few locking optimizations from Eric Dumazet in the RX path
and the DEVMEM_DONTNEED setsockopt.
- Expanded the tests a bit. Added validation for invalid scenarios and
added some more coverage.
Perf - page-pool benchmark:
---------------------------
bench_page_pool_simple.ko tests with and without these changes:
https://pastebin.com/raw/ncHDwAbn
AFAIK the number that really matters in the perf tests is the
'tasklet_page_pool01_fast_path Per elem'. This one measures at about 8
cycles without the changes but there is some 1 cycle noise in some
results.
With the patches this regresses to 9 cycles with the changes but there
is 1 cycle noise occasionally running this test repeatedly.
Lastly I tried disable the static_branch_unlikely() in
netmem_is_net_iov() check. To my surprise disabling the
static_branch_unlikely() check reduces the fast path back to 8 cycles,
but the 1 cycle noise remains.
Perf - Devmem TCP benchmark:
---------------------
189/200gbps bi-directional throughput with RX devmem TCP and regular TCP
TX i.e. ~95% line rate.
Major changes in RFC v5:
========================
1. Rebased on top of 'Abstract page from net stack' series and used the
new netmem type to refer to LSB set pointers instead of re-using
struct page.
2. Downgraded this series back to RFC and called it RFC v5. This is
because this series is now dependent on 'Abstract page from net
stack'[1] and the queue API. Both are removed from the series to
reduce the patch # and those bits are fairly independent or
pre-requisite work.
3. Reworked the page_pool devmem support to use netmem and for some
more unified handling.
4. Reworked the reference counting of net_iov (renamed from
page_pool_iov) to use pp_ref_count for refcounting.
The full changes including the dependent series and GVE page pool
support is here:
https://github.com/mina/linux/commits/tcpdevmem-rfcv5/
[1] https://patchwork.kernel.org/project/netdevbpf/list/?series=810774
Major changes in v1:
====================
1. Implemented MVP queue API ndos to remove the userspace-visible
driver reset.
2. Fixed issues in the napi_pp_put_page() devmem frag unref path.
3. Removed RFC tag.
Many smaller addressed comments across all the patches (patches have
individual change log).
Full tree including the rest of the GVE driver changes:
https://github.com/mina/linux/commits/tcpdevmem-v1
Changes in RFC v3:
==================
1. Pulled in the memory-provider dependency from Jakub's RFC[1] to make the
series reviewable and mergeable.
2. Implemented multi-rx-queue binding which was a todo in v2.
3. Fix to cmsg handling.
The sticking point in RFC v2[2] was the device reset required to refill
the device rx-queues after the dmabuf bind/unbind. The solution
suggested as I understand is a subset of the per-queue management ops
Jakub suggested or similar:
https://lore.kernel.org/netdev/20230815171638.4c057dcd@kernel.org/
This is not addressed in this revision, because:
1. This point was discussed at netconf & netdev and there is openness to
using the current approach of requiring a device reset.
2. Implementing individual queue resetting seems to be difficult for my
test bed with GVE. My prototype to test this ran into issues with the
rx-queues not coming back up properly if reset individually. At the
moment I'm unsure if it's a mistake in the POC or a genuine issue in
the virtualization stack behind GVE, which currently doesn't test
individual rx-queue restart.
3. Our usecases are not bothered by requiring a device reset to refill
the buffer queues, and we'd like to support NICs that run into this
limitation with resetting individual queues.
My thought is that drivers that have trouble with per-queue configs can
use the support in this series, while drivers that support new netdev
ops to reset individual queues can automatically reset the queue as
part of the dma-buf bind/unbind.
The same approach with device resets is presented again for consideration
with other sticking points addressed.
This proposal includes the rx devmem path only proposed for merge. For a
snapshot of my entire tree which includes the GVE POC page pool support &
device memory support:
https://github.com/torvalds/linux/compare/master...mina:linux:tcpdevmem-v3
[1] https://lore.kernel.org/netdev/f8270765-a27b-6ccf-33ea-cda097168d79@redhat.…
[2] https://lore.kernel.org/netdev/CAHS8izOVJGJH5WF68OsRWFKJid1_huzzUK+hpKbLcL4…
Changes in RFC v2:
==================
The sticking point in RFC v1[1] was the dma-buf pages approach we used to
deliver the device memory to the TCP stack. RFC v2 is a proof-of-concept
that attempts to resolve this by implementing scatterlist support in the
networking stack, such that we can import the dma-buf scatterlist
directly. This is the approach proposed at a high level here[2].
Detailed changes:
1. Replaced dma-buf pages approach with importing scatterlist into the
page pool.
2. Replace the dma-buf pages centric API with a netlink API.
3. Removed the TX path implementation - there is no issue with
implementing the TX path with scatterlist approach, but leaving
out the TX path makes it easier to review.
4. Functionality is tested with this proposal, but I have not conducted
perf testing yet. I'm not sure there are regressions, but I removed
perf claims from the cover letter until they can be re-confirmed.
5. Added Signed-off-by: contributors to the implementation.
6. Fixed some bugs with the RX path since RFC v1.
Any feedback welcome, but specifically the biggest pending questions
needing feedback IMO are:
1. Feedback on the scatterlist-based approach in general.
2. Netlink API (Patch 1 & 2).
3. Approach to handle all the drivers that expect to receive pages from
the page pool (Patch 6).
[1] https://lore.kernel.org/netdev/dfe4bae7-13a0-3c5d-d671-f61b375cb0b4@gmail.c…
[2] https://lore.kernel.org/netdev/CAHS8izPm6XRS54LdCDZVd0C75tA1zHSu6jLVO8nzTLX…
==================
* TL;DR:
Device memory TCP (devmem TCP) is a proposal for transferring data to and/or
from device memory efficiently, without bouncing the data to a host memory
buffer.
* Problem:
A large amount of data transfers have device memory as the source and/or
destination. Accelerators drastically increased the volume of such transfers.
Some examples include:
- ML accelerators transferring large amounts of training data from storage into
GPU/TPU memory. In some cases ML training setup time can be as long as 50% of
TPU compute time, improving data transfer throughput & efficiency can help
improving GPU/TPU utilization.
- Distributed training, where ML accelerators, such as GPUs on different hosts,
exchange data among them.
- Distributed raw block storage applications transfer large amounts of data with
remote SSDs, much of this data does not require host processing.
Today, the majority of the Device-to-Device data transfers the network are
implemented as the following low level operations: Device-to-Host copy,
Host-to-Host network transfer, and Host-to-Device copy.
The implementation is suboptimal, especially for bulk data transfers, and can
put significant strains on system resources, such as host memory bandwidth,
PCIe bandwidth, etc. One important reason behind the current state is the
kernel’s lack of semantics to express device to network transfers.
* Proposal:
In this patch series we attempt to optimize this use case by implementing
socket APIs that enable the user to:
1. send device memory across the network directly, and
2. receive incoming network packets directly into device memory.
Packet _payloads_ go directly from the NIC to device memory for receive and from
device memory to NIC for transmit.
Packet _headers_ go to/from host memory and are processed by the TCP/IP stack
normally. The NIC _must_ support header split to achieve this.
Advantages:
- Alleviate host memory bandwidth pressure, compared to existing
network-transfer + device-copy semantics.
- Alleviate PCIe BW pressure, by limiting data transfer to the lowest level
of the PCIe tree, compared to traditional path which sends data through the
root complex.
* Patch overview:
** Part 1: netlink API
Gives user ability to bind dma-buf to an RX queue.
** Part 2: scatterlist support
Currently the standard for device memory sharing is DMABUF, which doesn't
generate struct pages. On the other hand, networking stack (skbs, drivers, and
page pool) operate on pages. We have 2 options:
1. Generate struct pages for dmabuf device memory, or,
2. Modify the networking stack to process scatterlist.
Approach #1 was attempted in RFC v1. RFC v2 implements approach #2.
** part 3: page pool support
We piggy back on page pool memory providers proposal:
https://github.com/kuba-moo/linux/tree/pp-providers
It allows the page pool to define a memory provider that provides the
page allocation and freeing. It helps abstract most of the device memory
TCP changes from the driver.
** part 4: support for unreadable skb frags
Page pool iovs are not accessible by the host; we implement changes
throughput the networking stack to correctly handle skbs with unreadable
frags.
** Part 5: recvmsg() APIs
We define user APIs for the user to send and receive device memory.
Not included with this series is the GVE devmem TCP support, just to
simplify the review. Code available here if desired:
https://github.com/mina/linux/tree/tcpdevmem
This series is built on top of net-next with Jakub's pp-providers changes
cherry-picked.
* NIC dependencies:
1. (strict) Devmem TCP require the NIC to support header split, i.e. the
capability to split incoming packets into a header + payload and to put
each into a separate buffer. Devmem TCP works by using device memory
for the packet payload, and host memory for the packet headers.
2. (optional) Devmem TCP works better with flow steering support & RSS support,
i.e. the NIC's ability to steer flows into certain rx queues. This allows the
sysadmin to enable devmem TCP on a subset of the rx queues, and steer
devmem TCP traffic onto these queues and non devmem TCP elsewhere.
The NIC I have access to with these properties is the GVE with DQO support
running in Google Cloud, but any NIC that supports these features would suffice.
I may be able to help reviewers bring up devmem TCP on their NICs.
* Testing:
The series includes a udmabuf kselftest that show a simple use case of
devmem TCP and validates the entire data path end to end without
a dependency on a specific dmabuf provider.
** Test Setup
Kernel: net-next with this series and memory provider API cherry-picked
locally.
Hardware: Google Cloud A3 VMs.
NIC: GVE with header split & RSS & flow steering support.
Cc: Pavel Begunkov <asml.silence(a)gmail.com>
Cc: David Wei <dw(a)davidwei.uk>
Cc: Jason Gunthorpe <jgg(a)ziepe.ca>
Cc: Yunsheng Lin <linyunsheng(a)huawei.com>
Cc: Shailend Chand <shailend(a)google.com>
Cc: Harshitha Ramamurthy <hramamurthy(a)google.com>
Cc: Shakeel Butt <shakeelb(a)google.com>
Cc: Jeroen de Borst <jeroendb(a)google.com>
Cc: Praveen Kaligineedi <pkaligineedi(a)google.com>
Jakub Kicinski (1):
net: page_pool: create hooks for custom page providers
Mina Almasry (14):
queue_api: define queue api
net: page_pool: factor out page_pool recycle check
net: netdev netlink api to bind dma-buf to a net device
netdev: support binding dma-buf to netdevice
netdev: netdevice devmem allocator
page_pool: convert to use netmem
page_pool: devmem support
memory-provider: dmabuf devmem memory provider
net: support non paged skb frags
net: add support for skbs with unreadable frags
tcp: RX path for devmem TCP
net: add SO_DEVMEM_DONTNEED setsockopt to release RX frags
net: add devmem TCP documentation
selftests: add ncdevmem, netcat for devmem TCP
Documentation/netlink/specs/netdev.yaml | 52 +++
Documentation/networking/devmem.rst | 271 ++++++++++++
Documentation/networking/index.rst | 1 +
arch/alpha/include/uapi/asm/socket.h | 6 +
arch/mips/include/uapi/asm/socket.h | 6 +
arch/parisc/include/uapi/asm/socket.h | 6 +
arch/sparc/include/uapi/asm/socket.h | 6 +
include/linux/netdevice.h | 24 ++
include/linux/skbuff.h | 67 ++-
include/linux/socket.h | 1 +
include/net/devmem.h | 127 ++++++
include/net/netdev_rx_queue.h | 1 +
include/net/netmem.h | 234 +++++++++-
include/net/page_pool/helpers.h | 154 +++++--
include/net/page_pool/types.h | 28 +-
include/net/sock.h | 2 +
include/net/tcp.h | 5 +-
include/trace/events/page_pool.h | 29 +-
include/uapi/asm-generic/socket.h | 6 +
include/uapi/linux/netdev.h | 19 +
include/uapi/linux/uio.h | 14 +
net/bpf/test_run.c | 5 +-
net/core/Makefile | 2 +-
net/core/datagram.c | 6 +
net/core/dev.c | 6 +-
net/core/devmem.c | 413 ++++++++++++++++++
net/core/gro.c | 7 +-
net/core/netdev-genl-gen.c | 19 +
net/core/netdev-genl-gen.h | 2 +
net/core/netdev-genl.c | 123 ++++++
net/core/page_pool.c | 362 +++++++++-------
net/core/skbuff.c | 110 ++++-
net/core/sock.c | 61 +++
net/ipv4/tcp.c | 257 ++++++++++-
net/ipv4/tcp_input.c | 13 +-
net/ipv4/tcp_ipv4.c | 9 +
net/ipv4/tcp_output.c | 5 +-
net/packet/af_packet.c | 4 +-
tools/include/uapi/linux/netdev.h | 19 +
tools/testing/selftests/net/.gitignore | 1 +
tools/testing/selftests/net/Makefile | 5 +
tools/testing/selftests/net/ncdevmem.c | 546 ++++++++++++++++++++++++
42 files changed, 2764 insertions(+), 270 deletions(-)
create mode 100644 Documentation/networking/devmem.rst
create mode 100644 include/net/devmem.h
create mode 100644 net/core/devmem.c
create mode 100644 tools/testing/selftests/net/ncdevmem.c
--
2.44.0.rc1.240.g4c46232300-goog
Here are some patches from Geliang, doing different cleanups, and
supporting 'ip mptcp' in more MPTCP selftests.
Patch 1 checks that TC is available in selftests requiring it.
Patch 2 adds 'ms' units in TC commands, to avoid confusions.
Patches 3-9 are some prerequisites for patch 10: some export code from
mptcp_join.sh to mptcp_lib.sh, to be re-used in pm_netlink.sh,
mptcp_sockopt.sh and simult_flows.sh ; and others add helpers to
pm_netlink.sh to easily support both 'ip mptcp' and 'pm_nl_ctl' tools to
interact with the in-kernel MPTCP path-manager.
Patch 10 adds a '-i' parameter in mptcp_sockopt.sh, pm_netlink.sh, and
simult_flows.sh to use 'ip mptcp' tool instead of 'pm_nl_ctl'.
Patch 11 fixes some ShellCheck warnings in pm_netlink.sh, in order to
drop a ShellCheck's 'disable' instruction.
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
Geliang Tang (11):
selftests: mptcp: add tc check for check_tools
selftests: mptcp: add ms units for tc-netem delay
selftests: mptcp: export ip_mptcp to mptcp_lib
selftests: mptcp: netlink: add 'limits' helpers
selftests: mptcp: add {get,format}_endpoint(s) helpers
selftests: mptcp: netlink: add change_address helper
selftests: mptcp: join: update endpoint ops
selftests: mptcp: export pm_nl endpoint ops
selftests: mptcp: use pm_nl endpoint ops
selftests: mptcp: ip_mptcp option for more scripts
selftests: mptcp: netlink: drop disable=SC2086
tools/testing/selftests/net/mptcp/mptcp_connect.sh | 2 +-
tools/testing/selftests/net/mptcp/mptcp_join.sh | 155 ++----------
tools/testing/selftests/net/mptcp/mptcp_lib.sh | 135 ++++++++++
tools/testing/selftests/net/mptcp/mptcp_sockopt.sh | 34 ++-
tools/testing/selftests/net/mptcp/pm_netlink.sh | 281 +++++++++++++--------
tools/testing/selftests/net/mptcp/simult_flows.sh | 20 +-
6 files changed, 375 insertions(+), 252 deletions(-)
---
base-commit: d76c740b2eaaddc5fc3a8b21eaec5b6b11e8c3f5
change-id: 20240405-upstream-net-next-20240405-mptcp-selftests-refactoring-f5ed9780df8e
Best regards,
--
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Currently the options for writing networking tests are C, bash or
some mix of the two. YAML/Netlink gives us the ability to easily
interface with Netlink in higher level laguages. In particular,
there is a Python library already available in tree, under tools/net.
Add the scaffolding which allows writing tests using this library.
The "scaffolding" is needed because the library lives under
tools/net and uses YAML files from under Documentation/.
So we need a small amount of glue code to find those things
and add them to TEST_FILES.
This series adds both a basic SW sanity test and driver
test which can be run against netdevsim or a real device.
When I develop core code I usually test with netdevsim,
then a real device, and then a backport to Meta's kernel.
Because of the lack of integration, until now I had
to throw away the (YNL-based) test script and netdevsim code.
Running tests in tree directly:
$ ./tools/testing/selftests/net/nl_netdev.py
KTAP version 1
1..2
ok 1 nl_netdev.empty_check
ok 2 nl_netdev.lo_check
# Totals: pass:2 fail:0 xfail:0 xpass:0 skip:0 error:0
in tree via make:
$ make -C tools/testing/selftests/ TARGETS=net \
TEST_PROGS=nl_netdev.py TEST_GEN_PROGS="" run_tests
[ ... ]
and installed externally, all seem to work:
$ make -C tools/testing/selftests/ TARGETS=net \
install INSTALL_PATH=/tmp/ksft-net
$ /tmp/ksft-net/run_kselftest.sh -t net:nl_netdev.py
[ ... ]
For driver tests I followed the lead of net/forwarding and
get the device name from env and/or a config file.
v3:
- fix up netdevsim C
- various small nits in other patches (see changelog in patches)
v2: https://lore.kernel.org/all/20240403023426.1762996-1-kuba@kernel.org/
- don't add to TARGETS, create a deperate variable with deps
- support and use with
- support and use passing arguments to tests
v1: https://lore.kernel.org/all/20240402010520.1209517-1-kuba@kernel.org/
Jakub Kicinski (5):
selftests: net: add scaffolding for Netlink tests in Python
selftests: nl_netdev: add a trivial Netlink netdev test
netdevsim: report stats by default, like a real device
selftests: drivers: add scaffolding for Netlink tests in Python
testing: net-drv: add a driver test for stats reporting
drivers/net/netdevsim/ethtool.c | 11 ++
drivers/net/netdevsim/netdev.c | 49 ++++++++
tools/testing/selftests/Makefile | 10 +-
tools/testing/selftests/drivers/net/Makefile | 7 ++
.../testing/selftests/drivers/net/README.rst | 30 +++++
.../selftests/drivers/net/lib/py/__init__.py | 17 +++
.../selftests/drivers/net/lib/py/env.py | 52 ++++++++
tools/testing/selftests/drivers/net/stats.py | 86 +++++++++++++
tools/testing/selftests/net/Makefile | 1 +
tools/testing/selftests/net/lib/Makefile | 8 ++
.../testing/selftests/net/lib/py/__init__.py | 7 ++
tools/testing/selftests/net/lib/py/consts.py | 9 ++
tools/testing/selftests/net/lib/py/ksft.py | 96 +++++++++++++++
tools/testing/selftests/net/lib/py/nsim.py | 115 ++++++++++++++++++
tools/testing/selftests/net/lib/py/utils.py | 47 +++++++
tools/testing/selftests/net/lib/py/ynl.py | 49 ++++++++
tools/testing/selftests/net/nl_netdev.py | 24 ++++
17 files changed, 617 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/drivers/net/Makefile
create mode 100644 tools/testing/selftests/drivers/net/README.rst
create mode 100644 tools/testing/selftests/drivers/net/lib/py/__init__.py
create mode 100644 tools/testing/selftests/drivers/net/lib/py/env.py
create mode 100755 tools/testing/selftests/drivers/net/stats.py
create mode 100644 tools/testing/selftests/net/lib/Makefile
create mode 100644 tools/testing/selftests/net/lib/py/__init__.py
create mode 100644 tools/testing/selftests/net/lib/py/consts.py
create mode 100644 tools/testing/selftests/net/lib/py/ksft.py
create mode 100644 tools/testing/selftests/net/lib/py/nsim.py
create mode 100644 tools/testing/selftests/net/lib/py/utils.py
create mode 100644 tools/testing/selftests/net/lib/py/ynl.py
create mode 100755 tools/testing/selftests/net/nl_netdev.py
--
2.44.0
From: Mark Rutland <mark.rutland(a)arm.com>
[ Upstream commit 8ecab2e64572f1aecdfc5a8feae748abda6e3347 ]
The event filter function test has been failing in our internal test
farm:
| # not ok 33 event filter function - test event filtering on functions
Running the test in verbose mode indicates that this is because the test
erroneously determines that kmem_cache_free() is the most common caller
of kmem_cache_free():
# # + cut -d: -f3 trace
# # + sed s/call_site=([^+]*)+0x.*/1/
# # + sort
# # + uniq -c
# # + sort
# # + tail -n 1
# # + sed s/^[ 0-9]*//
# # + target_func=kmem_cache_free
... and as kmem_cache_free() doesn't call itself, setting this as the
filter function for kmem_cache_free() results in no hits, and
consequently the test fails:
# # + grep kmem_cache_free trace
# # + grep kmem_cache_free
# # + wc -l
# # + hitcnt=0
# # + grep kmem_cache_free trace
# # + grep -v kmem_cache_free
# # + wc -l
# # + misscnt=0
# # + [ 0 -eq 0 ]
# # + exit_fail
This seems to be because the system in question has tasks with ':' in
their name (which a number of kernel worker threads have). These show up
in the trace, e.g.
test:.sh-1299 [004] ..... 2886.040608: kmem_cache_free: call_site=putname+0xa4/0xc8 ptr=000000000f4d22f4 name=names_cache
... and so when we try to extact the call_site with:
cut -d: -f3 trace | sed 's/call_site=\([^+]*\)+0x.*/\1/'
... the 'cut' command will extrace the column containing
'kmem_cache_free' rather than the column containing 'call_site=...', and
the 'sed' command will leave this unchanged. Consequently, the test will
decide to use 'kmem_cache_free' as the filter function, resulting in the
failure seen above.
Fix this by matching the 'call_site=<func>' part specifically to extract
the function name.
Signed-off-by: Mark Rutland <mark.rutland(a)arm.com>
Reported-by: Aishwarya TCV <aishwarya.tcv(a)arm.com>
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Steven Rostedt <rostedt(a)goodmis.org>
Cc: linux-kernel(a)vger.kernel.org
Cc: linux-kselftest(a)vger.kernel.org
Cc: linux-trace-kernel(a)vger.kernel.org
Acked-by: Masami Hiramatsu (Google) <mhiramat(a)kernel.org>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
.../selftests/ftrace/test.d/filter/event-filter-function.tc | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc b/tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc
index 2de7c61d1ae30..3f74c09c56b62 100644
--- a/tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc
+++ b/tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc
@@ -24,7 +24,7 @@ echo 0 > events/enable
echo "Get the most frequently calling function"
sample_events
-target_func=`cut -d: -f3 trace | sed 's/call_site=\([^+]*\)+0x.*/\1/' | sort | uniq -c | sort | tail -n 1 | sed 's/^[ 0-9]*//'`
+target_func=`cat trace | grep -o 'call_site=\([^+]*\)' | sed 's/call_site=//' | sort | uniq -c | sort | tail -n 1 | sed 's/^[ 0-9]*//'`
if [ -z "$target_func" ]; then
exit_fail
fi
--
2.43.0
From: Mark Rutland <mark.rutland(a)arm.com>
[ Upstream commit 8ecab2e64572f1aecdfc5a8feae748abda6e3347 ]
The event filter function test has been failing in our internal test
farm:
| # not ok 33 event filter function - test event filtering on functions
Running the test in verbose mode indicates that this is because the test
erroneously determines that kmem_cache_free() is the most common caller
of kmem_cache_free():
# # + cut -d: -f3 trace
# # + sed s/call_site=([^+]*)+0x.*/1/
# # + sort
# # + uniq -c
# # + sort
# # + tail -n 1
# # + sed s/^[ 0-9]*//
# # + target_func=kmem_cache_free
... and as kmem_cache_free() doesn't call itself, setting this as the
filter function for kmem_cache_free() results in no hits, and
consequently the test fails:
# # + grep kmem_cache_free trace
# # + grep kmem_cache_free
# # + wc -l
# # + hitcnt=0
# # + grep kmem_cache_free trace
# # + grep -v kmem_cache_free
# # + wc -l
# # + misscnt=0
# # + [ 0 -eq 0 ]
# # + exit_fail
This seems to be because the system in question has tasks with ':' in
their name (which a number of kernel worker threads have). These show up
in the trace, e.g.
test:.sh-1299 [004] ..... 2886.040608: kmem_cache_free: call_site=putname+0xa4/0xc8 ptr=000000000f4d22f4 name=names_cache
... and so when we try to extact the call_site with:
cut -d: -f3 trace | sed 's/call_site=\([^+]*\)+0x.*/\1/'
... the 'cut' command will extrace the column containing
'kmem_cache_free' rather than the column containing 'call_site=...', and
the 'sed' command will leave this unchanged. Consequently, the test will
decide to use 'kmem_cache_free' as the filter function, resulting in the
failure seen above.
Fix this by matching the 'call_site=<func>' part specifically to extract
the function name.
Signed-off-by: Mark Rutland <mark.rutland(a)arm.com>
Reported-by: Aishwarya TCV <aishwarya.tcv(a)arm.com>
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Steven Rostedt <rostedt(a)goodmis.org>
Cc: linux-kernel(a)vger.kernel.org
Cc: linux-kselftest(a)vger.kernel.org
Cc: linux-trace-kernel(a)vger.kernel.org
Acked-by: Masami Hiramatsu (Google) <mhiramat(a)kernel.org>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
.../selftests/ftrace/test.d/filter/event-filter-function.tc | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc b/tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc
index 2de7c61d1ae30..3f74c09c56b62 100644
--- a/tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc
+++ b/tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc
@@ -24,7 +24,7 @@ echo 0 > events/enable
echo "Get the most frequently calling function"
sample_events
-target_func=`cut -d: -f3 trace | sed 's/call_site=\([^+]*\)+0x.*/\1/' | sort | uniq -c | sort | tail -n 1 | sed 's/^[ 0-9]*//'`
+target_func=`cat trace | grep -o 'call_site=\([^+]*\)' | sed 's/call_site=//' | sort | uniq -c | sort | tail -n 1 | sed 's/^[ 0-9]*//'`
if [ -z "$target_func" ]; then
exit_fail
fi
--
2.43.0
From: Geliang Tang <tanggeliang(a)kylinos.cn>
Incorrect arguments are passed to fcntl() in test_sockmap.c when invoking
it to set file status flags. If O_NONBLOCK is used as 2nd argument and
passed into fcntl, -EINVAL will be returned (See do_fcntl() in fs/fcntl.c).
The correct approach is to use F_SETFL as 2nd argument, and O_NONBLOCK as
3rd one.
Fixes: 16962b2404ac ("bpf: sockmap, add selftests")
Signed-off-by: Geliang Tang <tanggeliang(a)kylinos.cn>
---
tools/testing/selftests/bpf/test_sockmap.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/test_sockmap.c b/tools/testing/selftests/bpf/test_sockmap.c
index 024a0faafb3b..34d6a1e6f664 100644
--- a/tools/testing/selftests/bpf/test_sockmap.c
+++ b/tools/testing/selftests/bpf/test_sockmap.c
@@ -603,7 +603,7 @@ static int msg_loop(int fd, int iov_count, int iov_length, int cnt,
struct timeval timeout;
fd_set w;
- fcntl(fd, fd_flags);
+ fcntl(fd, F_SETFL, fd_flags);
/* Account for pop bytes noting each iteration of apply will
* call msg_pop_data helper so we need to account for this
* by calculating the number of apply iterations. Note user
--
2.40.1
Introduce ring__consume_n() and ring_buffer__consume_n() API to
partially consume items from one (or more) ringbuffer(s).
This can be useful, for example, to consume just a single item or when
we need to copy multiple items to a limited user-space buffer from the
ringbuffer callback.
Practical example (where this API can be used):
https://github.com/sched-ext/scx/blob/b7c06b9ed9f72cad83c31e39e9c4e2cfd8683…
See also:
https://lore.kernel.org/lkml/20240310154726.734289-1-andrea.righi@canonical…
v4:
- add a selftest to test the new API
- open a new 1.5.0 cycle
v3:
- rename ring__consume_max() -> ring__consume_n() and
ring_buffer__consume_max() -> ring_buffer__consume_n()
- add new API to a new 1.5.0 cycle
- fixed minor nits / comments
v2:
- introduce a new API instead of changing the callback's retcode
behavior
Andrea Righi (4):
libbpf: Start v1.5 development cycle
libbpf: ringbuf: allow to consume up to a certain amount of items
libbpf: Add ring__consume_n / ring_buffer__consume_n
selftests/bpf: Add tests for ring__consume_n and ring_buffer__consume_n
tools/lib/bpf/libbpf.h | 12 +++++
tools/lib/bpf/libbpf.map | 6 +++
tools/lib/bpf/libbpf_version.h | 2 +-
tools/lib/bpf/ringbuf.c | 59 ++++++++++++++++++++----
tools/testing/selftests/bpf/prog_tests/ringbuf.c | 8 ++++
5 files changed, 76 insertions(+), 11 deletions(-)
This patchset allows for io_uring zerocopy to support REQ_F_CQE_SKIP,
skipping the normal completion notification, but not the zerocopy buffer
release notification.
This patchset also includes a test to test these changes, and a patch to
mini_liburing to enable io_uring_peek_cqe, which is needed for the test.
Oliver Crumrine (3):
io_uring: Add REQ_F_CQE_SKIP support for io_uring zerocopy
io_uring: Add io_uring_peek_cqe to mini_liburing
io_uring: Support IOSQE_CQE_SKIP_SUCCESS in io_uring zerocopy test
io_uring/net.c | 6 +--
tools/include/io_uring/mini_liburing.h | 18 +++++++++
.../selftests/net/io_uring_zerocopy_tx.c | 37 +++++++++++++++++--
.../selftests/net/io_uring_zerocopy_tx.sh | 7 +++-
4 files changed, 59 insertions(+), 10 deletions(-)
--
2.44.0
This series aims to improve the usability of the ftrace selftests when
running as part of the kselftest runner, mainly for use with automated
systems. It fixes the output of verbose mode when run in KTAP output
mode and then enables verbose mode by default when invoked from the
kselftest runner so that the diagnostic information is there by default
when run in automated systems.
I've split this into two patches in case there is a concern with one
part but not the other, especially given the verbosity of the verbose
output when it triggers.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Mark Brown (2):
tracing/selftests: Support log output when generating KTAP output
tracing/selftests: Default to verbose mode when running in kselftest
tools/testing/selftests/ftrace/ftracetest | 8 +++++++-
tools/testing/selftests/ftrace/ftracetest-ktap | 2 +-
2 files changed, 8 insertions(+), 2 deletions(-)
---
base-commit: 4cece764965020c22cff7665b18a012006359095
change-id: 20240319-kselftest-ftrace-ktap-verbose-72e37957e213
Best regards,
--
Mark Brown <broonie(a)kernel.org>
Earlier commit fc8b2a619469378 ("net: more strict VIRTIO_NET_HDR_GSO_UDP_L4 validation")
added check of potential number of UDP segment vs UDP_MAX_SEGMENTS
in linux/virtio_net.h.
After this change certification test of USO guest-to-guest
transmit on Windows driver for virtio-net device fails,
for example with packet size of ~64K and mss of 536 bytes.
In general the USO should not be more restrictive than TSO.
Indeed, in case of unreasonably small mss a lot of segments
can cause queue overflow and packet loss on the destination.
Limit of 128 segments is good for any practical purpose,
with minimal meaningful mss of 536 the maximal UDP packet will
be divided to ~120 segments.
Signed-off-by: Yuri Benditovich <yuri.benditovich(a)daynix.com>
---
include/linux/udp.h | 2 +-
tools/testing/selftests/net/udpgso.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/include/linux/udp.h b/include/linux/udp.h
index 3748e82b627b..7e75ccdf25fe 100644
--- a/include/linux/udp.h
+++ b/include/linux/udp.h
@@ -108,7 +108,7 @@ struct udp_sock {
#define udp_assign_bit(nr, sk, val) \
assign_bit(UDP_FLAGS_##nr, &udp_sk(sk)->udp_flags, val)
-#define UDP_MAX_SEGMENTS (1 << 6UL)
+#define UDP_MAX_SEGMENTS (1 << 7UL)
#define udp_sk(ptr) container_of_const(ptr, struct udp_sock, inet.sk)
diff --git a/tools/testing/selftests/net/udpgso.c b/tools/testing/selftests/net/udpgso.c
index 1d975bf52af3..85b3baa3f7f3 100644
--- a/tools/testing/selftests/net/udpgso.c
+++ b/tools/testing/selftests/net/udpgso.c
@@ -34,7 +34,7 @@
#endif
#ifndef UDP_MAX_SEGMENTS
-#define UDP_MAX_SEGMENTS (1 << 6UL)
+#define UDP_MAX_SEGMENTS (1 << 7UL)
#endif
#define CONST_MTU_TEST 1500
--
2.34.3
"Bail out! " is not descriptive. It rather should be: "Failed: " and
then this added prefix doesn't need to be added everywhere. Usually in
the logs, we are searching for "Failed" or "Error" instead of "Bail
out" so it must be replace.
Remove Error/Failed prefixes from all usages as well.
Muhammad Usama Anjum (2):
selftests: Replace "Bail out" with "Error"
selftests: Remove Error/Failed prefix from ksft_exit_fail*() usages
tools/testing/selftests/exec/load_address.c | 8 +-
.../testing/selftests/exec/recursion-depth.c | 10 +-
tools/testing/selftests/kselftest.h | 2 +-
.../selftests/mm/map_fixed_noreplace.c | 24 +--
tools/testing/selftests/mm/map_populate.c | 2 +-
tools/testing/selftests/mm/mremap_dontunmap.c | 2 +-
tools/testing/selftests/mm/pagemap_ioctl.c | 166 +++++++++---------
.../selftests/mm/split_huge_page_test.c | 2 +-
8 files changed, 108 insertions(+), 108 deletions(-)
--
2.39.2
New version of the sleepable bpf_timer code, without the HID changes, as
they can now go through the HID tree indepandantly.
For reference, the use cases I have in mind:
---
Basically, I need to be able to defer a HID-BPF program for the
following reasons (from the aforementioned patch):
1. defer an event:
Sometimes we receive an out of proximity event, but the device can not
be trusted enough, and we need to ensure that we won't receive another
one in the following n milliseconds. So we need to wait those n
milliseconds, and eventually re-inject that event in the stack.
2. inject new events in reaction to one given event:
We might want to transform one given event into several. This is the
case for macro keys where a single key press is supposed to send
a sequence of key presses. But this could also be used to patch a
faulty behavior, if a device forgets to send a release event.
3. communicate with the device in reaction to one event:
We might want to communicate back to the device after a given event.
For example a device might send us an event saying that it came back
from sleeping state and needs to be re-initialized.
Currently we can achieve that by keeping a userspace program around,
raise a bpf event, and let that userspace program inject the events and
commands.
However, we are just keeping that program alive as a daemon for just
scheduling commands. There is no logic in it, so it doesn't really justify
an actual userspace wakeup. So a kernel workqueue seems simpler to handle.
bpf_timers are currently running in a soft IRQ context, this patch
series implements a sleppable context for them.
Cheers,
Benjamin
To: Alexei Starovoitov <ast(a)kernel.org>
To: Daniel Borkmann <daniel(a)iogearbox.net>
To: Andrii Nakryiko <andrii(a)kernel.org>
To: Martin KaFai Lau <martin.lau(a)linux.dev>
To: Eduard Zingerman <eddyz87(a)gmail.com>
To: Song Liu <song(a)kernel.org>
To: Yonghong Song <yonghong.song(a)linux.dev>
To: John Fastabend <john.fastabend(a)gmail.com>
To: KP Singh <kpsingh(a)kernel.org>
To: Stanislav Fomichev <sdf(a)google.com>
To: Hao Luo <haoluo(a)google.com>
To: Jiri Olsa <jolsa(a)kernel.org>
To: Mykola Lysenko <mykolal(a)fb.com>
To: Shuah Khan <shuah(a)kernel.org>
Cc: Benjamin Tissoires <bentiss(a)kernel.org>
Cc: <bpf(a)vger.kernel.org>
Cc: <linux-kernel(a)vger.kernel.org>
Cc: <linux-kselftest(a)vger.kernel.org>
---
Changes in v5:
- took various reviews into account
- rewrote the tests to be separated to not have a uggly include
- Link to v4: https://lore.kernel.org/r/20240315-hid-bpf-sleepable-v4-0-5658f2540564@kern…
Changes in v4:
- dropped the HID changes, they can go independently from bpf-core
- addressed Alexei's and Eduard's remarks
- added selftests
- Link to v3: https://lore.kernel.org/r/20240221-hid-bpf-sleepable-v3-0-1fb378ca6301@kern…
Changes in v3:
- fixed the crash from v2
- changed the API to have only BPF_F_TIMER_SLEEPABLE for
bpf_timer_start()
- split the new kfuncs/verifier patch into several sub-patches, for
easier reviews
- Link to v2: https://lore.kernel.org/r/20240214-hid-bpf-sleepable-v2-0-5756b054724d@kern…
Changes in v2:
- make use of bpf_timer (and dropped the custom HID handling)
- implemented bpf_timer_set_sleepable_cb as a kfunc
- still not implemented global subprogs
- no sleepable bpf_timer selftests yet
- Link to v1: https://lore.kernel.org/r/20240209-hid-bpf-sleepable-v1-0-4cc895b5adbd@kern…
---
Benjamin Tissoires (6):
bpf/helpers: introduce sleepable bpf_timers
bpf/verifier: add bpf_timer as a kfunc capable type
bpf/helpers: introduce bpf_timer_set_sleepable_cb() kfunc
bpf/helpers: mark the callback of bpf_timer_set_sleepable_cb() as sleepable
tools: sync include/uapi/linux/bpf.h
selftests/bpf: add sleepable timer tests
include/linux/bpf_verifier.h | 1 +
include/uapi/linux/bpf.h | 4 +
kernel/bpf/helpers.c | 132 ++++++++++++++-
kernel/bpf/verifier.c | 96 ++++++++++-
tools/include/uapi/linux/bpf.h | 4 +
tools/testing/selftests/bpf/bpf_experimental.h | 5 +
.../selftests/bpf/bpf_testmod/bpf_testmod.c | 5 +
.../selftests/bpf/bpf_testmod/bpf_testmod_kfunc.h | 1 +
tools/testing/selftests/bpf/prog_tests/timer.c | 34 ++++
.../testing/selftests/bpf/progs/timer_sleepable.c | 185 +++++++++++++++++++++
10 files changed, 458 insertions(+), 9 deletions(-)
---
base-commit: 9187210eee7d87eea37b45ea93454a88681894a4
change-id: 20240205-hid-bpf-sleepable-c01260fd91c4
Best regards,
--
Benjamin Tissoires <bentiss(a)kernel.org>
These patches from Geliang add support for the "last time" field in
MPTCP Info, and verify that the counters look valid.
Patch 1 adds these counters: last_data_sent, last_data_recv and
last_ack_recv. They are available in the MPTCP Info, so exposed via
getsockopt(MPTCP_INFO) and the Netlink Diag interface.
Patch 2 adds a test in diag.sh MPTCP selftest, to check that the
counters have moved by at least 250ms, after having waited twice that
time.
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
Geliang Tang (2):
mptcp: add last time fields in mptcp_info
selftests: mptcp: test last time mptcp_info
include/uapi/linux/mptcp.h | 4 +++
net/mptcp/options.c | 1 +
net/mptcp/protocol.c | 7 ++++
net/mptcp/protocol.h | 3 ++
net/mptcp/sockopt.c | 4 +++
tools/testing/selftests/net/mptcp/diag.sh | 53 +++++++++++++++++++++++++++++++
6 files changed, 72 insertions(+)
---
base-commit: d76c740b2eaaddc5fc3a8b21eaec5b6b11e8c3f5
change-id: 20240405-upstream-net-next-20240405-mptcp-last-time-info-9b03618e08f1
Best regards,
--
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Currently the options for writing networking tests are C, bash or
some mix of the two. YAML/Netlink gives us the ability to easily
interface with Netlink in higher level laguages. In particular,
there is a Python library already available in tree, under tools/net.
Add the scaffolding which allows writing tests using this library.
The "scaffolding" is needed because the library lives under
tools/net and uses YAML files from under Documentation/.
So we need a small amount of glue code to find those things
and add them to TEST_FILES.
This series adds both a basic SW sanity test and driver
test which can be run against netdevsim or a real device.
When I develop core code I usually test with netdevsim,
then a real device, and then a backport to Meta's kernel.
Because of the lack of integration, until now I had
to throw away the (YNL-based) test script and netdevsim code.
Running tests in tree directly:
$ ./tools/testing/selftests/net/nl_netdev.py
KTAP version 1
1..2
ok 1 nl_netdev.empty_check
ok 2 nl_netdev.lo_check
# Totals: pass:2 fail:0 xfail:0 xpass:0 skip:0 error:0
in tree via make:
$ make -C tools/testing/selftests/ TARGETS=net \
TEST_PROGS=nl_netdev.py TEST_GEN_PROGS="" run_tests
[ ... ]
and installed externally, all seem to work:
$ make -C tools/testing/selftests/ TARGETS=net \
install INSTALL_PATH=/tmp/ksft-net
$ /tmp/ksft-net/run_kselftest.sh -t net:nl_netdev.py
[ ... ]
For driver tests I followed the lead of net/forwarding and
get the device name from env and/or a config file.
v2: (see patches for minor changes)
- don't add to TARGETS, create a deperate variable with deps
- support and use with
- support and use passing arguments to tests
v1: https://lore.kernel.org/all/20240402010520.1209517-1-kuba@kernel.org/
Jakub Kicinski (7):
netlink: specs: define ethtool header flags
tools: ynl: copy netlink error to NlError
selftests: net: add scaffolding for Netlink tests in Python
selftests: nl_netdev: add a trivial Netlink netdev test
netdevsim: report stats by default, like a real device
selftests: drivers: add scaffolding for Netlink tests in Python
testing: net-drv: add a driver test for stats reporting
Documentation/netlink/specs/ethtool.yaml | 6 +
drivers/net/netdevsim/ethtool.c | 11 ++
drivers/net/netdevsim/netdev.c | 45 +++++++
tools/net/ynl/lib/ynl.py | 3 +-
tools/testing/selftests/Makefile | 10 +-
tools/testing/selftests/drivers/net/Makefile | 7 ++
.../testing/selftests/drivers/net/README.rst | 30 +++++
.../selftests/drivers/net/lib/py/__init__.py | 17 +++
.../selftests/drivers/net/lib/py/env.py | 52 ++++++++
tools/testing/selftests/drivers/net/stats.py | 86 +++++++++++++
tools/testing/selftests/net/Makefile | 1 +
tools/testing/selftests/net/lib/Makefile | 8 ++
.../testing/selftests/net/lib/py/__init__.py | 7 ++
tools/testing/selftests/net/lib/py/consts.py | 9 ++
tools/testing/selftests/net/lib/py/ksft.py | 96 ++++++++++++++
tools/testing/selftests/net/lib/py/nsim.py | 118 ++++++++++++++++++
tools/testing/selftests/net/lib/py/utils.py | 47 +++++++
tools/testing/selftests/net/lib/py/ynl.py | 49 ++++++++
tools/testing/selftests/net/nl_netdev.py | 24 ++++
19 files changed, 624 insertions(+), 2 deletions(-)
create mode 100644 tools/testing/selftests/drivers/net/Makefile
create mode 100644 tools/testing/selftests/drivers/net/README.rst
create mode 100644 tools/testing/selftests/drivers/net/lib/py/__init__.py
create mode 100644 tools/testing/selftests/drivers/net/lib/py/env.py
create mode 100755 tools/testing/selftests/drivers/net/stats.py
create mode 100644 tools/testing/selftests/net/lib/Makefile
create mode 100644 tools/testing/selftests/net/lib/py/__init__.py
create mode 100644 tools/testing/selftests/net/lib/py/consts.py
create mode 100644 tools/testing/selftests/net/lib/py/ksft.py
create mode 100644 tools/testing/selftests/net/lib/py/nsim.py
create mode 100644 tools/testing/selftests/net/lib/py/utils.py
create mode 100644 tools/testing/selftests/net/lib/py/ynl.py
create mode 100755 tools/testing/selftests/net/nl_netdev.py
--
2.44.0
This patchset allows for io_uring zerocopy to support REQ_F_CQE_SKIP,
skipping the normal completion notification, but not the zerocopy buffer
release notification.
This patchset also includes a test to test these changes, and a patch to
mini_liburing to enable io_uring_peek_cqe, which is needed for the test.
Oliver Crumrine (3):
io_uring: Add REQ_F_CQE_SKIP support for io_uring zerocopy
io_uring: Add io_uring_peek_cqe to mini_liburing
io_uring: Support IOSQE_CQE_SKIP_SUCCESS in io_uring zerocopy test
io_uring/net.c | 6 +--
tools/include/io_uring/mini_liburing.h | 18 +++++++++
.../selftests/net/io_uring_zerocopy_tx.c | 37 +++++++++++++++++--
.../selftests/net/io_uring_zerocopy_tx.sh | 7 +++-
4 files changed, 59 insertions(+), 10 deletions(-)
--
2.44.0
Hi,
As mentioned in each patch, this implements the solution that we discussed in
December 2023, in [1]. This turned out to be very clean and easy. It should also
be quite easy to maintain.
This should also make Peter Zijlstra happy, because it directly addresses the
root cause of his "NAK NAK NAK" reply [2]. :)
I haven't done much build testing, because selftests are not so easy to build
with a cross-compiler. So it's just tested on x86 64-bit so far.
[1] https://lore.kernel.org/all/783a4178-1dec-4e30-989a-5174b8176b09@redhat.com/
[2] https://lore.kernel.org/lkml/20231103121652.GA6217@noisy.programming.kicks-…
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Mark Brown <broonie(a)kernel.org>
Cc: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Cc: Suren Baghdasaryan <surenb(a)google.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
John Hubbard (2):
selftests: break the dependency upon local header files
selftests/mm: fix additional build errors for selftests
tools/include/uapi/linux/memfd.h | 39 +++
tools/include/uapi/linux/userfaultfd.h | 386 +++++++++++++++++++++++++
tools/testing/selftests/lib.mk | 9 +
tools/testing/selftests/mm/Makefile | 2 +-
4 files changed, 435 insertions(+), 1 deletion(-)
create mode 100644 tools/include/uapi/linux/memfd.h
create mode 100644 tools/include/uapi/linux/userfaultfd.h
base-commit: 98560e9019851bf55b8a4073978a623a3bcf98c0
--
2.44.0
In this series, ksft_exit_fail_perror() is being added which is helper
function on top of ksft_exit_fail_msg(). It prints errno and its string
form always. After writing and porting several kselftests, I've found
out that most of times ksft_exit_fail_msg() isn't useful if errno value
isn't printed. The ksft_exit_fail_perror() provides a convenient way to
always print errno when its used.
Muhammad Usama Anjum (2):
selftests: add ksft_exit_fail_perror()
selftests: exec: Use new ksft_exit_fail_perror() helper
tools/testing/selftests/exec/recursion-depth.c | 10 +++++-----
tools/testing/selftests/kselftest.h | 14 ++++++++++++++
2 files changed, 19 insertions(+), 5 deletions(-)
--
2.39.2
The comment on top of the file is used by many developers to glance over
all the available functions. Add the recently added ksft_perror() to it.
Signed-off-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
---
tools/testing/selftests/kselftest.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/kselftest.h b/tools/testing/selftests/kselftest.h
index 7d650a06ca359..159bf8e314fa3 100644
--- a/tools/testing/selftests/kselftest.h
+++ b/tools/testing/selftests/kselftest.h
@@ -16,6 +16,7 @@
* For each test, report any progress, debugging, etc with:
*
* ksft_print_msg(fmt, ...);
+ * ksft_perror(msg);
*
* and finally report the pass/fail/skip/xfail state of the test with one of:
*
--
2.39.2
Hi,
(This is verified on the second test box.)
In the most recent 6.8.0 release of torvalds tree kernel with selftest configs on,
process ./iommufd appears to consume 99% of a CPU core for quote a while in an
endless loop:
root 59502 8816 0 Mar11 pts/2 00:00:00 make OUTPUT=/home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests/iommu -C iommu run_tests O=/home/marvin/linux/kernel/linux_torvalds
root 59503 59502 0 Mar11 pts/2 00:00:00 /bin/sh -c BASE_DIR="/home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests"; . /home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests/kselftest/runner.sh; if [ "X" != "X" ]; then per_test_logging=1; fi; run_many /home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests/iommu/iommufd /home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests/iommu/iommufd_fail_nth
root 59516 59503 0 Mar11 pts/2 00:00:00 /bin/sh -c BASE_DIR="/home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests"; . /home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests/kselftest/runner.sh; if [ "X" != "X" ]; then per_test_logging=1; fi; run_many /home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests/iommu/iommufd /home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests/iommu/iommufd_fail_nth
root 59517 59516 0 Mar11 pts/2 00:00:00 /bin/sh -c BASE_DIR="/home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests"; . /home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests/kselftest/runner.sh; if [ "X" != "X" ]; then per_test_logging=1; fi; run_many /home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests/iommu/iommufd /home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests/iommu/iommufd_fail_nth
root 59518 59517 0 Mar11 pts/2 00:00:00 /bin/sh -c BASE_DIR="/home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests"; . /home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests/kselftest/runner.sh; if [ "X" != "X" ]; then per_test_logging=1; fi; run_many /home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests/iommu/iommufd /home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests/iommu/iommufd_fail_nth
root 59522 59518 0 Mar11 pts/2 00:00:00 /bin/sh -c BASE_DIR="/home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests"; . /home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests/kselftest/runner.sh; if [ "X" != "X" ]; then per_test_logging=1; fi; run_many /home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests/iommu/iommufd /home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests/iommu/iommufd_fail_nth
root 59523 59522 0 Mar11 pts/2 00:00:00 perl /home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests/kselftest/prefix.pl
root 59635 2367 99 Mar11 pts/2 11:28:03 ./iommufd
root@stargazer:/home/marvin# strace -p 59635
ioctl(5, _IOC(_IOC_NONE, 0x3b, 0xa0, 0), 0x7ffdd9eebc00) = 0
ioctl(5, _IOC(_IOC_NONE, 0x3b, 0xa0, 0), 0x7ffdd9eebc00) = 0
ioctl(5, _IOC(_IOC_NONE, 0x3b, 0xa0, 0), 0x7ffdd9eebc00) = 0
ioctl(5, _IOC(_IOC_NONE, 0x3b, 0xa0, 0), 0x7ffdd9eebc00) = 0
ioctl(5, _IOC(_IOC_NONE, 0x3b, 0xa0, 0), 0x7ffdd9eebc00) = 0
ioctl(5, _IOC(_IOC_NONE, 0x3b, 0xa0, 0), 0x7ffdd9eebc00) = 0
ioctl(5, _IOC(_IOC_NONE, 0x3b, 0xa0, 0), 0x7ffdd9eebc00) = 0
ioctl(5, _IOC(_IOC_NONE, 0x3b, 0xa0, 0), 0x7ffdd9eebc00) = 0
ioctl(5, _IOC(_IOC_NONE, 0x3b, 0xa0, 0), 0x7ffdd9eebc00) = 0
ioctl(5, _IOC(_IOC_NONE, 0x3b, 0xa0, 0), 0x7ffdd9eebc00) = 0
ioctl(5, _IOC(_IOC_NONE, 0x3b, 0xa0, 0), 0x7ffdd9eebc00) = 0
ioctl(5, _IOC(_IOC_NONE, 0x3b, 0xa0, 0), 0x7ffdd9eebc00) = 0
ioctl(5, _IOC(_IOC_NONE, 0x3b, 0xa0, 0), 0x7ffdd9eebc00) = 0
ioctl(5, _IOC(_IOC_NONE, 0x3b, 0xa0, 0), 0x7ffdd9eebc00) = 0
ioctl(5, _IOC(_IOC_NONE, 0x3b, 0xa0, 0), 0x7ffdd9eebc00) = 0
ioctl(5, _IOC(_IOC_NONE, 0x3b, 0xa0, 0), 0x7ffdd9eebc00) = 0
ioctl(5, _IOC(_IOC_NONE, 0x3b, 0xa0, 0), 0x7ffdd9eebc00) = 0
.
.
.
Please find attached config. It is the vanilla kernel, the build suite marked it "dirty"
because of the modifications to the selftests (adding debug option, mostly).
The kseltest output is:
make[3]: Entering directory '/home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests/iommu'
TAP version 13
1..2
# timeout set to 45
# selftests: iommu: iommufd
# TAP version 13
# 1..180
# # Starting 180 tests from 18 test cases.
# # RUN iommufd.simple_close ...
# # OK iommufd.simple_close
# ok 1 iommufd.simple_close
# # RUN iommufd.cmd_fail ...
# # OK iommufd.cmd_fail
# ok 2 iommufd.cmd_fail
# # RUN iommufd.cmd_length ...
# # OK iommufd.cmd_length
# ok 3 iommufd.cmd_length
# # RUN iommufd.cmd_ex_fail ...
# # OK iommufd.cmd_ex_fail
# ok 4 iommufd.cmd_ex_fail
# # RUN iommufd.global_options ...
# # OK iommufd.global_options
# ok 5 iommufd.global_options
# # RUN iommufd.simple_ioctls ...
# # OK iommufd.simple_ioctls
# ok 6 iommufd.simple_ioctls
# # RUN iommufd.unmap_cmd ...
# # OK iommufd.unmap_cmd
# ok 7 iommufd.unmap_cmd
# # RUN iommufd.map_cmd ...
# # OK iommufd.map_cmd
# ok 8 iommufd.map_cmd
# # RUN iommufd.info_cmd ...
# # OK iommufd.info_cmd
# ok 9 iommufd.info_cmd
# # RUN iommufd.set_iommu_cmd ...
# # OK iommufd.set_iommu_cmd
# ok 10 iommufd.set_iommu_cmd
# # RUN iommufd.vfio_ioas ...
# # OK iommufd.vfio_ioas
# ok 11 iommufd.vfio_ioas
# # RUN iommufd_ioas.no_domain.ioas_auto_destroy ...
# # OK iommufd_ioas.no_domain.ioas_auto_destroy
# ok 12 iommufd_ioas.no_domain.ioas_auto_destroy
# # RUN iommufd_ioas.no_domain.ioas_destroy ...
# # OK iommufd_ioas.no_domain.ioas_destroy
# ok 13 iommufd_ioas.no_domain.ioas_destroy
# # RUN iommufd_ioas.no_domain.alloc_hwpt_nested ...
# # OK iommufd_ioas.no_domain.alloc_hwpt_nested
# ok 14 iommufd_ioas.no_domain.alloc_hwpt_nested
# # RUN iommufd_ioas.no_domain.hwpt_attach ...
# # iommufd.c:541:hwpt_attach:Expected 2 (2) == errno (22)
# # hwpt_attach: Test failed at step #6
# # FAIL iommufd_ioas.no_domain.hwpt_attach
# not ok 15 iommufd_ioas.no_domain.hwpt_attach
# # RUN iommufd_ioas.no_domain.ioas_area_destroy ...
# # OK iommufd_ioas.no_domain.ioas_area_destroy
# ok 16 iommufd_ioas.no_domain.ioas_area_destroy
# # RUN iommufd_ioas.no_domain.ioas_area_auto_destroy ...
# # OK iommufd_ioas.no_domain.ioas_area_auto_destroy
# ok 17 iommufd_ioas.no_domain.ioas_area_auto_destroy
# # RUN iommufd_ioas.no_domain.get_hw_info ...
# # OK iommufd_ioas.no_domain.get_hw_info
# ok 18 iommufd_ioas.no_domain.get_hw_info
# # RUN iommufd_ioas.no_domain.area ...
# # OK iommufd_ioas.no_domain.area
# ok 19 iommufd_ioas.no_domain.area
# # RUN iommufd_ioas.no_domain.unmap_fully_contained_areas ...
# # OK iommufd_ioas.no_domain.unmap_fully_contained_areas
# ok 20 iommufd_ioas.no_domain.unmap_fully_contained_areas
# # RUN iommufd_ioas.no_domain.area_auto_iova ...
# # OK iommufd_ioas.no_domain.area_auto_iova
# ok 21 iommufd_ioas.no_domain.area_auto_iova
# # RUN iommufd_ioas.no_domain.area_allowed ...
# # OK iommufd_ioas.no_domain.area_allowed
# ok 22 iommufd_ioas.no_domain.area_allowed
# # RUN iommufd_ioas.no_domain.copy_area ...
# # OK iommufd_ioas.no_domain.copy_area
# ok 23 iommufd_ioas.no_domain.copy_area
# # RUN iommufd_ioas.no_domain.iova_ranges ...
# # OK iommufd_ioas.no_domain.iova_ranges
# ok 24 iommufd_ioas.no_domain.iova_ranges
# # RUN iommufd_ioas.no_domain.access_domain_destory ...
# # iommufd.c:916:access_domain_destory:Expected MAP_FAILED (18446744073709551615) != buf (18446744073709551615)
# # access_domain_destory: Test terminated by timeout
# # FAIL iommufd_ioas.no_domain.access_domain_destory
# not ok 25 iommufd_ioas.no_domain.access_domain_destory
# # RUN iommufd_ioas.no_domain.access_pin ...
# # iommufd.c:991:access_pin:Expected 0 (0) == _test_cmd_mock_domain(self->fd, self->ioas_id, &mock_stdev_id, &mock_hwpt_id, ((void *)0)) (-1)
The failing assert seems to be here:
987 /* Add/remove a domain with a user */
988 ASSERT_EQ(0, ioctl(self->fd,
989 _IOMMU_TEST_CMD(IOMMU_TEST_OP_ACCESS_PAGES),
990 &access_cmd));
→ 991 test_cmd_mock_domain(self->ioas_id, &mock_stdev_id,
992 &mock_hwpt_id, NULL);
993 check_map_cmd.id = mock_hwpt_id;
994 ASSERT_EQ(0, ioctl(self->fd,
995 _IOMMU_TEST_CMD(IOMMU_TEST_OP_MD_CHECK_MAP),
996 &check_map_cmd));
For those of you who still do not have a clue what went wrong (like myself), I am trying
to generate a reproducer.
attempt to tap gdb on the running ./iommufd gave this:
root@defiant:/home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests/iommu# gdb ./iommufd --pid 63963
GNU gdb (Ubuntu 12.1-0ubuntu1~22.04.1) 12.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./iommufd...
Attaching to program: /home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests/iommu/iommufd, process 63963
Reading symbols from /usr/libexec/coreutils/libstdbuf.so...
(No debugging symbols found in /usr/libexec/coreutils/libstdbuf.so)
Reading symbols from /lib/x86_64-linux-gnu/libc.so.6...
Reading symbols from /usr/lib/debug/.build-id/c2/89da5071a3399de893d2af81d6a30c62646e1e.debug...
Reading symbols from /lib64/ld-linux-x86-64.so.2...
Reading symbols from /usr/lib/debug/.build-id/15/921ea631d9f36502d20459c43e5c85b7d6ab76.debug...
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
__GI___ioctl (fd=5, request=request@entry=15264) at ../sysdeps/unix/sysv/linux/ioctl.c:36
36 ../sysdeps/unix/sysv/linux/ioctl.c: No such file or directory.
(gdb) bt
#0 __GI___ioctl (fd=5, request=request@entry=15264) at ../sysdeps/unix/sysv/linux/ioctl.c:36
#1 0x000057ed23d1f1ae in _test_ioctl_set_temp_memory_limit (limit=65536, fd=<optimized out>) at /home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests/iommu/iommufd_utils.h:585
#2 iommufd_ioas_teardown (_metadata=_metadata@entry=0x57ed23d42860 <_iommufd_ioas_access_domain_destory_object>, self=self@entry=0x7ffdcdce5ef0, variant=<optimized out>) at iommufd.c:229
#3 0x000057ed23d23b7f in wrapper_iommufd_ioas_access_domain_destory (_metadata=0x57ed23d42860 <_iommufd_ioas_access_domain_destory_object>, variant=0x57ed23d43860 <_iommufd_ioas_mock_domain_object>) at iommufd.c:902
#4 0x000057ed23d1bfe9 in __run_test (f=f@entry=0x57ed23d438a0 <_iommufd_ioas_fixture_object>, variant=variant@entry=0x57ed23d43860 <_iommufd_ioas_mock_domain_object>,
t=t@entry=0x57ed23d42860 <_iommufd_ioas_access_domain_destory_object>) at ../kselftest_harness.h:1134
#5 0x000057ed23d12146 in test_harness_run (argv=0x7ffdcdce61a8, argc=1) at ../kselftest_harness.h:1199
#6 main (argc=1, argv=0x7ffdcdce61a8) at iommufd.c:2349
(gdb) list iommufd.c:2349
2344 &unmap_cmd));
2345 }
2346 }
2347 }
2348
2349 TEST_HARNESS_MAIN
(gdb)
Hope this helps someone.
Best regards,
Mirsad Todorovac
This patch addresses an issue in the selftests/harness where an assertion within FIXTURE_TEARDOWN could trigger an infinite loop. The problem arises because the teardown procedure is meant to execute once, but the presence of failing assertions (ASSERT_EQ(0, 1)) leads to repeated attempts to execute teardown due to the long jump mechanism used by the harness for handling assertions.
To resolve this, the patch ensures that the teardown process runs only once, regardless of assertion outcomes, preventing the infinite loop and allowing tests to fail.
Signed-off-by: Shengyu Li <shengyu.li.evgeny(a)gmail.com>
---
tools/testing/selftests/kselftest_harness.h | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/kselftest_harness.h b/tools/testing/selftests/kselftest_harness.h
index 4fd735e48ee7..230d62884885 100644
--- a/tools/testing/selftests/kselftest_harness.h
+++ b/tools/testing/selftests/kselftest_harness.h
@@ -383,6 +383,7 @@
FIXTURE_DATA(fixture_name) self; \
pid_t child = 1; \
int status = 0; \
+ bool jmp = false; \
memset(&self, 0, sizeof(FIXTURE_DATA(fixture_name))); \
if (setjmp(_metadata->env) == 0) { \
/* Use the same _metadata. */ \
@@ -399,8 +400,10 @@
_metadata->exit_code = KSFT_FAIL; \
} \
} \
+ else \
+ jmp = true; \
if (child == 0) { \
- if (_metadata->setup_completed && !_metadata->teardown_parent) \
+ if (_metadata->setup_completed && !_metadata->teardown_parent && !jmp) \
fixture_name##_teardown(_metadata, &self, variant->data); \
_exit(0); \
} \
--
2.25.1
Hi,
We have caught bugs in kselftest suites on linux-next and on stable-RCs etc
when using clang. There are two types of bugs (logs with clang-17 are
attached.):
As usually people use GCC, there are GCC-specific flags added to the
Makefiles that clang doesn't recognize. For example:
* clang: error: argument unused during compilation: '-pie'
[-Werror,-Wunused-command-line-argument]
* clang: error: unknown argument '-static-libasan'; did you mean
'-static-libsan'?
* clang: error: cannot specify -o when generating multiple output files
Clang has best static analysis tools. It is reporting static errors. For
example:
* test_execve.c:121:13: warning: variable 'have_outer_privilege' is used
uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized]
* test_execve.c:121:9: note: remove the 'if' if its condition is always true
* test_memcontrol.c:727:6: warning: variable 'fd' is used uninitialized
whenever 'if' condition is true [-Wsometimes-uninitialized]
We have found these issues through our new KernelCI system when enabling
kselftest and clang there. The new system dashboard is a WIP, so It is not
the web dashboard you are used-to with in KernelCI. We can show you ways of
pulling the data if you are interest into.
Unless the above is some sort of false-positive or misconfiguration, it
would be great to support clang for kselftests. What we can do from our
side is that clang kselftests builds should be enabled on KernelCI to find
and fix the errors. What is your stance about this?
Thanks,
Usama
Currently the options for writing networking tests are C, bash or
some mix of the two. YAML/Netlink gives us the ability to easily
interface with Netlink in higher level laguages. In particular,
there is a Python library already available in tree, under tools/net.
Add the scaffolding which allows writing tests using this library.
The "scaffolding" is needed because the library lives under
tools/net and uses YAML files from under Documentation/.
So we need a small amount of glue code to find those things
and add them to TEST_FILES.
This series adds both a basic SW sanity test and driver
test which can be run against netdevsim or a real device.
When I develop core code I usually test with netdevsim,
then a real device, and then a backport to Meta's kernel.
Because of the lack of integration, until now I had
to throw away the (YNL-based) test script and netdevsim code.
Running tests in tree directly:
$ ./tools/testing/selftests/net/nl_netdev.py
KTAP version 1
1..2
ok 1 nl_netdev.empty_check
ok 2 nl_netdev.lo_check
# Totals: pass:2 fail:0 xfail:0 xpass:0 skip:0 error:0
in tree via make:
$ make -C tools/testing/selftests/ TARGETS=net \
TEST_PROGS=nl_netdev.py TEST_GEN_PROGS="" run_tests
[ ... ]
and installed externally, all seem to work:
$ make -C tools/testing/selftests/ TARGETS=net \
install INSTALL_PATH=/tmp/ksft-net
$ /tmp/ksft-net/run_kselftest.sh -t net:nl_netdev.py
[ ... ]
For driver tests I followed the lead of net/forwarding and
get the device name from env and/or a config file.
Jakub Kicinski (7):
netlink: specs: define ethtool header flags
tools: ynl: copy netlink error to NlError
selftests: net: add scaffolding for Netlink tests in Python
selftests: nl_netdev: add a trivial Netlink netdev test
netdevsim: report stats by default, like a real device
selftests: drivers: add scaffolding for Netlink tests in Python
testing: net-drv: add a driver test for stats reporting
Documentation/netlink/specs/ethtool.yaml | 5 +
drivers/net/netdevsim/ethtool.c | 11 ++
drivers/net/netdevsim/netdev.c | 45 +++++++
tools/net/ynl/lib/ynl.py | 3 +-
tools/testing/selftests/Makefile | 8 ++
tools/testing/selftests/drivers/net/Makefile | 7 ++
.../testing/selftests/drivers/net/README.rst | 30 +++++
.../selftests/drivers/net/lib/py/__init__.py | 17 +++
.../selftests/drivers/net/lib/py/env.py | 41 ++++++
tools/testing/selftests/drivers/net/stats.py | 85 +++++++++++++
tools/testing/selftests/net/Makefile | 1 +
tools/testing/selftests/net/lib/Makefile | 8 ++
.../testing/selftests/net/lib/py/__init__.py | 7 ++
tools/testing/selftests/net/lib/py/consts.py | 9 ++
tools/testing/selftests/net/lib/py/ksft.py | 96 ++++++++++++++
tools/testing/selftests/net/lib/py/nsim.py | 118 ++++++++++++++++++
tools/testing/selftests/net/lib/py/utils.py | 47 +++++++
tools/testing/selftests/net/lib/py/ynl.py | 49 ++++++++
tools/testing/selftests/net/nl_netdev.py | 24 ++++
19 files changed, 610 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/drivers/net/Makefile
create mode 100644 tools/testing/selftests/drivers/net/README.rst
create mode 100644 tools/testing/selftests/drivers/net/lib/py/__init__.py
create mode 100644 tools/testing/selftests/drivers/net/lib/py/env.py
create mode 100755 tools/testing/selftests/drivers/net/stats.py
create mode 100644 tools/testing/selftests/net/lib/Makefile
create mode 100644 tools/testing/selftests/net/lib/py/__init__.py
create mode 100644 tools/testing/selftests/net/lib/py/consts.py
create mode 100644 tools/testing/selftests/net/lib/py/ksft.py
create mode 100644 tools/testing/selftests/net/lib/py/nsim.py
create mode 100644 tools/testing/selftests/net/lib/py/utils.py
create mode 100644 tools/testing/selftests/net/lib/py/ynl.py
create mode 100755 tools/testing/selftests/net/nl_netdev.py
--
2.44.0
Add specification for test metadata to the KTAP v2 spec.
KTAP v1 only specifies the output format of very basic test information:
test result and test name. Any additional test information either gets
added to general diagnostic data or is not included in the output at all.
The purpose of KTAP metadata is to create a framework to include and
easily identify additional important test information in KTAP.
KTAP metadata could include any test information that is pertinent for
user interaction before or after the running of the test. For example,
the test file path or the test speed.
Since this includes a large variety of information, this specification
will recognize notable types of KTAP metadata to ensure consistent format
across test frameworks. See the full list of types in the specification.
Example of KTAP Metadata:
KTAP version 2
#:ktap_test: main
#:ktap_arch: uml
1..1
KTAP version 2
#:ktap_test: suite_1
#:ktap_subsystem: example
#:ktap_test_file: lib/test.c
1..2
ok 1 test_1
#:ktap_test: test_2
#:ktap_speed: very_slow
# test_2 has begun
#:custom_is_flaky: true
ok 2 test_2
# suite_1 has passed
ok 1 suite_1
The changes to the KTAP specification outline the format, location, and
different types of metadata.
Reviewed-by: Kees Cook <keescook(a)chromium.org>
Reviewed-by: David Gow <davidgow(a)google.com>
Signed-off-by: Rae Moar <rmoar(a)google.com>
---
Note this version is in reponse to comments made off the list asking for
more explanation on inheritance and edge cases.
Changes since v3:
- Add two metadata ktap_config and ktap_id
- Add section on metadata inheritance
- Add edge case examples
Documentation/dev-tools/ktap.rst | 248 ++++++++++++++++++++++++++++++-
1 file changed, 244 insertions(+), 4 deletions(-)
diff --git a/Documentation/dev-tools/ktap.rst b/Documentation/dev-tools/ktap.rst
index ff77f4aaa6ef..55bc43cd5aea 100644
--- a/Documentation/dev-tools/ktap.rst
+++ b/Documentation/dev-tools/ktap.rst
@@ -17,19 +17,22 @@ KTAP test results describe a series of tests (which may be nested: i.e., test
can have subtests), each of which can contain both diagnostic data -- e.g., log
lines -- and a final result. The test structure and results are
machine-readable, whereas the diagnostic data is unstructured and is there to
-aid human debugging.
+aid human debugging. Since version 2, tests can also contain metadata which
+consists of important supplemental test information and can be
+machine-readable.
+
+KTAP output is built from five different types of lines:
-KTAP output is built from four different types of lines:
- Version lines
- Plan lines
- Test case result lines
- Diagnostic lines
+- Metadata lines
In general, valid KTAP output should also form valid TAP output, but some
information, in particular nested test results, may be lost. Also note that
there is a stagnant draft specification for TAP14, KTAP diverges from this in
-a couple of places (notably the "Subtest" header), which are described where
-relevant later in this document.
+a couple of places, which are described where relevant later in this document.
Version lines
-------------
@@ -166,6 +169,237 @@ even if they do not start with a "#": this is to capture any other useful
kernel output which may help debug the test. It is nevertheless recommended
that tests always prefix any diagnostic output they have with a "#" character.
+KTAP metadata lines
+-------------------
+
+KTAP metadata lines are used to include and easily identify important
+supplemental test information in KTAP. These lines may appear similar to
+diagnostic lines. The format of metadata lines is below:
+
+.. code-block:: none
+
+ #:<prefix>_<metadata type>: <metadata value>
+
+The <prefix> indicates where to find the specification for the type of
+metadata, such as the name of a test framework or "ktap" to indicate this
+specification. The list of currently approved prefixes and where to find the
+documentation of the metadata types is below. Note any metadata type that does
+not use a prefix from the list below must use the prefix "custom".
+
+Current List of Approved Prefixes:
+
+- ``ktap``: See Types of KTAP Metadata below for the list of metadata types.
+
+The format of <metadata type> and <value> varies based on the type. See the
+individual specification. For "custom" types the <metadata type> can be any
+string excluding ":", spaces, or newline characters and the <value> can be any
+string.
+
+**Location:**
+
+The first KTAP metadata line for a test must be "#:ktap_test: <test name>",
+which acts as a header to associate metadata with the correct test. Metadata
+for the main KTAP level uses the test name "main". A test's metadata ends
+with a "ktap_test" line for a different test.
+
+For test cases, the location of the metadata is between the prior test result
+line and the current test result line. For test suites, the location of the
+metadata is between the suite's version line and test plan line. For the main
+level, the location of the metadata is between the main version line and main
+test plan line. See the example below.
+
+Note that a test case's metadata is inline with the test's result line. Whereas
+a suite's metadata is inline with the suite's version line and thus will be
+more indented than the suite's result line. Additionally, metadata for the main
+level is inline with the main version line.
+
+KTAP metadata for a test does not need to be contiguous. For example, a kernel
+warning or other diagnostic output could interrupt metadata lines. However, it
+is recommended to keep a test's metadata lines in the correct location and
+together when possible, as this improves readability.
+
+**Example of KTAP metadata:**
+
+::
+
+ KTAP version 2
+ #:ktap_test: main
+ #:ktap_arch: uml
+ 1..1
+ KTAP version 2
+ #:ktap_test: suite_1
+ #:ktap_subsystem: example
+ #:ktap_test_file: lib/test.c
+ 1..2
+ # WARNING: test_1 skipped
+ ok 1 test_1 # SKIP
+ #:ktap_test: test_2
+ #:ktap_speed: very_slow
+ # test_2 has begun
+ #:custom_is_flaky: true
+ ok 2 test_2
+ # suite_1 passed
+ ok 1 suite_1
+
+In this example, the tests are running on UML. The test suite "suite_1" is part
+of the subsystem "example" and belongs to the file "lib/test.c". It has
+two subtests, "test_1" and "test_2". The subtest "test_2" has a speed of
+"very_slow" and has been marked with a custom KTAP metadata type called
+"custom_is_flaky" with the value of "true".
+
+**Inheritance of KTAP metadata**
+
+Tests can inherit KTAP metadata. A child test inherits all the parent test's
+KTAP metadata except for directly opposing metadata. For example, if a suite
+has a property of "#:ktap_speed: slow", all child test cases are also marked as
+slow. However, if one of the test cases has metadata of "#:ktap_speed:
+very_slow" then that test case would be marked as very_slow instead and not
+slow.
+
+Note if a test case inherits metadata it does not need to appear as a line in
+the KTAP. Using the example above, not every test case would have the line
+"#:ktap_speed: slow" in their metadata.
+
+**Edge Case Examples of KTAP metadata**
+
+Here are a few edge case examples of KTAP metadata. The first example shows
+metadata in the wrong location.
+
+::
+
+ KTAP version 2
+ 1..1
+ KTAP version 2
+ #:ktap_test: suite_1
+ 1..3
+ ok 1 test_1
+ #:ktap_test: test_2
+ #:ktap_speed: very_slow
+ ok 2 test_2
+ #:ktap_duration: 1.342s
+ #:ktap_test: test_3
+ #:ktap_speed: slow
+ ok 3 test_3
+ ok 1 suite_1
+
+In this example, the metadata "#:ktap_duration: 1.342s" is in the wrong
+location. It was meant to belong to test_2 but was printed late. The location
+of this metadata is not recommended. However, it is allowed because the line is
+still below "#:ktap_test: test_2" and above any other ktap_test lines.
+
+This second example shows metadata in the correct location but without the
+proper header.
+
+::
+
+ KTAP version 2
+ 1..1
+ KTAP version 2
+ #:ktap_test: suite_1
+ 1..2
+ not ok 1 test_1
+ #:ktap_speed: very_slow
+ ok 2 test_2
+ ok 1 suite_1
+
+In this example, the metadata "#:ktap_speed: very_slow" is meant to belong to
+test_2. It is in the correct location but does not fall below a ktap_test line
+for test_2. Instead this metadata might be mistaken for belonging to suite_1
+because it does fall under the ktap_test line for suite_1. This lack of header
+is not allowed.
+
+**Types of KTAP Metadata:**
+
+This is the current list of KTAP metadata types recognized in this
+specification. Note that all of these metadata types are optional (except for
+ktap_test as the KTAP metadata header).
+
+- ``ktap_test``: Name of test (used as header of KTAP metadata). This should
+ match the test name printed in the test result line: "ok 1 [test_name]".
+
+- ``ktap_module``: Name of the module containing the test
+
+- ``ktap_subsystem``: Name of the subsystem being tested
+
+- ``ktap_start_time``: Time tests started in ISO8601 format
+
+ - Example: "#:ktap_start_time: 2024-01-09T13:09:01.990000+00:00"
+
+- ``ktap_duration``: Time taken (in seconds) to execute the test
+
+ - Example: "#:ktap_duration: 10.154s"
+
+- ``ktap_speed``: Category of how fast test runs: "normal", "slow", or
+ "very_slow"
+
+- ``ktap_test_file``: Path to source file containing the test. This metadata
+ line can be repeated if the test is spread across multiple files.
+
+ - Example: "#:ktap_test_file: lib/test.c"
+
+- ``ktap_generated_file``: Description of and path to file generated during
+ test execution. This could be a core dump, generated filesystem image, some
+ form of visual output (for graphics drivers), etc. This metadata line can be
+ repeated to attach multiple files to the test. Note use ktap_log_file or
+ ktap_error_file instead of this type if more applicable.
+
+ - Example: "#:ktap_generated_file: Core dump: /var/lib/systemd/coredump/hello.core"
+
+- ``ktap_log_file``: Path to file containing kernel log test output
+
+ - Example: "#:ktap_log_file: /sys/kernel/debugfs/kunit/example/results"
+
+- ``ktap_error_file``: Path to file containing context for test failure or
+ error. This could include the difference between optimal test output and
+ actual test output.
+
+ - Example: "#:ktap_error_file: fs/results/example.out.bad"
+
+- ``ktap_results_url``: Link to webpage describing this test run and its
+ results
+
+ - Example: "#:ktap_results_url: https://kcidb.kernelci.org/hello"
+
+- ``ktap_arch``: Architecture used during test run
+
+ - Example: "#:ktap_arch: x86_64"
+
+- ``ktap_compiler``: Compiler used during test run
+
+ - Example: "#:ktap_compiler: gcc (GCC) 10.1.1 20200507 (Red Hat 10.1.1-1)"
+
+- ``ktap_respository_url``: Link to git repository of the checked out code.
+
+ - Example: "#:ktap_respository_url: https://github.com/torvalds/linux.git"
+
+- ``ktap_git_branch``: Name of git branch of checked out code
+
+ - Example: "#:ktap_git_branch: kselftest/kunit"
+
+- ``ktap_kernel_version``: Version of Linux Kernel being used during test run
+
+ - Example: "#:ktap_kernel_version: 6.7-rc1"
+
+- ``ktap_config``: Config name and value. This does not necessarly need to be
+ restricted to Kconfig.
+
+ - Example: "#:ktap_config: CONFIG_SYSFS=y"
+
+- ``ktap_id``: Description of ID and ID value. This is an open-ended metadata
+ used for IDs, such as checkout id or test run id.
+
+ - Example: "#:ktap_id: Test run id: 14e782"
+
+- ``ktap_commit_hash``: The full git commit hash of the checked out base code.
+
+ - Example: "#:ktap_commit_hash: 064725faf8ec2e6e36d51e22d3b86d2707f0f47f"
+
+**Other Metadata Types:**
+
+There can also be KTAP metadata that is not included in the recognized list
+above. This metadata must be prefixed with the test framework, ie. "kselftest",
+or with the prefix "custom". For example, "# custom_batch: 20".
+
Unknown lines
-------------
@@ -206,6 +440,7 @@ An example of a test with two nested subtests:
KTAP version 2
1..1
KTAP version 2
+ #:ktap_test: example
1..2
ok 1 test_1
not ok 2 test_2
@@ -219,6 +454,7 @@ An example format with multiple levels of nested testing:
KTAP version 2
1..2
KTAP version 2
+ #:ktap_test: example_test_1
1..2
KTAP version 2
1..2
@@ -254,6 +490,7 @@ Example KTAP output
KTAP version 2
1..1
KTAP version 2
+ #:ktap_test: main_test
1..3
KTAP version 2
1..1
@@ -261,11 +498,14 @@ Example KTAP output
ok 1 test_1
ok 1 example_test_1
KTAP version 2
+ #:ktap_test: example_test_2
+ #:ktap_speed: slow
1..2
ok 1 test_1 # SKIP test_1 skipped
ok 2 test_2
ok 2 example_test_2
KTAP version 2
+ #:ktap_test: example_test_3
1..3
ok 1 test_1
# test_2: FAIL
base-commit: 906f02e42adfbd5ae70d328ee71656ecb602aaf5
--
2.44.0.478.gd926399ef9-goog
The commit e5ed6c922537 ("KVM: selftests: Fix a semaphore imbalance in
the dirty ring logging test") backported the fix from v6.8 to stable
v6.1. However, since the patch uses 'TEST_ASSERT_EQ()', which doesn't
exist on v6.1, the following build error is seen:
dirty_log_test.c:775:2: error: call to undeclared function
'TEST_ASSERT_EQ'; ISO C99 and later do not support implicit function
declarations [-Wimplicit-function-declaration]
TEST_ASSERT_EQ(sem_val, 0);
^
1 error generated.
Replace the macro with its equivalent, 'ASSERT_EQ()' to fix the issue.
Fixes: e5ed6c922537 ("KVM: selftests: Fix a semaphore imbalance in the dirty ring logging test")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Raghavendra Rao Ananta <rananta(a)google.com>
---
tools/testing/selftests/kvm/dirty_log_test.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/kvm/dirty_log_test.c b/tools/testing/selftests/kvm/dirty_log_test.c
index ec40a33c29fd..711b9e4d86aa 100644
--- a/tools/testing/selftests/kvm/dirty_log_test.c
+++ b/tools/testing/selftests/kvm/dirty_log_test.c
@@ -772,9 +772,9 @@ static void run_test(enum vm_guest_mode mode, void *arg)
* verification of all iterations.
*/
sem_getvalue(&sem_vcpu_stop, &sem_val);
- TEST_ASSERT_EQ(sem_val, 0);
+ ASSERT_EQ(sem_val, 0);
sem_getvalue(&sem_vcpu_cont, &sem_val);
- TEST_ASSERT_EQ(sem_val, 0);
+ ASSERT_EQ(sem_val, 0);
pthread_create(&vcpu_thread, NULL, vcpu_worker, vcpu);
base-commit: e5cd595e23c1a075359a337c0e5c3a4f2dc28dd1
--
2.44.0.478.gd926399ef9-goog
The commit e5ed6c922537 ("KVM: selftests: Fix a semaphore imbalance in
the dirty ring logging test") backported the fix from v6.8 to stable
v6.1. However, since the patch uses 'TEST_ASSERT_EQ()', which doesn't
exist on v6.1, the following build error is seen:
dirty_log_test.c:775:2: error: call to undeclared function
'TEST_ASSERT_EQ'; ISO C99 and later do not support implicit function
declarations [-Wimplicit-function-declaration]
TEST_ASSERT_EQ(sem_val, 0);
^
1 error generated.
Replace the macro with its equivalent, 'ASSERT_EQ()' to fix the issue.
Fixes: e5ed6c922537 ("KVM: selftests: Fix a semaphore imbalance in the dirty ring logging test")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Raghavendra Rao Ananta <rananta(a)google.com>
Change-Id: I52c2c28d962e482bb4f40f285229a2465ed59d7e
---
tools/testing/selftests/kvm/dirty_log_test.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/kvm/dirty_log_test.c b/tools/testing/selftests/kvm/dirty_log_test.c
index ec40a33c29fd..711b9e4d86aa 100644
--- a/tools/testing/selftests/kvm/dirty_log_test.c
+++ b/tools/testing/selftests/kvm/dirty_log_test.c
@@ -772,9 +772,9 @@ static void run_test(enum vm_guest_mode mode, void *arg)
* verification of all iterations.
*/
sem_getvalue(&sem_vcpu_stop, &sem_val);
- TEST_ASSERT_EQ(sem_val, 0);
+ ASSERT_EQ(sem_val, 0);
sem_getvalue(&sem_vcpu_cont, &sem_val);
- TEST_ASSERT_EQ(sem_val, 0);
+ ASSERT_EQ(sem_val, 0);
pthread_create(&vcpu_thread, NULL, vcpu_worker, vcpu);
base-commit: e5cd595e23c1a075359a337c0e5c3a4f2dc28dd1
--
2.44.0.478.gd926399ef9-goog
As discussed in the LKML thread [1], the asynchronous nature of cpuset
hotplug handling code is causing problem with RCU testing. With recent
changes in the way locking is being handled in the cpuset code, it is
now possible to make the cpuset hotplug code synchronous again without
major changes.
This series enables the hotplug code to call directly into cpuset hotplug
core without indirection with the exception of the special case of v1
cpuset becoming empty still being handled indirectly with workqueue.
A new simple test case was also written to test this special v1 cpuset
case. The test_cpuset_prs.sh script was also run with LOCKDEP on to
verify that there is no regression.
[1] https://lore.kernel.org/lkml/ZgYikMb5kZ7rxPp6@slm.duckdns.org/
Waiman Long (2):
cgroup/cpuset: Make cpuset hotplug processing synchronous
cgroup/cpuset: Add test_cpuset_v1_hp.sh
include/linux/cpuset.h | 3 -
kernel/cgroup/cpuset.c | 131 +++++++-----------
kernel/cpu.c | 48 -------
kernel/power/process.c | 2 -
tools/testing/selftests/cgroup/Makefile | 2 +-
.../selftests/cgroup/test_cpuset_v1_hp.sh | 40 ++++++
6 files changed, 88 insertions(+), 138 deletions(-)
create mode 100755 tools/testing/selftests/cgroup/test_cpuset_v1_hp.sh
--
2.39.3
In a follow up to these patches,
- commit 0bdf399342c5("net: Avoid address overwrite in kernel_connect")
- commit 86a7e0b69bd5("net: prevent rewrite of msg_name in sock_sendmsg()")
- commit c889a99a21bf("net: prevent address rewrite in kernel_bind()")
- commit 01b2885d9415("net: Save and restore msg_namelen in sock_sendmsg")
this patch series introduces BPF selftests that test the interaction
between BPF sockaddr hooks and socket operations in kernel space. It
focuses on regression test coverage to ensure that these operations do not
overwrite their address parameter and also provides some sanity checks
around kernel_getpeername() and kernel_getsockname().
It introduces two new components: a kernel module called sock_addr_testmod
and a new test program called sock_addr_kern which is loosely modeled after
and adapted from the old-style bpf/test_sock_addr.c selftest. When loaded,
the kernel module will perform some socket operation in kernel space. The
kernel module accepts five parameters controlling which socket operation
will be performed and its inputs:
MODULE_PARM_DESC(ip, "IPv4/IPv6/Unix address to use for socket operation");
MODULE_PARM_DESC(port, "Port number to use for socket operation");
MODULE_PARM_DESC(af, "Address family (AF_INET, AF_INET6, or AF_UNIX)");
MODULE_PARM_DESC(type, "Socket type (SOCK_STREAM or SOCK_DGRAM)");
MODULE_PARM_DESC(op, "Socket operation (BIND=0, CONNECT=1, SENDMSG=2)");
On module init, the socket operation is performed and results of are
exposed through debugfs.
- /sys/kernel/debug/sock_addr_testmod/success
Indicates success or failure of the operation.
- /sys/kernel/debug/sock_addr_testmod/addr
The value of the address parameter after the operation.
- /sys/kernel/debug/sock_addr_testmod/sock_name
The value of kernel_getsockname() after the socket operation (if relevant).
- /sys/kernel/debug/sock_addr_testmod/peer_name
The value of kernel_getpeername(() after the socket operation (if relevant).
The sock_addr_kern test program loads and unloads the kernel module to
drive kernel socket operations, reads the results from debugfs, makes sure
that the operation did not overwrite the address, and any result from
kernel_getpeername() or kernel_getsockname() were as expected.
== Patches ==
- Patch 1 introduces sock_addr_testmod and functions necessary for the test
program to load and unload the module.
- Patches 2-6 transform existing test helpers and introduce new test helpers
to enable the sock_addr_kern test program.
- Patch 7 implements the sock_addr_kern test program.
- Patch 8 fixes the sock_addr bind test program to work for big endian
architectures such as s390x.
Jordan Rife (8):
selftests/bpf: Introduce sock_addr_testmod
selftests/bpf: Add module load helpers
selftests/bpf: Factor out cmp_addr
selftests/bpf: Add recv_msg_from_client to network helpers
selftests/bpf: Factor out load_path and defines from test_sock_addr
selftests/bpf: Add setup/cleanup subcommands
selftests/bpf: Add sock_addr_kern prog_test
selftests/bpf: Fix bind program for big endian systems
tools/testing/selftests/bpf/Makefile | 46 +-
tools/testing/selftests/bpf/network_helpers.c | 65 ++
tools/testing/selftests/bpf/network_helpers.h | 5 +
.../selftests/bpf/prog_tests/sock_addr.c | 34 -
.../selftests/bpf/prog_tests/sock_addr_kern.c | 631 ++++++++++++++++++
.../testing/selftests/bpf/progs/bind4_prog.c | 18 +-
.../testing/selftests/bpf/progs/bind6_prog.c | 18 +-
tools/testing/selftests/bpf/progs/bind_prog.h | 19 +
.../testing/selftests/bpf/sock_addr_helpers.c | 46 ++
.../testing/selftests/bpf/sock_addr_helpers.h | 44 ++
.../bpf/sock_addr_testmod/.gitignore | 6 +
.../selftests/bpf/sock_addr_testmod/Makefile | 20 +
.../bpf/sock_addr_testmod/sock_addr_testmod.c | 256 +++++++
tools/testing/selftests/bpf/test_sock_addr.c | 76 +--
tools/testing/selftests/bpf/test_sock_addr.sh | 10 +-
tools/testing/selftests/bpf/testing_helpers.c | 44 +-
tools/testing/selftests/bpf/testing_helpers.h | 2 +
17 files changed, 1196 insertions(+), 144 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/sock_addr_kern.c
create mode 100644 tools/testing/selftests/bpf/progs/bind_prog.h
create mode 100644 tools/testing/selftests/bpf/sock_addr_helpers.c
create mode 100644 tools/testing/selftests/bpf/sock_addr_helpers.h
create mode 100644 tools/testing/selftests/bpf/sock_addr_testmod/.gitignore
create mode 100644 tools/testing/selftests/bpf/sock_addr_testmod/Makefile
create mode 100644 tools/testing/selftests/bpf/sock_addr_testmod/sock_addr_testmod.c
--
2.44.0.478.gd926399ef9-goog
There is a spelling mistake in a ksft_test_result_skip message. Fix it.
Signed-off-by: Colin Ian King <colin.i.king(a)gmail.com>
---
tools/testing/selftests/mm/ksm_functional_tests.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/mm/ksm_functional_tests.c b/tools/testing/selftests/mm/ksm_functional_tests.c
index 2d277620fad2..db845dca8d19 100644
--- a/tools/testing/selftests/mm/ksm_functional_tests.c
+++ b/tools/testing/selftests/mm/ksm_functional_tests.c
@@ -502,7 +502,7 @@ static void test_child_ksm_err(int status)
else if (status == -2)
ksft_test_result_fail("Merge in child failed\n");
else if (status == -3)
- ksft_test_result_skip("Merge in child skiped\n");
+ ksft_test_result_skip("Merge in child skipped\n");
}
/* Verify that prctl ksm flag is inherited. */
--
2.39.2
Here are two fixes related to MPTCP.
The first patch fixes when the MPTcpExtMPCapableFallbackACK MIB counter
is modified: it should only be incremented when a connection was using
MPTCP options, but then a fallback to TCP has been done. This patch also
checks the counter is not incremented by mistake during the connect
selftests. This counter was wrongly incremented since its introduction
in v5.7.
The second patch fixes a wrong parsing of the 'dev' endpoint options in
the selftests: the wrong variable was used. This option was not used
before, but it is going to be soon. This issue is visible since v5.18.
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
Davide Caratti (1):
mptcp: don't account accept() of non-MPC client as fallback to TCP
Geliang Tang (1):
selftests: mptcp: join: fix dev in check_endpoint
net/mptcp/protocol.c | 2 --
net/mptcp/subflow.c | 2 ++
tools/testing/selftests/net/mptcp/mptcp_connect.sh | 9 +++++++++
tools/testing/selftests/net/mptcp/mptcp_join.sh | 4 +++-
4 files changed, 14 insertions(+), 3 deletions(-)
---
base-commit: 0ba80d96585662299d4ea4624043759ce9015421
change-id: 20240329-upstream-net-20240329-fallback-mib-b0fec9c6189b
Best regards,
--
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
The netdev CI runs in a VM and captures serial, so stdout and
stderr get combined. Because there's a missing new line in
stderr the test ends up corrupting KTAP:
# Successok 1 selftests: net: reuseaddr_conflict
which should have been:
# Success
ok 1 selftests: net: reuseaddr_conflict
Fixes: 422d8dc6fd3a ("selftest: add a reuseaddr test")
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
---
Low risk and seems worth backporting to stable, hence the fixes tag.
CC: shuah(a)kernel.org
CC: jbacik(a)fb.com
CC: linux-kselftest(a)vger.kernel.org
---
tools/testing/selftests/net/reuseaddr_conflict.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/reuseaddr_conflict.c b/tools/testing/selftests/net/reuseaddr_conflict.c
index 7c5b12664b03..bfb07dc49518 100644
--- a/tools/testing/selftests/net/reuseaddr_conflict.c
+++ b/tools/testing/selftests/net/reuseaddr_conflict.c
@@ -109,6 +109,6 @@ int main(void)
fd1 = open_port(0, 1);
if (fd1 >= 0)
error(1, 0, "Was allowed to create an ipv4 reuseport on an already bound non-reuseport socket with no ipv6");
- fprintf(stderr, "Success");
+ fprintf(stderr, "Success\n");
return 0;
}
--
2.44.0
I had sent RFC patchset early this year (January) [7] to enable CPU assisted
control-flow integrity for usermode on riscv. Since then I've been able to do
more testing of the changes. As part of testing effort, compiled a rootfs with
shadow stack and landing pad enabled (libraries and binaries) and booted to
shell. As part of long running tests, I have been able to run some spec 2006
benchmarks [8] (here link is provided only for list of benchmarks that were
tested for long running tests, excel sheet provided here actually is for some
static stats like code size growth on spec binaries). Thus converting from RFC
to regular patchset.
Securing control-flow integrity for usermode requires following
- Securing forward control flow : All callsites must reach
reach a target that they actually intend to reach.
- Securing backward control flow : All function returns must
return to location where they were called from.
This patch series use riscv cpu extension `zicfilp` [2] to secure forward
control flow and `zicfiss` [2] to secure backward control flow. `zicfilp`
enforces that all indirect calls or jmps must land on a landing pad instr
and label embedded in landing pad instr must match a value programmed in
`x7` register (at callsite via compiler). `zicfiss` introduces shadow stack
which can only be writeable via shadow stack instructions (sspush and
ssamoswap) and thus can't be tampered with via inadvertent stores. More
details about extension can be read from [2] and there are details in
documentation as well (in this patch series).
Using config `CONFIG_RISCV_USER_CFI`, kernel support for riscv control flow
integrity for user mode programs can be compiled in the kernel.
Enabling of control flow integrity for user programs is left to user runtime
(specifically expected from dynamic loader). There has been a lot of earlier
discussion on the enabling topic around x86 shadow stack enabling [3, 4, 5] and
overall consensus had been to let dynamic loader (or usermode) to decide for
enabling the feature.
This patch series introduces arch agnostic `prctls` to enable shadow stack
and indirect branch tracking. And implements them on riscv. arm64 is expected
to implement shadow stack part of these arch agnostic `prctls` [6]
Changes since last time
***********************
Spec changes
------------
- Forward cfi spec has become much simpler. `lpad` instruction is pseudo for
`auipc rd, <20bit_imm>`. `lpad` checks x7 against 20bit embedded in instr.
Thus label width is 20bit.
- Shadow stack management instructions are reduced to
sspush - to push x1/x5 on shadow stack
sspopchk - pops from shadow stack and comapres with x1/x5.
ssamoswap - atomically swap value on shadow stack.
rdssp - reads current shadow stack pointer
- Shadow stack accesses on readonly memory always raise AMO/store page fault.
`sspopchk` is load but if underlying page is readonly, it'll raise a store
page fault. It simplifies hardware and kernel for COW handling for shadow
stack pages.
- riscv defines a new exception type `software check exception` and control flow
violations raise software check exception.
- enabling controls for shadow stack and landing are in xenvcfg CSR and controls
lower privilege mode enabling. As an example senvcfg controls enabling for U and
menvcfg controls enabling for S mode.
core mm shadow stack enabling
-----------------------------
Shadow stack for x86 usermode are now in mainline and thus this patch
series builds on top of that for arch-agnostic mm related changes. Big
thanks and shout out to Rick Edgecombe for that.
selftests
---------
Created some minimal selftests to test the patch series.
[1] - https://lore.kernel.org/lkml/20230213045351.3945824-1-debug@rivosinc.com/
[2] - https://github.com/riscv/riscv-cfi
[3] - https://lore.kernel.org/lkml/ZWHcBq0bJ+15eeKs@finisterre.sirena.org.uk/T/#m…
[4] - https://lore.kernel.org/all/20220130211838.8382-1-rick.p.edgecombe@intel.co…
[5] - https://lore.kernel.org/lkml/CAHk-=wgP5mk3poVeejw16Asbid0ghDt4okHnWaWKLBkRh…
[6] - https://lore.kernel.org/linux-mm/20231122-arm64-gcs-v7-2-201c483bd775@kerne…
[7] - https://lore.kernel.org/lkml/20240125062739.1339782-1-debug@rivosinc.com/
[8] - https://docs.google.com/spreadsheets/d/1_cHGH4ctNVvFRiS7hW9dEGKtXLAJ3aX4Z_i…
Deepak Gupta (26):
riscv: envcfg save and restore on task switching
riscv: define default value for envcfg
riscv/Kconfig: enable HAVE_EXIT_THREAD for riscv
riscv: zicfiss/zicfilp enumeration
riscv: zicfiss/zicfilp extension csr and bit definitions
riscv: usercfi state for task and save/restore of CSR_SSP on trap
entry/exit
mm: Define VM_SHADOW_STACK for RISC-V
mm: abstract shadow stack vma behind `arch_is_shadow_stack`
riscv/mm : ensure PROT_WRITE leads to VM_READ | VM_WRITE
riscv mm: manufacture shadow stack pte
riscv mmu: teach pte_mkwrite to manufacture shadow stack PTEs
riscv mmu: write protect and shadow stack
riscv/mm: Implement map_shadow_stack() syscall
riscv/shstk: If needed allocate a new shadow stack on clone
prctl: arch-agnostic prtcl for indirect branch tracking
riscv: Implements arch agnostic shadow stack prctls
riscv: Implements arch argnostic indirect branch tracking prctls
riscv/kernel: update __show_regs to print shadow stack register
riscv/traps: Introduce software check exception
riscv sigcontext: adding cfi state field in sigcontext
riscv signal: Save and restore of shadow stack for signal
riscv/ptrace: riscv cfi status and state via ptrace and in core files
riscv: create a config for shadow stack and landing pad instr support
riscv: Documentation for landing pad / indirect branch tracking
riscv: Documentation for shadow stack on riscv
kselftest/riscv: kselftest for user mode cfi
Mark Brown (1):
prctl: arch-agnostic prctl for shadow stack
Documentation/arch/riscv/zicfilp.rst | 104 ++++
Documentation/arch/riscv/zicfiss.rst | 169 ++++++
.../devicetree/bindings/riscv/extensions.yaml | 10 +
arch/riscv/Kconfig | 19 +
arch/riscv/include/asm/asm-prototypes.h | 1 +
arch/riscv/include/asm/cpufeature.h | 13 +
arch/riscv/include/asm/csr.h | 18 +
arch/riscv/include/asm/hwcap.h | 2 +
arch/riscv/include/asm/mman.h | 24 +
arch/riscv/include/asm/pgtable.h | 32 +-
arch/riscv/include/asm/processor.h | 2 +
arch/riscv/include/asm/switch_to.h | 10 +
arch/riscv/include/asm/thread_info.h | 4 +
arch/riscv/include/asm/usercfi.h | 118 ++++
arch/riscv/include/uapi/asm/ptrace.h | 18 +
arch/riscv/include/uapi/asm/sigcontext.h | 5 +
arch/riscv/kernel/Makefile | 2 +
arch/riscv/kernel/asm-offsets.c | 4 +
arch/riscv/kernel/cpufeature.c | 2 +
arch/riscv/kernel/entry.S | 29 +
arch/riscv/kernel/process.c | 35 +-
arch/riscv/kernel/ptrace.c | 83 +++
arch/riscv/kernel/signal.c | 45 ++
arch/riscv/kernel/sys_riscv.c | 11 +
arch/riscv/kernel/traps.c | 38 ++
arch/riscv/kernel/usercfi.c | 510 ++++++++++++++++++
arch/riscv/mm/init.c | 2 +-
arch/riscv/mm/pgtable.c | 21 +
include/linux/mm.h | 35 +-
include/uapi/asm-generic/mman.h | 1 +
include/uapi/linux/elf.h | 1 +
include/uapi/linux/prctl.h | 49 ++
kernel/sys.c | 60 +++
mm/gup.c | 5 +-
mm/internal.h | 2 +-
mm/mmap.c | 1 +
tools/testing/selftests/riscv/Makefile | 2 +-
tools/testing/selftests/riscv/cfi/Makefile | 10 +
.../testing/selftests/riscv/cfi/cfi_rv_test.h | 85 +++
.../selftests/riscv/cfi/riscv_cfi_test.c | 91 ++++
.../testing/selftests/riscv/cfi/shadowstack.c | 376 +++++++++++++
.../testing/selftests/riscv/cfi/shadowstack.h | 39 ++
42 files changed, 2077 insertions(+), 11 deletions(-)
create mode 100644 Documentation/arch/riscv/zicfilp.rst
create mode 100644 Documentation/arch/riscv/zicfiss.rst
create mode 100644 arch/riscv/include/asm/mman.h
create mode 100644 arch/riscv/include/asm/usercfi.h
create mode 100644 arch/riscv/kernel/usercfi.c
create mode 100644 tools/testing/selftests/riscv/cfi/Makefile
create mode 100644 tools/testing/selftests/riscv/cfi/cfi_rv_test.h
create mode 100644 tools/testing/selftests/riscv/cfi/riscv_cfi_test.c
create mode 100644 tools/testing/selftests/riscv/cfi/shadowstack.c
create mode 100644 tools/testing/selftests/riscv/cfi/shadowstack.h
--
2.43.2
For now, the BPF program of type BPF_PROG_TYPE_TRACING is not allowed to
be attached to multiple hooks, and we have to create a BPF program for
each kernel function, for which we want to trace, even through all the
program have the same (or similar) logic. This can consume extra memory,
and make the program loading slow if we have plenty of kernel function to
trace.
In this series, we add the support to allow attaching a tracing BPF
program to multi hooks, which is similar to BPF_TRACE_KPROBE_MULTI.
In the 1st patch, we add the support to record index of the accessed
function args of the target for tracing program. Meanwhile, we add the
function btf_check_func_part_match() to compare the accessed function args
of two function prototype. This function will be used in the next commit.
In the 2nd patch, we refactor the struct modules_array to ptr_array, as
we need similar function to hold the target btf, target program and kernel
modules that we reference to in the following commit.
In the 3rd patch, we introduce the struct bpf_tramp_link_conn to be the
bridge between bpf_link and trampoline, as the releation between bpf_link
and trampoline is not one-to-one anymore.
In the 4th patch, we add the struct bpf_tramp_multi_link and
bpf_trampoline_multi_{link,unlink}_prog for multi-link of trampoline.
In the 5th patch, we add target btf to the function args of
bpf_check_attach_target(), then the caller can specify the btf to check.
The 6th patch is the main part to add multi-link supporting for tracing.
For now, only the following attach type is supported:
BPF_TRACE_FENTRY_MULTI
BPF_TRACE_FEXIT_MULTI
BPF_MODIFY_RETURN_MULTI
The attach type of BPF_TRACE_RAW_TP has different link type, so we skip
this part in this series for now.
In the 7th and 8th patches, we add multi-link supporting of tracing to
libbpf. Note that we don't free btfs that we load after the bpf programs
are loaded into the kernel now if any programs of type tracing multi-link
existing, as we need to lookup the btf types during attaching.
In the 9th patch, we add the testcases for this series.
Changes since v1:
- According to the advice of Alexei, introduce multi-link for tracing
instead of attaching a tracing program to multiple trampolines with
creating multi instance of bpf_link.
Menglong Dong (9):
bpf: tracing: add support to record and check the accessed args
bpf: refactor the modules_array to ptr_array
bpf: trampoline: introduce struct bpf_tramp_link_conn
bpf: trampoline: introduce bpf_tramp_multi_link
bpf: verifier: add btf to the function args of bpf_check_attach_target
bpf: tracing: add multi-link support
libbpf: don't free btf if program of multi-link tracing existing
libbpf: add support for the multi-link of tracing
selftests/bpf: add testcases for multi-link of tracing
arch/arm64/net/bpf_jit_comp.c | 4 +-
arch/riscv/net/bpf_jit_comp64.c | 4 +-
arch/s390/net/bpf_jit_comp.c | 4 +-
arch/x86/net/bpf_jit_comp.c | 4 +-
include/linux/bpf.h | 51 ++-
include/linux/bpf_verifier.h | 1 +
include/uapi/linux/bpf.h | 10 +
kernel/bpf/bpf_struct_ops.c | 2 +-
kernel/bpf/btf.c | 113 ++++-
kernel/bpf/syscall.c | 425 +++++++++++++++++-
kernel/bpf/trampoline.c | 97 +++-
kernel/bpf/verifier.c | 24 +-
kernel/trace/bpf_trace.c | 48 +-
net/bpf/test_run.c | 3 +
net/core/bpf_sk_storage.c | 2 +
tools/bpf/bpftool/common.c | 3 +
tools/include/uapi/linux/bpf.h | 10 +
tools/lib/bpf/bpf.c | 10 +
tools/lib/bpf/bpf.h | 6 +
tools/lib/bpf/libbpf.c | 215 ++++++++-
tools/lib/bpf/libbpf.h | 16 +
tools/lib/bpf/libbpf.map | 2 +
.../selftests/bpf/bpf_testmod/bpf_testmod.c | 49 ++
.../bpf/prog_tests/tracing_multi_link.c | 153 +++++++
.../selftests/bpf/progs/tracing_multi_test.c | 209 +++++++++
25 files changed, 1366 insertions(+), 99 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/tracing_multi_link.c
create mode 100644 tools/testing/selftests/bpf/progs/tracing_multi_test.c
--
2.39.2
By default, HLT instruction executed by guest is intercepted by hypervisor.
However, KVM_CAP_X86_DISABLE_EXITS capability can be used to not intercept
HLT by setting KVM_X86_DISABLE_EXITS_HLT.
By default, vms are created with in-kernel APIC support in KVM selftests.
VM needs to be created without in-kernel APIC support for this test, so
that HLT will exit to userspace. To do so, __vm_create() is modified to not
call KVM_CREATE_IRQCHIP ioctl while creating vm.
Add a test case to test KVM_X86_DISABLE_EXITS_HLT functionality.
Patch 1, 2 -> Preparatory patches to add the KVM_X86_DISABLE_EXITS_HLT test
case
Patch 3 -> Adds a test case for KVM_X86_DISABLE_EXITS_HLT
Testing done:
Tested KVM_X86_DISABLE_EXITS_HLT test case on AMD and Intel machines.
Manali Shukla (3):
KVM: selftests: Add safe_halt() and cli() helpers to common code
KVM: selftests: Change __vm_create() to create a vm without in-kernel
APIC
KVM: selftests: Add a test case for KVM_X86_DISABLE_EXITS_HLT
tools/testing/selftests/kvm/Makefile | 1 +
tools/testing/selftests/kvm/dirty_log_test.c | 2 +-
.../selftests/kvm/include/kvm_util_base.h | 4 +-
.../selftests/kvm/include/x86_64/processor.h | 17 +++
tools/testing/selftests/kvm/lib/kvm_util.c | 11 +-
.../kvm/x86_64/halt_disable_exit_test.c | 113 ++++++++++++++++++
.../kvm/x86_64/ucna_injection_test.c | 2 +-
7 files changed, 143 insertions(+), 7 deletions(-)
create mode 100644 tools/testing/selftests/kvm/x86_64/halt_disable_exit_test.c
base-commit: e9da6f08edb0bd4c621165496778d77a222e1174
--
2.34.1
The mremap_test, in a worst case controlled by the -t flag, does a for loop
iteration in orders of GB. Without compromising on the stdout report, the aim
is to reduce this time.
A pre-filled random buffer is allocated based on the seed, replacing repetitive
rand() calls. The byte pattern in the memory locations is set through memcpy()
from the random buffer.
Replacing the loop for printing the mismatch index to stdout, employ an
efficient algorithm by breaking the comparison into chunks, use the highly
optimized memcmp() library function, and when a mismatch does occur, only
then do a brute force iteration.
Also, use sscanf() to parse /proc/self/maps for consistency across files.
Execution time results (x86 system):
./mremap_test
Original: 3 seconds
After change: 0.8 seconds
./mremap_test -t100
Original: 17 seconds
After change: 2 seconds
./mremap_test -t0 (worst case):
Original: 9:40 minutes
After change: 45 seconds
Dev Jain (3):
selftests/mm: mremap_test: Optimize using pre-filled random array and
memcpy
selftests/mm: mremap_test: Optimize execution time from minutes to
seconds using chunkwise memcmp
selftests/mm: mremap_test: Use sscanf to parse /proc/self/maps
tools/testing/selftests/mm/mremap_test.c | 204 +++++++++++++++++------
1 file changed, 153 insertions(+), 51 deletions(-)
--
2.34.1
Hi Linus,
Please pull the following kunit fixes update for Linux 6.9rc2.
This kunit update for Linux 6.9-rc2 consists of one urgent fix for
--alltests build failure related to renaming of CONFIG_DAMON_DBGFS
to DAMON_DBGFS_DEPRECATED to the missing config option.
This is one of the two fixes to --alltests breakage. The other
one is in the following PR from net:
https://lore.kernel.org/netdev/20240328143117.26574-1-pabeni@redhat.com
diff is attached.
thanks,
-- Shuah
----------------------------------------------------------------
The following changes since commit 4cece764965020c22cff7665b18a012006359095:
Linux 6.9-rc1 (2024-03-24 14:10:05 -0700)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest tags/linux_kselftest-kunit-fixes-6.9-rc2
for you to fetch changes up to cfedfb24c9ddee2bf1641545f6e9b6a02b924aee:
kunit: configs: Enable CONFIG_DAMON_DBGFS_DEPRECATED for --alltests (2024-03-28 11:47:30 -0600)
----------------------------------------------------------------
linux_kselftest-kunit-fixes-6.9-rc2
This kunit update for Linux 6.9-rc2 consists of one urgent fix for
--alltests build failure related to renaming of CONFIG_DAMON_DBGFS
to DAMON_DBGFS_DEPRECATED to the missing config option.
----------------------------------------------------------------
David Gow (1):
kunit: configs: Enable CONFIG_DAMON_DBGFS_DEPRECATED for --alltests
tools/testing/kunit/configs/all_tests.config | 1 +
1 file changed, 1 insertion(+)
----------------------------------------------------------------
Hi Linus,
Please pull the following kselftest fixes update for Linux 6.9rc2.
This kselftest fixes update for Linux 6.9-rc2 consists of fixes
to seccomp and ftrace tests and a change to add config file for
dmabuf-heap test to increase coverage.
diff is attached.
thanks,
-- Shuah
----------------------------------------------------------------
The following changes since commit 4cece764965020c22cff7665b18a012006359095:
Linux 6.9-rc1 (2024-03-24 14:10:05 -0700)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest tags/linux_kselftest-fixes-6.9-rc2
for you to fetch changes up to 224fe424c356cb5c8f451eca4127f32099a6f764:
selftests: dmabuf-heap: add config file for the test (2024-03-29 13:57:14 -0600)
----------------------------------------------------------------
linux_kselftest-fixes-6.9-rc2
This kselftest fixes update for Linux 6.9-rc2 consists of fixes
to seccomp and ftrace tests and a change to add config file for
dmabuf-heap test to increase coverage.
----------------------------------------------------------------
Mark Brown (1):
selftests/seccomp: Try to fit runtime of benchmark into timeout
Mark Rutland (1):
selftests/ftrace: Fix event filter target_func selection
Muhammad Usama Anjum (1):
selftests: dmabuf-heap: add config file for the test
tools/testing/selftests/dmabuf-heaps/config | 3 +++
tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc | 2 +-
tools/testing/selftests/seccomp/settings | 2 +-
3 files changed, 5 insertions(+), 2 deletions(-)
create mode 100644 tools/testing/selftests/dmabuf-heaps/config
----------------------------------------------------------------
The test results reported for the clone3_set_tid tests interact poorly with
automation for running kselftest since the reported test names include TIDs
dynamically allocated at runtime. A lot of automation for running kselftest
will compare runs by looking at the test name to identify if the same test
is being run so changing names make it look like the testsuite has been
updated to include new tests. This makes the results display less clearly
and breaks cases like bisection.
Address this by providing a brief description of the tests and logging that
along with the stable parameters for the test currently logged. The TIDs
are already logged separately in existing logging except for the final test
which has a new log message added. We also tweak the formatting of the
logging of expected/actual values for clarity.
There are still issues with the logging of skipped tests (many are simply
not logged at all when skipped and all are logged with different names) but
these are less disruptive since the skips are all based on not being run as
root, a condition likely to be stable for a given test system.
Acked-by: Christian Brauner <brauner(a)kernel.org>
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v3:
- Rebase onto v6.9-rc1.
- This is the second release I've posted this for with no changes or
review comments.
- Link to v2: https://lore.kernel.org/r/20240122-kselftest-clone3-set-tid-v2-1-72af5d7dba…
Changes in v2:
- Rebase onto v6.8-rc1.
- Link to v1: https://lore.kernel.org/r/20231115-kselftest-clone3-set-tid-v1-1-c1932591c4…
---
tools/testing/selftests/clone3/clone3_set_tid.c | 117 ++++++++++++++----------
1 file changed, 69 insertions(+), 48 deletions(-)
diff --git a/tools/testing/selftests/clone3/clone3_set_tid.c b/tools/testing/selftests/clone3/clone3_set_tid.c
index ed785afb6077..9ae38733cb6e 100644
--- a/tools/testing/selftests/clone3/clone3_set_tid.c
+++ b/tools/testing/selftests/clone3/clone3_set_tid.c
@@ -114,7 +114,8 @@ static int call_clone3_set_tid(pid_t *set_tid,
return WEXITSTATUS(status);
}
-static void test_clone3_set_tid(pid_t *set_tid,
+static void test_clone3_set_tid(const char *desc,
+ pid_t *set_tid,
size_t set_tid_size,
int flags,
int expected,
@@ -129,17 +130,13 @@ static void test_clone3_set_tid(pid_t *set_tid,
ret = call_clone3_set_tid(set_tid, set_tid_size, flags, expected_pid,
wait_for_it);
ksft_print_msg(
- "[%d] clone3() with CLONE_SET_TID %d says :%d - expected %d\n",
+ "[%d] clone3() with CLONE_SET_TID %d says: %d - expected %d\n",
getpid(), set_tid[0], ret, expected);
- if (ret != expected)
- ksft_test_result_fail(
- "[%d] Result (%d) is different than expected (%d)\n",
- getpid(), ret, expected);
- else
- ksft_test_result_pass(
- "[%d] Result (%d) matches expectation (%d)\n",
- getpid(), ret, expected);
+
+ ksft_test_result(ret == expected, "%s with %d TIDs and flags 0x%x\n",
+ desc, set_tid_size, flags);
}
+
int main(int argc, char *argv[])
{
FILE *f;
@@ -172,73 +169,91 @@ int main(int argc, char *argv[])
/* Try invalid settings */
memset(&set_tid, 0, sizeof(set_tid));
- test_clone3_set_tid(set_tid, MAX_PID_NS_LEVEL + 1, 0, -EINVAL, 0, 0);
+ test_clone3_set_tid("invalid size, 0 TID",
+ set_tid, MAX_PID_NS_LEVEL + 1, 0, -EINVAL, 0, 0);
- test_clone3_set_tid(set_tid, MAX_PID_NS_LEVEL * 2, 0, -EINVAL, 0, 0);
+ test_clone3_set_tid("invalid size, 0 TID",
+ set_tid, MAX_PID_NS_LEVEL * 2, 0, -EINVAL, 0, 0);
- test_clone3_set_tid(set_tid, MAX_PID_NS_LEVEL * 2 + 1, 0,
- -EINVAL, 0, 0);
+ test_clone3_set_tid("invalid size, 0 TID",
+ set_tid, MAX_PID_NS_LEVEL * 2 + 1, 0,
+ -EINVAL, 0, 0);
- test_clone3_set_tid(set_tid, MAX_PID_NS_LEVEL * 42, 0, -EINVAL, 0, 0);
+ test_clone3_set_tid("invalid size, 0 TID",
+ set_tid, MAX_PID_NS_LEVEL * 42, 0, -EINVAL, 0, 0);
/*
* This can actually work if this test running in a MAX_PID_NS_LEVEL - 1
* nested PID namespace.
*/
- test_clone3_set_tid(set_tid, MAX_PID_NS_LEVEL - 1, 0, -EINVAL, 0, 0);
+ test_clone3_set_tid("invalid size, 0 TID",
+ set_tid, MAX_PID_NS_LEVEL - 1, 0, -EINVAL, 0, 0);
memset(&set_tid, 0xff, sizeof(set_tid));
- test_clone3_set_tid(set_tid, MAX_PID_NS_LEVEL + 1, 0, -EINVAL, 0, 0);
+ test_clone3_set_tid("invalid size, TID all 1s",
+ set_tid, MAX_PID_NS_LEVEL + 1, 0, -EINVAL, 0, 0);
- test_clone3_set_tid(set_tid, MAX_PID_NS_LEVEL * 2, 0, -EINVAL, 0, 0);
+ test_clone3_set_tid("invalid size, TID all 1s",
+ set_tid, MAX_PID_NS_LEVEL * 2, 0, -EINVAL, 0, 0);
- test_clone3_set_tid(set_tid, MAX_PID_NS_LEVEL * 2 + 1, 0,
- -EINVAL, 0, 0);
+ test_clone3_set_tid("invalid size, TID all 1s",
+ set_tid, MAX_PID_NS_LEVEL * 2 + 1, 0,
+ -EINVAL, 0, 0);
- test_clone3_set_tid(set_tid, MAX_PID_NS_LEVEL * 42, 0, -EINVAL, 0, 0);
+ test_clone3_set_tid("invalid size, TID all 1s",
+ set_tid, MAX_PID_NS_LEVEL * 42, 0, -EINVAL, 0, 0);
/*
* This can actually work if this test running in a MAX_PID_NS_LEVEL - 1
* nested PID namespace.
*/
- test_clone3_set_tid(set_tid, MAX_PID_NS_LEVEL - 1, 0, -EINVAL, 0, 0);
+ test_clone3_set_tid("invalid size, TID all 1s",
+ set_tid, MAX_PID_NS_LEVEL - 1, 0, -EINVAL, 0, 0);
memset(&set_tid, 0, sizeof(set_tid));
/* Try with an invalid PID */
set_tid[0] = 0;
- test_clone3_set_tid(set_tid, 1, 0, -EINVAL, 0, 0);
+ test_clone3_set_tid("valid size, 0 TID",
+ set_tid, 1, 0, -EINVAL, 0, 0);
set_tid[0] = -1;
- test_clone3_set_tid(set_tid, 1, 0, -EINVAL, 0, 0);
+ test_clone3_set_tid("valid size, -1 TID",
+ set_tid, 1, 0, -EINVAL, 0, 0);
/* Claim that the set_tid array actually contains 2 elements. */
- test_clone3_set_tid(set_tid, 2, 0, -EINVAL, 0, 0);
+ test_clone3_set_tid("2 TIDs, -1 and 0",
+ set_tid, 2, 0, -EINVAL, 0, 0);
/* Try it in a new PID namespace */
if (uid == 0)
- test_clone3_set_tid(set_tid, 1, CLONE_NEWPID, -EINVAL, 0, 0);
+ test_clone3_set_tid("valid size, -1 TID",
+ set_tid, 1, CLONE_NEWPID, -EINVAL, 0, 0);
else
ksft_test_result_skip("Clone3() with set_tid requires root\n");
/* Try with a valid PID (1) this should return -EEXIST. */
set_tid[0] = 1;
if (uid == 0)
- test_clone3_set_tid(set_tid, 1, 0, -EEXIST, 0, 0);
+ test_clone3_set_tid("duplicate PID 1",
+ set_tid, 1, 0, -EEXIST, 0, 0);
else
ksft_test_result_skip("Clone3() with set_tid requires root\n");
/* Try it in a new PID namespace */
if (uid == 0)
- test_clone3_set_tid(set_tid, 1, CLONE_NEWPID, 0, 0, 0);
+ test_clone3_set_tid("duplicate PID 1",
+ set_tid, 1, CLONE_NEWPID, 0, 0, 0);
else
ksft_test_result_skip("Clone3() with set_tid requires root\n");
/* pid_max should fail everywhere */
set_tid[0] = pid_max;
- test_clone3_set_tid(set_tid, 1, 0, -EINVAL, 0, 0);
+ test_clone3_set_tid("set TID to maximum",
+ set_tid, 1, 0, -EINVAL, 0, 0);
if (uid == 0)
- test_clone3_set_tid(set_tid, 1, CLONE_NEWPID, -EINVAL, 0, 0);
+ test_clone3_set_tid("set TID to maximum",
+ set_tid, 1, CLONE_NEWPID, -EINVAL, 0, 0);
else
ksft_test_result_skip("Clone3() with set_tid requires root\n");
@@ -262,10 +277,12 @@ int main(int argc, char *argv[])
/* After the child has finished, its PID should be free. */
set_tid[0] = pid;
- test_clone3_set_tid(set_tid, 1, 0, 0, 0, 0);
+ test_clone3_set_tid("reallocate child TID",
+ set_tid, 1, 0, 0, 0, 0);
/* This should fail as there is no PID 1 in that namespace */
- test_clone3_set_tid(set_tid, 1, CLONE_NEWPID, -EINVAL, 0, 0);
+ test_clone3_set_tid("duplicate child TID",
+ set_tid, 1, CLONE_NEWPID, -EINVAL, 0, 0);
/*
* Creating a process with PID 1 in the newly created most nested
@@ -274,7 +291,8 @@ int main(int argc, char *argv[])
*/
set_tid[0] = 1;
set_tid[1] = pid;
- test_clone3_set_tid(set_tid, 2, CLONE_NEWPID, 0, pid, 0);
+ test_clone3_set_tid("create PID 1 in new NS",
+ set_tid, 2, CLONE_NEWPID, 0, pid, 0);
ksft_print_msg("unshare PID namespace\n");
if (unshare(CLONE_NEWPID) == -1)
@@ -284,7 +302,8 @@ int main(int argc, char *argv[])
set_tid[0] = pid;
/* This should fail as there is no PID 1 in that namespace */
- test_clone3_set_tid(set_tid, 1, 0, -EINVAL, 0, 0);
+ test_clone3_set_tid("duplicate PID 1",
+ set_tid, 1, 0, -EINVAL, 0, 0);
/* Let's create a PID 1 */
ns_pid = fork();
@@ -295,21 +314,25 @@ int main(int argc, char *argv[])
*/
set_tid[0] = 43;
set_tid[1] = -1;
- test_clone3_set_tid(set_tid, 2, 0, -EINVAL, 0, 0);
+ test_clone3_set_tid("check leak on invalid TID -1",
+ set_tid, 2, 0, -EINVAL, 0, 0);
set_tid[0] = 43;
set_tid[1] = pid;
- test_clone3_set_tid(set_tid, 2, 0, 0, 43, 0);
+ test_clone3_set_tid("check leak on invalid specific TID",
+ set_tid, 2, 0, 0, 43, 0);
ksft_print_msg("Child in PID namespace has PID %d\n", getpid());
set_tid[0] = 2;
- test_clone3_set_tid(set_tid, 1, 0, 0, 2, 0);
+ test_clone3_set_tid("create PID 2 in child NS",
+ set_tid, 1, 0, 0, 2, 0);
set_tid[0] = 1;
set_tid[1] = -1;
set_tid[2] = pid;
/* This should fail as there is invalid PID at level '1'. */
- test_clone3_set_tid(set_tid, 3, CLONE_NEWPID, -EINVAL, 0, 0);
+ test_clone3_set_tid("fail due to invalid TID at level 1",
+ set_tid, 3, CLONE_NEWPID, -EINVAL, 0, 0);
set_tid[0] = 1;
set_tid[1] = 42;
@@ -319,13 +342,15 @@ int main(int argc, char *argv[])
* namespaces. Again assuming this is running in the host's
* PID namespace. Not yet nested.
*/
- test_clone3_set_tid(set_tid, 4, CLONE_NEWPID, -EINVAL, 0, 0);
+ test_clone3_set_tid("fail due to too few active PID NSs",
+ set_tid, 4, CLONE_NEWPID, -EINVAL, 0, 0);
/*
* This should work and from the parent we should see
* something like 'NSpid: pid 42 1'.
*/
- test_clone3_set_tid(set_tid, 3, CLONE_NEWPID, 0, 42, true);
+ test_clone3_set_tid("verify that we have 3 PID NSs",
+ set_tid, 3, CLONE_NEWPID, 0, 42, true);
child_exit(ksft_cnt.ksft_fail);
}
@@ -380,14 +405,10 @@ int main(int argc, char *argv[])
ksft_cnt.ksft_pass += 6 - (ksft_cnt.ksft_fail - WEXITSTATUS(status));
ksft_cnt.ksft_fail = WEXITSTATUS(status);
- if (ns3 == pid && ns2 == 42 && ns1 == 1)
- ksft_test_result_pass(
- "PIDs in all namespaces as expected (%d,%d,%d)\n",
- ns3, ns2, ns1);
- else
- ksft_test_result_fail(
- "PIDs in all namespaces not as expected (%d,%d,%d)\n",
- ns3, ns2, ns1);
+ ksft_print_msg("Expecting PIDs %d, 42, 1\n", pid);
+ ksft_print_msg("Have PIDs in namespaces: %d, %d, %d\n", ns3, ns2, ns1);
+ ksft_test_result(ns3 == pid && ns2 == 42 && ns1 == 1,
+ "PIDs in all namespaces as expected\n");
out:
ret = 0;
---
base-commit: 4cece764965020c22cff7665b18a012006359095
change-id: 20231114-kselftest-clone3-set-tid-c0c35111f18f
Best regards,
--
Mark Brown <broonie(a)kernel.org>
Some unit tests intentionally trigger warning backtraces by passing bad
parameters to kernel API functions. Such unit tests typically check the
return value from such calls, not the existence of the warning backtrace.
Such intentionally generated warning backtraces are neither desirable
nor useful for a number of reasons.
- They can result in overlooked real problems.
- A warning that suddenly starts to show up in unit tests needs to be
investigated and has to be marked to be ignored, for example by
adjusting filter scripts. Such filters are ad-hoc because there is
no real standard format for warnings. On top of that, such filter
scripts would require constant maintenance.
One option to address problem would be to add messages such as "expected
warning backtraces start / end here" to the kernel log. However, that
would again require filter scripts, it might result in missing real
problematic warning backtraces triggered while the test is running, and
the irrelevant backtrace(s) would still clog the kernel log.
Solve the problem by providing a means to identify and suppress specific
warning backtraces while executing test code. Support suppressing multiple
backtraces while at the same time limiting changes to generic code to the
absolute minimum. Architecture specific changes are kept at minimum by
retaining function names only if both CONFIG_DEBUG_BUGVERBOSE and
CONFIG_KUNIT are enabled.
The first patch of the series introduces the necessary infrastructure.
The second patch introduces support for counting suppressed backtraces.
This capability is used in patch three to implement unit tests.
Patch four documents the new API.
The next two patches add support for suppressing backtraces in drm_rect
and dev_addr_lists unit tests. These patches are intended to serve as
examples for the use of the functionality introduced with this series.
The remaining patches implement the necessary changes for all
architectures with GENERIC_BUG support.
With CONFIG_KUNIT enabled, image size increase with this series applied is
approximately 1%. The image size increase (and with it the functionality
introduced by this series) can be avoided by disabling
CONFIG_KUNIT_SUPPRESS_BACKTRACE.
This series is based on the RFC patch and subsequent discussion at
https://patchwork.kernel.org/project/linux-kselftest/patch/02546e59-1afe-4b…
and offers a more comprehensive solution of the problem discussed there.
Design note:
Function pointers are only added to the __bug_table section if both
CONFIG_KUNIT_SUPPRESS_BACKTRACE and CONFIG_DEBUG_BUGVERBOSE are enabled
to avoid image size increases if CONFIG_KUNIT is disabled. There would be
some benefits to adding those pointers all the time (reduced complexity,
ability to display function names in BUG/WARNING messages). That change,
if desired, can be made later.
Checkpatch note:
Remaining checkpatch errors and warnings were deliberately ignored.
Some are triggered by matching coding style or by comments interpreted
as code, others by assembler macros which are disliked by checkpatch.
Suggestions for improvements are welcome.
Changes since RFC:
- Introduced CONFIG_KUNIT_SUPPRESS_BACKTRACE
- Minor cleanups and bug fixes
- Added support for all affected architectures
- Added support for counting suppressed warnings
- Added unit tests using those counters
- Added patch to suppress warning backtraces in dev_addr_lists tests
Changes since v1:
- Rebased to v6.9-rc1
- Added Tested-by:, Acked-by:, and Reviewed-by: tags
[I retained those tags since there have been no functional changes]
- Introduced KUNIT_SUPPRESS_BACKTRACE configuration option, enabled by
default.
Dear Linux folks,
Running the selftests in a QEMU q35 VM, they hang.
```
$ make kselftest
[…]
ok 6 selftests: pidfd: pidfd_getfd_test
# timeout set to 45
# selftests: pidfd: pidfd_setns_test
# TAP version 13
# 1..7
# # Starting 7 tests from 2 test cases.
# # RUN global.setns_einval ...
# # OK global.setns_einval
# ok 1 global.setns_einval
# # RUN current_nsset.invalid_flags ...
# # pidfd_setns_test.c:161:invalid_flags:Expected self->child_pid_exited
(0) > 0 (0)
# # OK current_nsset.invalid_flags
# ok 2 current_nsset.invalid_flags
# # RUN current_nsset.pidfd_exited_child ...
# # pidfd_setns_test.c:161:pidfd_exited_child:Expected
self->child_pid_exited (0) > 0 (0)
# # OK current_nsset.pidfd_exited_child
# ok 3 current_nsset.pidfd_exited_child
# # RUN current_nsset.pidfd_incremental_setns ...
# # pidfd_setns_test.c:161:pidfd_incremental_setns:Expected
self->child_pid_exited (0) > 0 (0)
# # pidfd_setns_test.c:408:pidfd_incremental_setns:Managed to correctly
setns to user namespace of 31570 via pidfd 20
# # pidfd_setns_test.c:408:pidfd_incremental_setns:Managed to correctly
setns to mnt namespace of 31570 via pidfd 20
# # pidfd_setns_test.c:408:pidfd_incremental_setns:Managed to correctly
setns to pid namespace of 31570 via pidfd 20
# # pidfd_setns_test.c:408:pidfd_incremental_setns:Managed to correctly
setns to uts namespace of 31570 via pidfd 20
# # pidfd_setns_test.c:408:pidfd_incremental_setns:Managed to correctly
setns to ipc namespace of 31570 via pidfd 20
# # pidfd_setns_test.c:408:pidfd_incremental_setns:Managed to correctly
setns to net namespace of 31570 via pidfd 20
# # pidfd_setns_test.c:408:pidfd_incremental_setns:Managed to correctly
setns to cgroup namespace of 31570 via pidfd 20
# # pidfd_setns_test.c:408:pidfd_incremental_setns:Managed to correctly
setns to pid_for_children namespace of 31570 via pidfd 20
# # pidfd_setns_test.c:391:pidfd_incremental_setns:Expected
setns(self->child_pidfd1, info->flag) (-1) == 0 (0)
# # pidfd_setns_test.c:392:pidfd_incremental_setns:Too many users -
Failed to setns to time namespace of 31570 via pidfd 20
# # pidfd_incremental_setns: Test terminated by timeout
# # FAIL current_nsset.pidfd_incremental_setns
# not ok 4 current_nsset.pidfd_incremental_setns
# # RUN current_nsset.nsfd_incremental_setns ...
# # pidfd_setns_test.c:161:nsfd_incremental_setns:Expected
self->child_pid_exited (0) > 0 (0)
# # pidfd_setns_test.c:444:nsfd_incremental_setns:Managed to correctly
setns to user namespace of 31577 via nsfd 19
# # pidfd_setns_test.c:444:nsfd_incremental_setns:Managed to correctly
setns to mnt namespace of 31577 via nsfd 24
# # pidfd_setns_test.c:444:nsfd_incremental_setns:Managed to correctly
setns to pid namespace of 31577 via nsfd 27
# # pidfd_setns_test.c:444:nsfd_incremental_setns:Managed to correctly
setns to uts namespace of 31577 via nsfd 30
# # pidfd_setns_test.c:444:nsfd_incremental_setns:Managed to correctly
setns to ipc namespace of 31577 via nsfd 33
# # pidfd_setns_test.c:444:nsfd_incremental_setns:Managed to correctly
setns to net namespace of 31577 via nsfd 36
# # pidfd_setns_test.c:444:nsfd_incremental_setns:Managed to correctly
setns to cgroup namespace of 31577 via nsfd 39
# # pidfd_setns_test.c:444:nsfd_incremental_setns:Managed to correctly
setns to pid_for_children namespace of 31577 via nsfd 42
# # pidfd_setns_test.c:427:nsfd_incremental_setns:Expected
setns(self->child_nsfds1[i], info->flag) (-1) == 0 (0)
# # pidfd_setns_test.c:428:nsfd_incremental_setns:Too many users -
Failed to setns to time namespace of 31577 via nsfd 45
```
Ctrl + c gets me back to the shell prompt.
```
$ ps auxf
[…]
build 31574 0.0 0.0 2528 1024 pts/0 D 09:57 0:00
./pidfd_setns_test
build 31575 99.9 0.0 2528 1024 pts/0 R 09:57 37:37 \_
./pidfd_setns_test
build 31576 0.0 0.0 0 0 pts/0 Z 09:57 0:00
\_ [pidfd_setns_tes] <defunct>
build 31577 0.0 0.0 0 0 pts/0 Z 09:57 0:00
\_ [pidfd_setns_tes] <defunct>
build 31578 0.0 0.0 2528 512 pts/0 S 09:57 0:00
\_ ./pidfd_setns_test
```
During shutdown, the system waits for that process:
[42733.704347] systemd-shutdown[1]: Waiting for process: 31578
(pidfd_setns_tes)
This is commit 8d025e2092e2 with one patch on top:
$ git log --no-decorate --oneline -2 a2ce022afcbb
a2ce022afcbb [PATCH] kbuild: Disable KCSAN for autogenerated
*.mod.c intermediaries
8d025e2092e2 Merge tag 'erofs-for-6.9-rc2-fixes' of
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
KCSAN is not enabled though.
Kind regards,
Paul
As discussed on the bi-weekly call on Jan 30, and in mailing around
kernel CI effort, some changes are desirable in the suite of forwarding
selftests the better to work with the CI tooling. Namely:
- The forwarding selftests use a configuration file where names of
interfaces are defined and various variables can be overridden. There
is also forwarding.config.sample that users can use as a template to
refer to when creating the config file. What happens a fair bit is
that users either do not know about this at all, or simply forget, and
are confused by cryptic failures about interfaces that cannot be
created.
In patches #1 - #3 have lib.sh just be the single source of truth with
regards to which variables exist. That includes the topology variables
which were previously only in the sample file, and any "tweak
variables", such as what tools to use, sleep times, etc.
forwarding.config.sample then becomes just a placeholder with a couple
examples. Unless specific HW should be exercised, or specific tools
used, the defaults are usually just fine.
- Several net/forwarding/ selftests (and one net/ one) cannot be run on
veth pairs, they need an actual HW interface to run on. They are
generic in the sense that any capable HW should pass them, which is
why they have been put to net/forwarding/ as opposed to drivers/net/,
but they do not generalize to veth. The fact that these tests are in
net/forwarding/, but still complaining when run, is confusing.
In patches #4 - #6 move these tests to a new directory
drivers/net/hw.
- The following patches extend the codebase to handle well test results
other than pass and fail.
Patch #7 is preparatory. It converts several log_test_skip to XFAIL,
so that tests do not spuriously end up returning non-0 when they
are not supposed to.
In patches #8 - #10, introduce some missing ksft constants, then support
having those constants in RET, and then finally in EXIT_STATUS.
- The traffic scheduler tests generate a large amount of network traffic
to test the behavior of the scheduler. This demands a relatively
high-performance computer. On slow machines, such as with a debugging
kernel, the test would spuriously fail.
It can still be useful to "go through the motions" though, to possibly
catch bugs in setup of the scheduler graph and passing packets around.
Thus we still want to run the tests, just with lowered demands.
To that end, in patches #11 - #12, introduce an environment variable
KSFT_MACHINE_SLOW, with obvious meaning. Tests can then make checks
more lenient, such as mark failures as XFAIL. A helper, xfail_on_slow,
is provided to mark performance-sensitive parts of the selftest.
- In patch #13, use a similar mechanism to mark a NH group stats
selftest to XFAIL HW stats tests when run on VETH pairs.
- All these changes complicate the hitherto straightforward logging and
checking logic, so in patch #14, add a selftest that checks this
functionality in lib.sh.
v1 (vs. an RFC circulated through linux-kselftest):
- Patch #9:
- Clarify intended usage by s/set_ret/ret_set_ksft_status/,
s/nret/ksft_status/
Petr Machata (14):
selftests: net: libs: Change variable fallback syntax
selftests: forwarding.config.sample: Move overrides to lib.sh
selftests: forwarding: README: Document customization
selftests: forwarding: ipip_lib: Do not import lib.sh
selftests: forwarding: Move several selftests
selftests: forwarding: Ditch skip_on_veth()
selftests: forwarding: Change inappropriate log_test_skip() calls
selftests: lib: Define more kselftest exit codes
selftests: forwarding: Have RET track kselftest framework constants
selftests: forwarding: Convert log_test() to recognize RET values
selftests: forwarding: Support for performance sensitive tests
selftests: forwarding: Mark performance-sensitive tests
selftests: forwarding: router_mpath_nh_lib: Don't skip, xfail on veth
selftests: forwarding: Add a test for testing lib.sh functionality
.../testing/selftests/drivers/net/hw/Makefile | 25 ++
.../net/hw}/devlink_port_split.py | 0
.../forwarding => drivers/net/hw}/ethtool.sh | 5 +-
.../net/hw}/ethtool_extended_state.sh | 5 +-
.../net/hw}/ethtool_lib.sh | 0
.../net/hw}/ethtool_mm.sh | 3 +-
.../net/hw}/ethtool_rmon.sh | 7 +-
.../net/hw}/hw_stats_l3.sh | 19 +-
.../net/hw}/hw_stats_l3_gre.sh | 7 +-
.../forwarding => drivers/net/hw}/loopback.sh | 5 +-
.../testing/selftests/drivers/net/hw/settings | 1 +
.../selftests/drivers/net/mlxsw/mlxsw_lib.sh | 2 +-
.../net/mlxsw/spectrum-2/resource_scale.sh | 1 -
.../net/mlxsw/spectrum/resource_scale.sh | 1 -
tools/testing/selftests/net/Makefile | 1 -
.../testing/selftests/net/forwarding/Makefile | 9 +-
tools/testing/selftests/net/forwarding/README | 33 +++
.../net/forwarding/forwarding.config.sample | 53 ++--
.../selftests/net/forwarding/ipip_lib.sh | 1 -
tools/testing/selftests/net/forwarding/lib.sh | 255 +++++++++++++-----
.../selftests/net/forwarding/lib_sh_test.sh | 208 ++++++++++++++
.../net/forwarding/router_mpath_nh_lib.sh | 12 +-
.../selftests/net/forwarding/sch_ets_tests.sh | 19 +-
.../selftests/net/forwarding/sch_red.sh | 10 +-
.../selftests/net/forwarding/sch_tbf_core.sh | 2 +-
.../selftests/net/forwarding/tc_common.sh | 2 +-
.../selftests/net/forwarding/tc_tunnel_key.sh | 2 -
tools/testing/selftests/net/lib.sh | 48 +++-
28 files changed, 565 insertions(+), 171 deletions(-)
create mode 100644 tools/testing/selftests/drivers/net/hw/Makefile
rename tools/testing/selftests/{net => drivers/net/hw}/devlink_port_split.py (100%)
rename tools/testing/selftests/{net/forwarding => drivers/net/hw}/ethtool.sh (98%)
rename tools/testing/selftests/{net/forwarding => drivers/net/hw}/ethtool_extended_state.sh (96%)
rename tools/testing/selftests/{net/forwarding => drivers/net/hw}/ethtool_lib.sh (100%)
rename tools/testing/selftests/{net/forwarding => drivers/net/hw}/ethtool_mm.sh (99%)
rename tools/testing/selftests/{net/forwarding => drivers/net/hw}/ethtool_rmon.sh (92%)
rename tools/testing/selftests/{net/forwarding => drivers/net/hw}/hw_stats_l3.sh (96%)
rename tools/testing/selftests/{net/forwarding => drivers/net/hw}/hw_stats_l3_gre.sh (92%)
rename tools/testing/selftests/{net/forwarding => drivers/net/hw}/loopback.sh (92%)
create mode 100644 tools/testing/selftests/drivers/net/hw/settings
create mode 100755 tools/testing/selftests/net/forwarding/lib_sh_test.sh
--
2.43.0
Hi,
Commit 28b3df1fe6ba2cb4 ("kunit: add wireless unit tests") which I can't
seem to find on lore breaks full kunit runs on non-UML builds and is now
present in mainline. If I run:
./tools/testing/kunit/kunit.py run --alltests --cross_compile x86_64-linux-gnu- --arch x86_64
on a clean tree then I get:
[15:09:20] Configuring KUnit Kernel ...
Generating .config ...
Populating config with:
$ make ARCH=x86_64 O=.kunit olddefconfig CROSS_COMPILE=x86_64-linux-gnu-
ERROR:root:Not all Kconfig options selected in kunitconfig were in the generated .config.
This is probably due to unsatisfied dependencies.
Missing: CONFIG_IWLWIFI=y, CONFIG_WLAN_VENDOR_INTEL=y
UML works fine, but other real architectures (eg, arm64) seem similarly
broken. I've not looked properly yet, I'm a bit confused given that
there's not even any dependencies for WLAN_VENDOR_INTEL and it's not
mentoned in the defconfig.
Thanks,
Mark
RFC v7:
=======
Major Changes:
--------------
This revision largely rebases on top of net-next and addresses the feedback
RFCv6 received from folks, namely Jakub, Yunsheng, Arnd, David, & Pavel.
The series remains in RFC because the queue-API ndos defined in this
series are not yet implemented. I have a GVE implementation I carry out
of tree for my testing. A upstreamable GVE implementation is in the
works. Aside from that, in my estimation all the patches are ready for
review/merge. Please do take a look.
As usual the full devmem TCP changes including the full GVE driver
implementation is here:
https://github.com/mina/linux/commits/tcpdevmem-v7/
Detailed changelog:
- Use admin-perm in netlink API.
- Addressed feedback from Jakub with regards to netlink API
implementation.
- Renamed devmem.c functions to something more appropriate for that
file.
- Improve the performance seen through the page_pool benchmark.
- Fix the value definition of all the SO_DEVMEM_* uapi.
- Various fixes to documentation.
Perf - page-pool benchmark:
---------------------------
Improved performance of bench_page_pool_simple.ko tests compared to v6:
https://pastebin.com/raw/v5dYRg8L
net-next base: 8 cycle fast path.
RFC v6: 10 cycle fast path.
RFC v7: 9 cycle fast path.
RFC v7 with CONFIG_DMA_SHARED_BUFFER disabled: 8 cycle fast path,
same as baseline.
Perf - Devmem TCP benchmark:
---------------------
Perf is about the same regardless of the changes in v7, namely the
removal of the static_branch_unlikely to improve the page_pool benchmark
performance:
189/200gbps bi-directional throughput with RX devmem TCP and regular TCP
TX i.e. ~95% line rate.
RFC v6:
=======
Major Changes:
--------------
This revision largely rebases on top of net-next and addresses the little
feedback RFCv5 received.
The series remains in RFC because the queue-API ndos defined in this
series are not yet implemented. I have a GVE implementation I carry out
of tree for my testing. A upstreamable GVE implementation is in the
works. Aside from that, in my estimation all the patches are ready for
review/merge. Please do take a look.
As usual the full devmem TCP changes including the full GVE driver
implementation is here:
https://github.com/mina/linux/commits/tcpdevmem-v6/
This version also comes with some performance data recorded in the cover
letter (see below changelog).
Detailed changelog:
- Rebased on top of the merged netmem_ref changes.
- Converted skb->dmabuf to skb->readable (Pavel). Pavel's original
suggestion was to remove the skb->dmabuf flag entirely, but when I
looked into it closely, I found the issue that if we remove the flag
we have to dereference the shinfo(skb) pointer to obtain the first
frag to tell whether an skb is readable or not. This can cause a
performance regression if it dirties the cache line when the
shinfo(skb) was not really needed. Instead, I converted the skb->dmabuf
flag into a generic skb->readable flag which can be re-used by io_uring
0-copy RX.
- Squashed a few locking optimizations from Eric Dumazet in the RX path
and the DEVMEM_DONTNEED setsockopt.
- Expanded the tests a bit. Added validation for invalid scenarios and
added some more coverage.
Perf - page-pool benchmark:
---------------------------
bench_page_pool_simple.ko tests with and without these changes:
https://pastebin.com/raw/ncHDwAbn
AFAIK the number that really matters in the perf tests is the
'tasklet_page_pool01_fast_path Per elem'. This one measures at about 8
cycles without the changes but there is some 1 cycle noise in some
results.
With the patches this regresses to 9 cycles with the changes but there
is 1 cycle noise occasionally running this test repeatedly.
Lastly I tried disable the static_branch_unlikely() in
netmem_is_net_iov() check. To my surprise disabling the
static_branch_unlikely() check reduces the fast path back to 8 cycles,
but the 1 cycle noise remains.
Perf - Devmem TCP benchmark:
---------------------
189/200gbps bi-directional throughput with RX devmem TCP and regular TCP
TX i.e. ~95% line rate.
Major changes in RFC v5:
========================
1. Rebased on top of 'Abstract page from net stack' series and used the
new netmem type to refer to LSB set pointers instead of re-using
struct page.
2. Downgraded this series back to RFC and called it RFC v5. This is
because this series is now dependent on 'Abstract page from net
stack'[1] and the queue API. Both are removed from the series to
reduce the patch # and those bits are fairly independent or
pre-requisite work.
3. Reworked the page_pool devmem support to use netmem and for some
more unified handling.
4. Reworked the reference counting of net_iov (renamed from
page_pool_iov) to use pp_ref_count for refcounting.
The full changes including the dependent series and GVE page pool
support is here:
https://github.com/mina/linux/commits/tcpdevmem-rfcv5/
[1] https://patchwork.kernel.org/project/netdevbpf/list/?series=810774
Major changes in v1:
====================
1. Implemented MVP queue API ndos to remove the userspace-visible
driver reset.
2. Fixed issues in the napi_pp_put_page() devmem frag unref path.
3. Removed RFC tag.
Many smaller addressed comments across all the patches (patches have
individual change log).
Full tree including the rest of the GVE driver changes:
https://github.com/mina/linux/commits/tcpdevmem-v1
Changes in RFC v3:
==================
1. Pulled in the memory-provider dependency from Jakub's RFC[1] to make the
series reviewable and mergeable.
2. Implemented multi-rx-queue binding which was a todo in v2.
3. Fix to cmsg handling.
The sticking point in RFC v2[2] was the device reset required to refill
the device rx-queues after the dmabuf bind/unbind. The solution
suggested as I understand is a subset of the per-queue management ops
Jakub suggested or similar:
https://lore.kernel.org/netdev/20230815171638.4c057dcd@kernel.org/
This is not addressed in this revision, because:
1. This point was discussed at netconf & netdev and there is openness to
using the current approach of requiring a device reset.
2. Implementing individual queue resetting seems to be difficult for my
test bed with GVE. My prototype to test this ran into issues with the
rx-queues not coming back up properly if reset individually. At the
moment I'm unsure if it's a mistake in the POC or a genuine issue in
the virtualization stack behind GVE, which currently doesn't test
individual rx-queue restart.
3. Our usecases are not bothered by requiring a device reset to refill
the buffer queues, and we'd like to support NICs that run into this
limitation with resetting individual queues.
My thought is that drivers that have trouble with per-queue configs can
use the support in this series, while drivers that support new netdev
ops to reset individual queues can automatically reset the queue as
part of the dma-buf bind/unbind.
The same approach with device resets is presented again for consideration
with other sticking points addressed.
This proposal includes the rx devmem path only proposed for merge. For a
snapshot of my entire tree which includes the GVE POC page pool support &
device memory support:
https://github.com/torvalds/linux/compare/master...mina:linux:tcpdevmem-v3
[1] https://lore.kernel.org/netdev/f8270765-a27b-6ccf-33ea-cda097168d79@redhat.…
[2] https://lore.kernel.org/netdev/CAHS8izOVJGJH5WF68OsRWFKJid1_huzzUK+hpKbLcL4…
Changes in RFC v2:
==================
The sticking point in RFC v1[1] was the dma-buf pages approach we used to
deliver the device memory to the TCP stack. RFC v2 is a proof-of-concept
that attempts to resolve this by implementing scatterlist support in the
networking stack, such that we can import the dma-buf scatterlist
directly. This is the approach proposed at a high level here[2].
Detailed changes:
1. Replaced dma-buf pages approach with importing scatterlist into the
page pool.
2. Replace the dma-buf pages centric API with a netlink API.
3. Removed the TX path implementation - there is no issue with
implementing the TX path with scatterlist approach, but leaving
out the TX path makes it easier to review.
4. Functionality is tested with this proposal, but I have not conducted
perf testing yet. I'm not sure there are regressions, but I removed
perf claims from the cover letter until they can be re-confirmed.
5. Added Signed-off-by: contributors to the implementation.
6. Fixed some bugs with the RX path since RFC v1.
Any feedback welcome, but specifically the biggest pending questions
needing feedback IMO are:
1. Feedback on the scatterlist-based approach in general.
2. Netlink API (Patch 1 & 2).
3. Approach to handle all the drivers that expect to receive pages from
the page pool (Patch 6).
[1] https://lore.kernel.org/netdev/dfe4bae7-13a0-3c5d-d671-f61b375cb0b4@gmail.c…
[2] https://lore.kernel.org/netdev/CAHS8izPm6XRS54LdCDZVd0C75tA1zHSu6jLVO8nzTLX…
==================
* TL;DR:
Device memory TCP (devmem TCP) is a proposal for transferring data to and/or
from device memory efficiently, without bouncing the data to a host memory
buffer.
* Problem:
A large amount of data transfers have device memory as the source and/or
destination. Accelerators drastically increased the volume of such transfers.
Some examples include:
- ML accelerators transferring large amounts of training data from storage into
GPU/TPU memory. In some cases ML training setup time can be as long as 50% of
TPU compute time, improving data transfer throughput & efficiency can help
improving GPU/TPU utilization.
- Distributed training, where ML accelerators, such as GPUs on different hosts,
exchange data among them.
- Distributed raw block storage applications transfer large amounts of data with
remote SSDs, much of this data does not require host processing.
Today, the majority of the Device-to-Device data transfers the network are
implemented as the following low level operations: Device-to-Host copy,
Host-to-Host network transfer, and Host-to-Device copy.
The implementation is suboptimal, especially for bulk data transfers, and can
put significant strains on system resources, such as host memory bandwidth,
PCIe bandwidth, etc. One important reason behind the current state is the
kernel’s lack of semantics to express device to network transfers.
* Proposal:
In this patch series we attempt to optimize this use case by implementing
socket APIs that enable the user to:
1. send device memory across the network directly, and
2. receive incoming network packets directly into device memory.
Packet _payloads_ go directly from the NIC to device memory for receive and from
device memory to NIC for transmit.
Packet _headers_ go to/from host memory and are processed by the TCP/IP stack
normally. The NIC _must_ support header split to achieve this.
Advantages:
- Alleviate host memory bandwidth pressure, compared to existing
network-transfer + device-copy semantics.
- Alleviate PCIe BW pressure, by limiting data transfer to the lowest level
of the PCIe tree, compared to traditional path which sends data through the
root complex.
* Patch overview:
** Part 1: netlink API
Gives user ability to bind dma-buf to an RX queue.
** Part 2: scatterlist support
Currently the standard for device memory sharing is DMABUF, which doesn't
generate struct pages. On the other hand, networking stack (skbs, drivers, and
page pool) operate on pages. We have 2 options:
1. Generate struct pages for dmabuf device memory, or,
2. Modify the networking stack to process scatterlist.
Approach #1 was attempted in RFC v1. RFC v2 implements approach #2.
** part 3: page pool support
We piggy back on page pool memory providers proposal:
https://github.com/kuba-moo/linux/tree/pp-providers
It allows the page pool to define a memory provider that provides the
page allocation and freeing. It helps abstract most of the device memory
TCP changes from the driver.
** part 4: support for unreadable skb frags
Page pool iovs are not accessible by the host; we implement changes
throughput the networking stack to correctly handle skbs with unreadable
frags.
** Part 5: recvmsg() APIs
We define user APIs for the user to send and receive device memory.
Not included with this series is the GVE devmem TCP support, just to
simplify the review. Code available here if desired:
https://github.com/mina/linux/tree/tcpdevmem
This series is built on top of net-next with Jakub's pp-providers changes
cherry-picked.
* NIC dependencies:
1. (strict) Devmem TCP require the NIC to support header split, i.e. the
capability to split incoming packets into a header + payload and to put
each into a separate buffer. Devmem TCP works by using device memory
for the packet payload, and host memory for the packet headers.
2. (optional) Devmem TCP works better with flow steering support & RSS support,
i.e. the NIC's ability to steer flows into certain rx queues. This allows the
sysadmin to enable devmem TCP on a subset of the rx queues, and steer
devmem TCP traffic onto these queues and non devmem TCP elsewhere.
The NIC I have access to with these properties is the GVE with DQO support
running in Google Cloud, but any NIC that supports these features would suffice.
I may be able to help reviewers bring up devmem TCP on their NICs.
* Testing:
The series includes a udmabuf kselftest that show a simple use case of
devmem TCP and validates the entire data path end to end without
a dependency on a specific dmabuf provider.
** Test Setup
Kernel: net-next with this series and memory provider API cherry-picked
locally.
Hardware: Google Cloud A3 VMs.
NIC: GVE with header split & RSS & flow steering support.
Cc: Pavel Begunkov <asml.silence(a)gmail.com>
Cc: David Wei <dw(a)davidwei.uk>
Cc: Jason Gunthorpe <jgg(a)ziepe.ca>
Cc: Yunsheng Lin <linyunsheng(a)huawei.com>
Cc: Shailend Chand <shailend(a)google.com>
Cc: Harshitha Ramamurthy <hramamurthy(a)google.com>
Cc: Shakeel Butt <shakeel.butt(a)linux.dev>
Cc: Jeroen de Borst <jeroendb(a)google.com>
Cc: Praveen Kaligineedi <pkaligineedi(a)google.com>
Jakub Kicinski (1):
net: page_pool: create hooks for custom page providers
Mina Almasry (13):
queue_api: define queue api
net: netdev netlink api to bind dma-buf to a net device
netdev: support binding dma-buf to netdevice
netdev: netdevice devmem allocator
page_pool: convert to use netmem
page_pool: devmem support
memory-provider: dmabuf devmem memory provider
net: support non paged skb frags
net: add support for skbs with unreadable frags
tcp: RX path for devmem TCP
net: add SO_DEVMEM_DONTNEED setsockopt to release RX frags
net: add devmem TCP documentation
selftests: add ncdevmem, netcat for devmem TCP
Documentation/netlink/specs/netdev.yaml | 57 +++
Documentation/networking/devmem.rst | 256 +++++++++++
Documentation/networking/index.rst | 1 +
arch/alpha/include/uapi/asm/socket.h | 6 +
arch/mips/include/uapi/asm/socket.h | 6 +
arch/parisc/include/uapi/asm/socket.h | 6 +
arch/sparc/include/uapi/asm/socket.h | 6 +
include/linux/netdevice.h | 3 +
include/linux/skbuff.h | 73 +++-
include/linux/socket.h | 1 +
include/net/devmem.h | 124 ++++++
include/net/netdev_queues.h | 27 ++
include/net/netdev_rx_queue.h | 2 +
include/net/netmem.h | 234 +++++++++-
include/net/page_pool/helpers.h | 155 +++++--
include/net/page_pool/types.h | 33 +-
include/net/sock.h | 2 +
include/net/tcp.h | 5 +-
include/trace/events/page_pool.h | 29 +-
include/uapi/asm-generic/socket.h | 6 +
include/uapi/linux/netdev.h | 19 +
include/uapi/linux/uio.h | 17 +
net/bpf/test_run.c | 5 +-
net/core/Makefile | 2 +-
net/core/datagram.c | 6 +
net/core/dev.c | 6 +-
net/core/devmem.c | 425 ++++++++++++++++++
net/core/gro.c | 8 +-
net/core/netdev-genl-gen.c | 23 +
net/core/netdev-genl-gen.h | 6 +
net/core/netdev-genl.c | 107 +++++
net/core/page_pool.c | 364 +++++++++-------
net/core/skbuff.c | 110 ++++-
net/core/sock.c | 61 +++
net/ipv4/esp4.c | 2 +-
net/ipv4/tcp.c | 254 ++++++++++-
net/ipv4/tcp_input.c | 13 +-
net/ipv4/tcp_ipv4.c | 9 +
net/ipv4/tcp_minisocks.c | 2 +
net/ipv4/tcp_output.c | 5 +-
net/ipv6/esp6.c | 2 +-
net/packet/af_packet.c | 4 +-
tools/include/uapi/linux/netdev.h | 19 +
tools/testing/selftests/net/.gitignore | 1 +
tools/testing/selftests/net/Makefile | 5 +
tools/testing/selftests/net/ncdevmem.c | 546 ++++++++++++++++++++++++
46 files changed, 2776 insertions(+), 277 deletions(-)
create mode 100644 Documentation/networking/devmem.rst
create mode 100644 include/net/devmem.h
create mode 100644 net/core/devmem.c
create mode 100644 tools/testing/selftests/net/ncdevmem.c
--
2.44.0.396.g6e790dbe36-goog
Cleaning up after tests is implemented separately for individual tests
and called at the end of each test execution. Since these functions are
very similar and a more generalized test framework was introduced a
function pointer in the resctrl_test struct can be used to reduce the
amount of function calls.
These functions are also all called in the ctrl-c handler because the
handler isn't aware which test is currently running. Since the handler
is implemented with a sigaction no function parameters can be passed
there but information about what test is currently running can be passed
with a global variable.
Series applies cleanly on top of kselftests/next.
Changelog v5:
- Rebase onto kselftests/next.
- Add Reinette's reviewed-by tag.
Changelog v4:
- Check current_test pointer and reset it in signal unregistering.
- Move cleanup call to test_cleanup function.
Changelog v3:
- Make current_test static.
- Add callback NULL check to the ctrl-c handler.
Changelog v2:
- Make current_test a const pointer limited in scope to resctrl_val
file.
- Remove tests_cleanup from resctrl.h.
- Cleanup 'goto out' path and labels in individual test functions.
Older versions of this series:
[v1] https://lore.kernel.org/all/cover.1708434017.git.maciej.wieczor-retman@inte…
[v2] https://lore.kernel.org/all/cover.1708596015.git.maciej.wieczor-retman@inte…
[v3] https://lore.kernel.org/all/cover.1708599491.git.maciej.wieczor-retman@inte…
[v4] https://lore.kernel.org/all/cover.1708949785.git.maciej.wieczor-retman@inte…
Maciej Wieczor-Retman (3):
selftests/resctrl: Add cleanup function to test framework
selftests/resctrl: Simplify cleanup in ctrl-c handler
selftests/resctrl: Move cleanups out of individual tests
tools/testing/selftests/resctrl/cat_test.c | 8 +++-----
tools/testing/selftests/resctrl/cmt_test.c | 4 ++--
tools/testing/selftests/resctrl/mba_test.c | 8 +++-----
tools/testing/selftests/resctrl/mbm_test.c | 8 +++-----
tools/testing/selftests/resctrl/resctrl.h | 9 +++------
.../testing/selftests/resctrl/resctrl_tests.c | 20 +++++++------------
tools/testing/selftests/resctrl/resctrl_val.c | 8 ++++++--
7 files changed, 27 insertions(+), 38 deletions(-)
--
2.43.2
This is required, as CONFIG_DAMON_DEBUGFS is enabled, and --alltests UML
builds will fail due to the missing config option otherwise.
Fixes: f4cba4bf6777 ("mm/damon: rename CONFIG_DAMON_DBGFS to DAMON_DBGFS_DEPRECATED")
Signed-off-by: David Gow <davidgow(a)google.com>
---
This is breaking all UML alltests builds, so we'd like to fix it sooner
rather than later. SeongJae, would you rather take this yourself, or can
we push it alongside any other KUnit fixes?
Johannes: Does this conflict with the CONFIG_NETDEVICES / CONFIG_WLAN
fixes to all_tests.config? I'd assume not, but I'm happy to take them
via KUnit if you'd prefer anyway.
Thanks,
-- David
---
tools/testing/kunit/configs/all_tests.config | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/kunit/configs/all_tests.config b/tools/testing/kunit/configs/all_tests.config
index aa5ec149f96c..f388742cf266 100644
--- a/tools/testing/kunit/configs/all_tests.config
+++ b/tools/testing/kunit/configs/all_tests.config
@@ -38,6 +38,7 @@ CONFIG_DAMON_VADDR=y
CONFIG_DAMON_PADDR=y
CONFIG_DEBUG_FS=y
CONFIG_DAMON_DBGFS=y
+CONFIG_DAMON_DBGFS_DEPRECATED=y
CONFIG_REGMAP_BUILD=y
--
2.44.0.396.g6e790dbe36-goog
This series fixes a bug in the complete phase of UDP in GRO, in which
socket lookup fails due to using network_header when parsing encapsulated
packets. The fix is to pass p_off parameter in *_gro_complete.
Next, the fields network_offset and inner_network_offset are added to
napi_gro_cb, and are both set during the receive phase of GRO. This is then
leveraged in the next commit to remove flush_id state from napi_gro_cb, and
stateful code in {ipv6,inet}_gro_receive which may be unnecessarily
complicated due to encapsulation support in GRO.
In addition, udpgro_fwd selftest is adjusted to include the socket lookup
case for vxlan. This selftest will test its supposed functionality once
local bind support is merged (https://lore.kernel.org/netdev/df300a49-7811-4126-a56a-a77100c8841b@gmail.c…).
v3 -> v4:
- Fix code comment and commit message typos
- v3:
https://lore.kernel.org/all/f939c84a-2322-4393-a5b0-9b1e0be8ed8e@gmail.com/
v2 -> v3:
- Use napi_gro_cb instead of skb->{offset}
- v2:
https://lore.kernel.org/netdev/2ce1600b-e733-448b-91ac-9d0ae2b866a4@gmail.c…
v1 -> v2:
- Pass p_off in *_gro_complete to fix UDP bug
- Remove more conditionals and memory fetches from inet_gro_flush
- v1:
https://lore.kernel.org/netdev/e1d22505-c5f8-4c02-a997-64248480338b@gmail.c…
Richard Gobert (4):
net: gro: add p_off param in *_gro_complete
selftests/net: add local address bind in vxlan selftest
net: gro: add {inner_}network_offset to napi_gro_cb
net: gro: move L3 flush checks to tcp_gro_receive
drivers/net/geneve.c | 7 +-
drivers/net/vxlan/vxlan_core.c | 11 ++--
include/linux/etherdevice.h | 2 +-
include/linux/netdevice.h | 3 +-
include/linux/udp.h | 2 +-
include/net/gro.h | 36 +++++++----
include/net/inet_common.h | 2 +-
include/net/tcp.h | 6 +-
include/net/udp.h | 8 +--
include/net/udp_tunnel.h | 2 +-
net/8021q/vlan_core.c | 6 +-
net/core/gro.c | 6 +-
net/ethernet/eth.c | 5 +-
net/ipv4/af_inet.c | 49 ++------------
net/ipv4/fou_core.c | 9 +--
net/ipv4/gre_offload.c | 6 +-
net/ipv4/tcp_offload.c | 79 ++++++++++++++++++-----
net/ipv4/udp.c | 3 +-
net/ipv4/udp_offload.c | 26 ++++----
net/ipv6/ip6_offload.c | 41 +++++-------
net/ipv6/tcpv6_offload.c | 7 +-
net/ipv6/udp.c | 3 +-
net/ipv6/udp_offload.c | 13 ++--
tools/testing/selftests/net/udpgro_fwd.sh | 10 ++-
24 files changed, 187 insertions(+), 155 deletions(-)
--
2.36.1
The longest running netdevsim test, nexthop.sh, currently takes
5 min to finish. Around 260s to be exact, and 310s on a debug kernel.
The default timeout in selftest is 45sec, so we need an explicit
config. Give ourselves some headroom and use 10min.
Commit under Fixes isn't really to "blame" but prior to that
netdevsim tests weren't integrated with kselftest infra
so blaming the tests themselves doesn't seem right, either.
Fixes: 8ff25dac88f6 ("netdevsim: add Makefile for selftests")
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
---
CC: shuah(a)kernel.org
CC: dw(a)davidwei.uk
CC: linux-kselftest(a)vger.kernel.org
---
tools/testing/selftests/drivers/net/netdevsim/settings | 1 +
1 file changed, 1 insertion(+)
create mode 100644 tools/testing/selftests/drivers/net/netdevsim/settings
diff --git a/tools/testing/selftests/drivers/net/netdevsim/settings b/tools/testing/selftests/drivers/net/netdevsim/settings
new file mode 100644
index 000000000000..a62d2fa1275c
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/netdevsim/settings
@@ -0,0 +1 @@
+timeout=600
--
2.44.0
This cleans up the output of the tty_tstamp_update selftest to play a
bit more nicely with automated systems parsing the test output.
To do this I've also added a new helper ksft_test_result() which takes a
KSFT_ code as a report, this is something I've wanted on other occasions
but restructured things to avoid needing it. This time I figured I'd
just add it since it keeps coming up.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Mark Brown (2):
kselftest: Add mechanism for reporting a KSFT_ result code
kselftest/tty: Report a consistent test name for the one test we run
tools/testing/selftests/kselftest.h | 22 ++++++++++++
tools/testing/selftests/tty/tty_tstamp_update.c | 48 +++++++++++++++++--------
2 files changed, 55 insertions(+), 15 deletions(-)
---
base-commit: 54be6c6c5ae8e0d93a6c4641cb7528eb0b6ba478
change-id: 20240305-kselftest-tty-tname-5411444ce037
Best regards,
--
Mark Brown <broonie(a)kernel.org>
The seccomp benchmark runs five scenarios, one calibration run with no
seccomp filters enabled then four further runs each adding a filter. The
calibration run times itself for 15s and then each additional run executes
for the same number of times.
Currently the seccomp tests, including the benchmark, run with an extended
120s timeout but this is not sufficient to robustly run the tests on a lot
of platforms. Sample timings from some recent runs:
Platform Run 1 Run 2 Run 3 Run 4
--------- ----- ----- ----- -----
PowerEdge R200 16.6s 16.6s 31.6s 37.4s
BBB (arm) 20.4s 20.4s 54.5s
Synquacer (arm64) 20.7s 23.7s 40.3s
The x86 runs from the PowerEdge are quite marginal and routinely fail, for
the successful run reported here the timed portions of the run are at
117.2s leaving less than 3s of margin which is frequently breached. The
added overhead of adding filters on the other platforms is such that there
is no prospect of their runs fitting into the 120s timeout, especially
on 32 bit arm where there is no BPF JIT.
While we could lower the time we calibrate for I'm also already seeing the
currently completing runs reporting issues with the per filter overheads
not matching expectations:
Let's instead raise the timeout to 180s which is only a 50% increase on the
current timeout which is itself not *too* large given that there's only two
tests in this suite.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v2:
- Rebase onto v6.9-rc1.
- Link to v1: https://lore.kernel.org/r/20231219-b4-kselftest-seccomp-benchmark-timeout-v…
---
tools/testing/selftests/seccomp/settings | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/seccomp/settings b/tools/testing/selftests/seccomp/settings
index 6091b45d226b..a953c96aa16e 100644
--- a/tools/testing/selftests/seccomp/settings
+++ b/tools/testing/selftests/seccomp/settings
@@ -1 +1 @@
-timeout=120
+timeout=180
---
base-commit: 4cece764965020c22cff7665b18a012006359095
change-id: 20231219-b4-kselftest-seccomp-benchmark-timeout-05b66e7d29d1
Best regards,
--
Mark Brown <broonie(a)kernel.org>
The event filter function test has been failing in our internal test
farm:
| # not ok 33 event filter function - test event filtering on functions
Running the test in verbose mode indicates that this is because the test
erroneously determines that kmem_cache_free() is the most common caller
of kmem_cache_free():
# # + cut -d: -f3 trace
# # + sed s/call_site=([^+]*)+0x.*/1/
# # + sort
# # + uniq -c
# # + sort
# # + tail -n 1
# # + sed s/^[ 0-9]*//
# # + target_func=kmem_cache_free
... and as kmem_cache_free() doesn't call itself, setting this as the
filter function for kmem_cache_free() results in no hits, and
consequently the test fails:
# # + grep kmem_cache_free trace
# # + grep kmem_cache_free
# # + wc -l
# # + hitcnt=0
# # + grep kmem_cache_free trace
# # + grep -v kmem_cache_free
# # + wc -l
# # + misscnt=0
# # + [ 0 -eq 0 ]
# # + exit_fail
This seems to be because the system in question has tasks with ':' in
their name (which a number of kernel worker threads have). These show up
in the trace, e.g.
test:.sh-1299 [004] ..... 2886.040608: kmem_cache_free: call_site=putname+0xa4/0xc8 ptr=000000000f4d22f4 name=names_cache
... and so when we try to extact the call_site with:
cut -d: -f3 trace | sed 's/call_site=\([^+]*\)+0x.*/\1/'
... the 'cut' command will extrace the column containing
'kmem_cache_free' rather than the column containing 'call_site=...', and
the 'sed' command will leave this unchanged. Consequently, the test will
decide to use 'kmem_cache_free' as the filter function, resulting in the
failure seen above.
Fix this by matching the 'call_site=<func>' part specifically to extract
the function name.
Signed-off-by: Mark Rutland <mark.rutland(a)arm.com>
Reported-by: Aishwarya TCV <aishwarya.tcv(a)arm.com>
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Steven Rostedt <rostedt(a)goodmis.org>
Cc: linux-kernel(a)vger.kernel.org
Cc: linux-kselftest(a)vger.kernel.org
Cc: linux-trace-kernel(a)vger.kernel.org
---
.../selftests/ftrace/test.d/filter/event-filter-function.tc | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc b/tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc
index 2de7c61d1ae3..3f74c09c56b6 100644
--- a/tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc
+++ b/tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc
@@ -24,7 +24,7 @@ echo 0 > events/enable
echo "Get the most frequently calling function"
sample_events
-target_func=`cut -d: -f3 trace | sed 's/call_site=\([^+]*\)+0x.*/\1/' | sort | uniq -c | sort | tail -n 1 | sed 's/^[ 0-9]*//'`
+target_func=`cat trace | grep -o 'call_site=\([^+]*\)' | sed 's/call_site=//' | sort | uniq -c | sort | tail -n 1 | sed 's/^[ 0-9]*//'`
if [ -z "$target_func" ]; then
exit_fail
fi
--
2.39.2
As discussed on the bi-weekly call on Jan 30, and in mailing around
kernel CI effort, some changes are desirable in the suite of forwarding
selftests the better to work with the CI tooling. Namely:
- The forwarding selftests use a configuration file where names of
interfaces are defined and various variables can be overridden. There
is also forwarding.config.sample that users can use as a template to
refer to when creating the config file. What happens a fair bit is
that users either do not know about this at all, or simply forget, and
are confused by cryptic failures about interfaces that cannot be
created.
In patches #1 - #3 have lib.sh just be the single source of truth with
regards to which variables exist. That includes the topology variables
which were previously only in the sample file, and any "tweak
variables", such as what tools to use, sleep times, etc.
forwarding.config.sample then becomes just a placeholder with a couple
examples. Unless specific HW should be exercised, or specific tools
used, the defaults are usually just fine.
- Several net/forwarding/ selftests (and one net/ one) cannot be run on
veth pairs, they need an actual HW interface to run on. They are
generic in the sense that any capable HW should pass them, which is
why they have been put to net/forwarding/ as opposed to drivers/net/,
but they do not generalize to veth. The fact that these tests are in
net/forwarding/, but still complaining when run, is confusing.
In patches #4 - #6 move these tests to a new directory
drivers/net/hw.
- The following patches extend the codebase to handle well test results
other than pass and fail.
Patch #7 is preparatory. It converts several log_test_skip to XFAIL,
so that tests do not spuriously end up returning non-0 when they
are not supposed to.
In patches #8 - #10, introduce some missing ksft constants, then support
having those constants in RET, and then finally in EXIT_STATUS.
- The traffic scheduler tests generate a large amount of network traffic
to test the behavior of the scheduler. This demands a relatively
high-performance computer. On slow machines, such as with a debugging
kernel, the test would spuriously fail.
It can still be useful to "go through the motions" though, to possibly
catch bugs in setup of the scheduler graph and passing packets around.
Thus we still want to run the tests, just with lowered demands.
To that end, in patches #11 - #12, introduce an environment variable
KSFT_MACHINE_SLOW, with obvious meaning. Tests can then make checks
more lenient, such as mark failures as XFAIL. A helper, xfail_on_slow,
is provided to mark performance-sensitive parts of the selftest.
- In patch #13, use a similar mechanism to mark a NH group stats
selftest to XFAIL HW stats tests when run on VETH pairs.
- All these changes complicate the hitherto straightforward logging and
checking logic, so in patch #14, add a selftest that checks this
functionality in lib.sh.
Petr Machata (14):
selftests: net: libs: Change variable fallback syntax
selftests: forwarding.config.sample: Move overrides to lib.sh
selftests: forwarding: README: Document customization
selftests: forwarding: ipip_lib: Do not import lib.sh
selftests: forwarding: Move several selftests
selftests: forwarding: Ditch skip_on_veth()
selftests: forwarding: Change inappropriate log_test_skip() calls
selftests: lib: Define more kselftest exit codes
selftests: forwarding: Have RET track kselftest framework constants
selftests: forwarding: Convert log_test() to recognize RET values
selftests: forwarding: Support for performance sensitive tests
selftests: forwarding: Mark performance-sensitive tests
selftests: forwarding: router_mpath_nh_lib: Don't skip, xfail on veth
selftests: forwarding: Add a test for testing lib.sh functionality
.../testing/selftests/drivers/net/hw/Makefile | 25 ++
.../net/hw}/devlink_port_split.py | 0
.../forwarding => drivers/net/hw}/ethtool.sh | 5 +-
.../net/hw}/ethtool_extended_state.sh | 5 +-
.../net/hw}/ethtool_lib.sh | 0
.../net/hw}/ethtool_mm.sh | 3 +-
.../net/hw}/ethtool_rmon.sh | 7 +-
.../net/hw}/hw_stats_l3.sh | 19 +-
.../net/hw}/hw_stats_l3_gre.sh | 7 +-
.../forwarding => drivers/net/hw}/loopback.sh | 5 +-
.../testing/selftests/drivers/net/hw/settings | 1 +
.../selftests/drivers/net/mlxsw/mlxsw_lib.sh | 2 +-
.../net/mlxsw/spectrum-2/resource_scale.sh | 1 -
.../net/mlxsw/spectrum/resource_scale.sh | 1 -
tools/testing/selftests/net/Makefile | 1 -
.../testing/selftests/net/forwarding/Makefile | 9 +-
tools/testing/selftests/net/forwarding/README | 33 +++
.../net/forwarding/forwarding.config.sample | 53 ++--
.../selftests/net/forwarding/ipip_lib.sh | 1 -
tools/testing/selftests/net/forwarding/lib.sh | 255 +++++++++++++-----
.../selftests/net/forwarding/lib_sh_test.sh | 208 ++++++++++++++
.../net/forwarding/router_mpath_nh_lib.sh | 12 +-
.../selftests/net/forwarding/sch_ets_tests.sh | 19 +-
.../selftests/net/forwarding/sch_red.sh | 10 +-
.../selftests/net/forwarding/sch_tbf_core.sh | 2 +-
.../selftests/net/forwarding/tc_common.sh | 2 +-
.../selftests/net/forwarding/tc_tunnel_key.sh | 2 -
tools/testing/selftests/net/lib.sh | 48 +++-
28 files changed, 565 insertions(+), 171 deletions(-)
create mode 100644 tools/testing/selftests/drivers/net/hw/Makefile
rename tools/testing/selftests/{net => drivers/net/hw}/devlink_port_split.py (100%)
rename tools/testing/selftests/{net/forwarding => drivers/net/hw}/ethtool.sh (98%)
rename tools/testing/selftests/{net/forwarding => drivers/net/hw}/ethtool_extended_state.sh (96%)
rename tools/testing/selftests/{net/forwarding => drivers/net/hw}/ethtool_lib.sh (100%)
rename tools/testing/selftests/{net/forwarding => drivers/net/hw}/ethtool_mm.sh (99%)
rename tools/testing/selftests/{net/forwarding => drivers/net/hw}/ethtool_rmon.sh (92%)
rename tools/testing/selftests/{net/forwarding => drivers/net/hw}/hw_stats_l3.sh (96%)
rename tools/testing/selftests/{net/forwarding => drivers/net/hw}/hw_stats_l3_gre.sh (92%)
rename tools/testing/selftests/{net/forwarding => drivers/net/hw}/loopback.sh (92%)
create mode 100644 tools/testing/selftests/drivers/net/hw/settings
create mode 100755 tools/testing/selftests/net/forwarding/lib_sh_test.sh
--
2.43.0
The test is inspired by the pmu_event_filter_test which implemented by x86. On
the arm64 platform, there is the same ability to set the pmu_event_filter
through the KVM_ARM_VCPU_PMU_V3_FILTER attribute. So add the test for arm64.
The series first create the helper function which can be used
for the vpmu related tests. Then, it implement the test.
Changelog:
----------
v5->v6:
- Rebased to v6.9-rc1.
- Collect RB.
- Add multiple filter test.
v4->v5:
- Rebased to v6.8-rc6.
- Refactor the helper function, make it fine-grained and easy to be used.
- Namimg improvements.
- Use the kvm_device_attr_set() helper.
- Make the test descriptor array readable and clean.
- Delete the patch which moves the pmu related helper to vpmu.h.
- Remove the kvm_supports_pmu_event_filter() function since nobody will run
this on a old kernel.
v3->v4:
- Rebased to the v6.8-rc2.
v2->v3:
- Check the pmceid in guest code instead of pmu event count since different
hardware may have different event count result, check pmceid makes it stable
on different platform. [Eric]
- Some typo fixed and commit message improved.
v1->v2:
- Improve the commit message. [Eric]
- Fix the bug in [enable|disable]_counter. [Raghavendra & Marc]
- Add the check if kvm has attr KVM_ARM_VCPU_PMU_V3_FILTER.
- Add if host pmu support the test event throught pmceid0.
- Split the test_invalid_filter() to another patch. [Eric]
v1: https://lore.kernel.org/all/20231123063750.2176250-1-shahuang@redhat.com/
v2: https://lore.kernel.org/all/20231129072712.2667337-1-shahuang@redhat.com/
v3: https://lore.kernel.org/all/20240116060129.55473-1-shahuang@redhat.com/
v4: https://lore.kernel.org/all/20240202025659.5065-1-shahuang@redhat.com/
v5: https://lore.kernel.org/all/20240229065625.114207-1-shahuang@redhat.com/
Shaoqin Huang (3):
KVM: selftests: aarch64: Add helper function for the vpmu vcpu
creation
KVM: selftests: aarch64: Introduce pmu_event_filter_test
KVM: selftests: aarch64: Add invalid filter test in
pmu_event_filter_test
tools/testing/selftests/kvm/Makefile | 1 +
.../kvm/aarch64/pmu_event_filter_test.c | 336 ++++++++++++++++++
.../kvm/aarch64/vpmu_counter_access.c | 33 +-
.../selftests/kvm/include/aarch64/vpmu.h | 28 ++
4 files changed, 372 insertions(+), 26 deletions(-)
create mode 100644 tools/testing/selftests/kvm/aarch64/pmu_event_filter_test.c
create mode 100644 tools/testing/selftests/kvm/include/aarch64/vpmu.h
--
2.40.1
The test is inspired by the pmu_event_filter_test which implemented by x86. On
the arm64 platform, there is the same ability to set the pmu_event_filter
through the KVM_ARM_VCPU_PMU_V3_FILTER attribute. So add the test for arm64.
The series first create the helper function which can be used
for the vpmu related tests. Then, it implement the test.
Changelog:
----------
v4->v5:
- Rebased to v6.8-rc6.
- Refactor the helper function, make it fine-grained and easy to be used.
- Namimg improvements.
- Use the kvm_device_attr_set() helper.
- Make the test descriptor array readable and clean.
- Delete the patch which moves the pmu related helper to vpmu.h.
- Remove the kvm_supports_pmu_event_filter() function since nobody will run
this on a old kernel.
v3->v4:
- Rebased to the v6.8-rc2.
v2->v3:
- Check the pmceid in guest code instead of pmu event count since different
hardware may have different event count result, check pmceid makes it stable
on different platform. [Eric]
- Some typo fixed and commit message improved.
v1->v2:
- Improve the commit message. [Eric]
- Fix the bug in [enable|disable]_counter. [Raghavendra & Marc]
- Add the check if kvm has attr KVM_ARM_VCPU_PMU_V3_FILTER.
- Add if host pmu support the test event throught pmceid0.
- Split the test_invalid_filter() to another patch. [Eric]
v1: https://lore.kernel.org/all/20231123063750.2176250-1-shahuang@redhat.com/
v2: https://lore.kernel.org/all/20231129072712.2667337-1-shahuang@redhat.com/
v3: https://lore.kernel.org/all/20240116060129.55473-1-shahuang@redhat.com/
v4: https://lore.kernel.org/all/20240202025659.5065-1-shahuang@redhat.com/
Shaoqin Huang (3):
KVM: selftests: aarch64: Add helper function for the vpmu vcpu
creation
KVM: selftests: aarch64: Introduce pmu_event_filter_test
KVM: selftests: aarch64: Add invalid filter test in
pmu_event_filter_test
tools/testing/selftests/kvm/Makefile | 1 +
.../kvm/aarch64/pmu_event_filter_test.c | 325 ++++++++++++++++++
.../kvm/aarch64/vpmu_counter_access.c | 33 +-
.../selftests/kvm/include/aarch64/vpmu.h | 29 ++
4 files changed, 362 insertions(+), 26 deletions(-)
create mode 100644 tools/testing/selftests/kvm/aarch64/pmu_event_filter_test.c
create mode 100644 tools/testing/selftests/kvm/include/aarch64/vpmu.h
--
2.40.1
Hi all,
This series does a number of cleanups into resctrl_val() and
generalizes it by removing test name specific handling from the
function.
One of the changes improves MBA/MBM measurement by narrowing down the
period the resctrl FS derived memory bandwidth numbers are measured
over. My feel is it didn't cause noticeable difference into the numbers
because they're generally good anyway except for the small number of
outliers. To see the impact on outliers, I'd need to setup a test to
run large number of replications and do a statistical analysis, which
I've not spent my time on. Even without the statistical analysis, the
new way to measure seems obviously better and makes sense even if I
cannot see a major improvement with the setup I'm using.
This series has some conflicts with SNC series from Maciej and also
with the MBA/MBM series from Babu.
--
i.
v2:
- Resolved conflicts with kselftest/next
- Spaces -> tabs correction
Ilpo Järvinen (13):
selftests/resctrl: Convert get_mem_bw_imc() fd close to for loop
selftests/resctrl: Calculate resctrl FS derived mem bw over sleep(1)
only
selftests/resctrl: Consolidate get_domain_id() into resctrl_val()
selftests/resctrl: Use correct type for pids
selftests/resctrl: Cleanup bm_pid and ppid usage & limit scope
selftests/resctrl: Rename measure_vals() to measure_mem_bw_vals() &
document
selftests/resctrl: Add ->measure() callback to resctrl_val_param
selftests/resctrl: Add ->init() callback into resctrl_val_param
selftests/resctrl: Simplify bandwidth report type handling
selftests/resctrl: Make some strings passed to resctrlfs functions
const
selftests/resctrl: Convert ctrlgrp & mongrp to pointers
selftests/resctrl: Remove mongrp from MBA test
selftests/resctrl: Remove test name comparing from
write_bm_pid_to_resctrl()
tools/testing/selftests/resctrl/cache.c | 6 +-
tools/testing/selftests/resctrl/cat_test.c | 5 +-
tools/testing/selftests/resctrl/cmt_test.c | 21 +-
tools/testing/selftests/resctrl/mba_test.c | 34 ++-
tools/testing/selftests/resctrl/mbm_test.c | 33 ++-
tools/testing/selftests/resctrl/resctrl.h | 48 ++--
tools/testing/selftests/resctrl/resctrl_val.c | 269 ++++++------------
tools/testing/selftests/resctrl/resctrlfs.c | 55 ++--
8 files changed, 224 insertions(+), 247 deletions(-)
--
2.39.2
Currently, VA exhaustion is being checked by passing a hint to mmap() and
expecting it to fail. This patch makes a stricter test by successful
write() calls from /proc/self/maps to a dump file, confirming that a
free chunk is indeed not available.
Changes in v2:
- Replace SZ_1GB with MAP_CHUNK_SIZE, tidy-up
Signed-off-by: Dev Jain <dev.jain(a)arm.com>
---
Merge dependency: https://lore.kernel.org/all/20240314122250.68534-1-dev.jain@arm.com/
.../selftests/mm/virtual_address_range.c | 66 +++++++++++++++++++
1 file changed, 66 insertions(+)
diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
index 7bcf8d48256a..050e997e3be2 100644
--- a/tools/testing/selftests/mm/virtual_address_range.c
+++ b/tools/testing/selftests/mm/virtual_address_range.c
@@ -12,6 +12,8 @@
#include <errno.h>
#include <sys/mman.h>
#include <sys/time.h>
+#include <fcntl.h>
+
#include "../kselftest.h"
/*
@@ -93,6 +95,66 @@ static int validate_lower_address_hint(void)
return 1;
}
+static int validate_complete_va_space(void)
+{
+ unsigned long start_addr, end_addr, prev_end_addr;
+ char line[400];
+ char prot[6];
+ FILE *file;
+ int fd;
+
+ fd = open("va_dump", O_CREAT | O_WRONLY, 0600);
+ unlink("va_dump");
+ if (fd < 0) {
+ ksft_test_result_skip("cannot create or open dump file\n");
+ ksft_finished();
+ }
+
+ file = fopen("/proc/self/maps", "r");
+ if (file == NULL)
+ ksft_exit_fail_msg("cannot open /proc/self/maps\n");
+
+ prev_end_addr = 0;
+ while (fgets(line, sizeof(line), file)) {
+ unsigned long hop;
+
+ if (sscanf(line, "%lx-%lx %s[rwxp-]",
+ &start_addr, &end_addr, prot) != 3)
+ ksft_exit_fail_msg("cannot parse /proc/self/maps\n");
+
+ /* end of userspace mappings; ignore vsyscall mapping */
+ if (start_addr & (1UL << 63))
+ return 0;
+
+ /* /proc/self/maps must have gaps less than MAP_CHUNK_SIZE */
+ if (start_addr - prev_end_addr >= MAP_CHUNK_SIZE)
+ return 1;
+
+ prev_end_addr = end_addr;
+
+ if (prot[0] != 'r')
+ continue;
+
+ /*
+ * Confirm whether MAP_CHUNK_SIZE chunk can be found or not.
+ * If write succeeds, no need to check MAP_CHUNK_SIZE - 1
+ * addresses after that. If the address was not held by this
+ * process, write would fail with errno set to EFAULT.
+ * Anyways, if write returns anything apart from 1, exit the
+ * program since that would mean a bug in /proc/self/maps.
+ */
+ hop = 0;
+ while (start_addr + hop < end_addr) {
+ if (write(fd, (void *)(start_addr + hop), 1) != 1)
+ return 1;
+ lseek(fd, 0, SEEK_SET);
+
+ hop += MAP_CHUNK_SIZE;
+ }
+ }
+ return 0;
+}
+
int main(int argc, char *argv[])
{
char *ptr[NR_CHUNKS_LOW];
@@ -135,6 +197,10 @@ int main(int argc, char *argv[])
validate_addr(hptr[i], 1);
}
hchunks = i;
+ if (validate_complete_va_space()) {
+ ksft_test_result_fail("BUG in mmap() or /proc/self/maps\n");
+ ksft_finished();
+ }
for (i = 0; i < lchunks; i++)
munmap(ptr[i], MAP_CHUNK_SIZE);
--
2.34.1
Currently, VA exhaustion is being checked by passing a hint to mmap() and
expecting it to fail. This patch makes a stricter test by successful write()
calls from /proc/self/maps to a dump file, confirming that a free chunk is
indeed not available.
Signed-off-by: Dev Jain <dev.jain(a)arm.com>
---
Merge dependency: https://lore.kernel.org/all/20240314122250.68534-1-dev.jain@arm.com/
.../selftests/mm/virtual_address_range.c | 69 +++++++++++++++++++
1 file changed, 69 insertions(+)
diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
index 7bcf8d48256a..31063613dfd9 100644
--- a/tools/testing/selftests/mm/virtual_address_range.c
+++ b/tools/testing/selftests/mm/virtual_address_range.c
@@ -12,6 +12,8 @@
#include <errno.h>
#include <sys/mman.h>
#include <sys/time.h>
+#include <fcntl.h>
+
#include "../kselftest.h"
/*
@@ -93,6 +95,69 @@ static int validate_lower_address_hint(void)
return 1;
}
+static int validate_complete_va_space(void)
+{
+ unsigned long start_addr, end_addr, prev_end_addr;
+ char line[400];
+ char prot[6];
+ FILE *file;
+ int fd;
+
+ fd = open("va_dump", O_CREAT | O_WRONLY, 0600);
+ unlink("va_dump");
+ if (fd < 0) {
+ ksft_test_result_skip("cannot create or open dump file\n");
+ ksft_finished();
+ }
+
+ file = fopen("/proc/self/maps", "r");
+ if (file == NULL)
+ ksft_exit_fail_msg("cannot open /proc/self/maps\n");
+
+ prev_end_addr = 0;
+ while (fgets(line, sizeof(line), file)) {
+ unsigned long hop;
+ int ret;
+
+ ret = sscanf(line, "%lx-%lx %s[rwxp-]",
+ &start_addr, &end_addr, prot);
+ if (ret != 3)
+ ksft_exit_fail_msg("sscanf failed, cannot parse\n");
+
+ /* end of userspace mappings; ignore vsyscall mapping */
+ if (start_addr & (1UL << 63))
+ return 0;
+
+ /* /proc/self/maps must have gaps less than 1GB only */
+ if (start_addr - prev_end_addr >= SZ_1GB)
+ return 1;
+
+ prev_end_addr = end_addr;
+
+ if (prot[0] != 'r')
+ continue;
+
+ /*
+ * Confirm whether MAP_CHUNK_SIZE chunk can be found or not.
+ * If write succeeds, no need to check MAP_CHUNK_SIZE - 1
+ * addresses after that. If the address was not held by this
+ * process, write would fail with errno set to EFAULT.
+ * Anyways, if write returns anything apart from 1, exit the
+ * program since that would mean a bug in /proc/self/maps.
+ */
+ hop = 0;
+ while (start_addr + hop < end_addr) {
+ if (write(fd, (void *)(start_addr + hop), 1) != 1)
+ return 1;
+ else
+ lseek(fd, 0, SEEK_SET);
+
+ hop += MAP_CHUNK_SIZE;
+ }
+ }
+ return 0;
+}
+
int main(int argc, char *argv[])
{
char *ptr[NR_CHUNKS_LOW];
@@ -135,6 +200,10 @@ int main(int argc, char *argv[])
validate_addr(hptr[i], 1);
}
hchunks = i;
+ if (validate_complete_va_space()) {
+ ksft_test_result_fail("BUG in mmap() or /proc/self/maps\n");
+ ksft_finished();
+ }
for (i = 0; i < lchunks; i++)
munmap(ptr[i], MAP_CHUNK_SIZE);
--
2.34.1
The upcoming new Idle HLT Intercept feature allows for the HLT
instruction execution by a vCPU to be intercepted by the hypervisor
only if there are no pending V_INTR and V_NMI events for the vCPU.
When the vCPU is expected to service the pending V_INTR and V_NMI
events, the Idle HLT intercept won’t trigger. The feature allows the
hypervisor to determine if the vCPU is actually idle and reduces
wasteful VMEXITs.
Presence of the Idle HLT Intercept feature is indicated via CPUID
function Fn8000_000A_EDX[30].
Document for the Idle HLT intercept feature will be available in the
next version of "AMD64 Architecture Programmer’s Manual".
Testing Done:
Added a selftest to test the Idle HLT intercept functionality.
Tested SEV and SEV-ES guest for the Idle HLT intercept functionality.
Manali Shukla (5):
x86/cpufeatures: Add CPUID feature bit for Idle HLT intercept
KVM: SVM: Add Idle HLT intercept support
tools: Add KVM exit reason for the Idle HLT
selftests: Add an interface to read the data of named vcpu stat
selftests: KVM: SVM: Add Idle HLT intercept test
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/svm.h | 1 +
arch/x86/include/uapi/asm/svm.h | 2 +
arch/x86/kvm/svm/svm.c | 11 +-
tools/arch/x86/include/uapi/asm/svm.h | 2 +
tools/testing/selftests/kvm/Makefile | 1 +
.../selftests/kvm/include/kvm_util_base.h | 11 ++
tools/testing/selftests/kvm/lib/kvm_util.c | 41 ++++++
.../selftests/kvm/x86_64/svm_idlehlt_test.c | 119 ++++++++++++++++++
9 files changed, 186 insertions(+), 3 deletions(-)
create mode 100644 tools/testing/selftests/kvm/x86_64/svm_idlehlt_test.c
base-commit: fdd58834d132046149699b88a27a0db26829f4fb
--
2.34.1
Hi,
While running kselftest on vanilla torvalds tree kernel commit v6.8-11167-g4438a810f396,
the test suite reported a number of errors.
I was using the latest iproute2-next suite on an Ubuntu 22.04 LTS box.
# Tests passed: 558
# Tests failed: 84
not ok 90 selftests: net: test_vxlan_mdb.sh # exit=1
495:# TEST: Destination IP - match [FAIL]
496:# TEST: Destination IP - no match [FAIL]
497:# TEST: Default destination port - match [FAIL]
498:# TEST: Default destination port - no match [FAIL]
499:# TEST: Non-default destination port - match [FAIL]
500:# TEST: Non-default destination port - no match [FAIL]
501:# TEST: Default destination VNI - match [FAIL]
502:# TEST: Default destination VNI - no match [FAIL]
503:# TEST: Non-default destination VNI - match [FAIL]
504:# TEST: Non-default destination VNI - no match [FAIL]
521:# TEST: Destination IP - match [FAIL]
522:# TEST: Destination IP - no match [FAIL]
523:# TEST: Default destination port - match [FAIL]
524:# TEST: Default destination port - no match [FAIL]
525:# TEST: Non-default destination port - match [FAIL]
526:# TEST: Non-default destination port - no match [FAIL]
527:# TEST: Default destination VNI - match [FAIL]
528:# TEST: Default destination VNI - no match [FAIL]
529:# TEST: Non-default destination VNI - match [FAIL]
530:# TEST: Non-default destination VNI - no match [FAIL]
549:# TEST: Forward valid source - first VTEP [FAIL]
550:# TEST: Forward valid source - second VTEP [FAIL]
551:# TEST: Block excluded source after removal - first VTEP [FAIL]
552:# TEST: Block excluded source after removal - second VTEP [FAIL]
553:# TEST: Forward valid source after removal - first VTEP [FAIL]
554:# TEST: Forward valid source after removal - second VTEP [FAIL]
571:# TEST: Forward valid source - first VTEP [FAIL]
572:# TEST: Forward valid source - second VTEP [FAIL]
573:# TEST: Block excluded source after removal - first VTEP [FAIL]
574:# TEST: Block excluded source after removal - second VTEP [FAIL]
575:# TEST: Forward valid source after removal - first VTEP [FAIL]
576:# TEST: Forward valid source after removal - second VTEP [FAIL]
593:# TEST: Forward valid source - first VTEP [FAIL]
594:# TEST: Forward valid source - second VTEP [FAIL]
595:# TEST: Block excluded source after removal - first VTEP [FAIL]
596:# TEST: Block excluded source after removal - second VTEP [FAIL]
597:# TEST: Forward valid source after removal - first VTEP [FAIL]
598:# TEST: Forward valid source after removal - second VTEP [FAIL]
615:# TEST: Forward valid source - first VTEP [FAIL]
616:# TEST: Forward valid source - second VTEP [FAIL]
617:# TEST: Block excluded source after removal - first VTEP [FAIL]
618:# TEST: Block excluded source after removal - second VTEP [FAIL]
619:# TEST: Forward valid source after removal - first VTEP [FAIL]
620:# TEST: Forward valid source after removal - second VTEP [FAIL]
636:# TEST: Forward valid source [FAIL]
637:# TEST: Receive of valid source after removal from group [FAIL]
648:# TEST: Forward valid source [FAIL]
649:# TEST: Receive of valid source after removal from group [FAIL]
660:# TEST: Forward valid source [FAIL]
661:# TEST: Receive of valid source after removal from group [FAIL]
672:# TEST: Forward valid source [FAIL]
673:# TEST: Receive of valid source after removal from group [FAIL]
683:# TEST: Egress VNI translation - PVID configured [FAIL]
684:# TEST: Egress VNI translation - no PVID configured [FAIL]
685:# TEST: Egress VNI translation - PVID reconfigured [FAIL]
695:# TEST: Egress VNI translation - PVID configured [FAIL]
696:# TEST: Egress VNI translation - no PVID configured [FAIL]
697:# TEST: Egress VNI translation - PVID reconfigured [FAIL]
707:# TEST: Registered IPv4 multicast - first VTEP [FAIL]
709:# TEST: Unregistered IPv4 multicast - first VTEP [FAIL]
710:# TEST: Unregistered IPv4 multicast - second VTEP [FAIL]
711:# TEST: Link-local IPv4 multicast - first VTEP [FAIL]
712:# TEST: Link-local IPv4 multicast - second VTEP [FAIL]
713:# TEST: Registered IPv4 multicast with a unicast MAC - first VTEP [FAIL]
714:# TEST: Registered IPv4 multicast with a unicast MAC - second VTEP [FAIL]
715:# TEST: Registered IPv4 multicast with a broadcast MAC - first VTEP [FAIL]
716:# TEST: Registered IPv4 multicast with a broadcast MAC - second VTEP [FAIL]
734:# TEST: Registered IPv4 multicast - first VTEP [FAIL]
736:# TEST: Unregistered IPv4 multicast - first VTEP [FAIL]
737:# TEST: Unregistered IPv4 multicast - second VTEP [FAIL]
738:# TEST: Link-local IPv4 multicast - first VTEP [FAIL]
739:# TEST: Link-local IPv4 multicast - second VTEP [FAIL]
740:# TEST: Registered IPv4 multicast with a unicast MAC - first VTEP [FAIL]
741:# TEST: Registered IPv4 multicast with a unicast MAC - second VTEP [FAIL]
742:# TEST: Registered IPv4 multicast with a broadcast MAC - first VTEP [FAIL]
743:# TEST: Registered IPv4 multicast with a broadcast MAC - second VTEP [FAIL]
761:# TEST: IP multicast - first VTEP [FAIL]
763:# TEST: Broadcast - first VTEP [FAIL]
765:# TEST: IP multicast after removal - first VTEP [FAIL]
766:# TEST: IP multicast after removal - second VTEP [FAIL]
779:# TEST: IP multicast - first VTEP [FAIL]
781:# TEST: Broadcast - first VTEP [FAIL]
783:# TEST: IP multicast after removal - first VTEP [FAIL]
784:# TEST: IP multicast after removal - second VTEP [FAIL]
The problem is present at least since 6.8-rc7.
Please find attached the config and the full output of test_vxlan_mdb.sh.
Hope this helps.
Best regards,
Mirsad Todorovac
New version of the sleepable bpf_timer code, without the HID changes, as
they can now go through the HID tree independantly.
For reference, the use cases I have in mind:
---
Basically, I need to be able to defer a HID-BPF program for the
following reasons (from the aforementioned patch):
1. defer an event:
Sometimes we receive an out of proximity event, but the device can not
be trusted enough, and we need to ensure that we won't receive another
one in the following n milliseconds. So we need to wait those n
milliseconds, and eventually re-inject that event in the stack.
2. inject new events in reaction to one given event:
We might want to transform one given event into several. This is the
case for macro keys where a single key press is supposed to send
a sequence of key presses. But this could also be used to patch a
faulty behavior, if a device forgets to send a release event.
3. communicate with the device in reaction to one event:
We might want to communicate back to the device after a given event.
For example a device might send us an event saying that it came back
from sleeping state and needs to be re-initialized.
Currently we can achieve that by keeping a userspace program around,
raise a bpf event, and let that userspace program inject the events and
commands.
However, we are just keeping that program alive as a daemon for just
scheduling commands. There is no logic in it, so it doesn't really justify
an actual userspace wakeup. So a kernel workqueue seems simpler to handle.
bpf_timers are currently running in a soft IRQ context, this patch
series implements a sleppable context for them.
Cheers,
Benjamin
To: Alexei Starovoitov <ast(a)kernel.org>
To: Daniel Borkmann <daniel(a)iogearbox.net>
To: Andrii Nakryiko <andrii(a)kernel.org>
To: Martin KaFai Lau <martin.lau(a)linux.dev>
To: Eduard Zingerman <eddyz87(a)gmail.com>
To: Song Liu <song(a)kernel.org>
To: Yonghong Song <yonghong.song(a)linux.dev>
To: John Fastabend <john.fastabend(a)gmail.com>
To: KP Singh <kpsingh(a)kernel.org>
To: Stanislav Fomichev <sdf(a)google.com>
To: Hao Luo <haoluo(a)google.com>
To: Jiri Olsa <jolsa(a)kernel.org>
To: Mykola Lysenko <mykolal(a)fb.com>
To: Shuah Khan <shuah(a)kernel.org>
Cc: Benjamin Tissoires <bentiss(a)kernel.org>
Cc: <bpf(a)vger.kernel.org>
Cc: <linux-kernel(a)vger.kernel.org>
Cc: <linux-kselftest(a)vger.kernel.org>
---
Changes in v4:
- dropped the HID changes, they can go independently from bpf-core
- addressed Alexei's and Eduard's remarks
- added selftests
- Link to v3: https://lore.kernel.org/r/20240221-hid-bpf-sleepable-v3-0-1fb378ca6301@kern…
Changes in v3:
- fixed the crash from v2
- changed the API to have only BPF_F_TIMER_SLEEPABLE for
bpf_timer_start()
- split the new kfuncs/verifier patch into several sub-patches, for
easier reviews
- Link to v2: https://lore.kernel.org/r/20240214-hid-bpf-sleepable-v2-0-5756b054724d@kern…
Changes in v2:
- make use of bpf_timer (and dropped the custom HID handling)
- implemented bpf_timer_set_sleepable_cb as a kfunc
- still not implemented global subprogs
- no sleepable bpf_timer selftests yet
- Link to v1: https://lore.kernel.org/r/20240209-hid-bpf-sleepable-v1-0-4cc895b5adbd@kern…
---
Benjamin Tissoires (6):
bpf/helpers: introduce sleepable bpf_timers
bpf/verifier: add bpf_timer as a kfunc capable type
bpf/helpers: introduce bpf_timer_set_sleepable_cb() kfunc
bpf/helpers: mark the callback of bpf_timer_set_sleepable_cb() as sleepable
tools: sync include/uapi/linux/bpf.h
selftests/bpf: add sleepable timer tests
include/linux/bpf_verifier.h | 1 +
include/uapi/linux/bpf.h | 4 +
kernel/bpf/helpers.c | 132 ++++++++++++++++++++-
kernel/bpf/verifier.c | 92 +++++++++++++-
tools/include/uapi/linux/bpf.h | 4 +
tools/testing/selftests/bpf/bpf_experimental.h | 4 +
.../selftests/bpf/bpf_testmod/bpf_testmod.c | 5 +
.../selftests/bpf/bpf_testmod/bpf_testmod_kfunc.h | 1 +
tools/testing/selftests/bpf/prog_tests/timer.c | 1 +
tools/testing/selftests/bpf/progs/timer.c | 40 ++++++-
tools/testing/selftests/bpf/progs/timer_failure.c | 114 +++++++++++++++++-
11 files changed, 387 insertions(+), 11 deletions(-)
---
base-commit: 9187210eee7d87eea37b45ea93454a88681894a4
change-id: 20240205-hid-bpf-sleepable-c01260fd91c4
Best regards,
--
Benjamin Tissoires <bentiss(a)kernel.org>
PASID (Process Address Space ID) is a PCIe extension to tag the DMA
transactions out of a physical device, and most modern IOMMU hardware
have supported PASID granular address translation. So a PASID-capable
device can be attached to multiple hwpts (a.k.a. domains), each attachment
is tagged with a pasid.
This series first adds a missing iommu API to replace domain for a pasid,
then adds iommufd APIs for device drivers to attach/replace/detach pasid
to/from hwpt per userspace's request, and adds selftest to validate the
iommufd APIs.
pasid attach/replace is mandatory on Intel VT-d given the PASID table
locates in the physical address space hence must be managed by the kernel,
both for supporting vSVA and coming SIOV. But it's optional on ARM/AMD
which allow configuring the PASID/CD table either in host physical address
space or nested on top of an GPA address space. This series only add VT-d
support as the minimal requirement.
Complete code can be found in below link:
https://github.com/yiliu1765/iommufd/tree/iommufd_pasid
Change log:
v1:
- Implemnet iommu_replace_device_pasid() to fall back to the original domain
if this replacement failed (Kevin)
- Add check in do_attach() to check corressponding attach_fn per the pasid value.
rfc: https://lore.kernel.org/linux-iommu/20230926092651.17041-1-yi.l.liu@intel.c…
Regards,
Yi Liu
Kevin Tian (1):
iommufd: Support attach/replace hwpt per pasid
Lu Baolu (2):
iommu: Introduce a replace API for device pasid
iommu/vt-d: Add set_dev_pasid callback for nested domain
Yi Liu (5):
iommufd: replace attach_fn with a structure
iommufd/selftest: Add set_dev_pasid and remove_dev_pasid in mock iommu
iommufd/selftest: Add a helper to get test device
iommufd/selftest: Add test ops to test pasid attach/detach
iommufd/selftest: Add coverage for iommufd pasid attach/detach
drivers/iommu/intel/nested.c | 47 +++++
drivers/iommu/iommu-priv.h | 2 +
drivers/iommu/iommu.c | 82 ++++++--
drivers/iommu/iommufd/Makefile | 1 +
drivers/iommu/iommufd/device.c | 50 +++--
drivers/iommu/iommufd/iommufd_private.h | 23 +++
drivers/iommu/iommufd/iommufd_test.h | 24 +++
drivers/iommu/iommufd/pasid.c | 138 ++++++++++++++
drivers/iommu/iommufd/selftest.c | 176 ++++++++++++++++--
include/linux/iommufd.h | 6 +
tools/testing/selftests/iommu/iommufd.c | 172 +++++++++++++++++
.../selftests/iommu/iommufd_fail_nth.c | 28 ++-
tools/testing/selftests/iommu/iommufd_utils.h | 78 ++++++++
13 files changed, 785 insertions(+), 42 deletions(-)
create mode 100644 drivers/iommu/iommufd/pasid.c
--
2.34.1
This patch enhances the BPF helpers by adding a kfunc to retrieve the
cgroup v2 of a task, addressing a previous limitation where only
bpf_task_get_cgroup1 was available for cgroup v1. The new kfunc is
particularly useful for scenarios where obtaining the cgroup ID of a
task other than the "current" one is necessary, which the existing
bpf_get_current_cgroup_id helper cannot accommodate. A specific use
case at Netflix involved the sched_switch tracepoint, where we had to
get the cgroup IDs of both the prev and next tasks.
The bpf_task_get_cgroup kfunc acquires and returns a reference to a
task's default cgroup, ensuring thread-safe access by correctly
implementing RCU read locking and unlocking. It leverages the existing
cgroup.h helper, and cgroup_tryget to safely acquire a reference to it.
Signed-off-by: Jose Fernandez <josef(a)netflix.com>
Reviewed-by: Tycho Andersen <tycho(a)tycho.pizza>
Acked-by: Yonghong Song <yonghong.song(a)linux.dev>
Acked-by: Stanislav Fomichev <sdf(a)google.com>
---
V2 -> V3: No changes
V1 -> V2: Return a pointer to the cgroup instead of the cgroup ID
kernel/bpf/helpers.c | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index a89587859571..bbd19d5eedb6 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -2266,6 +2266,31 @@ bpf_task_get_cgroup1(struct task_struct *task, int hierarchy_id)
return NULL;
return cgrp;
}
+
+/**
+ * bpf_task_get_cgroup - Acquire a reference to the default cgroup of a task.
+ * @task: The target task
+ *
+ * This function returns the task's default cgroup, primarily
+ * designed for use with cgroup v2. In cgroup v1, the concept of default
+ * cgroup varies by subsystem, and while this function will work with
+ * cgroup v1, it's recommended to use bpf_task_get_cgroup1 instead.
+ * A cgroup returned by this kfunc which is not subsequently stored in a
+ * map, must be released by calling bpf_cgroup_release().
+ *
+ * Return: On success, the cgroup is returned. On failure, NULL is returned.
+ */
+__bpf_kfunc struct cgroup *bpf_task_get_cgroup(struct task_struct *task)
+{
+ struct cgroup *cgrp;
+
+ rcu_read_lock();
+ cgrp = task_dfl_cgroup(task);
+ if (!cgroup_tryget(cgrp))
+ cgrp = NULL;
+ rcu_read_unlock();
+ return cgrp;
+}
#endif /* CONFIG_CGROUPS */
/**
@@ -2573,6 +2598,7 @@ BTF_ID_FLAGS(func, bpf_cgroup_ancestor, KF_ACQUIRE | KF_RCU | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_cgroup_from_id, KF_ACQUIRE | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_task_under_cgroup, KF_RCU)
BTF_ID_FLAGS(func, bpf_task_get_cgroup1, KF_ACQUIRE | KF_RCU | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_task_get_cgroup, KF_ACQUIRE | KF_RCU | KF_RET_NULL)
#endif
BTF_ID_FLAGS(func, bpf_task_from_pid, KF_ACQUIRE | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_throw)
base-commit: c733239f8f530872a1f80d8c45dcafbaff368737
--
2.40.1
On riscv, mmap currently returns an address from the largest address
space that can fit entirely inside of the hint address. This makes it
such that the hint address is almost never returned. This patch raises
the mappable area up to and including the hint address. This allows mmap
to often return the hint address, which allows a performance improvement
over searching for a valid address as well as making the behavior more
similar to other architectures.
Note that a previous patch introduced stronger semantics compared to
other architectures for riscv mmap. On riscv, mmap will not use bits in
the upper bits of the virtual address depending on the hint address. On
other architectures, a random address is returned in the address space
requested. On all architectures the hint address will be returned if it
is available. This allows riscv applications to configure how many bits
in the virtual address should be left empty. This has the two benefits
of being able to request address spaces that are smaller than the
default and doesn't require the application to know the page table
layout of riscv.
Signed-off-by: Charlie Jenkins <charlie(a)rivosinc.com>
---
Changes in v2:
- Add back forgotten "mmap_end = STACK_TOP_MAX"
- Link to v1: https://lore.kernel.org/r/20240129-use_mmap_hint_address-v1-0-4c74da813ba1@…
---
Charlie Jenkins (3):
riscv: mm: Use hint address in mmap if available
selftests: riscv: Generalize mm selftests
docs: riscv: Define behavior of mmap
Documentation/arch/riscv/vm-layout.rst | 16 ++--
arch/riscv/include/asm/processor.h | 22 +++---
tools/testing/selftests/riscv/mm/mmap_bottomup.c | 20 +----
tools/testing/selftests/riscv/mm/mmap_default.c | 20 +----
tools/testing/selftests/riscv/mm/mmap_test.h | 93 +++++++++++++-----------
5 files changed, 67 insertions(+), 104 deletions(-)
---
base-commit: 556e2d17cae620d549c5474b1ece053430cd50bc
change-id: 20240119-use_mmap_hint_address-f9f4b1b6f5f1
--
- Charlie
v3:
- pull forward to v6.8
- style and small fixups recommended by jcameron
- update syscall number (will do all archs when RFC tag drops)
- update for new folio code
- added OCP link to device-tracked address hotness proposal
- kept void* over __u64 simply because it integrates cleanly with
existing migration code. If there's strong opinions, I can refactor.
This patch set is a proposal for a syscall analogous to move_pages,
that migrates pages between NUMA nodes using physical addressing.
The intent is to better enable user-land system-wide memory tiering
as CXL devices begin to provide memory resources on the PCIe bus.
For example, user-land software which is making decisions based on
data sources which expose physical address information no longer
must convert that information to virtual addressing to act upon it
(see background for info on how physical addresses are acquired).
The syscall requires CAP_SYS_ADMIN, since physical address source
information is typically protected by the same (or CAP_SYS_NICE).
This patch set broken into 3 patches:
1) refactor of existing migration code for code reuse
2) The sys_move_phys_pages system call.
3) ktest of the syscall
The sys_move_phys_pages system call validates the page may be
migrated by checking migratable-status of each vma mapping the page,
and the intersection of cpuset policies each vma's task.
Background:
Userspace job schedulers, memory managers, and tiering software
solutions depend on page migration syscalls to reallocate resources
across NUMA nodes. Currently, these calls enable movement of memory
associated with a specific PID. Moves can be requested in coarse,
process-sized strokes (as with migrate_pages), and on specific virtual
pages (via move_pages).
However, a number of profiling mechanisms provide system-wide information
that would benefit from a physical-addressing version move_pages.
There are presently at least 4 ways userland can acquire physical
address information for use with this interface, and 1 hardware offload
mechanism being proposed by opencompute.
1) /proc/pid/pagemap: can be used to do page table translations.
This is only really useful for testing, and the ktest was
written using this functionality.
2) X86: IBS (AMD) and PEBS (Intel) can be configured to return physical
and/or vitual address information.
3) zoneinfo: /proc/zoneinfo exposes the start PFN of zones
4) /sys/kernel/mm/page_idle: A way to query whether a PFN is idle.
So long as the page size is known, this can be used to identify
system-wide idle pages that could be migrated to lower tiers.
https://docs.kernel.org/admin-guide/mm/idle_page_tracking.html
5) CXL Offloaded Hotness Monitoring (Proposed): a CXL memory device
may provide hot/cold information about its memory. For example,
it may report the hottest device addresses (0-based) or a physical
address (if it has access to decoders for convert bases).
DPA can be cheaply converted to HPA by combining it with data
exposed by /sys/bus/cxl/ information (region address bases).
See: https://www.opencompute.org/documents/ocp-cms-hotness-tracking-requirements…
Information from these sources facilitates systemwide resource management,
but with the limitations of migrate_pages and move_pages applying to
individual tasks, their outputs must be converted back to virtual addresses
and re-associated with specific PIDs.
Doing this reverse-translation outside of the kernel requires considerable
space and compute, and it will have to be performed again by the existing
system calls. Much of this work can be avoided if the pages can be
migrated directly with physical memory addressing.
Gregory Price (3):
mm/migrate: refactor add_page_for_migration for code re-use
mm/migrate: Create move_phys_pages syscall
ktest: sys_move_phys_pages ktest
arch/x86/entry/syscalls/syscall_32.tbl | 1 +
arch/x86/entry/syscalls/syscall_64.tbl | 1 +
include/linux/syscalls.h | 5 +
include/uapi/asm-generic/unistd.h | 8 +-
kernel/sys_ni.c | 1 +
mm/migrate.c | 288 ++++++++++++++++++++----
tools/include/uapi/asm-generic/unistd.h | 8 +-
tools/testing/selftests/mm/migration.c | 99 ++++++++
8 files changed, 370 insertions(+), 41 deletions(-)
--
2.39.1
In some systems, the netcat server can incur in delay to start listening.
When this happens, the test can randomly fail in various points.
This is an example error message:
# ip gre none gso
# encap 192.168.1.1 to 192.168.1.2, type gre, mac none len 2000
# test basic connectivity
# Ncat: Connection refused.
The issue stems from a race condition between the netcat client and server.
The test author had addressed this problem by implementing a sleep, which
I have removed in this patch.
This patch introduces a function capable of sleeping for up to two seconds.
However, it can terminate the waiting period early if the port is reported
to be listening.
Signed-off-by: Alessandro Carminati (Red Hat) <alessandro.carminati(a)gmail.com>
---
tools/testing/selftests/bpf/test_tc_tunnel.sh | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/test_tc_tunnel.sh b/tools/testing/selftests/bpf/test_tc_tunnel.sh
index 910044f08908..7989ec608454 100755
--- a/tools/testing/selftests/bpf/test_tc_tunnel.sh
+++ b/tools/testing/selftests/bpf/test_tc_tunnel.sh
@@ -72,7 +72,6 @@ cleanup() {
server_listen() {
ip netns exec "${ns2}" nc "${netcat_opt}" -l "${port}" > "${outfile}" &
server_pid=$!
- sleep 0.2
}
client_connect() {
@@ -93,6 +92,16 @@ verify_data() {
fi
}
+wait_for_port() {
+ for i in $(seq 20); do
+ if ip netns exec "${ns2}" ss ${2:--4}OHntl | grep -q "$1"; then
+ return 0
+ fi
+ sleep 0.1
+ done
+ return 1
+}
+
set -e
# no arguments: automated test, run all
@@ -193,6 +202,7 @@ setup
# basic communication works
echo "test basic connectivity"
server_listen
+wait_for_port ${port} ${netcat_opt}
client_connect
verify_data
@@ -204,6 +214,7 @@ ip netns exec "${ns1}" tc filter add dev veth1 egress \
section "encap_${tuntype}_${mac}"
echo "test bpf encap without decap (expect failure)"
server_listen
+wait_for_port ${port} ${netcat_opt}
! client_connect
if [[ "$tuntype" =~ "udp" ]]; then
--
2.34.1
Sub-Numa Clustering (SNC) allows splitting CPU cores, caches and memory
into multiple NUMA nodes. When enabled, NUMA-aware applications can
achieve better performance on bigger server platforms.
The series adding SNC support to the kernel is currently in review [1]
but the selftests for resctrl need NUMA-aware adjustments to use these
changes. Issues with resctrl selftests not working properly with SNC
enabled were originally reported by Shaopeng Tan [2][3] and the
following series resolves them.
The main concept currently missing from resctrl selftests is that while
resctrl tracks memory accesses on a single NUMA node (which normally is
the same as the CPU socket) on machines with SNC enabled memory accesses
can leak outside of the local NUMA node and into other NUMA nodes on the
same socket. In that case resctrl could report a diminished value in one
of its monitoring technologies: Cache Monitoring Technology (CMT) or
Memory Bandwidth Monitoring (MBM) .
Implemented solutions for both CMT and MBM follow the same idea which is
to simply sum values reported by different NUMA nodes for a single
Resource Monitoring ID (RMID).
Series was tested on Ice Lake server platforms with SNC disabled, SNC-2
and SNC-4. The tests were also ran with and without kernel support for
SNC.
Series applies cleanly on kselftest/next.
[1] https://lore.kernel.org/all/20240228112215.8044-tony.luck@intel.com/
[2] https://lore.kernel.org/all/TYAPR01MB6330B9B17686EF426D2C3F308B25A@TYAPR01M…
[3] https://lore.kernel.org/lkml/TYAPR01MB6330A4EB3633B791939EA45E8B39A@TYAPR01…
Maciej Wieczor-Retman (4):
selftests/resctrl: Adjust effective L3 cache size with SNC enabled
selftests/resctrl: SNC support for CMT
selftests/resctrl: SNC support for MBM
selftests/resctrl: Adjust SNC support messages
tools/testing/selftests/resctrl/cache.c | 17 ++-
tools/testing/selftests/resctrl/cat_test.c | 2 +-
tools/testing/selftests/resctrl/cmt_test.c | 6 +-
tools/testing/selftests/resctrl/mba_test.c | 3 +-
tools/testing/selftests/resctrl/mbm_test.c | 4 +-
tools/testing/selftests/resctrl/resctrl.h | 13 +-
tools/testing/selftests/resctrl/resctrl_val.c | 46 ++++---
tools/testing/selftests/resctrl/resctrlfs.c | 128 +++++++++++++++++-
8 files changed, 185 insertions(+), 34 deletions(-)
--
2.44.0
Hi,
With the commit v6.8-11167-g4438a810f396 in vanilla torvalds tree, there seem to be problems with
the icmp_redirect.sh tests.
The iproute2-next tools were used, commit 7a6d30c95da9.
# timeout set to 3600
# selftests: net: icmp_redirect.sh
#
# ###########################################################################
# Legacy routing
# ###########################################################################
#
# TEST: IPv4: redirect exception [FAIL]
# TEST: IPv6: redirect exception [ OK ]
# TEST: IPv4: redirect exception plus mtu [FAIL]
# TEST: IPv6: redirect exception plus mtu [ OK ]
# TEST: IPv4: routing reset [ OK ]
# TEST: IPv6: routing reset [ OK ]
# TEST: IPv4: mtu exception [ OK ]
# TEST: IPv6: mtu exception [ OK ]
# TEST: IPv4: mtu exception plus redirect [FAIL]
# TEST: IPv6: mtu exception plus redirect [ OK ]
#
# ###########################################################################
# Legacy routing with VRF
# ###########################################################################
#
# TEST: IPv4: redirect exception [FAIL]
# TEST: IPv6: redirect exception [ OK ]
# TEST: IPv4: redirect exception plus mtu [FAIL]
# TEST: IPv6: redirect exception plus mtu [ OK ]
# TEST: IPv4: routing reset [ OK ]
# TEST: IPv6: routing reset [ OK ]
# TEST: IPv4: mtu exception [ OK ]
# TEST: IPv6: mtu exception [ OK ]
# TEST: IPv4: mtu exception plus redirect [FAIL]
# TEST: IPv6: mtu exception plus redirect [ OK ]
#
# ###########################################################################
# Routing with nexthop objects
# ###########################################################################
#
# TEST: IPv4: redirect exception [FAIL]
# TEST: IPv6: redirect exception [ OK ]
# TEST: IPv4: redirect exception plus mtu [FAIL]
# TEST: IPv6: redirect exception plus mtu [ OK ]
# TEST: IPv4: routing reset [ OK ]
# TEST: IPv6: routing reset [ OK ]
# TEST: IPv4: mtu exception [ OK ]
# TEST: IPv6: mtu exception [ OK ]
# TEST: IPv4: mtu exception plus redirect [FAIL]
# TEST: IPv6: mtu exception plus redirect [ OK ]
#
# ###########################################################################
# Routing with nexthop objects and VRF
# ###########################################################################
#
# TEST: IPv4: redirect exception [FAIL]
# TEST: IPv6: redirect exception [ OK ]
# TEST: IPv4: redirect exception plus mtu [FAIL]
# TEST: IPv6: redirect exception plus mtu [ OK ]
# TEST: IPv4: routing reset [ OK ]
# TEST: IPv6: routing reset [ OK ]
# TEST: IPv4: mtu exception [ OK ]
# TEST: IPv6: mtu exception [ OK ]
# TEST: IPv4: mtu exception plus redirect [FAIL]
# TEST: IPv6: mtu exception plus redirect [ OK ]
#
# Tests passed: 28
# Tests failed: 12
# Tests xfailed: 0
not ok 45 selftests: net: icmp_redirect.sh # exit=1
These errors are not introduced with this commit, but were already present at least in 6.8-rc7.
Hope this helps.
Best regards,
Mirsad Todorovac
This patch enhances the BPF helpers by adding a kfunc to retrieve the
cgroup v2 of a task, addressing a previous limitation where only
bpf_task_get_cgroup1 was available for cgroup v1. The new kfunc is
particularly useful for scenarios where obtaining the cgroup ID of a
task other than the "current" one is necessary, which the existing
bpf_get_current_cgroup_id helper cannot accommodate. A specific use
case at Netflix involved the sched_switch tracepoint, where we had to
get the cgroup IDs of both the prev and next tasks.
The bpf_task_get_cgroup kfunc acquires and returns a reference to a
task's default cgroup, ensuring thread-safe access by correctly
implementing RCU read locking and unlocking. It leverages the existing
cgroup.h helper, and cgroup_tryget to safely acquire a reference to it.
Signed-off-by: Jose Fernandez <josef(a)netflix.com>
Reviewed-by: Tycho Andersen <tycho(a)tycho.pizza>
---
V1 -> V2: Return a pointer to the cgroup instead of the cgroup ID
kernel/bpf/helpers.c | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index a89587859571..bbd19d5eedb6 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -2266,6 +2266,31 @@ bpf_task_get_cgroup1(struct task_struct *task, int hierarchy_id)
return NULL;
return cgrp;
}
+
+/**
+ * bpf_task_get_cgroup - Acquire a reference to the default cgroup of a task.
+ * @task: The target task
+ *
+ * This function returns the task's default cgroup, primarily
+ * designed for use with cgroup v2. In cgroup v1, the concept of default
+ * cgroup varies by subsystem, and while this function will work with
+ * cgroup v1, it's recommended to use bpf_task_get_cgroup1 instead.
+ * A cgroup returned by this kfunc which is not subsequently stored in a
+ * map, must be released by calling bpf_cgroup_release().
+ *
+ * Return: On success, the cgroup is returned. On failure, NULL is returned.
+ */
+__bpf_kfunc struct cgroup *bpf_task_get_cgroup(struct task_struct *task)
+{
+ struct cgroup *cgrp;
+
+ rcu_read_lock();
+ cgrp = task_dfl_cgroup(task);
+ if (!cgroup_tryget(cgrp))
+ cgrp = NULL;
+ rcu_read_unlock();
+ return cgrp;
+}
#endif /* CONFIG_CGROUPS */
/**
@@ -2573,6 +2598,7 @@ BTF_ID_FLAGS(func, bpf_cgroup_ancestor, KF_ACQUIRE | KF_RCU | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_cgroup_from_id, KF_ACQUIRE | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_task_under_cgroup, KF_RCU)
BTF_ID_FLAGS(func, bpf_task_get_cgroup1, KF_ACQUIRE | KF_RCU | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_task_get_cgroup, KF_ACQUIRE | KF_RCU | KF_RET_NULL)
#endif
BTF_ID_FLAGS(func, bpf_task_from_pid, KF_ACQUIRE | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_throw)
base-commit: 4c8644f86c854c214aaabbcc24a27fa4c7e6a951
--
2.40.1
Add ability to parse multiple files. Additionally add the
ability to parse all results in the KUnit debugfs repository.
How to parse multiple files:
./tools/testing/kunit/kunit.py parse results.log results2.log
How to parse all files in directory:
./tools/testing/kunit/kunit.py parse directory_path/*
How to parse KUnit debugfs repository:
./tools/testing/kunit/kunit.py parse debugfs
For each file, the parser outputs the file name, results, and test
summary. At the end of all parsing, the parser outputs a total summary
line.
This feature can be easily tested on the tools/testing/kunit/test_data/
directory.
Signed-off-by: Rae Moar <rmoar(a)google.com>
---
Changes since v3:
- Changing from input() to stdin
- Add checking for non-regular files
- Spacing fix
- Small printing fix
tools/testing/kunit/kunit.py | 54 +++++++++++++++++++++++++-----------
1 file changed, 38 insertions(+), 16 deletions(-)
diff --git a/tools/testing/kunit/kunit.py b/tools/testing/kunit/kunit.py
index bc74088c458a..641b8ca83e3e 100755
--- a/tools/testing/kunit/kunit.py
+++ b/tools/testing/kunit/kunit.py
@@ -511,19 +511,40 @@ def exec_handler(cli_args: argparse.Namespace) -> None:
def parse_handler(cli_args: argparse.Namespace) -> None:
- if cli_args.file is None:
- sys.stdin.reconfigure(errors='backslashreplace') # type: ignore
- kunit_output = sys.stdin # type: Iterable[str]
- else:
- with open(cli_args.file, 'r', errors='backslashreplace') as f:
- kunit_output = f.read().splitlines()
- # We know nothing about how the result was created!
- metadata = kunit_json.Metadata()
- request = KunitParseRequest(raw_output=cli_args.raw_output,
- json=cli_args.json)
- result, _ = parse_tests(request, metadata, kunit_output)
- if result.status != KunitStatus.SUCCESS:
- sys.exit(1)
+ parsed_files = cli_args.files # type: List[str]
+ total_test = kunit_parser.Test()
+ total_test.status = kunit_parser.TestStatus.SUCCESS
+ if not parsed_files:
+ parsed_files.append('/dev/stdin')
+ elif len(parsed_files) == 1 and parsed_files[0] == "debugfs":
+ parsed_files.pop()
+ for (root, _, files) in os.walk("/sys/kernel/debug/kunit"):
+ parsed_files.extend(os.path.join(root, f) for f in files if f == "results")
+ if not parsed_files:
+ print("No files found.")
+
+ for file in parsed_files:
+ if os.path.isdir(file):
+ print(f'Ignoring directory "{file}"')
+ elif os.path.exists(file):
+ print(file)
+ with open(file, 'r', errors='backslashreplace') as f:
+ kunit_output = f.read().splitlines()
+ # We know nothing about how the result was created!
+ metadata = kunit_json.Metadata()
+ request = KunitParseRequest(raw_output=cli_args.raw_output,
+ json=cli_args.json)
+ _, test = parse_tests(request, metadata, kunit_output)
+ total_test.subtests.append(test)
+ else:
+ print(f'Could not find "{file}"')
+
+ if len(parsed_files) > 1: # if more than one file was parsed output total summary
+ print('All files parsed.')
+ if not request.raw_output:
+ stdout.print_with_timestamp(kunit_parser.DIVIDER)
+ kunit_parser.bubble_up_test_results(total_test)
+ kunit_parser.print_summary_line(total_test)
subcommand_handlers_map = {
@@ -569,9 +590,10 @@ def main(argv: Sequence[str]) -> None:
help='Parses KUnit results from a file, '
'and parses formatted results.')
add_parse_opts(parse_parser)
- parse_parser.add_argument('file',
- help='Specifies the file to read results from.',
- type=str, nargs='?', metavar='input_file')
+ parse_parser.add_argument('files',
+ help='List of file paths to read results from or keyword'
+ '"debugfs" to read all results from the debugfs directory.',
+ type=str, nargs='*', metavar='input_files')
cli_args = parser.parse_args(massage_argv(argv))
base-commit: 806cb2270237ce2ec672a407d66cee17a07d3aa2
--
2.44.0.291.gc1ea87d7ee-goog
Add ability to parse multiple files. Additionally add the
ability to parse all results in the KUnit debugfs repository.
How to parse multiple files:
./tools/testing/kunit/kunit.py parse results.log results2.log
How to parse all files in directory:
./tools/testing/kunit/kunit.py parse directory_path/*
How to parse KUnit debugfs repository:
./tools/testing/kunit/kunit.py parse debugfs
For each file, the parser outputs the file name, results, and test
summary. At the end of all parsing, the parser outputs a total summary
line.
This feature can be easily tested on the tools/testing/kunit/test_data/
directory.
Signed-off-by: Rae Moar <rmoar(a)google.com>
---
Changes since v2:
- Fixed bug with input from command line. I changed this to use
input(). Daniel, let me know if this works for you.
- Add more specific warning messages
tools/testing/kunit/kunit.py | 56 +++++++++++++++++++++++++-----------
1 file changed, 40 insertions(+), 16 deletions(-)
diff --git a/tools/testing/kunit/kunit.py b/tools/testing/kunit/kunit.py
index bc74088c458a..1aa3d736d80c 100755
--- a/tools/testing/kunit/kunit.py
+++ b/tools/testing/kunit/kunit.py
@@ -511,19 +511,42 @@ def exec_handler(cli_args: argparse.Namespace) -> None:
def parse_handler(cli_args: argparse.Namespace) -> None:
- if cli_args.file is None:
- sys.stdin.reconfigure(errors='backslashreplace') # type: ignore
- kunit_output = sys.stdin # type: Iterable[str]
- else:
- with open(cli_args.file, 'r', errors='backslashreplace') as f:
- kunit_output = f.read().splitlines()
- # We know nothing about how the result was created!
- metadata = kunit_json.Metadata()
- request = KunitParseRequest(raw_output=cli_args.raw_output,
- json=cli_args.json)
- result, _ = parse_tests(request, metadata, kunit_output)
- if result.status != KunitStatus.SUCCESS:
- sys.exit(1)
+ parsed_files = cli_args.files # type: List[str]
+ total_test = kunit_parser.Test()
+ total_test.status = kunit_parser.TestStatus.SUCCESS
+ if not parsed_files:
+ parsed_files.append(input("File path: "))
+
+ if parsed_files[0] == "debugfs" and len(parsed_files) == 1:
+ parsed_files.pop()
+ for (root, _, files) in os.walk("/sys/kernel/debug/kunit"):
+ parsed_files.extend(os.path.join(root, f) for f in files if f == "results")
+
+ if not parsed_files:
+ print("No files found.")
+
+ for file in parsed_files:
+ if os.path.isfile(file):
+ print(file)
+ with open(file, 'r', errors='backslashreplace') as f:
+ kunit_output = f.read().splitlines()
+ # We know nothing about how the result was created!
+ metadata = kunit_json.Metadata()
+ request = KunitParseRequest(raw_output=cli_args.raw_output,
+ json=cli_args.json)
+ _, test = parse_tests(request, metadata, kunit_output)
+ total_test.subtests.append(test)
+ elif os.path.isdir(file):
+ print("Ignoring directory ", file)
+ else:
+ print("Could not find ", file)
+
+ if len(parsed_files) > 1: # if more than one file was parsed output total summary
+ print('All files parsed.')
+ if not request.raw_output:
+ stdout.print_with_timestamp(kunit_parser.DIVIDER)
+ kunit_parser.bubble_up_test_results(total_test)
+ kunit_parser.print_summary_line(total_test)
subcommand_handlers_map = {
@@ -569,9 +592,10 @@ def main(argv: Sequence[str]) -> None:
help='Parses KUnit results from a file, '
'and parses formatted results.')
add_parse_opts(parse_parser)
- parse_parser.add_argument('file',
- help='Specifies the file to read results from.',
- type=str, nargs='?', metavar='input_file')
+ parse_parser.add_argument('files',
+ help='List of file paths to read results from or keyword'
+ '"debugfs" to read all results from the debugfs directory.',
+ type=str, nargs='*', metavar='input_files')
cli_args = parser.parse_args(massage_argv(argv))
base-commit: 806cb2270237ce2ec672a407d66cee17a07d3aa2
--
2.44.0.278.ge034bb2e1d-goog
Add missing flags argument to open(2) call with O_CREAT.
Some tests fail to compile if _FORTIFY_SOURCE is defined (to any valid
value) (together with -O), resulting in similar error messages such as:
In file included from /usr/include/fcntl.h:342,
from gup_test.c:1:
In function 'open',
inlined from 'main' at gup_test.c:206:10:
/usr/include/bits/fcntl2.h:50:11: error: call to '__open_missing_mode' declared with attribute error: open with O_CREAT or O_TMPFILE in second argument needs 3 arguments
50 | __open_missing_mode ();
| ^~~~~~~~~~~~~~~~~~~~~~
_FORTIFY_SOURCE is enabled by default in some distributions, so the
tests are not built by default and are skipped.
open(2) man-page warns about missing flags argument: "if it is not
supplied, some arbitrary bytes from the stack will be applied as the
file mode."
Fixes: aeb85ed4f41a ("tools/testing/selftests/vm/gup_benchmark.c: allow user specified file")
Fixes: fbe37501b252 ("mm: huge_memory: debugfs for file-backed THP split")
Fixes: c942f5bd17b3 ("selftests: soft-dirty: add test for mprotect")
Cc: Keith Busch <kbusch(a)kernel.org>
Cc: Zi Yan <ziy(a)nvidia.com>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Nadav Amit <nadav.amit(a)gmail.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Vitaly Chikunov <vt(a)altlinux.org>
---
tools/testing/selftests/mm/gup_test.c | 2 +-
tools/testing/selftests/mm/soft-dirty.c | 2 +-
tools/testing/selftests/mm/split_huge_page_test.c | 2 +-
3 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/mm/gup_test.c b/tools/testing/selftests/mm/gup_test.c
index cbe99594d319..18a49c70d4c6 100644
--- a/tools/testing/selftests/mm/gup_test.c
+++ b/tools/testing/selftests/mm/gup_test.c
@@ -203,7 +203,7 @@ int main(int argc, char **argv)
ksft_print_header();
ksft_set_plan(nthreads);
- filed = open(file, O_RDWR|O_CREAT);
+ filed = open(file, O_RDWR|O_CREAT, 0664);
if (filed < 0)
ksft_exit_fail_msg("Unable to open %s: %s\n", file, strerror(errno));
diff --git a/tools/testing/selftests/mm/soft-dirty.c b/tools/testing/selftests/mm/soft-dirty.c
index cc5f144430d4..7dbfa53d93a0 100644
--- a/tools/testing/selftests/mm/soft-dirty.c
+++ b/tools/testing/selftests/mm/soft-dirty.c
@@ -137,7 +137,7 @@ static void test_mprotect(int pagemap_fd, int pagesize, bool anon)
if (!map)
ksft_exit_fail_msg("anon mmap failed\n");
} else {
- test_fd = open(fname, O_RDWR | O_CREAT);
+ test_fd = open(fname, O_RDWR | O_CREAT, 0664);
if (test_fd < 0) {
ksft_test_result_skip("Test %s open() file failed\n", __func__);
return;
diff --git a/tools/testing/selftests/mm/split_huge_page_test.c b/tools/testing/selftests/mm/split_huge_page_test.c
index 856662d2f87a..6c988bd2f335 100644
--- a/tools/testing/selftests/mm/split_huge_page_test.c
+++ b/tools/testing/selftests/mm/split_huge_page_test.c
@@ -223,7 +223,7 @@ void split_file_backed_thp(void)
ksft_exit_fail_msg("Fail to create file-backed THP split testing file\n");
}
- fd = open(testfile, O_CREAT|O_WRONLY);
+ fd = open(testfile, O_CREAT|O_WRONLY, 0664);
if (fd == -1) {
ksft_perror("Cannot open testing file");
goto cleanup;
--
2.42.1
Some unit tests intentionally trigger warning backtraces by passing bad
parameters to kernel API functions. Such unit tests typically check the
return value from such calls, not the existence of the warning backtrace.
Such intentionally generated warning backtraces are neither desirable
nor useful for a number of reasons.
- They can result in overlooked real problems.
- A warning that suddenly starts to show up in unit tests needs to be
investigated and has to be marked to be ignored, for example by
adjusting filter scripts. Such filters are ad-hoc because there is
no real standard format for warnings. On top of that, such filter
scripts would require constant maintenance.
One option to address problem would be to add messages such as "expected
warning backtraces start / end here" to the kernel log. However, that
would again require filter scripts, it might result in missing real
problematic warning backtraces triggered while the test is running, and
the irrelevant backtrace(s) would still clog the kernel log.
Solve the problem by providing a means to identify and suppress specific
warning backtraces while executing test code. Support suppressing multiple
backtraces while at the same time limiting changes to generic code to the
absolute minimum. Architecture specific changes are kept at minimum by
retaining function names only if both CONFIG_DEBUG_BUGVERBOSE and
CONFIG_KUNIT are enabled.
The first patch of the series introduces the necessary infrastructure.
The second patch introduces support for counting suppressed backtraces.
This capability is used in patch three to implement unit tests.
Patch four documents the new API.
The next two patches add support for suppressing backtraces in drm_rect
and dev_addr_lists unit tests. These patches are intended to serve as
examples for the use of the functionality introduced with this series.
The remaining patches implement the necessary changes for all
architectures with GENERIC_BUG support.
This series is based on the RFC patch and subsequent discussion at
https://patchwork.kernel.org/project/linux-kselftest/patch/02546e59-1afe-4b…
and offers a more comprehensive solution of the problem discussed there.
Design note:
Function pointers are only added to the __bug_table section if both
CONFIG_KUNIT and CONFIG_DEBUG_BUGVERBOSE are enabled to avoid image
size increases if CONFIG_KUNIT=n. There would be some benefits to
adding those pointers all the time (reduced complexity, ability to
display function names in BUG/WARNING messages). That change, if
desired, can be made later.
Checkpatch note:
Remaining checkpatch errors and warnings were deliberately ignored.
Some are triggered by matching coding style or by comments interpreted
as code, others by assembler macros which are disliked by checkpatch.
Suggestions for improvements are welcome.
Changes since RFC:
- Minor cleanups and bug fixes
- Added support for all affected architectures
- Added support for counting suppressed warnings
- Added unit tests using those counters
- Added patch to suppress warning backtraces in dev_addr_lists tests
This patch enhances the BPF helpers by adding a kfunc to retrieve the
cgroup v2 ID of a specific task, addressing a previous limitation where
only bpf_task_get_cgroup1 was available for cgroup v1. The new kfunc is
particularly useful for scenarios where obtaining the cgroup ID of a
task other than the "current" one is necessary, which the existing
bpf_get_current_cgroup_id helper cannot accommodate. A specific use case
at Netflix involved the sched_switch tracepoint, where we had to get
the cgroup IDs of both the previous and next tasks.
The bpf_task_get_cgroup_id kfunc returns a task's cgroup ID, correctly
implementing RCU read locking and unlocking for safe data access, and
leverages existing cgroup.h helpers to fetch the cgroup and its ID.
Signed-off-by: Jose Fernandez <josef(a)netflix.com>
Reviewed-by: Tycho Andersen <tycho(a)tycho.pizza>
---
kernel/bpf/helpers.c | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index a89587859571..8038b2bd3488 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -2266,6 +2266,27 @@ bpf_task_get_cgroup1(struct task_struct *task, int hierarchy_id)
return NULL;
return cgrp;
}
+
+/**
+ * bpf_task_get_cgroup_id - Get the cgroup ID of a task.
+ * @task: The target task
+ *
+ * This function returns the ID of the task's default cgroup, primarily
+ * designed for use with cgroup v2. In cgroup v1, the concept of default
+ * cgroup varies by subsystem, and while this function will work with
+ * cgroup v1, it's recommended to use bpf_task_get_cgroup1 instead.
+ */
+__bpf_kfunc u64 bpf_task_get_cgroup_id(struct task_struct *task)
+{
+ struct cgroup *cgrp;
+ u64 cgrp_id;
+
+ rcu_read_lock();
+ cgrp = task_dfl_cgroup(task);
+ cgrp_id = cgroup_id(cgrp);
+ rcu_read_unlock();
+ return cgrp_id;
+}
#endif /* CONFIG_CGROUPS */
/**
@@ -2573,6 +2594,7 @@ BTF_ID_FLAGS(func, bpf_cgroup_ancestor, KF_ACQUIRE | KF_RCU | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_cgroup_from_id, KF_ACQUIRE | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_task_under_cgroup, KF_RCU)
BTF_ID_FLAGS(func, bpf_task_get_cgroup1, KF_ACQUIRE | KF_RCU | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_task_get_cgroup_id, KF_RCU)
#endif
BTF_ID_FLAGS(func, bpf_task_from_pid, KF_ACQUIRE | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_throw)
--
2.40.1
There are statements with two semicolons. Remove the second one, it
is redundant.
Signed-off-by: Colin Ian King <colin.i.king(a)gmail.com>
---
tools/testing/selftests/bpf/benchs/bench_local_storage_create.c | 2 +-
tools/testing/selftests/bpf/progs/iters.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/bpf/benchs/bench_local_storage_create.c b/tools/testing/selftests/bpf/benchs/bench_local_storage_create.c
index b36de42ee4d9..e2ff8ea1cb79 100644
--- a/tools/testing/selftests/bpf/benchs/bench_local_storage_create.c
+++ b/tools/testing/selftests/bpf/benchs/bench_local_storage_create.c
@@ -186,7 +186,7 @@ static void *task_producer(void *input)
for (i = 0; i < batch_sz; i++) {
if (!pthd_results[i])
- pthread_join(pthds[i], NULL);;
+ pthread_join(pthds[i], NULL);
}
}
diff --git a/tools/testing/selftests/bpf/progs/iters.c b/tools/testing/selftests/bpf/progs/iters.c
index 3db416606f2f..fe65e0952a1e 100644
--- a/tools/testing/selftests/bpf/progs/iters.c
+++ b/tools/testing/selftests/bpf/progs/iters.c
@@ -673,7 +673,7 @@ static __noinline void fill(struct bpf_iter_num *it, int *arr, __u32 n, int mul)
static __noinline int sum(struct bpf_iter_num *it, int *arr, __u32 n)
{
- int *t, i, sum = 0;;
+ int *t, i, sum = 0;
while ((t = bpf_iter_num_next(it))) {
i = *t;
--
2.39.2
On 12/12/2023 12:47 PM, Shashar, Sagi wrote:
>
>
> -----Original Message-----
> From: Sagi Shahar <sagis(a)google.com>
> Sent: Tuesday, December 12, 2023 12:47 PM
> To: linux-kselftest(a)vger.kernel.org; Ackerley Tng <ackerleytng(a)google.com>; Afranji, Ryan <afranji(a)google.com>; Aktas, Erdem <erdemaktas(a)google.com>; Sagi Shahar <sagis(a)google.com>; Yamahata, Isaku <isaku.yamahata(a)intel.com>
> Cc: Sean Christopherson <seanjc(a)google.com>; Paolo Bonzini <pbonzini(a)redhat.com>; Shuah Khan <shuah(a)kernel.org>; Peter Gonda <pgonda(a)google.com>; Xu, Haibo1 <haibo1.xu(a)intel.com>; Chao Peng <chao.p.peng(a)linux.intel.com>; Annapurve, Vishal <vannapurve(a)google.com>; Roger Wang <runanwang(a)google.com>; Vipin Sharma <vipinsh(a)google.com>; jmattson(a)google.com; dmatlack(a)google.com; linux-kernel(a)vger.kernel.org; kvm(a)vger.kernel.org; linux-mm(a)kvack.org
> Subject: [RFC PATCH v5 27/29] KVM: selftests: Propagate KVM_EXIT_MEMORY_FAULT to userspace
>
> Allow userspace to handle KVM_EXIT_MEMORY_FAULT instead of triggering TEST_ASSERT.
>
> From the KVM_EXIT_MEMORY_FAULT documentation:
> Note! KVM_EXIT_MEMORY_FAULT is unique among all KVM exit reasons in that it accompanies a return code of '-1', not '0'! errno will always be set to EFAULT or EHWPOISON when KVM exits with KVM_EXIT_MEMORY_FAULT, userspace should assume kvm_run.exit_reason is stale/undefined for all other error numbers.
If KVM exits to userspace with KVM_EXIT_MEMORY_FAULT, most likely it's because the guest attempts to access the gfn in a way that is different from what the KVM is configured, in terms of private/shared property. I'd suggest to drop this patch and work on the selftests code to eliminate this exit.
If we need a testcase to catch this exit intentionally, we may call _vcpu_run() directly from the testcase and keep the common API vcpu_run() intact.
>
> Signed-off-by: Sagi Shahar <sagis(a)google.com>
> ---
> tools/testing/selftests/kvm/lib/kvm_util.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
> index d024abc5379c..8fb041e51484 100644
> --- a/tools/testing/selftests/kvm/lib/kvm_util.c
> +++ b/tools/testing/selftests/kvm/lib/kvm_util.c
> @@ -1742,6 +1742,10 @@ void vcpu_run(struct kvm_vcpu *vcpu) {
> int ret = _vcpu_run(vcpu);
>
> + // Allow this scenario to be handled by the caller.
> + if (ret == -1 && errno == EFAULT)
> + return;
> +
> TEST_ASSERT(!ret, KVM_IOCTL_ERROR(KVM_RUN, ret)); }
>
> --
> 2.43.0.472.g3155946c3a-goog
>
The arm64 Guarded Control Stack (GCS) feature provides support for
hardware protected stacks of return addresses, intended to provide
hardening against return oriented programming (ROP) attacks and to make
it easier to gather call stacks for applications such as profiling.
When GCS is active a secondary stack called the Guarded Control Stack is
maintained, protected with a memory attribute which means that it can
only be written with specific GCS operations. The current GCS pointer
can not be directly written to by userspace. When a BL is executed the
value stored in LR is also pushed onto the GCS, and when a RET is
executed the top of the GCS is popped and compared to LR with a fault
being raised if the values do not match. GCS operations may only be
performed on GCS pages, a data abort is generated if they are not.
The combination of hardware enforcement and lack of extra instructions
in the function entry and exit paths should result in something which
has less overhead and is more difficult to attack than a purely software
implementation like clang's shadow stacks.
This series implements support for use of GCS by userspace, along with
support for use of GCS within KVM guests. It does not enable use of GCS
by either EL1 or EL2, this will be implemented separately. Executables
are started without GCS and must use a prctl() to enable it, it is
expected that this will be done very early in application execution by
the dynamic linker or other startup code. For dynamic linking this will
be done by checking that everything in the executable is marked as GCS
compatible.
x86 has an equivalent feature called shadow stacks, this series depends
on the x86 patches for generic memory management support for the new
guarded/shadow stack page type and shares APIs as much as possible. As
there has been extensive discussion with the wider community around the
ABI for shadow stacks I have as far as practical kept implementation
decisions close to those for x86, anticipating that review would lead to
similar conclusions in the absence of strong reasoning for divergence.
The main divergence I am concious of is that x86 allows shadow stack to
be enabled and disabled repeatedly, freeing the shadow stack for the
thread whenever disabled, while this implementation keeps the GCS
allocated after disable but refuses to reenable it. This is to avoid
races with things actively walking the GCS during a disable, we do
anticipate that some systems will wish to disable GCS at runtime but are
not aware of any demand for subsequently reenabling it.
x86 uses an arch_prctl() to manage enable and disable, since only x86
and S/390 use arch_prctl() a generic prctl() was proposed[1] as part of a
patch set for the equivalent RISC-V Zicfiss feature which I initially
adopted fairly directly but following review feedback has been revised
quite a bit.
We currently maintain the x86 pattern of implicitly allocating a shadow
stack for threads started with shadow stack enabled, there has been some
discussion of removing this support and requiring the use of clone3()
with explicit allocation of shadow stacks instead. I have no strong
feelings either way, implicit allocation is not really consistent with
anything else we do and creates the potential for errors around thread
exit but on the other hand it is existing ABI on x86 and minimises the
changes needed in userspace code.
There is an open issue with support for CRIU, on x86 this required the
ability to set the GCS mode via ptrace. This series supports
configuring mode bits other than enable/disable via ptrace but it needs
to be confirmed if this is sufficient.
The series depends on support for shadow stacks in clone3(), that series
includes the addition of ARCH_HAS_USER_SHADOW_STACK.
https://lore.kernel.org/r/20231120-clone3-shadow-stack-v3-0-a7b8ed3e2acc@ke…
It also depends on the addition of more waitpid() flags to nolibc:
https://lore.kernel.org/r/20231023-nolibc-waitpid-flags-v2-1-b09d096f091f@k…
You can see a branch with the full set of dependencies against Linus'
tree at:
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/misc.git arm64-gcs
[1] https://lore.kernel.org/lkml/20230213045351.3945824-1-debug@rivosinc.com/
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v8:
- Invalidate signal cap token on stack when consuming.
- Typo and other trivial fixes.
- Don't try to use process_vm_write() on GCS, it intentionally does not
work.
- Fix leak of thread GCSs.
- Rebase onto latest clone3() series.
- Link to v7: https://lore.kernel.org/r/20231122-arm64-gcs-v7-0-201c483bd775@kernel.org
Changes in v7:
- Rebase onto v6.7-rc2 via the clone3() patch series.
- Change the token used to cap the stack during signal handling to be
compatible with GCSPOPM.
- Fix flags for new page types.
- Fold in support for clone3().
- Replace copy_to_user_gcs() with put_user_gcs().
- Link to v6: https://lore.kernel.org/r/20231009-arm64-gcs-v6-0-78e55deaa4dd@kernel.org
Changes in v6:
- Rebase onto v6.6-rc3.
- Add some more gcsb_dsync() barriers following spec clarifications.
- Due to ongoing discussion around clone()/clone3() I've not updated
anything there, the behaviour is the same as on previous versions.
- Link to v5: https://lore.kernel.org/r/20230822-arm64-gcs-v5-0-9ef181dd6324@kernel.org
Changes in v5:
- Don't map any permissions for user GCSs, we always use EL0 accessors
or use a separate mapping of the page.
- Reduce the standard size of the GCS to RLIMIT_STACK/2.
- Enforce a PAGE_SIZE alignment requirement on map_shadow_stack().
- Clarifications and fixes to documentation.
- More tests.
- Link to v4: https://lore.kernel.org/r/20230807-arm64-gcs-v4-0-68cfa37f9069@kernel.org
Changes in v4:
- Implement flags for map_shadow_stack() allowing the cap and end of
stack marker to be enabled independently or not at all.
- Relax size and alignment requirements for map_shadow_stack().
- Add more blurb explaining the advantages of hardware enforcement.
- Link to v3: https://lore.kernel.org/r/20230731-arm64-gcs-v3-0-cddf9f980d98@kernel.org
Changes in v3:
- Rebase onto v6.5-rc4.
- Add a GCS barrier on context switch.
- Add a GCS stress test.
- Link to v2: https://lore.kernel.org/r/20230724-arm64-gcs-v2-0-dc2c1d44c2eb@kernel.org
Changes in v2:
- Rebase onto v6.5-rc3.
- Rework prctl() interface to allow each bit to be locked independently.
- map_shadow_stack() now places the cap token based on the size
requested by the caller not the actual space allocated.
- Mode changes other than enable via ptrace are now supported.
- Expand test coverage.
- Various smaller fixes and adjustments.
- Link to v1: https://lore.kernel.org/r/20230716-arm64-gcs-v1-0-bf567f93bba6@kernel.org
---
Mark Brown (38):
arm64/mm: Restructure arch_validate_flags() for extensibility
prctl: arch-agnostic prctl for shadow stack
mman: Add map_shadow_stack() flags
arm64: Document boot requirements for Guarded Control Stacks
arm64/gcs: Document the ABI for Guarded Control Stacks
arm64/sysreg: Add definitions for architected GCS caps
arm64/gcs: Add manual encodings of GCS instructions
arm64/gcs: Provide put_user_gcs()
arm64/cpufeature: Runtime detection of Guarded Control Stack (GCS)
arm64/mm: Allocate PIE slots for EL0 guarded control stack
mm: Define VM_SHADOW_STACK for arm64 when we support GCS
arm64/mm: Map pages for guarded control stack
KVM: arm64: Manage GCS registers for guests
arm64/gcs: Allow GCS usage at EL0 and EL1
arm64/idreg: Add overrride for GCS
arm64/hwcap: Add hwcap for GCS
arm64/traps: Handle GCS exceptions
arm64/mm: Handle GCS data aborts
arm64/gcs: Context switch GCS state for EL0
arm64/gcs: Ensure that new threads have a GCS
arm64/gcs: Implement shadow stack prctl() interface
arm64/mm: Implement map_shadow_stack()
arm64/signal: Set up and restore the GCS context for signal handlers
arm64/signal: Expose GCS state in signal frames
arm64/ptrace: Expose GCS via ptrace and core files
arm64: Add Kconfig for Guarded Control Stack (GCS)
kselftest/arm64: Verify the GCS hwcap
kselftest/arm64: Add GCS as a detected feature in the signal tests
kselftest/arm64: Add framework support for GCS to signal handling tests
kselftest/arm64: Allow signals tests to specify an expected si_code
kselftest/arm64: Always run signals tests with GCS enabled
kselftest/arm64: Add very basic GCS test program
kselftest/arm64: Add a GCS test program built with the system libc
kselftest/arm64: Add test coverage for GCS mode locking
selftests/arm64: Add GCS signal tests
kselftest/arm64: Add a GCS stress test
kselftest/arm64: Enable GCS for the FP stress tests
kselftest: Provide shadow stack enable helpers for arm64
Documentation/admin-guide/kernel-parameters.txt | 6 +
Documentation/arch/arm64/booting.rst | 22 +
Documentation/arch/arm64/elf_hwcaps.rst | 3 +
Documentation/arch/arm64/gcs.rst | 233 +++++++
Documentation/arch/arm64/index.rst | 1 +
Documentation/filesystems/proc.rst | 2 +-
arch/arm64/Kconfig | 20 +
arch/arm64/include/asm/cpufeature.h | 6 +
arch/arm64/include/asm/el2_setup.h | 17 +
arch/arm64/include/asm/esr.h | 28 +-
arch/arm64/include/asm/exception.h | 2 +
arch/arm64/include/asm/gcs.h | 107 +++
arch/arm64/include/asm/hwcap.h | 1 +
arch/arm64/include/asm/kvm_arm.h | 4 +-
arch/arm64/include/asm/kvm_host.h | 12 +
arch/arm64/include/asm/mman.h | 23 +-
arch/arm64/include/asm/pgtable-prot.h | 14 +-
arch/arm64/include/asm/processor.h | 7 +
arch/arm64/include/asm/sysreg.h | 20 +
arch/arm64/include/asm/uaccess.h | 40 ++
arch/arm64/include/uapi/asm/hwcap.h | 1 +
arch/arm64/include/uapi/asm/ptrace.h | 8 +
arch/arm64/include/uapi/asm/sigcontext.h | 9 +
arch/arm64/kernel/cpufeature.c | 19 +
arch/arm64/kernel/cpuinfo.c | 1 +
arch/arm64/kernel/entry-common.c | 23 +
arch/arm64/kernel/idreg-override.c | 2 +
arch/arm64/kernel/process.c | 85 +++
arch/arm64/kernel/ptrace.c | 59 ++
arch/arm64/kernel/signal.c | 242 ++++++-
arch/arm64/kernel/traps.c | 11 +
arch/arm64/kvm/emulate-nested.c | 4 +
arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 17 +
arch/arm64/kvm/sys_regs.c | 22 +
arch/arm64/mm/Makefile | 1 +
arch/arm64/mm/fault.c | 79 ++-
arch/arm64/mm/gcs.c | 300 +++++++++
arch/arm64/mm/mmap.c | 13 +-
arch/arm64/tools/cpucaps | 1 +
arch/x86/include/uapi/asm/mman.h | 3 -
fs/proc/task_mmu.c | 3 +
include/linux/mm.h | 16 +-
include/uapi/asm-generic/mman.h | 4 +
include/uapi/linux/elf.h | 1 +
include/uapi/linux/prctl.h | 22 +
kernel/sys.c | 30 +
tools/testing/selftests/arm64/Makefile | 2 +-
tools/testing/selftests/arm64/abi/hwcap.c | 19 +
tools/testing/selftests/arm64/fp/assembler.h | 15 +
tools/testing/selftests/arm64/fp/fpsimd-test.S | 2 +
tools/testing/selftests/arm64/fp/sve-test.S | 2 +
tools/testing/selftests/arm64/fp/za-test.S | 2 +
tools/testing/selftests/arm64/fp/zt-test.S | 2 +
tools/testing/selftests/arm64/gcs/.gitignore | 5 +
tools/testing/selftests/arm64/gcs/Makefile | 24 +
tools/testing/selftests/arm64/gcs/asm-offsets.h | 0
tools/testing/selftests/arm64/gcs/basic-gcs.c | 428 ++++++++++++
tools/testing/selftests/arm64/gcs/gcs-locking.c | 200 ++++++
.../selftests/arm64/gcs/gcs-stress-thread.S | 311 +++++++++
tools/testing/selftests/arm64/gcs/gcs-stress.c | 532 +++++++++++++++
tools/testing/selftests/arm64/gcs/gcs-util.h | 100 +++
tools/testing/selftests/arm64/gcs/libc-gcs.c | 736 +++++++++++++++++++++
tools/testing/selftests/arm64/signal/.gitignore | 1 +
.../testing/selftests/arm64/signal/test_signals.c | 17 +-
.../testing/selftests/arm64/signal/test_signals.h | 6 +
.../selftests/arm64/signal/test_signals_utils.c | 32 +-
.../selftests/arm64/signal/test_signals_utils.h | 39 ++
.../arm64/signal/testcases/gcs_exception_fault.c | 62 ++
.../selftests/arm64/signal/testcases/gcs_frame.c | 88 +++
.../arm64/signal/testcases/gcs_write_fault.c | 67 ++
.../selftests/arm64/signal/testcases/testcases.c | 7 +
.../selftests/arm64/signal/testcases/testcases.h | 1 +
tools/testing/selftests/ksft_shstk.h | 37 ++
73 files changed, 4241 insertions(+), 40 deletions(-)
---
base-commit: 50abefbf1bc07f5c4e403fd28f71dcee855100f7
change-id: 20230303-arm64-gcs-e311ab0d8729
Best regards,
--
Mark Brown <broonie(a)kernel.org>
Changes since v1:
- Rebased the series on top of next-20240202
Muhammad Usama Anjum (12):
selftests/mm: map_fixed_noreplace: conform test to TAP format output
selftests/mm: map_hugetlb: conform test to TAP format output
selftests/mm: map_populate: conform test to TAP format output
selftests/mm: mlock-random-test: conform test to TAP format output
selftests/mm: mlock2-tests: conform test to TAP format output
selftests/mm: mrelease_test: conform test to TAP format output
selftests/mm: mremap_dontunmap: conform test to TAP format output
selftests/mm: split_huge_page_test: conform test to TAP format output
selftests/mm: thp_settings: conform to TAP format output
selftests/mm: thuge-gen: conform to TAP format output
selftests/mm: transhuge-stress: conform to TAP format output
selftests/mm: virtual_address_range: conform to TAP format output
tools/testing/selftests/mm/khugepaged.c | 3 +-
.../selftests/mm/map_fixed_noreplace.c | 96 ++----
tools/testing/selftests/mm/map_hugetlb.c | 42 ++-
tools/testing/selftests/mm/map_populate.c | 37 ++-
.../testing/selftests/mm/mlock-random-test.c | 136 ++++-----
tools/testing/selftests/mm/mlock2-tests.c | 282 +++++++-----------
tools/testing/selftests/mm/mlock2.h | 11 +-
tools/testing/selftests/mm/mrelease_test.c | 80 ++---
tools/testing/selftests/mm/mremap_dontunmap.c | 32 +-
.../selftests/mm/split_huge_page_test.c | 161 +++++-----
tools/testing/selftests/mm/thp_settings.c | 123 +++-----
tools/testing/selftests/mm/thp_settings.h | 4 +-
tools/testing/selftests/mm/thuge-gen.c | 147 ++++-----
tools/testing/selftests/mm/transhuge-stress.c | 36 ++-
.../selftests/mm/virtual_address_range.c | 44 +--
tools/testing/selftests/mm/vm_util.c | 6 +-
16 files changed, 537 insertions(+), 703 deletions(-)
--
2.42.0
In this series, I'm trying to add 3 missing tests to vm_runtests.sh
which is used to run all the tests in mm suite. These tests weren't
running by CIs. While enabling them and through review feedback, I've
fixed some problems in tests as well. I've found more flakiness in more
tests which I'll be fixing with future patches.
hugetlb-read-hwpoison test is being added where it can only run with
newly added "-d" (destructive) flag only. Not sure why it is failing
again. So once it become stable, we can think of moving it to default
set of tests if it doesn't have any side-effect to them.
Cc: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
---
Changes in v3:
- Add cover letter
- Fix flakiness in tests found during enablement
- Move additional tests down in the file
- Add "-d" option which poisons the pages and aren't being useable after
the test
v2: https://lore.kernel.org/all/20240123073615.920324-1-usama.anjum@collabora.c…
Muhammad Usama Anjum (5):
selftests/mm: hugetlb_reparenting_test: do not unmount
selftests/mm: run_vmtests: remove sudo and conform to tap
selftests/mm: save and restore nr_hugepages value
selftests/mm: protection_keys: save/restore nr_hugepages settings
selftests/mm: run_vmtests.sh: add missing tests
tools/testing/selftests/mm/Makefile | 5 +++
.../selftests/mm/charge_reserved_hugetlb.sh | 4 +++
.../selftests/mm/hugetlb_reparenting_test.sh | 9 +++--
tools/testing/selftests/mm/on-fault-limit.c | 36 +++++++++----------
tools/testing/selftests/mm/protection_keys.c | 34 ++++++++++++++++++
tools/testing/selftests/mm/run_vmtests.sh | 10 +++++-
6 files changed, 76 insertions(+), 22 deletions(-)
--
2.42.0
Hello,
I've been running execveat (execveat.c) locally on v6.1 and next-20240228.
It has flaky test case. There are some test cases which fail consistently.
The comment (not very clear) on top of failing cases is as following:
/*
* Execute as a long pathname relative to "/". If this is a script,
* the interpreter will launch but fail to open the script because its
* name ("/dev/fd/5/xxx....") is bigger than PATH_MAX.
*
* The failure code is usually 127 (POSIX: "If a command is not found,
* the exit status shall be 127."), but some systems give 126 (POSIX:
* "If the command name is found, but it is not an executable utility,
* the exit status shall be 126."), so allow either.
*/
The file name is just less than PATH_MAX (4096) and we are expecting the
execveat() to fail with particular 99 or 127/128 error code. But kernel is
returning 1 error code. Snippet from full output:
# child 3493092 exited with 1 not 99 nor 99
# child 3493094 exited with 1 not 127 nor 126
I'm not sure if test is wrong or the kernel has changed the return error codes.
Full test run output:
./execveat
TAP version 13
1..51
ok 1 Check success of execveat(3, '../execveat', 0)...
ok 2 Check success of execveat(5, 'execveat', 0)...
ok 3 Check success of execveat(7, 'execveat', 0)...
ok 4 Check success of execveat(-100,
'/home/usama/repos/ke...ftests/exec/execveat', 0)...
ok 5 Check success of execveat(99,
'/home/usama/repos/ke...ftests/exec/execveat', 0)...
ok 6 Check success of execveat(9, '', 4096)...
ok 7 Check success of execveat(18, '', 4096)...
ok 8 Check success of execveat(10, '', 4096)...
ok 9 Check success of execveat(15, '', 4096)...
ok 10 Check success of execveat(15, '', 4096)...
ok 11 Check success of execveat(16, '', 4096)...
ok 12 Check failure of execveat(9, '', 0) with ENOENT
ok 13 Check failure of execveat(9, '(null)', 4096) with EFAULT
ok 14 Check success of execveat(5, 'execveat.symlink', 0)...
ok 15 Check success of execveat(7, 'execveat.symlink', 0)...
ok 16 Check success of execveat(-100,
'/home/usama/repos/ke...xec/execveat.symlink', 0)...
ok 17 Check success of execveat(11, '', 4096)...
ok 18 Check success of execveat(11, '', 4352)...
ok 19 Check failure of execveat(5, 'execveat.symlink', 256) with ELOOP
ok 20 Check failure of execveat(7, 'execveat.symlink', 256) with ELOOP
ok 21 Check failure of execveat(-100,
'/home/usama/repos/kernel/linux_mainline/tools/testing/selftests/exec/execveat.symlink',
256) with ELOOP
ok 22 Check failure of execveat(5, 'pipe', 0) with EACCES
ok 23 Check success of execveat(3, '../script', 0)...
ok 24 Check success of execveat(5, 'script', 0)...
ok 25 Check success of execveat(7, 'script', 0)...
ok 26 Check success of execveat(-100,
'/home/usama/repos/ke...elftests/exec/script', 0)...
ok 27 Check success of execveat(14, '', 4096)...
ok 28 Check success of execveat(14, '', 4352)...
ok 29 Check failure of execveat(19, '', 4096) with ENOENT
ok 30 Check failure of execveat(8, 'script', 0) with ENOENT
ok 31 Check success of execveat(17, '', 4096)...
ok 32 Check success of execveat(17, '', 4096)...
ok 33 Check success of execveat(4, '../script', 0)...
ok 34 Check success of execveat(4, 'script', 0)...
ok 35 Check success of execveat(4, '../script', 0)...
ok 36 Check failure of execveat(4, 'script', 0) with ENOENT
ok 37 Check failure of execveat(5, 'execveat', 65535) with EINVAL
ok 38 Check failure of execveat(5, 'no-such-file', 0) with ENOENT
ok 39 Check failure of execveat(7, 'no-such-file', 0) with ENOENT
ok 40 Check failure of execveat(-100, 'no-such-file', 0) with ENOENT
ok 41 Check failure of execveat(5, '', 4096) with EACCES
ok 42 Check failure of execveat(5, 'Makefile', 0) with EACCES
ok 43 Check failure of execveat(12, '', 4096) with EACCES
ok 44 Check failure of execveat(13, '', 4096) with EACCES
ok 45 Check failure of execveat(99, '', 4096) with EBADF
ok 46 Check failure of execveat(99, 'execveat', 0) with EBADF
ok 47 Check failure of execveat(9, 'execveat', 0) with ENOTDIR
# Invoke copy of 'execveat' via filename of length 4094:
ok 48 Check success of execveat(20, '', 4096)...
# execveat() failed, rc=-1 errno=2 (No such file or directory)
not ok 49 Check success of execveat(6,
'home/usama/repos/ker...yyyyyyyyyyyyyyyyyyyy', 0)...
# child 3493092 exited with 1 not 99 nor 99
not ok 49 Check success of execveat(6,
'home/usama/repos/ker...yyyyyyyyyyyyyyyyyyyy', 0)...
# Invoke copy of 'script' via filename of length 4094:
ok 50 Check success of execveat(21, '', 4096)...
# execveat() failed, rc=-1 errno=2 (No such file or directory)
not ok 51 Check success of execveat(6,
'home/usama/repos/ker...yyyyyyyyyyyyyyyyyyyy', 0)...
# child 3493094 exited with 1 not 127 nor 126
not ok 51 Check success of execveat(6,
'home/usama/repos/ker...yyyyyyyyyyyyyyyyyyyy', 0)...
2 tests failed
# Totals: pass:49 fail:2 xfail:0 xpass:0 skip:0 error:0
--
BR,
Muhammad Usama Anjum
This patchset adds KVM selftests for LoongArch system, currently only
some common test cases are supported and pass to run. These testcase
are listed as following:
demand_paging_test
dirty_log_perf_test
dirty_log_test
guest_print_test
hardware_disable_test
kvm_binary_stats_test
kvm_create_max_vcpus
kvm_page_table_test
memslot_modification_stress_test
memslot_perf_test
set_memory_region_test
This patchset originally is posted from zhaotianrui, I continue to work
on his efforts.
---
Changes in v7:
1. Refine code to add LoongArch support in test case
set_memory_region_test.
Changes in v6:
1. Refresh the patch based on latest kernel 6.8-rc1, add LoongArch
support about testcase set_memory_region_test.
2. Add hardware_disable_test test case.
3. Drop modification about macro DEFAULT_GUEST_TEST_MEM, it is problem
of LoongArch binutils, this issue is raised to LoongArch binutils owners.
Changes in v5:
1. In LoongArch kvm self tests, the DEFAULT_GUEST_TEST_MEM could be
0x130000000, it is different from the default value in memstress.h.
So we Move the definition of DEFAULT_GUEST_TEST_MEM into LoongArch
ucall.h, and add 'ifndef' condition for DEFAULT_GUEST_TEST_MEM
in memstress.h.
Changes in v4:
1. Remove the based-on flag, as the LoongArch KVM patch series
have been accepted by Linux kernel, so this can be applied directly
in kernel.
Changes in v3:
1. Improve implementation of LoongArch VM page walk.
2. Add exception handler for LoongArch.
3. Add dirty_log_test, dirty_log_perf_test, guest_print_test
test cases for LoongArch.
4. Add __ASSEMBLER__ macro to distinguish asm file and c file.
5. Move ucall_arch_do_ucall to the header file and make it as
static inline to avoid function calls.
6. Change the DEFAULT_GUEST_TEST_MEM base addr for LoongArch.
Changes in v2:
1. We should use ".balign 4096" to align the assemble code with 4K in
exception.S instead of "align 12".
2. LoongArch only supports 3 or 4 levels page tables, so we remove the
hanlders for 2-levels page table.
3. Remove the DEFAULT_LOONGARCH_GUEST_STACK_VADDR_MIN and use the common
DEFAULT_GUEST_STACK_VADDR_MIN to allocate stack memory in guest.
4. Reorganize the test cases supported by LoongArch.
5. Fix some code comments.
6. Add kvm_binary_stats_test test case into LoongArch KVM selftests.
---
Tianrui Zhao (4):
KVM: selftests: Add KVM selftests header files for LoongArch
KVM: selftests: Add core KVM selftests support for LoongArch
KVM: selftests: Add ucall test support for LoongArch
KVM: selftests: Add test cases for LoongArch
tools/testing/selftests/kvm/Makefile | 16 +
.../selftests/kvm/include/kvm_util_base.h | 5 +
.../kvm/include/loongarch/processor.h | 133 +++++++
.../selftests/kvm/include/loongarch/ucall.h | 20 ++
.../selftests/kvm/lib/loongarch/exception.S | 59 ++++
.../selftests/kvm/lib/loongarch/processor.c | 332 ++++++++++++++++++
.../selftests/kvm/lib/loongarch/ucall.c | 38 ++
.../selftests/kvm/set_memory_region_test.c | 2 +-
8 files changed, 604 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/kvm/include/loongarch/processor.h
create mode 100644 tools/testing/selftests/kvm/include/loongarch/ucall.h
create mode 100644 tools/testing/selftests/kvm/lib/loongarch/exception.S
create mode 100644 tools/testing/selftests/kvm/lib/loongarch/processor.c
create mode 100644 tools/testing/selftests/kvm/lib/loongarch/ucall.c
base-commit: 6764c317b6bb91bd806ef79adf6d9c0e428b191e
--
2.39.3
In some systems, the netcat server can incur in delay to start listening.
When this happens, the test can randomly fail in various points.
This is an example error message:
# ip gre none gso
# encap 192.168.1.1 to 192.168.1.2, type gre, mac none len 2000
# test basic connectivity
# Ncat: Connection refused.
Signed-off-by: Alessandro Carminati (Red Hat) <alessandro.carminati(a)gmail.com>
---
tools/testing/selftests/bpf/test_tc_tunnel.sh | 19 ++++++++++++++++++-
1 file changed, 18 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/test_tc_tunnel.sh b/tools/testing/selftests/bpf/test_tc_tunnel.sh
index 910044f08908..01c0f4b1a8c2 100755
--- a/tools/testing/selftests/bpf/test_tc_tunnel.sh
+++ b/tools/testing/selftests/bpf/test_tc_tunnel.sh
@@ -72,7 +72,6 @@ cleanup() {
server_listen() {
ip netns exec "${ns2}" nc "${netcat_opt}" -l "${port}" > "${outfile}" &
server_pid=$!
- sleep 0.2
}
client_connect() {
@@ -93,6 +92,22 @@ verify_data() {
fi
}
+wait_for_port() {
+ local digits=8
+ local port2check=$(printf ":%04X" $1)
+ local prot=$([ "$2" == "-6" ] && echo 6 && digits=32)
+
+ for i in $(seq 20); do
+ if ip netns exec "${ns2}" cat /proc/net/tcp${prot} | \
+ sed -r 's/^[ \t]+[0-9]+: ([0-9A-F]{'${digits}'}:[0-9A-F]{4}) .*$/\1/' | \
+ grep -q "${port2check}"; then
+ return 0
+ fi
+ sleep 0.1
+ done
+ return 1
+}
+
set -e
# no arguments: automated test, run all
@@ -193,6 +208,7 @@ setup
# basic communication works
echo "test basic connectivity"
server_listen
+wait_for_port ${port} ${netcat_opt}
client_connect
verify_data
@@ -204,6 +220,7 @@ ip netns exec "${ns1}" tc filter add dev veth1 egress \
section "encap_${tuntype}_${mac}"
echo "test bpf encap without decap (expect failure)"
server_listen
+wait_for_port ${port} ${netcat_opt}
! client_connect
if [[ "$tuntype" =~ "udp" ]]; then
--
2.34.1
Hi Linus,
Please pull these small execve updates for v6.9-rc1. Details below.
Thanks!
-Kees
The following changes since commit 41bccc98fb7931d63d03f326a746ac4d429c1dd3:
Linux 6.8-rc2 (2024-01-28 17:01:12 -0800)
are available in the Git repository at:
https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git tags/execve-v6.9-rc1
for you to fetch changes up to 725d50261285ccf02501f2a1a6d10b31ce014597:
exec: Simplify remove_arg_zero() error path (2024-03-09 13:46:30 -0800)
----------------------------------------------------------------
execve updates for v6.9-rc1
- Drop needless error path code in remove_arg_zero() (Li kunyu, Kees Cook)
- binfmt_elf_efpic: Don't use missing interpreter's properties (Max Filippov)
- Use /bin/bash for execveat selftests
----------------------------------------------------------------
Kees Cook (2):
selftests/exec: Perform script checks with /bin/bash
exec: Simplify remove_arg_zero() error path
Li kunyu (1):
exec: Delete unnecessary statements in remove_arg_zero()
Max Filippov (1):
fs: binfmt_elf_efpic: don't use missing interpreter's properties
fs/binfmt_elf_fdpic.c | 2 +-
fs/exec.c | 11 +++--------
tools/testing/selftests/exec/execveat.c | 2 +-
3 files changed, 5 insertions(+), 10 deletions(-)
--
Kees Cook
Hi,
Routine run of the test in net-next gave also this mm unit error.
root@defiant:tools/testing/selftests/mm# ./uffd-unit-tests
Testing UFFDIO_API (with syscall)... done
Testing UFFDIO_API (with /dev/userfaultfd)... done
Testing register-ioctls on anon... done
Testing register-ioctls on shmem... done
Testing register-ioctls on shmem-private... done
Testing register-ioctls on hugetlb... skipped [reason: memory allocation failed]
Testing register-ioctls on hugetlb-private... skipped [reason: memory allocation failed]
Testing zeropage on anon... done
Testing zeropage on shmem... done
Testing zeropage on shmem-private... done
Testing zeropage on hugetlb... skipped [reason: memory allocation failed]
Testing zeropage on hugetlb-private... skipped [reason: memory allocation failed]
Testing move on anon... done
Testing move-pmd on anon... done
Testing move-pmd-split on anon... done
Testing wp-fork on anon... done
Testing wp-fork on shmem... done
Testing wp-fork on shmem-private... done
Testing wp-fork on hugetlb... skipped [reason: memory allocation failed]
Testing wp-fork on hugetlb-private... skipped [reason: memory allocation failed]
Testing wp-fork-with-event on anon... done
Testing wp-fork-with-event on shmem... done
Testing wp-fork-with-event on shmem-private... done
Testing wp-fork-with-event on hugetlb... skipped [reason: memory allocation failed]
Testing wp-fork-with-event on hugetlb-private... skipped [reason: memory allocation failed]
Testing wp-fork-pin on anon... done
Testing wp-fork-pin on shmem... done
Testing wp-fork-pin on shmem-private... done
Testing wp-fork-pin on hugetlb... skipped [reason: memory allocation failed]
Testing wp-fork-pin on hugetlb-private... skipped [reason: memory allocation failed]
Testing wp-fork-pin-with-event on anon... done
Testing wp-fork-pin-with-event on shmem... done
Testing wp-fork-pin-with-event on shmem-private... done
Testing wp-fork-pin-with-event on hugetlb... skipped [reason: memory allocation failed]
Testing wp-fork-pin-with-event on hugetlb-private... skipped [reason: memory allocation failed]
Testing wp-unpopulated on anon... done
Testing minor on shmem... done
Testing minor on hugetlb... skipped [reason: memory allocation failed]
Testing minor-wp on shmem... done
Testing minor-wp on hugetlb... skipped [reason: memory allocation failed]
Testing minor-collapse on shmem... done
Testing sigbus on anon... done
Testing sigbus on shmem... done
Testing sigbus on shmem-private... done
Testing sigbus on hugetlb... skipped [reason: memory allocation failed]
Testing sigbus on hugetlb-private... skipped [reason: memory allocation failed]
Testing sigbus-wp on anon... done
Testing sigbus-wp on shmem... done
Testing sigbus-wp on shmem-private... done
Testing sigbus-wp on hugetlb... skipped [reason: memory allocation failed]
Testing sigbus-wp on hugetlb-private... skipped [reason: memory allocation failed]
Testing events on anon... done
Testing events on shmem... done
Testing events on shmem-private... done
Testing events on hugetlb... skipped [reason: memory allocation failed]
Testing events on hugetlb-private... skipped [reason: memory allocation failed]
Testing events-wp on anon... done
Testing events-wp on shmem... done
Testing events-wp on shmem-private... done
Testing events-wp on hugetlb... skipped [reason: memory allocation failed]
Testing events-wp on hugetlb-private... skipped [reason: memory allocation failed]
Testing poison on anon... done
Testing poison on shmem... done
Testing poison on shmem-private... done
Testing poison on hugetlb... skipped [reason: memory allocation failed]
Testing poison on hugetlb-private... skipped [reason: memory allocation failed]
Userfaults unit tests: pass=42, skip=24, fail=0 (total=66)
root@defiant:tools/testing/selftests/mm# grep -i huge /proc/meminfo
It resulted in alarming errors in the syslog:
Mar 9 19:48:24 defiant kernel: [77187.055103] MCE: Killing uffd-unit-tests:1321817 due to hardware memory corruption fault at 4631e000
Mar 9 19:48:24 defiant kernel: [77187.055132] MCE: Killing uffd-unit-tests:1321817 due to hardware memory corruption fault at 46320000
Mar 9 19:48:24 defiant kernel: [77187.055160] MCE: Killing uffd-unit-tests:1321817 due to hardware memory corruption fault at 46322000
Mar 9 19:48:24 defiant kernel: [77187.055189] MCE: Killing uffd-unit-tests:1321817 due to hardware memory corruption fault at 46324000
Mar 9 19:48:24 defiant kernel: [77187.055218] MCE: Killing uffd-unit-tests:1321817 due to hardware memory corruption fault at 46326000
Mar 9 19:48:24 defiant kernel: [77187.055250] MCE: Killing uffd-unit-tests:1321817 due to hardware memory corruption fault at 46328000
Mar 9 19:48:24 defiant kernel: [77187.055278] MCE: Killing uffd-unit-tests:1321817 due to hardware memory corruption fault at 4632a000
Mar 9 19:48:24 defiant kernel: [77187.055307] MCE: Killing uffd-unit-tests:1321817 due to hardware memory corruption fault at 4632c000
Mar 9 19:48:24 defiant kernel: [77187.055336] MCE: Killing uffd-unit-tests:1321817 due to hardware memory corruption fault at 4632e000
Mar 9 19:48:24 defiant kernel: [77187.055366] MCE: Killing uffd-unit-tests:1321817 due to hardware memory corruption fault at 46330000
Mar 9 19:48:24 defiant kernel: [77187.055395] MCE: Killing uffd-unit-tests:1321817 due to hardware memory corruption fault at 46332000
Mar 9 19:48:24 defiant kernel: [77187.055423] MCE: Killing uffd-unit-tests:1321817 due to hardware memory corruption fault at 46334000
Mar 9 19:48:24 defiant kernel: [77187.055452] MCE: Killing uffd-unit-tests:1321817 due to hardware memory corruption fault at 46336000
Mar 9 19:48:24 defiant kernel: [77187.055480] MCE: Killing uffd-unit-tests:1321817 due to hardware memory corruption fault at 46338000
Mar 9 19:48:24 defiant kernel: [77187.055509] MCE: Killing uffd-unit-tests:1321817 due to hardware memory corruption fault at 4633a000
Mar 9 19:48:24 defiant kernel: [77187.055538] MCE: Killing uffd-unit-tests:1321817 due to hardware memory corruption fault at 4633c000
Mar 9 19:48:24 defiant kernel: [77187.055567] MCE: Killing uffd-unit-tests:1321817 due to hardware memory corruption fault at 4633e000
Mar 9 19:48:24 defiant kernel: [77187.055597] MCE: Killing uffd-unit-tests:1321817 due to hardware memory corruption fault at 46340000
At this point, it can be problem with my box's memory chips, or something with HUGETLB.
However, since the "classic" allocations were successful, the problem might be in huge pages, or
if I understood well, in deliberate poisoning of pages?
Please also find strace of the run.
Best regards,
Mirsad Todorovac
When using gcc without cross compiling, i.e., `CROSS_COMPILE` unset or
empty, the selftests build defaults to the host architecture, i.e., it uses
plain gcc. However, when compiling with clang an unset `ARCH` variable in
combination with an unset `CROSS_COMPILE` variable, i.e., compiling for
the host architecture, leads to compilation failures since `lib.mk` can
not determine the clang target triple. In this case, the following error
message is displayed for each subsystem that does not set `ARCH` in its
own Makefile before including `lib.mk` (lines wrapped at 75 chrs):
make[1]: Entering directory '/mnt/build/linux/tools/testing/selftests/
sysctl'
../lib.mk:33: *** Specify CROSS_COMPILE or add '--target=' option to
lib.mk. Stop.
make[1]: Leaving directory '/mnt/build/linux/tools/testing/selftests/
sysctl'
Align the behavior for gcc and clang builds by interpreting unset
`ARCH` and `CROSS_COMPILE` variables in `LLVM` builds as a sign that the
user wants to build for the host architecture.
This preserves the property that setting the `ARCH` variable to an
unknown value will trigger an error that complains about insufficient
information.
RFC since I am not entirely sure if this behavior is in fact known and
intended, and whether the way to obtain the host target triple is
sufficiently general. (The flag was introduced in llvm-8 with [1], it
will be an error for older clang versions, however, currently 13.0.1 is the
minimal version required to build the kernel. For some clang binaries it
prints the 'unknown' instead of the 'linux' version of the target, e.g.,
mips [2]). An alternative could be to simply do:
ARCH ?= $(shell uname -m)
before using it to select the target. Possibly with some post processing,
but at that point we would likely be replicating `scripts/subarch.include`.
Also unsure if it needs a 'Fixes: 795285ef2425 ("selftests: Fix clang
cross compilation")'. Furthermore, this change might make it possible to
remove the explicit setting of `ARCH` from the few subsystem Makefiles
that do it.
Would be happy to get some feedback on those points. If it looks OK I
can also send it as a patch.
Link: https://reviews.llvm.org/D50755 [1]
Link: https://godbolt.org/z/r7Gn9bvv1 [2]
Signed-off-by: Valentin Obst <kernel(a)valentinobst.de>
---
tools/testing/selftests/lib.mk | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/lib.mk b/tools/testing/selftests/lib.mk
index aa646e0661f3..a8f0442a36bc 100644
--- a/tools/testing/selftests/lib.mk
+++ b/tools/testing/selftests/lib.mk
@@ -7,6 +7,8 @@ else ifneq ($(filter -%,$(LLVM)),)
LLVM_SUFFIX := $(LLVM)
endif
+CLANG := $(LLVM_PREFIX)clang$(LLVM_SUFFIX)
+
CLANG_TARGET_FLAGS_arm := arm-linux-gnueabi
CLANG_TARGET_FLAGS_arm64 := aarch64-linux-gnu
CLANG_TARGET_FLAGS_hexagon := hexagon-linux-musl
@@ -18,7 +20,13 @@ CLANG_TARGET_FLAGS_riscv := riscv64-linux-gnu
CLANG_TARGET_FLAGS_s390 := s390x-linux-gnu
CLANG_TARGET_FLAGS_x86 := x86_64-linux-gnu
CLANG_TARGET_FLAGS_x86_64 := x86_64-linux-gnu
-CLANG_TARGET_FLAGS := $(CLANG_TARGET_FLAGS_$(ARCH))
+
+# Default to host architecture if ARCH is not explicitly given.
+ifeq ($(ARCH),)
+CLANG_TARGET_FLAGS := $(shell $(CLANG) -print-target-triple)
+else
+CLANG_TARGET_FLAGS := $(CLANG_TARGET_FLAGS_$(ARCH))
+endif
ifeq ($(CROSS_COMPILE),)
ifeq ($(CLANG_TARGET_FLAGS),)
@@ -30,7 +38,7 @@ else
CLANG_FLAGS += --target=$(notdir $(CROSS_COMPILE:%-=%))
endif # CROSS_COMPILE
-CC := $(LLVM_PREFIX)clang$(LLVM_SUFFIX) $(CLANG_FLAGS) -fintegrated-as
+CC := $(CLANG) $(CLANG_FLAGS) -fintegrated-as
else
CC := $(CROSS_COMPILE)gcc
endif # LLVM
---
base-commit: d206a76d7d2726f3b096037f2079ce0bd3ba329b
change-id: 20240303-selftests-libmk-llvm-rfc-5fe3cfa9f094
Best regards,
--
Valentin Obst <kernel(a)valentinobst.de>
In this series from Geliang, there are various improvements in MPTCP
selftests: sharing code, doing actions the same way, colours, etc.
Patch 1 prints all error messages to stdout: what was done in almost all
other MPTCP selftests. This can be now easily changed later if needed.
Patch 2 makes sure the test counter is continuous in mptcp_connect.sh.
Patch 3 aligns the messages that are printed in mptcp_connect.sh.
Patch 4 prints each test results in mptcp_sockopt.sh, similar to what we
have in the TAP output.
Patch 5 moves the different test counters to a single one in
mptcp_lib.sh, to uniform how it is used.
Patch 6 moves how titles are printed from mptcp_join.sh to the lib, to
be reused in patch 7 by all other MPTCP selftests.
Patch 8 uses the '+=' operator to append strings instead of repeating
twice the variable name: that's shorter, easier to read.
Patch 9 adds colours for the [ OK ], [SKIP], [FAIL] and INFO keywords in
all MPTCP selftests.
Patch 10 to 12 are some preparation patches for patch 13: patch 10
modifies how some 'test_fail' helpers, patch 11 moves a helper from
userspace_pm.sh to the lib, and patch 12 changes where titles are
printed in userspace_pm.sh. Patch 13 moves some duplicated helpers from
mptcp_join.sh and userspace_pm.sh to mptcp_lib.sh.
Patch 14 moves duplicated read-only variables from mptcp_join.sh and
userspace_pm.sh to mptcp_lib.sh as well.
Patch 15 uses explicit variables instead of hard-coded numbers for the
exit status.
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
Geliang Tang (15):
selftests: mptcp: print all error messages to stdout
selftests: mptcp: connect: add dedicated port counter
selftests: mptcp: connect: fix misaligned output
selftests: mptcp: sockopt: print every test result
selftests: mptcp: export TEST_COUNTER variable
selftests: mptcp: add print_title in mptcp_lib
selftests: mptcp: print test results with counters
selftests: mptcp: use += operator to append strings
selftests: mptcp: print test results with colors
selftests: mptcp: call test_fail without argument
selftests: mptcp: extract mptcp_lib_check_expected
selftests: mptcp: print_test out of verify_listener_events
selftests: mptcp: add mptcp_lib_verify_listener_events
selftests: mptcp: declare event macros in mptcp_lib
selftests: mptcp: use KSFT_SKIP/KSFT_PASS/KSFT_FAIL
tools/testing/selftests/net/mptcp/diag.sh | 19 ++-
tools/testing/selftests/net/mptcp/mptcp_connect.sh | 145 +++++++++++----------
tools/testing/selftests/net/mptcp/mptcp_join.sh | 120 +++++++----------
tools/testing/selftests/net/mptcp/mptcp_lib.sh | 113 ++++++++++++++--
tools/testing/selftests/net/mptcp/mptcp_sockopt.sh | 53 ++++----
tools/testing/selftests/net/mptcp/pm_netlink.sh | 13 +-
tools/testing/selftests/net/mptcp/simult_flows.sh | 18 +--
tools/testing/selftests/net/mptcp/userspace_pm.sh | 117 +++++------------
8 files changed, 312 insertions(+), 286 deletions(-)
---
base-commit: 19cfdc0d57696c92523da8eb26c0f3e092400bee
change-id: 20240308-upstream-net-next-20240308-selftests-mptcp-unification-6df178cc8f6a
Best regards,
--
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
This patch series introduces a new char misc driver, /dev/ntsync, which is used
to implement Windows NT synchronization primitives.
This was previously submitted as an RFC [1]. Since there were no major changes
requested to the last RFC revision, I've stripped the RFC prefix.
[1] https://lore.kernel.org/lkml/20240131021356.10322-1-zfigura@codeweavers.com/
== Background ==
The Wine project emulates the Windows API in user space. One particular part of
that API, namely the NT synchronization primitives, have historically been
implemented via RPC to a dedicated "kernel" process. However, more recent
applications use these APIs more strenuously, and the overhead of RPC has become
a bottleneck.
The NT synchronization APIs are too complex to implement on top of existing
primitives without sacrificing correctness. Certain operations, such as
NtPulseEvent() or the "wait-for-all" mode of NtWaitForMultipleObjects(), require
direct control over the underlying wait queue, and implementing a wait queue
sufficiently robust for Wine in user space is not possible. This proposed
driver, therefore, implements the problematic interfaces directly in the Linux
kernel.
This driver was presented at Linux Plumbers Conference 2023. For those further
interested in the history of synchronization in Wine and past attempts to solve
this problem in user space, a recording of the presentation can be viewed here:
https://www.youtube.com/watch?v=NjU4nyWyhU8
== Performance ==
The gain in performance varies wildly depending on the application in question
and the user's hardware. For some games NT synchronization is not a bottleneck
and no change can be observed, but for others frame rate improvements of 50 to
150 percent are not atypical. The following table lists frame rate measurements
from a variety of games on a variety of hardware, taken by users Dmitry
Skvortsov, FuzzyQuils, OnMars, and myself:
Game Upstream ntsync improvement
===========================================================================
Anger Foot 69 99 43%
Call of Juarez 99.8 224.1 125%
Dirt 3 110.6 860.7 678%
Forza Horizon 5 108 160 48%
Lara Croft: Temple of Osiris 141 326 131%
Metro 2033 164.4 199.2 21%
Resident Evil 2 26 77 196%
The Crew 26 51 96%
Tiny Tina's Wonderlands 130 360 177%
Total War Saga: Troy 109 146 34%
===========================================================================
== Patches ==
The intended semantics of the patches are broadly intended to match those of the
corresponding Windows functions. For those not already familiar with the Windows
functions (or their undocumented behaviour), patch 31/31 provides a detailed
specification, and individual patches also include a brief description of the
API they are implementing.
The patches making use of this driver in Wine can be retrieved or browsed here:
https://repo.or.cz/wine/zf.git/shortlog/refs/heads/ntsync5
== Implementation ==
Some aspects of the implementation may deserve particular comment:
* In the interest of performance, each object is governed only by a single
spinlock. However, NTSYNC_IOC_WAIT_ALL requires that the state of multiple
objects be changed as a single atomic operation. In order to achieve this, we
first take a device-wide lock ("wait_all_lock") any time we are going to lock
more than one object at a time.
The maximum number of objects that can be used in a vectored wait, and
therefore the maximum that can be locked simultaneously, is 64. This number is
NT's own limit.
The acquisition of multiple spinlocks will degrade performance. This is a
conscious choice, however. Wait-for-all is known to be a very rare operation
in practice, especially with counts that approach the maximum, and it is the
intent of the ntsync driver to optimize wait-for-any at the expense of
wait-for-all as much as possible.
* NT mutexes are tied to their threads on an OS level, and the kernel includes
builtin support for "robust" mutexes. In order to keep the ntsync driver
self-contained and avoid touching more code than necessary, it does not hook
into task exit nor use pids.
Instead, the user space emulator is expected to manage thread IDs and pass
them as an argument to any relevant functions; this is the "owner" field of
ntsync_wait_args and ntsync_mutex_args.
When the emulator detects that a thread dies, it should therefore call
NTSYNC_IOC_MUTEX_KILL on any open mutexes.
* ntsync is module-capable mostly because there was nothing preventing it, and
because it aided development. It is not a hard requirement, though.
== Previous versions ==
Changes from v1:
* Fix a broken rebase that stole part of the Kconfig documentation from the
neighbouring entry, per Randy Dunlap.
* Add my email address to copyright and MODULE_AUTHOR lines, per Randy Dunlap.
* Document the reference counting behaviour more clearly, per Greg
Kroah-Hartman.
* Hopefully submit all the patches this time the right way.
* Link to v1: https://lore.kernel.org/lkml/20240214233645.9273-1-zfigura@codeweavers.com/
* Link to RFC v2: https://lore.kernel.org/lkml/20240131021356.10322-1-zfigura@codeweavers.com/
* Link to RFC v1: https://lore.kernel.org/lkml/20240124004028.16826-1-zfigura@codeweavers.com/
Elizabeth Figura (31):
ntsync: Introduce the ntsync driver and character device.
ntsync: Introduce NTSYNC_IOC_CREATE_SEM.
ntsync: Introduce NTSYNC_IOC_SEM_POST.
ntsync: Introduce NTSYNC_IOC_WAIT_ANY.
ntsync: Introduce NTSYNC_IOC_WAIT_ALL.
ntsync: Introduce NTSYNC_IOC_CREATE_MUTEX.
ntsync: Introduce NTSYNC_IOC_MUTEX_UNLOCK.
ntsync: Introduce NTSYNC_IOC_MUTEX_KILL.
ntsync: Introduce NTSYNC_IOC_CREATE_EVENT.
ntsync: Introduce NTSYNC_IOC_EVENT_SET.
ntsync: Introduce NTSYNC_IOC_EVENT_RESET.
ntsync: Introduce NTSYNC_IOC_EVENT_PULSE.
ntsync: Introduce NTSYNC_IOC_SEM_READ.
ntsync: Introduce NTSYNC_IOC_MUTEX_READ.
ntsync: Introduce NTSYNC_IOC_EVENT_READ.
ntsync: Introduce alertable waits.
ntsync: Allow waits to use the REALTIME clock.
selftests: ntsync: Add some tests for semaphore state.
selftests: ntsync: Add some tests for mutex state.
selftests: ntsync: Add some tests for NTSYNC_IOC_WAIT_ANY.
selftests: ntsync: Add some tests for NTSYNC_IOC_WAIT_ALL.
selftests: ntsync: Add some tests for wakeup signaling with
WINESYNC_IOC_WAIT_ANY.
selftests: ntsync: Add some tests for wakeup signaling with
WINESYNC_IOC_WAIT_ALL.
selftests: ntsync: Add some tests for manual-reset event state.
selftests: ntsync: Add some tests for auto-reset event state.
selftests: ntsync: Add some tests for wakeup signaling with events.
selftests: ntsync: Add tests for alertable waits.
selftests: ntsync: Add some tests for wakeup signaling via alerts.
selftests: ntsync: Add a stress test for contended waits.
maintainers: Add an entry for ntsync.
docs: ntsync: Add documentation for the ntsync uAPI.
Elizabeth Figura (31):
ntsync: Introduce the ntsync driver and character device.
ntsync: Introduce NTSYNC_IOC_CREATE_SEM.
ntsync: Introduce NTSYNC_IOC_SEM_POST.
ntsync: Introduce NTSYNC_IOC_WAIT_ANY.
ntsync: Introduce NTSYNC_IOC_WAIT_ALL.
ntsync: Introduce NTSYNC_IOC_CREATE_MUTEX.
ntsync: Introduce NTSYNC_IOC_MUTEX_UNLOCK.
ntsync: Introduce NTSYNC_IOC_MUTEX_KILL.
ntsync: Introduce NTSYNC_IOC_CREATE_EVENT.
ntsync: Introduce NTSYNC_IOC_EVENT_SET.
ntsync: Introduce NTSYNC_IOC_EVENT_RESET.
ntsync: Introduce NTSYNC_IOC_EVENT_PULSE.
ntsync: Introduce NTSYNC_IOC_SEM_READ.
ntsync: Introduce NTSYNC_IOC_MUTEX_READ.
ntsync: Introduce NTSYNC_IOC_EVENT_READ.
ntsync: Introduce alertable waits.
ntsync: Allow waits to use the REALTIME clock.
selftests: ntsync: Add some tests for semaphore state.
selftests: ntsync: Add some tests for mutex state.
selftests: ntsync: Add some tests for NTSYNC_IOC_WAIT_ANY.
selftests: ntsync: Add some tests for NTSYNC_IOC_WAIT_ALL.
selftests: ntsync: Add some tests for wakeup signaling with
WINESYNC_IOC_WAIT_ANY.
selftests: ntsync: Add some tests for wakeup signaling with
WINESYNC_IOC_WAIT_ALL.
selftests: ntsync: Add some tests for manual-reset event state.
selftests: ntsync: Add some tests for auto-reset event state.
selftests: ntsync: Add some tests for wakeup signaling with events.
selftests: ntsync: Add tests for alertable waits.
selftests: ntsync: Add some tests for wakeup signaling via alerts.
selftests: ntsync: Add a stress test for contended waits.
maintainers: Add an entry for ntsync.
docs: ntsync: Add documentation for the ntsync uAPI.
Documentation/userspace-api/index.rst | 1 +
.../userspace-api/ioctl/ioctl-number.rst | 2 +
Documentation/userspace-api/ntsync.rst | 399 +++++
MAINTAINERS | 9 +
drivers/misc/Kconfig | 11 +
drivers/misc/Makefile | 1 +
drivers/misc/ntsync.c | 1159 ++++++++++++++
include/uapi/linux/ntsync.h | 62 +
tools/testing/selftests/Makefile | 1 +
.../testing/selftests/drivers/ntsync/Makefile | 8 +
tools/testing/selftests/drivers/ntsync/config | 1 +
.../testing/selftests/drivers/ntsync/ntsync.c | 1407 +++++++++++++++++
12 files changed, 3061 insertions(+)
create mode 100644 Documentation/userspace-api/ntsync.rst
create mode 100644 drivers/misc/ntsync.c
create mode 100644 include/uapi/linux/ntsync.h
create mode 100644 tools/testing/selftests/drivers/ntsync/Makefile
create mode 100644 tools/testing/selftests/drivers/ntsync/config
create mode 100644 tools/testing/selftests/drivers/ntsync/ntsync.c
base-commit: 8d11c6d9b14f7a87f65529cb33edc5fed846ed9d
--
2.43.0
Hi Linus,
Please pull the following KUnit next update for Linux 6.8-rc1.
This KUnit next update for Linux 6.9-rc1 consists of:
-- fix to make kunit_bus_type const
-- kunit tool change to Print UML command
-- DRM device creation helpers are now using the new kunit device
creation helpers. This change resulted in DRM helpers switching
from using a platform_device, to a dedicated bus and device type
used by kunit. kunit devices don't set DMA mask and this caused
regression on some drm tests as they can't allocate DMA buffers.
Fix this problem by setting DMA masks on the kunit device during
initialization.
-- KUnit has several macros which accept a log message, which can
contain printf format specifiers. Some of these (the explicit
log macros) already use the __printf() gcc attribute to ensure
the format specifiers are valid, but those which could fail the
test, and hence used __kunit_do_failed_assertion() behind the scenes,
did not.
These include: KUNIT_EXPECT_*_MSG(), KUNIT_ASSERT_*_MSG(), and
KUNIT_FAIL()
A 9 patch series adds the __printf() attribute, and fixes all of
the issues uncovered.
Note:
make allmodconfig x86 passed passed for me on March 4th linux-next
(This could be with Stephen's fix up for the following problem).
Stephen found a problem in drivers/gpu/drm/tests/drm_buddy_test.c
Caused by commit
806cb2270237 ("kunit: Annotate _MSG assertion variants with gnu printf specifiers")
interacting with commit
c70703320e55 ("drm/tests/drm_buddy: add alloc_range_bias test")
from the drm-misc-fixes tree.
Stephen found that the problem is not with the format string types,
but with a missing argument i.e. there is another argument required
by the format string.
It is easier to fix this problem in the drm-misc-fixes tree.
The hope is that the fix to the problem has been sent to you or will
be sent to you before the merge.
If not Stephen's fix up will be necessary.
diff is attached.
thanks,
-- Shuah
----------------------------------------------------------------
The following changes since commit d206a76d7d2726f3b096037f2079ce0bd3ba329b:
Linux 6.8-rc6 (2024-02-25 15:46:06 -0800)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest tags/linux_kselftest-kunit-6.9-rc1
for you to fetch changes up to 806cb2270237ce2ec672a407d66cee17a07d3aa2:
kunit: Annotate _MSG assertion variants with gnu printf specifiers (2024-02-28 13:07:49 -0700)
----------------------------------------------------------------
linux_kselftest-kunit-6.9-rc1
This KUnit next update for Linux 6.9-rc1 consists of:
-- fix to make kunit_bus_type const
-- kunit tool change to Print UML command
-- DRM device creation helpers are now using the new kunit device
creation helpers. This change resulted in DRM helpers switching
from using a platform_device, to a dedicated bus and device type
used by kunit. kunit devices don't set DMA mask and this caused
regression on some drm tests as they can't allocate DMA buffers.
Fix this problem by setting DMA masks on the kunit device during
initialization.
-- KUnit has several macros which accept a log message, which can
contain printf format specifiers. Some of these (the explicit
log macros) already use the __printf() gcc attribute to ensure
the format specifiers are valid, but those which could fail the
test, and hence used __kunit_do_failed_assertion() behind the scenes,
did not.
These include: KUNIT_EXPECT_*_MSG(), KUNIT_ASSERT_*_MSG(), and
KUNIT_FAIL()
A 9 patch series adds the __printf() attribute, and fixes all of
the issues uncovered.
----------------------------------------------------------------
David Gow (9):
kunit: test: Log the correct filter string in executor_test
lib/cmdline: Fix an invalid format specifier in an assertion msg
lib: memcpy_kunit: Fix an invalid format specifier in an assertion msg
time: test: Fix incorrect format specifier
rtc: test: Fix invalid format specifier.
net: test: Fix printf format specifier in skb_segment kunit test
drm/xe/tests: Fix printf format specifiers in xe_migrate test
drm: tests: Fix invalid printf format specifiers in KUnit tests
kunit: Annotate _MSG assertion variants with gnu printf specifiers
Lucas De Marchi (1):
kunit: Mark filter* params as rw
Maxime Ripard (1):
kunit: Setup DMA masks on the kunit device
Mickaël Salaün (1):
kunit: tool: Print UML command
Ricardo B. Marliere (1):
kunit: make kunit_bus_type const
drivers/gpu/drm/tests/drm_buddy_test.c | 14 +++++++-------
drivers/gpu/drm/tests/drm_mm_test.c | 6 +++---
drivers/gpu/drm/xe/tests/xe_migrate.c | 8 ++++----
drivers/rtc/lib_test.c | 2 +-
include/kunit/test.h | 12 ++++++------
kernel/time/time_test.c | 2 +-
lib/cmdline_kunit.c | 2 +-
lib/kunit/device.c | 6 +++++-
lib/kunit/executor.c | 6 +++---
lib/kunit/executor_test.c | 2 +-
lib/memcpy_kunit.c | 4 ++--
net/core/gso_test.c | 2 +-
tools/testing/kunit/kunit_kernel.py | 1 +
13 files changed, 36 insertions(+), 31 deletions(-)
----------------------------------------------------------------
Hi Linus,
Please pull the following Kselftest update for Linux 6.9-rc1.
This kselftest next update for Linux 6.9-rc1 consists of:
-- livepatch restructuring to move the module out of lib to be
built as a out-of-tree modules during kselftest build. This
change makes it easier change, debug and rebuild the tests by
running make on the selftests/livepatch directory, which is not
currently possible since the modules on lib/livepatch are build
and installed using the main makefile modules target.
-- livepatch restructuring fixes for problems found by kernel test
robot. The change skips the test if kernel-devel isn't installed
(default value of KDIR), or if KDIR variable passed doesn't exists.
-- resctrl test restructuring and new non-contiguous CBMs CAT test
-- new ktap_helpers to print diagnostic messages, pass/fail tests
based on exit code, abort test, and finish the test.
-- a new test verify power supply properties.
-- a new ftrace to exercise function tracer across cpu hotplug.
-- timeout increase for mqueue test to allow the test to run on
i3.metal AWS instances.
-- minor spelling corrections in several tests.
-- missing gitignore files and changes to existing gitignore files.
diff is attached.
thanks,
-- Shuah
----------------------------------------------------------------
The following changes since commit 6613476e225e090cc9aad49be7fa504e290dd33d:
Linux 6.8-rc1 (2024-01-21 14:11:32 -0800)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest tags/linux_kselftest-next-6.9-rc1
for you to fetch changes up to 5d94da7ff00ef45737a64d947e7ff45aca972782:
kselftest: Add basic test for probing the rust sample modules (2024-03-04 13:13:04 -0700)
----------------------------------------------------------------
linux_kselftest-next-6.9-rc1
This kselftest next update for Linux 6.9-rc1 consists of:
-- livepatch restructuring to move the module out of lib to be
built as a out-of-tree modules during kselftest build. This
change makes it easier change, debug and rebuild the tests by
running make on the selftests/livepatch directory, which is not
currently possible since the modules on lib/livepatch are build
and installed using the main makefile modules target.
-- livepatch restructuring fixes for problems found by kernel test
robot. The change skips the test if kernel-devel isn't installed
(default value of KDIR), or if KDIR variable passed doesn't exists.
-- resctrl test restructuring and new non-contiguous CBMs CAT test
-- new ktap_helpers to print diagnostic messages, pass/fail tests
based on exit code, abort test, and finish the test.
-- a new test verify power supply properties.
-- a new ftrace to exercise function tracer across cpu hotplug.
-- timeout increase for mqueue test to allow the test to run on
i3.metal AWS instances.
-- minor spelling corrections in several tests.
-- missing gitignore files and changes to existing gitignore files.
----------------------------------------------------------------
Ali Zahraee (1):
selftests: ftrace: fix typo in test description
Colin Ian King (1):
selftests: sched: Fix spelling mistake "hiearchy" -> "hierarchy"
Ilpo Järvinen (30):
selftests/resctrl: Convert perror() to ksft_perror() or ksft_print_msg()
selftests/resctrl: Return -1 instead of errno on error
selftests/resctrl: Don't use ctrlc_handler() outside signal handling
selftests/resctrl: Change function comments to say < 0 on error
selftests/resctrl: Split fill_buf to allow tests finer-grained control
selftests/resctrl: Refactor fill_buf functions
selftests/resctrl: Refactor get_cbm_mask() and rename to get_full_cbm()
selftests/resctrl: Mark get_cache_size() cache_type const
selftests/resctrl: Create cache_portion_size() helper
selftests/resctrl: Exclude shareable bits from schemata in CAT test
selftests/resctrl: Split measure_cache_vals()
selftests/resctrl: Split show_cache_info() to test specific and generic parts
selftests/resctrl: Remove unnecessary __u64 -> unsigned long conversion
selftests/resctrl: Remove nested calls in perf event handling
selftests/resctrl: Consolidate naming of perf event related things
selftests/resctrl: Improve perf init
selftests/resctrl: Convert perf related globals to locals
selftests/resctrl: Move cat_val() to cat_test.c and rename to cat_test()
selftests/resctrl: Open perf fd before start & add error handling
selftests/resctrl: Replace file write with volatile variable
selftests/resctrl: Read in less obvious order to defeat prefetch optimizations
selftests/resctrl: Rewrite Cache Allocation Technology (CAT) test
selftests/resctrl: Restore the CPU affinity after CAT test
selftests/resctrl: Create struct for input parameters
selftests/resctrl: Introduce generalized test framework
selftests/resctrl: Pass write_schemata() resource instead of test name
selftests/resctrl: Add helper to convert L2/3 to integer
selftests/resctrl: Rename resource ID to domain ID
selftests/resctrl: Get domain id from cache id
selftests/resctrl: Add test groups and name L3 CAT test L3_CAT
Javier Carrasco (3):
selftests: uevent: add missing gitignore
selftests: thermal: intel: power_floor: add missing gitignore
selftests: thermal: intel: workload_hint: add missing gitignore
Kousik Sanagavarapu (1):
selftest/ftrace: fix typo in ftracetest script
Laura Nao (2):
selftests: Move KTAP bash helpers to selftests common folder
kselftest: Add basic test for probing the rust sample modules
Maciej Wieczor-Retman (4):
selftests/resctrl: Add a helper for the non-contiguous test
selftests/resctrl: Split validate_resctrl_feature_request()
selftests/resctrl: Add resource_info_file_exists()
selftests/resctrl: Add non-contiguous CBMs CAT test
Marcos Paulo de Souza (6):
kselftests: lib.mk: Add TEST_GEN_MODS_DIR variable
livepatch: Move tests from lib/livepatch to selftests/livepatch
selftests: livepatch: Test livepatching a heavily called syscall
selftests: livepatch: Add initial .gitignore
selftests: livepatch: Avoid running the tests if kernel-devel is missing
selftests: lib.mk: Do not process TEST_GEN_MODS_DIR
Mark Brown (1):
selftests: fuxex: Report a unique test name per run of futex_requeue_pi
Naveen N Rao (1):
selftests/ftrace: Add test to exercize function tracer across cpu hotplug
Nícolas F. R. A. Prado (5):
selftests: ktap_helpers: Add helper to print diagnostic messages
selftests: ktap_helpers: Add helper to pass/fail test based on exit code
selftests: ktap_helpers: Add a helper to abort the test
selftests: ktap_helpers: Add a helper to finish the test
selftests: Add test to verify power supply properties
SeongJae Park (1):
selftests/mqueue: Set timeout to 180 seconds
Vincenzo Mezzela (1):
selftest: ftrace: fix minor typo in log
Documentation/dev-tools/kselftest.rst | 4 +
MAINTAINERS | 3 +-
arch/s390/configs/debug_defconfig | 1 -
arch/s390/configs/defconfig | 1 -
lib/Kconfig.debug | 22 --
lib/Makefile | 2 -
lib/livepatch/Makefile | 14 -
tools/testing/selftests/Makefile | 3 +
tools/testing/selftests/dt/Makefile | 2 +-
.../testing/selftests/dt/test_unprobed_devices.sh | 6 +-
tools/testing/selftests/ftrace/ftracetest | 2 +-
.../ftrace/test.d/00basic/test_ownership.tc | 2 +-
.../selftests/ftrace/test.d/ftrace/func_hotplug.tc | 42 ++
.../ftrace/test.d/trigger/trigger-hist-mod.tc | 2 +-
.../selftests/futex/functional/futex_requeue_pi.c | 13 +-
.../selftests/{dt => kselftest}/ktap_helpers.sh | 45 ++-
tools/testing/selftests/lib.mk | 23 +-
tools/testing/selftests/livepatch/.gitignore | 1 +
tools/testing/selftests/livepatch/Makefile | 5 +-
tools/testing/selftests/livepatch/README | 25 +-
tools/testing/selftests/livepatch/config | 1 -
tools/testing/selftests/livepatch/functions.sh | 47 ++-
.../testing/selftests/livepatch/test-callbacks.sh | 50 +--
tools/testing/selftests/livepatch/test-ftrace.sh | 6 +-
.../testing/selftests/livepatch/test-livepatch.sh | 10 +-
.../selftests/livepatch/test-shadow-vars.sh | 2 +-
tools/testing/selftests/livepatch/test-state.sh | 18 +-
tools/testing/selftests/livepatch/test-syscall.sh | 53 +++
tools/testing/selftests/livepatch/test-sysfs.sh | 6 +-
.../selftests/livepatch/test_klp-call_getpid.c | 44 +++
.../selftests/livepatch/test_modules/Makefile | 26 ++
.../test_modules}/test_klp_atomic_replace.c | 0
.../test_modules}/test_klp_callbacks_busy.c | 0
.../test_modules}/test_klp_callbacks_demo.c | 0
.../test_modules}/test_klp_callbacks_demo2.c | 0
.../test_modules}/test_klp_callbacks_mod.c | 0
.../livepatch/test_modules}/test_klp_livepatch.c | 0
.../livepatch/test_modules}/test_klp_shadow_vars.c | 0
.../livepatch/test_modules}/test_klp_state.c | 0
.../livepatch/test_modules}/test_klp_state2.c | 0
.../livepatch/test_modules}/test_klp_state3.c | 0
.../livepatch/test_modules/test_klp_syscall.c | 116 ++++++
tools/testing/selftests/mqueue/setting | 1 +
tools/testing/selftests/power_supply/Makefile | 4 +
tools/testing/selftests/power_supply/helpers.sh | 178 +++++++++
.../power_supply/test_power_supply_properties.sh | 114 ++++++
tools/testing/selftests/resctrl/cache.c | 287 ++++----------
tools/testing/selftests/resctrl/cat_test.c | 421 +++++++++++++++------
tools/testing/selftests/resctrl/cmt_test.c | 80 +++-
tools/testing/selftests/resctrl/fill_buf.c | 132 +++----
tools/testing/selftests/resctrl/mba_test.c | 30 +-
tools/testing/selftests/resctrl/mbm_test.c | 34 +-
tools/testing/selftests/resctrl/resctrl.h | 145 +++++--
tools/testing/selftests/resctrl/resctrl_tests.c | 207 +++++-----
tools/testing/selftests/resctrl/resctrl_val.c | 138 ++++---
tools/testing/selftests/resctrl/resctrlfs.c | 405 ++++++++++++++------
tools/testing/selftests/rust/Makefile | 4 +
tools/testing/selftests/rust/config | 5 +
tools/testing/selftests/rust/test_probe_samples.sh | 41 ++
tools/testing/selftests/sched/cs_prctl_test.c | 2 +-
.../selftests/thermal/intel/power_floor/.gitignore | 1 +
.../thermal/intel/workload_hint/.gitignore | 1 +
tools/testing/selftests/uevent/.gitignore | 1 +
63 files changed, 1945 insertions(+), 883 deletions(-)
delete mode 100644 lib/livepatch/Makefile
create mode 100644 tools/testing/selftests/ftrace/test.d/ftrace/func_hotplug.tc
rename tools/testing/selftests/{dt => kselftest}/ktap_helpers.sh (66%)
create mode 100644 tools/testing/selftests/livepatch/.gitignore
create mode 100755 tools/testing/selftests/livepatch/test-syscall.sh
create mode 100644 tools/testing/selftests/livepatch/test_klp-call_getpid.c
create mode 100644 tools/testing/selftests/livepatch/test_modules/Makefile
rename {lib/livepatch => tools/testing/selftests/livepatch/test_modules}/test_klp_atomic_replace.c (100%)
rename {lib/livepatch => tools/testing/selftests/livepatch/test_modules}/test_klp_callbacks_busy.c (100%)
rename {lib/livepatch => tools/testing/selftests/livepatch/test_modules}/test_klp_callbacks_demo.c (100%)
rename {lib/livepatch => tools/testing/selftests/livepatch/test_modules}/test_klp_callbacks_demo2.c (100%)
rename {lib/livepatch => tools/testing/selftests/livepatch/test_modules}/test_klp_callbacks_mod.c (100%)
rename {lib/livepatch => tools/testing/selftests/livepatch/test_modules}/test_klp_livepatch.c (100%)
rename {lib/livepatch => tools/testing/selftests/livepatch/test_modules}/test_klp_shadow_vars.c (100%)
rename {lib/livepatch => tools/testing/selftests/livepatch/test_modules}/test_klp_state.c (100%)
rename {lib/livepatch => tools/testing/selftests/livepatch/test_modules}/test_klp_state2.c (100%)
rename {lib/livepatch => tools/testing/selftests/livepatch/test_modules}/test_klp_state3.c (100%)
create mode 100644 tools/testing/selftests/livepatch/test_modules/test_klp_syscall.c
create mode 100644 tools/testing/selftests/mqueue/setting
create mode 100644 tools/testing/selftests/power_supply/Makefile
create mode 100644 tools/testing/selftests/power_supply/helpers.sh
create mode 100755 tools/testing/selftests/power_supply/test_power_supply_properties.sh
create mode 100644 tools/testing/selftests/rust/Makefile
create mode 100644 tools/testing/selftests/rust/config
create mode 100755 tools/testing/selftests/rust/test_probe_samples.sh
create mode 100644 tools/testing/selftests/thermal/intel/power_floor/.gitignore
create mode 100644 tools/testing/selftests/thermal/intel/workload_hint/.gitignore
create mode 100644 tools/testing/selftests/uevent/.gitignore
----------------------------------------------------------------
Hi,
In the vanilla net-next tree build of v6.8-rc7-2348-g75c2946db360, with up-to-date
iproute2 built tools, fcnal-test.sh reports certain failures:
--------------------------------------------------------------------------------------
# TEST: ping local, VRF bind - VRF IP [FAIL]
# TEST: ping local, device bind - ns-A IP [FAIL]
# TEST: ping local, VRF bind - VRF IP [FAIL]
# TEST: ping local, device bind - ns-A IP [FAIL]
--------------------------------------------------------------------------------------
Please find the config and the complete fncal-test.log attached.
The environment is Ubuntu 22.04 hwe.
There is also some simultaneous error report in syslog:
Mar 9 18:44:57 defiant kernel: [73380.243954] TCP: Unexpected MD5 Hash found for 172.16.1.2.42448->172.16.1.1.12345 [S]
Mar 9 18:44:58 defiant kernel: [73381.273950] TCP: Unexpected MD5 Hash found for 172.16.1.2.42448->172.16.1.1.12345 [S]
Mar 9 18:44:59 defiant kernel: [73382.297899] TCP: Unexpected MD5 Hash found for 172.16.1.2.42448->172.16.1.1.12345 [S]
Mar 9 18:45:00 defiant kernel: [73383.325779] TCP: Unexpected MD5 Hash found for 172.16.1.2.42448->172.16.1.1.12345 [S]
Mar 9 18:45:05 defiant kernel: [73388.295882] TCP: MD5 Hash failed for 172.16.1.2.53330->172.16.1.1.12345 [S] L3 index 0
Mar 9 18:45:06 defiant kernel: [73389.305408] TCP: MD5 Hash failed for 172.16.1.2.53330->172.16.1.1.12345 [S] L3 index 0
Mar 9 18:45:07 defiant kernel: [73390.329303] TCP: MD5 Hash failed for 172.16.1.2.53330->172.16.1.1.12345 [S] L3 index 0
Mar 9 18:45:08 defiant kernel: [73391.353347] TCP: MD5 Hash failed for 172.16.1.2.53330->172.16.1.1.12345 [S] L3 index 0
Mar 9 18:45:13 defiant kernel: [73396.344291] TCP: Unexpected MD5 Hash found for 172.16.1.2.35884->172.16.1.1.12345 [S]
Mar 9 18:45:14 defiant kernel: [73397.368916] TCP: Unexpected MD5 Hash found for 172.16.1.2.35884->172.16.1.1.12345 [S]
Mar 9 18:45:15 defiant kernel: [73398.392957] TCP: Unexpected MD5 Hash found for 172.16.1.2.35884->172.16.1.1.12345 [S]
Mar 9 18:45:16 defiant kernel: [73399.416742] TCP: Unexpected MD5 Hash found for 172.16.1.2.35884->172.16.1.1.12345 [S]
Mar 9 18:45:24 defiant kernel: [73407.444173] TCP: MD5 Hash failed for 172.16.1.2.45728->172.16.1.1.12345 [S] L3 index 0
Mar 9 18:45:25 defiant kernel: [73408.472269] TCP: MD5 Hash failed for 172.16.1.2.45728->172.16.1.1.12345 [S] L3 index 0
Mar 9 18:45:26 defiant kernel: [73409.495976] TCP: MD5 Hash failed for 172.16.1.2.45728->172.16.1.1.12345 [S] L3 index 0
Mar 9 18:45:27 defiant kernel: [73410.519916] TCP: MD5 Hash failed for 172.16.1.2.45728->172.16.1.1.12345 [S] L3 index 0
Mar 9 18:45:32 defiant kernel: [73415.494615] TCP: Unexpected MD5 Hash found for 172.16.2.2.12345->172.16.1.1.12345 [S]
Mar 9 18:45:33 defiant kernel: [73416.503460] TCP: Unexpected MD5 Hash found for 172.16.2.2.12345->172.16.1.1.12345 [S]
Mar 9 18:45:34 defiant kernel: [73417.531394] TCP: Unexpected MD5 Hash found for 172.16.2.2.12345->172.16.1.1.12345 [S]
Mar 9 18:45:35 defiant kernel: [73418.551315] TCP: Unexpected MD5 Hash found for 172.16.2.2.12345->172.16.1.1.12345 [S]
Mar 9 18:47:22 defiant kernel: [73525.350213] TCP: Unexpected MD5 Hash found for 172.16.1.2.54736->172.16.1.1.12345 [S]
Mar 9 18:47:23 defiant kernel: [73526.351780] TCP: Unexpected MD5 Hash found for 172.16.1.2.54736->172.16.1.1.12345 [S]
Mar 9 18:47:24 defiant kernel: [73527.375971] TCP: Unexpected MD5 Hash found for 172.16.1.2.54736->172.16.1.1.12345 [S]
Mar 9 18:47:25 defiant kernel: [73528.399632] TCP: Unexpected MD5 Hash found for 172.16.1.2.54736->172.16.1.1.12345 [S]
Mar 9 18:47:30 defiant kernel: [73533.402162] TCP: MD5 Hash failed for 172.16.1.2.38472->172.16.1.1.12345 [S] L3 index 0
Mar 9 18:47:31 defiant kernel: [73534.415271] TCP: MD5 Hash failed for 172.16.1.2.38472->172.16.1.1.12345 [S] L3 index 0
Mar 9 18:47:32 defiant kernel: [73535.439362] TCP: MD5 Hash failed for 172.16.1.2.38472->172.16.1.1.12345 [S] L3 index 0
Mar 9 18:47:33 defiant kernel: [73536.463119] TCP: MD5 Hash failed for 172.16.1.2.38472->172.16.1.1.12345 [S] L3 index 0
Mar 9 18:47:38 defiant kernel: [73541.453876] TCP: Unexpected MD5 Hash found for 172.16.1.2.56654->172.16.1.1.12345 [S]
Mar 9 18:47:39 defiant kernel: [73542.478746] TCP: Unexpected MD5 Hash found for 172.16.1.2.56654->172.16.1.1.12345 [S]
Mar 9 18:47:40 defiant kernel: [73543.502677] TCP: Unexpected MD5 Hash found for 172.16.1.2.56654->172.16.1.1.12345 [S]
Mar 9 18:47:41 defiant kernel: [73544.526523] TCP: Unexpected MD5 Hash found for 172.16.1.2.56654->172.16.1.1.12345 [S]
Mar 9 18:47:49 defiant kernel: [73552.553526] TCP: MD5 Hash failed for 172.16.1.2.40748->172.16.1.1.12345 [S] L3 index 0
Mar 9 18:47:50 defiant kernel: [73553.582003] TCP: MD5 Hash failed for 172.16.1.2.40748->172.16.1.1.12345 [S] L3 index 0
Mar 9 18:47:51 defiant kernel: [73554.605820] TCP: MD5 Hash failed for 172.16.1.2.40748->172.16.1.1.12345 [S] L3 index 0
Mar 9 18:47:52 defiant kernel: [73555.629936] TCP: MD5 Hash failed for 172.16.1.2.40748->172.16.1.1.12345 [S] L3 index 0
Mar 9 18:47:57 defiant kernel: [73560.604885] TCP: Unexpected MD5 Hash found for 172.16.2.2.12345->172.16.1.1.12345 [S]
Mar 9 18:47:58 defiant kernel: [73561.613434] TCP: Unexpected MD5 Hash found for 172.16.2.2.12345->172.16.1.1.12345 [S]
Mar 9 18:47:59 defiant kernel: [73562.637270] TCP: Unexpected MD5 Hash found for 172.16.2.2.12345->172.16.1.1.12345 [S]
Mar 9 18:48:01 defiant kernel: [73563.661162] TCP: Unexpected MD5 Hash found for 172.16.2.2.12345->172.16.1.1.12345 [S]
Mar 9 18:49:03 defiant kernel: [73626.193643] TCP: Unexpected MD5 Hash found for 172.16.1.2.52230->172.16.1.1.12345 [S]
Mar 9 18:49:04 defiant kernel: [73627.208926] TCP: Unexpected MD5 Hash found for 172.16.1.2.52230->172.16.1.1.12345 [S]
Mar 9 18:49:05 defiant kernel: [73628.232916] TCP: Unexpected MD5 Hash found for 172.16.1.2.52230->172.16.1.1.12345 [S]
Mar 9 18:49:06 defiant kernel: [73629.260834] TCP: Unexpected MD5 Hash found for 172.16.1.2.52230->172.16.1.1.12345 [S]
Mar 9 18:49:11 defiant kernel: [73634.242440] TCP: MD5 Hash failed for 172.16.1.2.52242->172.16.1.1.12345 [S] L3 index 13
Mar 9 18:49:12 defiant kernel: [73635.272436] TCP: MD5 Hash failed for 172.16.1.2.52242->172.16.1.1.12345 [S] L3 index 13
Mar 9 18:49:13 defiant kernel: [73636.296146] TCP: MD5 Hash failed for 172.16.1.2.52242->172.16.1.1.12345 [S] L3 index 13
Mar 9 18:49:14 defiant kernel: [73637.324079] TCP: MD5 Hash failed for 172.16.1.2.52242->172.16.1.1.12345 [S] L3 index 13
Mar 9 18:49:19 defiant kernel: [73642.294261] TCP: Unexpected MD5 Hash found for 172.16.1.2.37122->172.16.1.1.12345 [S]
Mar 9 18:49:20 defiant kernel: [73643.303623] TCP: Unexpected MD5 Hash found for 172.16.1.2.37122->172.16.1.1.12345 [S]
Mar 9 18:49:21 defiant kernel: [73644.327778] TCP: Unexpected MD5 Hash found for 172.16.1.2.37122->172.16.1.1.12345 [S]
Mar 9 18:49:22 defiant kernel: [73645.351652] TCP: Unexpected MD5 Hash found for 172.16.1.2.37122->172.16.1.1.12345 [S]
Mar 9 18:49:30 defiant kernel: [73653.393242] TCP: MD5 Hash failed for 172.16.1.2.51856->172.16.1.1.12345 [S] L3 index 13
Mar 9 18:49:31 defiant kernel: [73654.407017] TCP: MD5 Hash failed for 172.16.1.2.51856->172.16.1.1.12345 [S] L3 index 13
Mar 9 18:49:32 defiant kernel: [73655.431009] TCP: MD5 Hash failed for 172.16.1.2.51856->172.16.1.1.12345 [S] L3 index 13
Mar 9 18:49:33 defiant kernel: [73656.454706] TCP: MD5 Hash failed for 172.16.1.2.51856->172.16.1.1.12345 [S] L3 index 13
Mar 9 18:49:38 defiant kernel: [73661.445570] TCP: Unexpected MD5 Hash found for 172.16.2.2.12345->172.16.1.1.12345 [S]
Mar 9 18:49:39 defiant kernel: [73662.470408] TCP: Unexpected MD5 Hash found for 172.16.2.2.12345->172.16.1.1.12345 [S]
Mar 9 18:49:40 defiant kernel: [73663.494193] TCP: Unexpected MD5 Hash found for 172.16.2.2.12345->172.16.1.1.12345 [S]
Mar 9 18:49:41 defiant kernel: [73664.518144] TCP: Unexpected MD5 Hash found for 172.16.2.2.12345->172.16.1.1.12345 [S]
Mar 9 18:49:52 defiant kernel: [73675.595812] TCP: MD5 Hash failed for 172.16.1.2.36204->172.16.1.1.12345 [S] L3 index 0
Mar 9 18:49:53 defiant kernel: [73676.613479] TCP: MD5 Hash failed for 172.16.1.2.36204->172.16.1.1.12345 [S] L3 index 0
Mar 9 18:49:54 defiant kernel: [73677.637422] TCP: MD5 Hash failed for 172.16.1.2.36204->172.16.1.1.12345 [S] L3 index 0
Mar 9 18:49:56 defiant kernel: [73678.661142] TCP: MD5 Hash failed for 172.16.1.2.36204->172.16.1.1.12345 [S] L3 index 0
Mar 9 18:50:01 defiant kernel: [73683.649033] TCP: MD5 Hash failed for 172.16.1.2.52420->172.16.1.1.12345 [S] L3 index 13
Mar 9 18:50:02 defiant kernel: [73684.676836] TCP: MD5 Hash failed for 172.16.1.2.52420->172.16.1.1.12345 [S] L3 index 13
Mar 9 18:50:03 defiant kernel: [73685.700640] TCP: MD5 Hash failed for 172.16.1.2.52420->172.16.1.1.12345 [S] L3 index 13
Mar 9 18:50:04 defiant kernel: [73686.728654] TCP: MD5 Hash failed for 172.16.1.2.52420->172.16.1.1.12345 [S] L3 index 13
Mar 9 18:50:15 defiant kernel: [73697.793250] TCP: MD5 Hash failed for 172.16.1.2.49974->172.16.1.1.12345 [S] L3 index 0
Mar 9 18:50:16 defiant kernel: [73698.823735] TCP: MD5 Hash failed for 172.16.1.2.49974->172.16.1.1.12345 [S] L3 index 0
Mar 9 18:50:17 defiant kernel: [73699.843890] TCP: MD5 Hash failed for 172.16.1.2.49974->172.16.1.1.12345 [S] L3 index 0
Mar 9 18:50:18 defiant kernel: [73700.867574] TCP: MD5 Hash failed for 172.16.1.2.49974->172.16.1.1.12345 [S] L3 index 0
Mar 9 18:50:23 defiant kernel: [73705.846725] TCP: MD5 Hash failed for 172.16.1.2.55498->172.16.1.1.12345 [S] L3 index 13
Mar 9 18:50:24 defiant kernel: [73706.851167] TCP: MD5 Hash failed for 172.16.1.2.55498->172.16.1.1.12345 [S] L3 index 13
Mar 9 18:50:25 defiant kernel: [73707.875095] TCP: MD5 Hash failed for 172.16.1.2.55498->172.16.1.1.12345 [S] L3 index 13
Mar 9 18:50:26 defiant kernel: [73708.899061] TCP: MD5 Hash failed for 172.16.1.2.55498->172.16.1.1.12345 [S] L3 index 13
Mar 9 18:50:41 defiant kernel: [73724.087381] TCP: Unexpected MD5 Hash found for 172.16.1.2.53952->172.16.1.1.12345 [S]
Mar 9 18:50:42 defiant kernel: [73725.093871] TCP: Unexpected MD5 Hash found for 172.16.1.2.53952->172.16.1.1.12345 [S]
Mar 9 18:50:43 defiant kernel: [73726.113996] TCP: Unexpected MD5 Hash found for 172.16.1.2.53952->172.16.1.1.12345 [S]
Mar 9 18:50:44 defiant kernel: [73727.137924] TCP: Unexpected MD5 Hash found for 172.16.1.2.53952->172.16.1.1.12345 [S]
Mar 9 19:11:09 defiant kernel: [74951.775994] TCP: Unexpected MD5 Hash found for [2001:db8:1::2].39444->[2001:db8:1::1].12345 [S]
Mar 9 19:11:10 defiant kernel: [74952.784159] TCP: Unexpected MD5 Hash found for [2001:db8:1::2].39444->[2001:db8:1::1].12345 [S]
Mar 9 19:11:11 defiant kernel: [74953.804459] TCP: Unexpected MD5 Hash found for [2001:db8:1::2].39444->[2001:db8:1::1].12345 [S]
Mar 9 19:11:12 defiant kernel: [74954.828025] TCP: Unexpected MD5 Hash found for [2001:db8:1::2].39444->[2001:db8:1::1].12345 [S]
Mar 9 19:11:17 defiant kernel: [74959.828366] TCP: MD5 Hash mismatch for [2001:db8:1::2].42684->[2001:db8:1::1].12345 [S]L3 index 0
Mar 9 19:11:18 defiant kernel: [74960.843597] TCP: MD5 Hash mismatch for [2001:db8:1::2].42684->[2001:db8:1::1].12345 [S]L3 index 0
Mar 9 19:11:19 defiant kernel: [74961.867330] TCP: MD5 Hash mismatch for [2001:db8:1::2].42684->[2001:db8:1::1].12345 [S]L3 index 0
Mar 9 19:11:20 defiant kernel: [74962.891257] TCP: MD5 Hash mismatch for [2001:db8:1::2].42684->[2001:db8:1::1].12345 [S]L3 index 0
Mar 9 19:11:25 defiant kernel: [74967.881665] TCP: Unexpected MD5 Hash found for [2001:db8:1::2].53174->[2001:db8:1::1].12345 [S]
Mar 9 19:11:26 defiant kernel: [74968.906826] TCP: Unexpected MD5 Hash found for [2001:db8:1::2].53174->[2001:db8:1::1].12345 [S]
Mar 9 19:11:27 defiant kernel: [74969.934750] TCP: Unexpected MD5 Hash found for [2001:db8:1::2].53174->[2001:db8:1::1].12345 [S]
Mar 9 19:11:28 defiant kernel: [74970.954668] TCP: Unexpected MD5 Hash found for [2001:db8:1::2].53174->[2001:db8:1::1].12345 [S]
Mar 9 19:11:36 defiant kernel: [74978.977619] TCP: MD5 Hash mismatch for [2001:db8:1::2].37514->[2001:db8:1::1].12345 [S]L3 index 0
Mar 9 19:11:37 defiant kernel: [74979.978269] TCP: MD5 Hash mismatch for [2001:db8:1::2].37514->[2001:db8:1::1].12345 [S]L3 index 0
Mar 9 19:11:38 defiant kernel: [74981.002187] TCP: MD5 Hash mismatch for [2001:db8:1::2].37514->[2001:db8:1::1].12345 [S]L3 index 0
Mar 9 19:11:39 defiant kernel: [74982.026108] TCP: MD5 Hash mismatch for [2001:db8:1::2].37514->[2001:db8:1::1].12345 [S]L3 index 0
Mar 9 19:11:44 defiant kernel: [74987.030453] TCP: Unexpected MD5 Hash found for [2001:db8:2::2].12345->[2001:db8:1::1].12345 [S]
Mar 9 19:11:45 defiant kernel: [74988.041473] TCP: Unexpected MD5 Hash found for [2001:db8:2::2].12345->[2001:db8:1::1].12345 [S]
Mar 9 19:11:46 defiant kernel: [74989.065610] TCP: Unexpected MD5 Hash found for [2001:db8:2::2].12345->[2001:db8:1::1].12345 [S]
Mar 9 19:11:47 defiant kernel: [74990.089331] TCP: Unexpected MD5 Hash found for [2001:db8:2::2].12345->[2001:db8:1::1].12345 [S]
Mar 9 19:13:12 defiant kernel: [75075.490782] TCP: Unexpected MD5 Hash found for [2001:db8:1::2].52610->[2001:db8:1::1].12345 [S]
Mar 9 19:13:13 defiant kernel: [75076.515324] TCP: Unexpected MD5 Hash found for [2001:db8:1::2].52610->[2001:db8:1::1].12345 [S]
Mar 9 19:13:14 defiant kernel: [75077.543210] TCP: Unexpected MD5 Hash found for [2001:db8:1::2].52610->[2001:db8:1::1].12345 [S]
Mar 9 19:13:16 defiant kernel: [75078.563335] TCP: Unexpected MD5 Hash found for [2001:db8:1::2].52610->[2001:db8:1::1].12345 [S]
Mar 9 19:13:20 defiant kernel: [75083.543761] TCP: MD5 Hash mismatch for [2001:db8:1::2].48578->[2001:db8:1::1].12345 [S]L3 index 0
Mar 9 19:13:22 defiant kernel: [75084.546751] TCP: MD5 Hash mismatch for [2001:db8:1::2].48578->[2001:db8:1::1].12345 [S]L3 index 0
Mar 9 19:13:23 defiant kernel: [75085.570652] TCP: MD5 Hash mismatch for [2001:db8:1::2].48578->[2001:db8:1::1].12345 [S]L3 index 0
Mar 9 19:13:24 defiant kernel: [75086.598593] TCP: MD5 Hash mismatch for [2001:db8:1::2].48578->[2001:db8:1::1].12345 [S]L3 index 0
Mar 9 19:13:29 defiant kernel: [75091.593893] TCP: Unexpected MD5 Hash found for [2001:db8:1::2].36456->[2001:db8:1::1].12345 [S]
Mar 9 19:13:30 defiant kernel: [75092.610368] TCP: Unexpected MD5 Hash found for [2001:db8:1::2].36456->[2001:db8:1::1].12345 [S]
Mar 9 19:13:31 defiant kernel: [75093.638092] TCP: Unexpected MD5 Hash found for [2001:db8:1::2].36456->[2001:db8:1::1].12345 [S]
Mar 9 19:13:32 defiant kernel: [75094.662014] TCP: Unexpected MD5 Hash found for [2001:db8:1::2].36456->[2001:db8:1::1].12345 [S]
Mar 9 19:13:40 defiant kernel: [75102.693857] TCP: MD5 Hash mismatch for [2001:db8:1::2].38306->[2001:db8:1::1].12345 [S]L3 index 0
Mar 9 19:13:41 defiant kernel: [75103.713385] TCP: MD5 Hash mismatch for [2001:db8:1::2].38306->[2001:db8:1::1].12345 [S]L3 index 0
Mar 9 19:13:42 defiant kernel: [75104.737346] TCP: MD5 Hash mismatch for [2001:db8:1::2].38306->[2001:db8:1::1].12345 [S]L3 index 0
Mar 9 19:13:43 defiant kernel: [75105.761249] TCP: MD5 Hash mismatch for [2001:db8:1::2].38306->[2001:db8:1::1].12345 [S]L3 index 0
Mar 9 19:13:48 defiant kernel: [75110.744898] TCP: Unexpected MD5 Hash found for [2001:db8:2::2].12345->[2001:db8:1::1].12345 [S]
Mar 9 19:13:49 defiant kernel: [75111.748815] TCP: Unexpected MD5 Hash found for [2001:db8:2::2].12345->[2001:db8:1::1].12345 [S]
Mar 9 19:13:50 defiant kernel: [75112.768970] TCP: Unexpected MD5 Hash found for [2001:db8:2::2].12345->[2001:db8:1::1].12345 [S]
Mar 9 19:13:51 defiant kernel: [75113.792707] TCP: Unexpected MD5 Hash found for [2001:db8:2::2].12345->[2001:db8:1::1].12345 [S]
Mar 9 19:14:38 defiant kernel: [75161.124752] TCP: Unexpected MD5 Hash found for [2001:db8:1::2].37782->[2001:db8:1::1].12345 [S]
Mar 9 19:14:39 defiant kernel: [75162.145544] TCP: Unexpected MD5 Hash found for [2001:db8:1::2].37782->[2001:db8:1::1].12345 [S]
Mar 9 19:14:40 defiant kernel: [75163.165429] TCP: Unexpected MD5 Hash found for [2001:db8:1::2].37782->[2001:db8:1::1].12345 [S]
Mar 9 19:14:41 defiant kernel: [75164.189139] TCP: Unexpected MD5 Hash found for [2001:db8:1::2].37782->[2001:db8:1::1].12345 [S]
Mar 9 19:14:46 defiant kernel: [75169.176267] TCP: MD5 Hash mismatch for [2001:db8:1::2].39638->[2001:db8:1::1].12345 [S]L3 index 13
Mar 9 19:14:47 defiant kernel: [75170.204719] TCP: MD5 Hash mismatch for [2001:db8:1::2].39638->[2001:db8:1::1].12345 [S]L3 index 13
Mar 9 19:14:48 defiant kernel: [75171.228872] TCP: MD5 Hash mismatch for [2001:db8:1::2].39638->[2001:db8:1::1].12345 [S]L3 index 13
Mar 9 19:14:49 defiant kernel: [75172.252601] TCP: MD5 Hash mismatch for [2001:db8:1::2].39638->[2001:db8:1::1].12345 [S]L3 index 13
Mar 9 19:14:54 defiant kernel: [75177.228866] TCP: Unexpected MD5 Hash found for [2001:db8:1::2].50904->[2001:db8:1::1].12345 [S]
Mar 9 19:14:55 defiant kernel: [75178.236182] TCP: Unexpected MD5 Hash found for [2001:db8:1::2].50904->[2001:db8:1::1].12345 [S]
Mar 9 19:14:56 defiant kernel: [75179.260291] TCP: Unexpected MD5 Hash found for [2001:db8:1::2].50904->[2001:db8:1::1].12345 [S]
Mar 9 19:14:57 defiant kernel: [75180.284286] TCP: Unexpected MD5 Hash found for [2001:db8:1::2].50904->[2001:db8:1::1].12345 [S]
Mar 9 19:15:05 defiant kernel: [75188.328452] TCP: MD5 Hash mismatch for [2001:db8:1::2].58614->[2001:db8:1::1].12345 [S]L3 index 13
Mar 9 19:15:06 defiant kernel: [75189.339612] TCP: MD5 Hash mismatch for [2001:db8:1::2].58614->[2001:db8:1::1].12345 [S]L3 index 13
Mar 9 19:15:07 defiant kernel: [75190.363319] TCP: MD5 Hash mismatch for [2001:db8:1::2].58614->[2001:db8:1::1].12345 [S]L3 index 13
Mar 9 19:15:08 defiant kernel: [75191.387271] TCP: MD5 Hash mismatch for [2001:db8:1::2].58614->[2001:db8:1::1].12345 [S]L3 index 13
Mar 9 19:15:13 defiant kernel: [75196.380149] TCP: Unexpected MD5 Hash found for [2001:db8:2::2].12345->[2001:db8:1::1].12345 [S]
Mar 9 19:15:14 defiant kernel: [75197.402810] TCP: Unexpected MD5 Hash found for [2001:db8:2::2].12345->[2001:db8:1::1].12345 [S]
Mar 9 19:15:15 defiant kernel: [75198.426966] TCP: Unexpected MD5 Hash found for [2001:db8:2::2].12345->[2001:db8:1::1].12345 [S]
Mar 9 19:15:16 defiant kernel: [75199.454656] TCP: Unexpected MD5 Hash found for [2001:db8:2::2].12345->[2001:db8:1::1].12345 [S]
Mar 9 19:15:27 defiant kernel: [75210.528697] TCP: MD5 Hash mismatch for [2001:db8:1::2].35600->[2001:db8:1::1].12345 [S]L3 index 0
Mar 9 19:15:29 defiant kernel: [75211.545862] TCP: MD5 Hash mismatch for [2001:db8:1::2].35600->[2001:db8:1::1].12345 [S]L3 index 0
Mar 9 19:15:30 defiant kernel: [75212.569765] TCP: MD5 Hash mismatch for [2001:db8:1::2].35600->[2001:db8:1::1].12345 [S]L3 index 0
Mar 9 19:15:31 defiant kernel: [75213.593913] TCP: MD5 Hash mismatch for [2001:db8:1::2].35600->[2001:db8:1::1].12345 [S]L3 index 0
Mar 9 19:15:36 defiant kernel: [75218.580169] TCP: MD5 Hash mismatch for [2001:db8:1::2].35178->[2001:db8:1::1].12345 [S]L3 index 13
Mar 9 19:15:37 defiant kernel: [75219.609478] TCP: MD5 Hash mismatch for [2001:db8:1::2].35178->[2001:db8:1::1].12345 [S]L3 index 13
Mar 9 19:15:38 defiant kernel: [75220.633204] TCP: MD5 Hash mismatch for [2001:db8:1::2].35178->[2001:db8:1::1].12345 [S]L3 index 13
Mar 9 19:15:39 defiant kernel: [75221.657325] TCP: MD5 Hash mismatch for [2001:db8:1::2].35178->[2001:db8:1::1].12345 [S]L3 index 13
Mar 9 19:15:50 defiant kernel: [75232.724491] TCP: MD5 Hash mismatch for [2001:db8:1::2].53452->[2001:db8:1::1].12345 [S]L3 index 0
Mar 9 19:15:51 defiant kernel: [75233.752288] TCP: MD5 Hash mismatch for [2001:db8:1::2].53452->[2001:db8:1::1].12345 [S]L3 index 0
Mar 9 19:15:52 defiant kernel: [75234.780411] TCP: MD5 Hash mismatch for [2001:db8:1::2].53452->[2001:db8:1::1].12345 [S]L3 index 0
Mar 9 19:15:53 defiant kernel: [75235.804135] TCP: MD5 Hash mismatch for [2001:db8:1::2].53452->[2001:db8:1::1].12345 [S]L3 index 0
Mar 9 19:15:58 defiant kernel: [75240.777430] TCP: MD5 Hash mismatch for [2001:db8:1::2].48658->[2001:db8:1::1].12345 [S]L3 index 13
Mar 9 19:15:59 defiant kernel: [75241.783727] TCP: MD5 Hash mismatch for [2001:db8:1::2].48658->[2001:db8:1::1].12345 [S]L3 index 13
Mar 9 19:16:00 defiant kernel: [75242.807858] TCP: MD5 Hash mismatch for [2001:db8:1::2].48658->[2001:db8:1::1].12345 [S]L3 index 13
Mar 9 19:16:01 defiant kernel: [75243.831587] TCP: MD5 Hash mismatch for [2001:db8:1::2].48658->[2001:db8:1::1].12345 [S]L3 index 13
Hope this helps.
Best regards,
Mirsad Todorovac
This series fixes a bug in the complete phase of UDP in GRO, in which
socket lookup fails due to using network_header when parsing encapsulated
packets. The fix is to pass p_off parameter in *_gro_complete.
Next, the fields network_offset and inner_network_offset are added to
napi_gro_cb, and are both set during the receive phase of GRO. This is then
leveraged in the next commit to remove flush_id state from napi_gro_cb, and
stateful code in {ipv6,inet}_gro_receive which may be unnecessarily
complicated due to encapsulation support in GRO.
In addition, udpgro_fwd selftest is adjusted to include the socket lookup
case for vxlan. This selftest will test its supposed functionality once
local bind support is merged (https://lore.kernel.org/netdev/df300a49-7811-4126-a56a-a77100c8841b@gmail.c…).
v2 -> v3:
- Use napi_gro_cb instead of skb->{offset}
- v2:
https://lore.kernel.org/netdev/2ce1600b-e733-448b-91ac-9d0ae2b866a4@gmail.c…
v1 -> v2:
- Pass p_off in *_gro_complete to fix UDP bug
- Remove more conditionals and memory fetches from inet_gro_flush
- v1:
https://lore.kernel.org/netdev/e1d22505-c5f8-4c02-a997-64248480338b@gmail.c…
Richard Gobert (4):
net: gro: add p_off param in *_gro_complete
selftests/net: add local address bind in vxlan selftest
net: gro: add {inner_}network_offset to napi_gro_cb
net: gro: move L3 flush checks to tcp_gro_receive
drivers/net/geneve.c | 7 +-
drivers/net/vxlan/vxlan_core.c | 11 ++--
include/linux/etherdevice.h | 2 +-
include/linux/netdevice.h | 3 +-
include/linux/udp.h | 2 +-
include/net/gro.h | 36 +++++++----
include/net/inet_common.h | 2 +-
include/net/tcp.h | 6 +-
include/net/udp.h | 8 +--
include/net/udp_tunnel.h | 2 +-
net/8021q/vlan_core.c | 6 +-
net/core/gro.c | 6 +-
net/ethernet/eth.c | 5 +-
net/ipv4/af_inet.c | 49 ++------------
net/ipv4/fou_core.c | 9 +--
net/ipv4/gre_offload.c | 6 +-
net/ipv4/tcp_offload.c | 79 ++++++++++++++++++-----
net/ipv4/udp.c | 3 +-
net/ipv4/udp_offload.c | 26 ++++----
net/ipv6/ip6_offload.c | 41 +++++-------
net/ipv6/tcpv6_offload.c | 7 +-
net/ipv6/udp.c | 3 +-
net/ipv6/udp_offload.c | 13 ++--
tools/testing/selftests/net/udpgro_fwd.sh | 10 ++-
24 files changed, 187 insertions(+), 155 deletions(-)
--
2.36.1
Some unit tests intentionally trigger warning backtraces by passing bad
parameters to kernel API functions. Such unit tests typically check the
return value from such calls, not the existence of the warning backtrace.
Such intentionally generated warning backtraces are neither desirable
nor useful for a number of reasons.
- They can result in overlooked real problems.
- A warning that suddenly starts to show up in unit tests needs to be
investigated and has to be marked to be ignored, for example by
adjusting filter scripts. Such filters are ad-hoc because there is
no real standard format for warnings. On top of that, such filter
scripts would require constant maintenance.
One option to address problem would be to add messages such as "expected
warning backtraces start / end here" to the kernel log. However, that
would again require filter scripts, it might result in missing real
problematic warning backtraces triggered while the test is running, and
the irrelevant backtrace(s) would still clog the kernel log.
Solve the problem by providing a means to identify and suppress specific
warning backtraces while executing test code. Support suppressing multiple
backtraces while at the same time limiting changes to generic code to the
absolute minimum. Architecture specific changes are kept at minimum by
retaining function names only if both CONFIG_DEBUG_BUGVERBOSE and
CONFIG_KUNIT are enabled.
The first patch of the series introduces the necessary infrastructure.
The second patch marks the warning message in drm_calc_scale() in the DRM
subsystem as intentional where warranted. This patch is intended to serve
as an example for the use of the functionality introduced with this series.
The last three patches in the series introduce the necessary architecture
specific changes for x86, arm64, and loongarch.
This series is based on the RFC patch and subsequent discussion at
https://patchwork.kernel.org/project/linux-kselftest/patch/02546e59-1afe-4b…
and offers a more comprehensive solution of the problem discussed there.
Checkpatch note:
Remaining checkpatch errors and warnings were deliberately ignored.
Some are triggered by matching coding style or by comments interpreted
as code, others by assembler macros which are disliked by checkpatch.
Suggestions for improvements are welcome.
Some questions:
- Is the general approach promising ? If not, are there other possible
solutions ?
- Function pointers are only added to the __bug_table section if both
CONFIG_KUNIT and CONFIG_DEBUG_BUGVERBOSE are enabled. This avoids image
size increases if CONFIG_KUNIT=n. Downside is slightly more complex
architecture specific assembler code. If function pointers were always
added to the __bug_table section, vmlinux image size would increase by
approximately 0.6-0.7%. Is the increased complexity in assembler code
worth the reduced image size ? I think so, but I would like to hear
other opinions.
- There are additional possibilities associated with storing the bug
function name in the __bug_table section. It could be independent of
KUNIT, it could be a configuration flag, and/or it could be used to
display the name of the offending function in BUG/WARN messages.
Is any of those of interest ?
----------------------------------------------------------------
Guenter Roeck (5):
bug: Core support for suppressing warning backtraces
drm: Suppress intentional warning backtraces in scaling unit tests
x86: Add support for suppressing warning tracebacks
arm64: Add support for suppressing warning tracebacks
loongarch: Add support for suppressing warning tracebacks
arch/arm64/include/asm/asm-bug.h | 29 +++++++++++++-------
arch/arm64/include/asm/bug.h | 8 +++++-
arch/loongarch/include/asm/bug.h | 38 ++++++++++++++++++--------
arch/x86/include/asm/bug.h | 21 +++++++++++----
drivers/gpu/drm/tests/drm_rect_test.c | 6 +++++
include/asm-generic/bug.h | 16 ++++++++---
include/kunit/bug.h | 51 +++++++++++++++++++++++++++++++++++
include/linux/bug.h | 13 +++++++++
lib/bug.c | 51 ++++++++++++++++++++++++++++++++---
lib/kunit/Makefile | 6 +++--
lib/kunit/bug.c | 40 +++++++++++++++++++++++++++
11 files changed, 243 insertions(+), 36 deletions(-)
create mode 100644 include/kunit/bug.h
create mode 100644 lib/kunit/bug.c
This series fixes a bug in the complete phase of UDP in GRO, in which
socket lookup fails due to using network_header when parsing encapsulated
packets. The fix is to pass p_off parameter in *_gro_complete.
Next, inner_network_header is always set in the receive phase of GRO,
this is then leveraged in the next commit to remove some state from
napi_gro_cb, and stateful code in {ipv6,inet}_gro_receive which may be
unnecessarily complicated due to encapsulation support in GRO.
In addition, udpgro_fwd selftest is adjusted to include the socket lookup
case for vxlan. This selftest will test its supposed functionality once
local bind support is merged (https://lore.kernel.org/netdev/df300a49-7811-4126-a56a-a77100c8841b@gmail.c…).
v1 -> v2:
- Pass p_off in *_gro_complete to fix UDP bug
- Remove more conditionals and memory fetches from inet_gro_flush
- v1:
https://lore.kernel.org/netdev/e1d22505-c5f8-4c02-a997-64248480338b@gmail.c…
Richard Gobert (4):
net: gro: add p_off param in *_gro_complete
selftests/net: add local address bind in vxlan selftest
net: gro: set inner_network_header in receive phase
net: gro: move L3 flush checks to tcp_gro_receive
drivers/net/geneve.c | 7 +-
drivers/net/vxlan/vxlan_core.c | 11 ++--
include/linux/etherdevice.h | 2 +-
include/linux/netdevice.h | 3 +-
include/linux/udp.h | 2 +-
include/net/gro.h | 33 ++++++----
include/net/inet_common.h | 2 +-
include/net/tcp.h | 6 +-
include/net/udp.h | 8 +--
include/net/udp_tunnel.h | 2 +-
net/8021q/vlan_core.c | 9 ++-
net/core/gro.c | 5 +-
net/ethernet/eth.c | 5 +-
net/ipv4/af_inet.c | 53 ++-------------
net/ipv4/fou_core.c | 9 +--
net/ipv4/gre_offload.c | 6 +-
net/ipv4/tcp_offload.c | 80 ++++++++++++++++++-----
net/ipv4/udp.c | 3 +-
net/ipv4/udp_offload.c | 26 ++++----
net/ipv6/ip6_offload.c | 45 +++++--------
net/ipv6/tcpv6_offload.c | 7 +-
net/ipv6/udp.c | 3 +-
net/ipv6/udp_offload.c | 13 ++--
tools/testing/selftests/net/udpgro_fwd.sh | 10 ++-
24 files changed, 188 insertions(+), 162 deletions(-)
--
2.36.1
Hi,
In the net-next tree commit v6.8-rc7-2348-g75c2946db360, vannila except for this minor mod
to selftest suite:
----------------------------------------------------------------------------------------
marvin@defiant:~/linux/kernel/net-next$ git diff
diff --git a/tools/testing/selftests/breakpoints/Makefile b/tools/testing/selftests/breakpoints/Makefile
index 9ec2c78de8ca..76a0e3837136 100644
--- a/tools/testing/selftests/breakpoints/Makefile
+++ b/tools/testing/selftests/breakpoints/Makefile
@@ -3,7 +3,7 @@
uname_M := $(shell uname -m 2>/dev/null || echo not)
ARCH ?= $(shell echo $(uname_M) | sed -e s/i.86/x86/ -e s/x86_64/x86/)
-TEST_GEN_PROGS := step_after_suspend_test
+# TEST_GEN_PROGS := step_after_suspend_test
ifeq ($(ARCH),x86)
TEST_GEN_PROGS += breakpoint_test
marvin@defiant:~/linux/kernel/net-next$
----------------------------------------------------------------------------------------
there seems to be a bug.
The symptom is a hang in forever loop in ./pidfd_setns_test :
pidfd_send_signal(18, SIGKILL, NULL, 0) = 0
pidfd_send_signal(18, SIGKILL, NULL, 0) = 0
pidfd_send_signal(18, SIGKILL, NULL, 0) = 0
pidfd_send_signal(18, SIGKILL, NULL, 0) = 0
pidfd_send_signal(18, SIGKILL, NULL, 0) = 0
pidfd_send_signal(18, SIGKILL, NULL, 0) = 0
pidfd_send_signal(18, SIGKILL, NULL, 0) = 0
pidfd_send_signal(18, SIGKILL, NULL, 0) = 0
pidfd_send_signal(18, SIGKILL, NULL, 0) = 0
pidfd_send_signal(18, SIGKILL, NULL, 0) = 0
pidfd_send_signal(18, SIGKILL, NULL, 0) = 0
pidfd_send_signal(18, SIGKILL, NULL, 0) = 0
pidfd_send_signal(18, SIGKILL, NULL, 0) = 0
pidfd_send_signal(18, SIGKILL, NULL, 0) = 0
pidfd_send_signal(18, SIGKILL, NULL, 0) = 0
pidfd_send_signal(18, SIGKILL, NULL, 0) = 0
pidfd_send_signal(18, SIGKILL, NULL, 0) = 0
pidfd_send_signal(18, SIGKILL, NULL, 0) = 0
pidfd_send_signal(18, SIGKILL, NULL, 0) = 0
pidfd_send_signal(18, SIGKILL, NULL, 0) = 0
.
.
.
This could happen here:
FIXTURE_TEARDOWN(current_nsset)
{
int i;
→ ASSERT_EQ(sys_pidfd_send_signal(self->child_pidfd1,
SIGKILL, NULL, 0), 0);
→ ASSERT_EQ(sys_pidfd_send_signal(self->child_pidfd2,
SIGKILL, NULL, 0), 0);
for (i = 0; i < PIDFD_NS_MAX; i++) {
if (self->nsfds[i] >= 0)
close(self->nsfds[i]);
if (self->child_nsfds1[i] >= 0)
close(self->child_nsfds1[i]);
if (self->child_nsfds2[i] >= 0)
close(self->child_nsfds2[i]);
}
if (self->child_pidfd1 >= 0)
EXPECT_EQ(0, close(self->child_pidfd1));
if (self->child_pidfd2 >= 0)
EXPECT_EQ(0, close(self->child_pidfd2));
ASSERT_EQ(sys_waitid(P_PID, self->child_pid_exited, WEXITED), 0);
ASSERT_EQ(sys_waitid(P_PID, self->child_pid1, WEXITED), 0);
ASSERT_EQ(sys_waitid(P_PID, self->child_pid2, WEXITED), 0);
}
The testsuite output is this:
root@defiant:/home/marvin/linux/kernel/net-next/tools/testing/selftests/pidfd# ./pidfd_setns_test
TAP version 13
1..7
# Starting 7 tests from 2 test cases.
# RUN global.setns_einval ...
# OK global.setns_einval
ok 1 global.setns_einval
# RUN current_nsset.invalid_flags ...
# pidfd_setns_test.c:161:invalid_flags:Expected self->child_pid_exited (0) > 0 (0)
# OK current_nsset.invalid_flags
ok 2 current_nsset.invalid_flags
# RUN current_nsset.pidfd_exited_child ...
# pidfd_setns_test.c:161:pidfd_exited_child:Expected self->child_pid_exited (0) > 0 (0)
# OK current_nsset.pidfd_exited_child
ok 3 current_nsset.pidfd_exited_child
# RUN current_nsset.pidfd_incremental_setns ...
# pidfd_setns_test.c:161:pidfd_incremental_setns:Expected self->child_pid_exited (0) > 0 (0)
# pidfd_setns_test.c:408:pidfd_incremental_setns:Managed to correctly setns to user namespace of 1005687 via pidfd 18
# pidfd_setns_test.c:408:pidfd_incremental_setns:Managed to correctly setns to mnt namespace of 1005687 via pidfd 18
# pidfd_setns_test.c:408:pidfd_incremental_setns:Managed to correctly setns to pid namespace of 1005687 via pidfd 18
# pidfd_setns_test.c:408:pidfd_incremental_setns:Managed to correctly setns to uts namespace of 1005687 via pidfd 18
# pidfd_setns_test.c:408:pidfd_incremental_setns:Managed to correctly setns to ipc namespace of 1005687 via pidfd 18
# pidfd_setns_test.c:408:pidfd_incremental_setns:Managed to correctly setns to net namespace of 1005687 via pidfd 18
# pidfd_setns_test.c:408:pidfd_incremental_setns:Managed to correctly setns to cgroup namespace of 1005687 via pidfd 18
# pidfd_setns_test.c:408:pidfd_incremental_setns:Managed to correctly setns to pid_for_children namespace of 1005687 via pidfd 18
# pidfd_setns_test.c:391:pidfd_incremental_setns:Expected setns(self->child_pidfd1, info->flag) (-1) == 0 (0)
# pidfd_setns_test.c:392:pidfd_incremental_setns:Too many users - Failed to setns to time namespace of 1005687 via pidfd 18
# pidfd_incremental_setns: Test terminated by timeout
# FAIL current_nsset.pidfd_incremental_setns
not ok 4 current_nsset.pidfd_incremental_setns
# RUN current_nsset.nsfd_incremental_setns ...
# pidfd_setns_test.c:161:nsfd_incremental_setns:Expected self->child_pid_exited (0) > 0 (0)
# pidfd_setns_test.c:444:nsfd_incremental_setns:Managed to correctly setns to user namespace of 1005695 via nsfd 17
# pidfd_setns_test.c:444:nsfd_incremental_setns:Managed to correctly setns to mnt namespace of 1005695 via nsfd 22
# pidfd_setns_test.c:444:nsfd_incremental_setns:Managed to correctly setns to pid namespace of 1005695 via nsfd 25
# pidfd_setns_test.c:444:nsfd_incremental_setns:Managed to correctly setns to uts namespace of 1005695 via nsfd 28
# pidfd_setns_test.c:444:nsfd_incremental_setns:Managed to correctly setns to ipc namespace of 1005695 via nsfd 31
# pidfd_setns_test.c:444:nsfd_incremental_setns:Managed to correctly setns to net namespace of 1005695 via nsfd 34
# pidfd_setns_test.c:444:nsfd_incremental_setns:Managed to correctly setns to cgroup namespace of 1005695 via nsfd 37
# pidfd_setns_test.c:444:nsfd_incremental_setns:Managed to correctly setns to pid_for_children namespace of 1005695 via nsfd 40
# pidfd_setns_test.c:427:nsfd_incremental_setns:Expected setns(self->child_nsfds1[i], info->flag) (-1) == 0 (0)
# pidfd_setns_test.c:428:nsfd_incremental_setns:Too many users - Failed to setns to time namespace of 1005695 via nsfd 43
# nsfd_incremental_setns: Test terminated by timeout
# FAIL current_nsset.nsfd_incremental_setns
not ok 5 current_nsset.nsfd_incremental_setns
# RUN current_nsset.pidfd_one_shot_setns ...
# pidfd_setns_test.c:161:pidfd_one_shot_setns:Expected self->child_pid_exited (0) > 0 (0)
# pidfd_setns_test.c:462:pidfd_one_shot_setns:Adding user namespace of 1005710 to list of namespaces to attach to
# pidfd_setns_test.c:462:pidfd_one_shot_setns:Adding mnt namespace of 1005710 to list of namespaces to attach to
# pidfd_setns_test.c:462:pidfd_one_shot_setns:Adding pid namespace of 1005710 to list of namespaces to attach to
# pidfd_setns_test.c:462:pidfd_one_shot_setns:Adding uts namespace of 1005710 to list of namespaces to attach to
# pidfd_setns_test.c:462:pidfd_one_shot_setns:Adding ipc namespace of 1005710 to list of namespaces to attach to
# pidfd_setns_test.c:462:pidfd_one_shot_setns:Adding net namespace of 1005710 to list of namespaces to attach to
# pidfd_setns_test.c:462:pidfd_one_shot_setns:Adding cgroup namespace of 1005710 to list of namespaces to attach to
# pidfd_setns_test.c:462:pidfd_one_shot_setns:Adding pid_for_children namespace of 1005710 to list of namespaces to attach to
# pidfd_setns_test.c:462:pidfd_one_shot_setns:Adding time namespace of 1005710 to list of namespaces to attach to
# pidfd_setns_test.c:466:pidfd_one_shot_setns:Expected setns(self->child_pidfd1, flags) (-1) == 0 (0)
# pidfd_setns_test.c:467:pidfd_one_shot_setns:Too many users - Failed to setns to namespaces of 1005710
# pidfd_one_shot_setns: Test terminated by timeout
# FAIL current_nsset.pidfd_one_shot_setns
not ok 6 current_nsset.pidfd_one_shot_setns
# RUN current_nsset.no_foul_play ...
# pidfd_setns_test.c:161:no_foul_play:Expected self->child_pid_exited (0) > 0 (0)
# pidfd_setns_test.c:506:no_foul_play:Adding user namespace of 1005745 to list of namespaces to attach to
# pidfd_setns_test.c:506:no_foul_play:Adding mnt namespace of 1005745 to list of namespaces to attach to
# pidfd_setns_test.c:506:no_foul_play:Adding pid namespace of 1005745 to list of namespaces to attach to
# pidfd_setns_test.c:506:no_foul_play:Adding uts namespace of 1005745 to list of namespaces to attach to
# pidfd_setns_test.c:506:no_foul_play:Adding ipc namespace of 1005745 to list of namespaces to attach to
# pidfd_setns_test.c:506:no_foul_play:Adding net namespace of 1005745 to list of namespaces to attach to
# pidfd_setns_test.c:506:no_foul_play:Adding cgroup namespace of 1005745 to list of namespaces to attach to
# pidfd_setns_test.c:506:no_foul_play:Adding time namespace of 1005745 to list of namespaces to attach to
# pidfd_setns_test.c:510:no_foul_play:Expected setns(self->child_pidfd1, flags) (-1) == 0 (0)
# pidfd_setns_test.c:511:no_foul_play:Too many users - Failed to setns to namespaces of 1005745 vid pidfd 18
# no_foul_play: Test terminated by timeout
# FAIL current_nsset.no_foul_play
not ok 7 current_nsset.no_foul_play
# FAILED: 3 / 7 tests passed.
# Totals: pass:3 fail:4 xfail:0 xpass:0 skip:0 error:0
root@defiant:/home/marvin/linux/kernel/net-next/tools/testing/selftests/pidfd#
The main selftest thread is still hanging with total output of:
make[3]: Entering directory '/home/marvin/linux/kernel/net-next/tools/testing/selftests/pidfd'
TAP version 13
1..7
# timeout set to 45
# selftests: pidfd: pidfd_test
# TAP version 13
# 1..8
# # Parent: pid: 958028
# # Parent: Waiting for Child (958029) to complete.
# # Child (pidfd): starting. pid 958029 tid 958029
# # Child Thread: starting. pid 958029 tid 958030 ; and sleeping
# # Child Thread: doing exec of sleep
# # Time waited for child: 3
# ok 1 pidfd_poll check for premature notification on child thread exec test: Passed
# # Parent: pid: 958028
# # Parent: Waiting for Child (958031) to complete.
# # Child (pidfd): starting. pid 958031 tid 958031
# # Child Thread: starting. pid 958031 tid 958032 ; and sleeping
# # Child Thread: doing exec of sleep
# # Parent: Child process waited for.
# # Time waited for child: 3
# ok 2 pidfd_poll check for premature notification on child thread exec test: Passed
# # Parent: pid: 958028
# # Parent: Waiting for Child (958033) to complete.
# # Child: starting. pid 958033 tid 958033
# # Child Thread: starting. pid 958033 tid 958034 ; and sleeping
# # Child Thread: starting. pid 958033 tid 958035 ; and sleeping
# # # Child Thread: DONE. pid 958033 tid 958034
# Child Thread: DONE. pid 958033 tid 958035
# # Time since child exit: 3
# ok 3 pidfd_poll check for premature notification on non-emptygroup leader exit test: Passed
# # Parent: pid: 958028
# # Parent: Waiting for Child (958036) to complete.
# # Child: starting. pid 958036 tid 958036
# # Child Thread: starting. pid 958036 tid 958037 ; and sleeping
# # Child Thread: starting. pid 958036 tid 958038 ; and sleeping
# # Child Thread: DONE. pid 958036 tid 958037
# # Child Thread: DONE. pid 958036 tid 958038
# # Parent: Child process waited for.
# # Time since child exit: 3
# ok 4 pidfd_poll check for premature notification on non-emptygroup leader exit test: Passed
# ok 5 pidfd_send_signal check for support test: pidfd_send_signal() syscall is supported. Tests can be executed
# ok 6 pidfd_send_signal send SIGUSR1 test: Sent signal
# # waitpid WEXITSTATUS=0
# ok 7 pidfd_send_signal signal exited process test: Failed to send signal as expected
# # pid to recycle is 1000
# ok 8 # SKIP pidfd_send_signal signal recycled pid test: Skipping test
# # Totals: pass:7 fail:0 xfail:0 xpass:0 skip:1 error:0
ok 1 selftests: pidfd: pidfd_test
# timeout set to 45
# selftests: pidfd: pidfd_fdinfo_test
# TAP version 13
# 1..2
# # New child: 990827, fd: 5
# # New child: 990828, fd: 6
# # waitpid WEXITSTATUS=0
# # waitpid WEXITSTATUS=0
# ok 1 pidfd check for NSpid in fdinfo test: Passed
# # New child: 990830, fd: 5
# # waitpid WEXITSTATUS=0
# ok 2 pidfd check fdinfo for dead process test: Passed
# # Totals: pass:2 fail:0 xfail:0 xpass:0 skip:0 error:0
ok 2 selftests: pidfd: pidfd_fdinfo_test
# timeout set to 45
# selftests: pidfd: pidfd_open_test
# 1..3
# ok 1 do not allow invalid pid test: passed
# ok 2 do not allow invalid flag test: passed
# ok 3 open a new pidfd test: passed
# # pidfd 5 refers to process with pid 990848
# # Totals: pass:3 fail:0 xfail:0 xpass:0 skip:0 error:0
ok 3 selftests: pidfd: pidfd_open_test
# timeout set to 45
# selftests: pidfd: pidfd_poll_test
# # running pidfd poll test for 10000 iterations
# ok 1 pidfd poll test: pass
# # Planned tests != run tests (0 != 1)
# # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
ok 4 selftests: pidfd: pidfd_poll_test
# timeout set to 45
# selftests: pidfd: pidfd_wait
# TAP version 13
# 1..3
# # Starting 3 tests from 1 test cases.
# # RUN global.wait_simple ...
# # OK global.wait_simple
# ok 1 global.wait_simple
# # RUN global.wait_states ...
# # OK global.wait_states
# ok 2 global.wait_states
# # RUN global.wait_nonblock ...
# # OK global.wait_nonblock
# ok 3 global.wait_nonblock
# # PASSED: 3 / 3 tests passed.
# # Totals: pass:3 fail:0 xfail:0 xpass:0 skip:0 error:0
ok 5 selftests: pidfd: pidfd_wait
# timeout set to 45
# selftests: pidfd: pidfd_getfd_test
# TAP version 13
# 1..4
# # Starting 4 tests from 2 test cases.
# # RUN global.flags_set ...
# # OK global.flags_set
# ok 1 global.flags_set
# # RUN child.disable_ptrace ...
# # OK child.disable_ptrace
# ok 2 child.disable_ptrace
# # RUN child.fetch_fd ...
# # OK child.fetch_fd
# ok 3 child.fetch_fd
# # RUN child.test_unknown_fd ...
# # OK child.test_unknown_fd
# ok 4 child.test_unknown_fd
# # PASSED: 4 / 4 tests passed.
# # Totals: pass:4 fail:0 xfail:0 xpass:0 skip:0 error:0
ok 6 selftests: pidfd: pidfd_getfd_test
# timeout set to 45
# selftests: pidfd: pidfd_setns_test
# TAP version 13
# 1..7
# # Starting 7 tests from 2 test cases.
# # RUN global.setns_einval ...
# # OK global.setns_einval
# ok 1 global.setns_einval
# # RUN current_nsset.invalid_flags ...
# # pidfd_setns_test.c:161:invalid_flags:Expected self->child_pid_exited (0) > 0 (0)
# # OK current_nsset.invalid_flags
# ok 2 current_nsset.invalid_flags
# # RUN current_nsset.pidfd_exited_child ...
# # pidfd_setns_test.c:161:pidfd_exited_child:Expected self->child_pid_exited (0) > 0 (0)
# # OK current_nsset.pidfd_exited_child
# ok 3 current_nsset.pidfd_exited_child
# # RUN current_nsset.pidfd_incremental_setns ...
# # pidfd_setns_test.c:161:pidfd_incremental_setns:Expected self->child_pid_exited (0) > 0 (0)
# # pidfd_setns_test.c:408:pidfd_incremental_setns:Managed to correctly setns to user namespace of 1000951 via pidfd 20
# # pidfd_setns_test.c:408:pidfd_incremental_setns:Managed to correctly setns to mnt namespace of 1000951 via pidfd 20
# # pidfd_setns_test.c:408:pidfd_incremental_setns:Managed to correctly setns to pid namespace of 1000951 via pidfd 20
# # pidfd_setns_test.c:408:pidfd_incremental_setns:Managed to correctly setns to uts namespace of 1000951 via pidfd 20
# # pidfd_setns_test.c:408:pidfd_incremental_setns:Managed to correctly setns to ipc namespace of 1000951 via pidfd 20
# # pidfd_setns_test.c:408:pidfd_incremental_setns:Managed to correctly setns to net namespace of 1000951 via pidfd 20
# # pidfd_setns_test.c:408:pidfd_incremental_setns:Managed to correctly setns to cgroup namespace of 1000951 via pidfd 20
# # pidfd_setns_test.c:408:pidfd_incremental_setns:Managed to correctly setns to pid_for_children namespace of 1000951 via pidfd 20
# # pidfd_setns_test.c:391:pidfd_incremental_setns:Expected setns(self->child_pidfd1, info->flag) (-1) == 0 (0)
# # pidfd_setns_test.c:392:pidfd_incremental_setns:Too many users - Failed to setns to time namespace of 1000951 via pidfd 20
# # pidfd_incremental_setns: Test terminated by timeout
# # FAIL current_nsset.pidfd_incremental_setns
# not ok 4 current_nsset.pidfd_incremental_setns
# # RUN current_nsset.nsfd_incremental_setns ...
# # pidfd_setns_test.c:161:nsfd_incremental_setns:Expected self->child_pid_exited (0) > 0 (0)
# # pidfd_setns_test.c:444:nsfd_incremental_setns:Managed to correctly setns to user namespace of 1000958 via nsfd 19
# # pidfd_setns_test.c:444:nsfd_incremental_setns:Managed to correctly setns to mnt namespace of 1000958 via nsfd 24
# # pidfd_setns_test.c:444:nsfd_incremental_setns:Managed to correctly setns to pid namespace of 1000958 via nsfd 27
# # pidfd_setns_test.c:444:nsfd_incremental_setns:Managed to correctly setns to uts namespace of 1000958 via nsfd 30
# # pidfd_setns_test.c:444:nsfd_incremental_setns:Managed to correctly setns to ipc namespace of 1000958 via nsfd 33
# # pidfd_setns_test.c:444:nsfd_incremental_setns:Managed to correctly setns to net namespace of 1000958 via nsfd 36
# # pidfd_setns_test.c:444:nsfd_incremental_setns:Managed to correctly setns to cgroup namespace of 1000958 via nsfd 39
# # pidfd_setns_test.c:444:nsfd_incremental_setns:Managed to correctly setns to pid_for_children namespace of 1000958 via nsfd 42
# # pidfd_setns_test.c:427:nsfd_incremental_setns:Expected setns(self->child_nsfds1[i], info->flag) (-1) == 0 (0)
# # pidfd_setns_test.c:428:nsfd_incremental_setns:Too many users - Failed to setns to time namespace of 1000958 via nsfd 45
[HANG]
# ps -efwww
root 958005 66430 0 01:04 pts/2 00:00:00 make OUTPUT=/home/marvin/linux/kernel/net-next/tools/testing/selftests/pidfd -C pidfd run_tests SRC_PATH=/home/marvin/linux/kernel/net-next/tools/testing/selftests OBJ_PATH=/home/marvin/linux/kernel/net-next/tools/testing/selftests O=/home/marvin/linux/kernel/net-next
root 958006 958005 0 01:04 pts/2 00:00:00 /bin/sh -c BASE_DIR="/home/marvin/linux/kernel/net-next/tools/testing/selftests"; . /home/marvin/linux/kernel/net-next/tools/testing/selftests/kselftest/runner.sh; if [ "X" != "X" ]; then per_test_logging=1; fi; run_many /home/marvin/linux/kernel/net-next/tools/testing/selftests/pidfd/pidfd_test /home/marvin/linux/kernel/net-next/tools/testing/selftests/pidfd/pidfd_fdinfo_test /home/marvin/linux/kernel/net-next/tools/testing/selftests/pidfd/pidfd_open_test /home/marvin/linux/kernel/net-next/tools/testing/selftests/pidfd/pidfd_poll_test /home/marvin/linux/kernel/net-next/tools/testing/selftests/pidfd/pidfd_wait /home/marvin/linux/kernel/net-next/tools/testing/selftests/pidfd/pidfd_getfd_test /home/marvin/linux/kernel/net-next/tools/testing/selftests/pidfd/pidfd_setns_test
root 1000927 958006 0 01:05 pts/2 00:00:00 /bin/sh -c BASE_DIR="/home/marvin/linux/kernel/net-next/tools/testing/selftests"; . /home/marvin/linux/kernel/net-next/tools/testing/selftests/kselftest/runner.sh; if [ "X" != "X" ]; then per_test_logging=1; fi; run_many /home/marvin/linux/kernel/net-next/tools/testing/selftests/pidfd/pidfd_test /home/marvin/linux/kernel/net-next/tools/testing/selftests/pidfd/pidfd_fdinfo_test /home/marvin/linux/kernel/net-next/tools/testing/selftests/pidfd/pidfd_open_test /home/marvin/linux/kernel/net-next/tools/testing/selftests/pidfd/pidfd_poll_test /home/marvin/linux/kernel/net-next/tools/testing/selftests/pidfd/pidfd_wait /home/marvin/linux/kernel/net-next/tools/testing/selftests/pidfd/pidfd_getfd_test /home/marvin/linux/kernel/net-next/tools/testing/selftests/pidfd/pidfd_setns_test
root 1000928 1000927 0 01:05 pts/2 00:00:00 /bin/sh -c BASE_DIR="/home/marvin/linux/kernel/net-next/tools/testing/selftests"; . /home/marvin/linux/kernel/net-next/tools/testing/selftests/kselftest/runner.sh; if [ "X" != "X" ]; then per_test_logging=1; fi; run_many /home/marvin/linux/kernel/net-next/tools/testing/selftests/pidfd/pidfd_test /home/marvin/linux/kernel/net-next/tools/testing/selftests/pidfd/pidfd_fdinfo_test /home/marvin/linux/kernel/net-next/tools/testing/selftests/pidfd/pidfd_open_test /home/marvin/linux/kernel/net-next/tools/testing/selftests/pidfd/pidfd_poll_test /home/marvin/linux/kernel/net-next/tools/testing/selftests/pidfd/pidfd_wait /home/marvin/linux/kernel/net-next/tools/testing/selftests/pidfd/pidfd_getfd_test /home/marvin/linux/kernel/net-next/tools/testing/selftests/pidfd/pidfd_setns_test
root 1000929 1000928 0 01:05 pts/2 00:00:00 /bin/sh -c BASE_DIR="/home/marvin/linux/kernel/net-next/tools/testing/selftests"; . /home/marvin/linux/kernel/net-next/tools/testing/selftests/kselftest/runner.sh; if [ "X" != "X" ]; then per_test_logging=1; fi; run_many /home/marvin/linux/kernel/net-next/tools/testing/selftests/pidfd/pidfd_test /home/marvin/linux/kernel/net-next/tools/testing/selftests/pidfd/pidfd_fdinfo_test /home/marvin/linux/kernel/net-next/tools/testing/selftests/pidfd/pidfd_open_test /home/marvin/linux/kernel/net-next/tools/testing/selftests/pidfd/pidfd_poll_test /home/marvin/linux/kernel/net-next/tools/testing/selftests/pidfd/pidfd_wait /home/marvin/linux/kernel/net-next/tools/testing/selftests/pidfd/pidfd_getfd_test /home/marvin/linux/kernel/net-next/tools/testing/selftests/pidfd/pidfd_setns_test
root 1000932 1000929 0 01:05 pts/2 00:00:00 /bin/sh -c BASE_DIR="/home/marvin/linux/kernel/net-next/tools/testing/selftests"; . /home/marvin/linux/kernel/net-next/tools/testing/selftests/kselftest/runner.sh; if [ "X" != "X" ]; then per_test_logging=1; fi; run_many /home/marvin/linux/kernel/net-next/tools/testing/selftests/pidfd/pidfd_test /home/marvin/linux/kernel/net-next/tools/testing/selftests/pidfd/pidfd_fdinfo_test /home/marvin/linux/kernel/net-next/tools/testing/selftests/pidfd/pidfd_open_test /home/marvin/linux/kernel/net-next/tools/testing/selftests/pidfd/pidfd_poll_test /home/marvin/linux/kernel/net-next/tools/testing/selftests/pidfd/pidfd_wait /home/marvin/linux/kernel/net-next/tools/testing/selftests/pidfd/pidfd_getfd_test /home/marvin/linux/kernel/net-next/tools/testing/selftests/pidfd/pidfd_setns_test
root 1000934 1000932 0 01:05 pts/2 00:00:00 perl /home/marvin/linux/kernel/net-next/tools/testing/selftests/kselftest/prefix.pl
root 1000955 2931 0 01:05 pts/2 00:00:00 ./pidfd_setns_test
root 1000956 1000955 99 01:05 pts/2 13:20:17 ./pidfd_setns_test
root 1000957 1000956 0 01:05 pts/2 00:00:00 [pidfd_setns_tes] <defunct>
root 1000958 1000956 0 01:05 pts/2 00:00:00 [pidfd_setns_tes] <defunct>
root 1000959 1000956 0 01:05 pts/2 00:00:00 ./pidfd_setns_test
in a 99%CPU forever loop:
pidfd_send_signal(20, SIGKILL, NULL, 0) = 0
pidfd_send_signal(20, SIGKILL, NULL, 0) = 0
pidfd_send_signal(20, SIGKILL, NULL, 0) = 0
pidfd_send_signal(20, SIGKILL, NULL, 0) = 0
pidfd_send_signal(20, SIGKILL, NULL, 0) = 0
pidfd_send_signal(20, SIGKILL, NULL, 0) = 0
pidfd_send_signal(20, SIGKILL, NULL, 0) = 0
pidfd_send_signal(20, SIGKILL, NULL, 0) = 0
pidfd_send_signal(20, SIGKILL, NULL, 0) = 0
pidfd_send_signal(20, SIGKILL, NULL, 0) = 0
pidfd_send_signal(20, SIGKILL, NULL, 0) = 0
pidfd_send_signal(20, SIGKILL, NULL, 0) = 0
pidfd_send_signal(20, SIGKILL, NULL, 0) = 0
pidfd_send_signal(20, SIGKILL, NULL, 0) = 0
pidfd_send_signal(20, SIGKILL, NULL, 0) = 0
pidfd_send_signal(20, SIGKILL, NULL, 0) = 0
pidfd_send_signal(20, SIGKILL, NULL, 0) = 0
pidfd_send_signal(20, SIGKILL, NULL, 0) = 0
pidfd_send_signal(20, SIGKILL, NULL, 0) = 0
pidfd_send_signal(20, SIGKILL, NULL, 0) = 0
pidfd_send_signal(20, SIGKILL, NULL, 0) = 0
pidfd_send_signal(20, SIGKILL, NULL, 0) = 0
pidfd_send_signal(20, SIGKILL, NULL, 0) = 0
pidfd_send_signal(20, SIGKILL, NULL, 0) = 0
pidfd_send_signal(20, SIGKILL, NULL, 0) = 0
pidfd_send_signal(20, SIGKILL, NULL, 0) = 0
pidfd_send_signal(20, SIGKILL, NULL, 0) = 0
pidfd_send_signal(20, SIGKILL, NULL, 0) = 0
pidfd_send_signal(20, SIGKILL, NULL, 0) = 0
pidfd_send_signal(20, SIGKILL, NULL, 0) = 0
pidfd_send_signal(20, SIGKILL, NULL, 0) = 0
pidfd_send_signal(20, SIGKILL, NULL, 0) = 0
pidfd_send_signal(20, SIGKILL, NULL, 0) = 0
pidfd_send_signal(20, SIGKILL, NULL, 0) = 0
pidfd_send_signal(20, SIGKILL, NULL, 0) = 0
pidfd_send_signal(20, SIGKILL, NULL, 0) = 0
pidfd_send_signal(20, SIGKILL, NULL, 0) = 0
pidfd_send_signal(20, SIGKILL, NULL, 0) = 0
pidfd_send_signal(20, SIGKILL, NULL, 0) = 0
pidfd_send_signal(20, SIGKILL, NULL, 0) = 0
pidfd_send_signal(20, SIGKILL, NULL, 0) = 0
pidfd_send_signal(20, SIGKILL, NULL, 0) = 0
pidfd_send_signal(20, SIGKILL, NULL, 0) = 0
That's about all there is. Nothing interesting in /var/log/syslog or dmesg.
Hope this helps.
Best regards,
Mirsad Todorovac
Arch maintainers, please ack/review patches.
This is a resend of a series from Frank last year[1]. I worked in Rob's
review comments to unconditionally call unflatten_device_tree() and
fixup/audit calls to of_have_populated_dt() so that behavior doesn't
change.
I need this series so I can add DT based tests in the clk framework.
Either I can merge it through the clk tree once everyone is happy, or
Rob can merge it through the DT tree and provide some branch so I can
base clk patches on it.
Changes from v3 (https://lore.kernel.org/r/20240202195909.3458162-1-sboyd@kernel.org):
* Made OF_UNITTEST depend on OF_EARLY_FLATREE
* Made OF_EARLY_FLATREE depend on absence of arches that don't call
unflatten_device_tree()
* Added of_ prefix to dtb_ prefixed KUnit tests
* Picked up tags
Changes from v2 (https://lore.kernel.org/r/20240130004508.1700335-1-sboyd@kernel.org):
* Reorder patches to have OF changes largely first
* No longer modify initial_boot_params if ACPI=y
* Put arm64 patch back to v1
Changes from v1 (https://lore.kernel.org/r/20240112200750.4062441-1-sboyd@kernel.org):
* x86 patch included
* arm64 knocks out initial dtb if acpi is in use
* keep Kconfig hidden but def_bool enabled otherwise
Changes from Frank's series[1]:
* Add a DTB loaded kunit test
* Make of_have_populated_dt() return false if the DTB isn't from the
bootloader
* Architecture calls made unconditional so that a root node is always
made
Frank Rowand (2):
of: Create of_root if no dtb provided by firmware
of: unittest: treat missing of_root as error instead of fixing up
Stephen Boyd (5):
of: Always unflatten in unflatten_and_copy_device_tree()
um: Unconditionally call unflatten_device_tree()
x86/of: Unconditionally call unflatten_and_copy_device_tree()
arm64: Unconditionally call unflatten_device_tree()
of: Add KUnit test to confirm DTB is loaded
arch/arm64/kernel/setup.c | 3 +-
arch/um/kernel/dtb.c | 14 ++++----
arch/x86/kernel/devicetree.c | 24 +++++++-------
drivers/of/.kunitconfig | 3 ++
drivers/of/Kconfig | 14 ++++++--
drivers/of/Makefile | 4 ++-
drivers/of/empty_root.dts | 6 ++++
drivers/of/fdt.c | 64 +++++++++++++++++++++++++++---------
drivers/of/of_test.c | 57 ++++++++++++++++++++++++++++++++
drivers/of/platform.c | 3 --
drivers/of/unittest.c | 16 +++------
include/linux/of.h | 25 ++++++++------
12 files changed, 168 insertions(+), 65 deletions(-)
create mode 100644 drivers/of/.kunitconfig
create mode 100644 drivers/of/empty_root.dts
create mode 100644 drivers/of/of_test.c
[1] https://lore.kernel.org/r/20230317053415.2254616-1-frowand.list@gmail.com
base-commit: 6613476e225e090cc9aad49be7fa504e290dd33d
--
https://git.kernel.org/pub/scm/linux/kernel/git/clk/linux.git/https://git.kernel.org/pub/scm/linux/kernel/git/sboyd/spmi.git
Hi all,
This series does a number of cleanups into resctrl_val() and
generalizes it by removing test name specific handling from the
function.
One of the changes improves MBA/MBM measurement by narrowing down the
period the resctrl FS derived memory bandwidth numbers are measured
over. My feel is it didn't cause noticeable difference into the numbers
because they're generally good anyway except for the small number of
outliers. To see the impact on outliers, I'd need to setup a test to
run large number of replications and do a statistical analysis, which
I've not spent my time on. Even without the statistical analysis, the
new way to measure seems obviously better and makes sense even if I
cannot see a major improvement with the setup I'm using.
This series has some conflicts with SNC series from Maciej and also
with the MBA/MBM series from Babu.
--
i.
Ilpo Järvinen (13):
selftests/resctrl: Convert get_mem_bw_imc() fd close to for loop
selftests/resctrl: Calculate resctrl FS derived mem bw over sleep(1)
only
selftests/resctrl: Consolidate get_domain_id() into resctrl_val()
selftests/resctrl: Use correct type for pids
selftests/resctrl: Cleanup bm_pid and ppid usage & limit scope
selftests/resctrl: Rename measure_vals() to measure_mem_bw_vals() &
document
selftests/resctrl: Add ->measure() callback to resctrl_val_param
selftests/resctrl: Add ->init() callback into resctrl_val_param
selftests/resctrl: Simplify bandwidth report type handling
selftests/resctrl: Make some strings passed to resctrlfs functions
const
selftests/resctrl: Convert ctrlgrp & mongrp to pointers
selftests/resctrl: Remove mongrp from MBA test
selftests/resctrl: Remove test name comparing from
write_bm_pid_to_resctrl()
tools/testing/selftests/resctrl/cache.c | 6 +-
tools/testing/selftests/resctrl/cat_test.c | 5 +-
tools/testing/selftests/resctrl/cmt_test.c | 21 +-
tools/testing/selftests/resctrl/mba_test.c | 34 ++-
tools/testing/selftests/resctrl/mbm_test.c | 33 ++-
tools/testing/selftests/resctrl/resctrl.h | 48 ++--
tools/testing/selftests/resctrl/resctrl_val.c | 269 ++++++------------
tools/testing/selftests/resctrl/resctrlfs.c | 55 ++--
8 files changed, 224 insertions(+), 247 deletions(-)
--
2.39.2
This is the 4th version of the patch set. In this patchset we aim to add
pstore multi-backend support then user can register more than one pstore
backend.
Changes in v4:
- Replace all rcu_read_lock with mutex
- Move bif_oops_buf, max_compressed_size and pstore_dumper into
pstore_info_list
- add a helper to do "is this name in the list" and a helper to do
"is this backend loaded"
- make comments in pstore_(un)register clearer
- return the max_seen ret or the first negative err in write_pmsg()
- add a /sys/module entry for the list of backends, comma separated
- Link to v3: https://lore.kernel.org/all/20230928024244.257687-1-xiangzao@linux.alibaba.…
Changes in v3:
- Fix ftrace.c build error
- Link to v2: https://lore.kernel.org/all/20240205122852.7069-1-xiangzao@linux.alibaba.co…
Changes in v2:
- pstore.backend no longer acts as "registered backend", but
"backends eligible for registration".
- drop subdir since it will break user space
- drop tty frontend since I haven't yet devised a satisfactory
implementation strategy
- Link to v1: https://lore.kernel.org/all/20230928024244.257687-1-xiangzao@linux.alibaba.…
Yuanhe Shu (4):
pstore: add multi-backend support
pstore: add a /sys/module entry for loaded backends
Documentation: adjust pstore backend related document
tools/testing: adjust pstore backend related selftest
Documentation/ABI/testing/pstore | 8 +-
.../admin-guide/kernel-parameters.txt | 4 +-
fs/pstore/ftrace.c | 27 +-
fs/pstore/inode.c | 57 +++-
fs/pstore/internal.h | 5 +-
fs/pstore/platform.c | 274 ++++++++++++------
fs/pstore/pmsg.c | 27 +-
include/linux/pstore.h | 23 ++
tools/testing/selftests/pstore/common_tests | 8 +-
.../selftests/pstore/pstore_post_reboot_tests | 67 +++--
tools/testing/selftests/pstore/pstore_tests | 2 +-
11 files changed, 358 insertions(+), 144 deletions(-)
--
2.39.3
On Thu, Mar 07 2024 at 13:34, Edward Liaw wrote:
>> Thanks for picking those up and moving them forward. Any particular
>> reason why you didn't pick up the full set?
>
> I didn't know enough about the code to resolve some of the merges in the
> full set. I had run into the issue with the test timer_distribution test
> hanging on the Android kernel and wanted to get that fixed first.
Fair enough. I've marked your series for my post merge window tree and
I'll have a look at the rest of the pile.
Thanks,
tglx
I'm sending some patches that were orignally in
https://lore.kernel.org/lkml/20230606132949.068951363@linutronix.de/
to prevent the timer_distribution test from hanging and also fix some
format inconsistencies.
Edward Liaw (3):
selftests/timers/posix_timers: Make signal distribution test less
fragile
selftests/timers/posix_timers: Use TAP reporting format
selftests/timers/posix_timers: Use llabs for long long
tools/testing/selftests/timers/posix_timers.c | 196 ++++++++----------
1 file changed, 89 insertions(+), 107 deletions(-)
--
2.44.0.rc1.240.g4c46232300-goog
This series enables support for the data processing extensions in the
newly released 2023 architecture, this is mainly support for 8 bit
floating point formats. Most of the extensions only introduce new
instructions and therefore only require hwcaps but there is a new EL0
visible control register FPMR used to control the 8 bit floating point
formats, we need to manage traps for this and context switch it.
Due to the very recently merged KVM changes for configuring guest
features via ID register writes only being available in -next the
support for guest state has been dropped for this version, the relevant
KVM interfaces should all be there after the merge window so the code
will be refreshed for the new interfaces then.
I've not added test coverage for ptrace, my plan is to add support to
fp-ptrace (which is now merged so I'll update after the merge window).
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v5:
- Rebase onto v6.8-rc3.
- Use u64 rather than unsigned long for storing FPMR.
- Temporarily drop KVM guest support due to issues with KVM being a
moving target.
- Link to v4: https://lore.kernel.org/r/20240122-arm64-2023-dpisa-v4-0-776e094861df@kerne…
Changes in v4:
- Rebase onto v6.8-rc1.
- Move KVM support to the end of the series.
- Link to v3: https://lore.kernel.org/r/20231205-arm64-2023-dpisa-v3-0-dbcbcd867a7f@kerne…
Changes in v3:
- Rebase onto v6.7-rc3.
- Hook up traps for FPMR in emulate-nested.c.
- Link to v2: https://lore.kernel.org/r/20231114-arm64-2023-dpisa-v2-0-47251894f6a8@kerne…
Changes in v2:
- Rebase onto v6.7-rc1.
- Link to v1: https://lore.kernel.org/r/20231026-arm64-2023-dpisa-v1-0-8470dd989bb2@kerne…
---
Mark Brown (9):
arm64/cpufeature: Hook new identification registers up to cpufeature
arm64/fpsimd: Enable host kernel access to FPMR
arm64/fpsimd: Support FEAT_FPMR
arm64/signal: Add FPMR signal handling
arm64/ptrace: Expose FPMR via ptrace
arm64/hwcap: Define hwcaps for 2023 DPISA features
kselftest/arm64: Handle FPMR context in generic signal frame parser
kselftest/arm64: Add basic FPMR test
kselftest/arm64: Add 2023 DPISA hwcap test coverage
Documentation/arch/arm64/elf_hwcaps.rst | 49 +++++
arch/arm64/include/asm/cpu.h | 3 +
arch/arm64/include/asm/cpufeature.h | 5 +
arch/arm64/include/asm/fpsimd.h | 2 +
arch/arm64/include/asm/hwcap.h | 15 ++
arch/arm64/include/asm/kvm_arm.h | 2 +-
arch/arm64/include/asm/kvm_host.h | 1 +
arch/arm64/include/asm/processor.h | 4 +
arch/arm64/include/uapi/asm/hwcap.h | 15 ++
arch/arm64/include/uapi/asm/sigcontext.h | 8 +
arch/arm64/kernel/cpufeature.c | 72 +++++++
arch/arm64/kernel/cpuinfo.c | 18 ++
arch/arm64/kernel/fpsimd.c | 13 ++
arch/arm64/kernel/ptrace.c | 42 ++++
arch/arm64/kernel/signal.c | 59 ++++++
arch/arm64/kvm/fpsimd.c | 1 +
arch/arm64/tools/cpucaps | 1 +
include/uapi/linux/elf.h | 1 +
tools/testing/selftests/arm64/abi/hwcap.c | 217 +++++++++++++++++++++
tools/testing/selftests/arm64/signal/.gitignore | 1 +
.../arm64/signal/testcases/fpmr_siginfo.c | 82 ++++++++
.../selftests/arm64/signal/testcases/testcases.c | 8 +
.../selftests/arm64/signal/testcases/testcases.h | 1 +
23 files changed, 619 insertions(+), 1 deletion(-)
---
base-commit: 54be6c6c5ae8e0d93a6c4641cb7528eb0b6ba478
change-id: 20231003-arm64-2023-dpisa-2f3d25746474
Best regards,
--
Mark Brown <broonie(a)kernel.org>
From: Zi Yan <ziy(a)nvidia.com>
Hi all,
File folio supports any order and multi-size THP is upstreamed[1], so both
file and anonymous folios can be >0 order. Currently, split_huge_page()
only splits a huge page to order-0 pages, but splitting to orders higher than
0 might better utilize large folios, if done properly. In addition,
Large Block Sizes in XFS support would benefit from it during truncate[2].
This patchset adds support for splitting a large folio to any lower order
folios. The patchset is on top of mm-everything-2024-02-24-02-40.
In addition to this implementation of split_huge_page_to_list_to_order(),
a possible optimization could be splitting a large folio to arbitrary
smaller folios instead of a single order. As both Hugh and Ryan pointed
out [3,5] that split to a single order might not be optimal, an order-9 folio
might be better split into 1 order-8, 1 order-7, ..., 1 order-1, and 2 order-0
folios, depending on subsequent folio operations. Leave this as future work.
Changelog
===
Since v4[4]
1. Picked up Matthew's order-1 folio support in the page cache patch, so
that XFS Large Block Sizes patchset can avoid additional code churn in
split_huge_page_to_list_to_order().
2. Dropped truncate change patch and corresponding testing code.
3. Removed thp_nr_pages() use in __split_huge_page()
(per David Hildenbrand).
4. Fixed __split_page_owner() (per David Hildenbrand).
5. Changed unmap_folio() to only add TTU_SPLIT_HUGE_PMD if the folios is
pmd mappable (per Ryan Roberts).
6. Moved swapcached folio split warning upfront and return -EINVAL
(per Ryan Roberts).
Since v3
---
1. Excluded shmem folios and pagecache folios without FS support from
splitting to any order (per Hugh Dickins).
2. Allowed splitting anonymous large folio to any lower order since
multi-size THP is upstreamed.
3. Adapted selftests code to new framework.
Since v2
---
1. Fixed an issue in __split_page_owner() introduced during my rebase
Since v1
---
1. Changed split_page_memcg() and split_page_owner() parameter to use order
2. Used folio_test_pmd_mappable() in place of the equivalent code
Details
===
* Patch 1 changes unmap_folio() to only add TTU_SPLIT_HUGE_PMD if the
folio is pmd mappable.
* Patch 2 adds support for order-1 page cache folio.
* Patch 3 changes split_page_memcg() to use order instead of nr_pages.
* Patch 4 changes split_page_owner() to use order instead of nr_pages.
* Patch 5 and 6 add new_order parameter split_page_memcg() and
split_page_owner() and prepare for upcoming changes.
* Patch 7 adds split_huge_page_to_list_to_order() to split a huge page
to any lower order. The original split_huge_page_to_list() calls
split_huge_page_to_list_to_order() with new_order = 0.
* Patch 8 adds a test API to debugfs and test cases in
split_huge_page_test selftests.
Comments and/or suggestions are welcome.
[1] https://lore.kernel.org/all/20231207161211.2374093-1-ryan.roberts@arm.com/
[2] https://lore.kernel.org/linux-mm/20240226094936.2677493-1-kernel@pankajragh…
[3] https://lore.kernel.org/linux-mm/9dd96da-efa2-5123-20d4-4992136ef3ad@google…
[4] https://lore.kernel.org/linux-mm/cbb1d6a0-66dd-47d0-8733-f836fe050374@arm.c…
[5] https://lore.kernel.org/linux-mm/20240213215520.1048625-1-zi.yan@sent.com/
Matthew Wilcox (Oracle) (1):
mm: Support order-1 folios in the page cache
Zi Yan (7):
mm/huge_memory: only split PMD mapping when necessary in unmap_folio()
mm/memcg: use order instead of nr in split_page_memcg()
mm/page_owner: use order instead of nr in split_page_owner()
mm: memcg: make memcg huge page split support any order split.
mm: page_owner: add support for splitting to any order in split
page_owner.
mm: thp: split huge page to any lower order pages
mm: huge_memory: enable debugfs to split huge pages to any order.
include/linux/huge_mm.h | 21 ++-
include/linux/memcontrol.h | 4 +-
include/linux/page_owner.h | 14 +-
mm/filemap.c | 2 -
mm/huge_memory.c | 173 +++++++++++++-----
mm/internal.h | 3 +-
mm/memcontrol.c | 10 +-
mm/page_alloc.c | 8 +-
mm/page_owner.c | 6 +-
mm/readahead.c | 3 -
.../selftests/mm/split_huge_page_test.c | 115 +++++++++++-
11 files changed, 276 insertions(+), 83 deletions(-)
--
2.43.0
v3: Rebase on the next branch of linux-kselftest.git,
modify the patch title and update the commit message
v2: Rebase on 6.5-rc1 and update the commit message
Tiezhu Yang (2):
selftests/vDSO: Fix building errors on LoongArch
selftests/vDSO: Fix runtime errors on LoongArch
tools/testing/selftests/vDSO/vdso_config.h | 6 ++++-
.../testing/selftests/vDSO/vdso_test_getcpu.c | 16 +++++-------
.../selftests/vDSO/vdso_test_gettimeofday.c | 26 +++++--------------
3 files changed, 18 insertions(+), 30 deletions(-)
--
2.42.0
Add ability to parse multiple files. Additionally add the
ability to parse all results in the KUnit debugfs repository.
How to parse multiple files:
./tools/testing/kunit/kunit.py parse results.log results2.log
How to parse all files in directory:
./tools/testing/kunit/kunit.py parse directory_path/*
How to parse KUnit debugfs repository:
./tools/testing/kunit/kunit.py parse debugfs
For each file, the parser outputs the file name, results, and test
summary. At the end of all parsing, the parser outputs a total summary
line.
This feature can be easily tested on the tools/testing/kunit/test_data/
directory.
Signed-off-by: Rae Moar <rmoar(a)google.com>
---
Changes since v1:
- Annotate type of parsed_files
- Add ability to input file name from stdin again
- Make for loops a bit terser
- Add no output warning
- Change feature to take in multiple fields rather than a directory.
Currently nonrecursive. Let me know if people would prefer this as
recursive.
tools/testing/kunit/kunit.py | 45 +++++++++++++++++++++++++-----------
1 file changed, 32 insertions(+), 13 deletions(-)
diff --git a/tools/testing/kunit/kunit.py b/tools/testing/kunit/kunit.py
index bc74088c458a..df804a118aa5 100755
--- a/tools/testing/kunit/kunit.py
+++ b/tools/testing/kunit/kunit.py
@@ -511,19 +511,37 @@ def exec_handler(cli_args: argparse.Namespace) -> None:
def parse_handler(cli_args: argparse.Namespace) -> None:
- if cli_args.file is None:
+ parsed_files = [] # type: List[str]
+ total_test = kunit_parser.Test()
+ total_test.status = kunit_parser.TestStatus.SUCCESS
+ if cli_args.files is None:
sys.stdin.reconfigure(errors='backslashreplace') # type: ignore
- kunit_output = sys.stdin # type: Iterable[str]
+ parsed_files.append(sys.stdin)
+ elif cli_args.files[0] == "debugfs" and len(cli_args.files) == 1:
+ for (root, _, files) in os.walk("/sys/kernel/debug/kunit"):
+ parsed_files.extend(os.path.join(root, f) for f in files if f == "results")
else:
- with open(cli_args.file, 'r', errors='backslashreplace') as f:
+ parsed_files.extend(f for f in cli_args.files if os.path.isfile(f))
+
+ if len(parsed_files) == 0:
+ print("No output found.")
+
+ for file in parsed_files:
+ print(file)
+ with open(file, 'r', errors='backslashreplace') as f:
kunit_output = f.read().splitlines()
- # We know nothing about how the result was created!
- metadata = kunit_json.Metadata()
- request = KunitParseRequest(raw_output=cli_args.raw_output,
- json=cli_args.json)
- result, _ = parse_tests(request, metadata, kunit_output)
- if result.status != KunitStatus.SUCCESS:
- sys.exit(1)
+ # We know nothing about how the result was created!
+ metadata = kunit_json.Metadata()
+ request = KunitParseRequest(raw_output=cli_args.raw_output,
+ json=cli_args.json)
+ _, test = parse_tests(request, metadata, kunit_output)
+ total_test.subtests.append(test)
+
+ if len(parsed_files) > 1: # if more than one file was parsed output total summary
+ print('All files parsed.')
+ stdout.print_with_timestamp(kunit_parser.DIVIDER)
+ kunit_parser.bubble_up_test_results(total_test)
+ kunit_parser.print_summary_line(total_test)
subcommand_handlers_map = {
@@ -569,9 +587,10 @@ def main(argv: Sequence[str]) -> None:
help='Parses KUnit results from a file, '
'and parses formatted results.')
add_parse_opts(parse_parser)
- parse_parser.add_argument('file',
- help='Specifies the file to read results from.',
- type=str, nargs='?', metavar='input_file')
+ parse_parser.add_argument('files',
+ help='List of file paths to read results from or keyword'
+ '"debugfs" to read all results from the debugfs directory.',
+ type=str, nargs='*', metavar='input_files')
cli_args = parser.parse_args(massage_argv(argv))
base-commit: 806cb2270237ce2ec672a407d66cee17a07d3aa2
--
2.44.0.278.ge034bb2e1d-goog
This series addresses issues related to hugepage requirements in the MM
selftests, ensuring tests are skipped rather than failing when the
necessary hugepage count is not met.
This adjustment allows for a more graceful handling for systems with
insufficient hugepages, preventing unnecessary test failures and improving
the overall robustness of the test suite.
Nico Pache (3):
selftests/mm: Dont fail testsuite due to a lack of hugepages
selftests/mm: Skip uffd hugetlb tests with insufficient hugepages
selftests/mm: Skip the hugetlb-madvise tests on unmet hugepage
requirements
Changes from v1:
- Added checks to skip tests when hugepage requirements are not met, rather
than exiting with a failure.
tools/testing/selftests/mm/hugetlb-madvise.c | 2 +-
tools/testing/selftests/mm/run_vmtests.sh | 1 -
tools/testing/selftests/mm/uffd-stress.c | 6 ++++++
3 files changed, 7 insertions(+), 2 deletions(-)
--
2.44.0
This series enables support for the data processing extensions in the
newly released 2023 architecture, this is mainly support for 8 bit
floating point formats. Most of the extensions only introduce new
instructions and therefore only require hwcaps but there is a new EL0
visible control register FPMR used to control the 8 bit floating point
formats, we need to manage traps for this and context switch it.
Due to uncertainty with the plan for parsing ID registers to identify
which features to expose to the guest the KVM support is placed at the
end of the series, it will need to be revised once that issue is
resolved. The sharing of floating point save code between the host and
guest kernels slightly complicates the introduction of KVM support, we
first introduce host support with some placeholders for KVM then replace
those with the actual KVM support.
I've not added test coverage for ptrace, I've got a test program which
exercises all the FP ptrace interfaces and their interactions together,
my plan is to cover it there rather than add another tiny test program
that duplicates the boilerplace for tracing a target and doesn't
actually run the traced program.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v4:
- Rebase onto v6.8-rc1.
- Move KVM support to the end of the series.
- Link to v3: https://lore.kernel.org/r/20231205-arm64-2023-dpisa-v3-0-dbcbcd867a7f@kerne…
Changes in v3:
- Rebase onto v6.7-rc3.
- Hook up traps for FPMR in emulate-nested.c.
- Link to v2: https://lore.kernel.org/r/20231114-arm64-2023-dpisa-v2-0-47251894f6a8@kerne…
Changes in v2:
- Rebase onto v6.7-rc1.
- Link to v1: https://lore.kernel.org/r/20231026-arm64-2023-dpisa-v1-0-8470dd989bb2@kerne…
---
Mark Brown (14):
arm64/cpufeature: Hook new identification registers up to cpufeature
arm64/fpsimd: Enable host kernel access to FPMR
arm64/fpsimd: Support FEAT_FPMR
arm64/signal: Add FPMR signal handling
arm64/ptrace: Expose FPMR via ptrace
arm64/hwcap: Define hwcaps for 2023 DPISA features
kselftest/arm64: Handle FPMR context in generic signal frame parser
kselftest/arm64: Add basic FPMR test
kselftest/arm64: Add 2023 DPISA hwcap test coverage
KVM: arm64: Share all userspace hardened thread data with the hypervisor
KVM: arm64: Add newly allocated ID registers to register descriptions
KVM: arm64: Support FEAT_FPMR for guests
KVM: arm64: selftests: Document feature registers added in 2023 extensions
KVM: arm64: selftests: Teach get-reg-list about FPMR
Documentation/arch/arm64/elf_hwcaps.rst | 49 +++++
arch/arm64/include/asm/cpu.h | 3 +
arch/arm64/include/asm/cpufeature.h | 5 +
arch/arm64/include/asm/fpsimd.h | 2 +
arch/arm64/include/asm/hwcap.h | 15 ++
arch/arm64/include/asm/kvm_arm.h | 4 +-
arch/arm64/include/asm/kvm_host.h | 5 +-
arch/arm64/include/asm/processor.h | 6 +-
arch/arm64/include/uapi/asm/hwcap.h | 15 ++
arch/arm64/include/uapi/asm/sigcontext.h | 8 +
arch/arm64/kernel/cpufeature.c | 72 +++++++
arch/arm64/kernel/cpuinfo.c | 18 ++
arch/arm64/kernel/fpsimd.c | 13 ++
arch/arm64/kernel/ptrace.c | 42 ++++
arch/arm64/kernel/signal.c | 59 ++++++
arch/arm64/kvm/emulate-nested.c | 8 +
arch/arm64/kvm/fpsimd.c | 14 +-
arch/arm64/kvm/hyp/include/hyp/switch.h | 9 +-
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 4 +-
arch/arm64/kvm/sys_regs.c | 17 +-
arch/arm64/tools/cpucaps | 1 +
include/uapi/linux/elf.h | 1 +
tools/testing/selftests/arm64/abi/hwcap.c | 217 +++++++++++++++++++++
tools/testing/selftests/arm64/signal/.gitignore | 1 +
.../arm64/signal/testcases/fpmr_siginfo.c | 82 ++++++++
.../selftests/arm64/signal/testcases/testcases.c | 8 +
.../selftests/arm64/signal/testcases/testcases.h | 1 +
tools/testing/selftests/kvm/aarch64/get-reg-list.c | 11 +-
28 files changed, 670 insertions(+), 20 deletions(-)
---
base-commit: 6613476e225e090cc9aad49be7fa504e290dd33d
change-id: 20231003-arm64-2023-dpisa-2f3d25746474
Best regards,
--
Mark Brown <broonie(a)kernel.org>
안녕하세요
스웨덴 스칸디아 엘레바토(Skandia Elevato)에서 온 요아킴 라르손(JOAKIM LARSSON) .
우리는 긴급하게 귀하의 제품을 필요로 하며 가능한 한 빨리 시험 주문을 하고 싶습니다.
온라인으로 제품에 대한 정보를 수집하고 있습니다.
그리고 내 모임에서 나는 우리가 당신의 제품을 주문할 것이라고 생각합니다.
1. 최신 Catalouge를 보낼 수 있습니까?
2. 우리가 주문할 수 있는 최소한은 무엇이고 또한 기간을 보내십시오
및 조건.
3. 우리가 주문하는 경우 지불을 어떻게 해결하기를 원하십니까?
귀하의 회신 대기 중
Mr Joakim larssonv(부사장/영업 관리자)
방문자 주소: Kedumsvägen 14, SE-534 94 Vara, Sweden
배송 주소: Industrivägen, SE-534 94 Vara, Sweden
joakimlarson(a)skendiaelevator.com
https://skandiaelevator.com
selftest harness uses various exit codes to signal test
results. Avoid calling exit() directly, otherwise tests
may get broken by harness refactoring (like the commit
under Fixes). SKIP() will instruct the harness that the
test shouldn't run, it used to not be the case, but that
has been fixed. So just return, no need to exit.
Note that for hmm-tests this actually changes the result
from pass to skip. Which seems fair, the test is skipped,
after all.
Reported-by: Mark Brown <broonie(a)kernel.org>
Link: https://lore.kernel.org/all/05f7bf89-04a5-4b65-bf59-c19456aeb1f0@sirena.org…
Fixes: a724707976b0 ("selftests: kselftest_harness: use KSFT_* exit codes")
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
---
This needs to go to net-next because that's where the breaking
patch was (mis?)-applied.
CC: ivan.orlov0322(a)gmail.com
CC: perex(a)perex.cz
CC: tiwai(a)suse.com
CC: broonie(a)kernel.org
CC: shuah(a)kernel.org
CC: jglisse(a)redhat.com
CC: akpm(a)linux-foundation.org
CC: keescook(a)chromium.org
CC: linux-sound(a)vger.kernel.org
CC: linux-kselftest(a)vger.kernel.org
CC: linux-mm(a)kvack.org
---
tools/testing/selftests/alsa/test-pcmtest-driver.c | 4 ++--
tools/testing/selftests/mm/hmm-tests.c | 4 ++--
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/alsa/test-pcmtest-driver.c b/tools/testing/selftests/alsa/test-pcmtest-driver.c
index a52ecd43dbe3..ca81afa4ee90 100644
--- a/tools/testing/selftests/alsa/test-pcmtest-driver.c
+++ b/tools/testing/selftests/alsa/test-pcmtest-driver.c
@@ -127,11 +127,11 @@ FIXTURE_SETUP(pcmtest) {
int err;
if (geteuid())
- SKIP(exit(-1), "This test needs root to run!");
+ SKIP(return, "This test needs root to run!");
err = read_patterns();
if (err)
- SKIP(exit(-1), "Can't read patterns. Probably, module isn't loaded");
+ SKIP(return, "Can't read patterns. Probably, module isn't loaded");
card_name = malloc(127);
ASSERT_NE(card_name, NULL);
diff --git a/tools/testing/selftests/mm/hmm-tests.c b/tools/testing/selftests/mm/hmm-tests.c
index 20294553a5dd..d2cfc9b494a0 100644
--- a/tools/testing/selftests/mm/hmm-tests.c
+++ b/tools/testing/selftests/mm/hmm-tests.c
@@ -138,7 +138,7 @@ FIXTURE_SETUP(hmm)
self->fd = hmm_open(variant->device_number);
if (self->fd < 0 && hmm_is_coherent_type(variant->device_number))
- SKIP(exit(0), "DEVICE_COHERENT not available");
+ SKIP(return, "DEVICE_COHERENT not available");
ASSERT_GE(self->fd, 0);
}
@@ -149,7 +149,7 @@ FIXTURE_SETUP(hmm2)
self->fd0 = hmm_open(variant->device_number0);
if (self->fd0 < 0 && hmm_is_coherent_type(variant->device_number0))
- SKIP(exit(0), "DEVICE_COHERENT not available");
+ SKIP(return, "DEVICE_COHERENT not available");
ASSERT_GE(self->fd0, 0);
self->fd1 = hmm_open(variant->device_number1);
ASSERT_GE(self->fd1, 0);
--
2.44.0
With the removal of the ARCH_NR_GPIOS, the number of available GPIOs
is effectively unlimited, causing the gpio-mockup module load failure
test that overflowed the number of GPIOs to unexpectedly succeed, and
so fail.
The test is no longer relevant so remove it.
Promote the "no lines defined" test so there is still one load
failure test in the basic set.
Fixes: 7b61212f2a07 ("gpiolib: Get rid of ARCH_NR_GPIOS")
Reported-by: Pengfei Xu <pengfei.xu(a)intel.com>
Reported-by: Yi Lai <yi1.lai(a)intel.com>
Closes: https://lore.kernel.org/linux-gpio/ZC6OHBjdwBdT4sSb@xpf.sh.intel.com/
Signed-off-by: Kent Gibson <warthog618(a)gmail.com>
---
tools/testing/selftests/gpio/gpio-mockup.sh | 9 +++------
1 file changed, 3 insertions(+), 6 deletions(-)
diff --git a/tools/testing/selftests/gpio/gpio-mockup.sh b/tools/testing/selftests/gpio/gpio-mockup.sh
index 0d6c5f7f95d2..fc2dd4c24d06 100755
--- a/tools/testing/selftests/gpio/gpio-mockup.sh
+++ b/tools/testing/selftests/gpio/gpio-mockup.sh
@@ -377,13 +377,10 @@ if [ "$full_test" ]; then
insmod_test "0,32,32,44,-1,22,-1,31" 32 12 22 31
fi
echo "2. Module load error tests"
-echo "2.1 gpio overflow"
-# Currently: The max number of gpio(1024) is defined in arm architecture.
-insmod_test "-1,1024"
+echo "2.1 no lines defined"
+insmod_test "0,0"
if [ "$full_test" ]; then
- echo "2.2 no lines defined"
- insmod_test "0,0"
- echo "2.3 ignore range overlap"
+ echo "2.2 ignore range overlap"
insmod_test "0,32,0,1" 32
insmod_test "0,32,1,5" 32
insmod_test "0,32,30,35" 32
--
2.39.2
Major changes in v1:
--------------
1. Implemented MVP queue API ndos to remove the userspace-visible
driver reset.
2. Fixed issues in the napi_pp_put_page() devmem frag unref path.
3. Removed RFC tag.
Many smaller addressed comments across all the patches (patches have
individual change log).
Full tree including the rest of the GVE driver changes:
https://github.com/mina/linux/commits/tcpdevmem-v1
Cc: Yunsheng Lin <linyunsheng(a)huawei.com>
Cc: Shailend Chand <shailend(a)google.com>
Cc: Harshitha Ramamurthy <hramamurthy(a)google.com>
Changes in RFC v3:
------------------
1. Pulled in the memory-provider dependency from Jakub's RFC[1] to make the
series reviewable and mergable.
2. Implemented multi-rx-queue binding which was a todo in v2.
3. Fix to cmsg handling.
The sticking point in RFC v2[2] was the device reset required to refill
the device rx-queues after the dmabuf bind/unbind. The solution
suggested as I understand is a subset of the per-queue management ops
Jakub suggested or similar:
https://lore.kernel.org/netdev/20230815171638.4c057dcd@kernel.org/
This is not addressed in this revision, because:
1. This point was discussed at netconf & netdev and there is openness to
using the current approach of requiring a device reset.
2. Implementing individual queue resetting seems to be difficult for my
test bed with GVE. My prototype to test this ran into issues with the
rx-queues not coming back up properly if reset individually. At the
moment I'm unsure if it's a mistake in the POC or a genuine issue in
the virtualization stack behind GVE, which currently doesn't test
individual rx-queue restart.
3. Our usecases are not bothered by requiring a device reset to refill
the buffer queues, and we'd like to support NICs that run into this
limitation with resetting individual queues.
My thought is that drivers that have trouble with per-queue configs can
use the support in this series, while drivers that support new netdev
ops to reset individual queues can automatically reset the queue as
part of the dma-buf bind/unbind.
The same approach with device resets is presented again for consideration
with other sticking points addressed.
This proposal includes the rx devmem path only proposed for merge. For a
snapshot of my entire tree which includes the GVE POC page pool support &
device memory support:
https://github.com/torvalds/linux/compare/master...mina:linux:tcpdevmem-v3
[1] https://lore.kernel.org/netdev/f8270765-a27b-6ccf-33ea-cda097168d79@redhat.…
[2] https://lore.kernel.org/netdev/CAHS8izOVJGJH5WF68OsRWFKJid1_huzzUK+hpKbLcL4…
Cc: Shakeel Butt <shakeelb(a)google.com>
Cc: Jeroen de Borst <jeroendb(a)google.com>
Cc: Praveen Kaligineedi <pkaligineedi(a)google.com>
Changes in RFC v2:
------------------
The sticking point in RFC v1[1] was the dma-buf pages approach we used to
deliver the device memory to the TCP stack. RFC v2 is a proof-of-concept
that attempts to resolve this by implementing scatterlist support in the
networking stack, such that we can import the dma-buf scatterlist
directly. This is the approach proposed at a high level here[2].
Detailed changes:
1. Replaced dma-buf pages approach with importing scatterlist into the
page pool.
2. Replace the dma-buf pages centric API with a netlink API.
3. Removed the TX path implementation - there is no issue with
implementing the TX path with scatterlist approach, but leaving
out the TX path makes it easier to review.
4. Functionality is tested with this proposal, but I have not conducted
perf testing yet. I'm not sure there are regressions, but I removed
perf claims from the cover letter until they can be re-confirmed.
5. Added Signed-off-by: contributors to the implementation.
6. Fixed some bugs with the RX path since RFC v1.
Any feedback welcome, but specifically the biggest pending questions
needing feedback IMO are:
1. Feedback on the scatterlist-based approach in general.
2. Netlink API (Patch 1 & 2).
3. Approach to handle all the drivers that expect to receive pages from
the page pool (Patch 6).
[1] https://lore.kernel.org/netdev/dfe4bae7-13a0-3c5d-d671-f61b375cb0b4@gmail.c…
[2] https://lore.kernel.org/netdev/CAHS8izPm6XRS54LdCDZVd0C75tA1zHSu6jLVO8nzTLX…
----------------------
* TL;DR:
Device memory TCP (devmem TCP) is a proposal for transferring data to and/or
from device memory efficiently, without bouncing the data to a host memory
buffer.
* Problem:
A large amount of data transfers have device memory as the source and/or
destination. Accelerators drastically increased the volume of such transfers.
Some examples include:
- ML accelerators transferring large amounts of training data from storage into
GPU/TPU memory. In some cases ML training setup time can be as long as 50% of
TPU compute time, improving data transfer throughput & efficiency can help
improving GPU/TPU utilization.
- Distributed training, where ML accelerators, such as GPUs on different hosts,
exchange data among them.
- Distributed raw block storage applications transfer large amounts of data with
remote SSDs, much of this data does not require host processing.
Today, the majority of the Device-to-Device data transfers the network are
implemented as the following low level operations: Device-to-Host copy,
Host-to-Host network transfer, and Host-to-Device copy.
The implementation is suboptimal, especially for bulk data transfers, and can
put significant strains on system resources, such as host memory bandwidth,
PCIe bandwidth, etc. One important reason behind the current state is the
kernel’s lack of semantics to express device to network transfers.
* Proposal:
In this patch series we attempt to optimize this use case by implementing
socket APIs that enable the user to:
1. send device memory across the network directly, and
2. receive incoming network packets directly into device memory.
Packet _payloads_ go directly from the NIC to device memory for receive and from
device memory to NIC for transmit.
Packet _headers_ go to/from host memory and are processed by the TCP/IP stack
normally. The NIC _must_ support header split to achieve this.
Advantages:
- Alleviate host memory bandwidth pressure, compared to existing
network-transfer + device-copy semantics.
- Alleviate PCIe BW pressure, by limiting data transfer to the lowest level
of the PCIe tree, compared to traditional path which sends data through the
root complex.
* Patch overview:
** Part 1: netlink API
Gives user ability to bind dma-buf to an RX queue.
** Part 2: scatterlist support
Currently the standard for device memory sharing is DMABUF, which doesn't
generate struct pages. On the other hand, networking stack (skbs, drivers, and
page pool) operate on pages. We have 2 options:
1. Generate struct pages for dmabuf device memory, or,
2. Modify the networking stack to process scatterlist.
Approach #1 was attempted in RFC v1. RFC v2 implements approach #2.
** part 3: page pool support
We piggy back on page pool memory providers proposal:
https://github.com/kuba-moo/linux/tree/pp-providers
It allows the page pool to define a memory provider that provides the
page allocation and freeing. It helps abstract most of the device memory
TCP changes from the driver.
** part 4: support for unreadable skb frags
Page pool iovs are not accessible by the host; we implement changes
throughput the networking stack to correctly handle skbs with unreadable
frags.
** Part 5: recvmsg() APIs
We define user APIs for the user to send and receive device memory.
Not included with this RFC is the GVE devmem TCP support, just to
simplify the review. Code available here if desired:
https://github.com/mina/linux/tree/tcpdevmem
This RFC is built on top of net-next with Jakub's pp-providers changes
cherry-picked.
* NIC dependencies:
1. (strict) Devmem TCP require the NIC to support header split, i.e. the
capability to split incoming packets into a header + payload and to put
each into a separate buffer. Devmem TCP works by using device memory
for the packet payload, and host memory for the packet headers.
2. (optional) Devmem TCP works better with flow steering support & RSS support,
i.e. the NIC's ability to steer flows into certain rx queues. This allows the
sysadmin to enable devmem TCP on a subset of the rx queues, and steer
devmem TCP traffic onto these queues and non devmem TCP elsewhere.
The NIC I have access to with these properties is the GVE with DQO support
running in Google Cloud, but any NIC that supports these features would suffice.
I may be able to help reviewers bring up devmem TCP on their NICs.
* Testing:
The series includes a udmabuf kselftest that show a simple use case of
devmem TCP and validates the entire data path end to end without
a dependency on a specific dmabuf provider.
** Test Setup
Kernel: net-next with this RFC and memory provider API cherry-picked
locally.
Hardware: Google Cloud A3 VMs.
NIC: GVE with header split & RSS & flow steering support.
Jakub Kicinski (2):
net: page_pool: factor out releasing DMA from releasing the page
net: page_pool: create hooks for custom page providers
Mina Almasry (14):
queue_api: define queue api
gve: implement queue api
net: netdev netlink api to bind dma-buf to a net device
netdev: support binding dma-buf to netdevice
netdev: netdevice devmem allocator
memory-provider: dmabuf devmem memory provider
page_pool: device memory support
page_pool: don't release iov on elevanted refcount
net: support non paged skb frags
net: add support for skbs with unreadable frags
tcp: RX path for devmem TCP
net: add SO_DEVMEM_DONTNEED setsockopt to release RX frags
net: add devmem TCP documentation
selftests: add ncdevmem, netcat for devmem TCP
Documentation/netlink/specs/netdev.yaml | 52 ++
Documentation/networking/devmem.rst | 270 ++++++++++
drivers/net/ethernet/google/gve/gve_adminq.c | 6 +-
drivers/net/ethernet/google/gve/gve_adminq.h | 3 +
drivers/net/ethernet/google/gve/gve_dqo.h | 2 +
drivers/net/ethernet/google/gve/gve_main.c | 286 +++++++++++
drivers/net/ethernet/google/gve/gve_rx_dqo.c | 5 +-
include/linux/netdevice.h | 24 +
include/linux/skbuff.h | 56 ++-
include/linux/socket.h | 1 +
include/net/devmem.h | 109 +++++
include/net/netdev_rx_queue.h | 1 +
include/net/page_pool/helpers.h | 162 +++++-
include/net/page_pool/types.h | 48 ++
include/net/sock.h | 2 +
include/net/tcp.h | 5 +-
include/uapi/asm-generic/socket.h | 6 +
include/uapi/linux/netdev.h | 19 +
include/uapi/linux/uio.h | 14 +
net/core/datagram.c | 6 +
net/core/dev.c | 314 +++++++++++-
net/core/gro.c | 7 +-
net/core/netdev-genl-gen.c | 19 +
net/core/netdev-genl-gen.h | 2 +
net/core/netdev-genl.c | 124 +++++
net/core/page_pool.c | 239 +++++++--
net/core/skbuff.c | 108 +++-
net/core/sock.c | 38 ++
net/ipv4/tcp.c | 196 +++++++-
net/ipv4/tcp_input.c | 13 +-
net/ipv4/tcp_ipv4.c | 8 +
net/ipv4/tcp_output.c | 5 +-
net/packet/af_packet.c | 4 +-
tools/include/uapi/linux/netdev.h | 19 +
tools/testing/selftests/net/.gitignore | 1 +
tools/testing/selftests/net/Makefile | 5 +
tools/testing/selftests/net/ncdevmem.c | 489 +++++++++++++++++++
37 files changed, 2585 insertions(+), 83 deletions(-)
create mode 100644 Documentation/networking/devmem.rst
create mode 100644 include/net/devmem.h
create mode 100644 tools/testing/selftests/net/ncdevmem.c
--
2.43.0.472.g3155946c3a-goog