From: Kevin Brodsky <kevin.brodsky(a)arm.com>
[ Upstream commit 46036188ea1f5266df23a6149dea0df1c77cd1c7 ]
The mm kselftests are currently built with no optimisation (-O0). It's
unclear why, and besides being obviously suboptimal, this also prevents
the pkeys tests from working as intended. Let's build all the tests with
-O2.
[kevin.brodsky(a)arm.com: silence unused-result warnings]
Link: https://lkml.kernel.org/r/20250107170110.2819685-1-kevin.brodsky@arm.com
Link: https://lkml.kernel.org/r/20241209095019.1732120-6-kevin.brodsky@arm.com
Signed-off-by: Kevin Brodsky <kevin.brodsky(a)arm.com>
Cc: Aruna Ramakrishna <aruna.ramakrishna(a)oracle.com>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: Joey Gouly <joey.gouly(a)arm.com>
Cc: Keith Lucas <keith.lucas(a)oracle.com>
Cc: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
(cherry picked from commit 46036188ea1f5266df23a6149dea0df1c77cd1c7)
[Yifei: This commit also fix the failure of pkey_sighandler_tests_64,
which is also in linux-6.12.y, thus backport this commit. It is already
backported to linux-6.13.y by commit d9eb5a1e76f56]
Signed-off-by: Yifei Liu <yifei.l.liu(a)oracle.com>
---
tools/testing/selftests/mm/Makefile | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile
index 02e1204971b0..c0138cb19705 100644
--- a/tools/testing/selftests/mm/Makefile
+++ b/tools/testing/selftests/mm/Makefile
@@ -33,9 +33,16 @@ endif
# LDLIBS.
MAKEFLAGS += --no-builtin-rules
-CFLAGS = -Wall -I $(top_srcdir) $(EXTRA_CFLAGS) $(KHDR_INCLUDES) $(TOOLS_INCLUDES)
+CFLAGS = -Wall -O2 -I $(top_srcdir) $(EXTRA_CFLAGS) $(KHDR_INCLUDES) $(TOOLS_INCLUDES)
LDLIBS = -lrt -lpthread -lm
+# Some distributions (such as Ubuntu) configure GCC so that _FORTIFY_SOURCE is
+# automatically enabled at -O1 or above. This triggers various unused-result
+# warnings where functions such as read() or write() are called and their
+# return value is not checked. Disable _FORTIFY_SOURCE to silence those
+# warnings.
+CFLAGS += -U_FORTIFY_SOURCE
+
TEST_GEN_FILES = cow
TEST_GEN_FILES += compaction_test
TEST_GEN_FILES += gup_longterm
--
2.46.0
The kprobe_multi feature was disabled on ARM64 due to the lack of fprobe
support.
The fprobe rewrite on function_graph has been recently merged and thus
brought support for fprobes on arm64. This then enables kprobe_multi
support on arm64, and so the corresponding tests can now be run on this
architecture.
Remove the tests depending on kprobe_multi from DENYLIST.aarch64 to
allow those to run in CI. CONFIG_FPROBE is already correctly set in
tools/testing/selftests/bpf/config
Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore(a)bootlin.com>
---
The tests being enabled with this series have been run locally in an
ARM64 qemu environment, and in Github CI.
I only did some testing to ensure that the tests depending on kprobe_multi
now run correctly on arm64, it is fair to stress that all the hard
work has actually been done by M. Hiramatsu ([0])
[0] https://lore.kernel.org/bpf/173518987627.391279.3307342580035322889.stgit@d…
---
tools/testing/selftests/bpf/DENYLIST.aarch64 | 9 ---------
1 file changed, 9 deletions(-)
diff --git a/tools/testing/selftests/bpf/DENYLIST.aarch64 b/tools/testing/selftests/bpf/DENYLIST.aarch64
index 901349da680fa67896d279d184db78e964d9ae27..6d8feda27ce9de07d77d6e384666082923e3dc76 100644
--- a/tools/testing/selftests/bpf/DENYLIST.aarch64
+++ b/tools/testing/selftests/bpf/DENYLIST.aarch64
@@ -1,12 +1,3 @@
-bpf_cookie/multi_kprobe_attach_api # kprobe_multi_link_api_subtest:FAIL:fentry_raw_skel_load unexpected error: -3
-bpf_cookie/multi_kprobe_link_api # kprobe_multi_link_api_subtest:FAIL:fentry_raw_skel_load unexpected error: -3
-kprobe_multi_bench_attach # needs CONFIG_FPROBE
-kprobe_multi_test # needs CONFIG_FPROBE
-module_attach # prog 'kprobe_multi': failed to auto-attach: -95
fentry_test/fentry_many_args # fentry_many_args:FAIL:fentry_many_args_attach unexpected error: -524
fexit_test/fexit_many_args # fexit_many_args:FAIL:fexit_many_args_attach unexpected error: -524
tracing_struct/struct_many_args # struct_many_args:FAIL:tracing_struct_many_args__attach unexpected error: -524
-fill_link_info/kprobe_multi_link_info # bpf_program__attach_kprobe_multi_opts unexpected error: -95
-fill_link_info/kretprobe_multi_link_info # bpf_program__attach_kprobe_multi_opts unexpected error: -95
-fill_link_info/kprobe_multi_invalid_ubuff # bpf_program__attach_kprobe_multi_opts unexpected error: -95
-missed/kprobe_recursion # missed_kprobe_recursion__attach unexpected error: -95 (errno 95)
---
base-commit: d3417ac824b98e8773bc04b93e09c4b93c2c6cad
change-id: 20250219-enable_kprobe_multi_tests-c8d53336e5cd
Best regards,
--
Alexis Lothoré, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com
This is one of just 3 remaining "Test Module" kselftests (the others
being bitmap and scanf), the rest having been converted to KUnit.
I tested this using:
$ tools/testing/kunit/kunit.py run --arch arm64 --make_options LLVM=1 printf
I have also sent out a series converting scanf[0].
Link: https://lore.kernel.org/all/20250204-scanf-kunit-convert-v3-0-386d7c3ee714@… [0]
Signed-off-by: Tamir Duberstein <tamird(a)gmail.com>
---
Changes in v4:
- Add patch "implicate test line in failure messages".
- Rebase on linux-next, move scanf_kunit.c into lib/tests/.
- Link to v3: https://lore.kernel.org/r/20250210-printf-kunit-convert-v3-0-ee6ac5500f5e@g…
Changes in v3:
- Remove extraneous trailing newlines from failure messages.
- Replace `pr_warn` with `kunit_warn`.
- Drop arch changes.
- Remove KUnit boilerplate from CONFIG_PRINTF_KUNIT_TEST help text.
- Restore `total_tests` counting.
- Remove tc_fail macro in last patch.
- Link to v2: https://lore.kernel.org/r/20250207-printf-kunit-convert-v2-0-057b23860823@g…
Changes in v2:
- Incorporate code review from prior work[0] by Arpitha Raghunandan.
- Link to v1: https://lore.kernel.org/r/20250204-printf-kunit-convert-v1-0-ecf1b846a4de@g…
Link: https://lore.kernel.org/lkml/20200817043028.76502-1-98.arpi@gmail.com/t/#u [0]
---
Tamir Duberstein (3):
printf: convert self-test to KUnit
printf: break kunit into test cases
printf: implicate test line in failure messages
Documentation/core-api/printk-formats.rst | 4 +-
MAINTAINERS | 2 +-
lib/Kconfig.debug | 12 +-
lib/Makefile | 1 -
lib/tests/Makefile | 1 +
lib/{test_printf.c => tests/printf_kunit.c} | 437 ++++++++++++----------------
tools/testing/selftests/lib/config | 1 -
tools/testing/selftests/lib/printf.sh | 4 -
8 files changed, 200 insertions(+), 262 deletions(-)
---
base-commit: 7b7a883c7f4de1ee5040bd1c32aabaafde54d209
change-id: 20250131-printf-kunit-convert-fd4012aa2ec6
Best regards,
--
Tamir Duberstein <tamird(a)gmail.com>
Hi all,
Both tc_links.c and tc_opts.c do their tests on the loopback interface.
It prevents from parallelizing their executions.
Add a new behaviour to the test_progs framework that creates and opens a
new network namespace to run a test in it. This is done automatically on
tests whose names start with 'ns_'.
One test already has a name starting with 'ns_', so PATCH 1 renames it
to avoid conflicts. PATCH 2 introduces the test_progs 'feature'.
PATCH 3 & 4 convert some tests to use these dedicated namespaces.
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet(a)bootlin.com>
---
Changes in v2:
- Handle the netns creation / opening directly in test_progs
- Link to v1: https://lore.kernel.org/bpf/e3838d93-04e3-4e96-af53-e9e63550d7ba@bootlin.com
---
Bastien Curutchet (eBPF Foundation) (4):
selftests/bpf: ns_current_pid_tgid: Rename the test function
selftests/bpf: Optionally open a dedicated namespace to run test in it
selftests/bpf: tc_links/tc_opts: Unserialize tests
selftests/bpf: ns_current_pid_tgid: Use test_progs's ns_ feature
.../selftests/bpf/prog_tests/ns_current_pid_tgid.c | 47 ++++++++--------------
tools/testing/selftests/bpf/prog_tests/tc_links.c | 28 ++++++-------
tools/testing/selftests/bpf/prog_tests/tc_opts.c | 40 +++++++++---------
tools/testing/selftests/bpf/test_progs.c | 12 ++++++
4 files changed, 63 insertions(+), 64 deletions(-)
---
base-commit: a814b9be27fb3c3f49343aee4b015b76f5875558
change-id: 20250219-b4-tc_links-b6d5bf709e1f
Best regards,
--
Bastien Curutchet (eBPF Foundation) <bastien.curutchet(a)bootlin.com>
[ Background ]
On ARM GIC systems and others, the target address of the MSI is translated
by the IOMMU. For GIC, the MSI address page is called "ITS" page. When the
IOMMU is disabled, the MSI address is programmed to the physical location
of the GIC ITS page (e.g. 0x20200000). When the IOMMU is enabled, the ITS
page is behind the IOMMU, so the MSI address is programmed to an allocated
IO virtual address (a.k.a IOVA), e.g. 0xFFFF0000, which must be mapped to
the physical ITS page: IOVA (0xFFFF0000) ===> PA (0x20200000).
When a 2-stage translation is enabled, IOVA will be still used to program
the MSI address, though the mappings will be in two stages:
IOVA (0xFFFF0000) ===> IPA (e.g. 0x80900000) ===> PA (0x20200000)
(IPA stands for Intermediate Physical Address).
If the device that generates MSI is attached to an IOMMU_DOMAIN_DMA, the
IOVA is dynamically allocated from the top of the IOVA space. If attached
to an IOMMU_DOMAIN_UNMANAGED (e.g. a VFIO passthrough device), the IOVA is
fixed to an MSI window reported by the IOMMU driver via IOMMU_RESV_SW_MSI,
which is hardwired to MSI_IOVA_BASE (IOVA==0x8000000) for ARM IOMMUs.
So far, this IOMMU_RESV_SW_MSI works well as kernel is entirely in charge
of the IOMMU translation (1-stage translation), since the IOVA for the ITS
page is fixed and known by kernel. However, with virtual machine enabling
a nested IOMMU translation (2-stage), a guest kernel directly controls the
stage-1 translation with an IOMMU_DOMAIN_DMA, mapping a vITS page (at an
IPA 0x80900000) onto its own IOVA space (e.g. 0xEEEE0000). Then, the host
kernel can't know that guest-level IOVA to program the MSI address.
There have been two approaches to solve this problem:
1. Create an identity mapping in the stage-1. VMM could insert a few RMRs
(Reserved Memory Regions) in guest's IORT. Then the guest kernel would
fetch these RMR entries from the IORT and create an IOMMU_RESV_DIRECT
region per iommu group for a direct mapping. Eventually, the mappings
would look like: IOVA (0x8000000) === IPA (0x8000000) ===> 0x20200000
This requires an IOMMUFD ioctl for kernel and VMM to agree on the IPA.
2. Forward the guest-level MSI IOVA captured by VMM to the host-level GIC
driver, to program the correct MSI IOVA. Forward the VMM-defined vITS
page location (IPA) to the kernel for the stage-2 mapping. Eventually:
IOVA (0xFFFF0000) ===> IPA (0x80900000) ===> PA (0x20200000)
This requires a VFIO ioctl (for IOVA) and an IOMMUFD ioctl (for IPA).
Worth mentioning that when Eric Auger was working on the same topic with
the VFIO iommu uAPI, he had the approach (2) first, and then switched to
the approach (1), suggested by Jean-Philippe for reduction of complexity.
The approach (1) basically feels like the existing VFIO passthrough that
has a 1-stage mapping for the unmanaged domain, yet only by shifting the
MSI mapping from stage 1 (guest-has-no-iommu case) to stage 2 (guest-has-
iommu case). So, it could reuse the existing IOMMU_RESV_SW_MSI piece, by
sharing the same idea of "VMM leaving everything to the kernel".
The approach (2) is an ideal solution, yet it requires additional effort
for kernel to be aware of the 1-stage gIOVA(s) and 2-stage IPAs for vITS
page(s), which demands VMM to closely cooperate.
* It also brings some complicated use cases to the table where the host
or/and guest system(s) has/have multiple ITS pages.
[ Execution ]
Though these two approaches feel very different on the surface, they can
share some underlying common infrastructure. Currently, only one pair of
sw_msi functions (prepare/compose) are provided by dma-iommu for irqchip
drivers to directly use. There could be different versions of functions
from different domain owners: for existing VFIO passthrough cases and in-
kernel DMA domain cases, reuse the existing dma-iommu's version of sw_msi
functions; for nested translation use cases, there can be another version
of sw_msi functions to handle mapping and msi_msg(s) differently.
As a part-1 supporting the approach (1), i.e. the RMR solution:
- Get rid of the duplication in the "compose" function
- Introduce a function pointer for the previously "prepare" function
- Allow different domain owners to set their own "sw_msi" implementations
- Implement an iommufd_sw_msi function to additionally support a nested
translation use case using the approach (1)
- Add a pair of IOMMUFD options for a SW_MSI window for kernel and VMM to
agree on (for approach 1)
[ Future Plan ]
Part-2 and beyond will continue the effort of supporting the approach (2)
for a complete vITS-to-pITS mapping:
1) Map the phsical ITS page (potentially via IOMMUFD_CMD_IOAS_MAP_MSI)
2) Convey the IOVAs per-irq (potentially via VFIO_IRQ_SET_ACTION_PREPARE)
Note that the set_option uAPI in this series might not fit since this
requires it is an array of MSI IOVAs.)
---
This is a joint effort that includes Jason's rework in irq/iommu/iommufd
base level and my additional patches on top of that for new uAPIs.
This series is on github:
https://github.com/nicolinc/iommufd/commits/iommufd_msi_p1-v1
Pairing QEMU branch for testing (approach 1):
https://github.com/nicolinc/qemu/commits/wip/for_iommufd_msi_p1-v1-rmr
(Note: QEMU virt command no longer requires iommmufd object v.s. RFCv2)
Changelog
v1
* Rebase on v6.14-rc1 and iommufd_attach_handle-v1 series
https://lore.kernel.org/all/cover.1738645017.git.nicolinc@nvidia.com/
* Correct typos
* Replace set_bit with __set_bit
* Use a common helper to get iommufd_handle
* Add kdoc for iommu_msi_iova/iommu_msi_page_shift
* Rename msi_msg_set_msi_addr() to msi_msg_set_addr()
* Update selftest for a better coverage for the new options
* Change IOMMU_OPTION_SW_MSI_START/SIZE to be per-idev and properly
check against device's reserved region list
RFCv2
https://lore.kernel.org/kvm/cover.1736550979.git.nicolinc@nvidia.com/
* Rebase on v6.13-rc6
* Drop all the irq/pci patches and rework the compose function instead
* Add a new sw_msi op to iommu_domain for a per type implementation and
let iommufd core has its own implementation to support both approaches
* Add RMR-solution (approach 1) support since it is straightforward and
have been used in some out-of-tree projects widely
RFCv1
https://lore.kernel.org/kvm/cover.1731130093.git.nicolinc@nvidia.com/
Thanks!
Nicolin
Jason Gunthorpe (5):
genirq/msi: Store the IOMMU IOVA directly in msi_desc instead of
iommu_cookie
genirq/msi: Rename iommu_dma_compose_msi_msg() to msi_msg_set_addr()
iommu: Make iommu_dma_prepare_msi() into a generic operation
irqchip: Have CONFIG_IRQ_MSI_IOMMU be selected by the irqchips that
need it
iommufd: Implement sw_msi support natively
Nicolin Chen (8):
iommu: Turn fault_data to iommufd private pointer
iommu: Turn iova_cookie to dma-iommu private pointer
iommufd/device: Move sw_msi_start from igroup to idev
iommufd: Pass in idev to iopt_table_enforce_dev_resv_regions
iommufd: Add IOMMU_OPTION_SW_MSI_START/SIZE ioctls
iommufd/selftest: Add MOCK_FLAGS_DEVICE_NO_ATTACH
iommufd/selftest: Add a testing reserved region
iommufd/selftest: Add coverage for IOMMU_OPTION_SW_MSI_START/SIZE
drivers/iommu/Kconfig | 1 -
drivers/irqchip/Kconfig | 4 +
kernel/irq/Kconfig | 1 +
drivers/iommu/iommufd/iommufd_private.h | 29 ++-
drivers/iommu/iommufd/iommufd_test.h | 4 +
include/linux/iommu.h | 58 ++++--
include/linux/msi.h | 47 +++--
include/uapi/linux/iommufd.h | 20 +-
drivers/iommu/dma-iommu.c | 63 ++----
drivers/iommu/iommu.c | 29 +++
drivers/iommu/iommufd/device.c | 196 ++++++++++++++----
drivers/iommu/iommufd/fault.c | 2 +-
drivers/iommu/iommufd/hw_pagetable.c | 5 +-
drivers/iommu/iommufd/io_pagetable.c | 18 +-
drivers/iommu/iommufd/ioas.c | 97 +++++++++
drivers/iommu/iommufd/main.c | 13 ++
drivers/iommu/iommufd/selftest.c | 41 +++-
drivers/irqchip/irq-gic-v2m.c | 5 +-
drivers/irqchip/irq-gic-v3-its.c | 13 +-
drivers/irqchip/irq-gic-v3-mbi.c | 12 +-
drivers/irqchip/irq-ls-scfg-msi.c | 5 +-
tools/testing/selftests/iommu/iommufd.c | 97 +++++++++
.../selftests/iommu/iommufd_fail_nth.c | 21 ++
23 files changed, 608 insertions(+), 173 deletions(-)
base-commit: 2b5bc8c9425fd87e094a08f72498536133da80e1
--
2.43.0
Hi all,
Thank you for your review comments. Here is an updated patch series with
the requested changes.
To add a selftest for the metadata support of the tun driver, I refactored
an existing "xdp_context_functional" test which already tested something
similar but for the veth driver. I made the testing logic behind it more
reusable so that it also works for the tun driver and possibly other
drivers in the future.
The last patch ("fix file descriptor assertion in open_tuntap helper")
fixes an assertion in an existing helper function that I just moved and
reused. Somehow the file descriptor for /dev/net/tun turned out to be 0
when running in the BPF kernel-patches GitHub CI, so the assert condition
needed adjustment:
https://github.com/kernel-patches/bpf/actions/runs/13339140896
Successful pipeline:
https://github.com/kernel-patches/bpf/actions/runs/13372306548
---
v2:
- submit against bpf-next subtree
- split commits and improved commit messages
- remove redundant metasize check and add clarifying comment instead
- use max() instead of ternary operator
- add selftest for metadata support in the tun driver
v1: https://lore.kernel.org/all/20250130171614.1657224-1-marcus.wichelmann@hetz…
Marcus Wichelmann (6):
net: tun: enable XDP metadata support
net: tun: enable transfer of XDP metadata to skb
selftests/bpf: move open_tuntap to network helpers
selftests/bpf: refactor xdp_context_functional test and bpf program
selftests/bpf: add test for XDP metadata support in tun driver
selftests/bpf: fix file descriptor assertion in open_tuntap helper
drivers/net/tun.c | 24 ++-
tools/testing/selftests/bpf/network_helpers.c | 28 ++++
tools/testing/selftests/bpf/network_helpers.h | 3 +
.../selftests/bpf/prog_tests/lwt_helpers.h | 29 ----
.../bpf/prog_tests/xdp_context_test_run.c | 152 +++++++++++++++---
.../selftests/bpf/progs/test_xdp_meta.c | 56 ++++---
6 files changed, 215 insertions(+), 77 deletions(-)
--
2.43.0
From: Rafael Aquini <raquini(a)redhat.com>
We noticed that uffd-stress test was always failing to run when invoked
for the hugetlb profiles on x86_64 systems with a processor count of 64
or bigger:
...
# ------------------------------------
# running ./uffd-stress hugetlb 128 32
# ------------------------------------
# ERROR: invalid MiB (errno=9, @uffd-stress.c:459)
...
# [FAIL]
not ok 3 uffd-stress hugetlb 128 32 # exit=1
...
The problem boils down to how run_vmtests.sh (mis)calculates the size
of the region it feeds to uffd-stress. The latter expects to see an
amount of MiB while the former is just giving out the number of free
hugepages halved down. This measurement discrepancy ends up violating
uffd-stress' assertion on number of hugetlb pages allocated per CPU,
causing it to bail out with the error above.
This commit fixes that issue by adjusting run_vmtests.sh's half_ufd_size_MB
calculation so it properly renders the region size in MiB, as expected,
while maintaining all of its original constraints in place.
Fixes: 2e47a445d7b3 ("selftests/mm: run_vmtests.sh: fix hugetlb mem size calculation")
Signed-off-by: Rafael Aquini <raquini(a)redhat.com>
---
tools/testing/selftests/mm/run_vmtests.sh | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh
index 333c468c2699..157d07e5aaa3 100755
--- a/tools/testing/selftests/mm/run_vmtests.sh
+++ b/tools/testing/selftests/mm/run_vmtests.sh
@@ -304,7 +304,9 @@ uffd_stress_bin=./uffd-stress
CATEGORY="userfaultfd" run_test ${uffd_stress_bin} anon 20 16
# Hugetlb tests require source and destination huge pages. Pass in half
# the size of the free pages we have, which is used for *each*.
-half_ufd_size_MB=$((freepgs / 2))
+# uffd-stress expects a region expressed in MiB, so we adjust
+# half_ufd_size_MB accordingly.
+half_ufd_size_MB=$(((freepgs * hpgsize_KB) / 1024 / 2))
CATEGORY="userfaultfd" run_test ${uffd_stress_bin} hugetlb "$half_ufd_size_MB" 32
CATEGORY="userfaultfd" run_test ${uffd_stress_bin} hugetlb-private "$half_ufd_size_MB" 32
CATEGORY="userfaultfd" run_test ${uffd_stress_bin} shmem 20 16
--
2.47.0
In this patch seried, modified kvm selftests code to enable
guest code to run in vEL2(As guest Hypervisor).
Also added test cases to test guest code booting in vEL2
and register access of VNCR mapped registers.
This patchset is created as per discussions over ml[1].
Posting RFC patch for the early feedback and to
further explore requirements and test cases.
Ganapatrao Kulkarni (2):
KVM: arm64: nv: selftests: Add guest hypervisor test
KVM: arm64: nv: selftests: Access VNCR mapped registers
tools/testing/selftests/kvm/Makefile.kvm | 2 +
.../selftests/kvm/arm64/nv_guest_hypervisor.c | 83 ++++++
.../selftests/kvm/arm64/nv_vncr_regs_test.c | 255 ++++++++++++++++++
.../kvm/include/arm64/kvm_util_arch.h | 3 +
.../selftests/kvm/include/arm64/nv_util.h | 28 ++
.../testing/selftests/kvm/include/kvm_util.h | 1 +
.../selftests/kvm/lib/arm64/processor.c | 59 +++-
7 files changed, 417 insertions(+), 14 deletions(-)
create mode 100644 tools/testing/selftests/kvm/arm64/nv_guest_hypervisor.c
create mode 100644 tools/testing/selftests/kvm/arm64/nv_vncr_regs_test.c
create mode 100644 tools/testing/selftests/kvm/include/arm64/nv_util.h
--
2.48.1
A task in the kernel (task_mm_cid_work) runs somewhat periodically to
compact the mm_cid for each process. Add a test to validate that it runs
correctly and timely.
The test spawns 1 thread pinned to each CPU, then each thread, including
the main one, runs in short bursts for some time. During this period, the
mm_cids should be spanning all numbers between 0 and nproc.
At the end of this phase, a thread with high enough mm_cid (>= nproc/2)
is selected to be the new leader, all other threads terminate.
After some time, the only running thread should see 0 as mm_cid, if that
doesn't happen, the compaction mechanism didn't work and the test fails.
The test never fails if only 1 core is available, in which case, we
cannot test anything as the only available mm_cid is 0.
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Signed-off-by: Gabriele Monaco <gmonaco(a)redhat.com>
---
tools/testing/selftests/rseq/.gitignore | 1 +
tools/testing/selftests/rseq/Makefile | 2 +-
.../selftests/rseq/mm_cid_compaction_test.c | 200 ++++++++++++++++++
3 files changed, 202 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/rseq/mm_cid_compaction_test.c
diff --git a/tools/testing/selftests/rseq/.gitignore b/tools/testing/selftests/rseq/.gitignore
index 16496de5f6ce4..2c89f97e4f737 100644
--- a/tools/testing/selftests/rseq/.gitignore
+++ b/tools/testing/selftests/rseq/.gitignore
@@ -3,6 +3,7 @@ basic_percpu_ops_test
basic_percpu_ops_mm_cid_test
basic_test
basic_rseq_op_test
+mm_cid_compaction_test
param_test
param_test_benchmark
param_test_compare_twice
diff --git a/tools/testing/selftests/rseq/Makefile b/tools/testing/selftests/rseq/Makefile
index 5a3432fceb586..ce1b38f46a355 100644
--- a/tools/testing/selftests/rseq/Makefile
+++ b/tools/testing/selftests/rseq/Makefile
@@ -16,7 +16,7 @@ OVERRIDE_TARGETS = 1
TEST_GEN_PROGS = basic_test basic_percpu_ops_test basic_percpu_ops_mm_cid_test param_test \
param_test_benchmark param_test_compare_twice param_test_mm_cid \
- param_test_mm_cid_benchmark param_test_mm_cid_compare_twice
+ param_test_mm_cid_benchmark param_test_mm_cid_compare_twice mm_cid_compaction_test
TEST_GEN_PROGS_EXTENDED = librseq.so
diff --git a/tools/testing/selftests/rseq/mm_cid_compaction_test.c b/tools/testing/selftests/rseq/mm_cid_compaction_test.c
new file mode 100644
index 0000000000000..7ddde3b657dd6
--- /dev/null
+++ b/tools/testing/selftests/rseq/mm_cid_compaction_test.c
@@ -0,0 +1,200 @@
+// SPDX-License-Identifier: LGPL-2.1
+#define _GNU_SOURCE
+#include <assert.h>
+#include <pthread.h>
+#include <sched.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <stddef.h>
+
+#include "../kselftest.h"
+#include "rseq.h"
+
+#define VERBOSE 0
+#define printf_verbose(fmt, ...) \
+ do { \
+ if (VERBOSE) \
+ printf(fmt, ##__VA_ARGS__); \
+ } while (0)
+
+/* 0.5 s */
+#define RUNNER_PERIOD 500000
+/* Number of runs before we terminate or get the token */
+#define THREAD_RUNS 5
+
+/*
+ * Number of times we check that the mm_cid were compacted.
+ * Checks are repeated every RUNNER_PERIOD.
+ */
+#define MM_CID_COMPACT_TIMEOUT 10
+
+struct thread_args {
+ int cpu;
+ int num_cpus;
+ pthread_mutex_t *token;
+ pthread_barrier_t *barrier;
+ pthread_t *tinfo;
+ struct thread_args *args_head;
+};
+
+static void __noreturn *thread_runner(void *arg)
+{
+ struct thread_args *args = arg;
+ int i, ret, curr_mm_cid;
+ cpu_set_t cpumask;
+
+ CPU_ZERO(&cpumask);
+ CPU_SET(args->cpu, &cpumask);
+ ret = pthread_setaffinity_np(pthread_self(), sizeof(cpumask), &cpumask);
+ if (ret) {
+ errno = ret;
+ perror("Error: failed to set affinity");
+ abort();
+ }
+ pthread_barrier_wait(args->barrier);
+
+ for (i = 0; i < THREAD_RUNS; i++)
+ usleep(RUNNER_PERIOD);
+ curr_mm_cid = rseq_current_mm_cid();
+ /*
+ * We select one thread with high enough mm_cid to be the new leader.
+ * All other threads (including the main thread) will terminate.
+ * After some time, the mm_cid of the only remaining thread should
+ * converge to 0, if not, the test fails.
+ */
+ if (curr_mm_cid >= args->num_cpus / 2 &&
+ !pthread_mutex_trylock(args->token)) {
+ printf_verbose(
+ "cpu%d has mm_cid=%d and will be the new leader.\n",
+ sched_getcpu(), curr_mm_cid);
+ for (i = 0; i < args->num_cpus; i++) {
+ if (args->tinfo[i] == pthread_self())
+ continue;
+ ret = pthread_join(args->tinfo[i], NULL);
+ if (ret) {
+ errno = ret;
+ perror("Error: failed to join thread");
+ abort();
+ }
+ }
+ pthread_barrier_destroy(args->barrier);
+ free(args->tinfo);
+ free(args->token);
+ free(args->barrier);
+ free(args->args_head);
+
+ for (i = 0; i < MM_CID_COMPACT_TIMEOUT; i++) {
+ curr_mm_cid = rseq_current_mm_cid();
+ printf_verbose("run %d: mm_cid=%d on cpu%d.\n", i,
+ curr_mm_cid, sched_getcpu());
+ if (curr_mm_cid == 0)
+ exit(EXIT_SUCCESS);
+ usleep(RUNNER_PERIOD);
+ }
+ exit(EXIT_FAILURE);
+ }
+ printf_verbose("cpu%d has mm_cid=%d and is going to terminate.\n",
+ sched_getcpu(), curr_mm_cid);
+ pthread_exit(NULL);
+}
+
+int test_mm_cid_compaction(void)
+{
+ cpu_set_t affinity;
+ int i, j, ret = 0, num_threads;
+ pthread_t *tinfo;
+ pthread_mutex_t *token;
+ pthread_barrier_t *barrier;
+ struct thread_args *args;
+
+ sched_getaffinity(0, sizeof(affinity), &affinity);
+ num_threads = CPU_COUNT(&affinity);
+ tinfo = calloc(num_threads, sizeof(*tinfo));
+ if (!tinfo) {
+ perror("Error: failed to allocate tinfo");
+ return -1;
+ }
+ args = calloc(num_threads, sizeof(*args));
+ if (!args) {
+ perror("Error: failed to allocate args");
+ ret = -1;
+ goto out_free_tinfo;
+ }
+ token = malloc(sizeof(*token));
+ if (!token) {
+ perror("Error: failed to allocate token");
+ ret = -1;
+ goto out_free_args;
+ }
+ barrier = malloc(sizeof(*barrier));
+ if (!barrier) {
+ perror("Error: failed to allocate barrier");
+ ret = -1;
+ goto out_free_token;
+ }
+ if (num_threads == 1) {
+ fprintf(stderr, "Cannot test on a single cpu. "
+ "Skipping mm_cid_compaction test.\n");
+ /* only skipping the test, this is not a failure */
+ goto out_free_barrier;
+ }
+ pthread_mutex_init(token, NULL);
+ ret = pthread_barrier_init(barrier, NULL, num_threads);
+ if (ret) {
+ errno = ret;
+ perror("Error: failed to initialise barrier");
+ goto out_free_barrier;
+ }
+ for (i = 0, j = 0; i < CPU_SETSIZE && j < num_threads; i++) {
+ if (!CPU_ISSET(i, &affinity))
+ continue;
+ args[j].num_cpus = num_threads;
+ args[j].tinfo = tinfo;
+ args[j].token = token;
+ args[j].barrier = barrier;
+ args[j].cpu = i;
+ args[j].args_head = args;
+ if (!j) {
+ /* The first thread is the main one */
+ tinfo[0] = pthread_self();
+ ++j;
+ continue;
+ }
+ ret = pthread_create(&tinfo[j], NULL, thread_runner, &args[j]);
+ if (ret) {
+ errno = ret;
+ perror("Error: failed to create thread");
+ abort();
+ }
+ ++j;
+ }
+ printf_verbose("Started %d threads.\n", num_threads);
+
+ /* Also main thread will terminate if it is not selected as leader */
+ thread_runner(&args[0]);
+
+ /* only reached in case of errors */
+out_free_barrier:
+ free(barrier);
+out_free_token:
+ free(token);
+out_free_args:
+ free(args);
+out_free_tinfo:
+ free(tinfo);
+
+ return ret;
+}
+
+int main(int argc, char **argv)
+{
+ if (!rseq_mm_cid_available()) {
+ fprintf(stderr, "Error: rseq_mm_cid unavailable\n");
+ return -1;
+ }
+ if (test_mm_cid_compaction())
+ return -1;
+ return 0;
+}
--
2.48.1
A task in the kernel (task_mm_cid_work) runs somewhat periodically to
compact the mm_cid for each process. Add a test to validate that it runs
correctly and timely.
The test spawns 1 thread pinned to each CPU, then each thread, including
the main one, runs in short bursts for some time. During this period, the
mm_cids should be spanning all numbers between 0 and nproc.
At the end of this phase, a thread with high enough mm_cid (>= nproc/2)
is selected to be the new leader, all other threads terminate.
After some time, the only running thread should see 0 as mm_cid, if that
doesn't happen, the compaction mechanism didn't work and the test fails.
The test never fails if only 1 core is available, in which case, we
cannot test anything as the only available mm_cid is 0.
To: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Signed-off-by: Gabriele Monaco <gmonaco(a)redhat.com>
---
tools/testing/selftests/rseq/.gitignore | 1 +
tools/testing/selftests/rseq/Makefile | 2 +-
.../selftests/rseq/mm_cid_compaction_test.c | 200 ++++++++++++++++++
3 files changed, 202 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/rseq/mm_cid_compaction_test.c
diff --git a/tools/testing/selftests/rseq/.gitignore b/tools/testing/selftests/rseq/.gitignore
index 16496de5f6ce4..2c89f97e4f737 100644
--- a/tools/testing/selftests/rseq/.gitignore
+++ b/tools/testing/selftests/rseq/.gitignore
@@ -3,6 +3,7 @@ basic_percpu_ops_test
basic_percpu_ops_mm_cid_test
basic_test
basic_rseq_op_test
+mm_cid_compaction_test
param_test
param_test_benchmark
param_test_compare_twice
diff --git a/tools/testing/selftests/rseq/Makefile b/tools/testing/selftests/rseq/Makefile
index 5a3432fceb586..ce1b38f46a355 100644
--- a/tools/testing/selftests/rseq/Makefile
+++ b/tools/testing/selftests/rseq/Makefile
@@ -16,7 +16,7 @@ OVERRIDE_TARGETS = 1
TEST_GEN_PROGS = basic_test basic_percpu_ops_test basic_percpu_ops_mm_cid_test param_test \
param_test_benchmark param_test_compare_twice param_test_mm_cid \
- param_test_mm_cid_benchmark param_test_mm_cid_compare_twice
+ param_test_mm_cid_benchmark param_test_mm_cid_compare_twice mm_cid_compaction_test
TEST_GEN_PROGS_EXTENDED = librseq.so
diff --git a/tools/testing/selftests/rseq/mm_cid_compaction_test.c b/tools/testing/selftests/rseq/mm_cid_compaction_test.c
new file mode 100644
index 0000000000000..701719b320049
--- /dev/null
+++ b/tools/testing/selftests/rseq/mm_cid_compaction_test.c
@@ -0,0 +1,200 @@
+// SPDX-License-Identifier: LGPL-2.1
+#define _GNU_SOURCE
+#include <assert.h>
+#include <pthread.h>
+#include <sched.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <stddef.h>
+
+#include "../kselftest.h"
+#include "rseq.h"
+
+#define VERBOSE 0
+#define printf_verbose(fmt, ...) \
+ do { \
+ if (VERBOSE) \
+ printf(fmt, ##__VA_ARGS__); \
+ } while (0)
+
+/* 0.5 s */
+#define RUNNER_PERIOD 500000
+/* Number of runs before we terminate or get the token */
+#define THREAD_RUNS 5
+
+/*
+ * Number of times we check that the mm_cid were compacted.
+ * Checks are repeated every RUNNER_PERIOD.
+ */
+#define MM_CID_COMPACT_TIMEOUT 10
+
+struct thread_args {
+ int cpu;
+ int num_cpus;
+ pthread_mutex_t *token;
+ pthread_barrier_t *barrier;
+ pthread_t *tinfo;
+ struct thread_args *args_head;
+};
+
+static void __noreturn *thread_runner(void *arg)
+{
+ struct thread_args *args = arg;
+ int i, ret, curr_mm_cid;
+ cpu_set_t cpumask;
+
+ CPU_ZERO(&cpumask);
+ CPU_SET(args->cpu, &cpumask);
+ ret = pthread_setaffinity_np(pthread_self(), sizeof(cpumask), &cpumask);
+ if (ret) {
+ errno = ret;
+ perror("Error: failed to set affinity");
+ abort();
+ }
+ pthread_barrier_wait(args->barrier);
+
+ for (i = 0; i < THREAD_RUNS; i++)
+ usleep(RUNNER_PERIOD);
+ curr_mm_cid = rseq_current_mm_cid();
+ /*
+ * We select one thread with high enough mm_cid to be the new leader
+ * all other threads (including the main thread) will terminate.
+ * After some time, the mm_cid of the only remaining thread should
+ * converge to 0, if not, the test fails.
+ */
+ if (curr_mm_cid >= args->num_cpus / 2 &&
+ !pthread_mutex_trylock(args->token)) {
+ printf_verbose(
+ "cpu%d has mm_cid=%d and will be the new leader.\n",
+ sched_getcpu(), curr_mm_cid);
+ for (i = 0; i < args->num_cpus; i++) {
+ if (args->tinfo[i] == pthread_self())
+ continue;
+ ret = pthread_join(args->tinfo[i], NULL);
+ if (ret) {
+ errno = ret;
+ perror("Error: failed to join thread");
+ abort();
+ }
+ }
+ pthread_barrier_destroy(args->barrier);
+ free(args->tinfo);
+ free(args->token);
+ free(args->barrier);
+ free(args->args_head);
+
+ for (i = 0; i < MM_CID_COMPACT_TIMEOUT; i++) {
+ curr_mm_cid = rseq_current_mm_cid();
+ printf_verbose("run %d: mm_cid=%d on cpu%d.\n", i,
+ curr_mm_cid, sched_getcpu());
+ if (curr_mm_cid == 0)
+ exit(EXIT_SUCCESS);
+ usleep(RUNNER_PERIOD);
+ }
+ exit(EXIT_FAILURE);
+ }
+ printf_verbose("cpu%d has mm_cid=%d and is going to terminate.\n",
+ sched_getcpu(), curr_mm_cid);
+ pthread_exit(NULL);
+}
+
+int test_mm_cid_compaction(void)
+{
+ cpu_set_t affinity;
+ int i, j, ret = 0, num_threads;
+ pthread_t *tinfo;
+ pthread_mutex_t *token;
+ pthread_barrier_t *barrier;
+ struct thread_args *args;
+
+ sched_getaffinity(0, sizeof(affinity), &affinity);
+ num_threads = CPU_COUNT(&affinity);
+ tinfo = calloc(num_threads, sizeof(*tinfo));
+ if (!tinfo) {
+ perror("Error: failed to allocate tinfo");
+ return -1;
+ }
+ args = calloc(num_threads, sizeof(*args));
+ if (!args) {
+ perror("Error: failed to allocate args");
+ ret = -1;
+ goto out_free_tinfo;
+ }
+ token = malloc(sizeof(*token));
+ if (!token) {
+ perror("Error: failed to allocate token");
+ ret = -1;
+ goto out_free_args;
+ }
+ barrier = malloc(sizeof(*barrier));
+ if (!barrier) {
+ perror("Error: failed to allocate barrier");
+ ret = -1;
+ goto out_free_token;
+ }
+ if (num_threads == 1) {
+ fprintf(stderr, "Cannot test on a single cpu. "
+ "Skipping mm_cid_compaction test.\n");
+ /* only skipping the test, this is not a failure */
+ goto out_free_barrier;
+ }
+ pthread_mutex_init(token, NULL);
+ ret = pthread_barrier_init(barrier, NULL, num_threads);
+ if (ret) {
+ errno = ret;
+ perror("Error: failed to initialise barrier");
+ goto out_free_barrier;
+ }
+ for (i = 0, j = 0; i < CPU_SETSIZE && j < num_threads; i++) {
+ if (!CPU_ISSET(i, &affinity))
+ continue;
+ args[j].num_cpus = num_threads;
+ args[j].tinfo = tinfo;
+ args[j].token = token;
+ args[j].barrier = barrier;
+ args[j].cpu = i;
+ args[j].args_head = args;
+ if (!j) {
+ /* The first thread is the main one */
+ tinfo[0] = pthread_self();
+ ++j;
+ continue;
+ }
+ ret = pthread_create(&tinfo[j], NULL, thread_runner, &args[j]);
+ if (ret) {
+ errno = ret;
+ perror("Error: failed to create thread");
+ abort();
+ }
+ ++j;
+ }
+ printf_verbose("Started %d threads.\n", num_threads);
+
+ /* Also main thread will terminate if it is not selected as leader */
+ thread_runner(&args[0]);
+
+ /* only reached in case of errors */
+out_free_barrier:
+ free(barrier);
+out_free_token:
+ free(token);
+out_free_args:
+ free(args);
+out_free_tinfo:
+ free(tinfo);
+
+ return ret;
+}
+
+int main(int argc, char **argv)
+{
+ if (!rseq_mm_cid_available()) {
+ fprintf(stderr, "Error: rseq_mm_cid unavailable\n");
+ return -1;
+ }
+ if (test_mm_cid_compaction())
+ return -1;
+ return 0;
+}
--
2.48.1
While taking a look at '[PATCH net] pktgen: Avoid out-of-range in
get_imix_entries' ([1]) and '[PATCH net v2] pktgen: Avoid out-of-bounds
access in get_imix_entries' ([2], [3]) and doing some tests and code review
I detected that the /proc/net/pktgen/... parsing logic does not honour the
user given buffer bounds (resulting in out-of-bounds access).
This can be observed e.g. by the following simple test (sometimes the
old/'longer' previous value is re-read from the buffer):
$ echo add_device lo@0 > /proc/net/pktgen/kpktgend_0
$ echo "min_pkt_size 12345" > /proc/net/pktgen/lo\@0 && grep min_pkt_size /proc/net/pktgen/lo\@0
Params: count 1000 min_pkt_size: 12345 max_pkt_size: 0
Result: OK: min_pkt_size=12345
$ echo -n "min_pkt_size 123" > /proc/net/pktgen/lo\@0 && grep min_pkt_size /proc/net/pktgen/lo\@0
Params: count 1000 min_pkt_size: 12345 max_pkt_size: 0
Result: OK: min_pkt_size=12345
$ echo "min_pkt_size 123" > /proc/net/pktgen/lo\@0 && grep min_pkt_size /proc/net/pktgen/lo\@0
Params: count 1000 min_pkt_size: 123 max_pkt_size: 0
Result: OK: min_pkt_size=123
So fix the out-of-bounds access (and some minor findings) and add a simple
proc_net_pktgen selftest...
Patch set splited into part I (this one)
- net: pktgen: replace ENOTSUPP with EOPNOTSUPP
- net: pktgen: enable 'param=value' parsing
- net: pktgen: fix hex32_arg parsing for short reads
- net: pktgen: fix 'rate 0' error handling (return -EINVAL)
- net: pktgen: fix 'ratep 0' error handling (return -EINVAL)
- net: pktgen: fix ctrl interface command parsing
- net: pktgen: fix access outside of user given buffer in pktgen_thread_write()
And part II (will follow):
- net: pktgen: use defines for the various dec/hex number parsing digits lengths
- net: pktgen: fix mix of int/long
- net: pktgen: remove extra tmp variable (re-use len instead)
- net: pktgen: remove some superfluous variable initializing
- net: pktgen: fix mpls maximum labels list parsing
- net: pktgen: fix access outside of user given buffer in pktgen_if_write()
- net: pktgen: fix mpls reset parsing
- net: pktgen: remove all superfluous index assignements
- selftest: net: add proc_net_pktgen
Regards,
Peter
Changes v4 -> v5:
- split up patchset into part i/ii (suggested by Simon Horman)
Changes v3 -> v4:
- add rev-by Simon Horman
- new patch 'net: pktgen: use defines for the various dec/hex number parsing
digits lengths' (suggested by Simon Horman)
- replace C99 comment (suggested by Paolo Abeni)
- drop available characters check in strn_len() (suggested by Paolo Abeni)
- factored out patch 'net: pktgen: align some variable declarations to the
most common pattern' (suggested by Paolo Abeni)
- factored out patch 'net: pktgen: remove extra tmp variable (re-use len
instead)' (suggested by Paolo Abeni)
- factored out patch 'net: pktgen: remove some superfluous variable
initializing' (suggested by Paolo Abeni)
- factored out patch 'net: pktgen: fix mpls maximum labels list parsing'
(suggested by Paolo Abeni)
- factored out 'net: pktgen: hex32_arg/num_arg error out in case no
characters are available' (suggested by Paolo Abeni)
- factored out 'net: pktgen: num_arg error out in case no valid character
is parsed' (suggested by Paolo Abeni)
Changes v2 -> v3:
- new patch: 'net: pktgen: fix ctrl interface command parsing'
- new patch: 'net: pktgen: fix mpls reset parsing'
- tools/testing/selftests/net/proc_net_pktgen.c:
- fix typo in change description ('v1 -> v1' and tyop)
- rename some vars to better match usage
add_loopback_0 -> thr_cmd_add_loopback_0
rm_loopback_0 -> thr_cmd_rm_loopback_0
wrong_ctrl_cmd -> wrong_thr_cmd
legacy_ctrl_cmd -> legacy_thr_cmd
ctrl_fd -> thr_fd
- add ctrl interface tests
Changes v1 -> v2:
- new patch: 'net: pktgen: fix hex32_arg parsing for short reads'
- new patch: 'net: pktgen: fix 'rate 0' error handling (return -EINVAL)'
- new patch: 'net: pktgen: fix 'ratep 0' error handling (return -EINVAL)'
- net/core/pktgen.c: additional fix get_imix_entries() and get_labels()
- tools/testing/selftests/net/proc_net_pktgen.c:
- fix tyop not vs. nod (suggested by Jakub Kicinski)
- fix misaligned line (suggested by Jakub Kicinski)
- enable fomerly commented out CONFIG_XFRM dependent test (command spi),
as CONFIG_XFRM is enabled via tools/testing/selftests/net/config
CONFIG_XFRM_INTERFACE/CONFIG_XFRM_USER (suggestex by Jakub Kicinski)
- add CONFIG_NET_PKTGEN=m to tools/testing/selftests/net/config
(suggested by Jakub Kicinski)
- add modprobe pktgen to FIXTURE_SETUP() (suggested by Jakub Kicinski)
- fix some checkpatch warnings (Missing a blank line after declarations)
- shrink line length by re-naming some variables (command -> cmd,
device -> dev)
- add 'rate 0' testcase
- add 'ratep 0' testcase
[1] https://lore.kernel.org/netdev/20241006221221.3744995-1-artem.chernyshev@re…
[2] https://lore.kernel.org/netdev/20250109083039.14004-1-pchelkin@ispras.ru/
[3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…
Peter Seiderer (8):
net: pktgen: replace ENOTSUPP with EOPNOTSUPP
net: pktgen: enable 'param=value' parsing
net: pktgen: fix hex32_arg parsing for short reads
net: pktgen: fix 'rate 0' error handling (return -EINVAL)
net: pktgen: fix 'ratep 0' error handling (return -EINVAL)
net: pktgen: fix ctrl interface command parsing
net: pktgen: fix access outside of user given buffer in
pktgen_thread_write()
net: pktgen: use defines for the various dec/hex number parsing digits
lengths
net/core/pktgen.c | 119 +++++++++++++++++++++++++---------------------
1 file changed, 66 insertions(+), 53 deletions(-)
--
2.48.1
Hi all,
Both tc_links.c and tc_opts.c do their tests on the loopback interface.
It prevents from parallelizing their executions.
Use namespaces and the new append_tid() helper to allow this
parallelization.
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet(a)bootlin.com>
---
Bastien Curutchet (eBPF Foundation) (3):
selftests/bpf: tc_helpers: Add create_and_open_tid_ns()
selftests/bpf: tc_link/tc_opts: Use unique namespace
selftests/bpf: tc_links/tc_opts: Serialize tests
.../testing/selftests/bpf/prog_tests/tc_helpers.h | 12 ++
tools/testing/selftests/bpf/prog_tests/tc_links.c | 164 +++++++++++++--
tools/testing/selftests/bpf/prog_tests/tc_opts.c | 230 ++++++++++++++++++---
3 files changed, 361 insertions(+), 45 deletions(-)
---
base-commit: cfed0f474a4bb2f12b54de5d6a7301cfb7dc0dbd
change-id: 20250128-tc_links-d894a23b7063
Best regards,
--
Bastien Curutchet (eBPF Foundation) <bastien.curutchet(a)bootlin.com>
From: Kevin Brodsky <kevin.brodsky(a)arm.com>
[ Upstream commit 46036188ea1f5266df23a6149dea0df1c77cd1c7 ]
The mm kselftests are currently built with no optimisation (-O0). It's
unclear why, and besides being obviously suboptimal, this also prevents
the pkeys tests from working as intended. Let's build all the tests with
-O2.
[kevin.brodsky(a)arm.com: silence unused-result warnings]
Link: https://lkml.kernel.org/r/20250107170110.2819685-1-kevin.brodsky@arm.com
Link: https://lkml.kernel.org/r/20241209095019.1732120-6-kevin.brodsky@arm.com
Signed-off-by: Kevin Brodsky <kevin.brodsky(a)arm.com>
Cc: Aruna Ramakrishna <aruna.ramakrishna(a)oracle.com>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: Joey Gouly <joey.gouly(a)arm.com>
Cc: Keith Lucas <keith.lucas(a)oracle.com>
Cc: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
(cherry picked from commit 46036188ea1f5266df23a6149dea0df1c77cd1c7)
[Yifei: This commit also fix the failure of pkey_sighandler_tests_64,
which is also in linux-6.12.y and linux-6.13.y, thus backport this commit]
Signed-off-by: Yifei Liu <yifei.l.liu(a)oracle.com>
---
tools/testing/selftests/mm/Makefile | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile
index 02e1204971b0..c0138cb19705 100644
--- a/tools/testing/selftests/mm/Makefile
+++ b/tools/testing/selftests/mm/Makefile
@@ -33,9 +33,16 @@ endif
# LDLIBS.
MAKEFLAGS += --no-builtin-rules
-CFLAGS = -Wall -I $(top_srcdir) $(EXTRA_CFLAGS) $(KHDR_INCLUDES) $(TOOLS_INCLUDES)
+CFLAGS = -Wall -O2 -I $(top_srcdir) $(EXTRA_CFLAGS) $(KHDR_INCLUDES) $(TOOLS_INCLUDES)
LDLIBS = -lrt -lpthread -lm
+# Some distributions (such as Ubuntu) configure GCC so that _FORTIFY_SOURCE is
+# automatically enabled at -O1 or above. This triggers various unused-result
+# warnings where functions such as read() or write() are called and their
+# return value is not checked. Disable _FORTIFY_SOURCE to silence those
+# warnings.
+CFLAGS += -U_FORTIFY_SOURCE
+
TEST_GEN_FILES = cow
TEST_GEN_FILES += compaction_test
TEST_GEN_FILES += gup_longterm
--
2.46.0
This patch adds commas to clarify sentence structure:
- "To confirm look for" --> "To confirm, look for"
- "If you do remove this file" --> "If you do, remove this file"
Signed-off-by: Brian Ochoa <brianeochoa(a)gmail.com>
---
tools/testing/selftests/firmware/fw_fallback.sh | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/firmware/fw_fallback.sh b/tools/testing/selftests/firmware/fw_fallback.sh
index 70d18be46af5..cd1ff88feb28 100755
--- a/tools/testing/selftests/firmware/fw_fallback.sh
+++ b/tools/testing/selftests/firmware/fw_fallback.sh
@@ -173,13 +173,13 @@ test_syfs_timeout()
echo ""
echo "This might be a distribution udev rule setup by your distribution"
echo "to immediately cancel all fallback requests, this must be"
- echo "removed before running these tests. To confirm look for"
+ echo "removed before running these tests. To confirm, look for"
echo "a firmware rule like /lib/udev/rules.d/50-firmware.rules"
echo "and see if you have something like this:"
echo ""
echo "SUBSYSTEM==\"firmware\", ACTION==\"add\", ATTR{loading}=\"-1\""
echo ""
- echo "If you do remove this file or comment out this line before"
+ echo "If you do, remove this file or comment out this line before"
echo "proceeding with these tests."
exit 1
fi
--
2.34.1
Fix few spelling mistakes in net selftests
Signed-off-by: Chandra Mohan Sundar <chandru.dav(a)gmail.com>
---
tools/testing/selftests/net/fcnal-test.sh | 4 ++--
tools/testing/selftests/net/fdb_flush.sh | 2 +-
tools/testing/selftests/net/fib_nexthops.sh | 2 +-
3 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/net/fcnal-test.sh b/tools/testing/selftests/net/fcnal-test.sh
index 899dbad0104b..4fcc38907e48 100755
--- a/tools/testing/selftests/net/fcnal-test.sh
+++ b/tools/testing/selftests/net/fcnal-test.sh
@@ -3667,7 +3667,7 @@ ipv6_addr_bind_novrf()
# when it really should not
a=${NSA_LO_IP6}
log_start
- show_hint "Tecnically should fail since address is not on device but kernel allows"
+ show_hint "Technically should fail since address is not on device but kernel allows"
run_cmd nettest -6 -s -l ${a} -I ${NSA_DEV} -t1 -b
log_test_addr ${a} $? 0 "TCP socket bind to out of scope local address"
}
@@ -3724,7 +3724,7 @@ ipv6_addr_bind_vrf()
# passes when it really should not
a=${VRF_IP6}
log_start
- show_hint "Tecnically should fail since address is not on device but kernel allows"
+ show_hint "Technically should fail since address is not on device but kernel allows"
run_cmd nettest -6 -s -l ${a} -I ${NSA_DEV} -t1 -b
log_test_addr ${a} $? 0 "TCP socket bind to VRF address with device bind"
diff --git a/tools/testing/selftests/net/fdb_flush.sh b/tools/testing/selftests/net/fdb_flush.sh
index d5e3abb8658c..9931a1e36e3d 100755
--- a/tools/testing/selftests/net/fdb_flush.sh
+++ b/tools/testing/selftests/net/fdb_flush.sh
@@ -583,7 +583,7 @@ vxlan_test_flush_by_remote_attributes()
$IP link del dev vx10
$IP link add name vx10 type vxlan dstport "$VXPORT" external
- # For multicat FDB entries, the VXLAN driver stores a linked list of
+ # For multicast FDB entries, the VXLAN driver stores a linked list of
# remotes for a given key. Verify that only the expected remotes are
# flushed.
multicast_fdb_entries_add
diff --git a/tools/testing/selftests/net/fib_nexthops.sh b/tools/testing/selftests/net/fib_nexthops.sh
index 77c83d9508d3..bea1282e0281 100755
--- a/tools/testing/selftests/net/fib_nexthops.sh
+++ b/tools/testing/selftests/net/fib_nexthops.sh
@@ -741,7 +741,7 @@ ipv6_fcnal()
run_cmd "$IP nexthop add id 52 via 2001:db8:92::3"
log_test $? 2 "Create nexthop - gw only"
- # gw is not reachable throught given dev
+ # gw is not reachable through given dev
run_cmd "$IP nexthop add id 53 via 2001:db8:3::3 dev veth1"
log_test $? 2 "Create nexthop - invalid gw+dev combination"
--
2.43.0
The generic_map_lookup_batch currently returns EINTR if it fails with
ENOENT and retries several times on bpf_map_copy_value. The next batch
would start from the same location, presuming it's a transient issue.
This is incorrect if a map can actually have "holes", i.e.
"get_next_key" can return a key that does not point to a valid value. At
least the array of maps type may contain such holes legitly. Right now
these holes show up, generic batch lookup cannot proceed any more. It
will always fail with EINTR errors.
This patch fixes this behavior by skipping the non-existing key, and
does not return EINTR any more.
V2->V3: deleted a unused macro
V1->V2: split the fix and selftests; fixed a few selftests issues.
V2: https://lore.kernel.org/bpf/cover.1738905497.git.yan@cloudflare.com/
V1: https://lore.kernel.org/bpf/Z6OYbS4WqQnmzi2z@debian.debian/
Yan Zhai (2):
bpf: skip non exist keys in generic_map_lookup_batch
selftests: bpf: test batch lookup on array of maps with holes
kernel/bpf/syscall.c | 18 ++----
.../bpf/map_tests/map_in_map_batch_ops.c | 62 +++++++++++++------
2 files changed, 49 insertions(+), 31 deletions(-)
--
2.39.5
Hi all,
This patch series continues the work to migrate the *.sh tests into
prog_tests framework.
test_xdp_redirect_multi.sh tests the XDP redirections done through
bpf_redirect_map().
This is already partly covered by test_xdp_veth.c that already tests
map redirections at XDP level. What isn't covered yet by test_xdp_veth is
the use of the broadcast flags (BPF_F_BROADCAST or BPF_F_EXCLUDE_INGRESS)
and XDP egress programs.
Hence, this patch series add test cases to test_xdp_veth.c to get rid of
the test_xdp_redirect_multi.sh:
- PATCH 1 & 2 Rework test_xdp_veth.c to avoid using the root namespace
- PATCH 3 and 4 cover the broadcast flags
- PATCH 5 covers the XDP egress programs
NOTE: While working on this iteration I ran into a memory leak in
net/core/rtnetlink.c that leads to oom-kill when running ./test_progs in
a loop. This leak has been fixed by commit 1438f5d07b9a ("rtnetlink:
fix netns leak with rtnl_setlink()") in the net tree.
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet(a)bootlin.com>
---
Changes in v5:
- Remove the patches that were applied from previous iteration
- Add PATCH 1 & 2 to avoid using the root namespace so the veth indexes
don't get incremented on every ./test_progs call
- PATCH 3: Remove unnecessary <linux/ip.h> header
- Link to v4: https://lore.kernel.org/r/20250131-redirect-multi-v4-0-970b33678512@bootlin…
Changes in v4:
- Remove the NO_IP #define
- append_tid() takes string's size as input to ensure there is enough
space to fit the thread ID at the end
- Fix PATCH 12's commit log
- Link to v3: https://lore.kernel.org/r/20250128-redirect-multi-v3-0-c1ce69997c01@bootlin…
Changes in v3:
- Add append_tid() helper and use unique names to allow parallel testing
- Check create_network()'s return value through ASSERT_OK()
- Remove check_ping() and unused defines
- Change next_veth type (from string to int)
- Link to v2: https://lore.kernel.org/r/20250121-redirect-multi-v2-0-fc9cacabc6b2@bootlin…
Changes in v2:
- Use serial_test_* to avoid conflict between tests
- Link to v1: https://lore.kernel.org/r/20250121-redirect-multi-v1-0-b215e35ff505@bootlin…
---
Bastien Curutchet (eBPF Foundation) (6):
selftests/bpf: test_xdp_veth: Create struct net_configuration
selftests/bpf: test_xdp_veth: Use a dedicated namespace
selftests/bpf: Optionally select broadcasting flags
selftests/bpf: test_xdp_veth: Add XDP broadcast redirection tests
selftests/bpf: test_xdp_veth: Add XDP program on egress test
selftests/bpf: Remove test_xdp_redirect_multi.sh
tools/testing/selftests/bpf/Makefile | 2 -
.../selftests/bpf/prog_tests/test_xdp_veth.c | 435 ++++++++++++++++++---
.../testing/selftests/bpf/progs/xdp_redirect_map.c | 88 +++++
.../selftests/bpf/progs/xdp_redirect_multi_kern.c | 41 +-
.../selftests/bpf/test_xdp_redirect_multi.sh | 214 ----------
tools/testing/selftests/bpf/xdp_redirect_multi.c | 226 -----------
6 files changed, 491 insertions(+), 515 deletions(-)
---
base-commit: 36ab3d3de536753a4b9b2b4c4ce26af41917a378
change-id: 20250103-redirect-multi-245d6eafb5d1
Best regards,
--
Bastien Curutchet (eBPF Foundation) <bastien.curutchet(a)bootlin.com>
I've removed the RFC tag from this version of the series, but the items
that I'm looking for feedback on remains the same:
- The userspace ABI, in particular:
- The vector length used for the SVE registers, access to the SVE
registers and access to ZA and (if available) ZT0 depending on
the current state of PSTATE.{SM,ZA}.
- The use of a single finalisation for both SVE and SME.
- The addition of control for enabling fine grained traps in a similar
manner to FGU but without the UNDEF, I'm not clear if this is desired
at all and at present this requires symmetric read and write traps like
FGU. That seemed like it might be desired from an implementation
point of view but we already have one case where we enable an
asymmetric trap (for ARM64_WORKAROUND_AMPERE_AC03_CPU_38) and it
seems generally useful to enable asymmetrically.
This series implements support for SME use in non-protected KVM guests.
Much of this is very similar to SVE, the main additional challenge that
SME presents is that it introduces a new vector length similar to the
SVE vector length and two new controls which change the registers seen
by guests:
- PSTATE.ZA enables the ZA matrix register and, if SME2 is supported,
the ZT0 LUT register.
- PSTATE.SM enables streaming mode, a new floating point mode which
uses the SVE register set with the separately configured SME vector
length. In streaming mode implementation of the FFR register is
optional.
It is also permitted to build systems which support SME without SVE, in
this case when not in streaming mode no SVE registers or instructions
are available. Further, there is no requirement that there be any
overlap in the set of vector lengths supported by SVE and SME in a
system, this is expected to be a common situation in practical systems.
Since there is a new vector length to configure we introduce a new
feature parallel to the existing SVE one with a new pseudo register for
the streaming mode vector length. Due to the overlap with SVE caused by
streaming mode rather than finalising SME as a separate feature we use
the existing SVE finalisation to also finalise SME, a new define
KVM_ARM_VCPU_VEC is provided to help make user code clearer. Finalising
SVE and SME separately would introduce complication with register access
since finalising SVE makes the SVE regsiters writeable by userspace and
doing multiple finalisations results in an error being reported.
Dealing with a state where the SVE registers are writeable due to one of
SVE or SME being finalised but may have their VL changed by the other
being finalised seems like needless complexity with minimal practical
utility, it seems clearer to just express directly that only one
finalisation can be done in the ABI.
Access to the floating point registers follows the architecture:
- When both SVE and SME are present:
- If PSTATE.SM == 0 the vector length used for the Z and P registers
is the SVE vector length.
- If PSTATE.SM == 1 the vector length used for the Z and P registers
is the SME vector length.
- If only SME is present:
- If PSTATE.SM == 0 the Z and P registers are inaccessible and the
floating point state accessed via the encodings for the V registers.
- If PSTATE.SM == 1 the vector length used for the Z and P registers
- The SME specific ZA and ZT0 registers are only accessible if SVCR.ZA is 1.
The VMM must understand this, in particular when loading state SVCR
should be configured before other state.
There are a large number of subfeatures for SME, most of which only
offer additional instructions but some of which (SME2 and FA64) add
architectural state. These are configured via the ID registers as per
usual.
The new KVM_ARM_VCPU_VEC feature and ZA and ZT0 registers have not been
added to the get-reg-list selftest, the idea of supporting additional
features there without restructuring the program to generate all
possible feature combinations has been rejected. I will post a separate
series which does that restructuring.
No support is present for protected guests, this is expected to be added
separately, the series is already rather large and pKVM in general
offers a subset of features.
This series is based on Mark Rutland's fix series:
https://lore.kernel.org/r/20250210195226.1215254-1-mark.rutland@arm.com
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v4:
- Rebase onto v6.14-rc2 and Mark Rutland's fixes.
- Expose SME to nested guests.
- Additional cleanups and test fixes following on from the rebase.
- Link to v3: https://lore.kernel.org/r/20241220-kvm-arm64-sme-v3-0-05b018c1ffeb@kernel.o…
Changes in v3:
- Rebase onto v6.12-rc2.
- Link to v2: https://lore.kernel.org/r/20231222-kvm-arm64-sme-v2-0-da226cb180bb@kernel.o…
Changes in v2:
- Rebase onto v6.7-rc3.
- Configure subfeatures based on host system only.
- Complete nVHE support.
- There was some snafu with sending v1 out, it didn't make it to the
lists but in case it hit people's inboxes I'm sending as v2.
---
Mark Brown (27):
arm64/fpsimd: Update FA64 and ZT0 enables when loading SME state
arm64/fpsimd: Decide to save ZT0 and streaming mode FFR at bind time
arm64/fpsimd: Check enable bit for FA64 when saving EFI state
arm64/fpsimd: Determine maximum virtualisable SME vector length
KVM: arm64: Introduce non-UNDEF FGT control
KVM: arm64: Pay attention to FFR parameter in SVE save and load
KVM: arm64: Pull ctxt_has_ helpers to start of sysreg-sr.h
KVM: arm64: Move SVE state access macros after feature test macros
KVM: arm64: Rename SVE finalization constants to be more general
KVM: arm64: Document the KVM ABI for SME
KVM: arm64: Define internal features for SME
KVM: arm64: Rename sve_state_reg_region
KVM: arm64: Store vector lengths in an array
KVM: arm64: Implement SME vector length configuration
KVM: arm64: Support SME control registers
KVM: arm64: Support TPIDR2_EL0
KVM: arm64: Support SME identification registers for guests
KVM: arm64: Support SME priority registers
KVM: arm64: Provide assembly for SME state restore
KVM: arm64: Support userspace access to streaming mode Z and P registers
KVM: arm64: Expose SME specific state to userspace
KVM: arm64: Context switch SME state for normal guests
KVM: arm64: Handle SME exceptions
KVM: arm64: Expose SME to nested guests
KVM: arm64: Provide interface for configuring and enabling SME for guests
KVM: arm64: selftests: Add SME system registers to get-reg-list
KVM: arm64: selftests: Add SME to set_id_regs test
Documentation/virt/kvm/api.rst | 117 +++++++---
arch/arm64/include/asm/fpsimd.h | 22 ++
arch/arm64/include/asm/kvm_emulate.h | 12 +-
arch/arm64/include/asm/kvm_host.h | 143 ++++++++++---
arch/arm64/include/asm/kvm_hyp.h | 4 +-
arch/arm64/include/asm/kvm_pkvm.h | 2 +-
arch/arm64/include/asm/vncr_mapping.h | 2 +
arch/arm64/include/uapi/asm/kvm.h | 33 +++
arch/arm64/kernel/cpufeature.c | 2 -
arch/arm64/kernel/fpsimd.c | 86 ++++----
arch/arm64/kvm/arm.c | 10 +
arch/arm64/kvm/fpsimd.c | 19 +-
arch/arm64/kvm/guest.c | 262 ++++++++++++++++++++---
arch/arm64/kvm/handle_exit.c | 14 ++
arch/arm64/kvm/hyp/fpsimd.S | 18 +-
arch/arm64/kvm/hyp/include/hyp/switch.h | 141 ++++++++++--
arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 77 ++++---
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 9 +-
arch/arm64/kvm/hyp/nvhe/pkvm.c | 4 +-
arch/arm64/kvm/hyp/nvhe/switch.c | 11 +-
arch/arm64/kvm/hyp/vhe/switch.c | 21 +-
arch/arm64/kvm/nested.c | 3 +-
arch/arm64/kvm/reset.c | 154 +++++++++----
arch/arm64/kvm/sys_regs.c | 118 +++++++++-
include/uapi/linux/kvm.h | 1 +
tools/testing/selftests/kvm/arm64/get-reg-list.c | 32 ++-
tools/testing/selftests/kvm/arm64/set_id_regs.c | 29 ++-
27 files changed, 1078 insertions(+), 268 deletions(-)
---
base-commit: 6a25088d268ce4c2163142ead7fe1975bb687cb7
change-id: 20230301-kvm-arm64-sme-06a1246d3636
prerequisite-message-id: 20250210195226.1215254-1-mark.rutland(a)arm.com
prerequisite-patch-id: 615ab9c526e9f6242bd5b8d7188e96fb66fb0e64
prerequisite-patch-id: e5c4f2ff9c9ba01a0f659dd1e8bf6396de46e197
prerequisite-patch-id: 0794d28526755180847841c045a6b7cb3d800c16
prerequisite-patch-id: 079d3a8a680f793b593268eeba000acc55a0b4ec
prerequisite-patch-id: a3428f67a5ee49f2b01208f30b57984d5409d8f5
prerequisite-patch-id: 26393e401e9eae7cff5bb1d3bdb18b4e29ffc8fe
prerequisite-patch-id: 64f9819f751da4a1c73b9d1b292ccee6afda89f6
prerequisite-patch-id: 0355baaa8ceb31dc85d015b56084c33416f78041
Best regards,
--
Mark Brown <broonie(a)kernel.org>
From: Tejun Heo <tj(a)kernel.org>
[ Upstream commit e9fe182772dcb2630964724fd93e9c90b68ea0fd ]
dsp_local_on has several incorrect assumptions, one of which is that
p->nr_cpus_allowed always tracks p->cpus_ptr. This is not true when a task
is scheduled out while migration is disabled - p->cpus_ptr is temporarily
overridden to the previous CPU while p->nr_cpus_allowed remains unchanged.
This led to sporadic test faliures when dsp_local_on_dispatch() tries to put
a migration disabled task to a different CPU. Fix it by keeping the previous
CPU when migration is disabled.
There are SCX schedulers that make use of p->nr_cpus_allowed. They should
also implement explicit handling for p->migration_disabled.
Signed-off-by: Tejun Heo <tj(a)kernel.org>
Reported-by: Ihor Solodrai <ihor.solodrai(a)pm.me>
Cc: Andrea Righi <arighi(a)nvidia.com>
Cc: Changwoo Min <changwoo(a)igalia.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/sched_ext/dsp_local_on.bpf.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/sched_ext/dsp_local_on.bpf.c b/tools/testing/selftests/sched_ext/dsp_local_on.bpf.c
index c9a2da0575a0f..eea06decb6f59 100644
--- a/tools/testing/selftests/sched_ext/dsp_local_on.bpf.c
+++ b/tools/testing/selftests/sched_ext/dsp_local_on.bpf.c
@@ -43,7 +43,7 @@ void BPF_STRUCT_OPS(dsp_local_on_dispatch, s32 cpu, struct task_struct *prev)
if (!p)
return;
- if (p->nr_cpus_allowed == nr_cpus)
+ if (p->nr_cpus_allowed == nr_cpus && !p->migration_disabled)
target = bpf_get_prandom_u32() % nr_cpus;
else
target = scx_bpf_task_cpu(p);
--
2.39.5
From: Tejun Heo <tj(a)kernel.org>
[ Upstream commit e9fe182772dcb2630964724fd93e9c90b68ea0fd ]
dsp_local_on has several incorrect assumptions, one of which is that
p->nr_cpus_allowed always tracks p->cpus_ptr. This is not true when a task
is scheduled out while migration is disabled - p->cpus_ptr is temporarily
overridden to the previous CPU while p->nr_cpus_allowed remains unchanged.
This led to sporadic test faliures when dsp_local_on_dispatch() tries to put
a migration disabled task to a different CPU. Fix it by keeping the previous
CPU when migration is disabled.
There are SCX schedulers that make use of p->nr_cpus_allowed. They should
also implement explicit handling for p->migration_disabled.
Signed-off-by: Tejun Heo <tj(a)kernel.org>
Reported-by: Ihor Solodrai <ihor.solodrai(a)pm.me>
Cc: Andrea Righi <arighi(a)nvidia.com>
Cc: Changwoo Min <changwoo(a)igalia.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/sched_ext/dsp_local_on.bpf.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/sched_ext/dsp_local_on.bpf.c b/tools/testing/selftests/sched_ext/dsp_local_on.bpf.c
index fbda6bf546712..758b479bd1ee1 100644
--- a/tools/testing/selftests/sched_ext/dsp_local_on.bpf.c
+++ b/tools/testing/selftests/sched_ext/dsp_local_on.bpf.c
@@ -43,7 +43,7 @@ void BPF_STRUCT_OPS(dsp_local_on_dispatch, s32 cpu, struct task_struct *prev)
if (!p)
return;
- if (p->nr_cpus_allowed == nr_cpus)
+ if (p->nr_cpus_allowed == nr_cpus && !p->migration_disabled)
target = bpf_get_prandom_u32() % nr_cpus;
else
target = scx_bpf_task_cpu(p);
--
2.39.5
The quiet infrastructure was moved out of Makefile.build to accomidate
the new syscall table generation scripts in perf. Syscall table
generation wanted to also be able to be quiet, so instead of again
copying the code to set the quiet variables, the code was moved into
Makefile.perf to be used globally. This was not the right solution. It
should have been moved even further upwards in the call chain.
Makefile.include is imported in many files so this seems like a proper
place to put it.
To:
Signed-off-by: Charlie Jenkins <charlie(a)rivosinc.com>
---
Charlie Jenkins (2):
tools: Unify top-level quiet infrastructure
tools: Remove redundant quiet setup
tools/arch/arm64/tools/Makefile | 6 -----
tools/bpf/Makefile | 6 -----
tools/bpf/bpftool/Documentation/Makefile | 6 -----
tools/bpf/bpftool/Makefile | 6 -----
tools/bpf/resolve_btfids/Makefile | 2 --
tools/bpf/runqslower/Makefile | 5 +---
tools/build/Makefile | 8 +-----
tools/lib/bpf/Makefile | 13 ----------
tools/lib/perf/Makefile | 13 ----------
tools/lib/thermal/Makefile | 13 ----------
tools/objtool/Makefile | 6 -----
tools/perf/Makefile.perf | 41 -------------------------------
tools/scripts/Makefile.include | 31 ++++++++++++++++++++++-
tools/testing/selftests/bpf/Makefile.docs | 6 -----
tools/testing/selftests/hid/Makefile | 2 --
tools/thermal/lib/Makefile | 13 ----------
tools/tracing/latency/Makefile | 6 -----
tools/tracing/rtla/Makefile | 6 -----
tools/verification/rv/Makefile | 6 -----
19 files changed, 32 insertions(+), 163 deletions(-)
---
base-commit: 2014c95afecee3e76ca4a56956a936e23283f05b
change-id: 20250203-quiet_tools-9a6ea9d65a19
--
- Charlie
Hi Matthew,
Can you please take a look at Patch 1 and let me know if the new xarray
function looks good to you? I will send __filemap_add_folio() and
shmem_split_large_entry() changes separately.
Hi all,
This patchset adds a new buddy allocator like (or non-uniform) large folio
split from a order-n folio to order-m with m < n. It reduces
1. the total number of after-split folios from 2^(n-m) to n-m+1;
2. the amount of memory needed for multi-index xarray split from 2^(n/6-m/6) to
n/6-m/6, assuming XA_CHUNK_SHIFT=6;
3. keep more large folios after a split from all order-m folios to
order-(n-1) to order-m folios.
For example, to split an order-9 to order-0, folio split generates 10
(or 11 for anonymous memory) folios instead of 512, allocates 1 xa_node
instead of 8, and leaves 1 order-8, 1 order-7, ..., 1 order-1 and 2 order-0
folios (or 4 order-0 for anonymous memory) instead of 512 order-0 folios.
It is on top of mm-everything-2025-02-07-05-27 with V6 reverted. It is ready to
be merged.
Instead of duplicating existing split_huge_page*() code, __folio_split()
is introduced as the shared backend code for both
split_huge_page_to_list_to_order() and folio_split(). __folio_split()
can support both uniform split and buddy allocator like (or non-uniform) split.
All existing split_huge_page*() users can be gradually converted to use
folio_split() if possible. In this patchset, I converted
truncate_inode_partial_folio() to use folio_split().
xfstests quick group passed for both tmpfs and xfs.
Changelog
===
From V6[8]:
1. Added an xarray function xas_try_split() to support iterative folio split,
removing the need of using xas_split_alloc() and xas_split(). The
function guarantees that at most one xa_node is allocated for each
call.
2. Added concrete numbers of after-split folios and xa_node savings to
cover letter, commit log. (per Andrew)
From V5[7]:
1. Split shmem to any lower order patches are in mm tree, so dropped
from this series.
2. Rename split_folio_at() to try_folio_split() to clarify that
non-uniform split will not be used if it is not supported.
From V4[6]:
1. Enabled shmem support in both uniform and buddy allocator like split
and added selftests for it.
2. Added functions to check if uniform split and buddy allocator like
split are supported for the given folio and order.
3. Made truncate fall back to uniform split if buddy allocator split is
not supported (CONFIG_READ_ONLY_THP_FOR_FS and FS without large folio).
4. Added the missing folio_clear_has_hwpoisoned() to
__split_unmapped_folio().
From V3[5]:
1. Used xas_split_alloc(GFP_NOWAIT) instead of xas_nomem(), since extra
operations inside xas_split_alloc() are needed for correctness.
2. Enabled folio_split() for shmem and no issue was found with xfstests
quick test group.
3. Split both ends of a truncate range in truncate_inode_partial_folio()
to avoid wasting memory in shmem truncate (per David Hildenbrand).
4. Removed page_in_folio_offset() since page_folio() does the same
thing.
5. Finished truncate related tests from xfstests quick test group on XFS and
tmpfs without issues.
6. Disabled buddy allocator like split on CONFIG_READ_ONLY_THP_FOR_FS
and FS without large folio. This check was missed in the prior
versions.
From V2[3]:
1. Incorporated all the feedback from Kirill[4].
2. Used GFP_NOWAIT for xas_nomem().
3. Tested the code path when xas_nomem() fails.
4. Added selftests for folio_split().
5. Fixed no THP config build error.
From V1[2]:
1. Split the original patch 1 into multiple ones for easy review (per
Kirill).
2. Added xas_destroy() to avoid memory leak.
3. Fixed nr_dropped not used error (per kernel test robot).
4. Added proper error handling when xas_nomem() fails to allocate memory
for xas_split() during buddy allocator like split.
From RFC[1]:
1. Merged backend code of split_huge_page_to_list_to_order() and
folio_split(). The same code is used for both uniform split and buddy
allocator like split.
2. Use xas_nomem() instead of xas_split_alloc() for folio_split().
3. folio_split() now leaves the first after-split folio unlocked,
instead of the one containing the given page, since
the caller of truncate_inode_partial_folio() locks and unlocks the
first folio.
4. Extended split_huge_page debugfs to use folio_split().
5. Added truncate_inode_partial_folio() as first user of folio_split().
Design
===
folio_split() splits a large folio in the same way as buddy allocator
splits a large free page for allocation. The purpose is to minimize the
number of folios after the split. For example, if user wants to free the
3rd subpage in a order-9 folio, folio_split() will split the order-9 folio
as:
O-0, O-0, O-0, O-0, O-2, O-3, O-4, O-5, O-6, O-7, O-8 if it is anon
O-1, O-0, O-0, O-2, O-3, O-4, O-5, O-6, O-7, O-9 if it is pagecache
Since anon folio does not support order-1 yet.
The split process is similar to existing approach:
1. Unmap all page mappings (split PMD mappings if exist);
2. Split meta data like memcg, page owner, page alloc tag;
3. Copy meta data in struct folio to sub pages, but instead of spliting
the whole folio into multiple smaller ones with the same order in a
shot, this approach splits the folio iteratively. Taking the example
above, this approach first splits the original order-9 into two order-8,
then splits left part of order-8 to two order-7 and so on;
4. Post-process split folios, like write mapping->i_pages for pagecache,
adjust folio refcounts, add split folios to corresponding list;
5. Remap split folios
6. Unlock split folios.
__split_unmapped_folio() and __split_folio_to_order() replace
__split_huge_page() and __split_huge_page_tail() respectively.
__split_unmapped_folio() uses different approaches to perform
uniform split and buddy allocator like split:
1. uniform split: one single call to __split_folio_to_order() is used to
uniformly split the given folio. All resulting folios are put back to
the list after split. The folio containing the given page is left to
caller to unlock and others are unlocked.
2. buddy allocator like (or non-uniform) split: (old_order - new_order) calls
to __split_folio_to_order() are used to split the given folio at order N to
order N-1. After each call, the target folio is changed to the one
containing the page, which is given as a folio_split() parameter.
After each call, folios not containing the page are put back to the list.
The folio containing the page is put back to the list when its order
is new_order. All folios are unlocked except the first folio, which
is left to caller to unlock.
Patch Overview
===
1. Patch 1 added a new xarray function xas_try_split() to perform
iterative xarray split.
2. Patch 2 added __split_unmapped_folio() and __split_folio_to_order() to
prepare for moving to new backend split code.
3. Patch 3 moved common code in split_huge_page_to_list_to_order() to
__folio_split().
4. Patch 4 added new folio_split() and made
split_huge_page_to_list_to_order() share the new
__split_unmapped_folio() with folio_split().
5. Patch 5 removed no longer used __split_huge_page() and
__split_huge_page_tail().
6. Patch 6 added a new in_folio_offset to split_huge_page debugfs for
folio_split() test.
7. Patch 7 used try_folio_split() for truncate operation.
8. Patch 8 added folio_split() tests.
Any comments and/or suggestions are welcome. Thanks.
[1] https://lore.kernel.org/linux-mm/20241008223748.555845-1-ziy@nvidia.com/
[2] https://lore.kernel.org/linux-mm/20241028180932.1319265-1-ziy@nvidia.com/
[3] https://lore.kernel.org/linux-mm/20241101150357.1752726-1-ziy@nvidia.com/
[4] https://lore.kernel.org/linux-mm/e6ppwz5t4p4kvir6eqzoto4y5fmdjdxdyvxvtw43nc…
[5] https://lore.kernel.org/linux-mm/20241205001839.2582020-1-ziy@nvidia.com/
[6] https://lore.kernel.org/linux-mm/20250106165513.104899-1-ziy@nvidia.com/
[7] https://lore.kernel.org/linux-mm/20250116211042.741543-1-ziy@nvidia.com/
[8] https://lore.kernel.org/linux-mm/20250205031417.1771278-1-ziy@nvidia.com/
Zi Yan (8):
xarray: add xas_try_split() to split a multi-index entry.
mm/huge_memory: add two new (not yet used) functions for folio_split()
mm/huge_memory: move folio split common code to __folio_split()
mm/huge_memory: add buddy allocator like (non-uniform) folio_split()
mm/huge_memory: remove the old, unused __split_huge_page()
mm/huge_memory: add folio_split() to debugfs testing interface.
mm/truncate: use buddy allocator like folio split for truncate
operation.
selftests/mm: add tests for folio_split(), buddy allocator like split.
Documentation/core-api/xarray.rst | 14 +-
include/linux/huge_mm.h | 36 +
include/linux/xarray.h | 7 +
lib/test_xarray.c | 47 ++
lib/xarray.c | 136 +++-
mm/huge_memory.c | 751 ++++++++++++------
mm/truncate.c | 31 +-
tools/testing/radix-tree/Makefile | 1 +
.../selftests/mm/split_huge_page_test.c | 34 +-
9 files changed, 772 insertions(+), 285 deletions(-)
--
2.47.2
Hi all,
After much delay, v6 of the KUnit/Rust integration patchset is here.
This change incorporates most of Miguels suggestions from v5 (save for
some of the copyright headers I wasn't comfortable unilaterally
changing). This means the documentation is much improved, and it should
work more cleanly on Rust 1.83 and 1.84, no longer requiring
static_mut_refs or const_mut_refs. (I'm not 100% sure I understand all
of the details of this, but I'm comfortable enough with how it's ended
up.)
This has been rebased against 6.14-rc1/rust-next, and should be able to
comfortably go in via either the KUnit or Rust trees. My suspicion is
that there's more likely to be conflicts with the Rust work (due to the
changes in rust/macros/lib.rs) than with KUnit, where there are no
current patches which would break the API, so maybe it makes the most
sense for it to go in via Rust for 6.15.
This series was originally written by José Expósito, and has been
modified and updated by Matt Gilbride, Miguel Ojeda, and myself. The
original version can be found here:
https://github.com/Rust-for-Linux/linux/pull/950
Add support for writing KUnit tests in Rust. While Rust doctests are
already converted to KUnit tests and run, they're really better suited
for examples, rather than as first-class unit tests.
This series implements a series of direct Rust bindings for KUnit tests,
as well as a new macro which allows KUnit tests to be written using a
close variant of normal Rust unit test syntax. The only change required
is replacing '#[cfg(test)]' with '#[kunit_tests(kunit_test_suite_name)]'
An example test would look like:
#[kunit_tests(rust_kernel_hid_driver)]
mod tests {
use super::*;
use crate::{c_str, driver, hid, prelude::*};
use core::ptr;
struct SimpleTestDriver;
impl Driver for SimpleTestDriver {
type Data = ();
}
#[test]
fn rust_test_hid_driver_adapter() {
let mut hid = bindings::hid_driver::default();
let name = c_str!("SimpleTestDriver");
static MODULE: ThisModule = unsafe { ThisModule::from_ptr(ptr::null_mut()) };
let res = unsafe {
<hid::Adapter<SimpleTestDriver> as driver::DriverOps>::register(&mut hid, name, &MODULE)
};
assert_eq!(res, Err(ENODEV)); // The mock returns -19
}
}
Please give this a go, and make sure I haven't broken it! There's almost
certainly a lot of improvements which can be made -- and there's a fair
case to be made for replacing some of this with generated C code which
can use the C macros -- but this is hopefully an adequate implementation
for now, and the interface can (with luck) remain the same even if the
implementation changes.
A few small notable missing features:
- Attributes (like the speed of a test) are hardcoded to the default
value.
- Similarly, the module name attribute is hardcoded to NULL. In C, we
use the KBUILD_MODNAME macro, but I couldn't find a way to use this
from Rust which wasn't more ugly than just disabling it.
- Assertions are not automatically rewritten to use KUnit assertions.
---
Changes since v5:
https://lore.kernel.org/all/20241213081035.2069066-1-davidgow@google.com/
- Rebased against 6.14-rc1
- Fixed a bunch of warnings / clippy lints introduced in Rust 1.83 and
1.84.
- No longer needs static_mut_refs / const_mut_refs, and is much cleaned
up as a result. (Thanks, Miguel)
- Major documentation and example fixes. (Thanks, Miguel)
Changes since v4:
https://lore.kernel.org/linux-kselftest/20241101064505.3820737-1-davidgow@g…
- Rebased against 6.13-rc1
- Allowed an unused_unsafe warning after the behaviour of addr_of_mut!()
changed in Rust 1.82. (Thanks Boqun, Miguel)
- "Expect" that the sample assert_eq!(1+1, 2) produces a clippy warning
due to a redundant assertion. (Thanks Boqun, Miguel)
- Fix some missing safety comments, and remove some unneeded 'unsafe'
blocks. (Thanks Boqun)
- Fix a couple of minor rustfmt issues which were triggering checkpatch
warnings.
Changes since v3:
https://lore.kernel.org/linux-kselftest/20241030045719.3085147-2-davidgow@g…
- The kunit_unsafe_test_suite!() macro now panic!s if the suite name is
too long, triggering a compile error. (Thanks, Alice!)
- The #[kunit_tests()] macro now preserves span information, so
errors can be better reported. (Thanks, Boqun!)
- The example tests have been updated to no longer use assert_eq!() with
a constant bool argument (which triggered a clippy warning now we
have the span info).
Changes since v2:
https://lore.kernel.org/linux-kselftest/20241029092422.2884505-1-davidgow@g…
- Include missing rust/macros/kunit.rs file from v2. (Thanks Boqun!)
- The kunit_unsafe_test_suite!() macro will truncate the name of the
suite if it is too long. (Thanks Alice!)
- The proc macro now emits an error if the suite name is too long.
- We no longer needlessly use UnsafeCell<> in
kunit_unsafe_test_suite!(). (Thanks Alice!)
Changes since v1:
https://lore.kernel.org/lkml/20230720-rustbind-v1-0-c80db349e3b5@google.com…
- Rebase on top of the latest rust-next (commit 718c4069896c)
- Make kunit_case a const fn, rather than a macro (Thanks Boqun)
- As a result, the null terminator is now created with
kernel::kunit::kunit_case_null()
- Use the C kunit_get_current_test() function to implement
in_kunit_test(), rather than re-implementing it (less efficiently)
ourselves.
Changes since the GitHub PR:
- Rebased on top of kselftest/kunit
- Add const_mut_refs feature
This may conflict with https://lore.kernel.org/lkml/20230503090708.2524310-6-nmi@metaspace.dk/
- Add rust/macros/kunit.rs to the KUnit MAINTAINERS entry
---
José Expósito (3):
rust: kunit: add KUnit case and suite macros
rust: macros: add macro to easily run KUnit tests
rust: kunit: allow to know if we are in a test
MAINTAINERS | 1 +
rust/kernel/kunit.rs | 199 +++++++++++++++++++++++++++++++++++++++++++
rust/macros/kunit.rs | 161 ++++++++++++++++++++++++++++++++++
rust/macros/lib.rs | 29 +++++++
4 files changed, 390 insertions(+)
create mode 100644 rust/macros/kunit.rs
--
2.48.1.601.g30ceb7b040-goog
Series deals with one more case of vsock surprising BPF/sockmap by being
inconsistency about (having an) assigned transport.
KASAN: null-ptr-deref in range [0x0000000000000120-0x0000000000000127]
CPU: 7 UID: 0 PID: 56 Comm: kworker/7:0 Not tainted 6.14.0-rc1+
Workqueue: vsock-loopback vsock_loopback_work
RIP: 0010:vsock_read_skb+0x4b/0x90
Call Trace:
sk_psock_verdict_data_ready+0xa4/0x2e0
virtio_transport_recv_pkt+0x1ca8/0x2acc
vsock_loopback_work+0x27d/0x3f0
process_one_work+0x846/0x1420
worker_thread+0x5b3/0xf80
kthread+0x35a/0x700
ret_from_fork+0x2d/0x70
ret_from_fork_asm+0x1a/0x30
This bug, similarly to commit f6abafcd32f9 ("vsock/bpf: return early if
transport is not assigned"), could be fixed with a single NULL check. But
instead, let's explore another approach: take a hint from
vsock_bpf_update_proto() and teach sockmap to accept only vsocks that are
already connected (no risk of transport being dropped or reassigned). At
the same time straight reject the listeners (vsock listening sockets do not
carry any transport anyway). This way BPF does not have to worry about
vsk->transport becoming NULL.
Signed-off-by: Michal Luczaj <mhal(a)rbox.co>
---
Michal Luczaj (4):
sockmap, vsock: For connectible sockets allow only connected
vsock/bpf: Warn on socket without transport
selftest/bpf: Adapt vsock_delete_on_close to sockmap rejecting unconnected
selftest/bpf: Add vsock test for sockmap rejecting unconnected
net/core/sock_map.c | 3 +
net/vmw_vsock/af_vsock.c | 3 +
net/vmw_vsock/vsock_bpf.c | 2 +-
.../selftests/bpf/prog_tests/sockmap_basic.c | 70 ++++++++++++++++------
4 files changed, 59 insertions(+), 19 deletions(-)
---
base-commit: 9c01a177c2e4b55d2bcce8a1f6bdd1d46a8320e3
change-id: 20250210-vsock-listen-sockmap-nullptr-e6e82ca79611
Best regards,
--
Michal Luczaj <mhal(a)rbox.co>
Add PKEY_UNRESTRICTED macro to mman.h and use it in selftests.
For context, this change will also allow for more consistent update of the
Glibc manual which in turn will help with introducing memory protection
keys on AArch64 targets.
Applies to 5bc55a333a2f (tag: v6.13-rc7).
Note that I couldn't build ppc tests so I would appreciate if someone
could check the 3rd patch. Thank you!
Signed-off-by: Yury Khrustalev <yury.khrustalev(a)arm.com>
---
Changes in v4:
- Removed change to tools/include/uapi/asm-generic/mman-common.h as it is not
necessary.
Link to v3: https://lore.kernel.org/all/20241028090715.509527-1-yury.khrustalev@arm.com/
Changes in v3:
- Replaced previously missed 0-s tools/testing/selftests/mm/mseal_test.c
- Replaced previously missed 0-s in tools/testing/selftests/mm/mseal_test.c
Link to v2: https://lore.kernel.org/linux-arch/20241027170006.464252-2-yury.khrustalev@…
Changes in v2:
- Update tools/include/uapi/asm-generic/mman-common.h as well
- Add usages of the new macro to selftests.
Link to v1: https://lore.kernel.org/linux-arch/20241022120128.359652-1-yury.khrustalev@…
---
Yury Khrustalev (3):
mm/pkey: Add PKEY_UNRESTRICTED macro
selftests/mm: Use PKEY_UNRESTRICTED macro
selftests/powerpc: Use PKEY_UNRESTRICTED macro
include/uapi/asm-generic/mman-common.h | 1 +
tools/testing/selftests/mm/mseal_test.c | 6 +++---
tools/testing/selftests/mm/pkey-helpers.h | 3 ++-
tools/testing/selftests/mm/pkey_sighandler_tests.c | 4 ++--
tools/testing/selftests/mm/protection_keys.c | 2 +-
tools/testing/selftests/powerpc/include/pkeys.h | 2 +-
tools/testing/selftests/powerpc/mm/pkey_exec_prot.c | 2 +-
tools/testing/selftests/powerpc/mm/pkey_siginfo.c | 2 +-
tools/testing/selftests/powerpc/ptrace/core-pkey.c | 6 +++---
tools/testing/selftests/powerpc/ptrace/ptrace-pkey.c | 6 +++---
10 files changed, 18 insertions(+), 16 deletions(-)
--
2.39.5
kunit_skip() and kunit_mark_skipped() can only be passed a pointer
to a struct kunit, not struct kunit_suite (only kunit_log() actually
supports both). Rename their first argument accordingly.
Signed-off-by: Kevin Brodsky <kevin.brodsky(a)arm.com>
---
Cc: Brendan Higgins <brendan.higgins(a)linux.dev>
Cc: David Gow <davidgow(a)google.com>
Cc: Rae Moar <rmoar(a)google.com>
Cc: linux-kselftest(a)vger.kernel.org
---
include/kunit/test.h | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)
diff --git a/include/kunit/test.h b/include/kunit/test.h
index 58dbab60f853..0ffb97c78566 100644
--- a/include/kunit/test.h
+++ b/include/kunit/test.h
@@ -553,9 +553,9 @@ void kunit_cleanup(struct kunit *test);
void __printf(2, 3) kunit_log_append(struct string_stream *log, const char *fmt, ...);
/**
- * kunit_mark_skipped() - Marks @test_or_suite as skipped
+ * kunit_mark_skipped() - Marks @test as skipped
*
- * @test_or_suite: The test context object.
+ * @test: The test context object.
* @fmt: A printk() style format string.
*
* Marks the test as skipped. @fmt is given output as the test status
@@ -563,18 +563,18 @@ void __printf(2, 3) kunit_log_append(struct string_stream *log, const char *fmt,
*
* Test execution continues after kunit_mark_skipped() is called.
*/
-#define kunit_mark_skipped(test_or_suite, fmt, ...) \
+#define kunit_mark_skipped(test, fmt, ...) \
do { \
- WRITE_ONCE((test_or_suite)->status, KUNIT_SKIPPED); \
- scnprintf((test_or_suite)->status_comment, \
+ WRITE_ONCE((test)->status, KUNIT_SKIPPED); \
+ scnprintf((test)->status_comment, \
KUNIT_STATUS_COMMENT_SIZE, \
fmt, ##__VA_ARGS__); \
} while (0)
/**
- * kunit_skip() - Marks @test_or_suite as skipped
+ * kunit_skip() - Marks @test as skipped
*
- * @test_or_suite: The test context object.
+ * @test: The test context object.
* @fmt: A printk() style format string.
*
* Skips the test. @fmt is given output as the test status
@@ -582,10 +582,10 @@ void __printf(2, 3) kunit_log_append(struct string_stream *log, const char *fmt,
*
* Test execution is halted after kunit_skip() is called.
*/
-#define kunit_skip(test_or_suite, fmt, ...) \
+#define kunit_skip(test, fmt, ...) \
do { \
- kunit_mark_skipped((test_or_suite), fmt, ##__VA_ARGS__);\
- kunit_try_catch_throw(&((test_or_suite)->try_catch)); \
+ kunit_mark_skipped((test), fmt, ##__VA_ARGS__); \
+ kunit_try_catch_throw(&((test)->try_catch)); \
} while (0)
/*
base-commit: 0ad2507d5d93f39619fc42372c347d6006b64319
--
2.47.0
Following a similar rationale as commit e4835f1da425f ("kunit: tool:
Build compile_commands.json"), make a common developer tool available by
default for KUnit users.
Compared to compile_commands.json, there is a little more work to be
done to build the GDB scripts. Is it enough to affect development cycle
duration? Unscientific evaluation:
rm -rf .kunit; time tools/testing/kunit/kunit.py build --kunitconfig ./lib/kunit/.kunitconfig --jobs 96
Without this patch it took 14.77s, with this patch it took 14.83. So,
although `make scripts_gdb` is pretty slow, presumably most of that is
just the overhead of running Kbuild at all, actually building the
scripts is approximately free.
Note also, to actually get the GDB scripts the user needs to enable
CONFIG_SCRIPTS_GDB, but building the scripts_gdb target without that is
still harmless.
Signed-off-by: Brendan Jackman <jackmanb(a)google.com>
---
tools/testing/kunit/kunit_kernel.py | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/kunit/kunit_kernel.py b/tools/testing/kunit/kunit_kernel.py
index e76d7894b6c5195ece49f0d8c7ac35130df428a9..33b5f7351cbb5d0be240cb52db2bc1fa94aeb75e 100644
--- a/tools/testing/kunit/kunit_kernel.py
+++ b/tools/testing/kunit/kunit_kernel.py
@@ -72,8 +72,8 @@ class LinuxSourceTreeOperations:
raise ConfigError(e.output.decode())
def make(self, jobs: int, build_dir: str, make_options: Optional[List[str]]) -> None:
- command = ['make', 'all', 'compile_commands.json', 'ARCH=' + self._linux_arch,
- 'O=' + build_dir, '--jobs=' + str(jobs)]
+ command = ['make', 'all', 'compile_commands.json', 'scripts_gdb',
+ 'ARCH=' + self._linux_arch, 'O=' + build_dir, '--jobs=' + str(jobs)]
if make_options:
command.extend(make_options)
if self._cross_compile:
---
base-commit: 521d60e196ecb215f425e04e9ab33e02beaffbe3
change-id: 20250121-kunit-gdb-b27315b4f2d8
Best regards,
--
Brendan Jackman <jackmanb(a)google.com>
Greetings:
Welcome to v8. Minor change, see changelog below. Re-tested on my mlx5
system both with and without CONFIG_XDP_SOCKETS enabled and both with
and without NETIF set.
This is an attempt to followup on something Jakub asked me about [1],
adding an xsk attribute to queues and more clearly documenting which
queues are linked to NAPIs...
After the RFC [2], Jakub suggested creating an empty nest for queues
which have a pool, so I've adjusted this version to work that way.
The nest can be extended in the future to express attributes about XSK
as needed. Queues which are not used for AF_XDP do not have the xsk
attribute present.
I've run the included test on:
- my mlx5 machine (via NETIF=)
- without setting NETIF
And the test seems to pass in both cases.
Thanks,
Joe
[1]: https://lore.kernel.org/netdev/20250113143109.60afa59a@kernel.org/
[2]: https://lore.kernel.org/netdev/20250129172431.65773-1-jdamato@fastly.com/
v8:
- Update the Makefile in patch 3 to use TEST_GEN_FILES instead of
TEST_GET_PROGS.
- Fix a codespell complaint in xdp_helper.c.
v7: https://lore.kernel.org/netdev/20250213192336.42156-1-jdamato@fastly.com/
- Added CONFIG_XDP_SOCKETS to selftests/driver/net/config as suggested
by Stanislav.
- Updated xdp_helper.c to return -1 for AF_XDP non-existence, but 1
for other failures.
- Updated queues.py to mark test as skipped if AF_XDP does not exist.
v6: https://lore.kernel.org/bpf/20250210193903.16235-1-jdamato@fastly.com/
- Added ifdefs for CONFIG_XDP_SOCKETS in patch 2 as Stanislav
suggested.
v5: https://lore.kernel.org/bpf/20250208041248.111118-1-jdamato@fastly.com/
- Removed unused ret variable from patch 2 as Simon suggested.
v4: https://lore.kernel.org/lkml/20250207030916.32751-1-jdamato@fastly.com/
- Add patch 1, as suggested by Jakub, which adds an empty nest helper.
- Use the helper in patch 2, which makes the code cleaner and prevents
a possible bug.
v3: https://lore.kernel.org/netdev/20250204191108.161046-1-jdamato@fastly.com/
- Change comment format in patch 2 to avoid kdoc warnings. No other
changes.
v2: https://lore.kernel.org/all/20250203185828.19334-1-jdamato@fastly.com/
- Switched from RFC to actual submission now that net-next is open
- Adjusted patch 1 to include an empty nest as suggested by Jakub
- Adjusted patch 2 to update the test based on changes to patch 1, and
to incorporate some Python feedback from Jakub :)
rfc: https://lore.kernel.org/netdev/20250129172431.65773-1-jdamato@fastly.com/
Joe Damato (3):
netlink: Add nla_put_empty_nest helper
netdev-genl: Add an XSK attribute to queues
selftests: drv-net: Test queue xsk attribute
Documentation/netlink/specs/netdev.yaml | 13 ++-
include/net/netlink.h | 15 +++
include/uapi/linux/netdev.h | 6 ++
net/core/netdev-genl.c | 12 +++
tools/include/uapi/linux/netdev.h | 6 ++
.../testing/selftests/drivers/net/.gitignore | 2 +
tools/testing/selftests/drivers/net/Makefile | 3 +
tools/testing/selftests/drivers/net/config | 1 +
tools/testing/selftests/drivers/net/queues.py | 42 +++++++-
.../selftests/drivers/net/xdp_helper.c | 98 +++++++++++++++++++
10 files changed, 194 insertions(+), 4 deletions(-)
create mode 100644 tools/testing/selftests/drivers/net/.gitignore
create mode 100644 tools/testing/selftests/drivers/net/xdp_helper.c
base-commit: 7a7e0197133d18cfd9931e7d3a842d0f5730223f
--
2.43.0
Currently the mm selftests refuse to run if we don't have huge page
support but there are plenty of tests that don't depend on this feature,
relax this requirement to allow coverage on relevant systems (eg, most
32 bit arm ones).
While doing this I noticed a bug with an existing check if we're running
THP tests, the fix overlaps with the above change so is sent as part of
a series.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Mark Brown (2):
selftests/mm: Fix check for running THP tests
selftests/mm: Allow tests to run with no huge pages support
tools/testing/selftests/mm/run_vmtests.sh | 68 +++++++++++++++++++------------
1 file changed, 43 insertions(+), 25 deletions(-)
---
base-commit: a64dcfb451e254085a7daee5fe51bf22959d52d3
change-id: 20250211-kselftest-mm-no-hugepages-ee5917a170eb
Best regards,
--
Mark Brown <broonie(a)kernel.org>
A task in the kernel (task_mm_cid_work) runs somewhat periodically to
compact the mm_cid for each process. Add a test to validate that it runs
correctly and timely.
The test spawns 1 thread pinned to each CPU, then each thread, including
the main one, runs in short bursts for some time. During this period, the
mm_cids should be spanning all numbers between 0 and nproc.
At the end of this phase, a thread with high enough mm_cid (>= nproc/2)
is selected to be the new leader, all other threads terminate.
After some time, the only running thread should see 0 as mm_cid, if that
doesn't happen, the compaction mechanism didn't work and the test fails.
Since mm_cid compaction is less likely for tasks running in short
bursts, we increase the likelihood by just running a busy loop at every
iteration. This compaction is a best effort work and this behaviour is
currently acceptable.
The test never fails if only 1 core is available, in which case, we
cannot test anything as the only available mm_cid is 0.
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Signed-off-by: Gabriele Monaco <gmonaco(a)redhat.com>
---
tools/testing/selftests/rseq/.gitignore | 1 +
tools/testing/selftests/rseq/Makefile | 2 +-
.../selftests/rseq/mm_cid_compaction_test.c | 208 ++++++++++++++++++
3 files changed, 210 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/rseq/mm_cid_compaction_test.c
diff --git a/tools/testing/selftests/rseq/.gitignore b/tools/testing/selftests/rseq/.gitignore
index 16496de5f6ce4..2c89f97e4f737 100644
--- a/tools/testing/selftests/rseq/.gitignore
+++ b/tools/testing/selftests/rseq/.gitignore
@@ -3,6 +3,7 @@ basic_percpu_ops_test
basic_percpu_ops_mm_cid_test
basic_test
basic_rseq_op_test
+mm_cid_compaction_test
param_test
param_test_benchmark
param_test_compare_twice
diff --git a/tools/testing/selftests/rseq/Makefile b/tools/testing/selftests/rseq/Makefile
index 5a3432fceb586..ce1b38f46a355 100644
--- a/tools/testing/selftests/rseq/Makefile
+++ b/tools/testing/selftests/rseq/Makefile
@@ -16,7 +16,7 @@ OVERRIDE_TARGETS = 1
TEST_GEN_PROGS = basic_test basic_percpu_ops_test basic_percpu_ops_mm_cid_test param_test \
param_test_benchmark param_test_compare_twice param_test_mm_cid \
- param_test_mm_cid_benchmark param_test_mm_cid_compare_twice
+ param_test_mm_cid_benchmark param_test_mm_cid_compare_twice mm_cid_compaction_test
TEST_GEN_PROGS_EXTENDED = librseq.so
diff --git a/tools/testing/selftests/rseq/mm_cid_compaction_test.c b/tools/testing/selftests/rseq/mm_cid_compaction_test.c
new file mode 100644
index 0000000000000..8808500466d02
--- /dev/null
+++ b/tools/testing/selftests/rseq/mm_cid_compaction_test.c
@@ -0,0 +1,208 @@
+// SPDX-License-Identifier: LGPL-2.1
+#define _GNU_SOURCE
+#include <assert.h>
+#include <pthread.h>
+#include <sched.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <stddef.h>
+
+#include "../kselftest.h"
+#include "rseq.h"
+
+#define VERBOSE 0
+#define printf_verbose(fmt, ...) \
+ do { \
+ if (VERBOSE) \
+ printf(fmt, ##__VA_ARGS__); \
+ } while (0)
+
+/* 0.5 s */
+#define RUNNER_PERIOD 500000
+/* Number of runs before we terminate or get the token */
+#define THREAD_RUNS 5
+
+/*
+ * Number of times we check that the mm_cid were compacted.
+ * Checks are repeated every RUNNER_PERIOD.
+ */
+#define MM_CID_COMPACT_TIMEOUT 10
+
+struct thread_args {
+ int cpu;
+ int num_cpus;
+ pthread_mutex_t *token;
+ pthread_barrier_t *barrier;
+ pthread_t *tinfo;
+ struct thread_args *args_head;
+};
+
+static void __noreturn *thread_runner(void *arg)
+{
+ struct thread_args *args = arg;
+ int i, ret, curr_mm_cid;
+ cpu_set_t cpumask;
+
+ CPU_ZERO(&cpumask);
+ CPU_SET(args->cpu, &cpumask);
+ ret = pthread_setaffinity_np(pthread_self(), sizeof(cpumask), &cpumask);
+ if (ret) {
+ errno = ret;
+ perror("Error: failed to set affinity");
+ abort();
+ }
+ pthread_barrier_wait(args->barrier);
+
+ for (i = 0; i < THREAD_RUNS; i++)
+ usleep(RUNNER_PERIOD);
+ curr_mm_cid = rseq_current_mm_cid();
+ /*
+ * We select one thread with high enough mm_cid to be the new leader.
+ * All other threads (including the main thread) will terminate.
+ * After some time, the mm_cid of the only remaining thread should
+ * converge to 0, if not, the test fails.
+ */
+ if (curr_mm_cid >= args->num_cpus / 2 &&
+ !pthread_mutex_trylock(args->token)) {
+ printf_verbose(
+ "cpu%d has mm_cid=%d and will be the new leader.\n",
+ sched_getcpu(), curr_mm_cid);
+ for (i = 0; i < args->num_cpus; i++) {
+ if (args->tinfo[i] == pthread_self())
+ continue;
+ ret = pthread_join(args->tinfo[i], NULL);
+ if (ret) {
+ errno = ret;
+ perror("Error: failed to join thread");
+ abort();
+ }
+ }
+ pthread_barrier_destroy(args->barrier);
+ free(args->tinfo);
+ free(args->token);
+ free(args->barrier);
+ free(args->args_head);
+
+ for (i = 0; i < MM_CID_COMPACT_TIMEOUT; i++) {
+ curr_mm_cid = rseq_current_mm_cid();
+ printf_verbose("run %d: mm_cid=%d on cpu%d.\n", i,
+ curr_mm_cid, sched_getcpu());
+ if (curr_mm_cid == 0)
+ exit(EXIT_SUCCESS);
+ /*
+ * Currently mm_cid compaction is less likely for tasks
+ * running in short bursts: increase likelihood by just
+ * running for some time doing nothing.
+ */
+ for (int j = 0; j < 0xffff; j++)
+ for (int k = 0; k < 0xffff; k++)
+ asm("");
+ usleep(RUNNER_PERIOD);
+ }
+ exit(EXIT_FAILURE);
+ }
+ printf_verbose("cpu%d has mm_cid=%d and is going to terminate.\n",
+ sched_getcpu(), curr_mm_cid);
+ pthread_exit(NULL);
+}
+
+int test_mm_cid_compaction(void)
+{
+ cpu_set_t affinity;
+ int i, j, ret = 0, num_threads;
+ pthread_t *tinfo;
+ pthread_mutex_t *token;
+ pthread_barrier_t *barrier;
+ struct thread_args *args;
+
+ sched_getaffinity(0, sizeof(affinity), &affinity);
+ num_threads = CPU_COUNT(&affinity);
+ tinfo = calloc(num_threads, sizeof(*tinfo));
+ if (!tinfo) {
+ perror("Error: failed to allocate tinfo");
+ return -1;
+ }
+ args = calloc(num_threads, sizeof(*args));
+ if (!args) {
+ perror("Error: failed to allocate args");
+ ret = -1;
+ goto out_free_tinfo;
+ }
+ token = malloc(sizeof(*token));
+ if (!token) {
+ perror("Error: failed to allocate token");
+ ret = -1;
+ goto out_free_args;
+ }
+ barrier = malloc(sizeof(*barrier));
+ if (!barrier) {
+ perror("Error: failed to allocate barrier");
+ ret = -1;
+ goto out_free_token;
+ }
+ if (num_threads == 1) {
+ fprintf(stderr, "Cannot test on a single cpu. "
+ "Skipping mm_cid_compaction test.\n");
+ /* only skipping the test, this is not a failure */
+ goto out_free_barrier;
+ }
+ pthread_mutex_init(token, NULL);
+ ret = pthread_barrier_init(barrier, NULL, num_threads);
+ if (ret) {
+ errno = ret;
+ perror("Error: failed to initialise barrier");
+ goto out_free_barrier;
+ }
+ for (i = 0, j = 0; i < CPU_SETSIZE && j < num_threads; i++) {
+ if (!CPU_ISSET(i, &affinity))
+ continue;
+ args[j].num_cpus = num_threads;
+ args[j].tinfo = tinfo;
+ args[j].token = token;
+ args[j].barrier = barrier;
+ args[j].cpu = i;
+ args[j].args_head = args;
+ if (!j) {
+ /* The first thread is the main one */
+ tinfo[0] = pthread_self();
+ ++j;
+ continue;
+ }
+ ret = pthread_create(&tinfo[j], NULL, thread_runner, &args[j]);
+ if (ret) {
+ errno = ret;
+ perror("Error: failed to create thread");
+ abort();
+ }
+ ++j;
+ }
+ printf_verbose("Started %d threads.\n", num_threads);
+
+ /* Also main thread will terminate if it is not selected as leader */
+ thread_runner(&args[0]);
+
+ /* only reached in case of errors */
+out_free_barrier:
+ free(barrier);
+out_free_token:
+ free(token);
+out_free_args:
+ free(args);
+out_free_tinfo:
+ free(tinfo);
+
+ return ret;
+}
+
+int main(int argc, char **argv)
+{
+ if (!rseq_mm_cid_available()) {
+ fprintf(stderr, "Error: rseq_mm_cid unavailable\n");
+ return -1;
+ }
+ if (test_mm_cid_compaction())
+ return -1;
+ return 0;
+}
--
2.48.1
This series is a follow-up to [1], which adds mTHP support to khugepaged.
mTHP khugepaged support was necessary for the global="defer" and
mTHP="inherit" case (and others) to make sense.
We've seen cases were customers switching from RHEL7 to RHEL8 see a
significant increase in the memory footprint for the same workloads.
Through our investigations we found that a large contributing factor to
the increase in RSS was an increase in THP usage.
For workloads like MySQL, or when using allocators like jemalloc, it is
often recommended to set /transparent_hugepages/enabled=never. This is
in part due to performance degradations and increased memory waste.
This series introduces enabled=defer, this setting acts as a middle
ground between always and madvise. If the mapping is MADV_HUGEPAGE, the
page fault handler will act normally, making a hugepage if possible. If
the allocation is not MADV_HUGEPAGE, then the page fault handler will
default to the base size allocation. The caveat is that khugepaged can
still operate on pages thats not MADV_HUGEPAGE.
This allows for two things... one, applications specifically designed to
use hugepages will get them, and two, applications that don't use
hugepages can still benefit from them without aggressively inserting
THPs at every possible chance. This curbs the memory waste, and defers
the use of hugepages to khugepaged. Khugepaged can then scan the memory
for eligible collapsing.
Admins may want to lower max_ptes_none, if not, khugepaged may
aggressively collapse single allocations into hugepages.
TESTING:
- Built for x86_64, aarch64, ppc64le, and s390x
- selftests mm
- In [1] I provided a script [2] that has multiple access patterns
- lots of general use. These changes have been running in my VM for some time
- redis testing. This test was my original case for the defer mode. What I was
able to prove was that THP=always leads to increased max_latency cases; hence
why it is recommended to disable THPs for redis servers. However with 'defer'
we dont have the max_latency spikes and can still get the system to utilize
THPs. I further tested this with the mTHP defer setting and found that redis
(and probably other jmalloc users) can utilize THPs via defer (+mTHP defer)
without a large latency penalty and some potential gains.
I uploaded some mmtest results here [3] which compares:
stock+thp=never
stock+(m)thp=always
khugepaged-mthp + defer (max_ptes_none=64)
The results show that (m)THPs can cause some throughput regression in some
cases, but also has gains in other cases. The mTHP+defer results have more
gains and less losses over the (m)THP=always case.
V2 Changes:
- base changes on mTHP khugepaged support
- Fix selftests parsing issue
- add mTHP defer option
- add mTHP defer Documentation
[1] - https://lkml.org/lkml/2025/2/10/1982
[2] - https://gitlab.com/npache/khugepaged_mthp_test
[3] - https://people.redhat.com/npache/mthp_khugepaged_defer/testoutput2/output.h…
Nico Pache (5):
mm: defer THP insertion to khugepaged
mm: document transparent_hugepage=defer usage
selftests: mm: add defer to thp setting parser
khugepaged: add defer option to mTHP options
mm: document mTHP defer setting
Documentation/admin-guide/mm/transhuge.rst | 40 ++++++++++---
include/linux/huge_mm.h | 18 +++++-
mm/huge_memory.c | 69 +++++++++++++++++++---
mm/khugepaged.c | 10 ++--
tools/testing/selftests/mm/thp_settings.c | 1 +
tools/testing/selftests/mm/thp_settings.h | 1 +
6 files changed, 115 insertions(+), 24 deletions(-)
--
2.48.1
damos_quota_goal.py selftest see if DAMOS quota goals tuning feature
increases or reduces the effective size quota for given score as
expected. The tuning feature sets the minimum quota size as one byte,
so if the effective size quota is already one, we cannot expect it
further be reduced. However the test is not aware of the edge case, and
fails since it shown no expected change of the effective quota. Handle
the case by updating the failure logic for no change to see if it was
the case, and simply skips to next test input.
Fixes: f1c07c0a1662b ("selftests/damon: add a test for DAMOS quota goal")
Cc: <stable(a)vger.kernel.org> # 6.10.x
Reported-by: kernel test robot <oliver.sang(a)intel.com>
Closes: https://lore.kernel.org/oe-lkp/202502171423.b28a918d-lkp@intel.com
Signed-off-by: SeongJae Park <sj(a)kernel.org>
---
tools/testing/selftests/damon/damos_quota_goal.py | 3 +++
1 file changed, 3 insertions(+)
diff --git a/tools/testing/selftests/damon/damos_quota_goal.py b/tools/testing/selftests/damon/damos_quota_goal.py
index 18246f3b62f7..f76e0412b564 100755
--- a/tools/testing/selftests/damon/damos_quota_goal.py
+++ b/tools/testing/selftests/damon/damos_quota_goal.py
@@ -63,6 +63,9 @@ def main():
if last_effective_bytes != 0 else -1.0))
if last_effective_bytes == goal.effective_bytes:
+ # effective quota was already minimum that cannot be more reduced
+ if expect_increase is False and last_effective_bytes == 1:
+ continue
print('efective bytes not changed: %d' % goal.effective_bytes)
exit(1)
base-commit: 20017459916819f8ae15ca3840e71fbf0ea8354e
--
2.39.5
Here is a couple of patches to fix issues. I think mount_options.tc's one
is a real bug(I'm not sure how it worked), but another one is an enhancement
for (my) execution environment.
Anyway, those should go through kselftests tree.
Thank you,
---
Masami Hiramatsu (Google) (2):
selftests/ftrace: Fix to use remount when testing mount GID option
selftests/ftrace: Make uprobe test more robust against binary name
.../ftrace/test.d/00basic/mount_options.tc | 8 ++++----
.../ftrace/test.d/dynevent/add_remove_uprobe.tc | 4 +++-
2 files changed, 7 insertions(+), 5 deletions(-)
--
Masami Hiramatsu (Google) <mhiramat(a)kernel.org>
On Sat, Feb 15, 2025 at 02:52:22PM -0500, Tamir Duberstein wrote:
> On Sat, Feb 15, 2025 at 1:51 PM kernel test robot <lkp(a)intel.com> wrote:
> I am not able to reproduce these warnings with clang 19.1.7. They also
> don't obviously make sense to me.
Please, when reply, remove boielrplate stuff!
I have just wasted a couple of minutes to understand what's going on in the
message that is 2700 lines of text as the reply to the bot message which was
~700 lines.
--
With Best Regards,
Andy Shevchenko
PTRACE_SET_SYSCALL_INFO is a generic ptrace API that complements
PTRACE_GET_SYSCALL_INFO by letting the ptracer modify details of
system calls the tracee is blocked in.
This API allows ptracers to obtain and modify system call details in a
straightforward and architecture-agnostic way, providing a consistent way
of manipulating the system call number and arguments across architectures.
As in case of PTRACE_GET_SYSCALL_INFO, PTRACE_SET_SYSCALL_INFO also
does not aim to address numerous architecture-specific system call ABI
peculiarities, like differences in the number of system call arguments
for such system calls as pread64 and preadv.
The current implementation supports changing only those bits of system call
information that are used by strace system call tampering, namely, syscall
number, syscall arguments, and syscall return value.
Support of changing additional details returned by PTRACE_GET_SYSCALL_INFO,
such as instruction pointer and stack pointer, could be added later if
needed, by using struct ptrace_syscall_info.flags to specify the additional
details that should be set. Currently, "flags" and "reserved" fields of
struct ptrace_syscall_info must be initialized with zeroes; "arch",
"instruction_pointer", and "stack_pointer" fields are currently ignored.
PTRACE_SET_SYSCALL_INFO currently supports only PTRACE_SYSCALL_INFO_ENTRY,
PTRACE_SYSCALL_INFO_EXIT, and PTRACE_SYSCALL_INFO_SECCOMP operations.
Other operations could be added later if needed.
Ideally, PTRACE_SET_SYSCALL_INFO should have been introduced along with
PTRACE_GET_SYSCALL_INFO, but it didn't happen. The last straw that
convinced me to implement PTRACE_SET_SYSCALL_INFO was apparent failure
to provide an API of changing the first system call argument on riscv
architecture [1].
ptrace(2) man page:
long ptrace(enum __ptrace_request request, pid_t pid, void *addr, void *data);
...
PTRACE_SET_SYSCALL_INFO
Modify information about the system call that caused the stop.
The "data" argument is a pointer to struct ptrace_syscall_info
that specifies the system call information to be set.
The "addr" argument should be set to sizeof(struct ptrace_syscall_info)).
[1] https://lore.kernel.org/all/59505464-c84a-403d-972f-d4b2055eeaac@gmail.com/
Notes:
v6:
* mips: Submit mips_get_syscall_arg() o32 fix via mips tree
to get it merged into v6.14-rc3
* Rebase to v6.14-rc3
* v5: https://lore.kernel.org/all/20250210113336.GA887@strace.io/
v5:
* ptrace: Extend the commit message to say that the new API does not aim
to address numerous architecture-specific syscall ABI peculiarities
* selftests: Add a workaround for s390 16-bit syscall numbers
* Add more Acked-by
* v4: https://lore.kernel.org/all/20250203065849.GA14120@strace.io/
v4:
* Split out syscall_set_return_value() for hexagon into a separate patch
* s390: Change the style of syscall_set_arguments() implementation as
requested
* Add more Reviewed-by
* v3: https://lore.kernel.org/all/20250128091445.GA8257@strace.io/
v3:
* powerpc: Submit syscall_set_return_value() fix for "sc" case separately
* mips: Do not introduce erroneous argument truncation on mips n32,
add a detailed description to the commit message of the
mips_get_syscall_arg() change
* ptrace: Add explicit padding to the end of struct ptrace_syscall_info,
simplify obtaining of user ptrace_syscall_info,
do not introduce PTRACE_SYSCALL_INFO_SIZE_VER0
* ptrace: Change the return type of ptrace_set_syscall_info_* functions
from "unsigned long" to "int"
* ptrace: Add -ERANGE check to ptrace_set_syscall_info_exit(),
add comments to -ERANGE checks
* ptrace: Update comments about supported syscall stops
* selftests: Extend set_syscall_info test, fix for mips n32
* Add Tested-by and Reviewed-by
v2:
* Add patch to fix syscall_set_return_value() on powerpc
* Add patch to fix mips_get_syscall_arg() on mips
* Add syscall_set_return_value() implementation on hexagon
* Add syscall_set_return_value() invocation to syscall_set_nr()
on arm and arm64.
* Fix syscall_set_nr() and mips_set_syscall_arg() on mips
* Add a comment to syscall_set_nr() on arc, powerpc, s390, sh,
and sparc
* Remove redundant ptrace_syscall_info.op assignments in
ptrace_get_syscall_info_*
* Minor style tweaks in ptrace_get_syscall_info_op()
* Remove syscall_set_return_value() invocation from
ptrace_set_syscall_info_entry()
* Skip syscall_set_arguments() invocation in case of syscall number -1
in ptrace_set_syscall_info_entry()
* Split ptrace_syscall_info.reserved into ptrace_syscall_info.reserved
and ptrace_syscall_info.flags
* Use __kernel_ulong_t instead of unsigned long in set_syscall_info test
Dmitry V. Levin (6):
hexagon: add syscall_set_return_value()
syscall.h: add syscall_set_arguments()
syscall.h: introduce syscall_set_nr()
ptrace_get_syscall_info: factor out ptrace_get_syscall_info_op
ptrace: introduce PTRACE_SET_SYSCALL_INFO request
selftests/ptrace: add a test case for PTRACE_SET_SYSCALL_INFO
arch/arc/include/asm/syscall.h | 25 +
arch/arm/include/asm/syscall.h | 37 ++
arch/arm64/include/asm/syscall.h | 29 +
arch/csky/include/asm/syscall.h | 13 +
arch/hexagon/include/asm/syscall.h | 21 +
arch/loongarch/include/asm/syscall.h | 15 +
arch/m68k/include/asm/syscall.h | 7 +
arch/microblaze/include/asm/syscall.h | 7 +
arch/mips/include/asm/syscall.h | 46 ++
arch/nios2/include/asm/syscall.h | 16 +
arch/openrisc/include/asm/syscall.h | 13 +
arch/parisc/include/asm/syscall.h | 19 +
arch/powerpc/include/asm/syscall.h | 20 +
arch/riscv/include/asm/syscall.h | 16 +
arch/s390/include/asm/syscall.h | 21 +
arch/sh/include/asm/syscall_32.h | 24 +
arch/sparc/include/asm/syscall.h | 22 +
arch/um/include/asm/syscall-generic.h | 19 +
arch/x86/include/asm/syscall.h | 43 ++
arch/xtensa/include/asm/syscall.h | 18 +
include/asm-generic/syscall.h | 30 +
include/uapi/linux/ptrace.h | 7 +-
kernel/ptrace.c | 179 +++++-
tools/testing/selftests/ptrace/Makefile | 2 +-
.../selftests/ptrace/set_syscall_info.c | 519 ++++++++++++++++++
25 files changed, 1141 insertions(+), 27 deletions(-)
create mode 100644 tools/testing/selftests/ptrace/set_syscall_info.c
base-commit: 0ad2507d5d93f39619fc42372c347d6006b64319
--
ldv
If you wish to utilise a pidfd interface to refer to the current process or
thread it is rather cumbersome, requiring something like:
int pidfd = pidfd_open(getpid(), 0 or PIDFD_THREAD);
...
close(pidfd);
Or the equivalent call opening /proc/self. It is more convenient to use a
sentinel value to indicate to an interface that accepts a pidfd that we
simply wish to refer to the current process thread.
This series introduces sentinels for this purposes which can be passed as
the pidfd in this instance rather than having to establish a dummy fd for
this purpose.
It is useful to refer to both the current thread from the userland's
perspective for which we use PIDFD_SELF, and the current process from the
userland's perspective, for which we use PIDFD_SELF_PROCESS.
There is unfortunately some confusion between the kernel and userland as to
what constitutes a process - a thread from the userland perspective is a
process in userland, and a userland process is a thread group (more
specifically the thread group leader from the kernel perspective). We
therefore alias things thusly:
* PIDFD_SELF_THREAD aliased by PIDFD_SELF - use PIDTYPE_PID.
* PIDFD_SELF_THREAD_GROUP alised by PIDFD_SELF_PROCESS - use PIDTYPE_TGID.
In all of the kernel code we refer to PIDFD_SELF_THREAD and
PIDFD_SELF_THREAD_GROUP. However we expect users to use PIDFD_SELF and
PIDFD_SELF_PROCESS.
This matters for cases where, for instance, a user unshare()'s FDs or does
thread-specific signal handling and where the user would be hugely confused
if the FDs referenced or signal processed referred to the thread group
leader rather than the individual thread.
For now we only adjust pidfd_get_task() and the pidfd_send_signal() system
call with specific handling for this, implementing this functionality for
process_madvise(), process_mrelease() (albeit, using it here wouldn't
really make sense) and pidfd_send_signal().
We defer making further changes, as this would require a significant rework
of the pidfd mechanism.
The motivating case here is to support PIDFD_SELF in process_madvise(), so
this suffices for immediate uses. Moving forward, this can be further
expanded to other uses.
v7:
* Reworked implementation according to Christian's requirements. We now
only support process_madvise() and pidfd_send_signal() system calls with
PIDFD_SELF as specified.
* Updated tests to account for broken pidfd_open_test.c implementation.
* Fixed missing includes in pidfd self tests.
* Removed tests relating to functionality no longer supported.
* Update guard pages test to use PIDFD_SELF.
v6:
* Avoid static inline in UAPI header as suggested by Pedro.
* Place PIDFD_SELF values out of range of errors and any other sentinel as
suggested by Pedro.
https://lore.kernel.org/linux-mm/cover.1729926229.git.lorenzo.stoakes@oracl…
v5:
* Fixup self test dependencies on pidfd/pidfd.h.
https://lore.kernel.org/linux-mm/cover.1729848252.git.lorenzo.stoakes@oracl…
v4:
* Avoid returning an fd in the __pidfd_get_pid() function as pointed out by
Christian, instead simply always pin the pid and maintain fd scope in the
helper alone.
* Add wrapper header file in tools/include/linux to allow for import of
UAPI pidfd.h header without encountering the collision between system
fcntl.h and linux/fcntl.h as discussed with Shuah and John.
* Fixup tests to import the UAPI pidfd.h header working around conflicts
between system fcntl.h and linux/fcntl.h which the UAPI pidfd.h imports,
as reported by Shuah.
* Use an int for pidfd_is_self_sentinel() to avoid any dependency on
stdbool.h in userland.
https://lore.kernel.org/linux-mm/cover.1729198898.git.lorenzo.stoakes@oracl…
v3:
* Do not fput() an invalid fd as reported by kernel test bot.
* Fix unintended churn from moving variable declaration.
https://lore.kernel.org/linux-mm/cover.1729073310.git.lorenzo.stoakes@oracl…
v2:
* Fix tests as reported by Shuah.
* Correct RFC version lore link.
https://lore.kernel.org/linux-mm/cover.1728643714.git.lorenzo.stoakes@oracl…
Non-RFC v1:
* Removed RFC tag - there seems to be general consensus that this change is
a good idea, but perhaps some debate to be had on implementation. It
seems sensible then to move forward with the RFC flag removed.
* Introduced PIDFD_SELF_THREAD, PIDFD_SELF_THREAD_GROUP and their aliases
PIDFD_SELF and PIDFD_SELF_PROCESS respectively.
* Updated testing accordingly.
https://lore.kernel.org/linux-mm/cover.1728578231.git.lorenzo.stoakes@oracl…
RFC version:
https://lore.kernel.org/linux-mm/cover.1727644404.git.lorenzo.stoakes@oracl…
Lorenzo Stoakes (6):
pidfd: add PIDFD_SELF* sentinels to refer to own thread/process
selftests/pidfd: add missing system header imcludes to pidfd tests
tools: testing: separate out wait_for_pid() into helper header
selftests: pidfd: add pidfd.h UAPI wrapper
selftests: pidfd: add tests for PIDFD_SELF_*
selftests/mm: use PIDFD_SELF in guard pages test
include/uapi/linux/pidfd.h | 24 ++++
kernel/pid.c | 24 +++-
kernel/signal.c | 106 +++++++++++-------
tools/include/linux/pidfd.h | 14 +++
tools/testing/selftests/cgroup/test_kill.c | 2 +-
tools/testing/selftests/mm/Makefile | 4 +
tools/testing/selftests/mm/guard-pages.c | 15 +--
.../pid_namespace/regression_enomem.c | 2 +-
tools/testing/selftests/pidfd/Makefile | 3 +-
tools/testing/selftests/pidfd/pidfd.h | 28 +----
.../selftests/pidfd/pidfd_fdinfo_test.c | 1 +
tools/testing/selftests/pidfd/pidfd_helpers.h | 39 +++++++
.../testing/selftests/pidfd/pidfd_open_test.c | 6 +-
.../selftests/pidfd/pidfd_setns_test.c | 1 +
tools/testing/selftests/pidfd/pidfd_test.c | 76 +++++++++++--
15 files changed, 242 insertions(+), 103 deletions(-)
create mode 100644 tools/include/linux/pidfd.h
create mode 100644 tools/testing/selftests/pidfd/pidfd_helpers.h
--
2.48.1
Hello,
kernel test robot noticed "kernel-selftests.lib.prime_numbers.sh.fail" on:
commit: 313b38a6ecb46db4002925af91b64df4f2b76d1f ("lib/prime_numbers: convert self-test to KUnit")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
[test failed on linux-next/master 0ae0fa3bf0b44c8611d114a9f69985bf451010c3]
in testcase: kernel-selftests
version: kernel-selftests-x86_64-78a632a2086c-1_20250215
with following parameters:
group: lib
config: x86_64-rhel-9.4-kselftests
compiler: gcc-12
test machine: 4 threads Intel(R) Xeon(R) CPU E3-1225 v5 @ 3.30GHz (Skylake) with 16G memory
(please refer to attached dmesg/kmsg for entire log/backtrace)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang(a)intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202502171110.708d965a-lkp@intel.com
# timeout set to 300
# selftests: lib: prime_numbers.sh
# Warning: file prime_numbers.sh is missing!
not ok 3 selftests: lib: prime_numbers.sh
(missed the change for tools/testing/selftests/lib/Makefile?)
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250217/202502171110.708d965a-lkp@…
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
Hi,
This series carries forward the effort to add Kselftest for PCI Endpoint
Subsystem started by Aman Gupta [1] a while ago. I reworked the initial version
based on another patch that fixes the return values of IOCTLs in
pci_endpoint_test driver and did many cleanups. Since the resulting work
modified the initial version substantially, I took over the authorship.
This series also incorporates the review comment by Shuah Khan [2] to move the
existing tests from 'tools/pci' to 'tools/testing/kselftest/pci_endpoint' before
migrating to Kselftest framework. I made sure that the tests are executable in
each commit and updated documentation accordingly.
- Mani
[1] https://lore.kernel.org/linux-pci/20221007053934.5188-1-aman1.gupta@samsung…
[2] https://lore.kernel.org/linux-pci/b2a5db97-dc59-33ab-71cd-f591e0b1b34d@linu…
Changes in v6:
* Fixed the documentation to pass max MSI and MSI-X count to configfs
* Collected tags
Changes in v5:
* Incorporated comments from Niklas
* Added a patch to fix the DMA MEMCPY check in pci-epf-test driver
* Collected tags
* Rebased on top of pci/next 0333f56dbbf7ef6bb46d2906766c3e1b2a04a94d
Changes in v4:
* Dropped the BAR fix patches and submitted them separately:
https://lore.kernel.org/linux-pci/20241231130224.38206-1-manivannan.sadhasi…
* Rebased on top of pci/next 9e1b45d7a5bc0ad20f6b5267992da422884b916e
Changes in v3:
* Collected tags.
* Added a note about failing testcase 10 and command to skip it in
documentation.
* Removed Aman Gupta and Padmanabhan Rajanbabu from CC as their addresses are
bouncing.
Changes in v2:
* Added a patch that fixes return values of IOCTL in pci_endpoint_test driver
* Moved the existing tests to new location before migrating
* Added a fix for BARs on Qcom devices
* Updated documentation and also added fixture variants for memcpy & DMA modes
Manivannan Sadhasivam (4):
PCI: endpoint: pci-epf-test: Fix the check for DMA MEMCPY test
misc: pci_endpoint_test: Fix the return value of IOCTL
selftests: Move PCI Endpoint tests from tools/pci to Kselftests
selftests: pci_endpoint: Migrate to Kselftest framework
Documentation/PCI/endpoint/pci-test-howto.rst | 174 +++++-------
MAINTAINERS | 2 +-
drivers/misc/pci_endpoint_test.c | 255 +++++++++--------
drivers/pci/endpoint/functions/pci-epf-test.c | 4 +-
tools/pci/Build | 1 -
tools/pci/Makefile | 58 ----
tools/pci/pcitest.c | 264 ------------------
tools/pci/pcitest.sh | 73 -----
tools/testing/selftests/Makefile | 1 +
.../testing/selftests/pci_endpoint/.gitignore | 2 +
tools/testing/selftests/pci_endpoint/Makefile | 7 +
tools/testing/selftests/pci_endpoint/config | 4 +
.../pci_endpoint/pci_endpoint_test.c | 221 +++++++++++++++
13 files changed, 437 insertions(+), 629 deletions(-)
delete mode 100644 tools/pci/Build
delete mode 100644 tools/pci/Makefile
delete mode 100644 tools/pci/pcitest.c
delete mode 100644 tools/pci/pcitest.sh
create mode 100644 tools/testing/selftests/pci_endpoint/.gitignore
create mode 100644 tools/testing/selftests/pci_endpoint/Makefile
create mode 100644 tools/testing/selftests/pci_endpoint/config
create mode 100644 tools/testing/selftests/pci_endpoint/pci_endpoint_test.c
--
2.25.1
tun used to simply advance iov_iter when it needs to pad virtio header,
which leaves the garbage in the buffer as is. This is especially
problematic when tun starts to allow enabling the hash reporting
feature; even if the feature is enabled, the packet may lack a hash
value and may contain a hole in the virtio header because the packet
arrived before the feature gets enabled or does not contain the
header fields to be hashed. If the hole is not filled with zero, it is
impossible to tell if the packet lacks a hash value.
In theory, a user of tun can fill the buffer with zero before calling
read() to avoid such a problem, but leaving the garbage in the buffer is
awkward anyway so fill the buffer in tun.
The specification also says the device MUST set num_buffers to 1 when
the field is present so set it when the specified header size is big
enough to contain the field.
Signed-off-by: Akihiko Odaki <akihiko.odaki(a)daynix.com>
---
drivers/net/tap.c | 2 +-
drivers/net/tun.c | 6 ++++--
drivers/net/tun_vnet.h | 14 +++++++++-----
3 files changed, 14 insertions(+), 8 deletions(-)
diff --git a/drivers/net/tap.c b/drivers/net/tap.c
index 1287e241f4454fb8ec4975bbaded5fbaa88e3cc8..d96009153c316f669e626d95002e5fe8add3a1b2 100644
--- a/drivers/net/tap.c
+++ b/drivers/net/tap.c
@@ -711,7 +711,7 @@ static ssize_t tap_put_user(struct tap_queue *q,
int total;
if (q->flags & IFF_VNET_HDR) {
- struct virtio_net_hdr vnet_hdr;
+ struct virtio_net_hdr_v1 vnet_hdr;
vnet_hdr_len = READ_ONCE(q->vnet_hdr_sz);
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index b14231a743915c2851eaae49d757b763ec4a8841..a3aed7e42c63d8b8f523c0141c7d970ab185178c 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1987,7 +1987,9 @@ static ssize_t tun_put_user_xdp(struct tun_struct *tun,
ssize_t ret;
if (tun->flags & IFF_VNET_HDR) {
- struct virtio_net_hdr gso = { 0 };
+ struct virtio_net_hdr_v1 gso = {
+ .num_buffers = cpu_to_tun_vnet16(tun->flags, 1)
+ };
vnet_hdr_sz = READ_ONCE(tun->vnet_hdr_sz);
ret = tun_vnet_hdr_put(vnet_hdr_sz, iter, &gso);
@@ -2040,7 +2042,7 @@ static ssize_t tun_put_user(struct tun_struct *tun,
}
if (vnet_hdr_sz) {
- struct virtio_net_hdr gso;
+ struct virtio_net_hdr_v1 gso;
ret = tun_vnet_hdr_from_skb(tun->flags, tun->dev, skb, &gso);
if (ret)
diff --git a/drivers/net/tun_vnet.h b/drivers/net/tun_vnet.h
index fd7411c4447ffb180e032fe3e22f6709c30da8e9..b4f406f522728f92266898969831c26a87930f6a 100644
--- a/drivers/net/tun_vnet.h
+++ b/drivers/net/tun_vnet.h
@@ -135,15 +135,17 @@ static inline int tun_vnet_hdr_get(int sz, unsigned int flags,
}
static inline int tun_vnet_hdr_put(int sz, struct iov_iter *iter,
- const struct virtio_net_hdr *hdr)
+ const struct virtio_net_hdr_v1 *hdr)
{
+ int content_sz = MIN(sizeof(*hdr), sz);
+
if (unlikely(iov_iter_count(iter) < sz))
return -EINVAL;
- if (unlikely(copy_to_iter(hdr, sizeof(*hdr), iter) != sizeof(*hdr)))
+ if (unlikely(copy_to_iter(hdr, content_sz, iter) != content_sz))
return -EFAULT;
- iov_iter_advance(iter, sz - sizeof(*hdr));
+ iov_iter_zero(sz - content_sz, iter);
return 0;
}
@@ -157,11 +159,11 @@ static inline int tun_vnet_hdr_to_skb(unsigned int flags, struct sk_buff *skb,
static inline int tun_vnet_hdr_from_skb(unsigned int flags,
const struct net_device *dev,
const struct sk_buff *skb,
- struct virtio_net_hdr *hdr)
+ struct virtio_net_hdr_v1 *hdr)
{
int vlan_hlen = skb_vlan_tag_present(skb) ? VLAN_HLEN : 0;
- if (virtio_net_hdr_from_skb(skb, hdr,
+ if (virtio_net_hdr_from_skb(skb, (struct virtio_net_hdr *)hdr,
tun_vnet_is_little_endian(flags), true,
vlan_hlen)) {
struct skb_shared_info *sinfo = skb_shinfo(skb);
@@ -179,6 +181,8 @@ static inline int tun_vnet_hdr_from_skb(unsigned int flags,
return -EINVAL;
}
+ hdr->num_buffers = cpu_to_tun_vnet16(flags, 1);
+
return 0;
}
---
base-commit: f54eab84fc17ef79b701e29364b7d08ca3a1d2f6
change-id: 20250116-buffers-96e14bf023fc
prerequisite-change-id: 20241230-tun-66e10a49b0c7:v6
prerequisite-patch-id: 871dc5f146fb6b0e3ec8612971a8e8190472c0fb
prerequisite-patch-id: 2797ed249d32590321f088373d4055ff3f430a0e
prerequisite-patch-id: ea3370c72d4904e2f0536ec76ba5d26784c0cede
prerequisite-patch-id: 837e4cf5d6b451424f9b1639455e83a260c4440d
prerequisite-patch-id: ea701076f57819e844f5a35efe5cbc5712d3080d
prerequisite-patch-id: 701646fb43ad04cc64dd2bf13c150ccbe6f828ce
prerequisite-patch-id: 53176dae0c003f5b6c114d43f936cf7140d31bb5
Best regards,
--
Akihiko Odaki <akihiko.odaki(a)daynix.com>
Fixes a minor grammatical issue in a comment in close_range_test.c
where "and and" was mistakenly repeated.
Signed-off-by: Imanol <imvalient(a)protonmail.com>
---
tools/testing/selftests/core/close_range_test.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/core/close_range_test.c b/tools/testing/selftests/core/close_range_test.c
index e0d9851fe1c9..c19e8d037211 100644
--- a/tools/testing/selftests/core/close_range_test.c
+++ b/tools/testing/selftests/core/close_range_test.c
@@ -506,7 +506,7 @@ TEST(close_range_cloexec_unshare_syzbot)
/*
* Create a huge gap in the fd table. When we now call
- * CLOSE_RANGE_UNSHARE with a shared fd table and and with ~0U as upper
+ * CLOSE_RANGE_UNSHARE with a shared fd table and with ~0U as upper
* bound the kernel will only copy up to fd1 file descriptors into the
* new fd table. If the kernel is buggy and doesn't handle
* CLOSE_RANGE_CLOEXEC correctly it will not have copied all file
--
2.43.0
While taking a look at '[PATCH net] pktgen: Avoid out-of-range in
get_imix_entries' ([1]) and '[PATCH net v2] pktgen: Avoid out-of-bounds
access in get_imix_entries' ([2], [3]) and doing some tests and code review
I detected that the /proc/net/pktgen/... parsing logic does not honour the
user given buffer bounds (resulting in out-of-bounds access).
This can be observed e.g. by the following simple test (sometimes the
old/'longer' previous value is re-read from the buffer):
$ echo add_device lo@0 > /proc/net/pktgen/kpktgend_0
$ echo "min_pkt_size 12345" > /proc/net/pktgen/lo\@0 && grep min_pkt_size /proc/net/pktgen/lo\@0
Params: count 1000 min_pkt_size: 12345 max_pkt_size: 0
Result: OK: min_pkt_size=12345
$ echo -n "min_pkt_size 123" > /proc/net/pktgen/lo\@0 && grep min_pkt_size /proc/net/pktgen/lo\@0
Params: count 1000 min_pkt_size: 12345 max_pkt_size: 0
Result: OK: min_pkt_size=12345
$ echo "min_pkt_size 123" > /proc/net/pktgen/lo\@0 && grep min_pkt_size /proc/net/pktgen/lo\@0
Params: count 1000 min_pkt_size: 123 max_pkt_size: 0
Result: OK: min_pkt_size=123
So fix the out-of-bounds access (and some minor findings) and add a simple
proc_net_pktgen selftest...
Patch set splited into part I
- net: pktgen: replace ENOTSUPP with EOPNOTSUPP
- net: pktgen: enable 'param=value' parsing
- net: pktgen: fix hex32_arg parsing for short reads
- net: pktgen: fix 'rate 0' error handling (return -EINVAL)
- net: pktgen: fix 'ratep 0' error handling (return -EINVAL)
- net: pktgen: fix ctrl interface command parsing
- net: pktgen: fix access outside of user given buffer in pktgen_thread_write()
And part II (this one):
- net: pktgen: use defines for the various dec/hex number parsing digits lengths
- net: pktgen: fix mix of int/long
- net: pktgen: remove extra tmp variable (re-use len instead)
- net: pktgen: remove some superfluous variable initializing
- net: pktgen: fix mpls maximum labels list parsing
- net: pktgen: fix access outside of user given buffer in pktgen_if_write()
- net: pktgen: fix mpls reset parsing
- net: pktgen: remove all superfluous index assignements
- selftest: net: add proc_net_pktgen
Regards,
Peter
Changes v4 -> v5:
- split up patchset into part i/ii (suggested by Simon Horman)
- add rev-by Simon Horman
- net: pktgen: align some variable declarations to the most common pattern
-> net: pktgen: fix mix of int/long
- instead of align to most common pattern (int) adjust all usages to
size_t for i and max and ssize_t for len and adjust function signatures
of hex32_arg(), count_trail_chars(), num_arg() and strn_len() accordingly
- respect reverse xmas tree order for local variable declarations (where
possible without too much code churn)
- update subject line and patch description
- dropped net: pktgen: hex32_arg/num_arg error out in case no characters are
available
- keep empty hex/num arg is implicit assumed as zero value
- dropped net: pktgen: num_arg error out in case no valid character is parsed
- keep empty hex/num arg is implicit assumed as zero value
- Change patch description ('Fixes:' -> 'Addresses the following:',
suggested by Simon Horman)
- net: pktgen: remove all superfluous index assignements
- new patch (suggested by Simon Horman)
- selftest: net: add proc_net_pktgen
- addapt to dropped patch 'net: pktgen: hex32_arg/num_arg error out in case
no characters are available', empty hex/num arg is now implicit assumed as
zero value (instead of failure)
Changes v3 -> v4:
- add rev-by Simon Horman
- new patch 'net: pktgen: use defines for the various dec/hex number parsing
digits lengths' (suggested by Simon Horman)
- replace C99 comment (suggested by Paolo Abeni)
- drop available characters check in strn_len() (suggested by Paolo Abeni)
- factored out patch 'net: pktgen: align some variable declarations to the
most common pattern' (suggested by Paolo Abeni)
- factored out patch 'net: pktgen: remove extra tmp variable (re-use len
instead)' (suggested by Paolo Abeni)
- factored out patch 'net: pktgen: remove some superfluous variable
initializing' (suggested by Paolo Abeni)
- factored out patch 'net: pktgen: fix mpls maximum labels list parsing'
(suggested by Paolo Abeni)
- factored out 'net: pktgen: hex32_arg/num_arg error out in case no
characters are available' (suggested by Paolo Abeni)
- factored out 'net: pktgen: num_arg error out in case no valid character
is parsed' (suggested by Paolo Abeni)
Changes v2 -> v3:
- new patch: 'net: pktgen: fix ctrl interface command parsing'
- new patch: 'net: pktgen: fix mpls reset parsing'
- tools/testing/selftests/net/proc_net_pktgen.c:
- fix typo in change description ('v1 -> v1' and tyop)
- rename some vars to better match usage
add_loopback_0 -> thr_cmd_add_loopback_0
rm_loopback_0 -> thr_cmd_rm_loopback_0
wrong_ctrl_cmd -> wrong_thr_cmd
legacy_ctrl_cmd -> legacy_thr_cmd
ctrl_fd -> thr_fd
- add ctrl interface tests
Changes v1 -> v2:
- new patch: 'net: pktgen: fix hex32_arg parsing for short reads'
- new patch: 'net: pktgen: fix 'rate 0' error handling (return -EINVAL)'
- new patch: 'net: pktgen: fix 'ratep 0' error handling (return -EINVAL)'
- net/core/pktgen.c: additional fix get_imix_entries() and get_labels()
- tools/testing/selftests/net/proc_net_pktgen.c:
- fix tyop not vs. nod (suggested by Jakub Kicinski)
- fix misaligned line (suggested by Jakub Kicinski)
- enable fomerly commented out CONFIG_XFRM dependent test (command spi),
as CONFIG_XFRM is enabled via tools/testing/selftests/net/config
CONFIG_XFRM_INTERFACE/CONFIG_XFRM_USER (suggestex by Jakub Kicinski)
- add CONFIG_NET_PKTGEN=m to tools/testing/selftests/net/config
(suggested by Jakub Kicinski)
- add modprobe pktgen to FIXTURE_SETUP() (suggested by Jakub Kicinski)
- fix some checkpatch warnings (Missing a blank line after declarations)
- shrink line length by re-naming some variables (command -> cmd,
device -> dev)
- add 'rate 0' testcase
- add 'ratep 0' testcase
[1] https://lore.kernel.org/netdev/20241006221221.3744995-1-artem.chernyshev@re…
[2] https://lore.kernel.org/netdev/20250109083039.14004-1-pchelkin@ispras.ru/
[3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…
Peter Seiderer (8):
net: pktgen: fix mix of int/long
net: pktgen: remove extra tmp variable (re-use len instead)
net: pktgen: remove some superfluous variable initializing
net: pktgen: fix mpls maximum labels list parsing
net: pktgen: fix access outside of user given buffer in
pktgen_if_write()
net: pktgen: fix mpls reset parsing
net: pktgen: remove all superfluous index assignements
selftest: net: add proc_net_pktgen
net/core/pktgen.c | 288 ++++----
tools/testing/selftests/net/Makefile | 1 +
tools/testing/selftests/net/config | 1 +
tools/testing/selftests/net/proc_net_pktgen.c | 646 ++++++++++++++++++
4 files changed, 806 insertions(+), 130 deletions(-)
create mode 100644 tools/testing/selftests/net/proc_net_pktgen.c
--
2.48.1
This series expands the XDP TX metadata framework to allow user
applications to pass per packet 64-bit launch time directly to the kernel
driver, requesting launch time hardware offload support. The XDP TX
metadata framework will not perform any clock conversion or packet
reordering.
Please note that the role of Tx metadata is just to pass the launch time,
not to enable the offload feature. Users will need to enable the launch
time hardware offload feature of the device by using the respective
command, such as the tc-etf command.
Although some devices use the tc-etf command to enable their launch time
hardware offload feature, xsk packets will not go through the etf qdisc.
Therefore, in my opinion, the launch time should always be based on the PTP
Hardware Clock (PHC). Thus, i did not include a clock ID to indicate the
clock source.
To simplify the test steps, I modified the xdp_hw_metadata bpf self-test
tool in such a way that it will set the launch time based on the offset
provided by the user and the value of the Receive Hardware Timestamp, which
is against the PHC. This will eliminate the need to discipline System Clock
with the PHC and then use clock_gettime() to get the time.
Please note that AF_XDP lacks a feedback mechanism to inform the
application if the requested launch time is invalid. So, users are expected
to familiar with the horizon of the launch time of the device they use and
not request a launch time that is beyond the horizon. Otherwise, the driver
might interpret the launch time incorrectly and react wrongly. For stmmac
and igc, where modulo computation is used, a launch time larger than the
horizon will cause the device to transmit the packet earlier that the
requested launch time.
Although there is no feedback mechanism for the launch time request
for now, user still can check whether the requested launch time is
working or not, by requesting the Transmit Completion Hardware Timestamp.
v11:
- regenerate netdev_xsk_flags based on latest netdev.yaml (Jakub)
v10: https://lore.kernel.org/netdev/20250207021943.814768-1-yoong.siang.song@int…
- use net_err_ratelimited(), instead of net_ratelimit() (Maciej)
- accumulate the amount of used descs in local variable and update the
igc_metadata_request::used_desc once (Maciej)
- Ensure reverse christmas tree rule (Maciej)
V9: https://lore.kernel.org/netdev/20250206060408.808325-1-yoong.siang.song@int…
- Remove the igc_desc_unused() checking (Maciej)
- Ensure that skb allocation and DMA mapping work before proceeding to
fill in igc_tx_buffer info, context desc, and data desc (Maciej)
- Rate limit the error messages (Maciej)
- Update the comment to indicate that the 2 descriptors needed by the
empty frame are already taken into consideration (Maciej)
- Handle the case where the insertion of an empty frame fails and
explain the reason behind (Maciej)
- put self SOB tag as last tag (Maciej)
V8: https://lore.kernel.org/netdev/20250205024116.798862-1-yoong.siang.song@int…
- check the number of used descriptor in xsk_tx_metadata_request()
by using used_desc of struct igc_metadata_request, and then decreases
the budget with it (Maciej)
- submit another bug fix patch to set the buffer type for empty frame (Maciej):
https://lore.kernel.org/netdev/20250205023603.798819-1-yoong.siang.song@int…
V7: https://lore.kernel.org/netdev/20250204004907.789330-1-yoong.siang.song@int…
- split the refactoring code of igc empty packet insertion into a separate
commit (Faizal)
- add explanation on why the value "4" is used as igc transmit budget
(Faizal)
- perform a stress test by sending 1000 packets with 10ms interval and
launch time set to 500us in the future (Faizal & Yong Liang)
V6: https://lore.kernel.org/netdev/20250116155350.555374-1-yoong.siang.song@int…
- fix selftest build errors by using asprintf() and realloc(), instead of
managing the buffer sizes manually (Daniel, Stanislav)
V5: https://lore.kernel.org/netdev/20250114152718.120588-1-yoong.siang.song@int…
- change netdev feature name from tx-launch-time to tx-launch-time-fifo
to explicitly state the FIFO behaviour (Stanislav)
- improve the looping of xdp_hw_metadata app to wait for packet tx
completion to be more readable by using clock_gettime() (Stanislav)
- add launch time setup steps into xdp_hw_metadata app (Stanislav)
V4: https://lore.kernel.org/netdev/20250106135506.9687-1-yoong.siang.song@intel…
- added XDP launch time support to the igc driver (Jesper & Florian)
- added per-driver launch time limitation on xsk-tx-metadata.rst (Jesper)
- added explanation on FIFO behavior on xsk-tx-metadata.rst (Jakub)
- added step to enable launch time in the commit message (Jesper & Willem)
- explicitly documented the type of launch_time and which clock source
it is against (Willem)
V3: https://lore.kernel.org/netdev/20231203165129.1740512-1-yoong.siang.song@in…
- renamed to use launch time (Jesper & Willem)
- changed the default launch time in xdp_hw_metadata apps from 1s to 0.1s
because some NICs do not support such a large future time.
V2: https://lore.kernel.org/netdev/20231201062421.1074768-1-yoong.siang.song@in…
- renamed to use Earliest TxTime First (Willem)
- renamed to use txtime (Willem)
V1: https://lore.kernel.org/netdev/20231130162028.852006-1-yoong.siang.song@int…
Song Yoong Siang (5):
xsk: Add launch time hardware offload support to XDP Tx metadata
selftests/bpf: Add launch time request to xdp_hw_metadata
net: stmmac: Add launch time support to XDP ZC
igc: Refactor empty frame insertion for launch time support
igc: Add launch time support to XDP ZC
Documentation/netlink/specs/netdev.yaml | 4 +
Documentation/networking/xsk-tx-metadata.rst | 62 +++++++
drivers/net/ethernet/intel/igc/igc.h | 1 +
drivers/net/ethernet/intel/igc/igc_main.c | 143 +++++++++++----
drivers/net/ethernet/stmicro/stmmac/stmmac.h | 2 +
.../net/ethernet/stmicro/stmmac/stmmac_main.c | 13 ++
include/net/xdp_sock.h | 10 ++
include/net/xdp_sock_drv.h | 1 +
include/uapi/linux/if_xdp.h | 10 ++
include/uapi/linux/netdev.h | 3 +
net/core/netdev-genl.c | 2 +
net/xdp/xsk.c | 3 +
tools/include/uapi/linux/if_xdp.h | 10 ++
tools/include/uapi/linux/netdev.h | 3 +
tools/testing/selftests/bpf/xdp_hw_metadata.c | 168 +++++++++++++++++-
15 files changed, 396 insertions(+), 39 deletions(-)
--
2.34.1
This series expands the XDP TX metadata framework to allow user
applications to pass per packet 64-bit launch time directly to the kernel
driver, requesting launch time hardware offload support. The XDP TX
metadata framework will not perform any clock conversion or packet
reordering.
Please note that the role of Tx metadata is just to pass the launch time,
not to enable the offload feature. Users will need to enable the launch
time hardware offload feature of the device by using the respective
command, such as the tc-etf command.
Although some devices use the tc-etf command to enable their launch time
hardware offload feature, xsk packets will not go through the etf qdisc.
Therefore, in my opinion, the launch time should always be based on the PTP
Hardware Clock (PHC). Thus, i did not include a clock ID to indicate the
clock source.
To simplify the test steps, I modified the xdp_hw_metadata bpf self-test
tool in such a way that it will set the launch time based on the offset
provided by the user and the value of the Receive Hardware Timestamp, which
is against the PHC. This will eliminate the need to discipline System Clock
with the PHC and then use clock_gettime() to get the time.
Please note that AF_XDP lacks a feedback mechanism to inform the
application if the requested launch time is invalid. So, users are expected
to familiar with the horizon of the launch time of the device they use and
not request a launch time that is beyond the horizon. Otherwise, the driver
might interpret the launch time incorrectly and react wrongly. For stmmac
and igc, where modulo computation is used, a launch time larger than the
horizon will cause the device to transmit the packet earlier that the
requested launch time.
Although there is no feedback mechanism for the launch time request
for now, user still can check whether the requested launch time is
working or not, by requesting the Transmit Completion Hardware Timestamp.
v10:
- use net_err_ratelimited(), instead of net_ratelimit() (Maciej)
- accumulate the amount of used descs in local variable and update the
igc_metadata_request::used_desc once (Maciej)
- Ensure reverse christmas tree rule (Maciej)
V9: https://lore.kernel.org/netdev/20250206060408.808325-1-yoong.siang.song@int…
- Remove the igc_desc_unused() checking (Maciej)
- Ensure that skb allocation and DMA mapping work before proceeding to
fill in igc_tx_buffer info, context desc, and data desc (Maciej)
- Rate limit the error messages (Maciej)
- Update the comment to indicate that the 2 descriptors needed by the
empty frame are already taken into consideration (Maciej)
- Handle the case where the insertion of an empty frame fails and
explain the reason behind (Maciej)
- put self SOB tag as last tag (Maciej)
V8: https://lore.kernel.org/netdev/20250205024116.798862-1-yoong.siang.song@int…
- check the number of used descriptor in xsk_tx_metadata_request()
by using used_desc of struct igc_metadata_request, and then decreases
the budget with it (Maciej)
- submit another bug fix patch to set the buffer type for empty frame (Maciej):
https://lore.kernel.org/netdev/20250205023603.798819-1-yoong.siang.song@int…
V7: https://lore.kernel.org/netdev/20250204004907.789330-1-yoong.siang.song@int…
- split the refactoring code of igc empty packet insertion into a separate
commit (Faizal)
- add explanation on why the value "4" is used as igc transmit budget
(Faizal)
- perform a stress test by sending 1000 packets with 10ms interval and
launch time set to 500us in the future (Faizal & Yong Liang)
V6: https://lore.kernel.org/netdev/20250116155350.555374-1-yoong.siang.song@int…
- fix selftest build errors by using asprintf() and realloc(), instead of
managing the buffer sizes manually (Daniel, Stanislav)
V5: https://lore.kernel.org/netdev/20250114152718.120588-1-yoong.siang.song@int…
- change netdev feature name from tx-launch-time to tx-launch-time-fifo
to explicitly state the FIFO behaviour (Stanislav)
- improve the looping of xdp_hw_metadata app to wait for packet tx
completion to be more readable by using clock_gettime() (Stanislav)
- add launch time setup steps into xdp_hw_metadata app (Stanislav)
V4: https://lore.kernel.org/netdev/20250106135506.9687-1-yoong.siang.song@intel…
- added XDP launch time support to the igc driver (Jesper & Florian)
- added per-driver launch time limitation on xsk-tx-metadata.rst (Jesper)
- added explanation on FIFO behavior on xsk-tx-metadata.rst (Jakub)
- added step to enable launch time in the commit message (Jesper & Willem)
- explicitly documented the type of launch_time and which clock source
it is against (Willem)
V3: https://lore.kernel.org/netdev/20231203165129.1740512-1-yoong.siang.song@in…
- renamed to use launch time (Jesper & Willem)
- changed the default launch time in xdp_hw_metadata apps from 1s to 0.1s
because some NICs do not support such a large future time.
V2: https://lore.kernel.org/netdev/20231201062421.1074768-1-yoong.siang.song@in…
- renamed to use Earliest TxTime First (Willem)
- renamed to use txtime (Willem)
V1: https://lore.kernel.org/netdev/20231130162028.852006-1-yoong.siang.song@int…
Song Yoong Siang (5):
xsk: Add launch time hardware offload support to XDP Tx metadata
selftests/bpf: Add launch time request to xdp_hw_metadata
net: stmmac: Add launch time support to XDP ZC
igc: Refactor empty frame insertion for launch time support
igc: Add launch time support to XDP ZC
Documentation/netlink/specs/netdev.yaml | 4 +
Documentation/networking/xsk-tx-metadata.rst | 62 +++++++
drivers/net/ethernet/intel/igc/igc.h | 1 +
drivers/net/ethernet/intel/igc/igc_main.c | 143 +++++++++++----
drivers/net/ethernet/stmicro/stmmac/stmmac.h | 2 +
.../net/ethernet/stmicro/stmmac/stmmac_main.c | 13 ++
include/net/xdp_sock.h | 10 ++
include/net/xdp_sock_drv.h | 1 +
include/uapi/linux/if_xdp.h | 10 ++
include/uapi/linux/netdev.h | 3 +
net/core/netdev-genl.c | 2 +
net/xdp/xsk.c | 3 +
tools/include/uapi/linux/if_xdp.h | 10 ++
tools/include/uapi/linux/netdev.h | 3 +
tools/testing/selftests/bpf/xdp_hw_metadata.c | 168 +++++++++++++++++-
15 files changed, 396 insertions(+), 39 deletions(-)
--
2.34.1