Dear Linux folks,
Running the selftests in a QEMU q35 VM, they hang.
```
$ make kselftest
[…]
ok 6 selftests: pidfd: pidfd_getfd_test
# timeout set to 45
# selftests: pidfd: pidfd_setns_test
# TAP version 13
# 1..7
# # Starting 7 tests from 2 test cases.
# # RUN global.setns_einval ...
# # OK global.setns_einval
# ok 1 global.setns_einval
# # RUN current_nsset.invalid_flags ...
# # pidfd_setns_test.c:161:invalid_flags:Expected self->child_pid_exited
(0) > 0 (0)
# # OK current_nsset.invalid_flags
# ok 2 current_nsset.invalid_flags
# # RUN current_nsset.pidfd_exited_child ...
# # pidfd_setns_test.c:161:pidfd_exited_child:Expected
self->child_pid_exited (0) > 0 (0)
# # OK current_nsset.pidfd_exited_child
# ok 3 current_nsset.pidfd_exited_child
# # RUN current_nsset.pidfd_incremental_setns ...
# # pidfd_setns_test.c:161:pidfd_incremental_setns:Expected
self->child_pid_exited (0) > 0 (0)
# # pidfd_setns_test.c:408:pidfd_incremental_setns:Managed to correctly
setns to user namespace of 31570 via pidfd 20
# # pidfd_setns_test.c:408:pidfd_incremental_setns:Managed to correctly
setns to mnt namespace of 31570 via pidfd 20
# # pidfd_setns_test.c:408:pidfd_incremental_setns:Managed to correctly
setns to pid namespace of 31570 via pidfd 20
# # pidfd_setns_test.c:408:pidfd_incremental_setns:Managed to correctly
setns to uts namespace of 31570 via pidfd 20
# # pidfd_setns_test.c:408:pidfd_incremental_setns:Managed to correctly
setns to ipc namespace of 31570 via pidfd 20
# # pidfd_setns_test.c:408:pidfd_incremental_setns:Managed to correctly
setns to net namespace of 31570 via pidfd 20
# # pidfd_setns_test.c:408:pidfd_incremental_setns:Managed to correctly
setns to cgroup namespace of 31570 via pidfd 20
# # pidfd_setns_test.c:408:pidfd_incremental_setns:Managed to correctly
setns to pid_for_children namespace of 31570 via pidfd 20
# # pidfd_setns_test.c:391:pidfd_incremental_setns:Expected
setns(self->child_pidfd1, info->flag) (-1) == 0 (0)
# # pidfd_setns_test.c:392:pidfd_incremental_setns:Too many users -
Failed to setns to time namespace of 31570 via pidfd 20
# # pidfd_incremental_setns: Test terminated by timeout
# # FAIL current_nsset.pidfd_incremental_setns
# not ok 4 current_nsset.pidfd_incremental_setns
# # RUN current_nsset.nsfd_incremental_setns ...
# # pidfd_setns_test.c:161:nsfd_incremental_setns:Expected
self->child_pid_exited (0) > 0 (0)
# # pidfd_setns_test.c:444:nsfd_incremental_setns:Managed to correctly
setns to user namespace of 31577 via nsfd 19
# # pidfd_setns_test.c:444:nsfd_incremental_setns:Managed to correctly
setns to mnt namespace of 31577 via nsfd 24
# # pidfd_setns_test.c:444:nsfd_incremental_setns:Managed to correctly
setns to pid namespace of 31577 via nsfd 27
# # pidfd_setns_test.c:444:nsfd_incremental_setns:Managed to correctly
setns to uts namespace of 31577 via nsfd 30
# # pidfd_setns_test.c:444:nsfd_incremental_setns:Managed to correctly
setns to ipc namespace of 31577 via nsfd 33
# # pidfd_setns_test.c:444:nsfd_incremental_setns:Managed to correctly
setns to net namespace of 31577 via nsfd 36
# # pidfd_setns_test.c:444:nsfd_incremental_setns:Managed to correctly
setns to cgroup namespace of 31577 via nsfd 39
# # pidfd_setns_test.c:444:nsfd_incremental_setns:Managed to correctly
setns to pid_for_children namespace of 31577 via nsfd 42
# # pidfd_setns_test.c:427:nsfd_incremental_setns:Expected
setns(self->child_nsfds1[i], info->flag) (-1) == 0 (0)
# # pidfd_setns_test.c:428:nsfd_incremental_setns:Too many users -
Failed to setns to time namespace of 31577 via nsfd 45
```
Ctrl + c gets me back to the shell prompt.
```
$ ps auxf
[…]
build 31574 0.0 0.0 2528 1024 pts/0 D 09:57 0:00
./pidfd_setns_test
build 31575 99.9 0.0 2528 1024 pts/0 R 09:57 37:37 \_
./pidfd_setns_test
build 31576 0.0 0.0 0 0 pts/0 Z 09:57 0:00
\_ [pidfd_setns_tes] <defunct>
build 31577 0.0 0.0 0 0 pts/0 Z 09:57 0:00
\_ [pidfd_setns_tes] <defunct>
build 31578 0.0 0.0 2528 512 pts/0 S 09:57 0:00
\_ ./pidfd_setns_test
```
During shutdown, the system waits for that process:
[42733.704347] systemd-shutdown[1]: Waiting for process: 31578
(pidfd_setns_tes)
This is commit 8d025e2092e2 with one patch on top:
$ git log --no-decorate --oneline -2 a2ce022afcbb
a2ce022afcbb [PATCH] kbuild: Disable KCSAN for autogenerated
*.mod.c intermediaries
8d025e2092e2 Merge tag 'erofs-for-6.9-rc2-fixes' of
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
KCSAN is not enabled though.
Kind regards,
Paul
As discussed on the bi-weekly call on Jan 30, and in mailing around
kernel CI effort, some changes are desirable in the suite of forwarding
selftests the better to work with the CI tooling. Namely:
- The forwarding selftests use a configuration file where names of
interfaces are defined and various variables can be overridden. There
is also forwarding.config.sample that users can use as a template to
refer to when creating the config file. What happens a fair bit is
that users either do not know about this at all, or simply forget, and
are confused by cryptic failures about interfaces that cannot be
created.
In patches #1 - #3 have lib.sh just be the single source of truth with
regards to which variables exist. That includes the topology variables
which were previously only in the sample file, and any "tweak
variables", such as what tools to use, sleep times, etc.
forwarding.config.sample then becomes just a placeholder with a couple
examples. Unless specific HW should be exercised, or specific tools
used, the defaults are usually just fine.
- Several net/forwarding/ selftests (and one net/ one) cannot be run on
veth pairs, they need an actual HW interface to run on. They are
generic in the sense that any capable HW should pass them, which is
why they have been put to net/forwarding/ as opposed to drivers/net/,
but they do not generalize to veth. The fact that these tests are in
net/forwarding/, but still complaining when run, is confusing.
In patches #4 - #6 move these tests to a new directory
drivers/net/hw.
- The following patches extend the codebase to handle well test results
other than pass and fail.
Patch #7 is preparatory. It converts several log_test_skip to XFAIL,
so that tests do not spuriously end up returning non-0 when they
are not supposed to.
In patches #8 - #10, introduce some missing ksft constants, then support
having those constants in RET, and then finally in EXIT_STATUS.
- The traffic scheduler tests generate a large amount of network traffic
to test the behavior of the scheduler. This demands a relatively
high-performance computer. On slow machines, such as with a debugging
kernel, the test would spuriously fail.
It can still be useful to "go through the motions" though, to possibly
catch bugs in setup of the scheduler graph and passing packets around.
Thus we still want to run the tests, just with lowered demands.
To that end, in patches #11 - #12, introduce an environment variable
KSFT_MACHINE_SLOW, with obvious meaning. Tests can then make checks
more lenient, such as mark failures as XFAIL. A helper, xfail_on_slow,
is provided to mark performance-sensitive parts of the selftest.
- In patch #13, use a similar mechanism to mark a NH group stats
selftest to XFAIL HW stats tests when run on VETH pairs.
- All these changes complicate the hitherto straightforward logging and
checking logic, so in patch #14, add a selftest that checks this
functionality in lib.sh.
v1 (vs. an RFC circulated through linux-kselftest):
- Patch #9:
- Clarify intended usage by s/set_ret/ret_set_ksft_status/,
s/nret/ksft_status/
Petr Machata (14):
selftests: net: libs: Change variable fallback syntax
selftests: forwarding.config.sample: Move overrides to lib.sh
selftests: forwarding: README: Document customization
selftests: forwarding: ipip_lib: Do not import lib.sh
selftests: forwarding: Move several selftests
selftests: forwarding: Ditch skip_on_veth()
selftests: forwarding: Change inappropriate log_test_skip() calls
selftests: lib: Define more kselftest exit codes
selftests: forwarding: Have RET track kselftest framework constants
selftests: forwarding: Convert log_test() to recognize RET values
selftests: forwarding: Support for performance sensitive tests
selftests: forwarding: Mark performance-sensitive tests
selftests: forwarding: router_mpath_nh_lib: Don't skip, xfail on veth
selftests: forwarding: Add a test for testing lib.sh functionality
.../testing/selftests/drivers/net/hw/Makefile | 25 ++
.../net/hw}/devlink_port_split.py | 0
.../forwarding => drivers/net/hw}/ethtool.sh | 5 +-
.../net/hw}/ethtool_extended_state.sh | 5 +-
.../net/hw}/ethtool_lib.sh | 0
.../net/hw}/ethtool_mm.sh | 3 +-
.../net/hw}/ethtool_rmon.sh | 7 +-
.../net/hw}/hw_stats_l3.sh | 19 +-
.../net/hw}/hw_stats_l3_gre.sh | 7 +-
.../forwarding => drivers/net/hw}/loopback.sh | 5 +-
.../testing/selftests/drivers/net/hw/settings | 1 +
.../selftests/drivers/net/mlxsw/mlxsw_lib.sh | 2 +-
.../net/mlxsw/spectrum-2/resource_scale.sh | 1 -
.../net/mlxsw/spectrum/resource_scale.sh | 1 -
tools/testing/selftests/net/Makefile | 1 -
.../testing/selftests/net/forwarding/Makefile | 9 +-
tools/testing/selftests/net/forwarding/README | 33 +++
.../net/forwarding/forwarding.config.sample | 53 ++--
.../selftests/net/forwarding/ipip_lib.sh | 1 -
tools/testing/selftests/net/forwarding/lib.sh | 255 +++++++++++++-----
.../selftests/net/forwarding/lib_sh_test.sh | 208 ++++++++++++++
.../net/forwarding/router_mpath_nh_lib.sh | 12 +-
.../selftests/net/forwarding/sch_ets_tests.sh | 19 +-
.../selftests/net/forwarding/sch_red.sh | 10 +-
.../selftests/net/forwarding/sch_tbf_core.sh | 2 +-
.../selftests/net/forwarding/tc_common.sh | 2 +-
.../selftests/net/forwarding/tc_tunnel_key.sh | 2 -
tools/testing/selftests/net/lib.sh | 48 +++-
28 files changed, 565 insertions(+), 171 deletions(-)
create mode 100644 tools/testing/selftests/drivers/net/hw/Makefile
rename tools/testing/selftests/{net => drivers/net/hw}/devlink_port_split.py (100%)
rename tools/testing/selftests/{net/forwarding => drivers/net/hw}/ethtool.sh (98%)
rename tools/testing/selftests/{net/forwarding => drivers/net/hw}/ethtool_extended_state.sh (96%)
rename tools/testing/selftests/{net/forwarding => drivers/net/hw}/ethtool_lib.sh (100%)
rename tools/testing/selftests/{net/forwarding => drivers/net/hw}/ethtool_mm.sh (99%)
rename tools/testing/selftests/{net/forwarding => drivers/net/hw}/ethtool_rmon.sh (92%)
rename tools/testing/selftests/{net/forwarding => drivers/net/hw}/hw_stats_l3.sh (96%)
rename tools/testing/selftests/{net/forwarding => drivers/net/hw}/hw_stats_l3_gre.sh (92%)
rename tools/testing/selftests/{net/forwarding => drivers/net/hw}/loopback.sh (92%)
create mode 100644 tools/testing/selftests/drivers/net/hw/settings
create mode 100755 tools/testing/selftests/net/forwarding/lib_sh_test.sh
--
2.43.0
Hi,
Commit 28b3df1fe6ba2cb4 ("kunit: add wireless unit tests") which I can't
seem to find on lore breaks full kunit runs on non-UML builds and is now
present in mainline. If I run:
./tools/testing/kunit/kunit.py run --alltests --cross_compile x86_64-linux-gnu- --arch x86_64
on a clean tree then I get:
[15:09:20] Configuring KUnit Kernel ...
Generating .config ...
Populating config with:
$ make ARCH=x86_64 O=.kunit olddefconfig CROSS_COMPILE=x86_64-linux-gnu-
ERROR:root:Not all Kconfig options selected in kunitconfig were in the generated .config.
This is probably due to unsatisfied dependencies.
Missing: CONFIG_IWLWIFI=y, CONFIG_WLAN_VENDOR_INTEL=y
UML works fine, but other real architectures (eg, arm64) seem similarly
broken. I've not looked properly yet, I'm a bit confused given that
there's not even any dependencies for WLAN_VENDOR_INTEL and it's not
mentoned in the defconfig.
Thanks,
Mark
RFC v7:
=======
Major Changes:
--------------
This revision largely rebases on top of net-next and addresses the feedback
RFCv6 received from folks, namely Jakub, Yunsheng, Arnd, David, & Pavel.
The series remains in RFC because the queue-API ndos defined in this
series are not yet implemented. I have a GVE implementation I carry out
of tree for my testing. A upstreamable GVE implementation is in the
works. Aside from that, in my estimation all the patches are ready for
review/merge. Please do take a look.
As usual the full devmem TCP changes including the full GVE driver
implementation is here:
https://github.com/mina/linux/commits/tcpdevmem-v7/
Detailed changelog:
- Use admin-perm in netlink API.
- Addressed feedback from Jakub with regards to netlink API
implementation.
- Renamed devmem.c functions to something more appropriate for that
file.
- Improve the performance seen through the page_pool benchmark.
- Fix the value definition of all the SO_DEVMEM_* uapi.
- Various fixes to documentation.
Perf - page-pool benchmark:
---------------------------
Improved performance of bench_page_pool_simple.ko tests compared to v6:
https://pastebin.com/raw/v5dYRg8L
net-next base: 8 cycle fast path.
RFC v6: 10 cycle fast path.
RFC v7: 9 cycle fast path.
RFC v7 with CONFIG_DMA_SHARED_BUFFER disabled: 8 cycle fast path,
same as baseline.
Perf - Devmem TCP benchmark:
---------------------
Perf is about the same regardless of the changes in v7, namely the
removal of the static_branch_unlikely to improve the page_pool benchmark
performance:
189/200gbps bi-directional throughput with RX devmem TCP and regular TCP
TX i.e. ~95% line rate.
RFC v6:
=======
Major Changes:
--------------
This revision largely rebases on top of net-next and addresses the little
feedback RFCv5 received.
The series remains in RFC because the queue-API ndos defined in this
series are not yet implemented. I have a GVE implementation I carry out
of tree for my testing. A upstreamable GVE implementation is in the
works. Aside from that, in my estimation all the patches are ready for
review/merge. Please do take a look.
As usual the full devmem TCP changes including the full GVE driver
implementation is here:
https://github.com/mina/linux/commits/tcpdevmem-v6/
This version also comes with some performance data recorded in the cover
letter (see below changelog).
Detailed changelog:
- Rebased on top of the merged netmem_ref changes.
- Converted skb->dmabuf to skb->readable (Pavel). Pavel's original
suggestion was to remove the skb->dmabuf flag entirely, but when I
looked into it closely, I found the issue that if we remove the flag
we have to dereference the shinfo(skb) pointer to obtain the first
frag to tell whether an skb is readable or not. This can cause a
performance regression if it dirties the cache line when the
shinfo(skb) was not really needed. Instead, I converted the skb->dmabuf
flag into a generic skb->readable flag which can be re-used by io_uring
0-copy RX.
- Squashed a few locking optimizations from Eric Dumazet in the RX path
and the DEVMEM_DONTNEED setsockopt.
- Expanded the tests a bit. Added validation for invalid scenarios and
added some more coverage.
Perf - page-pool benchmark:
---------------------------
bench_page_pool_simple.ko tests with and without these changes:
https://pastebin.com/raw/ncHDwAbn
AFAIK the number that really matters in the perf tests is the
'tasklet_page_pool01_fast_path Per elem'. This one measures at about 8
cycles without the changes but there is some 1 cycle noise in some
results.
With the patches this regresses to 9 cycles with the changes but there
is 1 cycle noise occasionally running this test repeatedly.
Lastly I tried disable the static_branch_unlikely() in
netmem_is_net_iov() check. To my surprise disabling the
static_branch_unlikely() check reduces the fast path back to 8 cycles,
but the 1 cycle noise remains.
Perf - Devmem TCP benchmark:
---------------------
189/200gbps bi-directional throughput with RX devmem TCP and regular TCP
TX i.e. ~95% line rate.
Major changes in RFC v5:
========================
1. Rebased on top of 'Abstract page from net stack' series and used the
new netmem type to refer to LSB set pointers instead of re-using
struct page.
2. Downgraded this series back to RFC and called it RFC v5. This is
because this series is now dependent on 'Abstract page from net
stack'[1] and the queue API. Both are removed from the series to
reduce the patch # and those bits are fairly independent or
pre-requisite work.
3. Reworked the page_pool devmem support to use netmem and for some
more unified handling.
4. Reworked the reference counting of net_iov (renamed from
page_pool_iov) to use pp_ref_count for refcounting.
The full changes including the dependent series and GVE page pool
support is here:
https://github.com/mina/linux/commits/tcpdevmem-rfcv5/
[1] https://patchwork.kernel.org/project/netdevbpf/list/?series=810774
Major changes in v1:
====================
1. Implemented MVP queue API ndos to remove the userspace-visible
driver reset.
2. Fixed issues in the napi_pp_put_page() devmem frag unref path.
3. Removed RFC tag.
Many smaller addressed comments across all the patches (patches have
individual change log).
Full tree including the rest of the GVE driver changes:
https://github.com/mina/linux/commits/tcpdevmem-v1
Changes in RFC v3:
==================
1. Pulled in the memory-provider dependency from Jakub's RFC[1] to make the
series reviewable and mergeable.
2. Implemented multi-rx-queue binding which was a todo in v2.
3. Fix to cmsg handling.
The sticking point in RFC v2[2] was the device reset required to refill
the device rx-queues after the dmabuf bind/unbind. The solution
suggested as I understand is a subset of the per-queue management ops
Jakub suggested or similar:
https://lore.kernel.org/netdev/20230815171638.4c057dcd@kernel.org/
This is not addressed in this revision, because:
1. This point was discussed at netconf & netdev and there is openness to
using the current approach of requiring a device reset.
2. Implementing individual queue resetting seems to be difficult for my
test bed with GVE. My prototype to test this ran into issues with the
rx-queues not coming back up properly if reset individually. At the
moment I'm unsure if it's a mistake in the POC or a genuine issue in
the virtualization stack behind GVE, which currently doesn't test
individual rx-queue restart.
3. Our usecases are not bothered by requiring a device reset to refill
the buffer queues, and we'd like to support NICs that run into this
limitation with resetting individual queues.
My thought is that drivers that have trouble with per-queue configs can
use the support in this series, while drivers that support new netdev
ops to reset individual queues can automatically reset the queue as
part of the dma-buf bind/unbind.
The same approach with device resets is presented again for consideration
with other sticking points addressed.
This proposal includes the rx devmem path only proposed for merge. For a
snapshot of my entire tree which includes the GVE POC page pool support &
device memory support:
https://github.com/torvalds/linux/compare/master...mina:linux:tcpdevmem-v3
[1] https://lore.kernel.org/netdev/f8270765-a27b-6ccf-33ea-cda097168d79@redhat.…
[2] https://lore.kernel.org/netdev/CAHS8izOVJGJH5WF68OsRWFKJid1_huzzUK+hpKbLcL4…
Changes in RFC v2:
==================
The sticking point in RFC v1[1] was the dma-buf pages approach we used to
deliver the device memory to the TCP stack. RFC v2 is a proof-of-concept
that attempts to resolve this by implementing scatterlist support in the
networking stack, such that we can import the dma-buf scatterlist
directly. This is the approach proposed at a high level here[2].
Detailed changes:
1. Replaced dma-buf pages approach with importing scatterlist into the
page pool.
2. Replace the dma-buf pages centric API with a netlink API.
3. Removed the TX path implementation - there is no issue with
implementing the TX path with scatterlist approach, but leaving
out the TX path makes it easier to review.
4. Functionality is tested with this proposal, but I have not conducted
perf testing yet. I'm not sure there are regressions, but I removed
perf claims from the cover letter until they can be re-confirmed.
5. Added Signed-off-by: contributors to the implementation.
6. Fixed some bugs with the RX path since RFC v1.
Any feedback welcome, but specifically the biggest pending questions
needing feedback IMO are:
1. Feedback on the scatterlist-based approach in general.
2. Netlink API (Patch 1 & 2).
3. Approach to handle all the drivers that expect to receive pages from
the page pool (Patch 6).
[1] https://lore.kernel.org/netdev/dfe4bae7-13a0-3c5d-d671-f61b375cb0b4@gmail.c…
[2] https://lore.kernel.org/netdev/CAHS8izPm6XRS54LdCDZVd0C75tA1zHSu6jLVO8nzTLX…
==================
* TL;DR:
Device memory TCP (devmem TCP) is a proposal for transferring data to and/or
from device memory efficiently, without bouncing the data to a host memory
buffer.
* Problem:
A large amount of data transfers have device memory as the source and/or
destination. Accelerators drastically increased the volume of such transfers.
Some examples include:
- ML accelerators transferring large amounts of training data from storage into
GPU/TPU memory. In some cases ML training setup time can be as long as 50% of
TPU compute time, improving data transfer throughput & efficiency can help
improving GPU/TPU utilization.
- Distributed training, where ML accelerators, such as GPUs on different hosts,
exchange data among them.
- Distributed raw block storage applications transfer large amounts of data with
remote SSDs, much of this data does not require host processing.
Today, the majority of the Device-to-Device data transfers the network are
implemented as the following low level operations: Device-to-Host copy,
Host-to-Host network transfer, and Host-to-Device copy.
The implementation is suboptimal, especially for bulk data transfers, and can
put significant strains on system resources, such as host memory bandwidth,
PCIe bandwidth, etc. One important reason behind the current state is the
kernel’s lack of semantics to express device to network transfers.
* Proposal:
In this patch series we attempt to optimize this use case by implementing
socket APIs that enable the user to:
1. send device memory across the network directly, and
2. receive incoming network packets directly into device memory.
Packet _payloads_ go directly from the NIC to device memory for receive and from
device memory to NIC for transmit.
Packet _headers_ go to/from host memory and are processed by the TCP/IP stack
normally. The NIC _must_ support header split to achieve this.
Advantages:
- Alleviate host memory bandwidth pressure, compared to existing
network-transfer + device-copy semantics.
- Alleviate PCIe BW pressure, by limiting data transfer to the lowest level
of the PCIe tree, compared to traditional path which sends data through the
root complex.
* Patch overview:
** Part 1: netlink API
Gives user ability to bind dma-buf to an RX queue.
** Part 2: scatterlist support
Currently the standard for device memory sharing is DMABUF, which doesn't
generate struct pages. On the other hand, networking stack (skbs, drivers, and
page pool) operate on pages. We have 2 options:
1. Generate struct pages for dmabuf device memory, or,
2. Modify the networking stack to process scatterlist.
Approach #1 was attempted in RFC v1. RFC v2 implements approach #2.
** part 3: page pool support
We piggy back on page pool memory providers proposal:
https://github.com/kuba-moo/linux/tree/pp-providers
It allows the page pool to define a memory provider that provides the
page allocation and freeing. It helps abstract most of the device memory
TCP changes from the driver.
** part 4: support for unreadable skb frags
Page pool iovs are not accessible by the host; we implement changes
throughput the networking stack to correctly handle skbs with unreadable
frags.
** Part 5: recvmsg() APIs
We define user APIs for the user to send and receive device memory.
Not included with this series is the GVE devmem TCP support, just to
simplify the review. Code available here if desired:
https://github.com/mina/linux/tree/tcpdevmem
This series is built on top of net-next with Jakub's pp-providers changes
cherry-picked.
* NIC dependencies:
1. (strict) Devmem TCP require the NIC to support header split, i.e. the
capability to split incoming packets into a header + payload and to put
each into a separate buffer. Devmem TCP works by using device memory
for the packet payload, and host memory for the packet headers.
2. (optional) Devmem TCP works better with flow steering support & RSS support,
i.e. the NIC's ability to steer flows into certain rx queues. This allows the
sysadmin to enable devmem TCP on a subset of the rx queues, and steer
devmem TCP traffic onto these queues and non devmem TCP elsewhere.
The NIC I have access to with these properties is the GVE with DQO support
running in Google Cloud, but any NIC that supports these features would suffice.
I may be able to help reviewers bring up devmem TCP on their NICs.
* Testing:
The series includes a udmabuf kselftest that show a simple use case of
devmem TCP and validates the entire data path end to end without
a dependency on a specific dmabuf provider.
** Test Setup
Kernel: net-next with this series and memory provider API cherry-picked
locally.
Hardware: Google Cloud A3 VMs.
NIC: GVE with header split & RSS & flow steering support.
Cc: Pavel Begunkov <asml.silence(a)gmail.com>
Cc: David Wei <dw(a)davidwei.uk>
Cc: Jason Gunthorpe <jgg(a)ziepe.ca>
Cc: Yunsheng Lin <linyunsheng(a)huawei.com>
Cc: Shailend Chand <shailend(a)google.com>
Cc: Harshitha Ramamurthy <hramamurthy(a)google.com>
Cc: Shakeel Butt <shakeel.butt(a)linux.dev>
Cc: Jeroen de Borst <jeroendb(a)google.com>
Cc: Praveen Kaligineedi <pkaligineedi(a)google.com>
Jakub Kicinski (1):
net: page_pool: create hooks for custom page providers
Mina Almasry (13):
queue_api: define queue api
net: netdev netlink api to bind dma-buf to a net device
netdev: support binding dma-buf to netdevice
netdev: netdevice devmem allocator
page_pool: convert to use netmem
page_pool: devmem support
memory-provider: dmabuf devmem memory provider
net: support non paged skb frags
net: add support for skbs with unreadable frags
tcp: RX path for devmem TCP
net: add SO_DEVMEM_DONTNEED setsockopt to release RX frags
net: add devmem TCP documentation
selftests: add ncdevmem, netcat for devmem TCP
Documentation/netlink/specs/netdev.yaml | 57 +++
Documentation/networking/devmem.rst | 256 +++++++++++
Documentation/networking/index.rst | 1 +
arch/alpha/include/uapi/asm/socket.h | 6 +
arch/mips/include/uapi/asm/socket.h | 6 +
arch/parisc/include/uapi/asm/socket.h | 6 +
arch/sparc/include/uapi/asm/socket.h | 6 +
include/linux/netdevice.h | 3 +
include/linux/skbuff.h | 73 +++-
include/linux/socket.h | 1 +
include/net/devmem.h | 124 ++++++
include/net/netdev_queues.h | 27 ++
include/net/netdev_rx_queue.h | 2 +
include/net/netmem.h | 234 +++++++++-
include/net/page_pool/helpers.h | 155 +++++--
include/net/page_pool/types.h | 33 +-
include/net/sock.h | 2 +
include/net/tcp.h | 5 +-
include/trace/events/page_pool.h | 29 +-
include/uapi/asm-generic/socket.h | 6 +
include/uapi/linux/netdev.h | 19 +
include/uapi/linux/uio.h | 17 +
net/bpf/test_run.c | 5 +-
net/core/Makefile | 2 +-
net/core/datagram.c | 6 +
net/core/dev.c | 6 +-
net/core/devmem.c | 425 ++++++++++++++++++
net/core/gro.c | 8 +-
net/core/netdev-genl-gen.c | 23 +
net/core/netdev-genl-gen.h | 6 +
net/core/netdev-genl.c | 107 +++++
net/core/page_pool.c | 364 +++++++++-------
net/core/skbuff.c | 110 ++++-
net/core/sock.c | 61 +++
net/ipv4/esp4.c | 2 +-
net/ipv4/tcp.c | 254 ++++++++++-
net/ipv4/tcp_input.c | 13 +-
net/ipv4/tcp_ipv4.c | 9 +
net/ipv4/tcp_minisocks.c | 2 +
net/ipv4/tcp_output.c | 5 +-
net/ipv6/esp6.c | 2 +-
net/packet/af_packet.c | 4 +-
tools/include/uapi/linux/netdev.h | 19 +
tools/testing/selftests/net/.gitignore | 1 +
tools/testing/selftests/net/Makefile | 5 +
tools/testing/selftests/net/ncdevmem.c | 546 ++++++++++++++++++++++++
46 files changed, 2776 insertions(+), 277 deletions(-)
create mode 100644 Documentation/networking/devmem.rst
create mode 100644 include/net/devmem.h
create mode 100644 net/core/devmem.c
create mode 100644 tools/testing/selftests/net/ncdevmem.c
--
2.44.0.396.g6e790dbe36-goog
Cleaning up after tests is implemented separately for individual tests
and called at the end of each test execution. Since these functions are
very similar and a more generalized test framework was introduced a
function pointer in the resctrl_test struct can be used to reduce the
amount of function calls.
These functions are also all called in the ctrl-c handler because the
handler isn't aware which test is currently running. Since the handler
is implemented with a sigaction no function parameters can be passed
there but information about what test is currently running can be passed
with a global variable.
Series applies cleanly on top of kselftests/next.
Changelog v5:
- Rebase onto kselftests/next.
- Add Reinette's reviewed-by tag.
Changelog v4:
- Check current_test pointer and reset it in signal unregistering.
- Move cleanup call to test_cleanup function.
Changelog v3:
- Make current_test static.
- Add callback NULL check to the ctrl-c handler.
Changelog v2:
- Make current_test a const pointer limited in scope to resctrl_val
file.
- Remove tests_cleanup from resctrl.h.
- Cleanup 'goto out' path and labels in individual test functions.
Older versions of this series:
[v1] https://lore.kernel.org/all/cover.1708434017.git.maciej.wieczor-retman@inte…
[v2] https://lore.kernel.org/all/cover.1708596015.git.maciej.wieczor-retman@inte…
[v3] https://lore.kernel.org/all/cover.1708599491.git.maciej.wieczor-retman@inte…
[v4] https://lore.kernel.org/all/cover.1708949785.git.maciej.wieczor-retman@inte…
Maciej Wieczor-Retman (3):
selftests/resctrl: Add cleanup function to test framework
selftests/resctrl: Simplify cleanup in ctrl-c handler
selftests/resctrl: Move cleanups out of individual tests
tools/testing/selftests/resctrl/cat_test.c | 8 +++-----
tools/testing/selftests/resctrl/cmt_test.c | 4 ++--
tools/testing/selftests/resctrl/mba_test.c | 8 +++-----
tools/testing/selftests/resctrl/mbm_test.c | 8 +++-----
tools/testing/selftests/resctrl/resctrl.h | 9 +++------
.../testing/selftests/resctrl/resctrl_tests.c | 20 +++++++------------
tools/testing/selftests/resctrl/resctrl_val.c | 8 ++++++--
7 files changed, 27 insertions(+), 38 deletions(-)
--
2.43.2
This is required, as CONFIG_DAMON_DEBUGFS is enabled, and --alltests UML
builds will fail due to the missing config option otherwise.
Fixes: f4cba4bf6777 ("mm/damon: rename CONFIG_DAMON_DBGFS to DAMON_DBGFS_DEPRECATED")
Signed-off-by: David Gow <davidgow(a)google.com>
---
This is breaking all UML alltests builds, so we'd like to fix it sooner
rather than later. SeongJae, would you rather take this yourself, or can
we push it alongside any other KUnit fixes?
Johannes: Does this conflict with the CONFIG_NETDEVICES / CONFIG_WLAN
fixes to all_tests.config? I'd assume not, but I'm happy to take them
via KUnit if you'd prefer anyway.
Thanks,
-- David
---
tools/testing/kunit/configs/all_tests.config | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/kunit/configs/all_tests.config b/tools/testing/kunit/configs/all_tests.config
index aa5ec149f96c..f388742cf266 100644
--- a/tools/testing/kunit/configs/all_tests.config
+++ b/tools/testing/kunit/configs/all_tests.config
@@ -38,6 +38,7 @@ CONFIG_DAMON_VADDR=y
CONFIG_DAMON_PADDR=y
CONFIG_DEBUG_FS=y
CONFIG_DAMON_DBGFS=y
+CONFIG_DAMON_DBGFS_DEPRECATED=y
CONFIG_REGMAP_BUILD=y
--
2.44.0.396.g6e790dbe36-goog
This series fixes a bug in the complete phase of UDP in GRO, in which
socket lookup fails due to using network_header when parsing encapsulated
packets. The fix is to pass p_off parameter in *_gro_complete.
Next, the fields network_offset and inner_network_offset are added to
napi_gro_cb, and are both set during the receive phase of GRO. This is then
leveraged in the next commit to remove flush_id state from napi_gro_cb, and
stateful code in {ipv6,inet}_gro_receive which may be unnecessarily
complicated due to encapsulation support in GRO.
In addition, udpgro_fwd selftest is adjusted to include the socket lookup
case for vxlan. This selftest will test its supposed functionality once
local bind support is merged (https://lore.kernel.org/netdev/df300a49-7811-4126-a56a-a77100c8841b@gmail.c…).
v3 -> v4:
- Fix code comment and commit message typos
- v3:
https://lore.kernel.org/all/f939c84a-2322-4393-a5b0-9b1e0be8ed8e@gmail.com/
v2 -> v3:
- Use napi_gro_cb instead of skb->{offset}
- v2:
https://lore.kernel.org/netdev/2ce1600b-e733-448b-91ac-9d0ae2b866a4@gmail.c…
v1 -> v2:
- Pass p_off in *_gro_complete to fix UDP bug
- Remove more conditionals and memory fetches from inet_gro_flush
- v1:
https://lore.kernel.org/netdev/e1d22505-c5f8-4c02-a997-64248480338b@gmail.c…
Richard Gobert (4):
net: gro: add p_off param in *_gro_complete
selftests/net: add local address bind in vxlan selftest
net: gro: add {inner_}network_offset to napi_gro_cb
net: gro: move L3 flush checks to tcp_gro_receive
drivers/net/geneve.c | 7 +-
drivers/net/vxlan/vxlan_core.c | 11 ++--
include/linux/etherdevice.h | 2 +-
include/linux/netdevice.h | 3 +-
include/linux/udp.h | 2 +-
include/net/gro.h | 36 +++++++----
include/net/inet_common.h | 2 +-
include/net/tcp.h | 6 +-
include/net/udp.h | 8 +--
include/net/udp_tunnel.h | 2 +-
net/8021q/vlan_core.c | 6 +-
net/core/gro.c | 6 +-
net/ethernet/eth.c | 5 +-
net/ipv4/af_inet.c | 49 ++------------
net/ipv4/fou_core.c | 9 +--
net/ipv4/gre_offload.c | 6 +-
net/ipv4/tcp_offload.c | 79 ++++++++++++++++++-----
net/ipv4/udp.c | 3 +-
net/ipv4/udp_offload.c | 26 ++++----
net/ipv6/ip6_offload.c | 41 +++++-------
net/ipv6/tcpv6_offload.c | 7 +-
net/ipv6/udp.c | 3 +-
net/ipv6/udp_offload.c | 13 ++--
tools/testing/selftests/net/udpgro_fwd.sh | 10 ++-
24 files changed, 187 insertions(+), 155 deletions(-)
--
2.36.1
The longest running netdevsim test, nexthop.sh, currently takes
5 min to finish. Around 260s to be exact, and 310s on a debug kernel.
The default timeout in selftest is 45sec, so we need an explicit
config. Give ourselves some headroom and use 10min.
Commit under Fixes isn't really to "blame" but prior to that
netdevsim tests weren't integrated with kselftest infra
so blaming the tests themselves doesn't seem right, either.
Fixes: 8ff25dac88f6 ("netdevsim: add Makefile for selftests")
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
---
CC: shuah(a)kernel.org
CC: dw(a)davidwei.uk
CC: linux-kselftest(a)vger.kernel.org
---
tools/testing/selftests/drivers/net/netdevsim/settings | 1 +
1 file changed, 1 insertion(+)
create mode 100644 tools/testing/selftests/drivers/net/netdevsim/settings
diff --git a/tools/testing/selftests/drivers/net/netdevsim/settings b/tools/testing/selftests/drivers/net/netdevsim/settings
new file mode 100644
index 000000000000..a62d2fa1275c
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/netdevsim/settings
@@ -0,0 +1 @@
+timeout=600
--
2.44.0
This cleans up the output of the tty_tstamp_update selftest to play a
bit more nicely with automated systems parsing the test output.
To do this I've also added a new helper ksft_test_result() which takes a
KSFT_ code as a report, this is something I've wanted on other occasions
but restructured things to avoid needing it. This time I figured I'd
just add it since it keeps coming up.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Mark Brown (2):
kselftest: Add mechanism for reporting a KSFT_ result code
kselftest/tty: Report a consistent test name for the one test we run
tools/testing/selftests/kselftest.h | 22 ++++++++++++
tools/testing/selftests/tty/tty_tstamp_update.c | 48 +++++++++++++++++--------
2 files changed, 55 insertions(+), 15 deletions(-)
---
base-commit: 54be6c6c5ae8e0d93a6c4641cb7528eb0b6ba478
change-id: 20240305-kselftest-tty-tname-5411444ce037
Best regards,
--
Mark Brown <broonie(a)kernel.org>
The seccomp benchmark runs five scenarios, one calibration run with no
seccomp filters enabled then four further runs each adding a filter. The
calibration run times itself for 15s and then each additional run executes
for the same number of times.
Currently the seccomp tests, including the benchmark, run with an extended
120s timeout but this is not sufficient to robustly run the tests on a lot
of platforms. Sample timings from some recent runs:
Platform Run 1 Run 2 Run 3 Run 4
--------- ----- ----- ----- -----
PowerEdge R200 16.6s 16.6s 31.6s 37.4s
BBB (arm) 20.4s 20.4s 54.5s
Synquacer (arm64) 20.7s 23.7s 40.3s
The x86 runs from the PowerEdge are quite marginal and routinely fail, for
the successful run reported here the timed portions of the run are at
117.2s leaving less than 3s of margin which is frequently breached. The
added overhead of adding filters on the other platforms is such that there
is no prospect of their runs fitting into the 120s timeout, especially
on 32 bit arm where there is no BPF JIT.
While we could lower the time we calibrate for I'm also already seeing the
currently completing runs reporting issues with the per filter overheads
not matching expectations:
Let's instead raise the timeout to 180s which is only a 50% increase on the
current timeout which is itself not *too* large given that there's only two
tests in this suite.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v2:
- Rebase onto v6.9-rc1.
- Link to v1: https://lore.kernel.org/r/20231219-b4-kselftest-seccomp-benchmark-timeout-v…
---
tools/testing/selftests/seccomp/settings | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/seccomp/settings b/tools/testing/selftests/seccomp/settings
index 6091b45d226b..a953c96aa16e 100644
--- a/tools/testing/selftests/seccomp/settings
+++ b/tools/testing/selftests/seccomp/settings
@@ -1 +1 @@
-timeout=120
+timeout=180
---
base-commit: 4cece764965020c22cff7665b18a012006359095
change-id: 20231219-b4-kselftest-seccomp-benchmark-timeout-05b66e7d29d1
Best regards,
--
Mark Brown <broonie(a)kernel.org>
The event filter function test has been failing in our internal test
farm:
| # not ok 33 event filter function - test event filtering on functions
Running the test in verbose mode indicates that this is because the test
erroneously determines that kmem_cache_free() is the most common caller
of kmem_cache_free():
# # + cut -d: -f3 trace
# # + sed s/call_site=([^+]*)+0x.*/1/
# # + sort
# # + uniq -c
# # + sort
# # + tail -n 1
# # + sed s/^[ 0-9]*//
# # + target_func=kmem_cache_free
... and as kmem_cache_free() doesn't call itself, setting this as the
filter function for kmem_cache_free() results in no hits, and
consequently the test fails:
# # + grep kmem_cache_free trace
# # + grep kmem_cache_free
# # + wc -l
# # + hitcnt=0
# # + grep kmem_cache_free trace
# # + grep -v kmem_cache_free
# # + wc -l
# # + misscnt=0
# # + [ 0 -eq 0 ]
# # + exit_fail
This seems to be because the system in question has tasks with ':' in
their name (which a number of kernel worker threads have). These show up
in the trace, e.g.
test:.sh-1299 [004] ..... 2886.040608: kmem_cache_free: call_site=putname+0xa4/0xc8 ptr=000000000f4d22f4 name=names_cache
... and so when we try to extact the call_site with:
cut -d: -f3 trace | sed 's/call_site=\([^+]*\)+0x.*/\1/'
... the 'cut' command will extrace the column containing
'kmem_cache_free' rather than the column containing 'call_site=...', and
the 'sed' command will leave this unchanged. Consequently, the test will
decide to use 'kmem_cache_free' as the filter function, resulting in the
failure seen above.
Fix this by matching the 'call_site=<func>' part specifically to extract
the function name.
Signed-off-by: Mark Rutland <mark.rutland(a)arm.com>
Reported-by: Aishwarya TCV <aishwarya.tcv(a)arm.com>
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Steven Rostedt <rostedt(a)goodmis.org>
Cc: linux-kernel(a)vger.kernel.org
Cc: linux-kselftest(a)vger.kernel.org
Cc: linux-trace-kernel(a)vger.kernel.org
---
.../selftests/ftrace/test.d/filter/event-filter-function.tc | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc b/tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc
index 2de7c61d1ae3..3f74c09c56b6 100644
--- a/tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc
+++ b/tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc
@@ -24,7 +24,7 @@ echo 0 > events/enable
echo "Get the most frequently calling function"
sample_events
-target_func=`cut -d: -f3 trace | sed 's/call_site=\([^+]*\)+0x.*/\1/' | sort | uniq -c | sort | tail -n 1 | sed 's/^[ 0-9]*//'`
+target_func=`cat trace | grep -o 'call_site=\([^+]*\)' | sed 's/call_site=//' | sort | uniq -c | sort | tail -n 1 | sed 's/^[ 0-9]*//'`
if [ -z "$target_func" ]; then
exit_fail
fi
--
2.39.2
As discussed on the bi-weekly call on Jan 30, and in mailing around
kernel CI effort, some changes are desirable in the suite of forwarding
selftests the better to work with the CI tooling. Namely:
- The forwarding selftests use a configuration file where names of
interfaces are defined and various variables can be overridden. There
is also forwarding.config.sample that users can use as a template to
refer to when creating the config file. What happens a fair bit is
that users either do not know about this at all, or simply forget, and
are confused by cryptic failures about interfaces that cannot be
created.
In patches #1 - #3 have lib.sh just be the single source of truth with
regards to which variables exist. That includes the topology variables
which were previously only in the sample file, and any "tweak
variables", such as what tools to use, sleep times, etc.
forwarding.config.sample then becomes just a placeholder with a couple
examples. Unless specific HW should be exercised, or specific tools
used, the defaults are usually just fine.
- Several net/forwarding/ selftests (and one net/ one) cannot be run on
veth pairs, they need an actual HW interface to run on. They are
generic in the sense that any capable HW should pass them, which is
why they have been put to net/forwarding/ as opposed to drivers/net/,
but they do not generalize to veth. The fact that these tests are in
net/forwarding/, but still complaining when run, is confusing.
In patches #4 - #6 move these tests to a new directory
drivers/net/hw.
- The following patches extend the codebase to handle well test results
other than pass and fail.
Patch #7 is preparatory. It converts several log_test_skip to XFAIL,
so that tests do not spuriously end up returning non-0 when they
are not supposed to.
In patches #8 - #10, introduce some missing ksft constants, then support
having those constants in RET, and then finally in EXIT_STATUS.
- The traffic scheduler tests generate a large amount of network traffic
to test the behavior of the scheduler. This demands a relatively
high-performance computer. On slow machines, such as with a debugging
kernel, the test would spuriously fail.
It can still be useful to "go through the motions" though, to possibly
catch bugs in setup of the scheduler graph and passing packets around.
Thus we still want to run the tests, just with lowered demands.
To that end, in patches #11 - #12, introduce an environment variable
KSFT_MACHINE_SLOW, with obvious meaning. Tests can then make checks
more lenient, such as mark failures as XFAIL. A helper, xfail_on_slow,
is provided to mark performance-sensitive parts of the selftest.
- In patch #13, use a similar mechanism to mark a NH group stats
selftest to XFAIL HW stats tests when run on VETH pairs.
- All these changes complicate the hitherto straightforward logging and
checking logic, so in patch #14, add a selftest that checks this
functionality in lib.sh.
Petr Machata (14):
selftests: net: libs: Change variable fallback syntax
selftests: forwarding.config.sample: Move overrides to lib.sh
selftests: forwarding: README: Document customization
selftests: forwarding: ipip_lib: Do not import lib.sh
selftests: forwarding: Move several selftests
selftests: forwarding: Ditch skip_on_veth()
selftests: forwarding: Change inappropriate log_test_skip() calls
selftests: lib: Define more kselftest exit codes
selftests: forwarding: Have RET track kselftest framework constants
selftests: forwarding: Convert log_test() to recognize RET values
selftests: forwarding: Support for performance sensitive tests
selftests: forwarding: Mark performance-sensitive tests
selftests: forwarding: router_mpath_nh_lib: Don't skip, xfail on veth
selftests: forwarding: Add a test for testing lib.sh functionality
.../testing/selftests/drivers/net/hw/Makefile | 25 ++
.../net/hw}/devlink_port_split.py | 0
.../forwarding => drivers/net/hw}/ethtool.sh | 5 +-
.../net/hw}/ethtool_extended_state.sh | 5 +-
.../net/hw}/ethtool_lib.sh | 0
.../net/hw}/ethtool_mm.sh | 3 +-
.../net/hw}/ethtool_rmon.sh | 7 +-
.../net/hw}/hw_stats_l3.sh | 19 +-
.../net/hw}/hw_stats_l3_gre.sh | 7 +-
.../forwarding => drivers/net/hw}/loopback.sh | 5 +-
.../testing/selftests/drivers/net/hw/settings | 1 +
.../selftests/drivers/net/mlxsw/mlxsw_lib.sh | 2 +-
.../net/mlxsw/spectrum-2/resource_scale.sh | 1 -
.../net/mlxsw/spectrum/resource_scale.sh | 1 -
tools/testing/selftests/net/Makefile | 1 -
.../testing/selftests/net/forwarding/Makefile | 9 +-
tools/testing/selftests/net/forwarding/README | 33 +++
.../net/forwarding/forwarding.config.sample | 53 ++--
.../selftests/net/forwarding/ipip_lib.sh | 1 -
tools/testing/selftests/net/forwarding/lib.sh | 255 +++++++++++++-----
.../selftests/net/forwarding/lib_sh_test.sh | 208 ++++++++++++++
.../net/forwarding/router_mpath_nh_lib.sh | 12 +-
.../selftests/net/forwarding/sch_ets_tests.sh | 19 +-
.../selftests/net/forwarding/sch_red.sh | 10 +-
.../selftests/net/forwarding/sch_tbf_core.sh | 2 +-
.../selftests/net/forwarding/tc_common.sh | 2 +-
.../selftests/net/forwarding/tc_tunnel_key.sh | 2 -
tools/testing/selftests/net/lib.sh | 48 +++-
28 files changed, 565 insertions(+), 171 deletions(-)
create mode 100644 tools/testing/selftests/drivers/net/hw/Makefile
rename tools/testing/selftests/{net => drivers/net/hw}/devlink_port_split.py (100%)
rename tools/testing/selftests/{net/forwarding => drivers/net/hw}/ethtool.sh (98%)
rename tools/testing/selftests/{net/forwarding => drivers/net/hw}/ethtool_extended_state.sh (96%)
rename tools/testing/selftests/{net/forwarding => drivers/net/hw}/ethtool_lib.sh (100%)
rename tools/testing/selftests/{net/forwarding => drivers/net/hw}/ethtool_mm.sh (99%)
rename tools/testing/selftests/{net/forwarding => drivers/net/hw}/ethtool_rmon.sh (92%)
rename tools/testing/selftests/{net/forwarding => drivers/net/hw}/hw_stats_l3.sh (96%)
rename tools/testing/selftests/{net/forwarding => drivers/net/hw}/hw_stats_l3_gre.sh (92%)
rename tools/testing/selftests/{net/forwarding => drivers/net/hw}/loopback.sh (92%)
create mode 100644 tools/testing/selftests/drivers/net/hw/settings
create mode 100755 tools/testing/selftests/net/forwarding/lib_sh_test.sh
--
2.43.0
The test is inspired by the pmu_event_filter_test which implemented by x86. On
the arm64 platform, there is the same ability to set the pmu_event_filter
through the KVM_ARM_VCPU_PMU_V3_FILTER attribute. So add the test for arm64.
The series first create the helper function which can be used
for the vpmu related tests. Then, it implement the test.
Changelog:
----------
v5->v6:
- Rebased to v6.9-rc1.
- Collect RB.
- Add multiple filter test.
v4->v5:
- Rebased to v6.8-rc6.
- Refactor the helper function, make it fine-grained and easy to be used.
- Namimg improvements.
- Use the kvm_device_attr_set() helper.
- Make the test descriptor array readable and clean.
- Delete the patch which moves the pmu related helper to vpmu.h.
- Remove the kvm_supports_pmu_event_filter() function since nobody will run
this on a old kernel.
v3->v4:
- Rebased to the v6.8-rc2.
v2->v3:
- Check the pmceid in guest code instead of pmu event count since different
hardware may have different event count result, check pmceid makes it stable
on different platform. [Eric]
- Some typo fixed and commit message improved.
v1->v2:
- Improve the commit message. [Eric]
- Fix the bug in [enable|disable]_counter. [Raghavendra & Marc]
- Add the check if kvm has attr KVM_ARM_VCPU_PMU_V3_FILTER.
- Add if host pmu support the test event throught pmceid0.
- Split the test_invalid_filter() to another patch. [Eric]
v1: https://lore.kernel.org/all/20231123063750.2176250-1-shahuang@redhat.com/
v2: https://lore.kernel.org/all/20231129072712.2667337-1-shahuang@redhat.com/
v3: https://lore.kernel.org/all/20240116060129.55473-1-shahuang@redhat.com/
v4: https://lore.kernel.org/all/20240202025659.5065-1-shahuang@redhat.com/
v5: https://lore.kernel.org/all/20240229065625.114207-1-shahuang@redhat.com/
Shaoqin Huang (3):
KVM: selftests: aarch64: Add helper function for the vpmu vcpu
creation
KVM: selftests: aarch64: Introduce pmu_event_filter_test
KVM: selftests: aarch64: Add invalid filter test in
pmu_event_filter_test
tools/testing/selftests/kvm/Makefile | 1 +
.../kvm/aarch64/pmu_event_filter_test.c | 336 ++++++++++++++++++
.../kvm/aarch64/vpmu_counter_access.c | 33 +-
.../selftests/kvm/include/aarch64/vpmu.h | 28 ++
4 files changed, 372 insertions(+), 26 deletions(-)
create mode 100644 tools/testing/selftests/kvm/aarch64/pmu_event_filter_test.c
create mode 100644 tools/testing/selftests/kvm/include/aarch64/vpmu.h
--
2.40.1
The test is inspired by the pmu_event_filter_test which implemented by x86. On
the arm64 platform, there is the same ability to set the pmu_event_filter
through the KVM_ARM_VCPU_PMU_V3_FILTER attribute. So add the test for arm64.
The series first create the helper function which can be used
for the vpmu related tests. Then, it implement the test.
Changelog:
----------
v4->v5:
- Rebased to v6.8-rc6.
- Refactor the helper function, make it fine-grained and easy to be used.
- Namimg improvements.
- Use the kvm_device_attr_set() helper.
- Make the test descriptor array readable and clean.
- Delete the patch which moves the pmu related helper to vpmu.h.
- Remove the kvm_supports_pmu_event_filter() function since nobody will run
this on a old kernel.
v3->v4:
- Rebased to the v6.8-rc2.
v2->v3:
- Check the pmceid in guest code instead of pmu event count since different
hardware may have different event count result, check pmceid makes it stable
on different platform. [Eric]
- Some typo fixed and commit message improved.
v1->v2:
- Improve the commit message. [Eric]
- Fix the bug in [enable|disable]_counter. [Raghavendra & Marc]
- Add the check if kvm has attr KVM_ARM_VCPU_PMU_V3_FILTER.
- Add if host pmu support the test event throught pmceid0.
- Split the test_invalid_filter() to another patch. [Eric]
v1: https://lore.kernel.org/all/20231123063750.2176250-1-shahuang@redhat.com/
v2: https://lore.kernel.org/all/20231129072712.2667337-1-shahuang@redhat.com/
v3: https://lore.kernel.org/all/20240116060129.55473-1-shahuang@redhat.com/
v4: https://lore.kernel.org/all/20240202025659.5065-1-shahuang@redhat.com/
Shaoqin Huang (3):
KVM: selftests: aarch64: Add helper function for the vpmu vcpu
creation
KVM: selftests: aarch64: Introduce pmu_event_filter_test
KVM: selftests: aarch64: Add invalid filter test in
pmu_event_filter_test
tools/testing/selftests/kvm/Makefile | 1 +
.../kvm/aarch64/pmu_event_filter_test.c | 325 ++++++++++++++++++
.../kvm/aarch64/vpmu_counter_access.c | 33 +-
.../selftests/kvm/include/aarch64/vpmu.h | 29 ++
4 files changed, 362 insertions(+), 26 deletions(-)
create mode 100644 tools/testing/selftests/kvm/aarch64/pmu_event_filter_test.c
create mode 100644 tools/testing/selftests/kvm/include/aarch64/vpmu.h
--
2.40.1
Hi all,
This series does a number of cleanups into resctrl_val() and
generalizes it by removing test name specific handling from the
function.
One of the changes improves MBA/MBM measurement by narrowing down the
period the resctrl FS derived memory bandwidth numbers are measured
over. My feel is it didn't cause noticeable difference into the numbers
because they're generally good anyway except for the small number of
outliers. To see the impact on outliers, I'd need to setup a test to
run large number of replications and do a statistical analysis, which
I've not spent my time on. Even without the statistical analysis, the
new way to measure seems obviously better and makes sense even if I
cannot see a major improvement with the setup I'm using.
This series has some conflicts with SNC series from Maciej and also
with the MBA/MBM series from Babu.
--
i.
v2:
- Resolved conflicts with kselftest/next
- Spaces -> tabs correction
Ilpo Järvinen (13):
selftests/resctrl: Convert get_mem_bw_imc() fd close to for loop
selftests/resctrl: Calculate resctrl FS derived mem bw over sleep(1)
only
selftests/resctrl: Consolidate get_domain_id() into resctrl_val()
selftests/resctrl: Use correct type for pids
selftests/resctrl: Cleanup bm_pid and ppid usage & limit scope
selftests/resctrl: Rename measure_vals() to measure_mem_bw_vals() &
document
selftests/resctrl: Add ->measure() callback to resctrl_val_param
selftests/resctrl: Add ->init() callback into resctrl_val_param
selftests/resctrl: Simplify bandwidth report type handling
selftests/resctrl: Make some strings passed to resctrlfs functions
const
selftests/resctrl: Convert ctrlgrp & mongrp to pointers
selftests/resctrl: Remove mongrp from MBA test
selftests/resctrl: Remove test name comparing from
write_bm_pid_to_resctrl()
tools/testing/selftests/resctrl/cache.c | 6 +-
tools/testing/selftests/resctrl/cat_test.c | 5 +-
tools/testing/selftests/resctrl/cmt_test.c | 21 +-
tools/testing/selftests/resctrl/mba_test.c | 34 ++-
tools/testing/selftests/resctrl/mbm_test.c | 33 ++-
tools/testing/selftests/resctrl/resctrl.h | 48 ++--
tools/testing/selftests/resctrl/resctrl_val.c | 269 ++++++------------
tools/testing/selftests/resctrl/resctrlfs.c | 55 ++--
8 files changed, 224 insertions(+), 247 deletions(-)
--
2.39.2
Currently, VA exhaustion is being checked by passing a hint to mmap() and
expecting it to fail. This patch makes a stricter test by successful
write() calls from /proc/self/maps to a dump file, confirming that a
free chunk is indeed not available.
Changes in v2:
- Replace SZ_1GB with MAP_CHUNK_SIZE, tidy-up
Signed-off-by: Dev Jain <dev.jain(a)arm.com>
---
Merge dependency: https://lore.kernel.org/all/20240314122250.68534-1-dev.jain@arm.com/
.../selftests/mm/virtual_address_range.c | 66 +++++++++++++++++++
1 file changed, 66 insertions(+)
diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
index 7bcf8d48256a..050e997e3be2 100644
--- a/tools/testing/selftests/mm/virtual_address_range.c
+++ b/tools/testing/selftests/mm/virtual_address_range.c
@@ -12,6 +12,8 @@
#include <errno.h>
#include <sys/mman.h>
#include <sys/time.h>
+#include <fcntl.h>
+
#include "../kselftest.h"
/*
@@ -93,6 +95,66 @@ static int validate_lower_address_hint(void)
return 1;
}
+static int validate_complete_va_space(void)
+{
+ unsigned long start_addr, end_addr, prev_end_addr;
+ char line[400];
+ char prot[6];
+ FILE *file;
+ int fd;
+
+ fd = open("va_dump", O_CREAT | O_WRONLY, 0600);
+ unlink("va_dump");
+ if (fd < 0) {
+ ksft_test_result_skip("cannot create or open dump file\n");
+ ksft_finished();
+ }
+
+ file = fopen("/proc/self/maps", "r");
+ if (file == NULL)
+ ksft_exit_fail_msg("cannot open /proc/self/maps\n");
+
+ prev_end_addr = 0;
+ while (fgets(line, sizeof(line), file)) {
+ unsigned long hop;
+
+ if (sscanf(line, "%lx-%lx %s[rwxp-]",
+ &start_addr, &end_addr, prot) != 3)
+ ksft_exit_fail_msg("cannot parse /proc/self/maps\n");
+
+ /* end of userspace mappings; ignore vsyscall mapping */
+ if (start_addr & (1UL << 63))
+ return 0;
+
+ /* /proc/self/maps must have gaps less than MAP_CHUNK_SIZE */
+ if (start_addr - prev_end_addr >= MAP_CHUNK_SIZE)
+ return 1;
+
+ prev_end_addr = end_addr;
+
+ if (prot[0] != 'r')
+ continue;
+
+ /*
+ * Confirm whether MAP_CHUNK_SIZE chunk can be found or not.
+ * If write succeeds, no need to check MAP_CHUNK_SIZE - 1
+ * addresses after that. If the address was not held by this
+ * process, write would fail with errno set to EFAULT.
+ * Anyways, if write returns anything apart from 1, exit the
+ * program since that would mean a bug in /proc/self/maps.
+ */
+ hop = 0;
+ while (start_addr + hop < end_addr) {
+ if (write(fd, (void *)(start_addr + hop), 1) != 1)
+ return 1;
+ lseek(fd, 0, SEEK_SET);
+
+ hop += MAP_CHUNK_SIZE;
+ }
+ }
+ return 0;
+}
+
int main(int argc, char *argv[])
{
char *ptr[NR_CHUNKS_LOW];
@@ -135,6 +197,10 @@ int main(int argc, char *argv[])
validate_addr(hptr[i], 1);
}
hchunks = i;
+ if (validate_complete_va_space()) {
+ ksft_test_result_fail("BUG in mmap() or /proc/self/maps\n");
+ ksft_finished();
+ }
for (i = 0; i < lchunks; i++)
munmap(ptr[i], MAP_CHUNK_SIZE);
--
2.34.1
Currently, VA exhaustion is being checked by passing a hint to mmap() and
expecting it to fail. This patch makes a stricter test by successful write()
calls from /proc/self/maps to a dump file, confirming that a free chunk is
indeed not available.
Signed-off-by: Dev Jain <dev.jain(a)arm.com>
---
Merge dependency: https://lore.kernel.org/all/20240314122250.68534-1-dev.jain@arm.com/
.../selftests/mm/virtual_address_range.c | 69 +++++++++++++++++++
1 file changed, 69 insertions(+)
diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
index 7bcf8d48256a..31063613dfd9 100644
--- a/tools/testing/selftests/mm/virtual_address_range.c
+++ b/tools/testing/selftests/mm/virtual_address_range.c
@@ -12,6 +12,8 @@
#include <errno.h>
#include <sys/mman.h>
#include <sys/time.h>
+#include <fcntl.h>
+
#include "../kselftest.h"
/*
@@ -93,6 +95,69 @@ static int validate_lower_address_hint(void)
return 1;
}
+static int validate_complete_va_space(void)
+{
+ unsigned long start_addr, end_addr, prev_end_addr;
+ char line[400];
+ char prot[6];
+ FILE *file;
+ int fd;
+
+ fd = open("va_dump", O_CREAT | O_WRONLY, 0600);
+ unlink("va_dump");
+ if (fd < 0) {
+ ksft_test_result_skip("cannot create or open dump file\n");
+ ksft_finished();
+ }
+
+ file = fopen("/proc/self/maps", "r");
+ if (file == NULL)
+ ksft_exit_fail_msg("cannot open /proc/self/maps\n");
+
+ prev_end_addr = 0;
+ while (fgets(line, sizeof(line), file)) {
+ unsigned long hop;
+ int ret;
+
+ ret = sscanf(line, "%lx-%lx %s[rwxp-]",
+ &start_addr, &end_addr, prot);
+ if (ret != 3)
+ ksft_exit_fail_msg("sscanf failed, cannot parse\n");
+
+ /* end of userspace mappings; ignore vsyscall mapping */
+ if (start_addr & (1UL << 63))
+ return 0;
+
+ /* /proc/self/maps must have gaps less than 1GB only */
+ if (start_addr - prev_end_addr >= SZ_1GB)
+ return 1;
+
+ prev_end_addr = end_addr;
+
+ if (prot[0] != 'r')
+ continue;
+
+ /*
+ * Confirm whether MAP_CHUNK_SIZE chunk can be found or not.
+ * If write succeeds, no need to check MAP_CHUNK_SIZE - 1
+ * addresses after that. If the address was not held by this
+ * process, write would fail with errno set to EFAULT.
+ * Anyways, if write returns anything apart from 1, exit the
+ * program since that would mean a bug in /proc/self/maps.
+ */
+ hop = 0;
+ while (start_addr + hop < end_addr) {
+ if (write(fd, (void *)(start_addr + hop), 1) != 1)
+ return 1;
+ else
+ lseek(fd, 0, SEEK_SET);
+
+ hop += MAP_CHUNK_SIZE;
+ }
+ }
+ return 0;
+}
+
int main(int argc, char *argv[])
{
char *ptr[NR_CHUNKS_LOW];
@@ -135,6 +200,10 @@ int main(int argc, char *argv[])
validate_addr(hptr[i], 1);
}
hchunks = i;
+ if (validate_complete_va_space()) {
+ ksft_test_result_fail("BUG in mmap() or /proc/self/maps\n");
+ ksft_finished();
+ }
for (i = 0; i < lchunks; i++)
munmap(ptr[i], MAP_CHUNK_SIZE);
--
2.34.1
The upcoming new Idle HLT Intercept feature allows for the HLT
instruction execution by a vCPU to be intercepted by the hypervisor
only if there are no pending V_INTR and V_NMI events for the vCPU.
When the vCPU is expected to service the pending V_INTR and V_NMI
events, the Idle HLT intercept won’t trigger. The feature allows the
hypervisor to determine if the vCPU is actually idle and reduces
wasteful VMEXITs.
Presence of the Idle HLT Intercept feature is indicated via CPUID
function Fn8000_000A_EDX[30].
Document for the Idle HLT intercept feature will be available in the
next version of "AMD64 Architecture Programmer’s Manual".
Testing Done:
Added a selftest to test the Idle HLT intercept functionality.
Tested SEV and SEV-ES guest for the Idle HLT intercept functionality.
Manali Shukla (5):
x86/cpufeatures: Add CPUID feature bit for Idle HLT intercept
KVM: SVM: Add Idle HLT intercept support
tools: Add KVM exit reason for the Idle HLT
selftests: Add an interface to read the data of named vcpu stat
selftests: KVM: SVM: Add Idle HLT intercept test
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/svm.h | 1 +
arch/x86/include/uapi/asm/svm.h | 2 +
arch/x86/kvm/svm/svm.c | 11 +-
tools/arch/x86/include/uapi/asm/svm.h | 2 +
tools/testing/selftests/kvm/Makefile | 1 +
.../selftests/kvm/include/kvm_util_base.h | 11 ++
tools/testing/selftests/kvm/lib/kvm_util.c | 41 ++++++
.../selftests/kvm/x86_64/svm_idlehlt_test.c | 119 ++++++++++++++++++
9 files changed, 186 insertions(+), 3 deletions(-)
create mode 100644 tools/testing/selftests/kvm/x86_64/svm_idlehlt_test.c
base-commit: fdd58834d132046149699b88a27a0db26829f4fb
--
2.34.1
Hi,
While running kselftest on vanilla torvalds tree kernel commit v6.8-11167-g4438a810f396,
the test suite reported a number of errors.
I was using the latest iproute2-next suite on an Ubuntu 22.04 LTS box.
# Tests passed: 558
# Tests failed: 84
not ok 90 selftests: net: test_vxlan_mdb.sh # exit=1
495:# TEST: Destination IP - match [FAIL]
496:# TEST: Destination IP - no match [FAIL]
497:# TEST: Default destination port - match [FAIL]
498:# TEST: Default destination port - no match [FAIL]
499:# TEST: Non-default destination port - match [FAIL]
500:# TEST: Non-default destination port - no match [FAIL]
501:# TEST: Default destination VNI - match [FAIL]
502:# TEST: Default destination VNI - no match [FAIL]
503:# TEST: Non-default destination VNI - match [FAIL]
504:# TEST: Non-default destination VNI - no match [FAIL]
521:# TEST: Destination IP - match [FAIL]
522:# TEST: Destination IP - no match [FAIL]
523:# TEST: Default destination port - match [FAIL]
524:# TEST: Default destination port - no match [FAIL]
525:# TEST: Non-default destination port - match [FAIL]
526:# TEST: Non-default destination port - no match [FAIL]
527:# TEST: Default destination VNI - match [FAIL]
528:# TEST: Default destination VNI - no match [FAIL]
529:# TEST: Non-default destination VNI - match [FAIL]
530:# TEST: Non-default destination VNI - no match [FAIL]
549:# TEST: Forward valid source - first VTEP [FAIL]
550:# TEST: Forward valid source - second VTEP [FAIL]
551:# TEST: Block excluded source after removal - first VTEP [FAIL]
552:# TEST: Block excluded source after removal - second VTEP [FAIL]
553:# TEST: Forward valid source after removal - first VTEP [FAIL]
554:# TEST: Forward valid source after removal - second VTEP [FAIL]
571:# TEST: Forward valid source - first VTEP [FAIL]
572:# TEST: Forward valid source - second VTEP [FAIL]
573:# TEST: Block excluded source after removal - first VTEP [FAIL]
574:# TEST: Block excluded source after removal - second VTEP [FAIL]
575:# TEST: Forward valid source after removal - first VTEP [FAIL]
576:# TEST: Forward valid source after removal - second VTEP [FAIL]
593:# TEST: Forward valid source - first VTEP [FAIL]
594:# TEST: Forward valid source - second VTEP [FAIL]
595:# TEST: Block excluded source after removal - first VTEP [FAIL]
596:# TEST: Block excluded source after removal - second VTEP [FAIL]
597:# TEST: Forward valid source after removal - first VTEP [FAIL]
598:# TEST: Forward valid source after removal - second VTEP [FAIL]
615:# TEST: Forward valid source - first VTEP [FAIL]
616:# TEST: Forward valid source - second VTEP [FAIL]
617:# TEST: Block excluded source after removal - first VTEP [FAIL]
618:# TEST: Block excluded source after removal - second VTEP [FAIL]
619:# TEST: Forward valid source after removal - first VTEP [FAIL]
620:# TEST: Forward valid source after removal - second VTEP [FAIL]
636:# TEST: Forward valid source [FAIL]
637:# TEST: Receive of valid source after removal from group [FAIL]
648:# TEST: Forward valid source [FAIL]
649:# TEST: Receive of valid source after removal from group [FAIL]
660:# TEST: Forward valid source [FAIL]
661:# TEST: Receive of valid source after removal from group [FAIL]
672:# TEST: Forward valid source [FAIL]
673:# TEST: Receive of valid source after removal from group [FAIL]
683:# TEST: Egress VNI translation - PVID configured [FAIL]
684:# TEST: Egress VNI translation - no PVID configured [FAIL]
685:# TEST: Egress VNI translation - PVID reconfigured [FAIL]
695:# TEST: Egress VNI translation - PVID configured [FAIL]
696:# TEST: Egress VNI translation - no PVID configured [FAIL]
697:# TEST: Egress VNI translation - PVID reconfigured [FAIL]
707:# TEST: Registered IPv4 multicast - first VTEP [FAIL]
709:# TEST: Unregistered IPv4 multicast - first VTEP [FAIL]
710:# TEST: Unregistered IPv4 multicast - second VTEP [FAIL]
711:# TEST: Link-local IPv4 multicast - first VTEP [FAIL]
712:# TEST: Link-local IPv4 multicast - second VTEP [FAIL]
713:# TEST: Registered IPv4 multicast with a unicast MAC - first VTEP [FAIL]
714:# TEST: Registered IPv4 multicast with a unicast MAC - second VTEP [FAIL]
715:# TEST: Registered IPv4 multicast with a broadcast MAC - first VTEP [FAIL]
716:# TEST: Registered IPv4 multicast with a broadcast MAC - second VTEP [FAIL]
734:# TEST: Registered IPv4 multicast - first VTEP [FAIL]
736:# TEST: Unregistered IPv4 multicast - first VTEP [FAIL]
737:# TEST: Unregistered IPv4 multicast - second VTEP [FAIL]
738:# TEST: Link-local IPv4 multicast - first VTEP [FAIL]
739:# TEST: Link-local IPv4 multicast - second VTEP [FAIL]
740:# TEST: Registered IPv4 multicast with a unicast MAC - first VTEP [FAIL]
741:# TEST: Registered IPv4 multicast with a unicast MAC - second VTEP [FAIL]
742:# TEST: Registered IPv4 multicast with a broadcast MAC - first VTEP [FAIL]
743:# TEST: Registered IPv4 multicast with a broadcast MAC - second VTEP [FAIL]
761:# TEST: IP multicast - first VTEP [FAIL]
763:# TEST: Broadcast - first VTEP [FAIL]
765:# TEST: IP multicast after removal - first VTEP [FAIL]
766:# TEST: IP multicast after removal - second VTEP [FAIL]
779:# TEST: IP multicast - first VTEP [FAIL]
781:# TEST: Broadcast - first VTEP [FAIL]
783:# TEST: IP multicast after removal - first VTEP [FAIL]
784:# TEST: IP multicast after removal - second VTEP [FAIL]
The problem is present at least since 6.8-rc7.
Please find attached the config and the full output of test_vxlan_mdb.sh.
Hope this helps.
Best regards,
Mirsad Todorovac
New version of the sleepable bpf_timer code, without the HID changes, as
they can now go through the HID tree independantly.
For reference, the use cases I have in mind:
---
Basically, I need to be able to defer a HID-BPF program for the
following reasons (from the aforementioned patch):
1. defer an event:
Sometimes we receive an out of proximity event, but the device can not
be trusted enough, and we need to ensure that we won't receive another
one in the following n milliseconds. So we need to wait those n
milliseconds, and eventually re-inject that event in the stack.
2. inject new events in reaction to one given event:
We might want to transform one given event into several. This is the
case for macro keys where a single key press is supposed to send
a sequence of key presses. But this could also be used to patch a
faulty behavior, if a device forgets to send a release event.
3. communicate with the device in reaction to one event:
We might want to communicate back to the device after a given event.
For example a device might send us an event saying that it came back
from sleeping state and needs to be re-initialized.
Currently we can achieve that by keeping a userspace program around,
raise a bpf event, and let that userspace program inject the events and
commands.
However, we are just keeping that program alive as a daemon for just
scheduling commands. There is no logic in it, so it doesn't really justify
an actual userspace wakeup. So a kernel workqueue seems simpler to handle.
bpf_timers are currently running in a soft IRQ context, this patch
series implements a sleppable context for them.
Cheers,
Benjamin
To: Alexei Starovoitov <ast(a)kernel.org>
To: Daniel Borkmann <daniel(a)iogearbox.net>
To: Andrii Nakryiko <andrii(a)kernel.org>
To: Martin KaFai Lau <martin.lau(a)linux.dev>
To: Eduard Zingerman <eddyz87(a)gmail.com>
To: Song Liu <song(a)kernel.org>
To: Yonghong Song <yonghong.song(a)linux.dev>
To: John Fastabend <john.fastabend(a)gmail.com>
To: KP Singh <kpsingh(a)kernel.org>
To: Stanislav Fomichev <sdf(a)google.com>
To: Hao Luo <haoluo(a)google.com>
To: Jiri Olsa <jolsa(a)kernel.org>
To: Mykola Lysenko <mykolal(a)fb.com>
To: Shuah Khan <shuah(a)kernel.org>
Cc: Benjamin Tissoires <bentiss(a)kernel.org>
Cc: <bpf(a)vger.kernel.org>
Cc: <linux-kernel(a)vger.kernel.org>
Cc: <linux-kselftest(a)vger.kernel.org>
---
Changes in v4:
- dropped the HID changes, they can go independently from bpf-core
- addressed Alexei's and Eduard's remarks
- added selftests
- Link to v3: https://lore.kernel.org/r/20240221-hid-bpf-sleepable-v3-0-1fb378ca6301@kern…
Changes in v3:
- fixed the crash from v2
- changed the API to have only BPF_F_TIMER_SLEEPABLE for
bpf_timer_start()
- split the new kfuncs/verifier patch into several sub-patches, for
easier reviews
- Link to v2: https://lore.kernel.org/r/20240214-hid-bpf-sleepable-v2-0-5756b054724d@kern…
Changes in v2:
- make use of bpf_timer (and dropped the custom HID handling)
- implemented bpf_timer_set_sleepable_cb as a kfunc
- still not implemented global subprogs
- no sleepable bpf_timer selftests yet
- Link to v1: https://lore.kernel.org/r/20240209-hid-bpf-sleepable-v1-0-4cc895b5adbd@kern…
---
Benjamin Tissoires (6):
bpf/helpers: introduce sleepable bpf_timers
bpf/verifier: add bpf_timer as a kfunc capable type
bpf/helpers: introduce bpf_timer_set_sleepable_cb() kfunc
bpf/helpers: mark the callback of bpf_timer_set_sleepable_cb() as sleepable
tools: sync include/uapi/linux/bpf.h
selftests/bpf: add sleepable timer tests
include/linux/bpf_verifier.h | 1 +
include/uapi/linux/bpf.h | 4 +
kernel/bpf/helpers.c | 132 ++++++++++++++++++++-
kernel/bpf/verifier.c | 92 +++++++++++++-
tools/include/uapi/linux/bpf.h | 4 +
tools/testing/selftests/bpf/bpf_experimental.h | 4 +
.../selftests/bpf/bpf_testmod/bpf_testmod.c | 5 +
.../selftests/bpf/bpf_testmod/bpf_testmod_kfunc.h | 1 +
tools/testing/selftests/bpf/prog_tests/timer.c | 1 +
tools/testing/selftests/bpf/progs/timer.c | 40 ++++++-
tools/testing/selftests/bpf/progs/timer_failure.c | 114 +++++++++++++++++-
11 files changed, 387 insertions(+), 11 deletions(-)
---
base-commit: 9187210eee7d87eea37b45ea93454a88681894a4
change-id: 20240205-hid-bpf-sleepable-c01260fd91c4
Best regards,
--
Benjamin Tissoires <bentiss(a)kernel.org>
PASID (Process Address Space ID) is a PCIe extension to tag the DMA
transactions out of a physical device, and most modern IOMMU hardware
have supported PASID granular address translation. So a PASID-capable
device can be attached to multiple hwpts (a.k.a. domains), each attachment
is tagged with a pasid.
This series first adds a missing iommu API to replace domain for a pasid,
then adds iommufd APIs for device drivers to attach/replace/detach pasid
to/from hwpt per userspace's request, and adds selftest to validate the
iommufd APIs.
pasid attach/replace is mandatory on Intel VT-d given the PASID table
locates in the physical address space hence must be managed by the kernel,
both for supporting vSVA and coming SIOV. But it's optional on ARM/AMD
which allow configuring the PASID/CD table either in host physical address
space or nested on top of an GPA address space. This series only add VT-d
support as the minimal requirement.
Complete code can be found in below link:
https://github.com/yiliu1765/iommufd/tree/iommufd_pasid
Change log:
v1:
- Implemnet iommu_replace_device_pasid() to fall back to the original domain
if this replacement failed (Kevin)
- Add check in do_attach() to check corressponding attach_fn per the pasid value.
rfc: https://lore.kernel.org/linux-iommu/20230926092651.17041-1-yi.l.liu@intel.c…
Regards,
Yi Liu
Kevin Tian (1):
iommufd: Support attach/replace hwpt per pasid
Lu Baolu (2):
iommu: Introduce a replace API for device pasid
iommu/vt-d: Add set_dev_pasid callback for nested domain
Yi Liu (5):
iommufd: replace attach_fn with a structure
iommufd/selftest: Add set_dev_pasid and remove_dev_pasid in mock iommu
iommufd/selftest: Add a helper to get test device
iommufd/selftest: Add test ops to test pasid attach/detach
iommufd/selftest: Add coverage for iommufd pasid attach/detach
drivers/iommu/intel/nested.c | 47 +++++
drivers/iommu/iommu-priv.h | 2 +
drivers/iommu/iommu.c | 82 ++++++--
drivers/iommu/iommufd/Makefile | 1 +
drivers/iommu/iommufd/device.c | 50 +++--
drivers/iommu/iommufd/iommufd_private.h | 23 +++
drivers/iommu/iommufd/iommufd_test.h | 24 +++
drivers/iommu/iommufd/pasid.c | 138 ++++++++++++++
drivers/iommu/iommufd/selftest.c | 176 ++++++++++++++++--
include/linux/iommufd.h | 6 +
tools/testing/selftests/iommu/iommufd.c | 172 +++++++++++++++++
.../selftests/iommu/iommufd_fail_nth.c | 28 ++-
tools/testing/selftests/iommu/iommufd_utils.h | 78 ++++++++
13 files changed, 785 insertions(+), 42 deletions(-)
create mode 100644 drivers/iommu/iommufd/pasid.c
--
2.34.1
This patch enhances the BPF helpers by adding a kfunc to retrieve the
cgroup v2 of a task, addressing a previous limitation where only
bpf_task_get_cgroup1 was available for cgroup v1. The new kfunc is
particularly useful for scenarios where obtaining the cgroup ID of a
task other than the "current" one is necessary, which the existing
bpf_get_current_cgroup_id helper cannot accommodate. A specific use
case at Netflix involved the sched_switch tracepoint, where we had to
get the cgroup IDs of both the prev and next tasks.
The bpf_task_get_cgroup kfunc acquires and returns a reference to a
task's default cgroup, ensuring thread-safe access by correctly
implementing RCU read locking and unlocking. It leverages the existing
cgroup.h helper, and cgroup_tryget to safely acquire a reference to it.
Signed-off-by: Jose Fernandez <josef(a)netflix.com>
Reviewed-by: Tycho Andersen <tycho(a)tycho.pizza>
Acked-by: Yonghong Song <yonghong.song(a)linux.dev>
Acked-by: Stanislav Fomichev <sdf(a)google.com>
---
V2 -> V3: No changes
V1 -> V2: Return a pointer to the cgroup instead of the cgroup ID
kernel/bpf/helpers.c | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index a89587859571..bbd19d5eedb6 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -2266,6 +2266,31 @@ bpf_task_get_cgroup1(struct task_struct *task, int hierarchy_id)
return NULL;
return cgrp;
}
+
+/**
+ * bpf_task_get_cgroup - Acquire a reference to the default cgroup of a task.
+ * @task: The target task
+ *
+ * This function returns the task's default cgroup, primarily
+ * designed for use with cgroup v2. In cgroup v1, the concept of default
+ * cgroup varies by subsystem, and while this function will work with
+ * cgroup v1, it's recommended to use bpf_task_get_cgroup1 instead.
+ * A cgroup returned by this kfunc which is not subsequently stored in a
+ * map, must be released by calling bpf_cgroup_release().
+ *
+ * Return: On success, the cgroup is returned. On failure, NULL is returned.
+ */
+__bpf_kfunc struct cgroup *bpf_task_get_cgroup(struct task_struct *task)
+{
+ struct cgroup *cgrp;
+
+ rcu_read_lock();
+ cgrp = task_dfl_cgroup(task);
+ if (!cgroup_tryget(cgrp))
+ cgrp = NULL;
+ rcu_read_unlock();
+ return cgrp;
+}
#endif /* CONFIG_CGROUPS */
/**
@@ -2573,6 +2598,7 @@ BTF_ID_FLAGS(func, bpf_cgroup_ancestor, KF_ACQUIRE | KF_RCU | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_cgroup_from_id, KF_ACQUIRE | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_task_under_cgroup, KF_RCU)
BTF_ID_FLAGS(func, bpf_task_get_cgroup1, KF_ACQUIRE | KF_RCU | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_task_get_cgroup, KF_ACQUIRE | KF_RCU | KF_RET_NULL)
#endif
BTF_ID_FLAGS(func, bpf_task_from_pid, KF_ACQUIRE | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_throw)
base-commit: c733239f8f530872a1f80d8c45dcafbaff368737
--
2.40.1
On riscv, mmap currently returns an address from the largest address
space that can fit entirely inside of the hint address. This makes it
such that the hint address is almost never returned. This patch raises
the mappable area up to and including the hint address. This allows mmap
to often return the hint address, which allows a performance improvement
over searching for a valid address as well as making the behavior more
similar to other architectures.
Note that a previous patch introduced stronger semantics compared to
other architectures for riscv mmap. On riscv, mmap will not use bits in
the upper bits of the virtual address depending on the hint address. On
other architectures, a random address is returned in the address space
requested. On all architectures the hint address will be returned if it
is available. This allows riscv applications to configure how many bits
in the virtual address should be left empty. This has the two benefits
of being able to request address spaces that are smaller than the
default and doesn't require the application to know the page table
layout of riscv.
Signed-off-by: Charlie Jenkins <charlie(a)rivosinc.com>
---
Changes in v2:
- Add back forgotten "mmap_end = STACK_TOP_MAX"
- Link to v1: https://lore.kernel.org/r/20240129-use_mmap_hint_address-v1-0-4c74da813ba1@…
---
Charlie Jenkins (3):
riscv: mm: Use hint address in mmap if available
selftests: riscv: Generalize mm selftests
docs: riscv: Define behavior of mmap
Documentation/arch/riscv/vm-layout.rst | 16 ++--
arch/riscv/include/asm/processor.h | 22 +++---
tools/testing/selftests/riscv/mm/mmap_bottomup.c | 20 +----
tools/testing/selftests/riscv/mm/mmap_default.c | 20 +----
tools/testing/selftests/riscv/mm/mmap_test.h | 93 +++++++++++++-----------
5 files changed, 67 insertions(+), 104 deletions(-)
---
base-commit: 556e2d17cae620d549c5474b1ece053430cd50bc
change-id: 20240119-use_mmap_hint_address-f9f4b1b6f5f1
--
- Charlie
v3:
- pull forward to v6.8
- style and small fixups recommended by jcameron
- update syscall number (will do all archs when RFC tag drops)
- update for new folio code
- added OCP link to device-tracked address hotness proposal
- kept void* over __u64 simply because it integrates cleanly with
existing migration code. If there's strong opinions, I can refactor.
This patch set is a proposal for a syscall analogous to move_pages,
that migrates pages between NUMA nodes using physical addressing.
The intent is to better enable user-land system-wide memory tiering
as CXL devices begin to provide memory resources on the PCIe bus.
For example, user-land software which is making decisions based on
data sources which expose physical address information no longer
must convert that information to virtual addressing to act upon it
(see background for info on how physical addresses are acquired).
The syscall requires CAP_SYS_ADMIN, since physical address source
information is typically protected by the same (or CAP_SYS_NICE).
This patch set broken into 3 patches:
1) refactor of existing migration code for code reuse
2) The sys_move_phys_pages system call.
3) ktest of the syscall
The sys_move_phys_pages system call validates the page may be
migrated by checking migratable-status of each vma mapping the page,
and the intersection of cpuset policies each vma's task.
Background:
Userspace job schedulers, memory managers, and tiering software
solutions depend on page migration syscalls to reallocate resources
across NUMA nodes. Currently, these calls enable movement of memory
associated with a specific PID. Moves can be requested in coarse,
process-sized strokes (as with migrate_pages), and on specific virtual
pages (via move_pages).
However, a number of profiling mechanisms provide system-wide information
that would benefit from a physical-addressing version move_pages.
There are presently at least 4 ways userland can acquire physical
address information for use with this interface, and 1 hardware offload
mechanism being proposed by opencompute.
1) /proc/pid/pagemap: can be used to do page table translations.
This is only really useful for testing, and the ktest was
written using this functionality.
2) X86: IBS (AMD) and PEBS (Intel) can be configured to return physical
and/or vitual address information.
3) zoneinfo: /proc/zoneinfo exposes the start PFN of zones
4) /sys/kernel/mm/page_idle: A way to query whether a PFN is idle.
So long as the page size is known, this can be used to identify
system-wide idle pages that could be migrated to lower tiers.
https://docs.kernel.org/admin-guide/mm/idle_page_tracking.html
5) CXL Offloaded Hotness Monitoring (Proposed): a CXL memory device
may provide hot/cold information about its memory. For example,
it may report the hottest device addresses (0-based) or a physical
address (if it has access to decoders for convert bases).
DPA can be cheaply converted to HPA by combining it with data
exposed by /sys/bus/cxl/ information (region address bases).
See: https://www.opencompute.org/documents/ocp-cms-hotness-tracking-requirements…
Information from these sources facilitates systemwide resource management,
but with the limitations of migrate_pages and move_pages applying to
individual tasks, their outputs must be converted back to virtual addresses
and re-associated with specific PIDs.
Doing this reverse-translation outside of the kernel requires considerable
space and compute, and it will have to be performed again by the existing
system calls. Much of this work can be avoided if the pages can be
migrated directly with physical memory addressing.
Gregory Price (3):
mm/migrate: refactor add_page_for_migration for code re-use
mm/migrate: Create move_phys_pages syscall
ktest: sys_move_phys_pages ktest
arch/x86/entry/syscalls/syscall_32.tbl | 1 +
arch/x86/entry/syscalls/syscall_64.tbl | 1 +
include/linux/syscalls.h | 5 +
include/uapi/asm-generic/unistd.h | 8 +-
kernel/sys_ni.c | 1 +
mm/migrate.c | 288 ++++++++++++++++++++----
tools/include/uapi/asm-generic/unistd.h | 8 +-
tools/testing/selftests/mm/migration.c | 99 ++++++++
8 files changed, 370 insertions(+), 41 deletions(-)
--
2.39.1
In some systems, the netcat server can incur in delay to start listening.
When this happens, the test can randomly fail in various points.
This is an example error message:
# ip gre none gso
# encap 192.168.1.1 to 192.168.1.2, type gre, mac none len 2000
# test basic connectivity
# Ncat: Connection refused.
The issue stems from a race condition between the netcat client and server.
The test author had addressed this problem by implementing a sleep, which
I have removed in this patch.
This patch introduces a function capable of sleeping for up to two seconds.
However, it can terminate the waiting period early if the port is reported
to be listening.
Signed-off-by: Alessandro Carminati (Red Hat) <alessandro.carminati(a)gmail.com>
---
tools/testing/selftests/bpf/test_tc_tunnel.sh | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/test_tc_tunnel.sh b/tools/testing/selftests/bpf/test_tc_tunnel.sh
index 910044f08908..7989ec608454 100755
--- a/tools/testing/selftests/bpf/test_tc_tunnel.sh
+++ b/tools/testing/selftests/bpf/test_tc_tunnel.sh
@@ -72,7 +72,6 @@ cleanup() {
server_listen() {
ip netns exec "${ns2}" nc "${netcat_opt}" -l "${port}" > "${outfile}" &
server_pid=$!
- sleep 0.2
}
client_connect() {
@@ -93,6 +92,16 @@ verify_data() {
fi
}
+wait_for_port() {
+ for i in $(seq 20); do
+ if ip netns exec "${ns2}" ss ${2:--4}OHntl | grep -q "$1"; then
+ return 0
+ fi
+ sleep 0.1
+ done
+ return 1
+}
+
set -e
# no arguments: automated test, run all
@@ -193,6 +202,7 @@ setup
# basic communication works
echo "test basic connectivity"
server_listen
+wait_for_port ${port} ${netcat_opt}
client_connect
verify_data
@@ -204,6 +214,7 @@ ip netns exec "${ns1}" tc filter add dev veth1 egress \
section "encap_${tuntype}_${mac}"
echo "test bpf encap without decap (expect failure)"
server_listen
+wait_for_port ${port} ${netcat_opt}
! client_connect
if [[ "$tuntype" =~ "udp" ]]; then
--
2.34.1
Sub-Numa Clustering (SNC) allows splitting CPU cores, caches and memory
into multiple NUMA nodes. When enabled, NUMA-aware applications can
achieve better performance on bigger server platforms.
The series adding SNC support to the kernel is currently in review [1]
but the selftests for resctrl need NUMA-aware adjustments to use these
changes. Issues with resctrl selftests not working properly with SNC
enabled were originally reported by Shaopeng Tan [2][3] and the
following series resolves them.
The main concept currently missing from resctrl selftests is that while
resctrl tracks memory accesses on a single NUMA node (which normally is
the same as the CPU socket) on machines with SNC enabled memory accesses
can leak outside of the local NUMA node and into other NUMA nodes on the
same socket. In that case resctrl could report a diminished value in one
of its monitoring technologies: Cache Monitoring Technology (CMT) or
Memory Bandwidth Monitoring (MBM) .
Implemented solutions for both CMT and MBM follow the same idea which is
to simply sum values reported by different NUMA nodes for a single
Resource Monitoring ID (RMID).
Series was tested on Ice Lake server platforms with SNC disabled, SNC-2
and SNC-4. The tests were also ran with and without kernel support for
SNC.
Series applies cleanly on kselftest/next.
[1] https://lore.kernel.org/all/20240228112215.8044-tony.luck@intel.com/
[2] https://lore.kernel.org/all/TYAPR01MB6330B9B17686EF426D2C3F308B25A@TYAPR01M…
[3] https://lore.kernel.org/lkml/TYAPR01MB6330A4EB3633B791939EA45E8B39A@TYAPR01…
Maciej Wieczor-Retman (4):
selftests/resctrl: Adjust effective L3 cache size with SNC enabled
selftests/resctrl: SNC support for CMT
selftests/resctrl: SNC support for MBM
selftests/resctrl: Adjust SNC support messages
tools/testing/selftests/resctrl/cache.c | 17 ++-
tools/testing/selftests/resctrl/cat_test.c | 2 +-
tools/testing/selftests/resctrl/cmt_test.c | 6 +-
tools/testing/selftests/resctrl/mba_test.c | 3 +-
tools/testing/selftests/resctrl/mbm_test.c | 4 +-
tools/testing/selftests/resctrl/resctrl.h | 13 +-
tools/testing/selftests/resctrl/resctrl_val.c | 46 ++++---
tools/testing/selftests/resctrl/resctrlfs.c | 128 +++++++++++++++++-
8 files changed, 185 insertions(+), 34 deletions(-)
--
2.44.0
Hi,
With the commit v6.8-11167-g4438a810f396 in vanilla torvalds tree, there seem to be problems with
the icmp_redirect.sh tests.
The iproute2-next tools were used, commit 7a6d30c95da9.
# timeout set to 3600
# selftests: net: icmp_redirect.sh
#
# ###########################################################################
# Legacy routing
# ###########################################################################
#
# TEST: IPv4: redirect exception [FAIL]
# TEST: IPv6: redirect exception [ OK ]
# TEST: IPv4: redirect exception plus mtu [FAIL]
# TEST: IPv6: redirect exception plus mtu [ OK ]
# TEST: IPv4: routing reset [ OK ]
# TEST: IPv6: routing reset [ OK ]
# TEST: IPv4: mtu exception [ OK ]
# TEST: IPv6: mtu exception [ OK ]
# TEST: IPv4: mtu exception plus redirect [FAIL]
# TEST: IPv6: mtu exception plus redirect [ OK ]
#
# ###########################################################################
# Legacy routing with VRF
# ###########################################################################
#
# TEST: IPv4: redirect exception [FAIL]
# TEST: IPv6: redirect exception [ OK ]
# TEST: IPv4: redirect exception plus mtu [FAIL]
# TEST: IPv6: redirect exception plus mtu [ OK ]
# TEST: IPv4: routing reset [ OK ]
# TEST: IPv6: routing reset [ OK ]
# TEST: IPv4: mtu exception [ OK ]
# TEST: IPv6: mtu exception [ OK ]
# TEST: IPv4: mtu exception plus redirect [FAIL]
# TEST: IPv6: mtu exception plus redirect [ OK ]
#
# ###########################################################################
# Routing with nexthop objects
# ###########################################################################
#
# TEST: IPv4: redirect exception [FAIL]
# TEST: IPv6: redirect exception [ OK ]
# TEST: IPv4: redirect exception plus mtu [FAIL]
# TEST: IPv6: redirect exception plus mtu [ OK ]
# TEST: IPv4: routing reset [ OK ]
# TEST: IPv6: routing reset [ OK ]
# TEST: IPv4: mtu exception [ OK ]
# TEST: IPv6: mtu exception [ OK ]
# TEST: IPv4: mtu exception plus redirect [FAIL]
# TEST: IPv6: mtu exception plus redirect [ OK ]
#
# ###########################################################################
# Routing with nexthop objects and VRF
# ###########################################################################
#
# TEST: IPv4: redirect exception [FAIL]
# TEST: IPv6: redirect exception [ OK ]
# TEST: IPv4: redirect exception plus mtu [FAIL]
# TEST: IPv6: redirect exception plus mtu [ OK ]
# TEST: IPv4: routing reset [ OK ]
# TEST: IPv6: routing reset [ OK ]
# TEST: IPv4: mtu exception [ OK ]
# TEST: IPv6: mtu exception [ OK ]
# TEST: IPv4: mtu exception plus redirect [FAIL]
# TEST: IPv6: mtu exception plus redirect [ OK ]
#
# Tests passed: 28
# Tests failed: 12
# Tests xfailed: 0
not ok 45 selftests: net: icmp_redirect.sh # exit=1
These errors are not introduced with this commit, but were already present at least in 6.8-rc7.
Hope this helps.
Best regards,
Mirsad Todorovac
This patch enhances the BPF helpers by adding a kfunc to retrieve the
cgroup v2 of a task, addressing a previous limitation where only
bpf_task_get_cgroup1 was available for cgroup v1. The new kfunc is
particularly useful for scenarios where obtaining the cgroup ID of a
task other than the "current" one is necessary, which the existing
bpf_get_current_cgroup_id helper cannot accommodate. A specific use
case at Netflix involved the sched_switch tracepoint, where we had to
get the cgroup IDs of both the prev and next tasks.
The bpf_task_get_cgroup kfunc acquires and returns a reference to a
task's default cgroup, ensuring thread-safe access by correctly
implementing RCU read locking and unlocking. It leverages the existing
cgroup.h helper, and cgroup_tryget to safely acquire a reference to it.
Signed-off-by: Jose Fernandez <josef(a)netflix.com>
Reviewed-by: Tycho Andersen <tycho(a)tycho.pizza>
---
V1 -> V2: Return a pointer to the cgroup instead of the cgroup ID
kernel/bpf/helpers.c | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index a89587859571..bbd19d5eedb6 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -2266,6 +2266,31 @@ bpf_task_get_cgroup1(struct task_struct *task, int hierarchy_id)
return NULL;
return cgrp;
}
+
+/**
+ * bpf_task_get_cgroup - Acquire a reference to the default cgroup of a task.
+ * @task: The target task
+ *
+ * This function returns the task's default cgroup, primarily
+ * designed for use with cgroup v2. In cgroup v1, the concept of default
+ * cgroup varies by subsystem, and while this function will work with
+ * cgroup v1, it's recommended to use bpf_task_get_cgroup1 instead.
+ * A cgroup returned by this kfunc which is not subsequently stored in a
+ * map, must be released by calling bpf_cgroup_release().
+ *
+ * Return: On success, the cgroup is returned. On failure, NULL is returned.
+ */
+__bpf_kfunc struct cgroup *bpf_task_get_cgroup(struct task_struct *task)
+{
+ struct cgroup *cgrp;
+
+ rcu_read_lock();
+ cgrp = task_dfl_cgroup(task);
+ if (!cgroup_tryget(cgrp))
+ cgrp = NULL;
+ rcu_read_unlock();
+ return cgrp;
+}
#endif /* CONFIG_CGROUPS */
/**
@@ -2573,6 +2598,7 @@ BTF_ID_FLAGS(func, bpf_cgroup_ancestor, KF_ACQUIRE | KF_RCU | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_cgroup_from_id, KF_ACQUIRE | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_task_under_cgroup, KF_RCU)
BTF_ID_FLAGS(func, bpf_task_get_cgroup1, KF_ACQUIRE | KF_RCU | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_task_get_cgroup, KF_ACQUIRE | KF_RCU | KF_RET_NULL)
#endif
BTF_ID_FLAGS(func, bpf_task_from_pid, KF_ACQUIRE | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_throw)
base-commit: 4c8644f86c854c214aaabbcc24a27fa4c7e6a951
--
2.40.1
Add ability to parse multiple files. Additionally add the
ability to parse all results in the KUnit debugfs repository.
How to parse multiple files:
./tools/testing/kunit/kunit.py parse results.log results2.log
How to parse all files in directory:
./tools/testing/kunit/kunit.py parse directory_path/*
How to parse KUnit debugfs repository:
./tools/testing/kunit/kunit.py parse debugfs
For each file, the parser outputs the file name, results, and test
summary. At the end of all parsing, the parser outputs a total summary
line.
This feature can be easily tested on the tools/testing/kunit/test_data/
directory.
Signed-off-by: Rae Moar <rmoar(a)google.com>
---
Changes since v3:
- Changing from input() to stdin
- Add checking for non-regular files
- Spacing fix
- Small printing fix
tools/testing/kunit/kunit.py | 54 +++++++++++++++++++++++++-----------
1 file changed, 38 insertions(+), 16 deletions(-)
diff --git a/tools/testing/kunit/kunit.py b/tools/testing/kunit/kunit.py
index bc74088c458a..641b8ca83e3e 100755
--- a/tools/testing/kunit/kunit.py
+++ b/tools/testing/kunit/kunit.py
@@ -511,19 +511,40 @@ def exec_handler(cli_args: argparse.Namespace) -> None:
def parse_handler(cli_args: argparse.Namespace) -> None:
- if cli_args.file is None:
- sys.stdin.reconfigure(errors='backslashreplace') # type: ignore
- kunit_output = sys.stdin # type: Iterable[str]
- else:
- with open(cli_args.file, 'r', errors='backslashreplace') as f:
- kunit_output = f.read().splitlines()
- # We know nothing about how the result was created!
- metadata = kunit_json.Metadata()
- request = KunitParseRequest(raw_output=cli_args.raw_output,
- json=cli_args.json)
- result, _ = parse_tests(request, metadata, kunit_output)
- if result.status != KunitStatus.SUCCESS:
- sys.exit(1)
+ parsed_files = cli_args.files # type: List[str]
+ total_test = kunit_parser.Test()
+ total_test.status = kunit_parser.TestStatus.SUCCESS
+ if not parsed_files:
+ parsed_files.append('/dev/stdin')
+ elif len(parsed_files) == 1 and parsed_files[0] == "debugfs":
+ parsed_files.pop()
+ for (root, _, files) in os.walk("/sys/kernel/debug/kunit"):
+ parsed_files.extend(os.path.join(root, f) for f in files if f == "results")
+ if not parsed_files:
+ print("No files found.")
+
+ for file in parsed_files:
+ if os.path.isdir(file):
+ print(f'Ignoring directory "{file}"')
+ elif os.path.exists(file):
+ print(file)
+ with open(file, 'r', errors='backslashreplace') as f:
+ kunit_output = f.read().splitlines()
+ # We know nothing about how the result was created!
+ metadata = kunit_json.Metadata()
+ request = KunitParseRequest(raw_output=cli_args.raw_output,
+ json=cli_args.json)
+ _, test = parse_tests(request, metadata, kunit_output)
+ total_test.subtests.append(test)
+ else:
+ print(f'Could not find "{file}"')
+
+ if len(parsed_files) > 1: # if more than one file was parsed output total summary
+ print('All files parsed.')
+ if not request.raw_output:
+ stdout.print_with_timestamp(kunit_parser.DIVIDER)
+ kunit_parser.bubble_up_test_results(total_test)
+ kunit_parser.print_summary_line(total_test)
subcommand_handlers_map = {
@@ -569,9 +590,10 @@ def main(argv: Sequence[str]) -> None:
help='Parses KUnit results from a file, '
'and parses formatted results.')
add_parse_opts(parse_parser)
- parse_parser.add_argument('file',
- help='Specifies the file to read results from.',
- type=str, nargs='?', metavar='input_file')
+ parse_parser.add_argument('files',
+ help='List of file paths to read results from or keyword'
+ '"debugfs" to read all results from the debugfs directory.',
+ type=str, nargs='*', metavar='input_files')
cli_args = parser.parse_args(massage_argv(argv))
base-commit: 806cb2270237ce2ec672a407d66cee17a07d3aa2
--
2.44.0.291.gc1ea87d7ee-goog
Add ability to parse multiple files. Additionally add the
ability to parse all results in the KUnit debugfs repository.
How to parse multiple files:
./tools/testing/kunit/kunit.py parse results.log results2.log
How to parse all files in directory:
./tools/testing/kunit/kunit.py parse directory_path/*
How to parse KUnit debugfs repository:
./tools/testing/kunit/kunit.py parse debugfs
For each file, the parser outputs the file name, results, and test
summary. At the end of all parsing, the parser outputs a total summary
line.
This feature can be easily tested on the tools/testing/kunit/test_data/
directory.
Signed-off-by: Rae Moar <rmoar(a)google.com>
---
Changes since v2:
- Fixed bug with input from command line. I changed this to use
input(). Daniel, let me know if this works for you.
- Add more specific warning messages
tools/testing/kunit/kunit.py | 56 +++++++++++++++++++++++++-----------
1 file changed, 40 insertions(+), 16 deletions(-)
diff --git a/tools/testing/kunit/kunit.py b/tools/testing/kunit/kunit.py
index bc74088c458a..1aa3d736d80c 100755
--- a/tools/testing/kunit/kunit.py
+++ b/tools/testing/kunit/kunit.py
@@ -511,19 +511,42 @@ def exec_handler(cli_args: argparse.Namespace) -> None:
def parse_handler(cli_args: argparse.Namespace) -> None:
- if cli_args.file is None:
- sys.stdin.reconfigure(errors='backslashreplace') # type: ignore
- kunit_output = sys.stdin # type: Iterable[str]
- else:
- with open(cli_args.file, 'r', errors='backslashreplace') as f:
- kunit_output = f.read().splitlines()
- # We know nothing about how the result was created!
- metadata = kunit_json.Metadata()
- request = KunitParseRequest(raw_output=cli_args.raw_output,
- json=cli_args.json)
- result, _ = parse_tests(request, metadata, kunit_output)
- if result.status != KunitStatus.SUCCESS:
- sys.exit(1)
+ parsed_files = cli_args.files # type: List[str]
+ total_test = kunit_parser.Test()
+ total_test.status = kunit_parser.TestStatus.SUCCESS
+ if not parsed_files:
+ parsed_files.append(input("File path: "))
+
+ if parsed_files[0] == "debugfs" and len(parsed_files) == 1:
+ parsed_files.pop()
+ for (root, _, files) in os.walk("/sys/kernel/debug/kunit"):
+ parsed_files.extend(os.path.join(root, f) for f in files if f == "results")
+
+ if not parsed_files:
+ print("No files found.")
+
+ for file in parsed_files:
+ if os.path.isfile(file):
+ print(file)
+ with open(file, 'r', errors='backslashreplace') as f:
+ kunit_output = f.read().splitlines()
+ # We know nothing about how the result was created!
+ metadata = kunit_json.Metadata()
+ request = KunitParseRequest(raw_output=cli_args.raw_output,
+ json=cli_args.json)
+ _, test = parse_tests(request, metadata, kunit_output)
+ total_test.subtests.append(test)
+ elif os.path.isdir(file):
+ print("Ignoring directory ", file)
+ else:
+ print("Could not find ", file)
+
+ if len(parsed_files) > 1: # if more than one file was parsed output total summary
+ print('All files parsed.')
+ if not request.raw_output:
+ stdout.print_with_timestamp(kunit_parser.DIVIDER)
+ kunit_parser.bubble_up_test_results(total_test)
+ kunit_parser.print_summary_line(total_test)
subcommand_handlers_map = {
@@ -569,9 +592,10 @@ def main(argv: Sequence[str]) -> None:
help='Parses KUnit results from a file, '
'and parses formatted results.')
add_parse_opts(parse_parser)
- parse_parser.add_argument('file',
- help='Specifies the file to read results from.',
- type=str, nargs='?', metavar='input_file')
+ parse_parser.add_argument('files',
+ help='List of file paths to read results from or keyword'
+ '"debugfs" to read all results from the debugfs directory.',
+ type=str, nargs='*', metavar='input_files')
cli_args = parser.parse_args(massage_argv(argv))
base-commit: 806cb2270237ce2ec672a407d66cee17a07d3aa2
--
2.44.0.278.ge034bb2e1d-goog