Hello Andrii Nakryiko,
This is a semi-automatic email about new static checker warnings.
Commit c381203eadb7 ("selftests/bpf: add trusted global subprog arg
tests") from Jan 29, 2024, leads to the following Smatch complaint:
./tools/testing/selftests/bpf/progs/verifier_global_ptr_args.c:88 trusted_task_arg_nonnull_fail2()
warn: variable dereferenced before check 'nullable' (see line 86)
./tools/testing/selftests/bpf/progs/verifier_global_ptr_args.c
    85          /* should fail, PTR_TO_BTF_ID_OR_NULL */
    86          res = subprog_trusted_task_nonnull(nullable);
                                                   ^^^^^^^^
This is dereferenced
    87
    88          if (nullable)
                    ^^^^^^^^
NULL check is too late
    89                  bpf_task_release(nullable);
    90
regards,
dan carpenter
Hello,
KernelCI is hosting a bi-weekly call on Thursday to discuss improvements
to existing upstream tests, the development of new tests to increase
kernel testing coverage, and the enablement of these tests in KernelCI.
Below is a list of the tests the community has been working on and their
latest status updates, as discussed in the last meeting held on
2024-08-08:
*KTAP performance counters*
- Upcoming presentation @LPC2024 by Tim Bird on adding benchmark results
to KTAP: https://lpc.events/event/18/contributions/1791/
- Proposing new system to handle benchmark data, composed of 3 parts:
adding syntax to KTAP to support benchmark values, using a set of
external criteria for interpreting benchmark results, an automated tool
to determine and set the reference values used in these criteria.
- One related topic for discussion is where to store the reference files
and the test configuration, including all details that might impact the
results.
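The three-part proposal above can be pictured with a small sketch. Everything in it is hypothetical: the `# benchmark` line format, the layout of the criteria, and the function names are assumptions made for illustration only, not the syntax or tooling actually being proposed for KTAP.

```python
import re
import statistics

# Hypothetical line format, NOT the proposed KTAP syntax:
#   "# benchmark <name> <value> <unit>"
BENCH_RE = re.compile(r"^# benchmark (\S+) ([0-9.]+) (\S+)$")


def parse_benchmarks(ktap_text):
    """Part 1: extract benchmark values from test output."""
    values = {}
    for line in ktap_text.splitlines():
        match = BENCH_RE.match(line.strip())
        if match:
            name, value, unit = match.groups()
            values[name] = (float(value), unit)
    return values


def evaluate(values, criteria):
    """Part 2: interpret values using external criteria.

    criteria maps a benchmark name to {"max": limit} and/or {"min": limit}.
    Benchmarks without a criterion are skipped.
    """
    results = {}
    for name, (value, _unit) in values.items():
        rule = criteria.get(name)
        if rule is None:
            continue
        ok = True
        if "max" in rule:
            ok = ok and value <= rule["max"]
        if "min" in rule:
            ok = ok and value >= rule["min"]
        results[name] = "pass" if ok else "fail"
    return results


def derive_max(samples, tolerance=0.10):
    """Part 3: derive a reference limit from earlier runs (median plus tolerance)."""
    return statistics.median(samples) * (1.0 + tolerance)


output = """KTAP version 1
1..1
# benchmark boot_time 4.2 s
ok 1 boot
"""
criteria = {"boot_time": {"max": derive_max([4.0, 4.1, 3.9])}}
print(evaluate(parse_benchmarks(output), criteria))
```

Keeping the criteria outside the test output, as in this sketch, is what raises the storage question in the last bullet: the reference values only make sense alongside the configuration they were recorded on.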
*Missing devices kselftest*
- Proposing new kselftest to report devices that go missing in the system:
https://lore.kernel.org/all/20240724-kselftest-dev-exist-v1-1-9bc21aa761b5@…
- Received feedback on the usability of the test and main unknowns about
management of reference files in tests
*Boot time test*
- RFC:
https://lore.kernel.org/all/20240725110622.96301-1-laura.nao@collabora.com/…
- Got feedback on the potential use of scripts from pm-graph. The
bootgraph.py script could be adapted for the test with some
modifications, such as adding support for machine-readable output and
different ftrace configurations.
*Suspend/resume in cpufreq kselftest*
- Sent v2 for patch adding RTC wakeup alarm in the cpufreq kselftest:
https://lore.kernel.org/lkml/20240715192634.19272-1-shreeya.patel@collabora…
- Looking into using the sleepgraph.py script from pm-graph to calculate
suspend/hibernation and resume time:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/too…
- Would require additional configs to be enabled in an automated
environment
*TAP conformance in kselftests and other fixes*
- Some dead selftests were removed:
https://lore.kernel.org/all/20240725110817.659099-1-usama.anjum@collabora.c…
and
https://lore.kernel.org/all/20240725121212.808206-1-usama.anjum@collabora.c…
- Bitmap test module conversion to KUnit was not accepted:
https://lore.kernel.org/lkml/49108735-c776-4b6f-8264-62a827dd7b26@collabora…
- Fixing error when kvm suite is built for unsupported architecture:
https://lore.kernel.org/kvm/c2aaa06e-e86d-4af9-bce4-6067e53cdf39@collabora.…
Please reply to this thread if you'd like to join the call or discuss any
of the topics further. We look forward to collaborating with the community
to improve upstream tests and expand coverage to more areas of interest
within the kernel.
Best regards,
Laura Nao
From: Allison Henderson <allison.henderson(a)oracle.com>
Hi All,
This series is a new selftest that Vegard, Chuck and myself have been
working on to provide some test coverage for rds. I've modified the
scripts to include the feedback from the last version, but let me know
if there's anything missed. Questions and comments appreciated.
Thanks everyone!
Allison
Changes in v2:
- Removed qemu vm creation and related code
- Updated README.txt with examples of running the test with virtme
- Removed init.sh. run.sh now directly calls test.py
- Some clean up done with the return code handling since there is no
vm between the scripts anymore
- Imported ip python function in
tools/testing/selftests/net/lib/py/utils.py into test.py
- Adapted test.py to use the imported ip function, and removed the
local ip wrapper
- Some line wrap clean up
- Link to v1:
https://lore.kernel.org/netdev/20240626012834.5678-3-allison.henderson@orac…
Vegard Nossum (3):
.gitignore: add .gcda files
net: rds: add option for GCOV profiling
selftests: rds: add testing infrastructure
.gitignore | 1 +
Documentation/dev-tools/gcov.rst | 11 +
MAINTAINERS | 1 +
net/rds/Kconfig | 9 +
net/rds/Makefile | 5 +
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/net/rds/Makefile | 12 +
tools/testing/selftests/net/rds/README.txt | 41 ++++
tools/testing/selftests/net/rds/config.sh | 53 +++++
tools/testing/selftests/net/rds/run.sh | 224 ++++++++++++++++++
tools/testing/selftests/net/rds/test.py | 262 +++++++++++++++++++++
11 files changed, 620 insertions(+)
create mode 100644 tools/testing/selftests/net/rds/Makefile
create mode 100644 tools/testing/selftests/net/rds/README.txt
create mode 100755 tools/testing/selftests/net/rds/config.sh
create mode 100755 tools/testing/selftests/net/rds/run.sh
create mode 100644 tools/testing/selftests/net/rds/test.py
--
2.25.1
v17: https://patchwork.kernel.org/project/netdevbpf/list/?series=869900&state=*
====
v16 also got a very thorough review and some testing (thanks again!).
This version addresses all the concerns reported on v15, in terms of
feedback and issues reported.
Major changes:
- Use ASSERT_RTNL.
- Moved around some of the page_pool helpers definitions so I can hide
some netmem helpers in private files as Jakub suggested.
- Don't make every net_iov hold a ref on the binding as Jakub suggested.
- Fix issue reported by Taehee where we access queues after they have
been freed.
Full devmem TCP changes including the full GVE driver implementation is
here:
https://github.com/mina/linux/commits/tcpdevmem-v17/
v16: https://patchwork.kernel.org/project/netdevbpf/list/?series=866353&state=*
====
v15 got a thorough review and some testing, and this version addresses almost
all the feedback. I left out a few minor comments where the reviewers said
the change could be done later.
Major changes:
- Addition of dma-buf introspection to page-pool-get and queue-get.
- Fixes to selftests suggested by Taehee.
- Fixes to documentation suggested by Donald.
- A couple of suggestions and fixes to TCP patches by Eric and David.
- Fixes to number assignments suggested by Arnd.
- Use rtnl_lock()ing to guard against queue reconfiguration while the
page_pool initialization is happening. (Jakub).
- Fixes to a few warnings reproduced by Taehee.
- Fixes to dma-buf binding suggested by Taehee and Jakub.
- Fixes to netlink UAPI suggested by Jakub
- Applied a number of Reviewed-bys and Acked-bys (including ones I lost
from v13+).
Full devmem TCP changes including the full GVE driver implementation is
here:
https://github.com/mina/linux/commits/tcpdevmem-v16/
One caveat: Taehee reproduced a KASAN warning and reported it here:
https://lore.kernel.org/netdev/CAMArcTUdCxOBYGF3vpbq=eBvqZfnc44KBaQTN7H-wqd…
I estimate the issue to be minor and easily fixable:
https://lore.kernel.org/netdev/CAHS8izNgaqC--GGE2xd85QB=utUnOHmioCsDd1TNxJW…
I hope to be able to follow up with a fix to net tree as net-next closes
imminently, but if this iteration doesn't make it in, I will repost with
a fix squashed after net-next reopens, no problem.
v15: https://patchwork.kernel.org/project/netdevbpf/list/?series=865481&state=*
====
No material changes in this version, only a fix to linking against
libynl.a from the last version. Per Jakub's instructions I've pulled one
of his patches into this series, and now use the new libynl.a correctly,
I hope.
As usual, the full devmem TCP changes including the full GVE driver
implementation is here:
https://github.com/mina/linux/commits/tcpdevmem-v15/
v14: https://patchwork.kernel.org/project/netdevbpf/list/?series=865135&archive=…
====
No material changes in this version. Only rebase and re-verification on
top of net-next. v13, I think, raced with commit ebad6d0334793
("net/ipv4: Use nested-BH locking for ipv4_tcp_sk.") being merged to
net-next that caused a patchwork failure to apply. This series should
apply cleanly on commit c4532232fa2a4 ("selftests: net: remove unneeded
IP_GRE config").
I did not wait the customary 24hr as Jakub said it's OK to repost as soon
as I build test the rebased version:
https://lore.kernel.org/netdev/20240625075926.146d769d@kernel.org/
v13: https://patchwork.kernel.org/project/netdevbpf/list/?series=861406&archive=…
====
Major changes:
--------------
This iteration addresses Pavel's review comments, applies his
reviewed-by's, and seeks to fix the patchwork build error (sorry!).
As usual, the full devmem TCP changes including the full GVE driver
implementation is here:
https://github.com/mina/linux/commits/tcpdevmem-v13/
v12: https://patchwork.kernel.org/project/netdevbpf/list/?series=859747&state=*
====
Major changes:
--------------
This iteration only addresses one minor comment from Pavel with regards
to the trace printing of netmem, and the patchwork build error
introduced in v11 because I missed doing an allmodconfig build, sorry.
Other than that v11, AFAICT, received no feedback. There is one
discussion about the specifics of plugging io_uring memory through
the page pool, but it is not relevant to the content of this particular
patchset, AFAICT.
As usual, the full devmem TCP changes including the full GVE driver
implementation is here:
https://github.com/mina/linux/commits/tcpdevmem-v12/
v11: https://patchwork.kernel.org/project/netdevbpf/list/?series=857457&state=*
====
Major Changes:
--------------
v11 addresses feedback received in v10. The major change is the removal
of the memory provider ops as requested by Christoph. We still
accomplish the same thing, but utilizing direct function calls with if
statements rather than generic ops.
Additionally address sparse warnings, bugs and review comments from
folks that reviewed.
As usual, the full devmem TCP changes including the full GVE driver
implementation is here:
https://github.com/mina/linux/commits/tcpdevmem-v11/
Detailed changelog:
-------------------
- Fixes in netdev_rx_queue_restart() from Pavel & David.
- Remove commit e650e8c3a36f5 ("net: page_pool: create hooks for
custom page providers") from the series to address Christoph's
feedback and rebased other patches on the series on this change.
- Fixed build errors with CONFIG_DMA_SHARED_BUFFER &&
!CONFIG_GENERIC_ALLOCATOR build.
- Fixed sparse warnings pointed out by Paolo.
- Drop unnecessary gro_pull_from_frag0 checks.
- Added Bagas reviewed-by to docs.
Cc: Bagas Sanjaya <bagasdotme(a)gmail.com>
Cc: Steven Rostedt <rostedt(a)goodmis.org>
Cc: Christoph Hellwig <hch(a)infradead.org>
Cc: Nikolay Aleksandrov <razor(a)blackwall.org>
Cc: Taehee Yoo <ap420073(a)gmail.com>
Cc: Donald Hunter <donald.hunter(a)gmail.com>
v10: https://patchwork.kernel.org/project/netdevbpf/list/?series=852422&state=*
====
Major Changes:
--------------
v9 was sent right before the merge window closed (sorry!). v10 is almost
a re-send of the series now that the merge window re-opened. Only
rebased to latest net-next and addressed some minor iterative comments
received on v9.
As usual, the full devmem TCP changes including the full GVE driver
implementation is here:
https://github.com/mina/linux/commits/tcpdevmem-v10/
Detailed changelog:
-------------------
- Fixed tokens leaking in DONTNEED setsockopt (Nikolay).
- Moved net_iov_dma_addr() to devmem.c and made it a devmem specific
helper (David).
- Rename hook alloc_pages to alloc_netmems as alloc_pages is now
defined as a preprocessor macro and causes a build error.
v9:
===
Major Changes:
--------------
GVE queue API has been merged. Submitting this version as non-RFC after
rebasing on top of the merged API, and dropped the out of tree queue API
I was carrying on github. Addressed the little feedback v8 has received.
Detailed changelog:
------------------
- Added new patch from David Wei to this series for
netdev_rx_queue_restart()
- Fixed sparse error.
- Removed CONFIG_ checks in netmem_is_net_iov()
- Flipped skb->readable to skb->unreadable
- Minor fixes to selftests & docs.
RFC v8:
=======
Major Changes:
--------------
- Fixed build error generated by patch-by-patch build.
- Applied docs suggestions from Randy.
RFC v7:
=======
Major Changes:
--------------
This revision largely rebases on top of net-next and addresses the feedback
RFCv6 received from folks, namely Jakub, Yunsheng, Arnd, David, & Pavel.
The series remains in RFC because the queue-API ndos defined in this
series are not yet implemented. I have a GVE implementation I carry out
of tree for my testing. An upstreamable GVE implementation is in the
works. Aside from that, in my estimation all the patches are ready for
review/merge. Please do take a look.
As usual the full devmem TCP changes including the full GVE driver
implementation is here:
https://github.com/mina/linux/commits/tcpdevmem-v7/
Detailed changelog:
- Use admin-perm in netlink API.
- Addressed feedback from Jakub with regards to netlink API
implementation.
- Renamed devmem.c functions to something more appropriate for that
file.
- Improve the performance seen through the page_pool benchmark.
- Fix the value definition of all the SO_DEVMEM_* uapi.
- Various fixes to documentation.
Perf - page-pool benchmark:
---------------------------
Improved performance of bench_page_pool_simple.ko tests compared to v6:
https://pastebin.com/raw/v5dYRg8L
net-next base: 8 cycle fast path.
RFC v6: 10 cycle fast path.
RFC v7: 9 cycle fast path.
RFC v7 with CONFIG_DMA_SHARED_BUFFER disabled: 8 cycle fast path,
same as baseline.
Perf - Devmem TCP benchmark:
---------------------
Perf is about the same regardless of the changes in v7, namely the
removal of the static_branch_unlikely to improve the page_pool benchmark
performance:
189/200gbps bi-directional throughput with RX devmem TCP and regular TCP
TX i.e. ~95% line rate.
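As a quick check of the figures quoted above, the arithmetic can be written out. The numbers are taken from this cover letter; the variable names are illustrative only.

```python
# Fast-path cost in cycles, as reported above for bench_page_pool_simple.ko.
fast_path_cycles = {
    "net-next base": 8,
    "RFC v6": 10,
    "RFC v7": 9,
    "RFC v7, CONFIG_DMA_SHARED_BUFFER disabled": 8,
}

baseline = fast_path_cycles["net-next base"]
overhead = {name: cycles - baseline for name, cycles in fast_path_cycles.items()}

# Devmem TCP throughput relative to line rate.
measured_gbps = 189
line_rate_gbps = 200
line_rate_pct = 100 * measured_gbps / line_rate_gbps

for name, extra in overhead.items():
    print(f"{name}: +{extra} cycle(s) over baseline")
print(f"{line_rate_pct:.1f}% of line rate")
```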
RFC v6:
=======
Major Changes:
--------------
This revision largely rebases on top of net-next and addresses the little
feedback RFCv5 received.
The series remains in RFC because the queue-API ndos defined in this
series are not yet implemented. I have a GVE implementation I carry out
of tree for my testing. An upstreamable GVE implementation is in the
works. Aside from that, in my estimation all the patches are ready for
review/merge. Please do take a look.
As usual the full devmem TCP changes including the full GVE driver
implementation is here:
https://github.com/mina/linux/commits/tcpdevmem-v6/
This version also comes with some performance data recorded in the cover
letter (see below changelog).
Detailed changelog:
- Rebased on top of the merged netmem_ref changes.
- Converted skb->dmabuf to skb->readable (Pavel). Pavel's original
suggestion was to remove the skb->dmabuf flag entirely, but when I
looked into it closely, I found the issue that if we remove the flag
we have to dereference the shinfo(skb) pointer to obtain the first
frag to tell whether an skb is readable or not. This can cause a
performance regression if it dirties the cache line when the
shinfo(skb) was not really needed. Instead, I converted the skb->dmabuf
flag into a generic skb->readable flag which can be re-used by io_uring
0-copy RX.
- Squashed a few locking optimizations from Eric Dumazet in the RX path
and the DEVMEM_DONTNEED setsockopt.
- Expanded the tests a bit. Added validation for invalid scenarios and
added some more coverage.
Perf - page-pool benchmark:
---------------------------
bench_page_pool_simple.ko tests with and without these changes:
https://pastebin.com/raw/ncHDwAbn
AFAIK the number that really matters in the perf tests is the
'tasklet_page_pool01_fast_path Per elem'. This one measures at about 8
cycles without the changes but there is some 1 cycle noise in some
results.
With the patches, this regresses to 9 cycles, again with occasional 1 cycle
noise when running the test repeatedly.
Lastly, I tried disabling the static_branch_unlikely() in the
netmem_is_net_iov() check. To my surprise, disabling the
static_branch_unlikely() check reduces the fast path back to 8 cycles,
but the 1 cycle noise remains.
Perf - Devmem TCP benchmark:
---------------------
189/200gbps bi-directional throughput with RX devmem TCP and regular TCP
TX i.e. ~95% line rate.
Major changes in RFC v5:
========================
1. Rebased on top of 'Abstract page from net stack' series and used the
new netmem type to refer to LSB set pointers instead of re-using
struct page.
2. Downgraded this series back to RFC and called it RFC v5. This is
because this series is now dependent on 'Abstract page from net
stack'[1] and the queue API. Both are removed from the series to
reduce the patch # and those bits are fairly independent or
pre-requisite work.
3. Reworked the page_pool devmem support to use netmem and for some
more unified handling.
4. Reworked the reference counting of net_iov (renamed from
page_pool_iov) to use pp_ref_count for refcounting.
The full changes including the dependent series and GVE page pool
support is here:
https://github.com/mina/linux/commits/tcpdevmem-rfcv5/
[1] https://patchwork.kernel.org/project/netdevbpf/list/?series=810774
Major changes in v1:
====================
1. Implemented MVP queue API ndos to remove the userspace-visible
driver reset.
2. Fixed issues in the napi_pp_put_page() devmem frag unref path.
3. Removed RFC tag.
Many smaller addressed comments across all the patches (patches have
individual change log).
Full tree including the rest of the GVE driver changes:
https://github.com/mina/linux/commits/tcpdevmem-v1
Changes in RFC v3:
==================
1. Pulled in the memory-provider dependency from Jakub's RFC[1] to make the
series reviewable and mergeable.
2. Implemented multi-rx-queue binding which was a todo in v2.
3. Fix to cmsg handling.
The sticking point in RFC v2[2] was the device reset required to refill
the device rx-queues after the dmabuf bind/unbind. The suggested
solution, as I understand it, is a subset of the per-queue management ops
Jakub suggested, or similar:
https://lore.kernel.org/netdev/20230815171638.4c057dcd@kernel.org/
This is not addressed in this revision, because:
1. This point was discussed at netconf & netdev and there is openness to
using the current approach of requiring a device reset.
2. Implementing individual queue resetting seems to be difficult for my
test bed with GVE. My prototype to test this ran into issues with the
rx-queues not coming back up properly if reset individually. At the
moment I'm unsure if it's a mistake in the POC or a genuine issue in
the virtualization stack behind GVE, which currently doesn't test
individual rx-queue restart.
3. Our usecases are not bothered by requiring a device reset to refill
the buffer queues, and we'd like to support NICs that run into this
limitation with resetting individual queues.
My thought is that drivers that have trouble with per-queue configs can
use the support in this series, while drivers that support new netdev
ops to reset individual queues can automatically reset the queue as
part of the dma-buf bind/unbind.
The same approach with device resets is presented again for consideration
with other sticking points addressed.
Only the rx devmem path is included in this proposal for merge. For a
snapshot of my entire tree which includes the GVE POC page pool support &
device memory support:
https://github.com/torvalds/linux/compare/master...mina:linux:tcpdevmem-v3
[1] https://lore.kernel.org/netdev/f8270765-a27b-6ccf-33ea-cda097168d79@redhat.…
[2] https://lore.kernel.org/netdev/CAHS8izOVJGJH5WF68OsRWFKJid1_huzzUK+hpKbLcL4…
Changes in RFC v2:
==================
The sticking point in RFC v1[1] was the dma-buf pages approach we used to
deliver the device memory to the TCP stack. RFC v2 is a proof-of-concept
that attempts to resolve this by implementing scatterlist support in the
networking stack, such that we can import the dma-buf scatterlist
directly. This is the approach proposed at a high level here[2].
Detailed changes:
1. Replaced dma-buf pages approach with importing scatterlist into the
page pool.
2. Replace the dma-buf pages centric API with a netlink API.
3. Removed the TX path implementation - there is no issue with
implementing the TX path with scatterlist approach, but leaving
out the TX path makes it easier to review.
4. Functionality is tested with this proposal, but I have not conducted
perf testing yet. I'm not sure there are regressions, but I removed
perf claims from the cover letter until they can be re-confirmed.
5. Added Signed-off-by: contributors to the implementation.
6. Fixed some bugs with the RX path since RFC v1.
Any feedback welcome, but specifically the biggest pending questions
needing feedback IMO are:
1. Feedback on the scatterlist-based approach in general.
2. Netlink API (Patch 1 & 2).
3. Approach to handle all the drivers that expect to receive pages from
the page pool (Patch 6).
[1] https://lore.kernel.org/netdev/dfe4bae7-13a0-3c5d-d671-f61b375cb0b4@gmail.c…
[2] https://lore.kernel.org/netdev/CAHS8izPm6XRS54LdCDZVd0C75tA1zHSu6jLVO8nzTLX…
==================
* TL;DR:
Device memory TCP (devmem TCP) is a proposal for transferring data to and/or
from device memory efficiently, without bouncing the data to a host memory
buffer.
* Problem:
A large number of data transfers have device memory as the source and/or
destination. Accelerators drastically increased the volume of such transfers.
Some examples include:
- ML accelerators transferring large amounts of training data from storage into
GPU/TPU memory. In some cases ML training setup time can be as long as 50% of
TPU compute time; improving data transfer throughput & efficiency can help
improve GPU/TPU utilization.
- Distributed training, where ML accelerators, such as GPUs on different hosts,
exchange data among them.
- Distributed raw block storage applications transfer large amounts of data with
remote SSDs; much of this data does not require host processing.
Today, the majority of the Device-to-Device data transfers over the network are
implemented as the following low level operations: Device-to-Host copy,
Host-to-Host network transfer, and Host-to-Device copy.
The implementation is suboptimal, especially for bulk data transfers, and can
put significant strains on system resources, such as host memory bandwidth,
PCIe bandwidth, etc. One important reason behind the current state is the
kernel’s lack of semantics to express device to network transfers.
* Proposal:
In this patch series we attempt to optimize this use case by implementing
socket APIs that enable the user to:
1. send device memory across the network directly, and
2. receive incoming network packets directly into device memory.
Packet _payloads_ go directly from the NIC to device memory for receive and from
device memory to NIC for transmit.
Packet _headers_ go to/from host memory and are processed by the TCP/IP stack
normally. The NIC _must_ support header split to achieve this.
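To illustrate what header split means for the receive path, here is a toy model. It is only a sketch under simplifying assumptions: `split_packet`, the fixed `header_len`, and the bytearray standing in for device memory are hypothetical and do not correspond to any kernel or driver API.

```python
def split_packet(packet: bytes, header_len: int, device_mem: bytearray, offset: int):
    """Place the payload in 'device memory' and return the header for the host.

    Returns (header, payload_offset, payload_len). Only the header bytes are
    handed to the host-side stack; the payload is referenced by offset/length.
    """
    if header_len > len(packet):
        raise ValueError("header_len exceeds packet length")
    header = packet[:header_len]
    payload = packet[header_len:]
    end = offset + len(payload)
    if end > len(device_mem):
        raise ValueError("device buffer too small")
    device_mem[offset:end] = payload
    return header, offset, len(payload)


device_mem = bytearray(64)           # stand-in for a dma-buf
packet = b"HDR!" + b"payload-bytes"  # 4-byte toy header followed by payload
header, off, length = split_packet(packet, 4, device_mem, 0)
print(header, off, length)
```

In this model the host only ever touches `header`; the payload is known to it as an offset and a length, which mirrors why the stack has to cope with frags it cannot read.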
Advantages:
- Alleviate host memory bandwidth pressure, compared to existing
network-transfer + device-copy semantics.
- Alleviate PCIe BW pressure, by limiting data transfer to the lowest level
of the PCIe tree, compared to traditional path which sends data through the
root complex.
* Patch overview:
** Part 1: netlink API
Gives user ability to bind dma-buf to an RX queue.
** Part 2: scatterlist support
Currently the standard for device memory sharing is DMABUF, which doesn't
generate struct pages. On the other hand, the networking stack (skbs, drivers,
and page pool) operates on pages. We have 2 options:
1. Generate struct pages for dmabuf device memory, or,
2. Modify the networking stack to process scatterlist.
Approach #1 was attempted in RFC v1. RFC v2 implements approach #2.
** Part 3: page pool support
We piggy back on page pool memory providers proposal:
https://github.com/kuba-moo/linux/tree/pp-providers
It allows the page pool to define a memory provider that provides the
page allocation and freeing. It helps abstract most of the device memory
TCP changes from the driver.
** Part 4: support for unreadable skb frags
Page pool iovs are not accessible by the host; we implement changes
throughout the networking stack to correctly handle skbs with unreadable
frags.
** Part 5: recvmsg() APIs
We define user APIs for the user to send and receive device memory.
Not included with this series is the GVE devmem TCP support, just to
simplify the review. Code available here if desired:
https://github.com/mina/linux/tree/tcpdevmem
This series is built on top of net-next with Jakub's pp-providers changes
cherry-picked.
* NIC dependencies:
1. (strict) Devmem TCP requires the NIC to support header split, i.e. the
capability to split incoming packets into a header + payload and to put
each into a separate buffer. Devmem TCP works by using device memory
for the packet payload, and host memory for the packet headers.
2. (optional) Devmem TCP works better with flow steering support & RSS support,
i.e. the NIC's ability to steer flows into certain rx queues. This allows the
sysadmin to enable devmem TCP on a subset of the rx queues, and steer
devmem TCP traffic onto these queues and non devmem TCP elsewhere.
The NIC I have access to with these properties is the GVE with DQO support
running in Google Cloud, but any NIC that supports these features would suffice.
I may be able to help reviewers bring up devmem TCP on their NICs.
* Testing:
The series includes a udmabuf kselftest that shows a simple use case of
devmem TCP and validates the entire data path end to end without
a dependency on a specific dmabuf provider.
** Test Setup
Kernel: net-next with this series and memory provider API cherry-picked
locally.
Hardware: Google Cloud A3 VMs.
NIC: GVE with header split & RSS & flow steering support.
Cc: Pavel Begunkov <asml.silence(a)gmail.com>
Cc: David Wei <dw(a)davidwei.uk>
Cc: Jason Gunthorpe <jgg(a)ziepe.ca>
Cc: Yunsheng Lin <linyunsheng(a)huawei.com>
Cc: Shailend Chand <shailend(a)google.com>
Cc: Harshitha Ramamurthy <hramamurthy(a)google.com>
Cc: Shakeel Butt <shakeel.butt(a)linux.dev>
Cc: Jeroen de Borst <jeroendb(a)google.com>
Cc: Praveen Kaligineedi <pkaligineedi(a)google.com>
Mina Almasry (14):
netdev: add netdev_rx_queue_restart()
net: netdev netlink api to bind dma-buf to a net device
netdev: support binding dma-buf to netdevice
netdev: netdevice devmem allocator
page_pool: move dmaddr helpers to .c file
page_pool: devmem support
memory-provider: dmabuf devmem memory provider
net: support non paged skb frags
net: add support for skbs with unreadable frags
tcp: RX path for devmem TCP
net: add SO_DEVMEM_DONTNEED setsockopt to release RX frags
net: add devmem TCP documentation
selftests: add ncdevmem, netcat for devmem TCP
netdev: add dmabuf introspection
Documentation/netlink/specs/netdev.yaml | 61 +++
Documentation/networking/devmem.rst | 269 ++++++++++++
Documentation/networking/index.rst | 1 +
arch/alpha/include/uapi/asm/socket.h | 6 +
arch/mips/include/uapi/asm/socket.h | 6 +
arch/parisc/include/uapi/asm/socket.h | 6 +
arch/sparc/include/uapi/asm/socket.h | 6 +
include/linux/skbuff.h | 61 ++-
include/linux/skbuff_ref.h | 9 +-
include/linux/socket.h | 1 +
include/net/devmem.h | 128 ++++++
include/net/mp_dmabuf_devmem.h | 44 ++
include/net/netdev_rx_queue.h | 5 +
include/net/netmem.h | 164 +++++++-
include/net/page_pool/helpers.h | 42 +-
include/net/page_pool/types.h | 8 +
include/net/sock.h | 2 +
include/net/tcp.h | 5 +-
include/trace/events/page_pool.h | 12 +-
include/uapi/asm-generic/socket.h | 6 +
include/uapi/linux/netdev.h | 13 +
include/uapi/linux/uio.h | 17 +
net/core/Makefile | 3 +-
net/core/datagram.c | 6 +
net/core/dev.c | 7 +-
net/core/devmem.c | 378 +++++++++++++++++
net/core/gro.c | 3 +-
net/core/netdev-genl-gen.c | 23 +
net/core/netdev-genl-gen.h | 6 +
net/core/netdev-genl.c | 111 +++++
net/core/netdev_rx_queue.c | 74 ++++
net/core/netmem_priv.h | 36 ++
net/core/page_pool.c | 147 +++++--
net/core/page_pool_priv.h | 3 +
net/core/page_pool_user.c | 4 +
net/core/skbuff.c | 77 +++-
net/core/sock.c | 68 +++
net/ipv4/esp4.c | 3 +-
net/ipv4/tcp.c | 261 +++++++++++-
net/ipv4/tcp_input.c | 13 +-
net/ipv4/tcp_ipv4.c | 16 +
net/ipv4/tcp_minisocks.c | 2 +
net/ipv4/tcp_output.c | 5 +-
net/ipv6/esp6.c | 3 +-
net/packet/af_packet.c | 4 +-
tools/include/uapi/linux/netdev.h | 13 +
tools/testing/selftests/net/.gitignore | 1 +
tools/testing/selftests/net/Makefile | 9 +
tools/testing/selftests/net/ncdevmem.c | 536 ++++++++++++++++++++++++
49 files changed, 2563 insertions(+), 121 deletions(-)
create mode 100644 Documentation/networking/devmem.rst
create mode 100644 include/net/devmem.h
create mode 100644 include/net/mp_dmabuf_devmem.h
create mode 100644 net/core/devmem.c
create mode 100644 net/core/netdev_rx_queue.c
create mode 100644 net/core/netmem_priv.h
create mode 100644 tools/testing/selftests/net/ncdevmem.c
--
2.46.0.rc1.232.g9752f9e123-goog
The relative RPATH ("./") supplied to linker options in CFLAGS is resolved
relative to current working directory and not the executable directory,
which will lead to incorrect resolution when the test executable is run
from elsewhere. Changing it to $ORIGIN makes it resolve relative
to the directory in which the executable resides, which is supposedly
the desired behaviour.
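The difference between the two RPATH forms can be shown with a small model of the lookup. This is a simplified sketch, not the dynamic linker's actual algorithm: `resolve_rpath_entry` is a hypothetical helper, and the working directory in the example is made up for illustration.

```python
import posixpath


def resolve_rpath_entry(entry: str, exe_path: str, cwd: str) -> str:
    """Return the directory a single RPATH/RUNPATH entry refers to.

    $ORIGIN expands to the directory containing the executable; any other
    relative entry is interpreted relative to the current working directory.
    """
    origin = posixpath.dirname(exe_path)
    expanded = entry.replace("${ORIGIN}", origin).replace("$ORIGIN", origin)
    return posixpath.normpath(posixpath.join(cwd, expanded))


exe = "/usr/libexec/kselftests/bpf/urandom_read"
# Run from another directory: "." follows the caller, $ORIGIN follows the binary.
print(resolve_rpath_entry(".", exe, cwd="/home/user"))
print(resolve_rpath_entry("$ORIGIN/", exe, cwd="/home/user"))
```

The two forms only agree when the test is started from the directory that holds the executable, which is why the problem stays hidden during in-tree runs.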
Discovered by the check-rpaths script[1][2] that checks for insecure
RPATH/RUNPATH[3], such as relative directories, during an attempt
to package BPF selftests for later use in CI:
ERROR 0004: file '/usr/libexec/kselftests/bpf/urandom_read' contains an insecure runpath '.' in [.]
[1] https://github.com/rpm-software-management/rpm/blob/master/scripts/check-rp…
[2] https://github.com/rpm-software-management/rpm/blob/master/scripts/check-rp…
[3] https://cwe.mitre.org/data/definitions/426.html
Signed-off-by: Eugene Syromiatnikov <esyr(a)redhat.com>
---
tools/testing/selftests/bpf/Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index dd49c1d23a60..6a3dc9b99159 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -241,7 +241,7 @@ $(OUTPUT)/urandom_read: urandom_read.c urandom_read_aux.c $(OUTPUT)/liburandom_r
$(filter-out -static,$(CFLAGS) $(LDFLAGS)) $(filter %.c,$^) \
-lurandom_read $(filter-out -static,$(LDLIBS)) -L$(OUTPUT) \
-fuse-ld=$(LLD) -Wl,-znoseparate-code -Wl,--build-id=sha1 \
- -Wl,-rpath=. -o $@
+ -Wl,-rpath=\$$ORIGIN/ -o $@
$(OUTPUT)/sign-file: ../../../../scripts/sign-file.c
$(call msg,SIGN-FILE,,$@)
--
2.28.0
The relative RPATH ("./") supplied to linker options in CFLAGS is resolved
relative to current working directory and not the executable directory,
which will lead to incorrect resolution when the test executables are run
from elsewhere. However, the sole sched test (cs_prctl_test)
does not require any locally-built libraries to run, so the RPATH
directive can be removed.
Discovered by the /usr/lib/rpm/check-rpaths script[1][2] that checks
for insecure RPATH/RUNPATH[3], such as containing relative directories,
during an attempt to package BPF selftests for later use in CI:
ERROR 0004: file '/usr/libexec/kselftests/bpf/urandom_read' contains an insecure runpath '.' in [.]
[1] https://github.com/rpm-software-management/rpm/blob/master/scripts/check-rp…
[2] https://github.com/rpm-software-management/rpm/blob/master/scripts/check-rp…
[3] https://cwe.mitre.org/data/definitions/426.html
Signed-off-by: Eugene Syromiatnikov <esyr(a)redhat.com>
---
tools/testing/selftests/sched/Makefile | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/tools/testing/selftests/sched/Makefile b/tools/testing/selftests/sched/Makefile
index 099ee9213557..0e4581ded9d6 100644
--- a/tools/testing/selftests/sched/Makefile
+++ b/tools/testing/selftests/sched/Makefile
@@ -4,8 +4,7 @@ ifneq ($(shell $(CC) --version 2>&1 | head -n 1 | grep clang),)
CLANG_FLAGS += -no-integrated-as
endif
-CFLAGS += -O2 -Wall -g -I./ $(KHDR_INCLUDES) -Wl,-rpath=./ \
- $(CLANG_FLAGS)
+CFLAGS += -O2 -Wall -g -I./ $(KHDR_INCLUDES) $(CLANG_FLAGS)
LDLIBS += -lpthread
TEST_GEN_FILES := cs_prctl_test
--
2.28.0
The relative RPATH ("./") supplied to linker options in CFLAGS is resolved
relative to the current working directory and not the executable directory,
which will lead to incorrect resolution when the test executables are run
from elsewhere. Changing it to $ORIGIN makes it resolve relative
to the directory in which the executables reside, which is supposedly
the desired behaviour.
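As a rough illustration of the difference (a simplified model only, not
the dynamic linker's actual code), the two RPATH forms can be compared
like this. The helper name resolve_rpath_entry and the example paths are
hypothetical:

```python
import posixpath


def resolve_rpath_entry(entry, exe_path, cwd):
    """Model where a single RPATH entry ends up pointing.

    Simplified: "$ORIGIN" expands to the directory holding the
    executable, and any remaining relative path is taken relative to
    the current working directory of the process.
    """
    origin = posixpath.dirname(exe_path)
    expanded = entry.replace("$ORIGIN", origin)
    # join() keeps `expanded` unchanged when it is already absolute.
    return posixpath.normpath(posixpath.join(cwd, expanded))


# Hypothetical install location and working directory.
exe = "/usr/libexec/kselftests/rseq/param_test"
print(resolve_rpath_entry("./", exe, cwd="/tmp"))
print(resolve_rpath_entry("$ORIGIN/", exe, cwd="/tmp"))
```

With "./" the search directory follows wherever the test happens to be
started from, while "$ORIGIN/" stays next to the executable.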
Discovered by the /usr/lib/rpm/check-rpaths script[1][2] that checks
for insecure RPATH/RUNPATH[3], such as containing relative directories,
during an attempt to package BPF selftests for later use in CI:
ERROR 0004: file '/usr/libexec/kselftests/bpf/urandom_read' contains an insecure runpath '.' in [.]
[1] https://github.com/rpm-software-management/rpm/blob/master/scripts/check-rp…
[2] https://github.com/rpm-software-management/rpm/blob/master/scripts/check-rp…
[3] https://cwe.mitre.org/data/definitions/426.html
Signed-off-by: Eugene Syromiatnikov <esyr(a)redhat.com>
---
tools/testing/selftests/rseq/Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/rseq/Makefile b/tools/testing/selftests/rseq/Makefile
index 5a3432fceb58..27544a67d6f0 100644
--- a/tools/testing/selftests/rseq/Makefile
+++ b/tools/testing/selftests/rseq/Makefile
@@ -6,7 +6,7 @@ endif
top_srcdir = ../../../..
-CFLAGS += -O2 -Wall -g -I./ $(KHDR_INCLUDES) -L$(OUTPUT) -Wl,-rpath=./ \
+CFLAGS += -O2 -Wall -g -I./ $(KHDR_INCLUDES) -L$(OUTPUT) -Wl,-rpath=\$$ORIGIN/ \
$(CLANG_FLAGS) -I$(top_srcdir)/tools/include
LDLIBS += -lpthread -ldl
--
2.28.0
From: Geliang Tang <tanggeliang(a)kylinos.cn>
So many "Address not found" messages occur at the end of forwarding tests
when using "ip address del" command for an invalid address:
TEST: FDB limits interacting with FDB type local [ OK ]
Error: ipv4: Address not found.
... ...
TEST: IGMPv3 S,G port entry automatic add to a *,G port [ OK ]
Error: ipv4: Address not found.
Error: ipv6: address not found.
... ...
TEST: Isolated port flooding [ OK ]
Error: ipv4: Address not found.
Error: ipv6: address not found.
... ...
TEST: Externally learned FDB entry - ageing & roaming [ OK ]
Error: ipv4: Address not found.
Error: ipv6: address not found.
This patch ignores these messages and redirects them to /dev/null in
__addr_add_del().
Signed-off-by: Geliang Tang <tanggeliang(a)kylinos.cn>
---
tools/testing/selftests/net/forwarding/lib.sh | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/forwarding/lib.sh b/tools/testing/selftests/net/forwarding/lib.sh
index ff96bb7535ff..8670b6053cde 100644
--- a/tools/testing/selftests/net/forwarding/lib.sh
+++ b/tools/testing/selftests/net/forwarding/lib.sh
@@ -839,7 +839,7 @@ __addr_add_del()
array=("${@}")
for addrstr in "${array[@]}"; do
- ip address $add_del $addrstr dev $if_name
+ ip address $add_del $addrstr dev $if_name &> /dev/null
done
}
--
2.43.0
This small series includes fixes for creation of veth pairs for
networkless kernels & adds tests for turning the different network
interface features on and off in selftests/net/netdevice.sh script.
Changes in v4:
Move veth creation/removal to the main shell script.
Tested using vng on a networkless kernel and the script works, sample
output below the changes.
Changes in v3:
https://lore.kernel.org/all/20240614113240.41550-1-jain.abhinav177@gmail.co…
Add a check for netdev, create veth pair for testing.
Restore feature to its initial state.
Changes in v2:
https://lore.kernel.org/all/20240609132124.51683-1-jain.abhinav177@gmail.co…
Remove tail usage; use read to parse the features from temp file.
v1:
https://lore.kernel.org/all/20240606212714.27472-1-jain.abhinav177@gmail.co…
```
# selftests: net: netdevice.sh
# No valid network device found, creating veth pair
# PASS: veth0: set interface up
# PASS: veth0: set MAC address
# SKIP: veth0: set IP address
# PASS: veth0: ethtool list features
# PASS: veth0: Turned off feature: rx-checksumming
# PASS: veth0: Turned on feature: rx-checksumming
# PASS: veth0: Restore feature rx-checksumming to initial state on
# Actual changes:
# tx-checksum-ip-generic: off
# tx-tcp-segmentation: off [not requested]
....
....
....
# PASS: veth1: Restore feature tx-nocache-copy to initial state off
# PASS: veth1: Turned off feature: tx-vlan-stag-hw-insert
# PASS: veth1: Turned on feature: tx-vlan-stag-hw-insert
# PASS: veth1: Restore feature tx-vlan-stag-hw-insert to initial state on
# PASS: veth1: Turned off feature: rx-vlan-stag-hw-parse
# PASS: veth1: Turned on feature: rx-vlan-stag-hw-parse
# PASS: veth1: Restore feature rx-vlan-stag-hw-parse to initial state on
# PASS: veth1: Turned off feature: rx-gro-list
# PASS: veth1: Turned on feature: rx-gro-list
# PASS: veth1: Restore feature rx-gro-list to initial state off
# PASS: veth1: Turned off feature: rx-udp-gro-forwarding
# PASS: veth1: Turned on feature: rx-udp-gro-forwarding
# PASS: veth1: Restore feature rx-udp-gro-forwarding to initial state off
# Cannot get register dump: Operation not supported
# SKIP: veth1: ethtool dump not supported
# PASS: veth1: ethtool stats
# PASS: veth1: stop interface
# Removed veth pair
ok 12 selftests: net: netdevice.sh
```
Abhinav Jain (2):
selftests: net: Create veth pair for testing in networkless kernel
selftests: net: Add on/off checks for non-fixed features of interface
tools/testing/selftests/net/netdevice.sh | 55 +++++++++++++++++++++++-
1 file changed, 54 insertions(+), 1 deletion(-)
--
2.34.1
Add a new kselftest to detect and report slowdowns in key boot events. The
test uses ftrace to track timings for specific boot events and compares
these timestamps against reference values provided in YAML format.
The test includes the following files:
- `bootconfig` file: configures ftrace and lists reference key boot
events.
- `config` fragment: enables boot time tracing and attaches the
bootconfig file to the kernel image.
- `kprobe_timestamps_to_yaml.py` script: parses the current trace file to
extract event names and timestamps and writes them to a YAML file. The
script is intended to be run once to generate initial reference values;
the generated file is not meant to be stored in the kernel sources but
should be provided as input to the test itself. YAML format was chosen
to allow easy integration with per-platform data used in other tests,
such as the discoverable devices probe test in
tools/testing/selftests/devices. Another option is to use JSON, as the
file is not intended for manual editing and JSON is already supported
by the Python standard library.
- `test_boot_time.py` script: parses the current trace file and compares
timestamps against the values in the YAML file provided as input.
Reports a failure if any timestamp differs from the reference value by
more than the specified delta.
- `trace_utils.py` file: utility functions to mount debugfs and parse the
trace file to extract relevant information.
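The comparison step described for `test_boot_time.py` could look roughly
like the sketch below. This is not the code from the patch: the function
name compare_boot_events is hypothetical, plain dicts stand in for the
parsed YAML reference file and trace file, and the timestamps are made up.

```python
def compare_boot_events(reference, current, delta):
    """Return KTAP-style lines comparing current timestamps to references.

    `reference` and `current` map event names to timestamps in seconds.
    An event fails if it is missing from `current` or if it differs
    from its reference value by more than `delta`.
    """
    lines = ["TAP version 13", f"1..{len(reference)}"]
    for number, name in enumerate(sorted(reference), start=1):
        if name not in current:
            lines.append(f"# '{name}' not found in the trace.")
            lines.append(f"not ok {number} {name}")
            continue
        diff = abs(current[name] - reference[name])
        if diff > delta:
            lines.append(f"# '{name}' differs by {diff:.6f} seconds.")
            lines.append(f"not ok {number} {name}")
        else:
            lines.append(f"ok {number} {name}")
    return lines


# Made-up timestamps, for illustration only.
reference = {"populate_rootfs_begin": 0.50, "run_init_process_begin": 2.00}
current = {"populate_rootfs_begin": 0.505, "run_init_process_begin": 2.05}
print("\n".join(compare_boot_events(reference, current, delta=0.01)))
```

Keeping the comparison separate from trace parsing would make it easy to
swap YAML for JSON later, since only the loading step would change.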
The bootconfig file provided is an initial draft with some reference kprobe
events to showcase how the test works. I would appreciate feedback from
those interested in running this test on which boot events should be added.
Different key events might be relevant depending on the platform and its
boot time requirements. This file should serve as a common ground and be
populated with critical events and functions common to different platforms.
Feedback on the overall approach of this test and suggestions for
additional boot events to trace would be greatly appreciated.
Example output with a deliberately small delta of 0.01 to demonstrate failures:
TAP version 13
1..4
ok 1 populate_rootfs_begin
# 'run_init_process_begin' differs by 0.033990 seconds.
not ok 2 run_init_process_begin
# 'run_init_process_end' differs by 0.033796 seconds.
not ok 3 run_init_process_end
ok 4 unpack_to_rootfs_begin
# Totals: pass:2 fail:2 xfail:0 xpass:0 skip:0 error:0
This patch depends on "kselftest: Move ksft helper module to common
directory":
https://lore.kernel.org/all/20240705-dev-err-log-selftest-v2-2-163b9cd7b3c1…
which was picked through the usb tree and is queued for 6.11-rc1.
Best,
Laura
Laura Nao (1):
kselftests: Add test to detect boot event slowdowns
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/boot-time/Makefile | 17 ++++
tools/testing/selftests/boot-time/bootconfig | 8 ++
tools/testing/selftests/boot-time/config | 4 +
.../boot-time/kprobe_timestamps_to_yaml.py | 55 +++++++++++
.../selftests/boot-time/test_boot_time.py | 94 +++++++++++++++++++
.../selftests/boot-time/trace_utils.py | 63 +++++++++++++
7 files changed, 242 insertions(+)
create mode 100644 tools/testing/selftests/boot-time/Makefile
create mode 100644 tools/testing/selftests/boot-time/bootconfig
create mode 100644 tools/testing/selftests/boot-time/config
create mode 100755 tools/testing/selftests/boot-time/kprobe_timestamps_to_yaml.py
create mode 100755 tools/testing/selftests/boot-time/test_boot_time.py
create mode 100644 tools/testing/selftests/boot-time/trace_utils.py
--
2.30.2
There are multiple possible timer sources which could be useful for
the sound stream synchronization: hrtimers, hardware clocks (e.g. PTP),
timer wheels (jiffies). Currently, using one of them to synchronize
the audio stream of snd-aloop module would require writing a
kernel-space driver which exports an ALSA timer through the
snd_timer interface.
However, it is not really convenient for application developers, who may
want to define their custom timer sources for audio synchronization.
For instance, we could have a network application which receives frames
and sends them to snd-aloop pcm device, and another application
listening on the other end of snd-aloop. It makes sense to transfer a
new period of data only when a certain amount of frames is received
through the network, but definitely not when a certain amount of jiffies
on a local system elapses. Since all of the devices are purely virtual
it won't introduce any glitches and will help the application developers
to avoid using sample-rate conversion.
This patch series introduces userspace-driven ALSA timers: virtual
timers which are created and controlled from userspace. The timer can
be created from the userspace using the new ioctl SNDRV_TIMER_IOCTL_CREATE.
After creating a timer, it becomes available for use system-wide, so it
can be passed to snd-aloop as a timer source (timer_source parameter
would be "-1.SNDRV_TIMER_GLOBAL_UDRIVEN.{timer_id}"). When the userspace
app decides to trigger a timer, it calls another ioctl
SNDRV_TIMER_IOCTL_TRIGGER on the file descriptor of a timer. It
initiates a transfer of a new period of data.
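To make the network example more concrete, here is a small userspace-side
sketch of the idea: count received frames and fire the timer once per
full period. It only models the logic. The class name PeriodTrigger is
hypothetical, and the `trigger` callback is a stand-in for issuing
SNDRV_TIMER_IOCTL_TRIGGER on the timer file descriptor; no real ioctl is
performed here.

```python
class PeriodTrigger:
    """Call `trigger` once for every full period of received frames."""

    def __init__(self, period_frames, trigger):
        self.period_frames = period_frames
        self.trigger = trigger  # stand-in for the trigger ioctl
        self.pending = 0        # frames received but not yet accounted for

    def on_frames(self, count):
        """Account for `count` newly received frames."""
        self.pending += count
        while self.pending >= self.period_frames:
            self.pending -= self.period_frames
            self.trigger()


fired = []
timer = PeriodTrigger(period_frames=480, trigger=lambda: fired.append(1))
for chunk in (300, 300, 1000):  # frame counts as they arrive from the network
    timer.on_frames(chunk)
print(len(fired), timer.pending)
```

The point is that the period boundary is defined by the amount of data
received and not by elapsed local time.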
Userspace-driven timers are associated with file descriptors. If the
application wishes to destroy the timer, it can simply release the file
descriptor of a virtual timer.
I believe introducing new ioctl calls is quite inconvenient (as we have
a limited amount of them), but other possible ways of app <-> kernel
communication (like virtual FS) seem completely inappropriate for this
task (but I'd love to discuss alternative solutions).
This patch series also updates the snd-aloop module so the global timers
can be used as a timer_source for it (it allows using userspace-driven
timers as timer source).
V1 -> V2:
- Fix some problems found by Christophe Jaillet
<christophe.jaillet(a)wanadoo.fr>
V2 -> V3:
- Add improvements suggested by Takashi Iwai <tiwai(a)suse.de>
Please, find the patch-specific changelog in the following patches.
Ivan Orlov (4):
ALSA: aloop: Allow using global timers
Docs/sound: Add documentation for userspace-driven ALSA timers
ALSA: timer: Introduce virtual userspace-driven timers
selftests: ALSA: Cover userspace-driven timers with test
Documentation/sound/index.rst | 1 +
Documentation/sound/utimers.rst | 120 +++++++++++
include/uapi/sound/asound.h | 20 +-
sound/core/Kconfig | 10 +
sound/core/timer.c | 221 ++++++++++++++++++++
sound/drivers/aloop.c | 2 +
tools/testing/selftests/alsa/Makefile | 2 +-
tools/testing/selftests/alsa/global-timer.c | 87 ++++++++
tools/testing/selftests/alsa/utimer-test.c | 170 +++++++++++++++
9 files changed, 631 insertions(+), 2 deletions(-)
create mode 100644 Documentation/sound/utimers.rst
create mode 100644 tools/testing/selftests/alsa/global-timer.c
create mode 100644 tools/testing/selftests/alsa/utimer-test.c
--
2.34.1
Hi Kees and All,
There are several tests in the kselftest subsystem which load modules to test
the internals of the kernel. Most of these test modules are just loaded by
the kselftest; their status isn't read and reported to the user logs. Hence
they don't provide the benefit of executing those tests.
I've found patches from Kees where he has been converting such kselftests
to kunit tests [1]. The probable motivation is to move tests out of the
kselftest subsystem, which only triggers them without correctly reporting
the results. On the other hand, kunit is there to test the kernel's
internal functions which can't be done by userspace.
Kselftest: Test user facing APIs from userspace
Kunit: Test kernel's internal functions from kernelspace
This brings me to the conclusion that kselftests which load modules to
test kernelspace should be converted to kunit tests. I've noted several
such kselftests.
This is just my understanding. Please mention if I'm correct above or if
there are more reasons to support converting kselftest test modules into
kunit tests.
[1] https://lore.kernel.org/all/20221018082824.never.845-kees@kernel.org/
--
BR,
Muhammad Usama Anjum
The kernel has recently added support for shadow stacks, currently
x86 only using their CET feature but both arm64 and RISC-V have
equivalent features (GCS and Zicfiss respectively), I am actively
working on GCS[1]. With shadow stacks the hardware maintains an
additional stack containing only the return addresses for branch
instructions which is not generally writeable by userspace and ensures
that any returns are to the recorded addresses. This provides some
protection against ROP attacks and makes it easier to collect call
stacks. These shadow stacks are allocated in the address space of the
userspace process.
Our API for shadow stacks does not currently offer userspace any
flexibility for managing the allocation of shadow stacks for newly
created threads, instead the kernel allocates a new shadow stack with
the same size as the normal stack whenever a thread is created with the
feature enabled. The stacks allocated in this way are freed by the
kernel when the thread exits or shadow stacks are disabled for the
thread. This lack of flexibility and control isn't ideal, in the vast
majority of cases the shadow stack will be over allocated and the
implicit allocation and deallocation is not consistent with other
interfaces. As far as I can tell the interface is done in this manner
mainly because the shadow stack patches were in development since before
clone3() was implemented.
Since clone3() is readily extensible let's add support for specifying a
shadow stack when creating a new thread or process in a similar manner
to how the normal stack is specified, keeping the current implicit
allocation behaviour if one is not specified either with clone3() or
through the use of clone(). The user must provide a shadow stack
address and size, this must point to memory mapped for use as a shadow
stack by map_shadow_stack() with a shadow stack token at the top of the
stack.
Please note that the x86 portions of this code are build tested only, I
don't appear to have a system that can run CET available to me, I have
done testing with an integration into my pending work for GCS. There is
some possibility that the arm64 implementation may require the use of
clone3() and explicit userspace allocation of shadow stacks, this is
still under discussion.
Please further note that the token consumption done by clone3() is not
currently implemented in an atomic fashion, Rick indicated that he would
look into fixing this if people are OK with the implementation.
A new architecture feature Kconfig option for shadow stacks is added
here, this was suggested as part of the review comments for the arm64
GCS series and since we need to detect if shadow stacks are supported it
seemed sensible to roll it in here.
[1] https://lore.kernel.org/r/20231009-arm64-gcs-v6-0-78e55deaa4dd@kernel.org/
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v7:
- Rebase onto v6.11-rc1.
- Typo fixes.
- Link to v6: https://lore.kernel.org/r/20240623-clone3-shadow-stack-v6-0-9ee7783b1fb9@ke…
Changes in v6:
- Rebase onto v6.10-rc3.
- Ensure we don't try to free the parent shadow stack in error paths of
x86 arch code.
- Spelling fixes in userspace API document.
- Additional cleanups and improvements to the clone3() tests to support
the shadow stack tests.
- Link to v5: https://lore.kernel.org/r/20240203-clone3-shadow-stack-v5-0-322c69598e4b@ke…
Changes in v5:
- Rebase onto v6.8-rc2.
- Rework ABI to have the user allocate the shadow stack memory with
map_shadow_stack() and a token.
- Force inlining of the x86 shadow stack enablement.
- Move shadow stack enablement out into a shared header for reuse by
other tests.
- Link to v4: https://lore.kernel.org/r/20231128-clone3-shadow-stack-v4-0-8b28ffe4f676@ke…
Changes in v4:
- Formatting changes.
- Use a define for minimum shadow stack size and move some basic
validation to fork.c.
- Link to v3: https://lore.kernel.org/r/20231120-clone3-shadow-stack-v3-0-a7b8ed3e2acc@ke…
Changes in v3:
- Rebase onto v6.7-rc2.
- Remove stale shadow_stack in internal kargs.
- If a shadow stack is specified unconditionally use it regardless of
CLONE_ parameters.
- Force enable shadow stacks in the selftest.
- Update changelogs for RISC-V feature rename.
- Link to v2: https://lore.kernel.org/r/20231114-clone3-shadow-stack-v2-0-b613f8681155@ke…
Changes in v2:
- Rebase onto v6.7-rc1.
- Remove ability to provide preallocated shadow stack, just specify the
desired size.
- Link to v1: https://lore.kernel.org/r/20231023-clone3-shadow-stack-v1-0-d867d0b5d4d0@ke…
---
Mark Brown (9):
Documentation: userspace-api: Add shadow stack API documentation
selftests: Provide helper header for shadow stack testing
mm: Introduce ARCH_HAS_USER_SHADOW_STACK
fork: Add shadow stack support to clone3()
selftests/clone3: Remove redundant flushes of output streams
selftests/clone3: Factor more of main loop into test_clone3()
selftests/clone3: Explicitly handle child exits due to signals
selftests/clone3: Allow tests to flag if -E2BIG is a valid error code
selftests/clone3: Test shadow stack support
Documentation/userspace-api/index.rst | 1 +
Documentation/userspace-api/shadow_stack.rst | 41 ++++
arch/x86/Kconfig | 1 +
arch/x86/include/asm/shstk.h | 11 +-
arch/x86/kernel/process.c | 2 +-
arch/x86/kernel/shstk.c | 104 +++++++---
fs/proc/task_mmu.c | 2 +-
include/linux/mm.h | 2 +-
include/linux/sched/task.h | 13 ++
include/uapi/linux/sched.h | 13 +-
kernel/fork.c | 76 ++++++--
mm/Kconfig | 6 +
tools/testing/selftests/clone3/clone3.c | 224 ++++++++++++++++++----
tools/testing/selftests/clone3/clone3_selftests.h | 40 +++-
tools/testing/selftests/ksft_shstk.h | 63 ++++++
15 files changed, 511 insertions(+), 88 deletions(-)
---
base-commit: 8400291e289ee6b2bf9779ff1c83a291501f017b
change-id: 20231019-clone3-shadow-stack-15d40d2bf536
Best regards,
--
Mark Brown <broonie(a)kernel.org>
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
A regression happened where running the ownership test passes on the first
iteration but fails running it a second time. This was caught and fixed,
but a later change brought it back. The regression was missed because the
automated tests only run the tests once per boot.
Change the ownership test to iterate through the tests twice, as this will
catch the regression with a single run.
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
.../ftrace/test.d/00basic/test_ownership.tc | 34 +++++++++++--------
1 file changed, 20 insertions(+), 14 deletions(-)
diff --git a/tools/testing/selftests/ftrace/test.d/00basic/test_ownership.tc b/tools/testing/selftests/ftrace/test.d/00basic/test_ownership.tc
index c45094d1e1d2..71e43a92352a 100644
--- a/tools/testing/selftests/ftrace/test.d/00basic/test_ownership.tc
+++ b/tools/testing/selftests/ftrace/test.d/00basic/test_ownership.tc
@@ -83,32 +83,38 @@ run_tests() {
done
}
-mount -o remount,"$new_options" .
+# Run the tests twice as leftovers can cause issues
+for loop in 1 2 ; do
-run_tests
+ echo "Running iteration $loop"
-mount -o remount,"$mount_options" .
+ mount -o remount,"$new_options" .
-for d in "." "events" "events/sched" "events/sched/sched_switch" "events/sched/sched_switch/enable" $canary; do
- test "$d" $original_group
-done
+ run_tests
+
+ mount -o remount,"$mount_options" .
+
+ for d in "." "events" "events/sched" "events/sched/sched_switch" "events/sched/sched_switch/enable" $canary; do
+ test "$d" $original_group
+ done
# check instances as well
-chgrp $other_group instances
+ chgrp $other_group instances
-instance="$(mktemp -u test-XXXXXX)"
+ instance="$(mktemp -u test-XXXXXX)"
-mkdir instances/$instance
+ mkdir instances/$instance
-cd instances/$instance
+ cd instances/$instance
-run_tests
+ run_tests
-cd ../..
+ cd ../..
-rmdir instances/$instance
+ rmdir instances/$instance
-chgrp $original_group instances
+ chgrp $original_group instances
+done
exit 0
--
2.43.0
This small series fixes is_madv_discard() and adds a small sanity check
test to selftests/mm/mseal_test. Without this patch, is_madv_discard()
erroneously thinks innocent ops like MADV_RANDOM are discard operations
(which they are not, and are supposed to be allowed, per the overall
design).
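One way such a misclassification can arise (an assumption here, since
this cover letter does not show the old code) is treating the MADV_*
values as bit flags when they are plain enumerated values. The sketch
below models that in Python; the constant values are those of the
generic Linux UAPI, the set of "discard" operations is only an
illustrative subset, and both function names are hypothetical.

```python
# Advice values from the generic Linux UAPI (mman-common.h).
MADV_RANDOM = 1
MADV_DONTNEED = 4
MADV_FREE = 8
MADV_REMOVE = 9

# Illustrative subset of operations that discard memory contents.
DISCARD_OPS = (MADV_DONTNEED, MADV_FREE, MADV_REMOVE)


def is_discard_bitmask(behavior):
    """Buggy model: treats enumerated advice values as bit flags."""
    mask = 0
    for op in DISCARD_OPS:
        mask |= op
    return bool(behavior & mask)


def is_discard_exact(behavior):
    """Fixed model: compares against each discard operation exactly."""
    return behavior in DISCARD_OPS


print(is_discard_bitmask(MADV_RANDOM), is_discard_exact(MADV_RANDOM))
```

Since MADV_REMOVE is 9, the combined mask has bit 0 set, so MADV_RANDOM
(1) matches under the bitmask model even though it discards nothing.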
Based on Linus's tree and taken from my mseal depessimization series[1].
[1]: https://lore.kernel.org/all/20240806212808.1885309-1-pedro.falcato@gmail.co…
Pedro Falcato (2):
mseal: Fix is_madv_discard()
selftests/mm: Add mseal test for no-discard madvise
mm/mseal.c | 14 +++++++---
tools/testing/selftests/mm/mseal_test.c | 34 +++++++++++++++++++++++++
2 files changed, 45 insertions(+), 3 deletions(-)
--
2.46.0
The dma-iommu needs to find the correct domain for MSI mapping. With an
IOMMU_DOMAIN_NESTED, the mapping resides in its parent paging domain.
Add a get_msi_mapping_domain op for drivers to return paging domains.
Add iommufd selftest coverage for that, by doing a loopback test.
Add arm_smmu_get_msi_mapping_domain in the SMMUv3 driver so its nesting
feature could work with MSI correctly.
This is based on top of the reserved-IOVA change:
https://lore.kernel.org/all/20240802053458.2754673-1-nicolinc@nvidia.com/
And Jason's SMMUv3 nesting series:
https://lore.kernel.org/all/0-v1-54e734311a7f+14f72-smmuv3_nesting_jgg@nvid…
This series is on Github:
https://github.com/nicolinc/iommufd/commits/iommufd_nesting_sw_msi/
[changelog]
v3:
* Refined PATCH-2 commit message
* Added domain->ops check in PATCH-2
* Added PATCH-4 to implement in SMMUv3 driver
v2:
https://lore.kernel.org/all/cover.1722644866.git.nicolinc@nvidia.com/
* Resent with a proper bug fix.
Thanks
Nicolin
Nicolin Chen (3):
iommufd: Reorder include files
iommufd/selftest: Add coverage for IOMMU_RESV_SW_MSI
iommu/arm-smmu-v3: Implement arm_smmu_get_msi_mapping_domain
Robin Murphy (1):
iommu/dma: Support MSIs through nested domains
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 10 ++
drivers/iommu/dma-iommu.c | 18 +++-
drivers/iommu/iommufd/device.c | 4 +-
drivers/iommu/iommufd/fault.c | 4 +-
drivers/iommu/iommufd/io_pagetable.c | 8 +-
drivers/iommu/iommufd/io_pagetable.h | 2 +-
drivers/iommu/iommufd/ioas.c | 2 +-
drivers/iommu/iommufd/iommufd_private.h | 9 +-
drivers/iommu/iommufd/iommufd_test.h | 6 +-
drivers/iommu/iommufd/iova_bitmap.c | 2 +-
drivers/iommu/iommufd/main.c | 8 +-
drivers/iommu/iommufd/pages.c | 10 +-
drivers/iommu/iommufd/selftest.c | 92 ++++++++++++++++++-
include/linux/iommu.h | 4 +
include/linux/iommufd.h | 4 +-
include/uapi/linux/iommufd.h | 2 +-
tools/testing/selftests/iommu/iommufd_utils.h | 9 ++
17 files changed, 160 insertions(+), 34 deletions(-)
--
2.43.0
This patch series adds a selftest suite to validate the s390x
architecture specific ucontrol KVM interface.
When creating a VM on s390x it is possible to create it as a userspace
controlled VM, or ucontrol VM for short.
These VMs delegate the management of the VM to userspace instead
of handling most events within the kernel. Consequently the userspace
has to manage interrupts, memory allocation etc.
Before this patch set this functionality lacked any public test cases.
It is desirable to add test cases for this interface to be able to
reduce the risk of breaking changes in the future.
In order to provision a ucontrol VM the kernel needs to be compiled with
CONFIG_KVM_S390_UCONTROL enabled. Users with the sys_admin capability
can then create a new ucontrol VM by providing the KVM_VM_S390_UCONTROL
parameter to the KVM_CREATE_VM ioctl.
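For illustration, the creation step could be sketched from userspace as
below. This is not taken from the test suite (which is written in C);
the helper name create_ucontrol_vm is hypothetical, and the ioctl number
assumes the generic ioctl encoding, which s390 uses. The call is expected
to fail with OSError on kernels without ucontrol support or without the
sys_admin capability.

```python
import fcntl
import os

# Values from the Linux KVM UAPI header (linux/kvm.h).
KVMIO = 0xAE
KVM_CREATE_VM = (KVMIO << 8) | 0x01  # _IO(KVMIO, 0x01), generic encoding
KVM_VM_S390_UCONTROL = 1             # machine type flag for KVM_CREATE_VM


def create_ucontrol_vm(kvm_path="/dev/kvm"):
    """Return a file descriptor for a new ucontrol VM.

    Raises OSError if /dev/kvm cannot be opened or the kernel rejects
    the request.
    """
    kvm_fd = os.open(kvm_path, os.O_RDWR | os.O_CLOEXEC)
    try:
        # The machine type is passed as the ioctl argument.
        return fcntl.ioctl(kvm_fd, KVM_CREATE_VM, KVM_VM_S390_UCONTROL)
    finally:
        # The VM descriptor stays valid after /dev/kvm is closed.
        os.close(kvm_fd)
```

A caller would typically wrap create_ucontrol_vm() in a try/except
OSError block and skip the test when creation is refused.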
The kernel's existing selftest helper functions can only partially be
reused for these tests.
The test cases cover existing special handling of ucontrol VMs within the
implementation and basic VM creation and handling cases:
* Reject setting HPAGE when VM is ucontrol
* Assert KVM_GET_DIRTY_LOG is rejected
* Assert KVM_S390_VM_MEM_LIMIT_SIZE is rejected
* Assert state of initial SIE flags setup by the kernel
* Run simple program in VM with and without DAT
* Assert KVM_EXIT_S390_UCONTROL exit on not mapped memory access
* Assert functionality of storage keys in ucontrol VM
Running the test cases requires sys_admin capabilities to start the
ucontrol VM.
This can be achieved by running as root or with a command like:
sudo setpriv --reuid nobody --inh-caps -all,+sys_admin \
--ambient-caps -all,+sys_admin --bounding-set -all,+sys_admin \
./ucontrol_test
The patch set also contains some code cleanup / consolidation of
architecture specific defines that are now used in multiple test cases.
---
v4:
- PATCH 5: Remove not yet used include for debug print functions
- PATCH 6: Add include for debug print functions (removed from patch 5)
Remove no longer needed code since stopped but is reset
before starting since v3 (thanks Janosch)
Adjust test output to use leading zeros instead of spaces in sieic
- PATCH 7: Rename constant to PGM_SEGMENT_TRANSLATION (thanks Janosch)
Put comments on their own lines
v3:
- Remove stopped bit before starting the VM (no initial stop in multiple
test cases) (thanks Janosch)
- PATCH 2: Clarified SIE control block vs SIE instruction (thanks
Janosch)
- PATCH 3: Make use of CAP_TO_MASK(CAP_SYS_ADMIN) instead of custom
define (thanks Janosch)
Removed Reviewed-By: Claudio
- PATCH 4: Remove erroneous 1MB offset from self->base_hva (thanks
Janosch)
- PATCH 6-8: Change name of test program _pgm to _asm to prevent confusion
- PATCH 10: Move KVM_S390_UCONTROL default option to actual debug config
(thanks Christian)
v2:
- add ucontrol to s390 debug config (new patch)
- PATCH 2: changed atomic_t to __u32 (thanks Claudio)
- PATCH 4: reformatted comment in FIXTURE_SETUP(uc_kvm)
- PATCH 5: refactored to display 8 byte blocks + more internal reuse
(thanks Claudio)
- PATCH 7: make use of more declarative defines instead of magic values
- PATCH 8: make use of more declarative defines instead of magic values
(thanks Claudio)
- PATCH 9: add reference to fix verified by the test case
Christoph Schlameuss (10):
selftests: kvm: s390: Define page sizes in shared header
selftests: kvm: s390: Add kvm_s390_sie_block definition for userspace
tests
selftests: kvm: s390: Add s390x ucontrol test suite with hpage test
selftests: kvm: s390: Add test fixture and simple VM setup tests
selftests: kvm: s390: Add debug print functions
selftests: kvm: s390: Add VM run test case
selftests: kvm: s390: Add uc_map_unmap VM test case
selftests: kvm: s390: Add uc_skey VM test case
selftests: kvm: s390: Verify reject memory region operations for
ucontrol VMs
s390: Enable KVM_S390_UCONTROL config in debug_defconfig
arch/s390/configs/debug_defconfig | 1 +
tools/testing/selftests/kvm/.gitignore | 1 +
tools/testing/selftests/kvm/Makefile | 1 +
.../selftests/kvm/include/s390x/debug_print.h | 69 ++
.../selftests/kvm/include/s390x/processor.h | 5 +
.../testing/selftests/kvm/include/s390x/sie.h | 240 +++++++
.../selftests/kvm/lib/s390x/processor.c | 10 +-
tools/testing/selftests/kvm/s390x/cmma_test.c | 7 +-
tools/testing/selftests/kvm/s390x/config | 2 +
.../testing/selftests/kvm/s390x/debug_test.c | 4 +-
tools/testing/selftests/kvm/s390x/memop.c | 4 +-
tools/testing/selftests/kvm/s390x/tprot.c | 5 +-
.../selftests/kvm/s390x/ucontrol_test.c | 596 ++++++++++++++++++
13 files changed, 929 insertions(+), 16 deletions(-)
create mode 100644 tools/testing/selftests/kvm/include/s390x/debug_print.h
create mode 100644 tools/testing/selftests/kvm/include/s390x/sie.h
create mode 100644 tools/testing/selftests/kvm/s390x/config
create mode 100644 tools/testing/selftests/kvm/s390x/ucontrol_test.c
base-commit: c0ecd6388360d930440cc5554026818895199923
--
2.45.2
First 4 patches are more-or-less cleanups/preparations.
Patch 5 was sent to me/contributed off-list by Mohammad, who wants 32-bit
kernels to run TCP-AO.
Patch 6 is a workaround/fix for slow VMs. Although I can't reproduce
the issue, I hope it will fix netdev flakes for connect-deny-*
tests.
And the biggest change is adding TCP-AO tracepoints to selftests.
I think it's a good addition for the following reasons:
- The related tracepoints are now tested;
- It allows tcp-ao selftests to raise expectations on the kernel
behavior - up from the syscalls exit statuses + net counters.
- Provides tracepoints usage samples.
As tracepoints are not a stable ABI, any kernel changes done to them
will be reflected in the selftests, which also will allow users
to see how to change their code. It's much better than parsing dmesg
(what BGP was doing pre-tracepoints, ugh).
Somewhat arguably, the code parses trace_pipe, rather than using
libtraceevent (which any sane user should do). The reason behind that is
the same as for rt-netlink macros instead of libmnl: I'm trying
to minimize the library dependencies of the selftests. And the
performance of formatting text in kernel and parsing it again in a test
is not critical.
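As a rough idea of what matching on the text output involves (the
selftests do this in C; this is only a sketch), the snippet below counts
event names in trace-style lines and checks them against expected
counts. The function names, the regular expression, the summary format
and the sample lines are all hypothetical, and real trace lines carry
more fields.

```python
import re
from collections import Counter

# Matches "<timestamp>: <event_name>:" inside a text trace line.
EVENT_RE = re.compile(r"\s\d+\.\d+:\s+(\w+):")


def count_events(trace_lines):
    """Count occurrences of each event name in text trace lines."""
    counts = Counter()
    for line in trace_lines:
        match = EVENT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def check_expectations(counts, expected):
    """Return (ok, summary) for observed versus expected event counts.

    Fails on a missing expected event, a wrong count, or an event that
    was not expected at all.
    """
    ok = set(counts) <= set(expected) and all(
        counts[name] == number for name, number in expected.items())
    summary = " ".join(f"{name}[{counts[name]}]" for name in expected)
    return ok, f"{sum(counts.values())} {summary}"


# Made-up trace lines, for illustration only.
sample = [
    "  <idle>-0  [001] ..s1.  101.000001: tcp_hash_md5_required: sport=1",
    "  <idle>-0  [001] ..s1.  101.000002: tcp_ao_key_not_found: sport=1",
    "  <idle>-0  [001] ..s1.  101.000003: tcp_hash_md5_required: sport=2",
]
print(check_expectations(count_events(sample),
                         {"tcp_hash_md5_required": 2,
                          "tcp_ao_key_not_found": 1}))
```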
Current output sample:
> ok 73 Trace events matched expectations: 13 tcp_hash_md5_required[2] tcp_hash_md5_unexpected[4] tcp_hash_ao_required[3] tcp_ao_key_not_found[4]
Previously, tracepoints selftests were part of the kernel tcp tracepoints
submission [1], but since then the code has changed considerably:
- Now generic tracing setup is in lib/ftrace.c, separate from
lib/ftrace-tcp.c which utilizes TCP trace points. This separation
allows future selftests to trace non-TCP events, e.g. to find out
an skb's drop reason, which was useful in the creation of TCP-CLOSE
stress-test (not in this patch set, but used in an attempt to reproduce
the issue from [2]).
- Another change is that in the previous submission the trace events
were used only to detect unexpected TCP-AO/TCP-MD5 events. In this
version the selftests will fail if an expected trace event didn't
appear.
Let's see how reliable this is on the netdev bot - it obviously passes
on my testing, but potentially may require a temporary XFAIL patch
if it misbehaves on a slow VM.
[1] https://lore.kernel.org/lkml/20240224-tcp-ao-tracepoints-v1-0-15f31b7f30a7@…
[2] https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=3…
Signed-off-by: Dmitry Safonov <0x7f454c46(a)gmail.com>
---
Changes in v2:
- Fixed two issues with parsing TCP-AO events: the socket state and TCP
segment flags. Hopefully, won't fail on netdev.
- Reword patch 1 & 2 messages to be more informative and to some degree
formal (Paolo)
- Since commit e33a02ed6a4f ("selftests: Add printf attribute to
kselftest prints") it's possible to use __printf instead of "raw" gcc
attribute - switch using that, as checkpatch suggests.
- Link to v1: https://lore.kernel.org/r/20240730-tcp-ao-selftests-upd-6-12-v1-0-ffd4bf15d…
---
Dmitry Safonov (6):
selftests/net: Clean-up double assignment
selftests/net: Provide test_snprintf() helper
selftests/net: Be consistent in kconfig checks
selftests/net: Don't forget to close nsfd after switch_save_ns()
selftests/net: Synchronize client/server before counters checks
selftests/net: Add trace events matching to tcp_ao
Mohammad Nassiri (1):
selftests/tcp_ao: Fix printing format for uint64_t
tools/testing/selftests/net/tcp_ao/Makefile | 3 +-
tools/testing/selftests/net/tcp_ao/bench-lookups.c | 2 +-
tools/testing/selftests/net/tcp_ao/config | 1 +
tools/testing/selftests/net/tcp_ao/connect-deny.c | 25 +-
tools/testing/selftests/net/tcp_ao/connect.c | 6 +-
tools/testing/selftests/net/tcp_ao/icmps-discard.c | 2 +-
.../testing/selftests/net/tcp_ao/key-management.c | 18 +-
tools/testing/selftests/net/tcp_ao/lib/aolib.h | 176 ++++++-
.../testing/selftests/net/tcp_ao/lib/ftrace-tcp.c | 549 +++++++++++++++++++++
tools/testing/selftests/net/tcp_ao/lib/ftrace.c | 466 +++++++++++++++++
tools/testing/selftests/net/tcp_ao/lib/kconfig.c | 31 +-
tools/testing/selftests/net/tcp_ao/lib/setup.c | 15 +-
tools/testing/selftests/net/tcp_ao/lib/sock.c | 1 -
tools/testing/selftests/net/tcp_ao/lib/utils.c | 26 +
tools/testing/selftests/net/tcp_ao/restore.c | 30 +-
tools/testing/selftests/net/tcp_ao/rst.c | 2 +-
tools/testing/selftests/net/tcp_ao/self-connect.c | 19 +-
tools/testing/selftests/net/tcp_ao/seq-ext.c | 28 +-
.../selftests/net/tcp_ao/setsockopt-closed.c | 6 +-
tools/testing/selftests/net/tcp_ao/unsigned-md5.c | 35 +-
20 files changed, 1375 insertions(+), 66 deletions(-)
---
base-commit: 3361a6eae59664ffae640ff7a838f5bd89c24461
change-id: 20240730-tcp-ao-selftests-upd-6-12-4d3e53a74f3f
Best regards,
--
Dmitry Safonov <0x7f454c46(a)gmail.com>
This revision only updates the tests from the previous revision[1], and
integrates an Acked-by[2] and a Reviewed-By[3] into the first commit
message.
Documentation/admin-guide/cgroup-v2.rst | 22 ++-
include/linux/cgroup-defs.h | 5 +
include/linux/cgroup.h | 3 +
include/linux/memcontrol.h | 5 +
include/linux/page_counter.h | 11 +-
kernel/cgroup/cgroup-internal.h | 2 +
kernel/cgroup/cgroup.c | 7 +
mm/memcontrol.c | 116 +++++++++++++--
mm/page_counter.c | 30 +++-
tools/testing/selftests/cgroup/cgroup_util.c | 22 +++
tools/testing/selftests/cgroup/cgroup_util.h | 2 +
tools/testing/selftests/cgroup/test_memcontrol.c | 264 ++++++++++++++++++++++++++++++++-
12 files changed, 454 insertions(+), 35 deletions(-)
[1]: https://lore.kernel.org/cgroups/20240729143743.34236-1-davidf@vimeo.com/T/
[2]: https://lore.kernel.org/cgroups/20240729143743.34236-1-davidf@vimeo.com/T/#…
[3]: https://lore.kernel.org/cgroups/20240729143743.34236-1-davidf@vimeo.com/T/#…
Thank you all for the support and reviews so far!
David Finkel
Senior Principal Software Engineer
Vimeo Inc.
Hello,
this series brings a new set of tests converted to the test_progs framework.
Since the tests are quite small, I chose to group three test conversions in
the same series, but feel free to let me know if I should keep one series
per test. The series focuses on cgroup testing and converts the following
tests:
- get_cgroup_id_user
- cgroup_storage
- test_skb_cgroup_id_user
Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore(a)bootlin.com>
---
Alexis Lothoré (eBPF Foundation) (4):
selftests/bpf: convert get_current_cgroup_id_user to test_progs
selftests/bpf: convert test_cgroup_storage to test_progs
selftests/bpf: add proper section name to bpf prog and rename it
selftests/bpf: convert test_skb_cgroup_id_user to test_progs
tools/testing/selftests/bpf/.gitignore | 3 -
tools/testing/selftests/bpf/Makefile | 8 +-
tools/testing/selftests/bpf/get_cgroup_id_user.c | 151 -----------------
.../selftests/bpf/prog_tests/cgroup_ancestor.c | 159 ++++++++++++++++++
.../bpf/prog_tests/cgroup_get_current_cgroup_id.c | 58 +++++++
.../selftests/bpf/prog_tests/cgroup_storage.c | 65 ++++++++
...test_skb_cgroup_id_kern.c => cgroup_ancestor.c} | 2 +-
tools/testing/selftests/bpf/progs/cgroup_storage.c | 24 +++
tools/testing/selftests/bpf/test_cgroup_storage.c | 174 --------------------
tools/testing/selftests/bpf/test_skb_cgroup_id.sh | 63 -------
.../selftests/bpf/test_skb_cgroup_id_user.c | 183 ---------------------
11 files changed, 309 insertions(+), 581 deletions(-)
---
base-commit: 0e2eaf4b33f65e904b69bae6b956f3f610dbba9a
change-id: 20240725-convert_cgroup_tests-d07c66053225
Best regards,
--
Alexis Lothoré, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com
The system register definitions in the arm64 get-reg-list are all done
with directly specified magic numbers rather than using the definitions
we import from the main kernel. This is error prone, and requires us to
audit the additions to get-reg-list separately to what we do when
specifying the registers for the main kernel. Since Marc has indicated
that this isn't a deliberate or desired choice let's start using the
constants we have defined.
We first manually update the data used to filter registers based on ID
register fields to use a simplified macro that specifies the register
and ID field in a much more compact fashion. This is done first since
there is an error in the ID register field for the S1PIE registers. We
then replace all the remaining named system register specifications with
use of the existing KVM_ARM64_SYS_REG() macro.
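To illustrate the difference between hand-typed encodings and named constants, here is a standalone sketch. All SKETCH_* names are local stand-ins, and the shift values are believed to mirror the arm64 sysreg encoding and the KVM register ID layout; they are assumptions for this example, not copies of the kernel headers.

```c
#include <stdint.h>

/* Assumed layout of a 64-bit KVM system register ID on arm64. */
#define SKETCH_KVM_REG_ARM64     0x6000000000000000ULL
#define SKETCH_KVM_REG_SIZE_U64  0x0030000000000000ULL
#define SKETCH_KVM_REG_SYSREG    (0x0013ULL << 16)

/* Pack the five encoding fields the way a SYS_* style constant would. */
#define SKETCH_SYS_REG(op0, op1, crn, crm, op2) \
	(((uint32_t)(op0) << 19) | ((uint32_t)(op1) << 16) | \
	 ((uint32_t)(crn) << 12) | ((uint32_t)(crm) << 8) | \
	 ((uint32_t)(op2) << 5))

/* Named once, in one place, instead of repeating the digits per user. */
#define SKETCH_SYS_MIDR_EL1   SKETCH_SYS_REG(3, 0, 0, 0, 0)
#define SKETCH_SYS_TTBR0_EL1  SKETCH_SYS_REG(3, 0, 2, 0, 0)

/* Convert a packed encoding into a KVM register ID. */
static inline uint64_t sketch_kvm_sys_reg(uint32_t packed)
{
	uint64_t op0 = (packed >> 19) & 0x3;
	uint64_t op1 = (packed >> 16) & 0x7;
	uint64_t crn = (packed >> 12) & 0xf;
	uint64_t crm = (packed >> 8) & 0xf;
	uint64_t op2 = (packed >> 5) & 0x7;

	return SKETCH_KVM_REG_ARM64 | SKETCH_KVM_REG_SIZE_U64 |
	       SKETCH_KVM_REG_SYSREG |
	       (op0 << 14) | (op1 << 11) | (crn << 7) | (crm << 3) | op2;
}
```

With this shape a test list can say sketch_kvm_sys_reg(SKETCH_SYS_MIDR_EL1) rather than spelling out (3, 0, 0, 0, 0) by hand, so a mistyped field can only occur in the single shared definition.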
This is just a first step; there is more work we could be doing
here, the main thing being making use of the encodings in
arch/arm64/tools/sysreg to convert more of the registers (including
updating as more registers are converted to use the generator).
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v2:
- Add use of designated initialisers when converting filtering macros.
- Manual handling of CNTV_CTL_EL0 and CNTV_CVAL_EL0.
- Commit message tweaks.
- Link to v1: https://lore.kernel.org/r/20240802-kvm-arm64-get-reg-list-v1-0-3a5bf8f80765…
---
Mark Brown (3):
KVM: selftests: arm64: Simplify specification of filtered registers
KVM: selftests: arm64: Use symbolic definitions for incorrect encodings
KVM: selftests: arm64: Use generated defines for named system registers
tools/testing/selftests/kvm/aarch64/get-reg-list.c | 244 ++++++++++-----------
1 file changed, 122 insertions(+), 122 deletions(-)
---
base-commit: 8400291e289ee6b2bf9779ff1c83a291501f017b
change-id: 20240802-kvm-arm64-get-reg-list-a86a37460bdd
Best regards,
--
Mark Brown <broonie(a)kernel.org>
The system register definitions in the arm64 get-reg-list are all done
with directly specified magic numbers rather than using the definitions
we import from the main kernel. This is error prone, and requires us to
audit the additions to get-reg-list separately to what we do when
specifying the registers for the main kernel. Since Marc has indicated
that this isn't a deliberate or desired choice let's start using the
constants we have defined.
We first manually update the data used to filter registers based on ID
register fields to use a simplified macro that specifies the register
and ID field in a much more compact fashion. This is done first since
there is an error in the ID register field for the S1PIE registers. We
then replace all the remaining named system register specifications with
use of the existing KVM_ARM64_SYS_REG() macro.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Mark Brown (2):
KVM: selftests: arm64: Simplify specification of filtered registers
KVM: selftests: arm64: Use generated defines for named system registers
tools/testing/selftests/kvm/aarch64/get-reg-list.c | 237 ++++++++++-----------
1 file changed, 115 insertions(+), 122 deletions(-)
---
base-commit: 8400291e289ee6b2bf9779ff1c83a291501f017b
change-id: 20240802-kvm-arm64-get-reg-list-a86a37460bdd
Best regards,
--
Mark Brown <broonie(a)kernel.org>
In commit 7d3c33b290b1 ("kunit: Device wrappers should also manage driver name"),
kunit_kstrdup_const() and kunit_kfree_const() were introduced as an
optimisation of kunit_kstrdup(): they skip the copy/free for strings that
live in the kernel's rodata.
However, these are inline functions, and is_kernel_rodata() only works
for built-in code. This causes problems in two cases:
- If kunit is built as a module, __{start,end}_rodata is not defined.
- If a kunit test using these functions is built as a module, it will
suffer the same fate.
Restrict the is_kernel_rodata() case to when KUnit is built-in,
which fixes the first case, at the cost of losing the optimisation when
KUnit is a module.
Also, make kunit_{kstrdup,kfree}_const non-inline, so that other modules
using them will not accidentally depend on is_kernel_rodata(). If KUnit
is built-in, they'll benefit from the optimisation; if it is not,
they won't, but the string will still be properly duplicated.
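The behaviour can be modelled in user space as below. This is only an analogy: sketch_rodata stands in for the kernel's .rodata section, SKETCH_HAVE_RODATA_CHECK stands in for the built-in versus module distinction, and all names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the kernel's .rodata section: one known read-only pool. */
static const char sketch_rodata[] = "kunit-driver";

/* Set to 0 to model a build where the rodata bounds are unavailable. */
#ifndef SKETCH_HAVE_RODATA_CHECK
#define SKETCH_HAVE_RODATA_CHECK 1
#endif

static bool sketch_is_rodata(const void *p)
{
	uintptr_t addr = (uintptr_t)p;
	uintptr_t start = (uintptr_t)sketch_rodata;

	return addr >= start && addr < start + sizeof(sketch_rodata);
}

/* Share constant strings, duplicate everything else. */
static const char *sketch_strdup_const(const char *s)
{
	char *copy;

#if SKETCH_HAVE_RODATA_CHECK
	if (sketch_is_rodata(s))
		return s;
#endif
	copy = malloc(strlen(s) + 1);
	if (copy)
		strcpy(copy, s);
	return copy;
}

/* Must mirror the decision made in sketch_strdup_const(). */
static void sketch_free_const(const char *s)
{
#if SKETCH_HAVE_RODATA_CHECK
	if (sketch_is_rodata(s))
		return;
#endif
	free((void *)s);
}
```

With the check compiled out, every string is copied and freed, which is slower but always correct; that is the trade-off the commit message describes for modular builds.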
(And fix a couple of typos in the doc comment, too.)
Reported-by: Nico Pache <npache(a)redhat.com>
Closes: https://lore.kernel.org/all/CAA1CXcDKht4vOL-acxrARbm6JhGna8_k8wjYJ-vHONink8…
Fixes: 7d3c33b290b1 ("kunit: Device wrappers should also manage driver name")
Signed-off-by: David Gow <davidgow(a)google.com>
---
include/kunit/test.h | 16 +++-------------
lib/kunit/test.c | 19 +++++++++++++++++++
2 files changed, 22 insertions(+), 13 deletions(-)
diff --git a/include/kunit/test.h b/include/kunit/test.h
index da9e84de14c0..5ac237c949a0 100644
--- a/include/kunit/test.h
+++ b/include/kunit/test.h
@@ -489,11 +489,7 @@ static inline void *kunit_kcalloc(struct kunit *test, size_t n, size_t size, gfp
* Calls kunit_kfree() only if @x is not in .rodata section.
* See kunit_kstrdup_const() for more information.
*/
-static inline void kunit_kfree_const(struct kunit *test, const void *x)
-{
- if (!is_kernel_rodata((unsigned long)x))
- kunit_kfree(test, x);
-}
+void kunit_kfree_const(struct kunit *test, const void *x);
/**
* kunit_kstrdup() - Duplicates a string into a test managed allocation.
@@ -527,16 +523,10 @@ static inline char *kunit_kstrdup(struct kunit *test, const char *str, gfp_t gfp
* @gfp: flags passed to underlying kmalloc().
*
* Calls kunit_kstrdup() only if @str is not in the rodata section. Must be freed with
- * kunit_free_const() -- not kunit_free().
+ * kunit_kfree_const() -- not kunit_kfree().
* See kstrdup_const() and kunit_kmalloc_array() for more information.
*/
-static inline const char *kunit_kstrdup_const(struct kunit *test, const char *str, gfp_t gfp)
-{
- if (is_kernel_rodata((unsigned long)str))
- return str;
-
- return kunit_kstrdup(test, str, gfp);
-}
+const char *kunit_kstrdup_const(struct kunit *test, const char *str, gfp_t gfp);
/**
* kunit_vm_mmap() - Allocate KUnit-tracked vm_mmap() area
diff --git a/lib/kunit/test.c b/lib/kunit/test.c
index e8b1b52a19ab..089c832e3cdb 100644
--- a/lib/kunit/test.c
+++ b/lib/kunit/test.c
@@ -874,6 +874,25 @@ void kunit_kfree(struct kunit *test, const void *ptr)
}
EXPORT_SYMBOL_GPL(kunit_kfree);
+void kunit_kfree_const(struct kunit *test, const void *x)
+{
+#if !IS_MODULE(CONFIG_KUNIT)
+ if (!is_kernel_rodata((unsigned long)x))
+#endif
+ kunit_kfree(test, x);
+}
+EXPORT_SYMBOL_GPL(kunit_kfree_const);
+
+const char *kunit_kstrdup_const(struct kunit *test, const char *str, gfp_t gfp)
+{
+#if !IS_MODULE(CONFIG_KUNIT)
+ if (is_kernel_rodata((unsigned long)str))
+ return str;
+#endif
+ return kunit_kstrdup(test, str, gfp);
+}
+EXPORT_SYMBOL_GPL(kunit_kstrdup_const);
+
void kunit_cleanup(struct kunit *test)
{
struct kunit_resource *res;
--
2.46.0.rc2.264.g509ed76dc8-goog
kunit_driver_create() accepts a name for the driver, but does not copy
it, so if that name is on the stack, or is otherwise freed, we end
up with a use-after-free when the driver is cleaned up.
Instead, strdup() the name, and manage it as another KUnit allocation.
As there was no existing kunit_kstrdup(), we add one. Further, add
kunit_ variants of kstrdup_const() and kfree_const(), so we don't need to
allocate and manage the string in the majority of cases where it's a
constant.
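The lifetime problem can be shown with a small user-space sketch. struct sketch_driver and its helpers are hypothetical and use plain malloc() rather than KUnit-managed allocations; the point is only that the object owns a copy of the name instead of borrowing the caller's pointer.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a driver object that owns its name. */
struct sketch_driver {
	char *name;
};

/* Copy @name so the caller's buffer may go away afterwards. */
static struct sketch_driver *sketch_driver_create(const char *name)
{
	struct sketch_driver *drv = malloc(sizeof(*drv));
	size_t len;

	if (!drv)
		return NULL;

	len = strlen(name) + 1;
	drv->name = malloc(len);
	if (!drv->name) {
		free(drv);
		return NULL;
	}
	memcpy(drv->name, name, len);
	return drv;
}

/* Release the copied name together with the driver. */
static void sketch_driver_destroy(struct sketch_driver *drv)
{
	if (!drv)
		return;
	free(drv->name);
	free(drv);
}
```

Had sketch_driver_create() stored the pointer directly, overwriting or freeing the caller's buffer would leave drv->name dangling.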
This fixes a KASAN splat with overflow.overflow_allocation_test, when
built as a module.
Fixes: d03c720e03bd ("kunit: Add APIs for managing devices")
Reported-by: Nico Pache <npache(a)redhat.com>
Closes: https://groups.google.com/g/kunit-dev/c/81V9b9QYON0
Signed-off-by: David Gow <davidgow(a)google.com>
Reviewed-by: Kees Cook <kees(a)kernel.org>
---
There are some more serious changes since the RFC I sent, so please take a
closer look.
Thanks,
-- David
Changes since RFC:
https://groups.google.com/g/kunit-dev/c/81V9b9QYON0/m/PFKNKDKAAAAJ
- Add and use the kunit_kstrdup_const() and kunit_kfree_const()
functions.
- Fix a typo in the doc comments.
---
include/kunit/test.h | 58 ++++++++++++++++++++++++++++++++++++++++++++
lib/kunit/device.c | 7 ++++--
2 files changed, 63 insertions(+), 2 deletions(-)
diff --git a/include/kunit/test.h b/include/kunit/test.h
index e2a1f0928e8b..da9e84de14c0 100644
--- a/include/kunit/test.h
+++ b/include/kunit/test.h
@@ -28,6 +28,7 @@
#include <linux/types.h>
#include <asm/rwonce.h>
+#include <asm/sections.h>
/* Static key: true if any KUnit tests are currently running */
DECLARE_STATIC_KEY_FALSE(kunit_running);
@@ -480,6 +481,63 @@ static inline void *kunit_kcalloc(struct kunit *test, size_t n, size_t size, gfp
return kunit_kmalloc_array(test, n, size, gfp | __GFP_ZERO);
}
+
+/**
+ * kunit_kfree_const() - conditionally free test managed memory
+ * @x: pointer to the memory
+ *
+ * Calls kunit_kfree() only if @x is not in .rodata section.
+ * See kunit_kstrdup_const() for more information.
+ */
+static inline void kunit_kfree_const(struct kunit *test, const void *x)
+{
+ if (!is_kernel_rodata((unsigned long)x))
+ kunit_kfree(test, x);
+}
+
+/**
+ * kunit_kstrdup() - Duplicates a string into a test managed allocation.
+ *
+ * @test: The test context object.
+ * @str: The NULL-terminated string to duplicate.
+ * @gfp: flags passed to underlying kmalloc().
+ *
+ * See kstrdup() and kunit_kmalloc_array() for more information.
+ */
+static inline char *kunit_kstrdup(struct kunit *test, const char *str, gfp_t gfp)
+{
+ size_t len;
+ char *buf;
+
+ if (!str)
+ return NULL;
+
+ len = strlen(str) + 1;
+ buf = kunit_kmalloc(test, len, gfp);
+ if (buf)
+ memcpy(buf, str, len);
+ return buf;
+}
+
+/**
+ * kunit_kstrdup_const() - Conditionally duplicates a string into a test managed allocation.
+ *
+ * @test: The test context object.
+ * @str: The NULL-terminated string to duplicate.
+ * @gfp: flags passed to underlying kmalloc().
+ *
+ * Calls kunit_kstrdup() only if @str is not in the rodata section. Must be freed with
+ * kunit_free_const() -- not kunit_free().
+ * See kstrdup_const() and kunit_kmalloc_array() for more information.
+ */
+static inline const char *kunit_kstrdup_const(struct kunit *test, const char *str, gfp_t gfp)
+{
+ if (is_kernel_rodata((unsigned long)str))
+ return str;
+
+ return kunit_kstrdup(test, str, gfp);
+}
+
/**
* kunit_vm_mmap() - Allocate KUnit-tracked vm_mmap() area
* @test: The test context object.
diff --git a/lib/kunit/device.c b/lib/kunit/device.c
index 25c81ed465fb..520c1fccee8a 100644
--- a/lib/kunit/device.c
+++ b/lib/kunit/device.c
@@ -89,7 +89,7 @@ struct device_driver *kunit_driver_create(struct kunit *test, const char *name)
if (!driver)
return ERR_PTR(err);
- driver->name = name;
+ driver->name = kunit_kstrdup_const(test, name, GFP_KERNEL);
driver->bus = &kunit_bus_type;
driver->owner = THIS_MODULE;
@@ -192,8 +192,11 @@ void kunit_device_unregister(struct kunit *test, struct device *dev)
const struct device_driver *driver = to_kunit_device(dev)->driver;
kunit_release_action(test, device_unregister_wrapper, dev);
- if (driver)
+ if (driver) {
+ const char *driver_name = driver->name;
kunit_release_action(test, driver_unregister_wrapper, (void *)driver);
+ kunit_kfree_const(test, driver_name);
+ }
}
EXPORT_SYMBOL_GPL(kunit_device_unregister);
--
2.46.0.rc1.232.g9752f9e123-goog
In arm64 pKVM and QuIC's Gunyah protected VM model, we want to support
grabbing shmem user pages instead of using KVM's guest_memfd. These
hypervisors provide a different isolation model than the CoCo
implementations from x86. KVM's guest_memfd is focused on providing
memory that is more isolated than AVF requires. Some specific examples
include the ability to pre-load data onto guest-private pages, dynamically
sharing/isolating guest pages without a copy, and (in the future) migrating
guest-private pages. Given those differences, and after a discussion in
[1] and at PUCK, we want to try to stick with existing shmem and extend
GUP to support the isolation needs of arm64 pKVM and Gunyah. To that
end, we introduce the concept of "exclusive GUP pinning", which enforces
that only one pin of any kind is allowed when the FOLL_EXCLUSIVE
flag is set. This behavior doesn't affect FOLL_GET or any other folio
refcount operations that don't go through the FOLL_PIN path.
[1]: https://lore.kernel.org/all/20240319143119.GA2736@willie-the-truck/
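The rule stated above can be modelled with a toy pin counter. This is a sketch under stated assumptions, not the mm/gup.c implementation: struct sketch_page and both helpers are hypothetical, and the model ignores re-pinning an already pinned page as exclusive, which the series handles separately.

```c
#include <stdbool.h>

/* Toy model of the pin state of one page. */
struct sketch_page {
	unsigned int pincount;	/* number of outstanding pins */
	bool exclusive;		/* true while an exclusive pin is held */
};

/* Try to take a pin; @want_exclusive models passing FOLL_EXCLUSIVE. */
static bool sketch_try_pin(struct sketch_page *page, bool want_exclusive)
{
	/* An exclusive holder blocks every other pin. */
	if (page->exclusive)
		return false;

	/* An exclusive pin is only granted when nobody else holds a pin. */
	if (want_exclusive && page->pincount > 0)
		return false;

	page->pincount++;
	page->exclusive = want_exclusive;
	return true;
}

/* Drop one pin; the last unpin clears the exclusive state. */
static void sketch_unpin(struct sketch_page *page)
{
	if (page->pincount == 0)
		return;

	page->pincount--;
	if (page->pincount == 0)
		page->exclusive = false;
}
```

Ordinary pins can still stack on each other in this model; only the combination with an exclusive pin is refused.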
Tree with patches at:
https://git.codelinaro.org/clo/linux-kernel/gunyah-linux/-/tree/sent/exclus…
anup(a)brainfault.org, paul.walmsley(a)sifive.com,
palmer(a)dabbelt.com, aou(a)eecs.berkeley.edu, seanjc(a)google.com,
viro(a)zeniv.linux.org.uk, brauner(a)kernel.org,
willy(a)infradead.org, akpm(a)linux-foundation.org,
xiaoyao.li(a)intel.com, yilun.xu(a)intel.com,
chao.p.peng(a)linux.intel.com, jarkko(a)kernel.org,
amoorthy(a)google.com, dmatlack(a)google.com,
yu.c.zhang(a)linux.intel.com, isaku.yamahata(a)intel.com,
mic(a)digikod.net, vbabka(a)suse.cz, vannapurve(a)google.com,
ackerleytng(a)google.com, mail(a)maciej.szmigiero.name,
david(a)redhat.com, michael.roth(a)amd.com, wei.w.wang(a)intel.com,
liam.merwick(a)oracle.com, isaku.yamahata(a)gmail.com,
kirill.shutemov(a)linux.intel.com, suzuki.poulose(a)arm.com,
steven.price(a)arm.com, quic_eberman(a)quicinc.com,
quic_mnalajal(a)quicinc.com, quic_tsoni(a)quicinc.com,
quic_svaddagi(a)quicinc.com, quic_cvanscha(a)quicinc.com,
quic_pderrin(a)quicinc.com, quic_pheragu(a)quicinc.com,
catalin.marinas(a)arm.com, james.morse(a)arm.com,
yuzenghui(a)huawei.com, oliver.upton(a)linux.dev, maz(a)kernel.org,
will(a)kernel.org, qperret(a)google.com, keirf(a)google.com,
tabba(a)google.com
Signed-off-by: Elliot Berman <quic_eberman(a)quicinc.com>
---
Elliot Berman (2):
mm/gup-test: Verify exclusive pinned
mm/gup_test: Verify GUP grabs same pages twice
Fuad Tabba (3):
mm/gup: Move GUP_PIN_COUNTING_BIAS to page_ref.h
mm/gup: Add an option for obtaining an exclusive pin
mm/gup: Add support for re-pinning a normal pinned page as exclusive
include/linux/mm.h | 57 ++++----
include/linux/mm_types.h | 2 +
include/linux/page_ref.h | 74 ++++++++++
mm/Kconfig | 5 +
mm/gup.c | 265 ++++++++++++++++++++++++++++++----
mm/gup_test.c | 108 ++++++++++++++
mm/gup_test.h | 1 +
tools/testing/selftests/mm/gup_test.c | 5 +-
8 files changed, 457 insertions(+), 60 deletions(-)
---
base-commit: 6ba59ff4227927d3a8530fc2973b80e94b54d58f
change-id: 20240509-exclusive-gup-66259138bbff
Best regards,
--
Elliot Berman <quic_eberman(a)quicinc.com>
Hi Linus,
Please pull the kselftest fixes update for Linux 6.11-rc3.
This kselftest fixes update consists of a single fix to the conditional
in ksft.py script which incorrectly flags a test suite failed when there
are skipped tests in the mix. The logic is fixed to take skipped tests
into account and report the test as passed.
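The fixed conditional can be paraphrased in C as follows. The real change is in the Python script ksft.py; this sketch only restates the logic described above, and the function and macro names are hypothetical.

```c
/* Exit codes in the style of kselftest (0 = pass, 1 = fail). */
#define SKETCH_KSFT_PASS 0
#define SKETCH_KSFT_FAIL 1

/*
 * A suite passes when every planned test either passed or was skipped.
 * Counting only the passed tests would flag a suite with skips as failed.
 */
static int sketch_finished_exit_code(unsigned int planned,
				     unsigned int passed,
				     unsigned int skipped)
{
	if (passed + skipped == planned)
		return SKETCH_KSFT_PASS;

	return SKETCH_KSFT_FAIL;
}
```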
diff is attached.
thanks,
-- Shuah
----------------------------------------------------------------
The following changes since commit 8400291e289ee6b2bf9779ff1c83a291501f017b:
Linux 6.11-rc1 (2024-07-28 14:19:55 -0700)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest tags/linux_kselftest-fixes-6.11-rc3
for you to fetch changes up to 170c966cbe274e664288cfc12ee919d5e706dc50:
selftests: ksft: Fix finished() helper exit code on skipped tests (2024-07-31 11:38:56 -0600)
----------------------------------------------------------------
linux_kselftest-fixes-6.11-rc3
This kselftest fixes update consists of a single fix to the conditional
in ksft.py script which incorrectly flags a test suite failed when there
are skipped tests in the mix. The logic is fixed to take skipped tests
into account and report the test as passed.
----------------------------------------------------------------
Laura Nao (1):
selftests: ksft: Fix finished() helper exit code on skipped tests
tools/testing/selftests/kselftest/ksft.py | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------
The first 2 patches in this series fix cpuset bugs found by Chen Ridong.
Patch 3 streamlines the sched domain rebuild process for hotplug
operation and eliminates the use of intermediate cpuset states for
sched domain generation. Patch 4 modifies generate_sched_domains()
to check the correctness of partition roots with non-overlapping CPUs.
Patch 5 adds new test cases to cover the bugs fixed in patches 1 and 2.
Chen Ridong (1):
cgroup/cpuset: fix panic caused by partcmd_update
Waiman Long (4):
cgroup/cpuset: Clear effective_xcpus on cpus_allowed clearing only if
cpus.exclusive not set
cgroup/cpuset: Eliminate unncessary sched domains rebuilds in hotplug
cgroup/cpuset: Check for partition roots with overlapping CPUs
selftest/cgroup: Add new test cases to test_cpuset_prs.sh
kernel/cgroup/cpuset.c | 70 ++++++++++---------
.../selftests/cgroup/test_cpuset_prs.sh | 12 +++-
2 files changed, 49 insertions(+), 33 deletions(-)
--
2.43.5
The current support for LLVM and clang in nolibc and its testsuite is
very limited.
* Various architectures simply do not compile
* The user *has* to specify "-Os" otherwise the program crashes
* Cross-compilation of the tests does not work
* Using clang is not wired up in run-tests.sh
This series extends this support.
Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
---
Thomas Weißschuh (12):
tools/nolibc: use clang-compatible asm syntax in arch-arm.h
tools/nolibc: limit powerpc stack-protector workaround to GCC
tools/nolibc: move entrypoint specifics to compiler.h
tools/nolibc: use attribute((naked)) if available
selftests/nolibc: report failure if no testcase passed
selftests/nolibc: avoid passing NULL to printf("%s")
selftests/nolibc: determine $(srctree) first
selftests/nolibc: setup objtree without Makefile.include
selftests/nolibc: add support for LLVM= parameter
selftests/nolibc: add cc-option compatible with clang cross builds
selftests/nolibc: run-tests.sh: avoid overwriting CFLAGS_EXTRA
selftests/nolibc: run-tests.sh: allow building through LLVM
tools/include/nolibc/arch-aarch64.h | 4 ++--
tools/include/nolibc/arch-arm.h | 8 ++++----
tools/include/nolibc/arch-i386.h | 4 ++--
tools/include/nolibc/arch-loongarch.h | 4 ++--
tools/include/nolibc/arch-mips.h | 4 ++--
tools/include/nolibc/arch-powerpc.h | 6 +++---
tools/include/nolibc/arch-riscv.h | 4 ++--
tools/include/nolibc/arch-s390.h | 4 ++--
tools/include/nolibc/arch-x86_64.h | 4 ++--
tools/include/nolibc/compiler.h | 12 ++++++++++++
tools/testing/selftests/nolibc/Makefile | 27 ++++++++++++++++-----------
tools/testing/selftests/nolibc/nolibc-test.c | 4 ++--
tools/testing/selftests/nolibc/run-tests.sh | 20 ++++++++++++++++----
13 files changed, 67 insertions(+), 38 deletions(-)
---
base-commit: 0db287736bc586fcd5a2925518ef09eec6924803
change-id: 20240727-nolibc-llvm-3fad68590d4c
Best regards,
--
Thomas Weißschuh <linux(a)weissschuh.net>
Don't print that 88 sub-tests are going to be executed and then skip
them all. This violates TAP. Instead, check the prerequisites
before printing the total number of tests.
Old, non-TAP-compliant output:
TAP version 13
1..88
ok 2 # SKIP all tests require euid == 0
# Planned tests != run tests (88 != 1)
# Totals: pass:0 fail:0 xfail:0 xpass:0 skip:1 error:0
New and correct output:
TAP version 13
1..0 # SKIP all tests require euid == 0
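The ordering rule can be sketched as a helper that formats the TAP header into a buffer. This is an illustration only: sketch_tap_header() is a hypothetical function, not part of the kselftest harness, which prints directly instead.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Decide on the plan line only after the prerequisites are known:
 * either announce a skip-all plan, or the real number of tests.
 * Returns the snprintf() result.
 */
static int sketch_tap_header(char *buf, size_t len, bool prereq_ok,
			     unsigned int num_tests, const char *skip_reason)
{
	if (!prereq_ok)
		return snprintf(buf, len, "TAP version 13\n1..0 # SKIP %s\n",
				skip_reason);

	return snprintf(buf, len, "TAP version 13\n1..%u\n", num_tests);
}
```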
Signed-off-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
---
Changes since v1:
- Remove simplifying if condition lines
- Update the patch message
---
tools/testing/selftests/openat2/resolve_test.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/openat2/resolve_test.c b/tools/testing/selftests/openat2/resolve_test.c
index bbafad440893c..85a4c64ee950d 100644
--- a/tools/testing/selftests/openat2/resolve_test.c
+++ b/tools/testing/selftests/openat2/resolve_test.c
@@ -508,12 +508,13 @@ void test_openat2_opath_tests(void)
int main(int argc, char **argv)
{
ksft_print_header();
- ksft_set_plan(NUM_TESTS);
/* NOTE: We should be checking for CAP_SYS_ADMIN here... */
if (geteuid() != 0)
ksft_exit_skip("all tests require euid == 0\n");
+ ksft_set_plan(NUM_TESTS);
+
test_openat2_opath_tests();
if (ksft_get_fail_cnt() + ksft_get_error_cnt() > 0)
--
2.39.2
A small bugfix for "run-user XARCH=ppc64le" and run-user support for
run-tests.sh.
Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
---
Thomas Weißschuh (2):
selftests/nolibc: introduce QEMU_ARCH_USER
selftests/nolibc: run-tests.sh: enable testing via qemu-user
tools/testing/selftests/nolibc/Makefile | 5 ++++-
tools/testing/selftests/nolibc/run-tests.sh | 22 +++++++++++++++++++---
2 files changed, 23 insertions(+), 4 deletions(-)
---
base-commit: ba335752620565c25c3028fff9496bb8ef373602
change-id: 20770915-nolibc-run-user-845375a3ec4f
Best regards,
--
Thomas Weißschuh <linux(a)weissschuh.net>
Extend pmu_counters_test to AMD CPUs.
As the AMD PMU is quite different from Intel with different events and
feature sets, this series introduces a new code path to test it,
specifically focusing on the core counters including the
PerfCtrExtCore and PerfMonV2 features. Northbridge counters and cache
counters exist, but are not as important and can be deferred to a
later series.
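For orientation, the sketch below shows how the core counter MSR layout differs with and without PerfCtrExtCore. The MSR bases (0xc0010004 for the legacy counters, 0xc0010201 with a stride of two for the extended ones) and the counter counts (4 and 6) are stated from memory of AMD's layout and should be treated as assumptions; the helper name is hypothetical and is not taken from the series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed MSR layout for AMD core performance counters. */
#define SKETCH_MSR_K7_PERFCTR0    0xc0010004u	/* legacy, contiguous */
#define SKETCH_MSR_F15H_PERF_CTR  0xc0010201u	/* extended, interleaved */
#define SKETCH_NUM_LEGACY_CTRS    4u
#define SKETCH_NUM_EXT_CTRS       6u

/*
 * Return the MSR address of core counter @idx, or 0 if @idx is out of
 * range for the given feature set.
 */
static uint32_t sketch_amd_counter_msr(bool has_perfctr_ext_core,
				       unsigned int idx)
{
	if (has_perfctr_ext_core) {
		if (idx >= SKETCH_NUM_EXT_CTRS)
			return 0;
		/* Control and counter MSRs alternate, hence the stride. */
		return SKETCH_MSR_F15H_PERF_CTR + 2 * idx;
	}

	if (idx >= SKETCH_NUM_LEGACY_CTRS)
		return 0;
	return SKETCH_MSR_K7_PERFCTR0 + idx;
}
```

With PerfMonV2 the number of core counters is reported by CPUID instead of being fixed, which this sketch does not model.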
The first patch is a bug fix that could be submitted separately.
The series has been tested on both Intel and AMD machines, but I have
not found an AMD machine old enough to lack PerfCtrExtCore. I have
tried to ensure that no part of the code depends on its
presence.
I am aware of similar work in this direction done by Jinrong Liang
[1]. He told me he is not working on it currently, so I am not
intruding by making my own submission.
[1] https://lore.kernel.org/kvm/20231121115457.76269-1-cloudliang@tencent.com/
Colton Lewis (6):
KVM: x86: selftests: Fix typos in macro variable use
KVM: x86: selftests: Define AMD PMU CPUID leaves
KVM: x86: selftests: Set up AMD VM in pmu_counters_test
KVM: x86: selftests: Test read/write core counters
KVM: x86: selftests: Test core events
KVM: x86: selftests: Test PerfMonV2
.../selftests/kvm/include/x86_64/processor.h | 7 +
.../selftests/kvm/x86_64/pmu_counters_test.c | 267 ++++++++++++++++--
2 files changed, 249 insertions(+), 25 deletions(-)
--
2.46.0.rc2.264.g509ed76dc8-goog