This patch series is motivated by the following observation:
Raise a signal and jump to the signal handler. The ucontext_t structure
dumped by the kernel to userspace has a uc_sigmask field holding the mask
of blocked signals. If you run a fresh, minimal program doing this, this
field is empty, even if you block some signals while registering the
handler with sigaction().
Here is what the man-pages have to say:
sigaction(2): "sa_mask specifies a mask of signals which should be blocked
(i.e., added to the signal mask of the thread in which the signal handler
is invoked) during execution of the signal handler. In addition, the
signal which triggered the handler will be blocked, unless the SA_NODEFER
flag is used."
signal(7): Under "Execution of signal handlers", (1.3) implies:
"The thread's current signal mask is accessible via the ucontext_t
object that is pointed to by the third argument of the signal handler."
But, (1.4) states:
"Any signals specified in act->sa_mask when registering the handler with
sigprocmask(2) are added to the thread's signal mask. The signal being
delivered is also added to the signal mask, unless SA_NODEFER was
specified when registering the handler. These signals are thus blocked
while the handler executes."
There clearly is no distinction being made in the man pages between
"Thread's signal mask" and ucontext_t; this logically should imply
that a signal blocked by populating struct sigaction should be visible
in ucontext_t.
Here is what the kernel code does (for AArch64):
do_signal() -> handle_signal() -> sigmask_to_save(), which returns
&current->blocked, is passed to setup_rt_frame() -> setup_sigframe() ->
__copy_to_user(). Hence, &current->blocked is copied to the ucontext_t
exposed to userspace. Back in handle_signal(),
signal_setup_done() -> signal_delivered() -> sigorsets() and
set_current_blocked() use the information in struct ksignal ksig,
which was populated through the sigaction() system call in
kernel/signal.c:
copy_from_user(&new_sa.sa, act, sizeof(new_sa.sa)),
to update &current->blocked; hence, the set of blocked signals for the
current thread is updated AFTER the kernel dumps ucontext_t to
userspace.
I assume that the above is indeed the intended behaviour, because it
makes semantic sense: the signals blocked using sigaction() remain
blocked only for the duration of the handler, not in the context that
was present before jumping to the handler. Nothing in the man-pages
confirms this, however. The series therefore introduces a test that
mangles uc_sigmask. I will send a separate series to fix the
man-pages.
The proposed selftest has been tested on AArch32, AArch64 and x86_64.
Changes in v2:
- Replace all occurrences of SIGPIPE with SIGSEGV
- Fixed a mismatch between code comment and ksft log
- Add a testcase: Raise the same signal again; it must not be queued
- Remove unneeded <assert.h>, <unistd.h>
- Give a detailed test description in the comments; also describe the
exact meaning of delivered and blocked
- Handle errors for all libc functions/syscalls
- Mention tests in Makefile and .gitignore in alphabetical order
Dev Jain (2):
selftests: Rename sigaltstack to generic signal
selftests: Add a test mangling with uc_sigmask
tools/testing/selftests/Makefile | 2 +-
.../{sigaltstack => signal}/.gitignore | 3 +-
.../{sigaltstack => signal}/Makefile | 3 +-
.../current_stack_pointer.h | 0
.../selftests/signal/mangle_uc_sigmask.c | 194 ++++++++++++++++++
.../sas.c => signal/sigaltstack.c} | 0
6 files changed, 199 insertions(+), 3 deletions(-)
rename tools/testing/selftests/{sigaltstack => signal}/.gitignore (57%)
rename tools/testing/selftests/{sigaltstack => signal}/Makefile (53%)
rename tools/testing/selftests/{sigaltstack => signal}/current_stack_pointer.h (100%)
create mode 100644 tools/testing/selftests/signal/mangle_uc_sigmask.c
rename tools/testing/selftests/{sigaltstack/sas.c => signal/sigaltstack.c} (100%)
--
2.34.1
v11: https://patchwork.kernel.org/project/netdevbpf/list/?series=857457&state=*
====
Major Changes:
--------------
v11 addresses feedback received in v10. The major change is the removal
of the memory provider ops as requested by Christoph. We still
accomplish the same thing, but utilizing direct function calls with if
statements rather than generic ops.
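The difference between the two dispatch styles can be sketched in plain C. This is an illustration only: every struct and function name below is hypothetical and none of them is taken from the series.

```c
#include <stddef.h>

/* Hypothetical stand-ins; none of these names come from the series. */
struct pool;

/* Variant 1: generic ops table, dispatched through function pointers. */
struct provider_ops {
	int (*alloc)(struct pool *p);
};

struct pool {
	const struct provider_ops *ops;	/* used by variant 1 only */
	int is_devmem;			/* used by variant 2 only */
	int next_id;
};

/* Two allocators that return distinguishable ids. */
static int devmem_alloc(struct pool *p) { return 1000 + p->next_id++; }
static int page_alloc(struct pool *p)   { return p->next_id++; }

static const struct provider_ops devmem_ops = { .alloc = devmem_alloc };

static int pool_alloc_via_ops(struct pool *p)
{
	if (p->ops && p->ops->alloc)
		return p->ops->alloc(p);
	return page_alloc(p);
}

/* Variant 2: no ops table; test a flag and call the function directly. */
static int pool_alloc_direct(struct pool *p)
{
	if (p->is_devmem)
		return devmem_alloc(p);
	return page_alloc(p);
}
```

Both variants reach the same allocator; the second one simply makes the set of possible providers explicit at the call site instead of leaving it open-ended.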
This version also addresses sparse warnings, bugs and comments from
reviewers.
As usual, the full devmem TCP changes, including the full GVE driver
implementation, are here:
https://github.com/mina/linux/commits/tcpdevmem-v11/
Detailed changelog:
-------------------
- Fixes in netdev_rx_queue_restart() from Pavel & David.
- Remove commit e650e8c3a36f5 ("net: page_pool: create hooks for
custom page providers") from the series to address Christoph's
feedback, and rebase the other patches in the series on this change.
- Fixed build errors with CONFIG_DMA_SHARED_BUFFER &&
!CONFIG_GENERIC_ALLOCATOR build.
- Fixed sparse warnings pointed out by Paolo.
- Drop unnecessary gro_pull_from_frag0 checks.
- Added Bagas reviewed-by to docs.
Cc: Bagas Sanjaya <bagasdotme(a)gmail.com>
Cc: Steven Rostedt <rostedt(a)goodmis.org>
Cc: Christoph Hellwig <hch(a)infradead.org>
Cc: Nikolay Aleksandrov <razor(a)blackwall.org>
v10: https://patchwork.kernel.org/project/netdevbpf/list/?series=852422&state=*
====
Major Changes:
--------------
v9 was sent right before the merge window closed (sorry!). v10 is almost
a re-send of the series now that the merge window re-opened. Only
rebased to latest net-next and addressed some minor iterative comments
received on v9.
As usual, the full devmem TCP changes, including the full GVE driver
implementation, are here:
https://github.com/mina/linux/commits/tcpdevmem-v10/
Detailed changelog:
-------------------
- Fixed tokens leaking in DONTNEED setsockopt (Nikolay).
- Moved net_iov_dma_addr() to devmem.c and made it a devmem-specific
helper (David).
- Rename hook alloc_pages to alloc_netmems, as alloc_pages is now
defined as a preprocessor macro and causes a build error.
v9:
===
Major Changes:
--------------
GVE queue API has been merged. Submitting this version as non-RFC after
rebasing on top of the merged API, and dropped the out of tree queue API
I was carrying on github. Addressed the little feedback v8 has received.
Detailed changelog:
------------------
- Added new patch from David Wei to this series for
netdev_rx_queue_restart()
- Fixed sparse error.
- Removed CONFIG_ checks in netmem_is_net_iov()
- Flipped skb->readable to skb->unreadable
- Minor fixes to selftests & docs.
RFC v8:
=======
Major Changes:
--------------
- Fixed build error generated by patch-by-patch build.
- Applied docs suggestions from Randy.
RFC v7:
=======
Major Changes:
--------------
This revision largely rebases on top of net-next and addresses the feedback
RFCv6 received from folks, namely Jakub, Yunsheng, Arnd, David, & Pavel.
The series remains in RFC because the queue-API ndos defined in this
series are not yet implemented. I have a GVE implementation I carry out
of tree for my testing. An upstreamable GVE implementation is in the
works. Aside from that, in my estimation all the patches are ready for
review/merge. Please do take a look.
As usual, the full devmem TCP changes, including the full GVE driver
implementation, are here:
https://github.com/mina/linux/commits/tcpdevmem-v7/
Detailed changelog:
- Use admin-perm in netlink API.
- Addressed feedback from Jakub with regards to netlink API
implementation.
- Renamed devmem.c functions to something more appropriate for that
file.
- Improve the performance seen through the page_pool benchmark.
- Fix the value definition of all the SO_DEVMEM_* uapi.
- Various fixes to documentation.
Perf - page-pool benchmark:
---------------------------
Improved performance of bench_page_pool_simple.ko tests compared to v6:
https://pastebin.com/raw/v5dYRg8L
net-next base: 8 cycle fast path.
RFC v6: 10 cycle fast path.
RFC v7: 9 cycle fast path.
RFC v7 with CONFIG_DMA_SHARED_BUFFER disabled: 8 cycle fast path,
same as baseline.
Perf - Devmem TCP benchmark:
---------------------
Perf is about the same regardless of the changes in v7, namely the
removal of the static_branch_unlikely to improve the page_pool benchmark
performance:
189/200gbps bi-directional throughput with RX devmem TCP and regular TCP
TX i.e. ~95% line rate.
RFC v6:
=======
Major Changes:
--------------
This revision largely rebases on top of net-next and addresses the little
feedback RFCv5 received.
The series remains in RFC because the queue-API ndos defined in this
series are not yet implemented. I have a GVE implementation I carry out
of tree for my testing. An upstreamable GVE implementation is in the
works. Aside from that, in my estimation all the patches are ready for
review/merge. Please do take a look.
As usual, the full devmem TCP changes, including the full GVE driver
implementation, are here:
https://github.com/mina/linux/commits/tcpdevmem-v6/
This version also comes with some performance data recorded in the cover
letter (see below changelog).
Detailed changelog:
- Rebased on top of the merged netmem_ref changes.
- Converted skb->dmabuf to skb->readable (Pavel). Pavel's original
suggestion was to remove the skb->dmabuf flag entirely, but when I
looked into it closely, I found the issue that if we remove the flag
we have to dereference the shinfo(skb) pointer to obtain the first
frag to tell whether an skb is readable or not. This can cause a
performance regression if it dirties the cache line when the
shinfo(skb) was not really needed. Instead, I converted the skb->dmabuf
flag into a generic skb->readable flag which can be re-used by io_uring
0-copy RX.
- Squashed a few locking optimizations from Eric Dumazet in the RX path
and the DEVMEM_DONTNEED setsockopt.
- Expanded the tests a bit. Added validation for invalid scenarios and
added some more coverage.
Perf - page-pool benchmark:
---------------------------
bench_page_pool_simple.ko tests with and without these changes:
https://pastebin.com/raw/ncHDwAbn
AFAIK the number that really matters in the perf tests is the
'tasklet_page_pool01_fast_path Per elem'. This one measures at about 8
cycles without the changes, but there is about 1 cycle of noise in some
results.
With the patches this regresses to 9 cycles, again with occasional
1 cycle noise when running the test repeatedly.
Lastly, I tried disabling the static_branch_unlikely() check in
netmem_is_net_iov(). To my surprise, disabling the
static_branch_unlikely() check reduces the fast path back to 8 cycles,
but the 1 cycle noise remains.
Perf - Devmem TCP benchmark:
---------------------
189/200gbps bi-directional throughput with RX devmem TCP and regular TCP
TX i.e. ~95% line rate.
Major changes in RFC v5:
========================
1. Rebased on top of 'Abstract page from net stack' series and used the
new netmem type to refer to LSB set pointers instead of re-using
struct page.
2. Downgraded this series back to RFC and called it RFC v5. This is
because this series is now dependent on 'Abstract page from net
stack'[1] and the queue API. Both are removed from the series to
reduce the patch # and those bits are fairly independent or
pre-requisite work.
3. Reworked the page_pool devmem support to use netmem, for more
unified handling.
4. Reworked the reference counting of net_iov (renamed from
page_pool_iov) to use pp_ref_count for refcounting.
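The "LSB set pointer" idea in item 1 can be illustrated with a small userspace sketch. It assumes only that both object types are at least 2-byte aligned, so the lowest pointer bit is free to carry a tag. All type and function names here are hypothetical and are not the kernel's definitions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins for the two kinds of backing memory. */
struct host_page { int id; };
struct dev_iov   { int id; };

/* Opaque handle: a pointer value whose lowest bit encodes the kind. */
typedef uintptr_t mem_ref;

#define MEM_REF_IOV_BIT ((uintptr_t)1)

static mem_ref page_to_ref(struct host_page *p)
{
	return (uintptr_t)p;			/* LSB clear: host page */
}

static mem_ref iov_to_ref(struct dev_iov *v)
{
	return (uintptr_t)v | MEM_REF_IOV_BIT;	/* LSB set: device iov */
}

static bool ref_is_iov(mem_ref r)
{
	return (r & MEM_REF_IOV_BIT) != 0;
}

/* Each accessor returns NULL when the handle is of the other kind. */
static struct host_page *ref_to_page(mem_ref r)
{
	return ref_is_iov(r) ? NULL : (struct host_page *)r;
}

static struct dev_iov *ref_to_iov(mem_ref r)
{
	return ref_is_iov(r) ? (struct dev_iov *)(r & ~MEM_REF_IOV_BIT) : NULL;
}
```

Using a distinct handle type rather than a plain struct pointer means code that forgets to check the tag fails to compile instead of silently dereferencing a tagged pointer.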
The full changes including the dependent series and GVE page pool
support is here:
https://github.com/mina/linux/commits/tcpdevmem-rfcv5/
[1] https://patchwork.kernel.org/project/netdevbpf/list/?series=810774
Major changes in v1:
====================
1. Implemented MVP queue API ndos to remove the userspace-visible
driver reset.
2. Fixed issues in the napi_pp_put_page() devmem frag unref path.
3. Removed RFC tag.
Many smaller addressed comments across all the patches (patches have
individual change log).
Full tree including the rest of the GVE driver changes:
https://github.com/mina/linux/commits/tcpdevmem-v1
Changes in RFC v3:
==================
1. Pulled in the memory-provider dependency from Jakub's RFC[1] to make the
series reviewable and mergeable.
2. Implemented multi-rx-queue binding which was a todo in v2.
3. Fix to cmsg handling.
The sticking point in RFC v2[2] was the device reset required to refill
the device rx-queues after the dmabuf bind/unbind. The solution
suggested, as I understand it, is a subset of the per-queue management
ops Jakub suggested, or something similar:
https://lore.kernel.org/netdev/20230815171638.4c057dcd@kernel.org/
This is not addressed in this revision, because:
1. This point was discussed at netconf & netdev and there is openness to
using the current approach of requiring a device reset.
2. Implementing individual queue resetting seems to be difficult for my
test bed with GVE. My prototype to test this ran into issues with the
rx-queues not coming back up properly if reset individually. At the
moment I'm unsure if it's a mistake in the POC or a genuine issue in
the virtualization stack behind GVE, which currently doesn't test
individual rx-queue restart.
3. Our usecases are not bothered by requiring a device reset to refill
the buffer queues, and we'd like to support NICs that run into this
limitation with resetting individual queues.
My thought is that drivers that have trouble with per-queue configs can
use the support in this series, while drivers that support new netdev
ops to reset individual queues can automatically reset the queue as
part of the dma-buf bind/unbind.
The same approach with device resets is presented again for consideration
with other sticking points addressed.
Only the rx devmem path in this proposal is proposed for merge. For a
snapshot of my entire tree, which includes the GVE POC page pool support &
device memory support:
https://github.com/torvalds/linux/compare/master...mina:linux:tcpdevmem-v3
[1] https://lore.kernel.org/netdev/f8270765-a27b-6ccf-33ea-cda097168d79@redhat.…
[2] https://lore.kernel.org/netdev/CAHS8izOVJGJH5WF68OsRWFKJid1_huzzUK+hpKbLcL4…
Changes in RFC v2:
==================
The sticking point in RFC v1[1] was the dma-buf pages approach we used to
deliver the device memory to the TCP stack. RFC v2 is a proof-of-concept
that attempts to resolve this by implementing scatterlist support in the
networking stack, such that we can import the dma-buf scatterlist
directly. This is the approach proposed at a high level here[2].
Detailed changes:
1. Replaced dma-buf pages approach with importing scatterlist into the
page pool.
2. Replace the dma-buf pages centric API with a netlink API.
3. Removed the TX path implementation - there is no issue with
implementing the TX path with scatterlist approach, but leaving
out the TX path makes it easier to review.
4. Functionality is tested with this proposal, but I have not conducted
perf testing yet. I'm not sure there are regressions, but I removed
perf claims from the cover letter until they can be re-confirmed.
5. Added Signed-off-by: contributors to the implementation.
6. Fixed some bugs with the RX path since RFC v1.
Any feedback welcome, but specifically the biggest pending questions
needing feedback IMO are:
1. Feedback on the scatterlist-based approach in general.
2. Netlink API (Patch 1 & 2).
3. Approach to handle all the drivers that expect to receive pages from
the page pool (Patch 6).
[1] https://lore.kernel.org/netdev/dfe4bae7-13a0-3c5d-d671-f61b375cb0b4@gmail.c…
[2] https://lore.kernel.org/netdev/CAHS8izPm6XRS54LdCDZVd0C75tA1zHSu6jLVO8nzTLX…
==================
* TL;DR:
Device memory TCP (devmem TCP) is a proposal for transferring data to and/or
from device memory efficiently, without bouncing the data to a host memory
buffer.
* Problem:
A large number of data transfers have device memory as the source and/or
destination. Accelerators have drastically increased the volume of such
transfers. Some examples include:
- ML accelerators transferring large amounts of training data from storage into
GPU/TPU memory. In some cases ML training setup time can be as long as 50% of
TPU compute time; improving data transfer throughput & efficiency can help
improve GPU/TPU utilization.
- Distributed training, where ML accelerators, such as GPUs on different hosts,
exchange data among them.
- Distributed raw block storage applications transfer large amounts of data with
remote SSDs; much of this data does not require host processing.
Today, the majority of the Device-to-Device data transfers over the network are
implemented as the following low-level operations: Device-to-Host copy,
Host-to-Host network transfer, and Host-to-Device copy.
The implementation is suboptimal, especially for bulk data transfers, and can
put significant strains on system resources, such as host memory bandwidth,
PCIe bandwidth, etc. One important reason behind the current state is the
kernel’s lack of semantics to express device to network transfers.
* Proposal:
In this patch series we attempt to optimize this use case by implementing
socket APIs that enable the user to:
1. send device memory across the network directly, and
2. receive incoming network packets directly into device memory.
Packet _payloads_ go directly from the NIC to device memory for receive and from
device memory to NIC for transmit.
Packet _headers_ go to/from host memory and are processed by the TCP/IP stack
normally. The NIC _must_ support header split to achieve this.
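Header split itself is done by the NIC in hardware, but the resulting layout can be illustrated in software. The sketch below is an assumption-laden model, not driver code: the struct, the function name and the use of memcpy() in place of DMA are all made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical description of where the two halves of a packet land. */
struct split_bufs {
	unsigned char *hdr;	/* host memory: parsed by the TCP/IP stack */
	size_t hdr_cap;
	unsigned char *payload;	/* stands in for device memory */
	size_t payload_cap;
};

/*
 * Copy the first hdr_len bytes of pkt into the header buffer and the rest
 * into the payload buffer. Returns the payload length, or -1 if the packet
 * is shorter than the header or either buffer is too small.
 */
static long split_packet(const unsigned char *pkt, size_t pkt_len,
			 size_t hdr_len, struct split_bufs *out)
{
	size_t payload_len;

	if (hdr_len > pkt_len || hdr_len > out->hdr_cap)
		return -1;
	payload_len = pkt_len - hdr_len;
	if (payload_len > out->payload_cap)
		return -1;

	memcpy(out->hdr, pkt, hdr_len);
	memcpy(out->payload, pkt + hdr_len, payload_len);
	return (long)payload_len;
}
```

The point of the model is that the host only ever needs to read the header buffer; the payload buffer can live in memory the host CPU cannot access.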
Advantages:
- Alleviate host memory bandwidth pressure, compared to existing
network-transfer + device-copy semantics.
- Alleviate PCIe BW pressure, by limiting data transfer to the lowest level
of the PCIe tree, compared to traditional path which sends data through the
root complex.
* Patch overview:
** Part 1: netlink API
Gives the user the ability to bind a dma-buf to an RX queue.
** Part 2: scatterlist support
Currently the standard for device memory sharing is DMABUF, which doesn't
generate struct pages. On the other hand, the networking stack (skbs, drivers,
and page pool) operates on pages. We have 2 options:
1. Generate struct pages for dmabuf device memory, or,
2. Modify the networking stack to process scatterlist.
Approach #1 was attempted in RFC v1. RFC v2 implements approach #2.
** Part 3: page pool support
We piggyback on the page pool memory providers proposal:
https://github.com/kuba-moo/linux/tree/pp-providers
It allows the page pool to define a memory provider that provides the
page allocation and freeing. It helps abstract most of the device memory
TCP changes from the driver.
** Part 4: support for unreadable skb frags
Page pool iovs are not accessible by the host; we implement changes
throughout the networking stack to correctly handle skbs with unreadable
frags.
** Part 5: recvmsg() APIs
We define user APIs for the user to send and receive device memory.
Not included with this series is the GVE devmem TCP support, just to
simplify the review. Code available here if desired:
https://github.com/mina/linux/tree/tcpdevmem
This series is built on top of net-next with Jakub's pp-providers changes
cherry-picked.
* NIC dependencies:
1. (strict) Devmem TCP requires the NIC to support header split, i.e. the
capability to split incoming packets into a header + payload and to put
each into a separate buffer. Devmem TCP works by using device memory
for the packet payload, and host memory for the packet headers.
2. (optional) Devmem TCP works better with flow steering support & RSS support,
i.e. the NIC's ability to steer flows into certain rx queues. This allows the
sysadmin to enable devmem TCP on a subset of the rx queues, and steer
devmem TCP traffic onto these queues and non devmem TCP elsewhere.
The NIC I have access to with these properties is the GVE with DQO support
running in Google Cloud, but any NIC that supports these features would suffice.
I may be able to help reviewers bring up devmem TCP on their NICs.
* Testing:
The series includes a udmabuf kselftest that shows a simple use case of
devmem TCP and validates the entire data path end to end without
a dependency on a specific dmabuf provider.
** Test Setup
Kernel: net-next with this series and memory provider API cherry-picked
locally.
Hardware: Google Cloud A3 VMs.
NIC: GVE with header split & RSS & flow steering support.
Cc: Pavel Begunkov <asml.silence(a)gmail.com>
Cc: David Wei <dw(a)davidwei.uk>
Cc: Jason Gunthorpe <jgg(a)ziepe.ca>
Cc: Yunsheng Lin <linyunsheng(a)huawei.com>
Cc: Shailend Chand <shailend(a)google.com>
Cc: Harshitha Ramamurthy <hramamurthy(a)google.com>
Cc: Shakeel Butt <shakeel.butt(a)linux.dev>
Cc: Jeroen de Borst <jeroendb(a)google.com>
Cc: Praveen Kaligineedi <pkaligineedi(a)google.com>
Mina Almasry (13):
netdev: add netdev_rx_queue_restart()
net: netdev netlink api to bind dma-buf to a net device
netdev: support binding dma-buf to netdevice
netdev: netdevice devmem allocator
page_pool: convert to use netmem
page_pool: devmem support
memory-provider: dmabuf devmem memory provider
net: support non paged skb frags
net: add support for skbs with unreadable frags
tcp: RX path for devmem TCP
net: add SO_DEVMEM_DONTNEED setsockopt to release RX frags
net: add devmem TCP documentation
selftests: add ncdevmem, netcat for devmem TCP
Documentation/netlink/specs/netdev.yaml | 57 +++
Documentation/networking/devmem.rst | 258 +++++++++++
Documentation/networking/index.rst | 1 +
arch/alpha/include/uapi/asm/socket.h | 6 +
arch/mips/include/uapi/asm/socket.h | 6 +
arch/parisc/include/uapi/asm/socket.h | 6 +
arch/sparc/include/uapi/asm/socket.h | 6 +
include/linux/skbuff.h | 61 ++-
include/linux/skbuff_ref.h | 11 +-
include/linux/socket.h | 1 +
include/net/devmem.h | 124 ++++++
include/net/mp_dmabuf_devmem.h | 46 ++
include/net/netdev_rx_queue.h | 5 +
include/net/netmem.h | 208 ++++++++-
include/net/page_pool/helpers.h | 153 +++++--
include/net/page_pool/types.h | 22 +-
include/net/sock.h | 2 +
include/net/tcp.h | 5 +-
include/trace/events/page_pool.h | 29 +-
include/uapi/asm-generic/socket.h | 6 +
include/uapi/linux/netdev.h | 19 +
include/uapi/linux/uio.h | 17 +
net/bpf/test_run.c | 5 +-
net/core/Makefile | 3 +-
net/core/datagram.c | 6 +
net/core/dev.c | 6 +-
net/core/devmem.c | 375 ++++++++++++++++
net/core/gro.c | 3 +-
net/core/netdev-genl-gen.c | 23 +
net/core/netdev-genl-gen.h | 6 +
net/core/netdev-genl.c | 103 +++++
net/core/netdev_rx_queue.c | 74 ++++
net/core/page_pool.c | 360 +++++++++-------
net/core/skbuff.c | 83 +++-
net/core/sock.c | 61 +++
net/ipv4/esp4.c | 3 +-
net/ipv4/tcp.c | 261 +++++++++++-
net/ipv4/tcp_input.c | 13 +-
net/ipv4/tcp_ipv4.c | 10 +
net/ipv4/tcp_minisocks.c | 2 +
net/ipv4/tcp_output.c | 5 +-
net/ipv6/esp6.c | 3 +-
net/packet/af_packet.c | 4 +-
tools/include/uapi/linux/netdev.h | 19 +
tools/testing/selftests/net/.gitignore | 1 +
tools/testing/selftests/net/Makefile | 5 +
tools/testing/selftests/net/ncdevmem.c | 542 ++++++++++++++++++++++++
47 files changed, 2759 insertions(+), 266 deletions(-)
create mode 100644 Documentation/networking/devmem.rst
create mode 100644 include/net/devmem.h
create mode 100644 include/net/mp_dmabuf_devmem.h
create mode 100644 net/core/devmem.c
create mode 100644 net/core/netdev_rx_queue.c
create mode 100644 tools/testing/selftests/net/ncdevmem.c
--
2.45.2.505.gda0bf45e8d-goog
From: Geliang Tang <tanggeliang(a)kylinos.cn>
v7:
- a better fix for tls_sw_recvmsg.
v6:
- add a fix for tls_sw_recvmsg().
v5:
- add a new patch "Check recv lengths in test_sockmap" instead of using
"continue" in msg_loop.
v4:
- address Martin's comments for v3. (thanks.)
- add Yonghong's "Acked-by" tags. (thanks.)
- update subject-prefix from "bpf-next" to "bpf".
Patch 1, v3 of "selftests/bpf: Add F_SETFL for fcntl":
- detect nonblock flag automatically, then test_sockmap can run in both
block and nonblock modes.
- use continue instead of again in v2.
Patch 2, fix for umount cgroup2 error.
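The changelog does not show how the nonblock flag is detected. One common way to do it, sketched here with hypothetical helper names, is to read the file status flags with fcntl(F_GETFL) and test O_NONBLOCK:

```c
#include <fcntl.h>

/* Returns 1 if fd is in non-blocking mode, 0 if blocking, -1 on error. */
static int fd_is_nonblocking(int fd)
{
	int flags = fcntl(fd, F_GETFL);

	if (flags == -1)
		return -1;
	return (flags & O_NONBLOCK) ? 1 : 0;
}

/* Turn O_NONBLOCK on or off while preserving the other status flags. */
static int fd_set_nonblocking(int fd, int on)
{
	int flags = fcntl(fd, F_GETFL);

	if (flags == -1)
		return -1;
	flags = on ? (flags | O_NONBLOCK) : (flags & ~O_NONBLOCK);
	return fcntl(fd, F_SETFL, flags) == -1 ? -1 : 0;
}
```

A receive loop could call fd_is_nonblocking() once and then decide whether EAGAIN from recv() means "retry" or is an error.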
Geliang Tang (2):
tls: receive msg again for sk_redirect
selftests/bpf: Add F_SETFL for fcntl in test_sockmap
net/tls/tls_sw.c | 3 +++
tools/testing/selftests/bpf/test_sockmap.c | 4 +++-
2 files changed, 6 insertions(+), 1 deletion(-)
--
2.43.0
Hi,
This builds on the proposal[1] from Mark and lets me convert the
existing usercopy selftest to KUnit. Besides adding this basic test to
the KUnit collection, it also opens the door for execve testing (which
depends on having a functional current->mm), and should provide the
basic infrastructure for adding Mark's much more complete usercopy tests.
v2:
- dropped "initial VMA", turns out it wasn't needed (Mark)
- various cleanups in the test itself (Vitor)
- moved new kunit resource to a separate file (David)
v1: https://lore.kernel.org/all/20240519190422.work.715-kees@kernel.org/
-Kees
[1] https://lore.kernel.org/lkml/20230321122514.1743889-2-mark.rutland@arm.com/
Kees Cook (2):
kunit: test: Add vm_mmap() allocation resource manager
usercopy: Convert test_user_copy to KUnit test
MAINTAINERS | 1 +
include/kunit/test.h | 17 ++
lib/Kconfig.debug | 21 +-
lib/Makefile | 2 +-
lib/kunit/Makefile | 1 +
lib/kunit/user_alloc.c | 111 +++++++++
lib/{test_user_copy.c => usercopy_kunit.c} | 273 ++++++++++-----------
7 files changed, 271 insertions(+), 155 deletions(-)
create mode 100644 lib/kunit/user_alloc.c
rename lib/{test_user_copy.c => usercopy_kunit.c} (47%)
--
2.34.1
This patchset enables both detecting and dumping compilable
prototypes for kfuncs.
The first commit instructs pahole to DECL_TAG kfuncs when available.
This requires v1.27 which was released on 6/11/24. With it, users will
be able to look at BTF inside vmlinux (or modules) and check if the
kfunc they want is available.
The final commit teaches bpftool how to dump kfunc prototypes. This
is done for developer convenience.
The rest of the commits are fixups to enable selftests to use the
newly dumped kfunc prototypes. With these, selftests will regularly
exercise the newly added codepaths.
Tested with and without the required pahole changes:
* https://github.com/kernel-patches/bpf/pull/7186
* https://github.com/kernel-patches/bpf/pull/7187
=== Changelog ===
From v4:
* Change bpf_session_cookie() return type
* Only fixup used fentry test kfunc prototypes
* Extract out projection detection into shared btf_is_projection_of()
* Fix kernel test robot build warnings about doc comments
From v3:
* Teach selftests to use dumped prototypes
From v2:
* Update Makefile.btf with pahole flag
* More error checking
* Output formatting changes
* Drop already-merged commit
From v1:
* Add __weak annotation
* Use btf_dump for kfunc prototypes
* Update kernel bpf_rdonly_cast() signature
Daniel Xu (12):
kbuild: bpf: Tell pahole to DECL_TAG kfuncs
bpf: selftests: Fix bpf_iter_task_vma_new() prototype
bpf: selftests: Fix fentry test kfunc prototypes
bpf: selftests: Fix bpf_cpumask_first_zero() kfunc prototype
bpf: selftests: Fix bpf_map_sum_elem_count() kfunc prototype
bpf: Make bpf_session_cookie() kfunc return long *
bpf: selftests: Namespace struct_opt callbacks in bpf_dctcp
bpf: verifier: Relax caller requirements for kfunc projection type
args
bpf: treewide: Align kfunc signatures to prog point-of-view
bpf: selftests: nf: Opt out of using generated kfunc prototypes
bpf: selftests: xfrm: Opt out of using generated kfunc prototypes
bpftool: Support dumping kfunc prototypes from BTF
fs/verity/measure.c | 5 +-
include/linux/bpf.h | 8 +--
include/linux/btf.h | 1 +
kernel/bpf/btf.c | 13 ++++-
kernel/bpf/crypto.c | 24 +++++---
kernel/bpf/helpers.c | 39 +++++++++----
kernel/bpf/verifier.c | 12 +++-
kernel/trace/bpf_trace.c | 17 +++---
net/core/filter.c | 32 +++++++----
scripts/Makefile.btf | 2 +-
tools/bpf/bpftool/btf.c | 55 +++++++++++++++++++
.../testing/selftests/bpf/bpf_experimental.h | 2 +-
tools/testing/selftests/bpf/progs/bpf_dctcp.c | 36 ++++++------
.../selftests/bpf/progs/get_func_ip_test.c | 7 +--
.../selftests/bpf/progs/ip_check_defrag.c | 10 ++--
.../selftests/bpf/progs/map_percpu_stats.c | 2 +-
.../selftests/bpf/progs/nested_trust_common.h | 2 +-
.../testing/selftests/bpf/progs/test_bpf_nf.c | 1 +
.../selftests/bpf/progs/test_bpf_nf_fail.c | 1 +
.../bpf/progs/verifier_netfilter_ctx.c | 6 +-
.../selftests/bpf/progs/xdp_synproxy_kern.c | 1 +
tools/testing/selftests/bpf/progs/xfrm_info.c | 1 +
22 files changed, 193 insertions(+), 84 deletions(-)
--
2.44.0
KTAP parsers interpret the output of ksft_test_result_*() as being the
name of the test. The map_fixed_noreplace test uses a dynamically
allocated base address for the mmap()s that it tests and currently
includes this in the test names that it logs so the test names that are
logged are not stable between runs. It also uses multiples of PAGE_SIZE,
which means that runs for kernels with different PAGE_SIZE configurations
can't be directly compared. Both these factors cause issues for CI
systems when interpreting and displaying results.
Fix this by replacing the current test names with fixed strings
describing the intent of the mappings that are logged; the existing
messages with the actual addresses and sizes are retained as diagnostic
prints to aid in debugging.
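The split between a stable result name and a varying diagnostic line can be sketched outside the kselftest framework. The helpers below are hypothetical stand-ins written for illustration; they do not use kselftest.h and only mimic the "# diagnostic" and "ok N name" line shapes of KTAP output.

```c
#include <stdio.h>

/*
 * Build a run-independent test name from page counts instead of raw
 * addresses. Returns the snprintf() result.
 */
static int stable_test_name(char *buf, size_t len,
			    unsigned int pages, unsigned int offset_pages)
{
	if (offset_pages == 0)
		return snprintf(buf, len, "mmap() %u*PAGE_SIZE at base", pages);
	return snprintf(buf, len, "mmap() %u*PAGE_SIZE at base+(%u*PAGE_SIZE)",
			pages, offset_pages);
}

/*
 * Emit the varying details as a '#' diagnostic line and the stable name
 * as the result line.
 */
static void report_pass(FILE *out, unsigned int test_nr, const char *name,
			unsigned long addr, unsigned long size)
{
	fprintf(out, "# mmap() @ 0x%lx-0x%lx\n", addr, addr + size);
	fprintf(out, "ok %u %s\n", test_nr, name);
}
```

With this split, two runs that map at different base addresses, or with different page sizes, still produce identical result lines.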
Fixes: 4838cf70e539 ("selftests/mm: map_fixed_noreplace: conform test to TAP format output")
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
tools/testing/selftests/mm/map_fixed_noreplace.c | 24 ++++++++++++++++--------
1 file changed, 16 insertions(+), 8 deletions(-)
diff --git a/tools/testing/selftests/mm/map_fixed_noreplace.c b/tools/testing/selftests/mm/map_fixed_noreplace.c
index b74813fdc951..d53de2486080 100644
--- a/tools/testing/selftests/mm/map_fixed_noreplace.c
+++ b/tools/testing/selftests/mm/map_fixed_noreplace.c
@@ -67,7 +67,8 @@ int main(void)
dump_maps();
ksft_exit_fail_msg("Error: munmap failed!?\n");
}
- ksft_test_result_pass("mmap() @ 0x%lx-0x%lx p=%p result=%m\n", addr, addr + size, p);
+ ksft_print_msg("mmap() @ 0x%lx-0x%lx p=%p result=%m\n", addr, addr + size, p);
+ ksft_test_result_pass("mmap() 5*PAGE_SIZE at base\n");
addr = base_addr + page_size;
size = 3 * page_size;
@@ -76,7 +77,8 @@ int main(void)
dump_maps();
ksft_exit_fail_msg("Error: first mmap() failed unexpectedly\n");
}
- ksft_test_result_pass("mmap() @ 0x%lx-0x%lx p=%p result=%m\n", addr, addr + size, p);
+ ksft_print_msg("mmap() @ 0x%lx-0x%lx p=%p result=%m\n", addr, addr + size, p);
+ ksft_test_result_pass("mmap() 3*PAGE_SIZE at base+PAGE_SIZE\n");
/*
* Exact same mapping again:
@@ -93,7 +95,8 @@ int main(void)
dump_maps();
ksft_exit_fail_msg("Error:1: mmap() succeeded when it shouldn't have\n");
}
- ksft_test_result_pass("mmap() @ 0x%lx-0x%lx p=%p result=%m\n", addr, addr + size, p);
+ ksft_print_msg("mmap() @ 0x%lx-0x%lx p=%p result=%m\n", addr, addr + size, p);
+ ksft_test_result_pass("mmap() 5*PAGE_SIZE at base\n");
/*
* Second mapping contained within first:
@@ -111,7 +114,8 @@ int main(void)
dump_maps();
ksft_exit_fail_msg("Error:2: mmap() succeeded when it shouldn't have\n");
}
- ksft_test_result_pass("mmap() @ 0x%lx-0x%lx p=%p result=%m\n", addr, addr + size, p);
+ ksft_print_msg("mmap() @ 0x%lx-0x%lx p=%p result=%m\n", addr, addr + size, p);
+ ksft_test_result_pass("mmap() 2*PAGE_SIZE at base+PAGE_SIZE\n");
/*
* Overlap end of existing mapping:
@@ -128,7 +132,8 @@ int main(void)
dump_maps();
ksft_exit_fail_msg("Error:3: mmap() succeeded when it shouldn't have\n");
}
- ksft_test_result_pass("mmap() @ 0x%lx-0x%lx p=%p result=%m\n", addr, addr + size, p);
+ ksft_print_msg("mmap() @ 0x%lx-0x%lx p=%p result=%m\n", addr, addr + size, p);
+ ksft_test_result_pass("mmap() 2*PAGE_SIZE at base+(3*PAGE_SIZE)\n");
/*
* Overlap start of existing mapping:
@@ -145,7 +150,8 @@ int main(void)
dump_maps();
ksft_exit_fail_msg("Error:4: mmap() succeeded when it shouldn't have\n");
}
- ksft_test_result_pass("mmap() @ 0x%lx-0x%lx p=%p result=%m\n", addr, addr + size, p);
+ ksft_print_msg("mmap() @ 0x%lx-0x%lx p=%p result=%m\n", addr, addr + size, p);
+ ksft_test_result_pass("mmap() 2*PAGE_SIZE bytes at base\n");
/*
* Adjacent to start of existing mapping:
@@ -162,7 +168,8 @@ int main(void)
dump_maps();
ksft_exit_fail_msg("Error:5: mmap() failed when it shouldn't have\n");
}
- ksft_test_result_pass("mmap() @ 0x%lx-0x%lx p=%p result=%m\n", addr, addr + size, p);
+ ksft_print_msg("mmap() @ 0x%lx-0x%lx p=%p result=%m\n", addr, addr + size, p);
+ ksft_test_result_pass("mmap() PAGE_SIZE at base\n");
/*
* Adjacent to end of existing mapping:
@@ -179,7 +186,8 @@ int main(void)
dump_maps();
ksft_exit_fail_msg("Error:6: mmap() failed when it shouldn't have\n");
}
- ksft_test_result_pass("mmap() @ 0x%lx-0x%lx p=%p result=%m\n", addr, addr + size, p);
+ ksft_print_msg("mmap() @ 0x%lx-0x%lx p=%p result=%m\n", addr, addr + size, p);
+ ksft_test_result_pass("mmap() PAGE_SIZE at base+(4*PAGE_SIZE)\n");
addr = base_addr;
size = 5 * page_size;
---
base-commit: c3f38fa61af77b49866b006939479069cd451173
change-id: 20240605-kselftest-mm-fixed-noreplace-44e7e55c861a
Best regards,
--
Mark Brown <broonie(a)kernel.org>
Hello,
This v2 addresses some issues observed when running the ACPI probe
kselftest proposed in v1[1] across various devices and improves the overall
reliability of the test.
The acpi-extract-ids script has been improved to:
- Parse both .c and .h files
- Add an option to print only IDs matched by a driver (i.e. defined in
ACPI match tables or in lists of IDs provided by the drivers)
The test_unprobed_devices.sh script relies on sysfs information to
determine if a device was successfully bound to a driver. Not all devices
listed in /sys/devices are expected to have a driver folder, so the script
has been adjusted to handle these cases and avoid generating false
negatives.
The test_unprobed_devices.sh test script logic has been modified to:
- Check the status attribute (when available) to exclusively test hardware
devices that are physically present, enabled and operational
- Traverse only ACPI objects with a physical_node* link, to ensure testing
of correctly enumerated devices
- Skip devices whose HID or CID are not matched by any driver, as
determined by the list generated through the acpi-extract-ids script
- Skip devices with HID or CID listed in the ignored IDs list. This list
has been added to contain IDs of devices that don't require a driver or
cannot be represented as platform devices (e.g. ACPI container and module
devices).
- Skip devices that are natively enumerated and don't need a driver, such
as certain PCI bridges
- Skip devices unassigned to any subsystem, devices linked to other devices
and class devices
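The selection rules listed above can be read as a single predicate. The following is a minimal C sketch of that reading, not the test's actual implementation (the test itself is a shell script). The struct acpi_dev_info, its fields, should_test_device() and id_in_list() are hypothetical names introduced for illustration, and the status check assumes the ACPI _STA convention in which bit 0 means present, bit 1 enabled and bit 3 functioning. The subsystem, linked-device and class-device checks are left out for brevity.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical summary of what is read from sysfs for one ACPI object. */
struct acpi_dev_info {
	bool has_status;        /* a "status" attribute exists */
	unsigned int status;    /* its value, meaningful only if has_status */
	bool has_physical_node; /* a physical_node* link exists */
	const char *id;         /* the HID or CID under consideration */
};

/* Assumed _STA bits: present (bit 0), enabled (bit 1), functioning (bit 3). */
#define STA_REQUIRED 0x0bU

/* Return true if id appears in a NULL-terminated list of ID strings. */
bool id_in_list(const char *id, const char *const *list)
{
	for (size_t i = 0; list[i]; i++)
		if (strcmp(id, list[i]) == 0)
			return true;
	return false;
}

/* Return true if the device should be checked for a bound driver. */
bool should_test_device(const struct acpi_dev_info *dev,
			const char *const *supported_ids,
			const char *const *ignored_ids)
{
	/* Only test hardware that is present, enabled and operational. */
	if (dev->has_status && (dev->status & STA_REQUIRED) != STA_REQUIRED)
		return false;
	/* Only test objects that were enumerated as physical devices. */
	if (!dev->has_physical_node)
		return false;
	/* Skip IDs that no driver in the kernel sources matches. */
	if (!id_in_list(dev->id, supported_ids))
		return false;
	/* Skip IDs that are known not to need a driver. */
	if (id_in_list(dev->id, ignored_ids))
		return false;
	return true;
}
```

Keeping the checks as early returns mirrors the "skip" wording of the list: each rule can only exclude a device, and a device is tested only if no rule excludes it.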
Some of the heuristics used by the script are suboptimal and might require
adjustments over time. This kind of test would greatly benefit from a
dedicated interface that exposes information about devices expected to be
matched by drivers and their probe status. Discussion regarding this matter
was initiated in v1.
As of now, I have not identified a suitable method for exposing this
information; I plan on submitting a separate RFC to propose some options
and engage in discussion. Meanwhile, this v2 focuses on utilizing already
available information to provide an ACPI equivalent of the existing DT
kselftest [2].
Adding in CC the people involved in the discussion at Plumbers [3], feel
free to add anyone that might be interested in this.
This series depends on:
- https://lore.kernel.org/all/20240102141528.169947-1-laura.nao@collabora.com…
- https://lore.kernel.org/all/20240131-ktap-sh-helpers-extend-v1-0-98ffb46871…
Thanks,
Laura
[1] https://lore.kernel.org/all/20230925155806.1812249-2-laura.nao@collabora.co…
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/too…
[3] https://www.youtube.com/watch?v=oE73eVSyFXQ&t=9377s
Original cover letter:
Regressions that prevent a driver from probing a device can significantly
affect the functionality of a platform.
A kselftest to verify if devices on a DT-based platform are probed
correctly was recently introduced [4], but no such generic test is
available for ACPI platforms yet. bootrr [5] provides device probe
testing, but relies on a pre-defined list of the peripherals present on
each DUT.
On ACPI based hardware, a complete description of the platform is
provided to the OS by the system firmware. ACPI namespace objects are
mapped by the Linux ACPI subsystem into a device tree in
/sys/devices/LNXSYSTEM:00; the information in this subtree can be parsed
to build a list of the hw peripherals present on the DUT dynamically.
This series adds a test to verify if the devices declared in the ACPI
namespace and supported by the kernel are probed correctly.
This work follows a similar approach to [4], adapted for the ACPI use
case.
The first patch introduces a script that builds a list of all ACPI device
IDs supported by the kernel, by inspecting the acpi_device_id structs in
the sources. This list can be used to avoid testing ACPI-enumerated
devices that don't have a matching driver in the kernel. This script was
highly inspired by the dt-extract-compatibles script [6].
In the second patch, a new kselftest is added. It parses the
/sys/devices/LNXSYSTEM:00 tree to obtain a list of all platform
peripherals and verifies which of those, if supported, are correctly
bound to a driver.
Feedback is much appreciated,
Thank you,
Laura
[4] https://lore.kernel.org/all/20230828211424.2964562-1-nfraprado@collabora.co…
[5] https://github.com/kernelci/bootr
[6] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/scr…
Laura Nao (2):
acpi: Add script to extract ACPI device ids in the kernel
kselftest: Add test to detect unprobed devices on ACPI platforms
MAINTAINERS | 2 +
scripts/acpi/acpi-extract-ids | 99 +++++++++++++
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/acpi/.gitignore | 1 +
tools/testing/selftests/acpi/Makefile | 21 +++
tools/testing/selftests/acpi/id_ignore_list | 3 +
.../selftests/acpi/test_unprobed_devices.sh | 138 ++++++++++++++++++
7 files changed, 265 insertions(+)
create mode 100755 scripts/acpi/acpi-extract-ids
create mode 100644 tools/testing/selftests/acpi/.gitignore
create mode 100644 tools/testing/selftests/acpi/Makefile
create mode 100644 tools/testing/selftests/acpi/id_ignore_list
create mode 100755 tools/testing/selftests/acpi/test_unprobed_devices.sh
--
2.30.2
Although the term "TAP" is already used in the documentation, it has not
been explained in a way that tells developers how to write TAP-conformant
tests or what the benefits are. Add a short description of it.
Signed-off-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
---
Documentation/dev-tools/kselftest.rst | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/Documentation/dev-tools/kselftest.rst b/Documentation/dev-tools/kselftest.rst
index dcf634e411bd9..b579f491f3e97 100644
--- a/Documentation/dev-tools/kselftest.rst
+++ b/Documentation/dev-tools/kselftest.rst
@@ -228,6 +228,14 @@ In general, the rules for selftests are
* Don't cause the top-level "make run_tests" to fail if your feature is
unconfigured.
+ * The output of tests must conform to the TAP standard to ensure high
+ testing quality and to capture failures/errors with specific details.
+ The kselftest.h and kselftest_harness.h headers provide wrappers for
+ outputting test results such as pass, fail, or skip etc. These wrappers
+ should be used instead of reinventing the wheel or using raw printf and
+ exit statements. CI systems can easily parse TAP output messages to
+ detect test failures.
+
Contributing new tests (details)
================================
--
2.39.2
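As an illustration of the TAP rule added to kselftest.rst in the patch above, the sketch below formats a single TAP result line. It only shows the general shape of the output that CI systems parse; tap_format_result() is a hypothetical helper written for this example and is not part of kselftest.h, and real selftests should call the kselftest wrappers rather than format lines themselves.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Format one TAP result line into buf:
 *   "ok <num> <desc>"      for a passing test
 *   "not ok <num> <desc>"  for a failing test
 * Returns the value returned by snprintf().
 */
int tap_format_result(char *buf, size_t len, unsigned int num, bool pass,
		      const char *desc)
{
	return snprintf(buf, len, "%s %u %s\n",
			pass ? "ok" : "not ok", num, desc);
}
```

Because the result line starts with a fixed "ok" or "not ok" token followed by the test number, a parser can detect failures without understanding the free-form description.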
From: Geliang Tang <tanggeliang(a)kylinos.cn>
Drop type, noconnect and must_fail from network_helper_opts. And use
start_server_str in mptcp and test_tcp_check_syncookie_user.
Patches 1-3 address Martin's comments in the previous series.
Geliang Tang (5):
selftests/bpf: Drop type from network_helper_opts
selftests/bpf: Drop noconnect from network_helper_opts
selftests/bpf: Drop must_fail from network_helper_opts
selftests/bpf: Use start_server_str in mptcp
selftests/bpf: Use start_server_str in test_tcp_check_syncookie_user
tools/testing/selftests/bpf/network_helpers.c | 20 ++++++++-----
tools/testing/selftests/bpf/network_helpers.h | 5 ++--
.../selftests/bpf/prog_tests/cgroup_v1v2.c | 5 +---
.../bpf/prog_tests/ip_check_defrag.c | 7 ++---
.../testing/selftests/bpf/prog_tests/mptcp.c | 7 +----
.../bpf/test_tcp_check_syncookie_user.c | 29 ++-----------------
6 files changed, 21 insertions(+), 52 deletions(-)
--
2.43.0
From: Geliang Tang <tanggeliang(a)kylinos.cn>
v6:
- add a fix for tls_sw_recvmsg().
v5:
- add a new patch "Check recv lengths in test_sockmap" instead of using
"continue" in msg_loop.
v4:
- address Martin's comments for v3. (thanks.)
- add Yonghong's "Acked-by" tags. (thanks.)
- update subject-prefix from "bpf-next" to "bpf".
Patch 1, v3 of "selftests/bpf: Add F_SETFL for fcntl":
- detect nonblock flag automatically, then test_sockmap can run in both
block and nonblock modes.
- use continue instead of again in v2.
Patch 2, fix for umount cgroup2 error.
Geliang Tang (2):
tls: wait for receiving next skb for sk_redirect
selftests/bpf: Add F_SETFL for fcntl in test_sockmap
net/tls/tls_sw.c | 2 ++
tools/testing/selftests/bpf/test_sockmap.c | 4 +++-
2 files changed, 5 insertions(+), 1 deletion(-)
--
2.43.0
The PMU event filter test fails on Zen 4 because use_amd_pmu() has no
family and model check for Zen 4. use_amd_pmu() was added to detect
architectures that support event select 0xc2 umask 0 as "retired branch
instructions".
The model ranges in is_zen1(), is_zen2() and is_zen3() cover only
server SoCs, so they might not cover all the model ranges that support
retired branch instructions.
X86_FEATURE_ZEN is a synthetic feature flag specifically added to
recognize all Zen generations by commit 232afb557835d ("x86/CPU/AMD: Add
X86_FEATURE_ZEN1"). init_amd_zen_common() uses family >= 0x17 check to
enable X86_FEATURE_ZEN.
Family 17h+ is where Zen and its successors start. Event 0xc2,0 is
supported on all currently released F17h+ processors as "retired branch
instructions", and this will remain true going forward in order to
maintain backward compatibility.
Since X86_FEATURE_ZEN is not recognized in the selftest framework, a
"family >= 0x17" check is added to use_amd_pmu() instead of checking the
family and model values of every Zen generation.
Fixes: bef9a701f3eb ("selftests: kvm/x86: Add test for KVM_SET_PMU_EVENT_FILTER")
Suggested-by: Sandipan Das <sandipan.das(a)amd.com>
Signed-off-by: Manali Shukla <manali.shukla(a)amd.com>
---
.../kvm/x86_64/pmu_event_filter_test.c | 32 +++----------------
1 file changed, 5 insertions(+), 27 deletions(-)
diff --git a/tools/testing/selftests/kvm/x86_64/pmu_event_filter_test.c b/tools/testing/selftests/kvm/x86_64/pmu_event_filter_test.c
index 26b3e7efe5dd..f65033fab0c0 100644
--- a/tools/testing/selftests/kvm/x86_64/pmu_event_filter_test.c
+++ b/tools/testing/selftests/kvm/x86_64/pmu_event_filter_test.c
@@ -353,38 +353,16 @@ static bool use_intel_pmu(void)
kvm_pmu_has(X86_PMU_FEATURE_BRANCH_INSNS_RETIRED);
}
-static bool is_zen1(uint32_t family, uint32_t model)
-{
- return family == 0x17 && model <= 0x0f;
-}
-
-static bool is_zen2(uint32_t family, uint32_t model)
-{
- return family == 0x17 && model >= 0x30 && model <= 0x3f;
-}
-
-static bool is_zen3(uint32_t family, uint32_t model)
-{
- return family == 0x19 && model <= 0x0f;
-}
-
/*
- * Determining AMD support for a PMU event requires consulting the AMD
- * PPR for the CPU or reference material derived therefrom. The AMD
- * test code herein has been verified to work on Zen1, Zen2, and Zen3.
- *
- * Feel free to add more AMD CPUs that are documented to support event
- * select 0xc2 umask 0 as "retired branch instructions."
+ * Family 17h+ is where Zen and its successors start and that event
+ * 0xc2,0 is supported on all currently released F17h+ processors as
+ * branch instruction retired and it is true going forward to maintain
+ * the backward compatibility for the branch instruction retired.
*/
static bool use_amd_pmu(void)
{
uint32_t family = kvm_cpu_family();
- uint32_t model = kvm_cpu_model();
-
- return host_cpu_is_amd &&
- (is_zen1(family, model) ||
- is_zen2(family, model) ||
- is_zen3(family, model));
+ return family >= 0x17;
}
/*
--
2.34.1
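For the "family >= 0x17" check described in the patch above, the following minimal sketch shows how an x86 display family can be derived from the EAX value returned by CPUID leaf 1. The function names x86_display_family() and family_is_zen_or_later() are hypothetical and are not the selftest's helpers (the test uses kvm_cpu_family()). The sketch checks the family only; the code removed by the patch also required host_cpu_is_amd.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Compute the display family from CPUID.01H:EAX.
 * Bits 11:8 hold the base family; when the base family is 0xf,
 * the extended family in bits 27:20 is added to it.
 */
uint32_t x86_display_family(uint32_t eax)
{
	uint32_t family = (eax >> 8) & 0xf;

	if (family == 0xf)
		family += (eax >> 20) & 0xff;
	return family;
}

/* Family 17h is where Zen starts; later generations use higher families. */
bool family_is_zen_or_later(uint32_t eax)
{
	return x86_display_family(eax) >= 0x17;
}
```

A single lower bound on the family avoids maintaining per-model ranges, which is the reason the patch gives for dropping is_zen1(), is_zen2() and is_zen3().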
From: Jeff Xu <jeffxu(a)chromium.org>
When MFD_NOEXEC_SEAL was introduced, there was one big mistake: it
didn't have proper documentation. This led to a lot of confusion,
especially about whether or not memfd created with the MFD_NOEXEC_SEAL
flag is sealable. Before MFD_NOEXEC_SEAL, memfd had to explicitly set
MFD_ALLOW_SEALING to be sealable, so it's a fair question.
As one might have noticed, unlike other flags in memfd_create,
MFD_NOEXEC_SEAL is actually a combination of multiple flags. The idea
is to make it easier to use memfd in the most common way, which is
NOEXEC + F_SEAL_EXEC + MFD_ALLOW_SEALING. This works with sysctl
vm.memfd_noexec to help existing applications move to a more secure way
of using memfd.
Proposals have been made to make MFD_NOEXEC_SEAL non-sealable unless
MFD_ALLOW_SEALING is set, to be consistent with other flags [1] [2].
Those are based on the viewpoint that each flag is an atomic unit,
which is a reasonable assumption. However, MFD_NOEXEC_SEAL was
designed with the intent of promoting the most secure method of using
memfd, hence the combination of multiple functionalities into one
bit.
Furthermore, MFD_NOEXEC_SEAL was added more than a year ago, and
multiple applications and distributions have backported and used it.
Altering the ABI now presents a degree of risk and may lead to
disruption.
MFD_NOEXEC_SEAL is a new flag, and applications must change their code
to use it. There is no backward compatibility problem.
When sysctl vm.memfd_noexec == 1 or 2, applications that don't set
MFD_NOEXEC_SEAL or MFD_EXEC will get an MFD_NOEXEC_SEAL memfd. An old
application might break; that is by design, and on such a system
vm.memfd_noexec = 0 should be used. So again there is no backward
compatibility problem.
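The interaction between the memfd noexec sysctl and the flags passed to memfd_create() can be modelled as a small pure function. This is a sketch of my understanding of the documented behaviour, not the kernel's code: model_apply_memfd_noexec() and the MODEL_* names are hypothetical, the flag values mirror those in the UAPI header linux/memfd.h, and both the rejection of executable memfds at level 2 and the -EACCES error code are assumptions taken from the kernel's sysctl documentation rather than from the text above.

```c
#include <errno.h>

/* Values mirror MFD_NOEXEC_SEAL and MFD_EXEC in linux/memfd.h. */
#define MODEL_MFD_NOEXEC_SEAL 0x0008U
#define MODEL_MFD_EXEC        0x0010U

/*
 * Apply the sysctl level (0, 1 or 2) to the caller's flags.
 * Returns 0 and updates *flags on success, or -EACCES (assumed error
 * code) when level 2 is set and the memfd would be executable.
 */
int model_apply_memfd_noexec(int sysctl, unsigned int *flags)
{
	/* Neither flag given: the sysctl picks the default. */
	if (!(*flags & (MODEL_MFD_EXEC | MODEL_MFD_NOEXEC_SEAL))) {
		if (sysctl >= 1)
			*flags |= MODEL_MFD_NOEXEC_SEAL;
		else
			*flags |= MODEL_MFD_EXEC;
	}

	/* Level 2 (assumed): anything without MFD_NOEXEC_SEAL is rejected. */
	if (sysctl >= 2 && !(*flags & MODEL_MFD_NOEXEC_SEAL))
		return -EACCES;

	return 0;
}
```

In this model an application that passes neither flag keeps working at every level and only the default changes, while an application that explicitly asks for MFD_EXEC is the one that can break at the strictest level.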
I propose to include this documentation patch to assist in clarifying
the semantics of MFD_NOEXEC_SEAL, thereby preventing any potential
future confusion.
This patch supersedes the previous patch, which tried a different
direction [3]; please remove [2] from the mm-unstable branch when
applying this patch.
Finally, I would like to express my gratitude to David Rheinsberg and
Barnabás Pőcze for initiating the discussion on the topic of sealability.
[1]
https://lore.kernel.org/lkml/20230714114753.170814-1-david@readahead.eu/
[2]
https://lore.kernel.org/lkml/20240513191544.94754-1-pobrn@protonmail.com/
[3]
https://lore.kernel.org/lkml/20240524033933.135049-1-jeffxu@google.com/
v3:
Address additional comments from Randy Dunlap.
v2:
Update according to Randy Dunlap's comments.
https://lore.kernel.org/linux-mm/20240611034903.3456796-1-jeffxu@chromium.o…
v1:
https://lore.kernel.org/linux-mm/20240607203543.2151433-1-jeffxu@google.com/
Jeff Xu (1):
mm/memfd: add documentation for MFD_NOEXEC_SEAL MFD_EXEC
Documentation/userspace-api/index.rst | 1 +
Documentation/userspace-api/mfd_noexec.rst | 86 ++++++++++++++++++++++
2 files changed, 87 insertions(+)
create mode 100644 Documentation/userspace-api/mfd_noexec.rst
--
2.45.2.505.gda0bf45e8d-goog
From: Jeff Xu <jeffxu(a)chromium.org>
When MFD_NOEXEC_SEAL was introduced, there was one big mistake: it
didn't have proper documentation. This led to a lot of confusion,
especially about whether or not memfd created with the MFD_NOEXEC_SEAL
flag is sealable. Before MFD_NOEXEC_SEAL, memfd had to explicitly set
MFD_ALLOW_SEALING to be sealable, so it's a fair question.
As one might have noticed, unlike other flags in memfd_create,
MFD_NOEXEC_SEAL is actually a combination of multiple flags. The idea
is to make it easier to use memfd in the most common way, which is
NOEXEC + F_SEAL_EXEC + MFD_ALLOW_SEALING. This works with sysctl
vm.memfd_noexec to help existing applications move to a more secure way
of using memfd.
Proposals have been made to make MFD_NOEXEC_SEAL non-sealable unless
MFD_ALLOW_SEALING is set, to be consistent with other flags [1] [2].
Those are based on the viewpoint that each flag is an atomic unit,
which is a reasonable assumption. However, MFD_NOEXEC_SEAL was
designed with the intent of promoting the most secure method of using
memfd, hence the combination of multiple functionalities into one
bit.
Furthermore, MFD_NOEXEC_SEAL was added more than a year ago, and
multiple applications and distributions have backported and used it.
Altering the ABI now presents a degree of risk and may lead to
disruption.
MFD_NOEXEC_SEAL is a new flag, and applications must change their code
to use it. There is no backward compatibility problem.
When sysctl vm.memfd_noexec == 1 or 2, applications that don't set
MFD_NOEXEC_SEAL or MFD_EXEC will get an MFD_NOEXEC_SEAL memfd. An old
application might break; that is by design, and on such a system
vm.memfd_noexec = 0 should be used. So again there is no backward
compatibility problem.
I propose to include this documentation patch to assist in clarifying
the semantics of MFD_NOEXEC_SEAL, thereby preventing any potential
future confusion.
This patch supersedes the previous patch, which tried a different
direction [3]; please remove [2] from the mm-unstable branch when
applying this patch.
Finally, I would like to express my gratitude to David Rheinsberg and
Barnabás Pőcze for initiating the discussion on the topic of sealability.
[1]
https://lore.kernel.org/lkml/20230714114753.170814-1-david@readahead.eu/
[2]
https://lore.kernel.org/lkml/20240513191544.94754-1-pobrn@protonmail.com/
[3]
https://lore.kernel.org/lkml/20240524033933.135049-1-jeffxu@google.com/
v2:
Update according to Randy Dunlap's comments.
v1:
https://lore.kernel.org/linux-mm/20240607203543.2151433-1-jeffxu@google.com/
Jeff Xu (1):
mm/memfd: add documentation for MFD_NOEXEC_SEAL MFD_EXEC
Documentation/userspace-api/index.rst | 1 +
Documentation/userspace-api/mfd_noexec.rst | 86 ++++++++++++++++++++++
2 files changed, 87 insertions(+)
create mode 100644 Documentation/userspace-api/mfd_noexec.rst
--
2.45.2.505.gda0bf45e8d-goog
These two subsystems require very similar fixes, so I'm sending them
out together.
Changes since the first version:
1) Rebased onto Linux 6.10-rc1.
2) Added a Reviewed-by tag from Ryan Roberts. See [1] for that.
Related work: I've sent a separate fix that allows "make CC=clang" to
work in addition to "make LLVM=1" [2].
[1] https://lore.kernel.org/518dd1e3-e31a-41c3-b488-9b75a64b6c8a@arm.com
[2] https://lore.kernel.org/20240531183751.100541-2-jhubbard@nvidia.com
John Hubbard (2):
selftests/openat2: fix clang build failures: -static-libasan,
LOCAL_HDRS
selftests/fchmodat2: fix clang build failure due to -static-libasan
tools/testing/selftests/fchmodat2/Makefile | 11 ++++++++++-
tools/testing/selftests/openat2/Makefile | 14 ++++++++++++--
2 files changed, 22 insertions(+), 3 deletions(-)
base-commit: cc8ed4d0a8486c7472cd72ec3c19957e509dc68c
--
2.45.1
Correctable memory errors are very common on servers with large
amounts of memory. They are corrected by ECC, but leave users with two
pain points:
1. Correction usually happens on the fly and adds latency overhead.
2. A not-fully-proven theory states that excessive correctable memory
errors can develop into uncorrectable memory errors.
Soft offline is the kernel's additional solution for memory pages
having (excessive) corrected memory errors. The impacted page is
migrated to a healthy page if it is in use, then the original page is
discarded from any future use.
The actual policy on whether (and when) to soft offline should be
maintained by userspace, especially in the case of HugeTLB hugepages.
Soft offline dissolves a hugepage, either in use or free, into
chunks of 4K pages, reducing the HugeTLB pool capacity by 1 hugepage.
If userspace has not acknowledged such behavior, it may be surprised
when a later mmap() of hugepages returns MAP_FAILED due to a lack of
hugepages.
In addition, discarding an entire 1G memory page only because of
corrected memory errors sounds very costly, and the kernel had better
not do it under the hood. But today there are at least 2 such cases:
1. The GHES driver sees both GHES_SEV_CORRECTED and
CPER_SEC_ERROR_THRESHOLD_EXCEEDED after parsing CPER.
2. The RAS Correctable Errors Collector counts correctable errors per
PFN and soft offlines the page when the counter for a PFN reaches
the threshold.
In both cases, userspace has no control over the soft offline performed
by the kernel's memory failure recovery.
This patch series gives userspace control over soft-offlining
HugeTLB pages: the kernel only soft offlines a hugepage if userspace has
opted in for that specific hugepage size. The control is exposed to
userspace by a new sysfs entry called softoffline_corrected_errors under
the /sys/kernel/mm/hugepages/hugepages-${size}kB directory:
* When softoffline_corrected_errors=0, skip soft offlining for all
hugepages of size ${size}kB.
* When softoffline_corrected_errors=1, soft offline as before this
patch series.
By default softoffline_corrected_errors is 1.
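A minimal sketch of how the per-size knob described above could be located and interpreted from userspace follows. The helper names softoffline_sysfs_path() and should_soft_offline() are hypothetical and written only for this illustration; the path layout and the meaning of the values 0 and 1 are taken from the description above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Build the sysfs path of softoffline_corrected_errors for a hugepage
 * size given in kB (for example 2048 for 2M, 1048576 for 1G).
 * Returns the value returned by snprintf().
 */
int softoffline_sysfs_path(char *buf, size_t len, unsigned long size_kb)
{
	return snprintf(buf, len,
			"/sys/kernel/mm/hugepages/hugepages-%lukB/softoffline_corrected_errors",
			size_kb);
}

/*
 * Interpret the knob: 0 means skip soft offlining for hugepages of this
 * size, 1 (the default) means soft offline as before the series.
 */
bool should_soft_offline(int softoffline_corrected_errors)
{
	return softoffline_corrected_errors != 0;
}
```

Keeping the setting per hugepage size lets an administrator protect, say, the 1G pool from being dissolved while leaving the behaviour for 2M hugepages unchanged.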
This patch set is based on
commit a52b4f11a2e1 ("selftest mm/mseal read-only elf memory segment").
Jiaqi Yan (3):
mm/memory-failure: userspace controls soft-offlining hugetlb pages
selftest/mm: test softoffline_corrected_errors behaviors
docs: hugetlbpage.rst: add softoffline_corrected_errors
Documentation/admin-guide/mm/hugetlbpage.rst | 15 +-
include/linux/hugetlb.h | 17 ++
mm/hugetlb.c | 34 +++
mm/memory-failure.c | 7 +
tools/testing/selftests/mm/.gitignore | 1 +
tools/testing/selftests/mm/Makefile | 1 +
.../selftests/mm/hugetlb-soft-offline.c | 262 ++++++++++++++++++
tools/testing/selftests/mm/run_vmtests.sh | 4 +
8 files changed, 340 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/mm/hugetlb-soft-offline.c
--
2.45.1.288.g0e0cd299f1-goog
xtheadvector is a custom extension that is based upon riscv vector
version 0.7.1 [1]. All of the vector routines have been modified to
support this alternative vector version based upon whether xtheadvector
was determined to be supported at boot.
vlenb is not supported on the existing xtheadvector hardware, so a
devicetree property thead,vlenb is added to provide the vlenb to Linux.
There is a new hwprobe key RISCV_HWPROBE_KEY_VENDOR_EXT_THEAD_0 that is
used to request which thead vendor extensions are supported on the
current platform. This allows future vendors to allocate hwprobe keys
for their vendor.
Support for xtheadvector is also added to the vector kselftests.
Signed-off-by: Charlie Jenkins <charlie(a)rivosinc.com>
[1] https://github.com/T-head-Semi/thead-extension-spec/blob/95358cb2cca9489361…
---
This series is a continuation of a different series that was fragmented
into two other series in an attempt to get part of it merged in the 6.10
merge window. The split-off series did not get merged due to a NAK on
the series that added the generic riscv,vlenb devicetree entry. This
series has converted riscv,vlenb to thead,vlenb to remedy this issue.
The original series is titled "riscv: Support vendor extensions and
xtheadvector" [3].
The series titled "riscv: Extend cpufeature.c to detect vendor
extensions" is still under development and this series is based on that
series! [4]
I have tested this with an Allwinner Nezha board. I ran into issues
booting the board after 6.9-rc1 so I applied these patches to 6.8. There
are a couple of minor merge conflicts that do arise when doing that, so
please let me know if you have been able to boot this board with a 6.9
kernel. I used SkiffOS [1] to manage building the image, but upgraded
the U-Boot version to Samuel Holland's more up-to-date version [2] and
changed out the device tree used by U-Boot with the device trees that
are present in upstream linux and this series. Thank you Samuel for all
of the work you did to make this task possible.
[1] https://github.com/skiffos/SkiffOS/tree/master/configs/allwinner/nezha
[2] https://github.com/smaeul/u-boot/commit/2e89b706f5c956a70c989cd31665f1429e9…
[3] https://lore.kernel.org/all/20240503-dev-charlie-support_thead_vector_6_9-v…
[4] https://lore.kernel.org/linux-riscv/20240609-support_vendor_extensions-v2-0…
---
Charlie Jenkins (12):
dt-bindings: riscv: Add xtheadvector ISA extension description
dt-bindings: thead: add a vlen register length property
riscv: dts: allwinner: Add xtheadvector to the D1/D1s devicetree
riscv: Add thead and xtheadvector as a vendor extension
riscv: vector: Use vlenb from DT for thead
riscv: csr: Add CSR encodings for VCSR_VXRM/VCSR_VXSAT
riscv: Add xtheadvector instruction definitions
riscv: vector: Support xtheadvector save/restore
riscv: hwprobe: Add thead vendor extension probing
riscv: hwprobe: Document thead vendor extensions and xtheadvector extension
selftests: riscv: Fix vector tests
selftests: riscv: Support xtheadvector in vector tests
Heiko Stuebner (1):
RISC-V: define the elements of the VCSR vector CSR
Documentation/arch/riscv/hwprobe.rst | 10 +
.../devicetree/bindings/riscv/extensions.yaml | 10 +
Documentation/devicetree/bindings/riscv/thead.yaml | 7 +
arch/riscv/Kconfig.vendor | 26 ++
arch/riscv/boot/dts/allwinner/sun20i-d1s.dtsi | 3 +-
arch/riscv/include/asm/cpufeature.h | 2 +
arch/riscv/include/asm/csr.h | 13 +
arch/riscv/include/asm/hwprobe.h | 4 +-
arch/riscv/include/asm/switch_to.h | 2 +-
arch/riscv/include/asm/vector.h | 249 +++++++++++++----
arch/riscv/include/asm/vendor_extensions/thead.h | 42 +++
.../include/asm/vendor_extensions/thead_hwprobe.h | 18 ++
.../include/asm/vendor_extensions/vendor_hwprobe.h | 37 +++
arch/riscv/include/uapi/asm/hwprobe.h | 3 +-
arch/riscv/include/uapi/asm/vendor/thead.h | 3 +
arch/riscv/kernel/cpufeature.c | 51 +++-
arch/riscv/kernel/kernel_mode_vector.c | 8 +-
arch/riscv/kernel/process.c | 4 +-
arch/riscv/kernel/signal.c | 6 +-
arch/riscv/kernel/sys_hwprobe.c | 5 +
arch/riscv/kernel/vector.c | 25 +-
arch/riscv/kernel/vendor_extensions.c | 10 +
arch/riscv/kernel/vendor_extensions/Makefile | 2 +
arch/riscv/kernel/vendor_extensions/thead.c | 18 ++
.../riscv/kernel/vendor_extensions/thead_hwprobe.c | 19 ++
tools/testing/selftests/riscv/vector/.gitignore | 3 +-
tools/testing/selftests/riscv/vector/Makefile | 17 +-
.../selftests/riscv/vector/v_exec_initval_nolibc.c | 93 +++++++
tools/testing/selftests/riscv/vector/v_helpers.c | 67 +++++
tools/testing/selftests/riscv/vector/v_helpers.h | 7 +
tools/testing/selftests/riscv/vector/v_initval.c | 22 ++
.../selftests/riscv/vector/v_initval_nolibc.c | 68 -----
.../selftests/riscv/vector/vstate_exec_nolibc.c | 20 +-
.../testing/selftests/riscv/vector/vstate_prctl.c | 295 ++++++++++++---------
34 files changed, 898 insertions(+), 271 deletions(-)
---
base-commit: 11cc01d4d2af304b7288251aad7e03315db8dffc
change-id: 20240530-xtheadvector-833d3d17b423
--
- Charlie
Currently, we can run string-stream and assertion tests only when they
are built into the kernel (with config options = y), since some of the
symbols (string-stream functions and functions from assert.c) are not
exported into any namespace and are therefore not accessible to
modules.
This patch series exports the required symbols into the KUnit namespace.
Also, it makes the string-stream test a separate module and removes the
log test stub from kunit-test since now we can access the string-stream
symbols even if the test which uses it is built as a module.
Additionally, this patch series merges the assertion test suite into the
kunit-test, since assert.c (and all of the assertion formatting
functions in it) is a part of the KUnit core.
Ivan Orlov (5):
kunit: string-stream: export non-static functions
kunit: kunit-test: Remove stub for log tests
kunit: string-stream-test: Make it a separate module
kunit: assert: export non-static functions
kunit: Merge assertion test into kunit-test.c
lib/kunit/Kconfig | 8 +
lib/kunit/Makefile | 7 +-
lib/kunit/assert.c | 4 +
lib/kunit/assert_test.c | 388 --------------------------------
lib/kunit/kunit-test.c | 397 +++++++++++++++++++++++++++++++--
lib/kunit/string-stream-test.c | 2 +
lib/kunit/string-stream.c | 12 +-
7 files changed, 405 insertions(+), 413 deletions(-)
delete mode 100644 lib/kunit/assert_test.c
--
2.34.1
The perf subsystem today unifies various tracing and monitoring
features, from both software and hardware. One benefit of the perf
subsystem is automatically inheriting events to child tasks, which
enables process-wide events monitoring with low overheads. By default
perf events are non-intrusive, not affecting behaviour of the tasks
being monitored.
For certain use-cases, however, it makes sense to leverage the
generality of the perf events subsystem and optionally allow the tasks
being monitored to receive signals on events they are interested in.
This patch series adds the option to synchronously signal user space on
events.
To better support process-wide synchronous self-monitoring, without
events propagating to children that do not share the current process's
shared environment, two pre-requisite patches are added to optionally
restrict inheritance to CLONE_THREAD, and remove events on exec (without
affecting the parent).
Examples of how to use these features can be found in the tests added at
the end of the series. In addition to the tests added, the series has
also been subjected to syzkaller fuzzing (focus on 'kernel/events/'
coverage).
Motivation and Example Uses
---------------------------
1. Our immediate motivation is low-overhead sampling-based race
detection for user space [1]. By using perf_event_open() at
process initialization, we can create hardware
breakpoint/watchpoint events that are propagated automatically
to all threads in a process. As far as we are aware, today no
existing kernel facility (such as ptrace) allows us to set up
process-wide watchpoints with minimal overheads (that are
comparable to mprotect() of whole pages).
2. Other low-overhead error detectors that rely on detecting
accesses to certain memory locations or code, process-wide and
also only in a specific set of subtasks or threads.
[1] https://llvm.org/devmtg/2020-09/slides/Morehouse-GWP-Tsan.pdf
Other ideas for use cases that we found interesting follow; they are
only meant to illustrate the range of potential uses and further
motivate the utility (we're sure there are more):
3. Code hot patching without full stop-the-world. Specifically, by
setting a code breakpoint at the entry to the patched routine, then
sending signals to threads and checking that they are not in the
routine, but without stopping them further. If any of the
threads enters the routine, it will receive SIGTRAP and
pause.
4. Safepoints without mprotect(). Some Java implementations use
"load from a known memory location" as a safepoint. When threads
need to be stopped, the page containing the location is
mprotect()ed and threads get a signal. This could be replaced with
a watchpoint, which does not require a whole page nor DTLB
shootdowns.
5. Threads receiving signals on performance events to
throttle/unthrottle themselves.
6. Tracking data flow globally.
Changelog
---------
v4:
* Fix for parent and child racing to exit in sync_child_event().
* Fix race between irq_work running and task's sighand being released by
release_task().
* Generalize setting si_perf and si_addr independent of event type;
introduces perf_event_attr::sig_data, which can be set by user space
to be propagated to si_perf.
* Warning in perf_sigtrap() if ctx->task and current mismatch; we expect
this on architectures that do not properly implement
arch_irq_work_raise().
* Require events that want sigtrap to be associated with a task.
* Dropped "perf: Add breakpoint information to siginfo on SIGTRAP"
in favor of more generic solution (perf_event_attr::sig_data).
v3:
* Add patch "perf: Rework perf_event_exit_event()" to beginning of
series, courtesy of Peter Zijlstra.
* Rework "perf: Add support for event removal on exec" based on
the added "perf: Rework perf_event_exit_event()".
* Fix kselftests to work with more recent libc, due to the way it forces
using the kernel's own siginfo_t.
* Add basic perf-tool built-in test.
v2/RFC: https://lkml.kernel.org/r/20210310104139.679618-1-elver@google.com
* Patch "Support only inheriting events if cloned with CLONE_THREAD"
added to series.
* Patch "Add support for event removal on exec" added to series.
* Patch "Add kselftest for process-wide sigtrap handling" added to
series.
* Patch "Add kselftest for remove_on_exec" added to series.
* Implicitly restrict inheriting events if sigtrap, but the child was
cloned with CLONE_CLEAR_SIGHAND, because it is not generally safe if
the child cleared all signal handlers to continue sending SIGTRAP.
* Various minor fixes (see details in patches).
v1/RFC: https://lkml.kernel.org/r/20210223143426.2412737-1-elver@google.com
Pre-series: The discussion at [2] led to the changes in this series. The
approach taken in "Add support for SIGTRAP on perf events" to trigger
the signal was suggested by Peter Zijlstra in [3].
[2] https://lore.kernel.org/lkml/CACT4Y+YPrXGw+AtESxAgPyZ84TYkNZdP0xpocX2jwVAbZ…
[3] https://lore.kernel.org/lkml/YBv3rAT566k+6zjg@hirez.programming.kicks-ass.n…
Marco Elver (9):
perf: Apply PERF_EVENT_IOC_MODIFY_ATTRIBUTES to children
perf: Support only inheriting events if cloned with CLONE_THREAD
perf: Add support for event removal on exec
signal: Introduce TRAP_PERF si_code and si_perf to siginfo
perf: Add support for SIGTRAP on perf events
selftests/perf_events: Add kselftest for process-wide sigtrap handling
selftests/perf_events: Add kselftest for remove_on_exec
tools headers uapi: Sync tools/include/uapi/linux/perf_event.h
perf test: Add basic stress test for sigtrap handling
Peter Zijlstra (1):
perf: Rework perf_event_exit_event()
arch/m68k/kernel/signal.c | 3 +
arch/x86/kernel/signal_compat.c | 5 +-
fs/signalfd.c | 4 +
include/linux/compat.h | 2 +
include/linux/perf_event.h | 9 +-
include/linux/signal.h | 1 +
include/uapi/asm-generic/siginfo.h | 6 +-
include/uapi/linux/perf_event.h | 12 +-
include/uapi/linux/signalfd.h | 4 +-
kernel/events/core.c | 302 +++++++++++++-----
kernel/fork.c | 2 +-
kernel/signal.c | 11 +
tools/include/uapi/linux/perf_event.h | 12 +-
tools/perf/tests/Build | 1 +
tools/perf/tests/builtin-test.c | 5 +
tools/perf/tests/sigtrap.c | 150 +++++++++
tools/perf/tests/tests.h | 1 +
.../testing/selftests/perf_events/.gitignore | 3 +
tools/testing/selftests/perf_events/Makefile | 6 +
tools/testing/selftests/perf_events/config | 1 +
.../selftests/perf_events/remove_on_exec.c | 260 +++++++++++++++
tools/testing/selftests/perf_events/settings | 1 +
.../selftests/perf_events/sigtrap_threads.c | 210 ++++++++++++
23 files changed, 924 insertions(+), 87 deletions(-)
create mode 100644 tools/perf/tests/sigtrap.c
create mode 100644 tools/testing/selftests/perf_events/.gitignore
create mode 100644 tools/testing/selftests/perf_events/Makefile
create mode 100644 tools/testing/selftests/perf_events/config
create mode 100644 tools/testing/selftests/perf_events/remove_on_exec.c
create mode 100644 tools/testing/selftests/perf_events/settings
create mode 100644 tools/testing/selftests/perf_events/sigtrap_threads.c
--
2.31.0.208.g409f899ff0-goog
This patch series is motivated by the following observation:
Raise a signal, jump to signal handler. The ucontext_t structure dumped
by kernel to userspace has a uc_sigmask field having the mask of blocked
signals. If you run a fresh minimalistic program doing this, this field
is empty, even if you block some signals while registering the handler
with sigaction().
Here is what the man-pages have to say:
sigaction(2): "sa_mask specifies a mask of signals which should be blocked
(i.e., added to the signal mask of the thread in which the signal handler
is invoked) during execution of the signal handler. In addition, the
signal which triggered the handler will be blocked, unless the SA_NODEFER
flag is used."
signal(7): Under "Execution of signal handlers", (1.3) implies:
"The thread's current signal mask is accessible via the ucontext_t
object that is pointed to by the third argument of the signal handler."
But, (1.4) states:
"Any signals specified in act->sa_mask when registering the handler with
sigprocmask(2) are added to the thread's signal mask. The signal being
delivered is also added to the signal mask, unless SA_NODEFER was
specified when registering the handler. These signals are thus blocked
while the handler executes."
There clearly is no distinction being made in the man pages between
"Thread's signal mask" and ucontext_t; this logically should imply
that a signal blocked by populating struct sigaction should be visible
in ucontext_t.
Here is what the kernel code does (for Aarch64):
do_signal() -> handle_signal() -> sigmask_to_save(), which returns
&current->blocked, is passed to setup_rt_frame() -> setup_sigframe() ->
__copy_to_user(). Hence, &current->blocked is copied to ucontext_t
exposed to userspace. Returning back to handle_signal(),
signal_setup_done() -> signal_delivered() -> sigorsets() and
set_current_blocked() are responsible for using information from
struct ksignal ksig, which was populated through the sigaction()
system call in kernel/signal.c:
copy_from_user(&new_sa.sa, act, sizeof(new_sa.sa)),
to update &current->blocked; hence, the set of blocked signals for the
current thread is updated AFTER the kernel dumps ucontext_t to
userspace.
Assuming that the above is indeed the intended behaviour (it makes
semantic sense, since the signals blocked using sigaction() remain
blocked only while the handler executes, and not in the context present
before jumping to the handler, although nothing can be confirmed from
the man-pages), the series introduces a test for mangling with
uc_sigmask. I will send a separate series to fix the man-pages.
The proposed selftest has been tested out on Aarch32, Aarch64 and x86_64.
Changes in v2:
- Replace all occurrences of SIGPIPE with SIGSEGV
- Add a testcase: Raise the same signal again; it must not be queued
- Remove unneeded <assert.h>, <unistd.h>
- Give a detailed test description in the comments; also describe the
exact meaning of delivered and blocked
- Handle errors for all libc functions/syscalls
- Mention tests in Makefile and .gitignore in alphabetical order
Dev Jain (2):
selftests: Rename sigaltstack to generic signal
selftests: Add a test mangling with uc_sigmask
tools/testing/selftests/Makefile | 2 +-
.../{sigaltstack => signal}/.gitignore | 3 +-
.../{sigaltstack => signal}/Makefile | 3 +-
.../current_stack_pointer.h | 0
.../selftests/signal/mangle_uc_sigmask.c | 194 ++++++++++++++++++
.../sas.c => signal/sigaltstack.c} | 0
6 files changed, 199 insertions(+), 3 deletions(-)
rename tools/testing/selftests/{sigaltstack => signal}/.gitignore (57%)
rename tools/testing/selftests/{sigaltstack => signal}/Makefile (53%)
rename tools/testing/selftests/{sigaltstack => signal}/current_stack_pointer.h (100%)
create mode 100644 tools/testing/selftests/signal/mangle_uc_sigmask.c
rename tools/testing/selftests/{sigaltstack/sas.c => signal/sigaltstack.c} (100%)
--
2.34.1
From: Jeff Xu <jeffxu(a)chromium.org>
When MFD_NOEXEC_SEAL was introduced, there was one big mistake: it
didn't have proper documentation. This led to a lot of confusion,
especially about whether or not memfd created with the MFD_NOEXEC_SEAL
flag is sealable. Before MFD_NOEXEC_SEAL, memfd had to explicitly set
MFD_ALLOW_SEALING to be sealable, so it's a fair question.
As one might have noticed, unlike other flags in memfd_create,
MFD_NOEXEC_SEAL is actually a combination of multiple flags. The idea
is to make it easier to use memfd in the most common way, which is
NOEXEC + F_SEAL_EXEC + MFD_ALLOW_SEALING. This works with sysctl
vm.noexec to help existing applications move to a more secure way of
using memfd.
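The combined semantics can be checked from user space by creating a memfd
with only MFD_NOEXEC_SEAL and reading back its seals. This is a minimal
sketch under stated assumptions: the helper name
noexec_seal_memfd_seals() is hypothetical, and the fallback #defines are
there only for older libc headers (the values are taken from the kernel
UAPI).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Fallbacks for older userspace headers; values match the kernel UAPI. */
#ifndef MFD_NOEXEC_SEAL
#define MFD_NOEXEC_SEAL 0x0008U
#endif
#ifndef F_SEAL_EXEC
#define F_SEAL_EXEC 0x0020
#endif

/*
 * Hypothetical helper: create a memfd with MFD_NOEXEC_SEAL (and without
 * MFD_ALLOW_SEALING) and return its seal bitmask, or -1 if the memfd
 * cannot be created, e.g. on a kernel that does not know the flag.
 */
int noexec_seal_memfd_seals(void)
{
	int fd, seals;

	fd = memfd_create("noexec-seal-demo", MFD_CLOEXEC | MFD_NOEXEC_SEAL);
	if (fd < 0)
		return -1;

	seals = fcntl(fd, F_GET_SEALS);
	close(fd);
	return seals;
}
```

On a kernel that supports the flag, the returned mask is expected to
contain F_SEAL_EXEC and not F_SEAL_SEAL, i.e. the memfd is non-executable
and still sealable even though MFD_ALLOW_SEALING was not passed.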
Proposals have been made to make MFD_NOEXEC_SEAL non-sealable unless
MFD_ALLOW_SEALING is set, to be consistent with other flags [1] [2].
Those are based on the viewpoint that each flag is an atomic unit,
which is a reasonable assumption. However, MFD_NOEXEC_SEAL was
designed with the intent of promoting the most secure method of using
memfd, and therefore combines multiple functionalities into one
bit.
Furthermore, MFD_NOEXEC_SEAL was added more than a year ago, and
multiple applications and distributions have backported and
utilized it. Altering the ABI now presents a degree of risk and may
lead to disruption.
MFD_NOEXEC_SEAL is a new flag, and applications must change their code
to use it. There is no backward compatibility problem.
When sysctl vm.noexec == 1 or 2, applications that don't set
MFD_NOEXEC_SEAL or MFD_EXEC will get an MFD_NOEXEC_SEAL memfd. An
old application might break; that is by design, and on such a system
vm.noexec = 0 should be used. This is also not a backward
compatibility problem.
I propose to include this documentation patch to assist in clarifying
the semantics of MFD_NOEXEC_SEAL, thereby preventing any potential
future confusion.
This patch supersedes the previous patch, which tried a different
direction [3]; please remove [2] from the mm-unstable branch when
applying this patch.
Finally, I would like to express my gratitude to David Rheinsberg and
Barnabás Pőcze for initiating the discussion on the topic of sealability.
[1]
https://lore.kernel.org/lkml/20230714114753.170814-1-david@readahead.eu/
[2]
https://lore.kernel.org/lkml/20240513191544.94754-1-pobrn@protonmail.com/
[3]
https://lore.kernel.org/lkml/20240524033933.135049-1-jeffxu@google.com/
Jeff Xu (1):
mm/memfd: add documentation for MFD_NOEXEC_SEAL MFD_EXEC
Documentation/userspace-api/index.rst | 1 +
Documentation/userspace-api/mfd_noexec.rst | 86 ++++++++++++++++++++++
2 files changed, 87 insertions(+)
create mode 100644 Documentation/userspace-api/mfd_noexec.rst
--
2.45.2.505.gda0bf45e8d-goog
The different patches here are some unrelated fixes for MPTCP:
- Patch 1 ensures 'snd_una' is initialised on connect in case of MPTCP
fallback to TCP followed by retransmissions before the processing of
any other incoming packets. A fix for v5.9+.
- Patch 2 makes sure the RmAddr MIB counter is incremented, and only
once per ID, upon the reception of a RM_ADDR. A fix for v5.10+.
- Patch 3 doesn't update 'add addr' related counters if the connect()
was not possible. A fix for v5.7+.
- Patch 4 updates the mailmap file to add Geliang's new email address.
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
Geliang Tang (1):
mailmap: map Geliang's new email address
Paolo Abeni (1):
mptcp: ensure snd_una is properly initialized on connect
YonglongLi (2):
mptcp: pm: inc RmAddr MIB counter once per RM_ADDR ID
mptcp: pm: update add_addr counters after connect
.mailmap | 1 +
net/mptcp/pm_netlink.c | 21 ++++++++++++++-------
net/mptcp/protocol.c | 1 +
tools/testing/selftests/net/mptcp/mptcp_join.sh | 5 +++--
4 files changed, 19 insertions(+), 9 deletions(-)
---
base-commit: c44711b78608c98a3e6b49ce91678cd0917d5349
change-id: 20240607-upstream-net-20240607-misc-fixes-024007171d60
Best regards,
--
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Hi,
This builds on the proposal[1] from Mark and lets me convert the
existing usercopy selftest to KUnit. Besides adding this basic test to
the KUnit collection, it also opens the door for execve testing (which
depends on having a functional current->mm), and should provide the
basic infrastructure for adding Mark's much more complete usercopy tests.
-Kees
[1] https://lore.kernel.org/lkml/20230321122514.1743889-2-mark.rutland@arm.com/
Kees Cook (2):
kunit: test: Add vm_mmap() allocation resource manager
usercopy: Convert test_user_copy to KUnit test
MAINTAINERS | 1 +
include/kunit/test.h | 17 ++
lib/Kconfig.debug | 21 +-
lib/Makefile | 2 +-
lib/kunit/test.c | 139 +++++++++++-
lib/{test_user_copy.c => usercopy_kunit.c} | 252 ++++++++++-----------
6 files changed, 288 insertions(+), 144 deletions(-)
rename lib/{test_user_copy.c => usercopy_kunit.c} (52%)
--
2.34.1
Hi all,
This series makes a number of cleanups to resctrl_val() and
generalizes it by removing test-name-specific handling from the
function.
v6:
- Adjust closing/rollback of the IMC perf
- Move the comment in measure_vals() to function level
- Capitalize MBM
- binded to -> bound to
- Language tweak into kerneldoc
- Removed stale paragraph from commit message
v5:
- Open mem bw file only once and use rewind().
- Add \n to mem bw file read to allow reading fresh values from the file.
- Return 0 if create_grp() is given NULL grp_name (matches the original
behavior). Mention this in function's kerneldoc.
- Cast pid_t to int before printing with %d.
- Caps/typo fixes to kerneldoc and commit messages.
- Use imperative tone in commit messages and improve them based on points
that came up during review.
v4:
- Merged close fix into IMC READ+WRITE rework patch
- Add loop to reset imc_counters_config fds to -1 to be able know which
need closing
- Introduce perf_close_imc_mem_bw() to close fds
- Open resctrl mem bw file (twice) beforehand to avoid opening it during
the test
- Remove MBM .mongrp setup
- Remove mongrp from CMT test
v3:
- Rename init functions to <testname>_init()
- Replace for loops with READ+WRITE statements for clarity
- Don't drop Return: entry from perf_open_imc_mem_bw() func comment
- New patch: Fix closing of IMC fds in case of error
- New patch: Make "bandwidth" consistent in comments & prints
- New patch: Simplify mem bandwidth file code
- Remove wrong comment
- Changed grp_name check to return -1 on fail (internal sanity check)
v2:
- Resolved conflicts with kselftest/next
- Spaces -> tabs correction
Ilpo Järvinen (16):
selftests/resctrl: Fix closing IMC fds on error and open-code R+W
instead of loops
selftests/resctrl: Calculate resctrl FS derived mem bw over sleep(1)
only
selftests/resctrl: Make "bandwidth" consistent in comments & prints
selftests/resctrl: Consolidate get_domain_id() into resctrl_val()
selftests/resctrl: Use correct type for pids
selftests/resctrl: Cleanup bm_pid and ppid usage & limit scope
selftests/resctrl: Rename measure_vals() to measure_mem_bw_vals() &
document
selftests/resctrl: Simplify mem bandwidth file code for MBA & MBM
tests
selftests/resctrl: Add ->measure() callback to resctrl_val_param
selftests/resctrl: Add ->init() callback into resctrl_val_param
selftests/resctrl: Simplify bandwidth report type handling
selftests/resctrl: Make some strings passed to resctrlfs functions
const
selftests/resctrl: Convert ctrlgrp & mongrp to pointers
selftests/resctrl: Remove mongrp from MBA test
selftests/resctrl: Remove mongrp from CMT test
selftests/resctrl: Remove test name comparing from
write_bm_pid_to_resctrl()
tools/testing/selftests/resctrl/cache.c | 10 +-
tools/testing/selftests/resctrl/cat_test.c | 5 +-
tools/testing/selftests/resctrl/cmt_test.c | 22 +-
tools/testing/selftests/resctrl/mba_test.c | 26 +-
tools/testing/selftests/resctrl/mbm_test.c | 26 +-
tools/testing/selftests/resctrl/resctrl.h | 49 ++-
tools/testing/selftests/resctrl/resctrl_val.c | 371 ++++++++----------
tools/testing/selftests/resctrl/resctrlfs.c | 67 ++--
8 files changed, 291 insertions(+), 285 deletions(-)
--
2.39.2
This patch addresses the present TODO in the file.
I have tested it manually on my system and added relevant filtering to
ensure that the correct feature list is being checked.
Signed-off-by: Abhinav Jain <jain.abhinav177(a)gmail.com>
---
tools/testing/selftests/net/netdevice.sh | 21 +++++++++++++++++++--
1 file changed, 19 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/net/netdevice.sh b/tools/testing/selftests/net/netdevice.sh
index e3afcb424710..cbe2573c3827 100755
--- a/tools/testing/selftests/net/netdevice.sh
+++ b/tools/testing/selftests/net/netdevice.sh
@@ -117,14 +117,31 @@ kci_netdev_ethtool()
return 1
fi
- ethtool -k "$netdev" > "$TMP_ETHTOOL_FEATURES"
+ ethtool -k "$netdev" | tail -n +2 > "$TMP_ETHTOOL_FEATURES"
if [ $? -ne 0 ];then
echo "FAIL: $netdev: ethtool list features"
rm "$TMP_ETHTOOL_FEATURES"
return 1
fi
echo "PASS: $netdev: ethtool list features"
- #TODO for each non fixed features, try to turn them on/off
+
+ for feature in $(grep -v fixed "$TMP_ETHTOOL_FEATURES" | \
+ awk '{print $1}' | sed 's/://'); do
+ ethtool --offload "$netdev" "$feature" off
+ if [ $? -eq 0 ]; then
+ echo "PASS: $netdev: Turned off feature: $feature"
+ else
+ echo "FAIL: $netdev: Failed to turn off feature: $feature"
+ fi
+
+ ethtool --offload "$netdev" "$feature" on
+ if [ $? -eq 0 ]; then
+ echo "PASS: $netdev: Turned on feature: $feature"
+ else
+ echo "FAIL: $netdev: Failed to turn on feature: $feature"
+ fi
+ done
+
rm "$TMP_ETHTOOL_FEATURES"
kci_netdev_ethtool_test 74 'dump' "ethtool -d $netdev"
--
2.34.1
From: Pankaj Raghav <p.raghav(a)samsung.com>
create_pagecache_thp_and_fd() in split_huge_page_test.c used the
variable dummy to perform mmap read.
However, this test was skipped even on XFS, which has large folio
support. The issue was that the compiler (gcc 13.2.0) was optimizing
out the dummy variable and therefore not creating a huge page in the
page cache.
Use the asm volatile() trick to force the compiler not to optimize out
the loop where we read from the mmaped addr. This is similar to what is
being done in other tests (cow.c, etc.).
As the variable is now used in the asm statement, remove the unused
attribute.
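The trick can be shown in isolation. The sketch below is an assumption
about how one might package it, not the selftest code itself: the helper
name touch_range() is hypothetical, and __asm__ is used instead of asm
only so that the block also builds in strict ISO C modes.

```c
#include <stddef.h>

/*
 * Hypothetical helper: read every byte in [addr, addr + len) so that the
 * backing pages are faulted in. The accumulated value is otherwise
 * unused, so without the empty asm statement the compiler may drop the
 * whole loop. The "+r" constraint makes sum an input/output operand of
 * the asm, so its final value has to be computed.
 */
void touch_range(const char *addr, size_t len)
{
	int sum = 0;

	for (size_t i = 0; i < len; i++)
		sum += addr[i];

	__asm__ volatile("" : "+r" (sum));
}
```

The asm statement emits no instructions; it only tells the compiler that
sum is read and possibly modified at that point.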
Signed-off-by: Pankaj Raghav <p.raghav(a)samsung.com>
---
Changes since v2:
- Use the asm volatile trick to force the compiler to not optimize the
read into dummy variable. (David)
tools/testing/selftests/mm/split_huge_page_test.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/mm/split_huge_page_test.c b/tools/testing/selftests/mm/split_huge_page_test.c
index d3c7f5fb3e7b..e5e8dafc9d94 100644
--- a/tools/testing/selftests/mm/split_huge_page_test.c
+++ b/tools/testing/selftests/mm/split_huge_page_test.c
@@ -300,7 +300,7 @@ int create_pagecache_thp_and_fd(const char *testfile, size_t fd_size, int *fd,
char **addr)
{
size_t i;
- int __attribute__((unused)) dummy = 0;
+ int dummy = 0;
srand(time(NULL));
@@ -341,6 +341,7 @@ int create_pagecache_thp_and_fd(const char *testfile, size_t fd_size, int *fd,
for (size_t i = 0; i < fd_size; i++)
dummy += *(*addr + i);
+ asm volatile("" : "+r" (dummy));
if (!check_huge_file(*addr, fd_size / pmd_pagesize, pmd_pagesize)) {
ksft_print_msg("No large pagecache folio generated, please provide a filesystem supporting large folio\n");
base-commit: d97496ca23a2d4ee80b7302849404859d9058bcd
--
2.44.1
The purpose of this series is to rethink how HID-BPF is invoked.
Currently it implies a jmp table, a prog fd bpf_map, a preloaded tracing
bpf program and a lot of manual work for handling the bpf program
lifetime and addition/removal.
OTOH, bpf_struct_ops take care of most of the bpf handling leaving us
with a simple list of ops pointers, and we can directly call the
struct_ops program from the kernel as a regular function.
The net gain right now is in terms of code simplicity and lines of code
removed (though it is an API breakage), but udev-hid-bpf is able to
handle such breakages.
In the near future, we will be able to extend the HID-BPF struct_ops
with entrypoints for hid_hw_raw_request() and hid_hw_output_report(),
allowing for covering all of the initial use cases:
- firewalling a HID device
- fixing all of the HID device interactions (not just device events as
it is right now).
The matching user-space loader (udev-hid-bpf) MR is at
https://gitlab.freedesktop.org/libevdev/udev-hid-bpf/-/merge_requests/86
I'll put it out of draft once this is merged.
Cheers,
Benjamin
Signed-off-by: Benjamin Tissoires <bentiss(a)kernel.org>
---
Changes in v2:
- drop HID_BPF_FLAGS enum and use BPF_F_BEFORE instead
- fix .init_members to not open code member->offset
- allow struct hid_device to be writeable from HID-BPF for its name,
uniq and phys
- Link to v1: https://lore.kernel.org/r/20240528-hid_bpf_struct_ops-v1-0-8c6663df27d8@ker…
---
Benjamin Tissoires (16):
HID: rename struct hid_bpf_ops into hid_ops
HID: bpf: add hid_get/put_device() helpers
HID: bpf: implement HID-BPF through bpf_struct_ops
selftests/hid: convert the hid_bpf selftests with struct_ops
HID: samples: convert the 2 HID-BPF samples into struct_ops
HID: bpf: add defines for HID-BPF SEC in in-tree bpf fixes
HID: bpf: convert in-tree fixes into struct_ops
HID: bpf: remove tracing HID-BPF capability
selftests/hid: add subprog call test
Documentation: HID: amend HID-BPF for struct_ops
Documentation: HID: add a small blurb on udev-hid-bpf
HID: bpf: Artist24: remove unused variable
HID: bpf: error on warnings when compiling bpf objects
bpf: allow bpf helpers to be used into HID-BPF struct_ops
HID: bpf: rework hid_bpf_ops_btf_struct_access
HID: bpf: make part of struct hid_device writable
Documentation/hid/hid-bpf.rst | 173 ++++---
drivers/hid/bpf/Makefile | 2 +-
drivers/hid/bpf/entrypoints/Makefile | 93 ----
drivers/hid/bpf/entrypoints/README | 4 -
drivers/hid/bpf/entrypoints/entrypoints.bpf.c | 25 -
drivers/hid/bpf/entrypoints/entrypoints.lskel.h | 248 ---------
drivers/hid/bpf/hid_bpf_dispatch.c | 266 +++-------
drivers/hid/bpf/hid_bpf_dispatch.h | 12 +-
drivers/hid/bpf/hid_bpf_jmp_table.c | 565 ---------------------
drivers/hid/bpf/hid_bpf_struct_ops.c | 298 +++++++++++
drivers/hid/bpf/progs/FR-TEC__Raptor-Mach-2.bpf.c | 9 +-
drivers/hid/bpf/progs/HP__Elite-Presenter.bpf.c | 6 +-
drivers/hid/bpf/progs/Huion__Kamvas-Pro-19.bpf.c | 9 +-
.../hid/bpf/progs/IOGEAR__Kaliber-MMOmentum.bpf.c | 6 +-
drivers/hid/bpf/progs/Makefile | 2 +-
.../hid/bpf/progs/Microsoft__XBox-Elite-2.bpf.c | 6 +-
drivers/hid/bpf/progs/Wacom__ArtPen.bpf.c | 6 +-
drivers/hid/bpf/progs/XPPen__Artist24.bpf.c | 10 +-
drivers/hid/bpf/progs/XPPen__ArtistPro16Gen2.bpf.c | 24 +-
drivers/hid/bpf/progs/hid_bpf.h | 5 +
drivers/hid/hid-core.c | 6 +-
include/linux/hid_bpf.h | 125 ++---
samples/hid/Makefile | 5 +-
samples/hid/hid_bpf_attach.bpf.c | 18 -
samples/hid/hid_bpf_attach.h | 14 -
samples/hid/hid_mouse.bpf.c | 26 +-
samples/hid/hid_mouse.c | 39 +-
samples/hid/hid_surface_dial.bpf.c | 10 +-
samples/hid/hid_surface_dial.c | 53 +-
tools/testing/selftests/hid/hid_bpf.c | 100 +++-
tools/testing/selftests/hid/progs/hid.c | 100 +++-
.../testing/selftests/hid/progs/hid_bpf_helpers.h | 19 +-
32 files changed, 805 insertions(+), 1479 deletions(-)
---
base-commit: 70ec81c2e2b4005465ad0d042e90b36087c36104
change-id: 20240513-hid_bpf_struct_ops-e3212a224555
Best regards,
--
Benjamin Tissoires <bentiss(a)kernel.org>
This series fixes build errors found by clang to allow the x86 suite to
be built with clang.
Unfortunately, there is a bug [1] in clang because of which extended
asm isn't handled correctly, and the build fails for sysret_rip.c.
Hence, even after this series, the build of this test will fail with
clang. Should we disable this test for now when clang is used, until
the bug is fixed in clang? Not sure. Any opinions?
[1] https://github.com/llvm/llvm-project/issues/53728
Muhammad Usama Anjum (8):
selftests: x86: Remove dependence of headers file
selftests: x86: check_initial_reg_state: remove -no-pie while using
-static
selftests: x86: test_vsyscall: remove unused function
selftests: x86: fsgsbase_restore: fix asm directive from =rm to =r
selftests: x86: syscall_arg_fault_32: remove unused variable
selftests: x86: test_FISTTP: use fisttps instead of ambigous fisttp
selftests: x86: fsgsbase: Remove unused function and variable
selftests: x86: amx: Remove unused functions
tools/testing/selftests/x86/Makefile | 9 +++++----
tools/testing/selftests/x86/amx.c | 16 ----------------
tools/testing/selftests/x86/fsgsbase.c | 6 ------
tools/testing/selftests/x86/fsgsbase_restore.c | 2 +-
tools/testing/selftests/x86/syscall_arg_fault.c | 1 -
tools/testing/selftests/x86/test_FISTTP.c | 8 ++++----
tools/testing/selftests/x86/test_vsyscall.c | 5 -----
7 files changed, 10 insertions(+), 37 deletions(-)
--
2.39.2
Commit 1b151e2435fc ("block: Remove special-casing of compound
pages") caused a change in behaviour when releasing the pages
if the buffer does not start at the beginning of the page. This
was because the calculation of the number of pages to release
was incorrect.
This was fixed by commit 38b43539d64b ("block: Fix page refcounts
for unaligned buffers in __bio_release_pages()").
We pin the user buffer during direct I/O writes. If this buffer is a
hugepage, bio_release_page() will unpin it and decrement all references
and pin counts at ->bi_end_io. However, if any references to the hugepage
remain post-I/O, the hugepage will not be freed upon unmap, leading
to a memory leak.
This patch verifies that a hugepage, used as a user buffer for DIO
operations, is correctly freed upon unmapping, regardless of whether
the offsets are aligned or unaligned w.r.t page boundary.
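The pass/fail criterion is a comparison of the free hugepage count
before the allocation and after munmap(). A minimal sketch of that check
follows; the helper names parse_hugepages_free() and hugepage_released()
are hypothetical and are not the functions used by the selftest. The
parser assumes /proc/meminfo-style text containing a "HugePages_Free:"
line.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: extract the HugePages_Free value from
 * /proc/meminfo-style text. Returns -1 if the field is not found.
 */
long parse_hugepages_free(const char *meminfo)
{
	const char *key = "HugePages_Free:";
	const char *p = strstr(meminfo, key);
	long val;

	if (!p || sscanf(p + strlen(key), "%ld", &val) != 1)
		return -1;
	return val;
}

/*
 * Returns 1 if the free count went back to its starting value, i.e. no
 * hugepage is still held by a leftover reference or pin; 0 otherwise.
 */
int hugepage_released(long free_before, long free_after)
{
	return free_before == free_after;
}
```

In the failing run shown below, the fourth case would correspond to
hugepage_released(7, 6), which reports that a page was not freed.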
Test Result Fail Scenario (Without the fix)
--------------------------------------------------------
[]# ./hugetlb_dio
TAP version 13
1..4
No. Free pages before allocation : 7
No. Free pages after munmap : 7
ok 1 : Huge pages freed successfully !
No. Free pages before allocation : 7
No. Free pages after munmap : 7
ok 2 : Huge pages freed successfully !
No. Free pages before allocation : 7
No. Free pages after munmap : 7
ok 3 : Huge pages freed successfully !
No. Free pages before allocation : 7
No. Free pages after munmap : 6
not ok 4 : Huge pages not freed!
Totals: pass:3 fail:1 xfail:0 xpass:0 skip:0 error:0
Test Result PASS Scenario (With the fix)
---------------------------------------------------------
[]#./hugetlb_dio
TAP version 13
1..4
No. Free pages before allocation : 7
No. Free pages after munmap : 7
ok 1 : Huge pages freed successfully !
No. Free pages before allocation : 7
No. Free pages after munmap : 7
ok 2 : Huge pages freed successfully !
No. Free pages before allocation : 7
No. Free pages after munmap : 7
ok 3 : Huge pages freed successfully !
No. Free pages before allocation : 7
No. Free pages after munmap : 7
ok 4 : Huge pages freed successfully !
Totals: pass:4 fail:0 xfail:0 xpass:0 skip:0 error:0
V4:
- Added this test to run_vmtests.sh.
V3:
- Fixed the build error when it is compiled with _FORTIFY_SOURCE.
V2:
- Addressed all review comments from Muhammad Usama Anjum
https://lore.kernel.org/all/20240604132801.23377-1-donettom@linux.ibm.com/
V1:
https://lore.kernel.org/all/20240523063905.3173-1-donettom@linux.ibm.com/#t
Signed-off-by: Donet Tom <donettom(a)linux.ibm.com>
Co-developed-by: Ritesh Harjani (IBM) <ritesh.list(a)gmail.com>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list(a)gmail.com>
Reviewed-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
---
tools/testing/selftests/mm/Makefile | 1 +
tools/testing/selftests/mm/hugetlb_dio.c | 118 ++++++++++++++++++++++
tools/testing/selftests/mm/run_vmtests.sh | 1 +
3 files changed, 120 insertions(+)
create mode 100644 tools/testing/selftests/mm/hugetlb_dio.c
diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile
index 3b49bc3d0a3b..a1748a4c7df1 100644
--- a/tools/testing/selftests/mm/Makefile
+++ b/tools/testing/selftests/mm/Makefile
@@ -73,6 +73,7 @@ TEST_GEN_FILES += ksm_functional_tests
TEST_GEN_FILES += mdwe_test
TEST_GEN_FILES += hugetlb_fault_after_madv
TEST_GEN_FILES += hugetlb_madv_vs_map
+TEST_GEN_FILES += hugetlb_dio
ifneq ($(ARCH),arm64)
TEST_GEN_FILES += soft-dirty
diff --git a/tools/testing/selftests/mm/hugetlb_dio.c b/tools/testing/selftests/mm/hugetlb_dio.c
new file mode 100644
index 000000000000..986f3b6c7f7b
--- /dev/null
+++ b/tools/testing/selftests/mm/hugetlb_dio.c
@@ -0,0 +1,118 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * This program tests for hugepage leaks after DIO writes to a file using a
+ * hugepage as the user buffer. During DIO, the user buffer is pinned and
+ * should be properly unpinned upon completion. This patch verifies that the
+ * kernel correctly unpins the buffer at DIO completion for both aligned and
+ * unaligned user buffer offsets (w.r.t page boundary), ensuring the hugepage
+ * is freed upon unmapping.
+ */
+
+#define _GNU_SOURCE
+#include <stdio.h>
+#include <sys/stat.h>
+#include <stdlib.h>
+#include <fcntl.h>
+#include <stdint.h>
+#include <unistd.h>
+#include <string.h>
+#include <sys/mman.h>
+#include "vm_util.h"
+#include "../kselftest.h"
+
+void run_dio_using_hugetlb(unsigned int start_off, unsigned int end_off)
+{
+ int fd;
+ char *buffer = NULL;
+ char *orig_buffer = NULL;
+ size_t h_pagesize = 0;
+ size_t writesize;
+ int free_hpage_b = 0;
+ int free_hpage_a = 0;
+ const int mmap_flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB;
+ const int mmap_prot = PROT_READ | PROT_WRITE;
+
+ writesize = end_off - start_off;
+
+ /* Get the default huge page size */
+ h_pagesize = default_huge_page_size();
+ if (!h_pagesize)
+ ksft_exit_fail_msg("Unable to determine huge page size\n");
+
+ /* Open the file to DIO */
+ fd = open("/tmp", O_TMPFILE | O_RDWR | O_DIRECT, 0664);
+ if (fd < 0)
+ ksft_exit_fail_perror("Error opening file\n");
+
+ /* Get the free huge pages before allocation */
+ free_hpage_b = get_free_hugepages();
+ if (free_hpage_b == 0) {
+ close(fd);
+ ksft_exit_skip("No free hugepage, exiting!\n");
+ }
+
+ /* Allocate a hugetlb page */
+ orig_buffer = mmap(NULL, h_pagesize, mmap_prot, mmap_flags, -1, 0);
+ if (orig_buffer == MAP_FAILED) {
+ close(fd);
+ ksft_exit_fail_perror("Error mapping memory\n");
+ }
+ buffer = orig_buffer;
+ buffer += start_off;
+
+ memset(buffer, 'A', writesize);
+
+ /* Write the buffer to the file */
+ if (write(fd, buffer, writesize) != (writesize)) {
+ munmap(orig_buffer, h_pagesize);
+ close(fd);
+ ksft_exit_fail_perror("Error writing to file\n");
+ }
+
+ /* unmap the huge page */
+ munmap(orig_buffer, h_pagesize);
+ close(fd);
+
+ /* Get the free huge pages after unmap*/
+ free_hpage_a = get_free_hugepages();
+
+ /*
+ * If the no. of free hugepages before allocation and after unmap does
+ * not match - that means there could still be a page which is pinned.
+ */
+ if (free_hpage_a != free_hpage_b) {
+ ksft_print_msg("No. Free pages before allocation : %d\n", free_hpage_b);
+ ksft_print_msg("No. Free pages after munmap : %d\n", free_hpage_a);
+ ksft_test_result_fail(": Huge pages not freed!\n");
+ } else {
+ ksft_print_msg("No. Free pages before allocation : %d\n", free_hpage_b);
+ ksft_print_msg("No. Free pages after munmap : %d\n", free_hpage_a);
+ ksft_test_result_pass(": Huge pages freed successfully !\n");
+ }
+}
+
+int main(void)
+{
+ size_t pagesize = 0;
+
+ ksft_print_header();
+ ksft_set_plan(4);
+
+ /* Get base page size */
+ pagesize = psize();
+
+ /* start and end is aligned to pagesize */
+ run_dio_using_hugetlb(0, (pagesize * 3));
+
+ /* start is aligned but end is not aligned */
+ run_dio_using_hugetlb(0, (pagesize * 3) - (pagesize / 2));
+
+ /* start is unaligned and end is aligned */
+ run_dio_using_hugetlb(pagesize / 2, (pagesize * 3));
+
+ /* both start and end are unaligned */
+ run_dio_using_hugetlb(pagesize / 2, (pagesize * 3) + (pagesize / 2));
+
+ ksft_finished();
+}
+
diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh
index 3157204b9047..5698d519170d 100755
--- a/tools/testing/selftests/mm/run_vmtests.sh
+++ b/tools/testing/selftests/mm/run_vmtests.sh
@@ -265,6 +265,7 @@ CATEGORY="hugetlb" run_test ./map_hugetlb
CATEGORY="hugetlb" run_test ./hugepage-mremap
CATEGORY="hugetlb" run_test ./hugepage-vmemmap
CATEGORY="hugetlb" run_test ./hugetlb-madvise
+CATEGORY="hugetlb" run_test ./hugetlb_dio
nr_hugepages_tmp=$(cat /proc/sys/vm/nr_hugepages)
# For this test, we need one and just one huge page
--
2.43.0
In the main function of vdso_restorer.c, there is a dlopen() call,
but there is no dlclose() call to close the file.
Signed-off-by: liujing <liujing(a)cmss.chinamobile.com>
---
tools/testing/selftests/x86/vdso_restorer.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/tools/testing/selftests/x86/vdso_restorer.c b/tools/testing/selftests/x86/vdso_restorer.c
index fe99f2434155..a0b1155dee31 100644
--- a/tools/testing/selftests/x86/vdso_restorer.c
+++ b/tools/testing/selftests/x86/vdso_restorer.c
@@ -57,6 +57,8 @@ int main()
return 0;
}
+ dlclose(vdso);
+
memset(&sa, 0, sizeof(sa));
sa.handler = handler_with_siginfo;
sa.flags = SA_SIGINFO;
--
2.18.2
Conform individual tests to TAP output. Each patch converts one test.
With this series, all vDSO tests become TAP conformant.
Muhammad Usama Anjum (4):
kselftests: vdso: vdso_test_clock_getres: conform test to TAP output
kselftests: vdso: vdso_test_correctness: conform test to TAP output
kselftests: vdso: vdso_test_getcpu: conform test to TAP output
kselftests: vdso: vdso_test_gettimeofday: conform test to TAP output
.../selftests/vDSO/vdso_test_clock_getres.c | 68 ++++----
.../selftests/vDSO/vdso_test_correctness.c | 146 +++++++++---------
.../testing/selftests/vDSO/vdso_test_getcpu.c | 16 +-
.../selftests/vDSO/vdso_test_gettimeofday.c | 23 +--
4 files changed, 126 insertions(+), 127 deletions(-)
--
2.39.2
The kselftests may be built in a couple different ways:
make LLVM=1
make CC=clang
In order to handle both cases, set LLVM=1 if CC=clang. That way, the rest
of lib.mk, and any Makefiles that include lib.mk, can base decisions
solely on whether or not LLVM is set.
Then, build upon that to disable a pair of clang warnings that are
already silenced on gcc.
Doing it this way is much better than the piecemeal approach that I
started with in [1] and [2]. Thanks to Nathan Chancellor for the patch
reviews that led to this approach.
Changes since the first version:
1) Wrote a detailed explanation for suppressing two clang warnings, in
both a lib.mk comment, and the commit description.
2) Added a Reviewed-by tag to the first patch.
[1] https://lore.kernel.org/20240527214704.300444-1-jhubbard@nvidia.com
[2] https://lore.kernel.org/20240527213641.299458-1-jhubbard@nvidia.com
John Hubbard (2):
selftests/lib.mk: handle both LLVM=1 and CC=clang builds
selftests/lib.mk: silence some clang warnings that gcc already ignores
tools/testing/selftests/lib.mk | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
base-commit: e0cce98fe279b64f4a7d81b7f5c3a23d80b92fbc
--
2.45.1
Commit 1b151e2435fc ("block: Remove special-casing of compound
pages") caused a change in behaviour when releasing the pages
if the buffer does not start at the beginning of the page. This
was because the calculation of the number of pages to release
was incorrect.
This was fixed by commit 38b43539d64b ("block: Fix page refcounts
for unaligned buffers in __bio_release_pages()").
We pin the user buffer during direct I/O writes. If this buffer is a
hugepage, bio_release_pages() will unpin it and decrement all references
and pin counts at ->bi_end_io. However, if any references to the hugepage
remain post-I/O, the hugepage will not be freed upon unmap, leading
to a memory leak.
This patch verifies that a hugepage, used as a user buffer for DIO
operations, is correctly freed upon unmapping, regardless of whether
the offsets are aligned or unaligned w.r.t page boundary.
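The four offset combinations exercised by the test can be sketched as a small
table. This is only an illustration of the ranges passed by main() in the diff
below; the names dio_range, is_page_aligned(), build_dio_ranges() and
dio_write_size() are hypothetical and do not appear in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One DIO write: bytes [start_off, end_off) inside the huge page. */
struct dio_range {
	size_t start_off;
	size_t end_off;
};

/* True if off is a multiple of pagesize (pagesize must be non-zero). */
static bool is_page_aligned(size_t off, size_t pagesize)
{
	assert(pagesize != 0);
	return off % pagesize == 0;
}

/* Fill out[0..3] with the same four ranges that the test's main() uses. */
static void build_dio_ranges(size_t pagesize, struct dio_range out[4])
{
	/* start and end aligned */
	out[0] = (struct dio_range){ 0, pagesize * 3 };
	/* start aligned, end unaligned */
	out[1] = (struct dio_range){ 0, pagesize * 3 - pagesize / 2 };
	/* start unaligned, end aligned */
	out[2] = (struct dio_range){ pagesize / 2, pagesize * 3 };
	/* start and end unaligned */
	out[3] = (struct dio_range){ pagesize / 2, pagesize * 3 + pagesize / 2 };
}

/* Number of bytes written for a range, as computed in the test. */
static size_t dio_write_size(struct dio_range r)
{
	return r.end_off - r.start_off;
}
```

With a 4096-byte base page, the fourth range starts at offset 2048 and ends at
14336, so neither end sits on a page boundary; that is the case that fails
without the fix in the output below.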
Test Result Fail Scenario (Without the fix)
--------------------------------------------------------
[]# ./hugetlb_dio
TAP version 13
1..4
No. Free pages before allocation : 7
No. Free pages after munmap : 7
ok 1 : Huge pages freed successfully !
No. Free pages before allocation : 7
No. Free pages after munmap : 7
ok 2 : Huge pages freed successfully !
No. Free pages before allocation : 7
No. Free pages after munmap : 7
ok 3 : Huge pages freed successfully !
No. Free pages before allocation : 7
No. Free pages after munmap : 6
not ok 4 : Huge pages not freed!
Totals: pass:3 fail:1 xfail:0 xpass:0 skip:0 error:0
Test Result PASS Scenario (With the fix)
---------------------------------------------------------
[]#./hugetlb_dio
TAP version 13
1..4
No. Free pages before allocation : 7
No. Free pages after munmap : 7
ok 1 : Huge pages freed successfully !
No. Free pages before allocation : 7
No. Free pages after munmap : 7
ok 2 : Huge pages freed successfully !
No. Free pages before allocation : 7
No. Free pages after munmap : 7
ok 3 : Huge pages freed successfully !
No. Free pages before allocation : 7
No. Free pages after munmap : 7
ok 4 : Huge pages freed successfully !
Totals: pass:4 fail:0 xfail:0 xpass:0 skip:0 error:0
V3:
- Fixed the build error when it is compiled with _FORTIFY_SOURCE.
V2:
- Addressed all review comments from Muhammad Usama Anjum
https://lore.kernel.org/all/20240604132801.23377-1-donettom@linux.ibm.com/
V1:
https://lore.kernel.org/all/20240523063905.3173-1-donettom@linux.ibm.com/#t
Signed-off-by: Donet Tom <donettom(a)linux.ibm.com>
Co-developed-by: Ritesh Harjani (IBM) <ritesh.list(a)gmail.com>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list(a)gmail.com>
---
tools/testing/selftests/mm/Makefile | 1 +
tools/testing/selftests/mm/hugetlb_dio.c | 118 +++++++++++++++++++++++
2 files changed, 119 insertions(+)
create mode 100644 tools/testing/selftests/mm/hugetlb_dio.c
diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile
index 3b49bc3d0a3b..a1748a4c7df1 100644
--- a/tools/testing/selftests/mm/Makefile
+++ b/tools/testing/selftests/mm/Makefile
@@ -73,6 +73,7 @@ TEST_GEN_FILES += ksm_functional_tests
TEST_GEN_FILES += mdwe_test
TEST_GEN_FILES += hugetlb_fault_after_madv
TEST_GEN_FILES += hugetlb_madv_vs_map
+TEST_GEN_FILES += hugetlb_dio
ifneq ($(ARCH),arm64)
TEST_GEN_FILES += soft-dirty
diff --git a/tools/testing/selftests/mm/hugetlb_dio.c b/tools/testing/selftests/mm/hugetlb_dio.c
new file mode 100644
index 000000000000..986f3b6c7f7b
--- /dev/null
+++ b/tools/testing/selftests/mm/hugetlb_dio.c
@@ -0,0 +1,118 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * This program tests for hugepage leaks after DIO writes to a file using a
+ * hugepage as the user buffer. During DIO, the user buffer is pinned and
+ * should be properly unpinned upon completion. This patch verifies that the
+ * kernel correctly unpins the buffer at DIO completion for both aligned and
+ * unaligned user buffer offsets (w.r.t page boundary), ensuring the hugepage
+ * is freed upon unmapping.
+ */
+
+#define _GNU_SOURCE
+#include <stdio.h>
+#include <sys/stat.h>
+#include <stdlib.h>
+#include <fcntl.h>
+#include <stdint.h>
+#include <unistd.h>
+#include <string.h>
+#include <sys/mman.h>
+#include "vm_util.h"
+#include "../kselftest.h"
+
+void run_dio_using_hugetlb(unsigned int start_off, unsigned int end_off)
+{
+ int fd;
+ char *buffer = NULL;
+ char *orig_buffer = NULL;
+ size_t h_pagesize = 0;
+ size_t writesize;
+ int free_hpage_b = 0;
+ int free_hpage_a = 0;
+ const int mmap_flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB;
+ const int mmap_prot = PROT_READ | PROT_WRITE;
+
+ writesize = end_off - start_off;
+
+ /* Get the default huge page size */
+ h_pagesize = default_huge_page_size();
+ if (!h_pagesize)
+ ksft_exit_fail_msg("Unable to determine huge page size\n");
+
+ /* Open the file to DIO */
+ fd = open("/tmp", O_TMPFILE | O_RDWR | O_DIRECT, 0664);
+ if (fd < 0)
+ ksft_exit_fail_perror("Error opening file\n");
+
+ /* Get the free huge pages before allocation */
+ free_hpage_b = get_free_hugepages();
+ if (free_hpage_b == 0) {
+ close(fd);
+ ksft_exit_skip("No free hugepage, exiting!\n");
+ }
+
+ /* Allocate a hugetlb page */
+ orig_buffer = mmap(NULL, h_pagesize, mmap_prot, mmap_flags, -1, 0);
+ if (orig_buffer == MAP_FAILED) {
+ close(fd);
+ ksft_exit_fail_perror("Error mapping memory\n");
+ }
+ buffer = orig_buffer;
+ buffer += start_off;
+
+ memset(buffer, 'A', writesize);
+
+ /* Write the buffer to the file */
+ if (write(fd, buffer, writesize) != (writesize)) {
+ munmap(orig_buffer, h_pagesize);
+ close(fd);
+ ksft_exit_fail_perror("Error writing to file\n");
+ }
+
+ /* unmap the huge page */
+ munmap(orig_buffer, h_pagesize);
+ close(fd);
+
+ /* Get the free huge pages after unmap*/
+ free_hpage_a = get_free_hugepages();
+
+ /*
+ * If the no. of free hugepages before allocation and after unmap does
+ * not match - that means there could still be a page which is pinned.
+ */
+ if (free_hpage_a != free_hpage_b) {
+ ksft_print_msg("No. Free pages before allocation : %d\n", free_hpage_b);
+ ksft_print_msg("No. Free pages after munmap : %d\n", free_hpage_a);
+ ksft_test_result_fail(": Huge pages not freed!\n");
+ } else {
+ ksft_print_msg("No. Free pages before allocation : %d\n", free_hpage_b);
+ ksft_print_msg("No. Free pages after munmap : %d\n", free_hpage_a);
+ ksft_test_result_pass(": Huge pages freed successfully !\n");
+ }
+}
+
+int main(void)
+{
+ size_t pagesize = 0;
+
+ ksft_print_header();
+ ksft_set_plan(4);
+
+ /* Get base page size */
+ pagesize = psize();
+
+ /* start and end is aligned to pagesize */
+ run_dio_using_hugetlb(0, (pagesize * 3));
+
+ /* start is aligned but end is not aligned */
+ run_dio_using_hugetlb(0, (pagesize * 3) - (pagesize / 2));
+
+ /* start is unaligned and end is aligned */
+ run_dio_using_hugetlb(pagesize / 2, (pagesize * 3));
+
+ /* both start and end are unaligned */
+ run_dio_using_hugetlb(pagesize / 2, (pagesize * 3) + (pagesize / 2));
+
+ ksft_finished();
+}
+
--
2.43.0
From: Jeff Xu <jeffxu(a)google.com>
By default, memfd_create() creates a non-sealable MFD, unless the
MFD_ALLOW_SEALING flag is set.
When the MFD_NOEXEC_SEAL flag was initially introduced, an MFD created
with that flag was sealable, even though MFD_ALLOW_SEALING was not set.
This patch changes MFD_NOEXEC_SEAL to be non-sealable by default,
unless MFD_ALLOW_SEALING is explicitly set.
This is a non-backward compatible change. However, as MFD_NOEXEC_SEAL
is new, we expect not many applications will rely on the nature of
MFD_NOEXEC_SEAL being sealable. In most cases, the application already
sets MFD_ALLOW_SEALING if they need a sealable MFD.
Additionally, this enhances the usability of the pid namespace sysctl
vm.memfd_noexec. When vm.memfd_noexec equals 1 or 2, the kernel will
add MFD_NOEXEC_SEAL if memfd_create() does not specify MFD_EXEC or
MFD_NOEXEC_SEAL, and the addition of MFD_NOEXEC_SEAL enables the MFD
to be sealable. This means, any application that does not desire this
behavior will be unable to utilize vm.memfd_noexec = 1 or 2 to
migrate/enforce non-executable MFD. This adjustment ensures that
applications can anticipate that the sealable characteristic will
remain unmodified by vm.memfd_noexec.
This patch was initially developed by Barnabás Pőcze, and Barnabás
used Debian Code Search and GitHub to try to find potential breakages
and could only find a single one. Dbus-broker's memfd_create() wrapper
is aware of this implicit `MFD_ALLOW_SEALING` behavior, and tries to
work around it [1]. This workaround will break. Luckily, this only
affects the test suite; it does not affect the normal operations of
dbus-broker. There is a PR with a fix [2]. In addition, David Rheinsberg
also raised a similar fix in [3].
[1]: https://github.com/bus1/dbus-broker/blob/9eb0b7e5826fc76cad7b025bc46f267d4a…
[2]: https://github.com/bus1/dbus-broker/pull/366
[3]: https://lore.kernel.org/lkml/20230714114753.170814-1-david@readahead.eu/
History
======
V2:
update commit message.
add testcase for vm.memfd_noexec
add documentation.
V1:
https://lore.kernel.org/lkml/20240513191544.94754-1-pobrn@protonmail.com/
Jeff Xu (2):
memfd: fix MFD_NOEXEC_SEAL to be non-sealable by default
memfd:add MEMFD_NOEXEC_SEAL documentation
Documentation/userspace-api/index.rst | 1 +
Documentation/userspace-api/mfd_noexec.rst | 90 ++++++++++++++++++++++
mm/memfd.c | 9 +--
tools/testing/selftests/memfd/memfd_test.c | 26 ++++++-
4 files changed, 120 insertions(+), 6 deletions(-)
create mode 100644 Documentation/userspace-api/mfd_noexec.rst
--
2.45.1.288.g0e0cd299f1-goog
`MFD_NOEXEC_SEAL` should remove the executable bits and set
`F_SEAL_EXEC` to prevent further modifications to the executable
bits as per the comment in the uapi header file:
not executable and sealed to prevent changing to executable
However, currently, it also unsets `F_SEAL_SEAL`, essentially
acting as a superset of `MFD_ALLOW_SEALING`. Nothing implies
that it should be so, and indeed up until the second version
of the patchset[0] that introduced `MFD_EXEC` and
`MFD_NOEXEC_SEAL`, `F_SEAL_SEAL` was not removed. However, this
was changed in the third revision of the patchset[1] without
a clear explanation.
This behaviour is surprising for application developers:
there is no documentation that would reveal that `MFD_NOEXEC_SEAL`
has the additional effect of `MFD_ALLOW_SEALING`.
So do not remove `F_SEAL_SEAL` when `MFD_NOEXEC_SEAL` is requested.
This is technically an ABI break, but it seems very unlikely that an
application would depend on this behaviour (unless by accident).
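The change in the initial seal set can be modelled in plain userspace C. This
is only a sketch of the logic in the mm/memfd.c hunk below, not kernel code:
the DEMO_* constants copy the values from the uapi headers, the function names
are hypothetical, and the model assumes that a new memfd starts out with only
F_SEAL_SEAL set, which is what the selftest expectation in this patch implies.

```c
#include <assert.h>

/* Values mirror linux/memfd.h and linux/fcntl.h; DEMO_ names are local. */
#define DEMO_MFD_ALLOW_SEALING 0x0002U
#define DEMO_MFD_NOEXEC_SEAL   0x0008U
#define DEMO_F_SEAL_SEAL       0x0001U
#define DEMO_F_SEAL_EXEC       0x0020U

/* Seals right after memfd_create(), following the old if/else-if chain. */
static unsigned int seals_before_patch(unsigned int flags)
{
	unsigned int seals = DEMO_F_SEAL_SEAL; /* assumed initial state */

	if (flags & DEMO_MFD_NOEXEC_SEAL) {
		seals &= ~DEMO_F_SEAL_SEAL; /* implicit MFD_ALLOW_SEALING */
		seals |= DEMO_F_SEAL_EXEC;
	} else if (flags & DEMO_MFD_ALLOW_SEALING) {
		seals &= ~DEMO_F_SEAL_SEAL;
	}
	return seals;
}

/* Seals right after memfd_create(), following the two independent checks. */
static unsigned int seals_after_patch(unsigned int flags)
{
	unsigned int seals = DEMO_F_SEAL_SEAL; /* assumed initial state */

	if (flags & DEMO_MFD_NOEXEC_SEAL)
		seals |= DEMO_F_SEAL_EXEC;
	if (flags & DEMO_MFD_ALLOW_SEALING)
		seals &= ~DEMO_F_SEAL_SEAL;
	return seals;
}

/* The old and new selftest expectations, expressed against the model. */
static void check_noexec_seal_model(void)
{
	assert(seals_before_patch(DEMO_MFD_NOEXEC_SEAL) == DEMO_F_SEAL_EXEC);
	assert(seals_after_patch(DEMO_MFD_NOEXEC_SEAL) ==
	       (DEMO_F_SEAL_SEAL | DEMO_F_SEAL_EXEC));
}
```

In this model, an application that wants the old behaviour back passes
MFD_NOEXEC_SEAL | MFD_ALLOW_SEALING, which again yields F_SEAL_EXEC alone.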
[0]: https://lore.kernel.org/lkml/20220805222126.142525-3-jeffxu@google.com/
[1]: https://lore.kernel.org/lkml/20221202013404.163143-3-jeffxu@google.com/
Fixes: 105ff5339f498a ("mm/memfd: add MFD_NOEXEC_SEAL and MFD_EXEC")
Signed-off-by: Barnabás Pőcze <pobrn(a)protonmail.com>
---
Or did I miss the explanation as to why MFD_NOEXEC_SEAL should
imply MFD_ALLOW_SEALING? If so, please direct me to it and
sorry for the noise.
---
mm/memfd.c | 9 ++++-----
tools/testing/selftests/memfd/memfd_test.c | 2 +-
2 files changed, 5 insertions(+), 6 deletions(-)
diff --git a/mm/memfd.c b/mm/memfd.c
index 7d8d3ab3fa37..8b7f6afee21d 100644
--- a/mm/memfd.c
+++ b/mm/memfd.c
@@ -356,12 +356,11 @@ SYSCALL_DEFINE2(memfd_create,
inode->i_mode &= ~0111;
file_seals = memfd_file_seals_ptr(file);
- if (file_seals) {
- *file_seals &= ~F_SEAL_SEAL;
+ if (file_seals)
*file_seals |= F_SEAL_EXEC;
- }
- } else if (flags & MFD_ALLOW_SEALING) {
- /* MFD_EXEC and MFD_ALLOW_SEALING are set */
+ }
+
+ if (flags & MFD_ALLOW_SEALING) {
file_seals = memfd_file_seals_ptr(file);
if (file_seals)
*file_seals &= ~F_SEAL_SEAL;
diff --git a/tools/testing/selftests/memfd/memfd_test.c b/tools/testing/selftests/memfd/memfd_test.c
index 18f585684e20..b6a7ad68c3c1 100644
--- a/tools/testing/selftests/memfd/memfd_test.c
+++ b/tools/testing/selftests/memfd/memfd_test.c
@@ -1151,7 +1151,7 @@ static void test_noexec_seal(void)
mfd_def_size,
MFD_CLOEXEC | MFD_NOEXEC_SEAL);
mfd_assert_mode(fd, 0666);
- mfd_assert_has_seals(fd, F_SEAL_EXEC);
+ mfd_assert_has_seals(fd, F_SEAL_SEAL | F_SEAL_EXEC);
mfd_fail_chmod(fd, 0777);
close(fd);
}
--
2.45.0
In order to be able to save the current value of a sysctl without changing
it, split the relevant bit out of sysctl_set() into a new helper.
Signed-off-by: Petr Machata <petrm(a)nvidia.com>
Reviewed-by: Ido Schimmel <idosch(a)nvidia.com>
---
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: linux-kselftest(a)vger.kernel.org
Notes:
v2:
- New patch.
tools/testing/selftests/net/forwarding/lib.sh | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/forwarding/lib.sh b/tools/testing/selftests/net/forwarding/lib.sh
index eabbdf00d8ca..9086d2015296 100644
--- a/tools/testing/selftests/net/forwarding/lib.sh
+++ b/tools/testing/selftests/net/forwarding/lib.sh
@@ -1134,12 +1134,19 @@ bridge_ageing_time_get()
}
declare -A SYSCTL_ORIG
+sysctl_save()
+{
+ local key=$1; shift
+
+ SYSCTL_ORIG[$key]=$(sysctl -n $key)
+}
+
sysctl_set()
{
local key=$1; shift
local value=$1; shift
- SYSCTL_ORIG[$key]=$(sysctl -n $key)
+ sysctl_save "$key"
sysctl -qw $key="$value"
}
--
2.45.0
This patch series is motivated by the following observation:
Raise a signal, jump to signal handler. The ucontext_t structure dumped
by kernel to userspace has a uc_sigmask field having the mask of blocked
signals. If you run a fresh minimalistic program doing this, this field
is empty, even if you block some signals while registering the handler
with sigaction().
Here is what the man-pages have to say:
sigaction(2): "sa_mask specifies a mask of signals which should be blocked
(i.e., added to the signal mask of the thread in which the signal handler
is invoked) during execution of the signal handler. In addition, the
signal which triggered the handler will be blocked, unless the SA_NODEFER
flag is used."
signal(7): Under "Execution of signal handlers", (1.3) implies:
"The thread's current signal mask is accessible via the ucontext_t
object that is pointed to by the third argument of the signal handler."
But, (1.4) states:
"Any signals specified in act->sa_mask when registering the handler with
sigprocmask(2) are added to the thread's signal mask. The signal being
delivered is also added to the signal mask, unless SA_NODEFER was
specified when registering the handler. These signals are thus blocked
while the handler executes."
There clearly is no distinction being made in the man pages between
"Thread's signal mask" and ucontext_t; this logically should imply
that a signal blocked by populating struct sigaction should be visible
in ucontext_t.
Here is what the kernel code does (for Aarch64):
do_signal() -> handle_signal() -> sigmask_to_save(), which returns
&current->blocked, is passed to setup_rt_frame() -> setup_sigframe() ->
__copy_to_user(). Hence, &current->blocked is copied to ucontext_t
exposed to userspace. Returning back to handle_signal(),
signal_setup_done() -> signal_delivered() -> sigorsets() and
set_current_blocked() are responsible for using information from
struct ksignal ksig, which was populated through the sigaction()
system call in kernel/signal.c:
copy_from_user(&new_sa.sa, act, sizeof(new_sa.sa)),
to update &current->blocked; hence, the set of blocked signals for the
current thread is updated AFTER the kernel dumps ucontext_t to
userspace.
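The observation can be reproduced with a minimal userspace sketch along the
following lines. This is not the proposed selftest; the function and variable
names are hypothetical, and the choice of SIGUSR1 as the delivered signal and
SIGUSR2 as the signal placed in sa_mask is an assumption made for the example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <ucontext.h>

/* Set by the handler: 1 = SIGUSR2 is in the set, 0 = not, -1 = not run yet. */
static volatile sig_atomic_t usr2_in_uc_sigmask = -1;
static volatile sig_atomic_t usr2_in_live_mask = -1;

static void handler(int sig, siginfo_t *info, void *ctx)
{
	ucontext_t *uc = ctx;
	sigset_t live;

	(void)sig;
	(void)info;

	/* Mask saved in the signal frame, i.e. the pre-handler context. */
	usr2_in_uc_sigmask = sigismember(&uc->uc_sigmask, SIGUSR2);

	/* Mask in force while the handler runs (query only, no change). */
	sigemptyset(&live);
	sigprocmask(SIG_BLOCK, NULL, &live);
	usr2_in_live_mask = sigismember(&live, SIGUSR2);
}

/* Handle SIGUSR1 with SIGUSR2 in sa_mask, then raise SIGUSR1. */
static int run_sigmask_demo(void)
{
	struct sigaction sa;
	sigset_t unblock;

	/* Start from a known state: neither signal blocked. */
	sigemptyset(&unblock);
	sigaddset(&unblock, SIGUSR1);
	sigaddset(&unblock, SIGUSR2);
	if (sigprocmask(SIG_UNBLOCK, &unblock, NULL) != 0)
		return -1;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	sigaddset(&sa.sa_mask, SIGUSR2);
	if (sigaction(SIGUSR1, &sa, NULL) != 0)
		return -1;

	if (raise(SIGUSR1) != 0)
		return -1;

	/* The handler has run by the time raise() returns. */
	assert(usr2_in_live_mask != -1);
	return 0;
}
```

On a kernel behaving as described above, one would expect usr2_in_live_mask
to be 1 and usr2_in_uc_sigmask to be 0 after run_sigmask_demo() returns:
SIGUSR2 is blocked while the handler runs, but it is absent from the mask
saved in ucontext_t.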
I assume that the above is indeed the intended behaviour, because it
makes semantic sense: the signals blocked using sigaction() remain
blocked only while the handler executes, and not in the context present
before jumping to the handler. Nothing in the man-pages confirms this,
however. On that assumption, the series introduces a test that mangles
uc_sigmask. I will send a separate series to fix the man-pages.
The proposed selftest has been tested out on Aarch32, Aarch64 and x86_64.
Dev Jain (2):
selftests: Rename sigaltstack to generic signal
selftests: Add a test mangling with uc_sigmask
tools/testing/selftests/Makefile | 2 +-
.../{sigaltstack => signal}/.gitignore | 3 +-
.../{sigaltstack => signal}/Makefile | 3 +-
.../current_stack_pointer.h | 0
.../selftests/signal/mangle_uc_sigmask.c | 141 ++++++++++++++++++
.../sas.c => signal/sigaltstack.c} | 0
6 files changed, 146 insertions(+), 3 deletions(-)
rename tools/testing/selftests/{sigaltstack => signal}/.gitignore (57%)
rename tools/testing/selftests/{sigaltstack => signal}/Makefile (53%)
rename tools/testing/selftests/{sigaltstack => signal}/current_stack_pointer.h (100%)
create mode 100644 tools/testing/selftests/signal/mangle_uc_sigmask.c
rename tools/testing/selftests/{sigaltstack/sas.c => signal/sigaltstack.c} (100%)
--
2.34.1
Hello,
We're pleased to announce the return of the Kernel Testing &
Dependability Micro-Conference at Linux Plumbers 2024:
https://lpc.events/event/18/contributions/1665/
You can already submit proposals by selecting the micro-conf in
the Track drop-down list:
https://lpc.events/login/?next=/event/18/abstracts/%23submit-abstract
Please note that the deadline for submissions is *Sunday 16th June*.
The event description contains a list of suggested topics
inherited from past editions. Is there anything in particular
you would like to see discussed this year?
Knowing people's interests helps with triaging proposals and
making the micro-conf as relevant as possible. See you there!
Thanks,
Guillaume & Shuah & Sasha
When compiling with Android bionic, the MAP_HUGE_* and SHM_HUGE_* macros
are redefined because they are included from the uapi by sys/mman.h and
sys/shm.h:
INFO: From Compiling common/tools/testing/selftests/mm/thuge-gen.c:
common/tools/testing/selftests/mm/thuge-gen.c:32:9: warning: 'MAP_HUGE_2MB' macro redefined [-Wmacro-redefined]
32 | #define MAP_HUGE_2MB (21 << MAP_HUGE_SHIFT)
| ^
external/_main~_repo_rules~prebuilt_ndk/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/include/linux/mman.h:38:9: note: previous definition is here
38 | #define MAP_HUGE_2MB HUGETLB_FLAG_ENCODE_2MB
| ^
common/tools/testing/selftests/mm/thuge-gen.c:33:9: warning: 'MAP_HUGE_1GB' macro redefined [-Wmacro-redefined]
33 | #define MAP_HUGE_1GB (30 << MAP_HUGE_SHIFT)
| ^
external/_main~_repo_rules~prebuilt_ndk/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/include/linux/mman.h:44:9: note: previous definition is here
This test should probably use the uapi definitions instead of redefining
them. However, glibc gets struct redefinitions when including sys/shm.h
and linux/shm.h together. So, add guards for the SHM_HUGE_* macros
instead.
Edward Liaw (2):
selftests/mm: Include linux/mman.h
selftests/mm: Guard defines from shm
tools/testing/selftests/mm/thuge-gen.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)
--
2.45.1.467.gbab1589fc0-goog
From: Geliang Tang <tanggeliang(a)kylinos.cn>
To move the dctcp-specific test code out of do_test() and into
test_dctcp(), this patchset adds a new helper, start_test(), in
bpf_tcp_ca.c and uses it to refactor do_test().
It also addresses Martin's comments on the previous series.
Geliang Tang (5):
selftests/bpf: Use connect_to_fd_opts in do_test in bpf_tcp_ca
selftests/bpf: Add start_test helper in bpf_tcp_ca
selftests/bpf: Use start_test in test_dctcp_fallback in bpf_tcp_ca
selftests/bpf: Use start_test in test_dctcp in bpf_tcp_ca
selftests/bpf: Drop useless arguments of do_test in bpf_tcp_ca
.../selftests/bpf/prog_tests/bpf_tcp_ca.c | 140 +++++++++++-------
1 file changed, 85 insertions(+), 55 deletions(-)
--
2.43.0
From: Pankaj Raghav <p.raghav(a)samsung.com>
create_pagecache_thp_and_fd() in split_huge_page_test.c used the
variable dummy to perform mmap read.
However, this test was skipped even on XFS, which has large folio
support. The issue was that the compiler (gcc 13.2.0) was optimizing out
the dummy variable and therefore not creating a huge page in the page cache.
Add the volatile keyword to force the compiler not to optimize out the loop
where we read from the mmaped addr.
Signed-off-by: Pankaj Raghav <p.raghav(a)samsung.com>
---
tools/testing/selftests/mm/split_huge_page_test.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/mm/split_huge_page_test.c b/tools/testing/selftests/mm/split_huge_page_test.c
index d3c7f5fb3e7b..c573a58f80ab 100644
--- a/tools/testing/selftests/mm/split_huge_page_test.c
+++ b/tools/testing/selftests/mm/split_huge_page_test.c
@@ -300,7 +300,7 @@ int create_pagecache_thp_and_fd(const char *testfile, size_t fd_size, int *fd,
char **addr)
{
size_t i;
- int __attribute__((unused)) dummy = 0;
+ volatile int __attribute__((unused)) dummy = 0;
srand(time(NULL));
base-commit: d97496ca23a2d4ee80b7302849404859d9058bcd
--
2.44.1
From: Pankaj Raghav <p.raghav(a)samsung.com>
create_pagecache_thp_and_fd() in split_huge_page_test.c used the
variable dummy to perform mmap read.
However, this test was skipped even on XFS, which has large folio
support. The issue was that the compiler (gcc 13.2.0) was optimizing out
the dummy variable and therefore not creating a huge page in the page cache.
Make it a global variable to force the compiler not to optimize out
the loop where we read from the mmaped addr.
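The idea can be sketched as follows. This is an illustration only, not the
loop from split_huge_page_test.c: the helper read_every_byte() is
hypothetical, and the sketch assumes a normal build without whole-program
optimization, where the compiler cannot prove that the global is never read
elsewhere.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Global sink with external linkage. Because other translation units
 * could read it, the final value must be stored, so the loads feeding
 * it have to be performed.
 */
int dummy;

/* Read every byte of [addr, addr + len) and accumulate into the sink. */
static void read_every_byte(const char *addr, size_t len)
{
	assert(addr != NULL || len == 0);

	for (size_t i = 0; i < len; i++)
		dummy += addr[i];
}
```

A function-local accumulator that is never used afterwards gives the compiler
no reason to keep the loop, which is the behaviour described above.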
Signed-off-by: Pankaj Raghav <p.raghav(a)samsung.com>
---
Changes since v1:
- Make the dummy variable a global variable (willy).
tools/testing/selftests/mm/split_huge_page_test.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/mm/split_huge_page_test.c b/tools/testing/selftests/mm/split_huge_page_test.c
index d3c7f5fb3e7b..c4857de2c042 100644
--- a/tools/testing/selftests/mm/split_huge_page_test.c
+++ b/tools/testing/selftests/mm/split_huge_page_test.c
@@ -23,6 +23,11 @@
uint64_t pagesize;
unsigned int pageshift;
uint64_t pmd_pagesize;
+/*
+ * Used by create_pagecache_thp_and_fd() to do mmap read.
+ * Made it as global to avoid compiler optimizing out the variable.
+ */
+int dummy;
#define SPLIT_DEBUGFS "/sys/kernel/debug/split_huge_pages"
#define SMAP_PATH "/proc/self/smaps"
@@ -300,7 +305,6 @@ int create_pagecache_thp_and_fd(const char *testfile, size_t fd_size, int *fd,
char **addr)
{
size_t i;
- int __attribute__((unused)) dummy = 0;
srand(time(NULL));
base-commit: d97496ca23a2d4ee80b7302849404859d9058bcd
--
2.44.1
While looking at using 'lib.sh' for the MPTCP selftests [1], we found
some small issues with 'lib.sh'. Here they are:
- Patch 1: fix 'errexit' (set -e) support with busywait. 'errexit' is
supported in some functions, not all. A fix for v6.8+.
- Patch 2: avoid confusing error messages linked to the cleaning part
when the netns setup fails. A fix for v6.8+.
- Patch 3: set a variable as local to avoid accidentally changing the
value of another one with the same name on the caller side. A fix
for v6.10-rc1+.
Link: https://lore.kernel.org/mptcp/5f4615c3-0621-43c5-ad25-55747a4350ce@kernel.o… [1]
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
Matthieu Baerts (NGI0) (3):
selftests: net: lib: support errexit with busywait
selftests: net: lib: avoid error removing empty netns name
selftests: net: lib: set 'i' as local
tools/testing/selftests/net/lib.sh | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)
---
base-commit: a535d59432370343058755100ee75ab03c0e3f91
change-id: 20240605-upstream-net-20240605-selftests-net-lib-fixes-7a90a1a8d9d2
Best regards,
--
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Commit 1b151e2435fc ("block: Remove special-casing of compound
pages") caused a change in behaviour when releasing the pages
if the buffer does not start at the beginning of the page. This
was because the calculation of the number of pages to release
was incorrect.
This was fixed by commit 38b43539d64b ("block: Fix page refcounts
for unaligned buffers in __bio_release_pages()").
We pin the user buffer during direct I/O writes. If this buffer is a
hugepage, bio_release_pages() will unpin it and decrement all references
and pin counts at ->bi_end_io. However, if any references to the hugepage
remain post-I/O, the hugepage will not be freed upon unmap, leading
to a memory leak.
This patch verifies that a hugepage, used as a user buffer for DIO
operations, is correctly freed upon unmapping, regardless of whether
the offsets are aligned or unaligned w.r.t page boundary.
Test Result Fail Scenario (Without the fix)
--------------------------------------------------------
[]# ./hugetlb_dio
TAP version 13
1..4
No. Free pages before allocation : 7
No. Free pages after munmap : 7
ok 1 : Huge pages freed successfully !
No. Free pages before allocation : 7
No. Free pages after munmap : 7
ok 2 : Huge pages freed successfully !
No. Free pages before allocation : 7
No. Free pages after munmap : 7
ok 3 : Huge pages freed successfully !
No. Free pages before allocation : 7
No. Free pages after munmap : 6
not ok 4 : Huge pages not freed!
Totals: pass:3 fail:1 xfail:0 xpass:0 skip:0 error:0
Test Result PASS Scenario (With the fix)
---------------------------------------------------------
[]#./hugetlb_dio
TAP version 13
1..4
No. Free pages before allocation : 7
No. Free pages after munmap : 7
ok 1 : Huge pages freed successfully !
No. Free pages before allocation : 7
No. Free pages after munmap : 7
ok 2 : Huge pages freed successfully !
No. Free pages before allocation : 7
No. Free pages after munmap : 7
ok 3 : Huge pages freed successfully !
No. Free pages before allocation : 7
No. Free pages after munmap : 7
ok 4 : Huge pages freed successfully !
Totals: pass:4 fail:0 xfail:0 xpass:0 skip:0 error:0
V2:
- Addressed all review comments from Muhammad Usama Anjum
V1:
https://lore.kernel.org/all/20240523063905.3173-1-donettom@linux.ibm.com/#t
Signed-off-by: Donet Tom <donettom(a)linux.ibm.com>
Co-developed-by: Ritesh Harjani (IBM) <ritesh.list(a)gmail.com>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list(a)gmail.com>
---
tools/testing/selftests/mm/Makefile | 1 +
tools/testing/selftests/mm/hugetlb_dio.c | 118 +++++++++++++++++++++++
2 files changed, 119 insertions(+)
create mode 100644 tools/testing/selftests/mm/hugetlb_dio.c
diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile
index 3b49bc3d0a3b..a1748a4c7df1 100644
--- a/tools/testing/selftests/mm/Makefile
+++ b/tools/testing/selftests/mm/Makefile
@@ -73,6 +73,7 @@ TEST_GEN_FILES += ksm_functional_tests
TEST_GEN_FILES += mdwe_test
TEST_GEN_FILES += hugetlb_fault_after_madv
TEST_GEN_FILES += hugetlb_madv_vs_map
+TEST_GEN_FILES += hugetlb_dio
ifneq ($(ARCH),arm64)
TEST_GEN_FILES += soft-dirty
diff --git a/tools/testing/selftests/mm/hugetlb_dio.c b/tools/testing/selftests/mm/hugetlb_dio.c
new file mode 100644
index 000000000000..e4f4924179c8
--- /dev/null
+++ b/tools/testing/selftests/mm/hugetlb_dio.c
@@ -0,0 +1,118 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * This program tests for hugepage leaks after DIO writes to a file using a
+ * hugepage as the user buffer. During DIO, the user buffer is pinned and
+ * should be properly unpinned upon completion. This patch verifies that the
+ * kernel correctly unpins the buffer at DIO completion for both aligned and
+ * unaligned user buffer offsets (w.r.t page boundary), ensuring the hugepage
+ * is freed upon unmapping.
+ */
+
+#define _GNU_SOURCE
+#include <stdio.h>
+#include <sys/stat.h>
+#include <stdlib.h>
+#include <fcntl.h>
+#include <stdint.h>
+#include <unistd.h>
+#include <string.h>
+#include <sys/mman.h>
+#include "vm_util.h"
+#include "../kselftest.h"
+
+void run_dio_using_hugetlb(unsigned int start_off, unsigned int end_off)
+{
+ int fd;
+ char *buffer = NULL;
+ char *orig_buffer = NULL;
+ size_t h_pagesize = 0;
+ size_t writesize;
+ int free_hpage_b = 0;
+ int free_hpage_a = 0;
+ const int mmap_flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB;
+ const int mmap_prot = PROT_READ | PROT_WRITE;
+
+ writesize = end_off - start_off;
+
+ /* Get the default huge page size */
+ h_pagesize = default_huge_page_size();
+ if (!h_pagesize)
+ ksft_exit_fail_msg("Unable to determine huge page size\n");
+
+ /* Open the file to DIO */
+ fd = open("/tmp", O_TMPFILE | O_RDWR | O_DIRECT);
+ if (fd < 0)
+ ksft_exit_fail_perror("Error opening file\n");
+
+ /* Get the free huge pages before allocation */
+ free_hpage_b = get_free_hugepages();
+ if (free_hpage_b == 0) {
+ close(fd);
+ ksft_exit_skip("No free hugepage, exiting!\n");
+ }
+
+ /* Allocate a hugetlb page */
+ orig_buffer = mmap(NULL, h_pagesize, mmap_prot, mmap_flags, -1, 0);
+ if (orig_buffer == MAP_FAILED) {
+ close(fd);
+ ksft_exit_fail_perror("Error mapping memory\n");
+ }
+ buffer = orig_buffer;
+ buffer += start_off;
+
+ memset(buffer, 'A', writesize);
+
+ /* Write the buffer to the file */
+ if (write(fd, buffer, writesize) != (writesize)) {
+ munmap(orig_buffer, h_pagesize);
+ close(fd);
+ ksft_exit_fail_perror("Error writing to file\n");
+ }
+
+ /* unmap the huge page */
+ munmap(orig_buffer, h_pagesize);
+ close(fd);
+
+ /* Get the free huge pages after unmap*/
+ free_hpage_a = get_free_hugepages();
+
+ /*
+ * If the no. of free hugepages before allocation and after unmap does
+ * not match - that means there could still be a page which is pinned.
+ */
+ if (free_hpage_a != free_hpage_b) {
+ ksft_print_msg("No. Free pages before allocation : %d\n", free_hpage_b);
+ ksft_print_msg("No. Free pages after munmap : %d\n", free_hpage_a);
+ ksft_test_result_fail(": Huge pages not freed!\n");
+ } else {
+ ksft_print_msg("No. Free pages before allocation : %d\n", free_hpage_b);
+ ksft_print_msg("No. Free pages after munmap : %d\n", free_hpage_a);
+ ksft_test_result_pass(": Huge pages freed successfully !\n");
+ }
+}
+
+int main(void)
+{
+ size_t pagesize = 0;
+
+ ksft_print_header();
+ ksft_set_plan(4);
+
+ /* Get base page size */
+ pagesize = psize();
+
+ /* start and end is aligned to pagesize */
+ run_dio_using_hugetlb(0, (pagesize * 3));
+
+ /* start is aligned but end is not aligned */
+ run_dio_using_hugetlb(0, (pagesize * 3) - (pagesize / 2));
+
+ /* start is unaligned and end is aligned */
+ run_dio_using_hugetlb(pagesize / 2, (pagesize * 3));
+
+ /* both start and end are unaligned */
+ run_dio_using_hugetlb(pagesize / 2, (pagesize * 3) + (pagesize / 2));
+
+ ksft_finished();
+}
+
--
2.43.0
Fixed MAC addresses help with debugging, as the last four bytes identify the
network namespace.
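The addressing scheme used in the diff below can be sketched in C. Reading
the assigned addresses, the fifth byte appears to carry the namespace number
and the sixth the interface number; the helper hsr_test_mac() is hypothetical
and only illustrates that pattern.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Format "00:11:22:00:NN:II" for namespace NN and interface II.
 * buf needs room for 17 characters plus the terminating NUL.
 * Returns the snprintf() result.
 */
static int hsr_test_mac(char *buf, size_t len, unsigned int ns,
			unsigned int iface)
{
	assert(ns <= 0xff && iface <= 0xff);
	return snprintf(buf, len, "00:11:22:00:%02x:%02x", ns, iface);
}
```

For example, namespace 3 and interface 2 give 00:11:22:00:03:02, the address
assigned to ns3eth2.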
Signed-off-by: Lukasz Majewski <lukma(a)denx.de>
---
tools/testing/selftests/net/hsr/hsr_ping.sh | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/tools/testing/selftests/net/hsr/hsr_ping.sh b/tools/testing/selftests/net/hsr/hsr_ping.sh
index 3684b813b0f6..f5d207fc770a 100755
--- a/tools/testing/selftests/net/hsr/hsr_ping.sh
+++ b/tools/testing/selftests/net/hsr/hsr_ping.sh
@@ -152,6 +152,15 @@ setup_hsr_interfaces()
ip -net "$ns3" addr add 100.64.0.3/24 dev hsr3
ip -net "$ns3" addr add dead:beef:1::3/64 dev hsr3 nodad
+ ip -net "$ns1" link set address 00:11:22:00:01:01 dev ns1eth1
+ ip -net "$ns1" link set address 00:11:22:00:01:02 dev ns1eth2
+
+ ip -net "$ns2" link set address 00:11:22:00:02:01 dev ns2eth1
+ ip -net "$ns2" link set address 00:11:22:00:02:02 dev ns2eth2
+
+ ip -net "$ns3" link set address 00:11:22:00:03:01 dev ns3eth1
+ ip -net "$ns3" link set address 00:11:22:00:03:02 dev ns3eth2
+
# All Links up
ip -net "$ns1" link set ns1eth1 up
ip -net "$ns1" link set ns1eth2 up
--
2.20.1
Hi all,
This series does a number of cleanups to resctrl_val() and
generalizes it by removing test name specific handling from the
function.
Hopefully these also reach Shuah successfully, as I've recently seen
rejects for mail from @linux.intel.com to gmail addresses.
v5:
- Open mem bw file only once and use rewind().
- Add \n to mem bw file read to allow reading fresh values from the file.
- Return 0 if create_grp() is given NULL grp_name (matches the original
behavior). Mention this in function's kerneldoc.
- Cast pid_t to int before printing with %d.
- Caps/typo fixes to kerneldoc and commit messages.
- Use imperative tone in commit messages and improve them based on points
that came up during review.
v4:
- Merged close fix into IMC READ+WRITE rework patch
- Add loop to reset imc_counters_config fds to -1 to be able to know
which need closing
- Introduce perf_close_imc_mem_bw() to close fds
- Open resctrl mem bw file (twice) beforehand to avoid opening it during
the test
- Remove MBM .mongrp setup
- Remove mongrp from CMT test
v3:
- Rename init functions to <testname>_init()
- Replace for loops with READ+WRITE statements for clarity
- Don't drop Return: entry from perf_open_imc_mem_bw() func comment
- New patch: Fix closing of IMC fds in case of error
- New patch: Make "bandwidth" consistent in comments & prints
- New patch: Simplify mem bandwidth file code
- Remove wrong comment
- Changed grp_name check to return -1 on fail (internal sanity check)
v2:
- Resolved conflicts with kselftest/next
- Spaces -> tabs correction
Ilpo Järvinen (16):
selftests/resctrl: Fix closing IMC fds on error and open-code R+W
instead of loops
selftests/resctrl: Calculate resctrl FS derived mem bw over sleep(1)
only
selftests/resctrl: Make "bandwidth" consistent in comments & prints
selftests/resctrl: Consolidate get_domain_id() into resctrl_val()
selftests/resctrl: Use correct type for pids
selftests/resctrl: Cleanup bm_pid and ppid usage & limit scope
selftests/resctrl: Rename measure_vals() to measure_mem_bw_vals() &
document
selftests/resctrl: Simplify mem bandwidth file code for MBA & MBM
tests
selftests/resctrl: Add ->measure() callback to resctrl_val_param
selftests/resctrl: Add ->init() callback into resctrl_val_param
selftests/resctrl: Simplify bandwidth report type handling
selftests/resctrl: Make some strings passed to resctrlfs functions
const
selftests/resctrl: Convert ctrlgrp & mongrp to pointers
selftests/resctrl: Remove mongrp from MBA test
selftests/resctrl: Remove mongrp from CMT test
selftests/resctrl: Remove test name comparing from
write_bm_pid_to_resctrl()
tools/testing/selftests/resctrl/cache.c | 10 +-
tools/testing/selftests/resctrl/cat_test.c | 5 +-
tools/testing/selftests/resctrl/cmt_test.c | 22 +-
tools/testing/selftests/resctrl/mba_test.c | 26 +-
tools/testing/selftests/resctrl/mbm_test.c | 26 +-
tools/testing/selftests/resctrl/resctrl.h | 49 ++-
tools/testing/selftests/resctrl/resctrl_val.c | 364 ++++++++----------
tools/testing/selftests/resctrl/resctrlfs.c | 67 ++--
8 files changed, 290 insertions(+), 279 deletions(-)
--
2.39.2
Hi,
I'm trying to build the arm64 selftests on next-20240529 and I'm getting
build failures. Complete logs are attached; some snippets follow:
gcc pac.c /pauth/pac_corruptor.o /pauth/helper.o -o /pauth/pac -Wall -O2 -g
-I/linux_mainline/tools/testing/selftests/ -I/linux_mainline/tools/include
-mbranch-protection=pac-ret -march=armv8.2-a
In file included from pac.c:13:
../../kselftest_harness.h: In function ‘clone3_vfork’:
../../kselftest_harness.h:88:9: error: variable ‘args’ has initializer but
incomplete type
88 | struct clone_args args = {
CC check_prctl
check_prctl.c: In function ‘set_tagged_addr_ctrl’:
check_prctl.c:19:14: error: ‘PR_SET_TAGGED_ADDR_CTRL’ undeclared (first use
in this function)
19 | ret = prctl(PR_SET_TAGGED_ADDR_CTRL, val, 0, 0, 0);
gcc -mbranch-protection=standard -DBTI=1 -ffreestanding -Wall -Wextra -Wall
-O2 -g -I/linux_mainline/tools/testing/selftests/
-I/linux_mainline/tools/include -c -o /bti/test-bti.o test.c
test.c: In function ‘handler’:
test.c:85:50: error: ‘PSR_BTYPE_MASK’ undeclared (first use in this
function); did you mean ‘PSR_MODE_MASK’?
85 | write(1, &"00011011"[((uc->uc_mcontext.pstate & PSR_BTYPE_MASK)
I have GCC 8 installed. I would not expect a slightly older compiler to
cause these errors. Any ideas about the failures?
--
BR,
Muhammad Usama Anjum
The series is composed of two parts. In the first part, patch 1 adds a
comment at a callsite of riscv_setup_vsize to clarify how vlenb is
observed by the system, and patch 2 fixes the issue by failing the
boot process of a secondary core if vlenb mismatches.
Here is the organization of the series:
- Patch 1, 2 provide a fix for mismatching vlen problem [1]. The
solution is to fail secondary cores if their vlenb is not the same as
the boot core.
- Patch 3 is a cleanup for introducing ZVE* Vector subextensions. It
gives the obsolete ISA parser the ability to expand ISA extensions for
single-letter extensions.
- Patch 4, 5, 6 introduce Zve32x, Zve32f, Zve64x, Zve64f, Zve64d for ISA
parsing and hwprobe, and document them.
- Patch 7 makes has_vector() check against ZVE32X instead of V, so most
userspace Vector support will be available for bare ZVE32X.
- Patch 8 updates the prctl test so that it runs on ZVE32X.
The series was tested on QEMU, verifying that booting, Vector program
context switching, signal, ptrace, and prctl interfaces work when only
partial V is reported from the ISA.
Note that the signal test was performed after applying commit
c27fa53b858b ("riscv: Fix vector state restore in rt_sigreturn()").
This series should apply on the risc-v for-next branch on top of
commit 0a16a1728790 ("riscv: select ARCH_HAS_FAST_MULTIPLIER").
[1]: https://lore.kernel.org/all/20240228-vicinity-cornstalk-4b8eb5fe5730@spud/T…
Changes in v5:
- Rebase on top of for-next
- Update comments (1, 7)
- Reorder the documentation patch to precede the patches that it
documents. (5->4)
- Include ZVE64D in the list, which single-letter V implies (6)
- Remove ZVE32F_IMPLY_LIST (5)
- Change the semantics of has_vector(), thus rewrite patch 7
- Remove the patch that fixes integer promotion, as it was merged
elsewhere (8)
- Link to v4: https://lore.kernel.org/r/20240412-zve-detection-v4-0-e0c45bb6b253@sifive.c…
Changes in v4:
- Add a patch to trigger prctl test on ZVE32X (9)
- Add a patch to fix integer promotion bug in hwprobe (8)
- Fix a build fail on !CONFIG_RISCV_ISA_V (7)
- Add more comment in the assembly code change (2)
- Link to v3: https://lore.kernel.org/r/20240318-zve-detection-v3-0-e12d42107fa8@sifive.c…
Changelog v3:
- Include correct maintainers and mailing list into CC.
- Cleanup isa string parser code (3)
- Adjust extensions order and name (4, 5)
- Refine commit message (6)
Changelog v2:
- Update comments and commit messages (1, 2, 7)
- Refine isa_exts[] lists for zve extensions (4)
- Add a patch for dt-binding (5)
- Make ZVE* extensions depend on has_vector(ZVE32X) (6, 7)
---
Andy Chiu (8):
riscv: vector: add a comment when calling riscv_setup_vsize()
riscv: smp: fail booting up smp if inconsistent vlen is detected
riscv: cpufeature: call match_isa_ext() for single-letter extensions
dt-bindings: riscv: add Zve32[xf] Zve64[xfd] ISA extension description
riscv: cpufeature: add zve32[xf] and zve64[xfd] isa detection
riscv: hwprobe: add zve Vector subextensions into hwprobe interface
riscv: vector: adjust minimum Vector requirement to ZVE32X
selftest: run vector prctl test for ZVE32X
Documentation/arch/riscv/hwprobe.rst | 15 ++++++
.../devicetree/bindings/riscv/extensions.yaml | 30 +++++++++++
arch/riscv/include/asm/hwcap.h | 5 ++
arch/riscv/include/asm/vector.h | 10 ++--
arch/riscv/include/uapi/asm/hwprobe.h | 5 ++
arch/riscv/kernel/cpufeature.c | 60 +++++++++++++++++++---
arch/riscv/kernel/head.S | 19 ++++---
arch/riscv/kernel/smpboot.c | 14 +++--
arch/riscv/kernel/sys_hwprobe.c | 11 +++-
arch/riscv/kernel/vector.c | 5 +-
arch/riscv/lib/uaccess.S | 2 +-
.../testing/selftests/riscv/vector/vstate_prctl.c | 6 +--
12 files changed, 151 insertions(+), 31 deletions(-)
---
base-commit: 0a16a172879012c42f55ae8c2883e17c1e4e388f
change-id: 20240318-zve-detection-50106d2da527
Best regards,
--
Andy Chiu <andy.chiu(a)sifive.com>
The hsr_redbox.sh test needs to create a bridge for testing. Add the
missing CONFIG_BRIDGE option to the config file.
Fixes: eafbf0574e05 ("test: hsr: Extend the hsr_redbox.sh to have more SAN devices connected")
Signed-off-by: Hangbin Liu <liuhangbin(a)gmail.com>
---
tools/testing/selftests/net/hsr/config | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/net/hsr/config b/tools/testing/selftests/net/hsr/config
index 22061204fb69..241542441c51 100644
--- a/tools/testing/selftests/net/hsr/config
+++ b/tools/testing/selftests/net/hsr/config
@@ -2,3 +2,4 @@ CONFIG_IPV6=y
CONFIG_NET_SCH_NETEM=m
CONFIG_HSR=y
CONFIG_VETH=y
+CONFIG_BRIDGE=y
--
2.43.0
This patchset makes it possible for MGLRU to consult secondary MMUs
while doing aging, not just during eviction. This allows for more
accurate reclaim decisions, which is especially important for proactive
reclaim.
This series makes the necessary MMU notifier changes to MGLRU and then
includes optimizations on top of that. This series also now includes
changes to access_tracking_perf_test to verify that aging works properly
for pages that are mainly used by KVM.
access_tracking_perf_test also has a mode (-p) to check performance of
MGLRU aging while the VM is faulting memory in. Here are some results:
Testing MGLRU aging while vCPUs are faulting in memory on x86 with the
TDP MMU. THPs disabled.
The test results varied a decent amount from run to run. I did my best to
take representative averages, but nonetheless, the big picture is the
important part.
Main takeaways:
- With the optimizations, the workload is much less impacted
by the presence of aging.
- With the optimizations, MGLRU is able to do aging much more
quickly, especially at 8+ vCPUs on my machine.
./access_tracking_perf_test -p -l -b 1G -v $N_VCPUS # 1G per vCPU
num_vcpus        vcpu wall time   aging avg pass time (s)
1 (no aging)     0.878822016      n/a
1 (no opt)       0.938250568      0.008236007
1 (opt)          0.912270190      0.007314582
2 (no aging)     0.984959659      n/a
2 (no opt)       1.057880728      0.017989741
2 (opt)          1.037735641      0.013996319
4 (no aging)     1.264881581      n/a
4 (no opt)       1.318849182      0.056164918
4 (opt)          1.314653465      0.029311993
8 (no aging)     1.473883699      n/a
8 (no opt)       1.589441079      0.227419586
8 (opt)          1.498439592      0.063857740
16 (no aging)    2.048766096      n/a
16 (no opt)      2.399335597      1.247142841
16 (opt)         2.000914001      0.121628689
32 (no aging)    3.316256321      n/a
32 (no opt)      3.955417018      4.347290433
32 (opt)         3.355274507      0.250886289
64 (no aging)    6.498958516      n/a
64 (no opt)      7.127533884      9.815592054
64 (opt)         6.442582168      1.392907010
112 (no aging)   8.498029491      n/a
112 (no opt)     10.21372495      13.47381656
112 (opt)        8.896963554      2.292223850
Previous versions of this series included logic in MGLRU and KVM to
support batching the updates to secondary page tables. This version
removes this logic, as it was complex and not necessary to enable
proactive reclaim. This optimization, as well as the additional
optimizations for arm64 and powerpc, can be done in a later series.
Changes since v3[1]:
- Vastly simplified the series (thanks David). Removed mmu notifier
batching logic entirely.
- Cleaned up how locking is done for mmu_notifier_test/clear_young
(thanks David).
- Look-around is now only done when there are no secondary MMUs
subscribed to MMU notifiers.
- CONFIG_LRU_GEN_WALKS_SECONDARY_MMU has been added.
- Fixed the lockless implementation of kvm_{test,}age_gfn for x86
(thanks David).
- Added MGLRU functional and performance tests to
access_tracking_perf_test (thanks Axel).
- In v3, an mm would be completely ignored (for aging) if there was a
secondary MMU but support for secondary MMU walking was missing. Now,
missing secondary MMU walking support simply skips the notifier
calls (except for eviction).
- Added a sanity check that range->lockless and range->on_lock are
never both provided for the memslot walk.
For the changes from v2[2] to v3, see v3[1].
This series applies cleanly to mm/mm-unstable and kvm/queue.
[1]: https://lore.kernel.org/linux-mm/20240401232946.1837665-1-jthoughton@google…
[2]: https://lore.kernel.org/kvmarm/20230526234435.662652-1-yuzhao@google.com/
James Houghton (7):
mm/Kconfig: Add LRU_GEN_WALKS_SECONDARY_MMU
mm: multi-gen LRU: Have secondary MMUs participate in aging
KVM: Add lockless memslot walk to KVM
KVM: Move MMU lock acquisition for test/clear_young to architecture
KVM: x86: Relax locking for kvm_test_age_gfn and kvm_age_gfn
KVM: arm64: Relax locking for kvm_test_age_gfn and kvm_age_gfn
KVM: selftests: Add multi-gen LRU aging to access_tracking_perf_test
Documentation/admin-guide/mm/multigen_lru.rst | 6 +-
arch/arm64/kvm/hyp/pgtable.c | 9 +-
arch/arm64/kvm/mmu.c | 30 +-
arch/loongarch/kvm/mmu.c | 20 +-
arch/mips/kvm/mmu.c | 21 +-
arch/powerpc/kvm/book3s.c | 14 +-
arch/riscv/kvm/mmu.c | 26 +-
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/mmu/mmu.c | 10 +-
arch/x86/kvm/mmu/tdp_iter.h | 27 +-
arch/x86/kvm/mmu/tdp_mmu.c | 67 ++-
include/linux/kvm_host.h | 1 +
include/linux/mmzone.h | 6 +-
mm/Kconfig | 8 +
mm/rmap.c | 9 +-
mm/vmscan.c | 144 +++++--
tools/testing/selftests/kvm/Makefile | 1 +
.../selftests/kvm/access_tracking_perf_test.c | 365 ++++++++++++++--
.../selftests/kvm/include/lru_gen_util.h | 55 +++
.../testing/selftests/kvm/lib/lru_gen_util.c | 391 ++++++++++++++++++
virt/kvm/kvm_main.c | 38 +-
21 files changed, 1104 insertions(+), 145 deletions(-)
create mode 100644 tools/testing/selftests/kvm/include/lru_gen_util.h
create mode 100644 tools/testing/selftests/kvm/lib/lru_gen_util.c
base-commit: e0cce98fe279b64f4a7d81b7f5c3a23d80b92fbc
--
2.45.1.288.g0e0cd299f1-goog
Currently a 32-bit 1u value is shifted by up to 63 bits, which overflows
the 32-bit type and causes incorrect checking of bits 32-63. Fix this by
using the BIT_ULL macro for shifting bits.
Detected by cppcheck:
sev_init2_tests.c:108:34: error: Shifting 32-bit value by 63 bits is
undefined behaviour [shiftTooManyBits]
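
To illustrate the difference, here is a minimal standalone sketch. It assumes a local stand-in definition of BIT_ULL() so that it builds outside the kernel tree, and the helper names feature_is_set() and count_set_features() are hypothetical, not taken from the test:

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for the kernel's BIT_ULL(), so the sketch builds on its own. */
#ifndef BIT_ULL
#define BIT_ULL(nr) (1ULL << (nr))
#endif

/*
 * Hypothetical helper: report whether feature bit `bit` (0..63) is set.
 * Writing (1u << bit) here instead would shift a 32-bit value, which is
 * undefined for bit >= 32 and cannot select bits 32-63.
 */
bool feature_is_set(uint64_t features, unsigned int bit)
{
	return (features & BIT_ULL(bit)) != 0;
}

/* Count the set bits by walking all 64 positions, as the test loop does. */
int count_set_features(uint64_t features)
{
	int i, count = 0;

	for (i = 0; i < 64; i++)
		if (feature_is_set(features, i))
			count++;
	return count;
}
```

Because the mask is built from a 64-bit constant, every iteration of a 0..63 loop tests a distinct, well-defined bit.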
Fixes: dfc083a181ba ("selftests: kvm: add tests for KVM_SEV_INIT2")
Signed-off-by: Colin Ian King <colin.i.king(a)gmail.com>
---
V2: Fix incorrect variable in 2nd BIT_ULL(), kudos to Dan Carpenter for
catching this error.
---
tools/testing/selftests/kvm/x86_64/sev_init2_tests.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/kvm/x86_64/sev_init2_tests.c b/tools/testing/selftests/kvm/x86_64/sev_init2_tests.c
index 7a4a61be119b..ea09f7a06aa4 100644
--- a/tools/testing/selftests/kvm/x86_64/sev_init2_tests.c
+++ b/tools/testing/selftests/kvm/x86_64/sev_init2_tests.c
@@ -105,11 +105,11 @@ void test_features(uint32_t vm_type, uint64_t supported_features)
int i;
for (i = 0; i < 64; i++) {
- if (!(supported_features & (1u << i)))
+ if (!(supported_features & BIT_ULL(i)))
test_init2_invalid(vm_type,
&(struct kvm_sev_init){ .vmsa_features = BIT_ULL(i) },
"unknown feature");
- else if (KNOWN_FEATURES & (1u << i))
+ else if (KNOWN_FEATURES & BIT_ULL(i))
test_init2(vm_type,
&(struct kvm_sev_init){ .vmsa_features = BIT_ULL(i) });
}
--
2.39.2
Hi Linus,
Please pull this urgent kselftest fixes update for Linux 6.10-rc3.
This kselftest fixes update consists of fixes to build warnings
in several tests and fixes to ftrace tests.
diff for pull request is attached.
thanks,
-- Shuah
----------------------------------------------------------------
The following changes since commit 1613e604df0cd359cf2a7fbd9be7a0bcfacfabd0:
Linux 6.10-rc1 (2024-05-26 15:20:12 -0700)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest tags/linux_kselftest-fixes-6.10-rc3
for you to fetch changes up to 4bf15b1c657d22d1d70173e43264e4606dfe75ff:
selftests/futex: don't pass a const char* to asprintf(3) (2024-05-31 14:37:10 -0600)
----------------------------------------------------------------
linux_kselftest-fixes-6.10-rc3
This kselftest fixes update consists of fixes to build warnings
in several tests and fixes to ftrace tests.
----------------------------------------------------------------
John Hubbard (3):
selftests/futex: pass _GNU_SOURCE without a value to the compiler
selftests/futex: don't redefine .PHONY targets (all, clean)
selftests/futex: don't pass a const char* to asprintf(3)
Mark Brown (1):
kselftest/alsa: Ensure _GNU_SOURCE is defined
Masami Hiramatsu (Google) (3):
selftests/ftrace: Fix to check required event file
selftests/ftrace: Update required config
selftests/tracing: Fix event filter test to retry up to 10 times
Michael Ellerman (3):
selftests: cachestat: Fix build warnings on ppc64
selftests/openat2: Fix build warnings on ppc64
selftests/overlayfs: Fix build error on ppc64
Steven Rostedt (Google) (1):
tracing/selftests: Fix kprobe event name test for .isra. functions
tools/testing/selftests/alsa/Makefile | 2 +-
tools/testing/selftests/cachestat/test_cachestat.c | 1 +
.../selftests/filesystems/overlayfs/dev_in_maps.c | 1 +
tools/testing/selftests/ftrace/config | 26 ++++++++++++++++------
.../ftrace/test.d/dynevent/test_duplicates.tc | 2 +-
.../ftrace/test.d/filter/event-filter-function.tc | 20 ++++++++++++++++-
.../ftrace/test.d/kprobe/kprobe_eventname.tc | 3 ++-
tools/testing/selftests/futex/Makefile | 2 --
tools/testing/selftests/futex/functional/Makefile | 2 +-
.../selftests/futex/functional/futex_requeue_pi.c | 2 +-
tools/testing/selftests/openat2/openat2_test.c | 1 +
11 files changed, 47 insertions(+), 15 deletions(-)
----------------------------------------------------------------
Hi,
This seventh version of the series fixes an issue reported by the kernel test robot [3].
Shuah, I (as well as Kees and Sean [4]) think this should be in -next
really soon to make sure everything works fine for the v6.9 release,
which is not currently the case. I cannot test against all kselftests
though. I would prefer to let you handle this, but I guess you're not
able to do so and I'll push it on my branch without reply from you.
Even if I push it on my branch, please push it on yours too as soon as
you see this and I'll remove it from mine.
Mark, Jakub, could you please test this series?
As reported by Kernel Test Robot [1] and Sean Christopherson [2], some
tests fail since v6.9-rc1. This is due to the use of vfork() which
introduced some side effects. Similarly, while making it more generic,
a previous commit made some Landlock file system tests flaky, and
subject to the host's file system mount configuration.
This series fixes all these side effects by replacing vfork() with
clone3() and CLONE_VFORK, which is cleaner (no arbitrary shared memory)
and makes the Kselftest framework more robust.
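
As a rough standalone sketch of the clone3() plus CLONE_VFORK approach (not the harness code itself): the wrapper name run_child_with_clone3_vfork() is hypothetical, and the sketch assumes kernel headers recent enough to provide struct clone_args and __NR_clone3. Some sandboxes block clone3(), so the failure is returned to the caller as a negative errno:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <linux/sched.h>	/* struct clone_args, CLONE_VFORK */
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical wrapper: create a child with clone3() and CLONE_VFORK,
 * let it exit with exit_code, and return that code to the caller.
 * Returns -errno if clone3() or waitpid() fails.
 */
int run_child_with_clone3_vfork(int exit_code)
{
	struct clone_args args = {
		.flags = CLONE_VFORK,		/* parent waits for the child */
		.exit_signal = SIGCHLD,
	};
	int status;
	pid_t pid;

	pid = syscall(__NR_clone3, &args, sizeof(args));
	if (pid < 0)
		return -errno;
	if (pid == 0)
		_exit(exit_code);	/* child: no CLONE_VM, so memory is not shared */

	if (waitpid(pid, &status, 0) < 0)
		return -errno;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -ECHILD;
}
```

Since CLONE_VM is not set, the child runs on its own copy of the address space, and the parent is only suspended until the child exits.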
I tried different approaches and I found this one to be the cleanest and
least invasive for current test cases.
I successfully ran the following tests (using TEST_F and
fork/clone/clone3, and KVM_ONE_VCPU_TEST) with this series:
- kvm:fix_hypercall_test
- kvm:sync_regs_test
- kvm:userspace_msr_exit_test
- kvm:vmx_pmu_caps_test
- landlock:fs_test
- landlock:net_test
- landlock:ptrace_test
- move_mount_set_group:move_mount_set_group_test
- net/af_unix:scm_pidfd
- perf_events:remove_on_exec
- pidfd:pidfd_getfd_test
- pidfd:pidfd_setns_test
- seccomp:seccomp_bpf
- user_events:abi_test
[1] https://lore.kernel.org/oe-lkp/202403291015.1fcfa957-oliver.sang@intel.com
[2] https://lore.kernel.org/r/ZjPelW6-AbtYvslu@google.com
[3] https://lore.kernel.org/r/202405100339.vfBe0t9C-lkp@intel.com
[4] https://lore.kernel.org/r/202405061002.01D399877A@keescook
Previous versions:
v1: https://lore.kernel.org/r/20240426172252.1862930-1-mic@digikod.net
v2: https://lore.kernel.org/r/20240429130931.2394118-1-mic@digikod.net
v3: https://lore.kernel.org/r/20240429191911.2552580-1-mic@digikod.net
v4: https://lore.kernel.org/r/20240502210926.145539-1-mic@digikod.net
v5: https://lore.kernel.org/r/20240503105820.300927-1-mic@digikod.net
v6: https://lore.kernel.org/r/20240506165518.474504-1-mic@digikod.net
Regards,
Mickaël Salaün (10):
selftests/pidfd: Fix config for pidfd_setns_test
selftests/landlock: Fix FS tests when run on a private mount point
selftests/harness: Fix fixture teardown
selftests/harness: Fix interleaved scheduling leading to race
conditions
selftests/landlock: Do not allocate memory in fixture data
selftests/harness: Constify fixture variants
selftests/pidfd: Fix wrong expectation
selftests/harness: Share _metadata between forked processes
selftests/harness: Fix vfork() side effects
selftests/harness: Handle TEST_F()'s explicit exit codes
tools/testing/selftests/kselftest_harness.h | 127 +++++++++++++-----
tools/testing/selftests/landlock/fs_test.c | 83 +++++++-----
tools/testing/selftests/pidfd/config | 2 +
.../selftests/pidfd/pidfd_setns_test.c | 2 +-
4 files changed, 147 insertions(+), 67 deletions(-)
base-commit: e67572cd2204894179d89bd7b984072f19313b03
--
2.45.0
Add support for (yet again) more RVA23U64 missing extensions. Add
support for Zimop, Zcmop, Zca, Zcf, Zcd and Zcb extensions ISA string
parsing, hwprobe and kvm support. Zce, Zcmt and Zcmp extensions have
been left out since they target microcontrollers/embedded CPUs and are
not needed by RVA23U64.
Since the Zc* extensions specification states that C implies Zca, Zcf
(if F and RV32), Zcd (if D), this series modifies the way the ISA string
is parsed, which is now done in two phases. The first phase parses the
string and the second one validates it for the final ISA description.
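
The two phases can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the kernel implementation: the bit values and the function names parse_isa() and resolve_isa() are hypothetical, and only the C, F and D letters are handled:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical extension bits, for illustration only; not the kernel's IDs. */
enum {
	EXT_C   = 1u << 0,
	EXT_F   = 1u << 1,
	EXT_D   = 1u << 2,
	EXT_ZCA = 1u << 3,
	EXT_ZCF = 1u << 4,
	EXT_ZCD = 1u << 5,
};

/* Phase 1: record the single-letter extensions named in the string. */
uint32_t parse_isa(const char *letters)
{
	uint32_t isa = 0;

	for (; *letters; letters++) {
		switch (*letters) {
		case 'c': isa |= EXT_C; break;
		case 'f': isa |= EXT_F; break;
		case 'd': isa |= EXT_D; break;
		default: break;	/* other extensions are ignored in this sketch */
		}
	}
	return isa;
}

/* Phase 2: add what C implies, given the other parsed extensions. */
uint32_t resolve_isa(uint32_t isa, bool is_rv32)
{
	if (isa & EXT_C) {
		isa |= EXT_ZCA;
		if (is_rv32 && (isa & EXT_F))
			isa |= EXT_ZCF;
		if (isa & EXT_D)
			isa |= EXT_ZCD;
	}
	return isa;
}
```

Splitting the work this way lets the second phase see the complete set of parsed extensions before deciding which implied ones to enable.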
Link: https://lore.kernel.org/linux-riscv/20240404103254.1752834-1-cleger@rivosin… [1]
Link: https://lore.kernel.org/all/20240409143839.558784-1-cleger@rivosinc.com/ [2]
---
v6:
- Rebased on riscv/for-next
- Remove ternary operator to use 'if()' instead in extension checks
- v5: https://lore.kernel.org/all/20240517145302.971019-1-cleger@rivosinc.com/
v5:
- Merged in Zimop to avoid any unneeded series dependencies
- Rework dependency resolution loop to loop on source isa first rather
than on all extensions.
- Disabled extensions in source isa once set in resolved isa
- Rename riscv_resolve_isa() parameters
- v4: https://lore.kernel.org/all/20240429150553.625165-1-cleger@rivosinc.com/
v4:
- Modify validate() callbacks to return 0, -EPROBEDEFER or another
error.
- v3: https://lore.kernel.org/all/20240423124326.2532796-1-cleger@rivosinc.com/
v3:
- Fix typo "exists" -> "exist"
- Remove C implies Zca, Zcd, Zcf, dt-bindings rules
- Rework ISA string resolver to handle dependencies
- v2: https://lore.kernel.org/all/20240418124300.1387978-1-cleger@rivosinc.com/
v2:
- Add Zc* dependencies validation in dt-bindings
- v1: https://lore.kernel.org/lkml/20240410091106.749233-1-cleger@rivosinc.com/
Clément Léger (16):
dt-bindings: riscv: add Zimop ISA extension description
riscv: add ISA extension parsing for Zimop
riscv: hwprobe: export Zimop ISA extension
RISC-V: KVM: Allow Zimop extension for Guest/VM
KVM: riscv: selftests: Add Zimop extension to get-reg-list test
dt-bindings: riscv: add Zca, Zcf, Zcd and Zcb ISA extension
description
riscv: add ISA extensions validation callback
riscv: add ISA parsing for Zca, Zcf, Zcd and Zcb
riscv: hwprobe: export Zca, Zcf, Zcd and Zcb ISA extensions
RISC-V: KVM: Allow Zca, Zcf, Zcd and Zcb extensions for Guest/VM
KVM: riscv: selftests: Add some Zc* extensions to get-reg-list test
dt-bindings: riscv: add Zcmop ISA extension description
riscv: add ISA extension parsing for Zcmop
riscv: hwprobe: export Zcmop ISA extension
RISC-V: KVM: Allow Zcmop extension for Guest/VM
KVM: riscv: selftests: Add Zcmop extension to get-reg-list test
Documentation/arch/riscv/hwprobe.rst | 28 ++
.../devicetree/bindings/riscv/extensions.yaml | 95 ++++++
arch/riscv/include/asm/cpufeature.h | 1 +
arch/riscv/include/asm/hwcap.h | 7 +-
arch/riscv/include/uapi/asm/hwprobe.h | 6 +
arch/riscv/include/uapi/asm/kvm.h | 6 +
arch/riscv/kernel/cpufeature.c | 278 ++++++++++++------
arch/riscv/kernel/sys_hwprobe.c | 6 +
arch/riscv/kvm/vcpu_onereg.c | 12 +
.../selftests/kvm/riscv/get-reg-list.c | 24 ++
10 files changed, 375 insertions(+), 88 deletions(-)
--
2.45.1
Add support for (yet again) more RVA23U64 missing extensions. Add
support for Zimop, Zcmop, Zca, Zcf, Zcd and Zcb extensions ISA string
parsing, hwprobe and kvm support. Zce, Zcmt and Zcmp extensions have
been left out since they target microcontrollers/embedded CPUs and are
not needed by RVA23U64.
Since the Zc* extensions specification states that C implies Zca, Zcf
(if F and RV32), Zcd (if D), this series modifies the way the ISA string
is parsed, which is now done in two phases. The first phase parses the
string and the second one validates it for the final ISA description.
Link: https://lore.kernel.org/linux-riscv/20240404103254.1752834-1-cleger@rivosin… [1]
Link: https://lore.kernel.org/all/20240409143839.558784-1-cleger@rivosinc.com/ [2]
---
v5:
- Merged in Zimop to avoid any unneeded series dependencies
- Rework dependency resolution loop to loop on source isa first rather
than on all extensions.
- Disabled extensions in source isa once set in resolved isa
- Rename riscv_resolve_isa() parameters
v4:
- Modify validate() callbacks to return 0, -EPROBEDEFER or another
error.
- v3: https://lore.kernel.org/all/20240423124326.2532796-1-cleger@rivosinc.com/
v3:
- Fix typo "exists" -> "exist"
- Remove C implies Zca, Zcd, Zcf, dt-bindings rules
- Rework ISA string resolver to handle dependencies
- v2: https://lore.kernel.org/all/20240418124300.1387978-1-cleger@rivosinc.com/
v2:
- Add Zc* dependencies validation in dt-bindings
- v1: https://lore.kernel.org/lkml/20240410091106.749233-1-cleger@rivosinc.com/
Clément Léger (16):
dt-bindings: riscv: add Zimop ISA extension description
riscv: add ISA extension parsing for Zimop
riscv: hwprobe: export Zimop ISA extension
RISC-V: KVM: Allow Zimop extension for Guest/VM
KVM: riscv: selftests: Add Zimop extension to get-reg-list test
dt-bindings: riscv: add Zca, Zcf, Zcd and Zcb ISA extension
description
riscv: add ISA extensions validation callback
riscv: add ISA parsing for Zca, Zcf, Zcd and Zcb
riscv: hwprobe: export Zca, Zcf, Zcd and Zcb ISA extensions
RISC-V: KVM: Allow Zca, Zcf, Zcd and Zcb extensions for Guest/VM
KVM: riscv: selftests: Add some Zc* extensions to get-reg-list test
dt-bindings: riscv: add Zcmop ISA extension description
riscv: add ISA extension parsing for Zcmop
riscv: hwprobe: export Zcmop ISA extension
RISC-V: KVM: Allow Zcmop extension for Guest/VM
KVM: riscv: selftests: Add Zcmop extension to get-reg-list test
Documentation/arch/riscv/hwprobe.rst | 28 ++
.../devicetree/bindings/riscv/extensions.yaml | 95 +++++++
arch/riscv/include/asm/cpufeature.h | 26 +-
arch/riscv/include/asm/hwcap.h | 6 +
arch/riscv/include/uapi/asm/hwprobe.h | 6 +
arch/riscv/include/uapi/asm/kvm.h | 6 +
arch/riscv/kernel/cpufeature.c | 244 ++++++++++++------
arch/riscv/kernel/sys_hwprobe.c | 6 +
arch/riscv/kvm/vcpu_onereg.c | 12 +
.../selftests/kvm/riscv/get-reg-list.c | 24 ++
10 files changed, 366 insertions(+), 87 deletions(-)
--
2.43.0
This patch series adds unit tests for the clk fixed rate basic type and
the clk registration functions that use struct clk_parent_data. To get
there, we add support for loading device tree overlays onto the live DTB
along with probing platform drivers to bind to device nodes in the
overlays. With this series, we're able to exercise some of the code in
the common clk framework that uses devicetree lookups to find parents
and the fixed rate clk code that scans device tree directly and creates
clks. Please review.
I Cc'ed everyone on all the patches so they get the full context. I'm
hoping I can take the whole pile through the clk tree as they all build
upon each other. Or the DT part can be merged through the DT tree to
reduce the dependencies.
Changes from v3 (https://lore.kernel.org/r/20230327222159.3509818-1-sboyd@kernel.org):
* No longer depend on Frank's series[1] because it was merged upstream[2]
* Use kunit_add_action_or_reset() to shorten code
* Skip tests properly when CONFIG_OF_OVERLAY isn't set
Changes from v2 (https://lore.kernel.org/r/20230315183729.2376178-1-sboyd@kernel.org):
* Overlays don't depend on __symbols__ node
* Depend on Frank's always create root node if CONFIG_OF series[1]
* Added kernel-doc to KUnit API doc
* Fixed some kernel-doc on functions
* More test cases for fixed rate clk
Changes from v1 (https://lore.kernel.org/r/20230302013822.1808711-1-sboyd@kernel.org):
* Don't depend on UML, use unittest data approach to attach nodes
* Introduce overlay loading API for KUnit
* Move platform_device KUnit code to drivers/base/test
* Use #define macros for constants shared between unit tests and
overlays
* Settle on "test" as a vendor prefix
* Make KUnit wrappers have "_kunit" postfix
[1] https://lore.kernel.org/r/20230317053415.2254616-1-frowand.list@gmail.com
[2] https://lore.kernel.org/r/20240308195737.GA1174908-robh@kernel.org
Stephen Boyd (10):
of: Add test managed wrappers for of_overlay_apply()/of_node_put()
dt-bindings: vendor-prefixes: Add "test" vendor for KUnit and friends
dt-bindings: test: Add KUnit empty node binding
of: Add a KUnit test for overlays and test managed APIs
platform: Add test managed platform_device/driver APIs
dt-bindings: kunit: Add fixed rate clk consumer test
clk: Add test managed clk provider/consumer APIs
clk: Add KUnit tests for clk fixed rate basic type
dt-bindings: clk: Add KUnit clk_parent_data test
clk: Add KUnit tests for clks registered with struct clk_parent_data
Documentation/dev-tools/kunit/api/clk.rst | 10 +
Documentation/dev-tools/kunit/api/index.rst | 21 +
Documentation/dev-tools/kunit/api/of.rst | 13 +
.../dev-tools/kunit/api/platformdevice.rst | 10 +
.../bindings/clock/test,clk-parent-data.yaml | 47 ++
.../bindings/test/test,clk-fixed-rate.yaml | 35 ++
.../devicetree/bindings/test/test,empty.yaml | 30 ++
.../devicetree/bindings/vendor-prefixes.yaml | 2 +
drivers/base/test/Makefile | 3 +
drivers/base/test/platform_kunit-test.c | 140 ++++++
drivers/base/test/platform_kunit.c | 174 +++++++
drivers/clk/.kunitconfig | 2 +
drivers/clk/Kconfig | 9 +
drivers/clk/Makefile | 9 +-
drivers/clk/clk-fixed-rate_test.c | 377 +++++++++++++++
drivers/clk/clk-fixed-rate_test.h | 8 +
drivers/clk/clk_kunit.c | 198 ++++++++
drivers/clk/clk_parent_data_test.h | 10 +
drivers/clk/clk_test.c | 451 +++++++++++++++++-
drivers/clk/kunit_clk_fixed_rate_test.dtso | 19 +
drivers/clk/kunit_clk_parent_data_test.dtso | 28 ++
drivers/of/.kunitconfig | 1 +
drivers/of/Kconfig | 10 +
drivers/of/Makefile | 2 +
drivers/of/kunit_overlay_test.dtso | 9 +
drivers/of/of_kunit.c | 99 ++++
drivers/of/overlay_test.c | 115 +++++
include/kunit/clk.h | 28 ++
include/kunit/of.h | 94 ++++
include/kunit/platform_device.h | 15 +
30 files changed, 1967 insertions(+), 2 deletions(-)
create mode 100644 Documentation/dev-tools/kunit/api/clk.rst
create mode 100644 Documentation/dev-tools/kunit/api/of.rst
create mode 100644 Documentation/dev-tools/kunit/api/platformdevice.rst
create mode 100644 Documentation/devicetree/bindings/clock/test,clk-parent-data.yaml
create mode 100644 Documentation/devicetree/bindings/test/test,clk-fixed-rate.yaml
create mode 100644 Documentation/devicetree/bindings/test/test,empty.yaml
create mode 100644 drivers/base/test/platform_kunit-test.c
create mode 100644 drivers/base/test/platform_kunit.c
create mode 100644 drivers/clk/clk-fixed-rate_test.c
create mode 100644 drivers/clk/clk-fixed-rate_test.h
create mode 100644 drivers/clk/clk_kunit.c
create mode 100644 drivers/clk/clk_parent_data_test.h
create mode 100644 drivers/clk/kunit_clk_fixed_rate_test.dtso
create mode 100644 drivers/clk/kunit_clk_parent_data_test.dtso
create mode 100644 drivers/of/kunit_overlay_test.dtso
create mode 100644 drivers/of/of_kunit.c
create mode 100644 drivers/of/overlay_test.c
create mode 100644 include/kunit/clk.h
create mode 100644 include/kunit/of.h
create mode 100644 include/kunit/platform_device.h
base-commit: 4cece764965020c22cff7665b18a012006359095
--
https://git.kernel.org/pub/scm/linux/kernel/git/clk/linux.git/
https://git.kernel.org/pub/scm/linux/kernel/git/sboyd/spmi.git
From: Geliang Tang <tanggeliang(a)kylinos.cn>
This patchset contains some fixes and improvements for test_sockmap.
3-5: switching attachments to bpf_link as Jakub suggested in [1].
1-2, 6-8: Small fixes.
[1]
https://lore.kernel.org/bpf/87zfsiw3a3.fsf@cloudflare.com/
Geliang Tang (8):
selftests/bpf: Fix tx_prog_fd values in test_sockmap
selftests/bpf: Drop duplicate definition of i in test_sockmap
selftests/bpf: Use bpf_link attachments in test_sockmap
selftests/bpf: Replace tx_prog_fd with tx_prog in test_sockmap
selftests/bpf: Drop prog_fd array in test_sockmap
selftests/bpf: Fix size of map_fd in test_sockmap
selftests/bpf: Check length of recv in test_sockmap
selftests/bpf: Drop duplicate bpf_map_lookup_elem in test_sockmap
.../selftests/bpf/progs/test_sockmap_kern.h | 3 -
tools/testing/selftests/bpf/test_sockmap.c | 101 +++++++++---------
2 files changed, 51 insertions(+), 53 deletions(-)
--
2.43.0
Adapt the current test-livepatch.sh script to account for the number of
applied livepatches and ensure that an atomic replace livepatch disables
all previously applied livepatches.
Signed-off-by: Marcos Paulo de Souza <mpdesouza(a)suse.com>
---
Changes since v1:
* Added checks in the existing test-livepatch.sh instead of creating a
new test file. (Joe)
* Fixed issues reported by ShellCheck (Joe)
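The module-count check described above can be sketched in isolation. This is
a minimal sketch, not the code from the patch: count_entries is a
hypothetical helper name, and it is written in POSIX sh rather than with the
bash array the patch uses. Unlike a plain glob count, it reports zero for an
empty directory, where an unmatched glob would otherwise stay literal and be
counted once.
```shell
# count_entries DIR: print how many entries the glob DIR/* matches.
# Hypothetical helper for illustration; not part of the livepatch selftests.
count_entries() (
    # The body runs in a subshell, so replacing the positional
    # parameters here does not affect the caller.
    set -- "$1"/*
    # An unmatched glob is left as the literal pattern, which does not exist.
    if [ -e "$1" ] || [ -L "$1" ]; then
        echo "$#"
    else
        echo 0
    fi
)
```
With the modules from this patch loaded, count_entries /sys/kernel/livepatch
would be expected to print 3 before the atomic replace and 1 after it,
matching the two checks in the diff below.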
---
.../testing/selftests/livepatch/test-livepatch.sh | 46 ++++++++++++++++++++--
1 file changed, 42 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/livepatch/test-livepatch.sh b/tools/testing/selftests/livepatch/test-livepatch.sh
index e3455a6b1158..d85405d18e54 100755
--- a/tools/testing/selftests/livepatch/test-livepatch.sh
+++ b/tools/testing/selftests/livepatch/test-livepatch.sh
@@ -107,9 +107,12 @@ livepatch: '$MOD_LIVEPATCH': unpatching complete
# - load a livepatch that modifies the output from /proc/cmdline and
# verify correct behavior
-# - load an atomic replace livepatch and verify that only the second is active
-# - remove the first livepatch and verify that the atomic replace livepatch
-# is still active
+# - load two additional livepatches and check the number of livepatch modules
+# applied
+# - load an atomic replace livepatch and check that the other three modules were
+# disabled
+# - remove all livepatches besides the atomic replace one and verify that the
+# atomic replace livepatch is still active
# - remove the atomic replace livepatch and verify that none are active
start_test "atomic replace livepatch"
@@ -119,12 +122,31 @@ load_lp $MOD_LIVEPATCH
grep 'live patched' /proc/cmdline > /dev/kmsg
grep 'live patched' /proc/meminfo > /dev/kmsg
+for mod in test_klp_syscall test_klp_callbacks_demo; do
+ load_lp $mod
+done
+
+mods=(/sys/kernel/livepatch/*)
+nmods=${#mods[@]}
+if [ "$nmods" -ne 3 ]; then
+ die "Expecting three modules listed, found $nmods"
+fi
+
load_lp $MOD_REPLACE replace=1
grep 'live patched' /proc/cmdline > /dev/kmsg
grep 'live patched' /proc/meminfo > /dev/kmsg
-unload_lp $MOD_LIVEPATCH
+mods=(/sys/kernel/livepatch/*)
+nmods=${#mods[@]}
+if [ "$nmods" -ne 1 ]; then
+ die "Expecting only one module listed, found $nmods"
+fi
+
+# These modules were disabled by the atomic replace
+for mod in test_klp_callbacks_demo test_klp_syscall $MOD_LIVEPATCH; do
+ unload_lp "$mod"
+done
grep 'live patched' /proc/cmdline > /dev/kmsg
grep 'live patched' /proc/meminfo > /dev/kmsg
@@ -142,6 +164,20 @@ livepatch: '$MOD_LIVEPATCH': starting patching transition
livepatch: '$MOD_LIVEPATCH': completing patching transition
livepatch: '$MOD_LIVEPATCH': patching complete
$MOD_LIVEPATCH: this has been live patched
+% insmod test_modules/test_klp_syscall.ko
+livepatch: enabling patch 'test_klp_syscall'
+livepatch: 'test_klp_syscall': initializing patching transition
+livepatch: 'test_klp_syscall': starting patching transition
+livepatch: 'test_klp_syscall': completing patching transition
+livepatch: 'test_klp_syscall': patching complete
+% insmod test_modules/test_klp_callbacks_demo.ko
+livepatch: enabling patch 'test_klp_callbacks_demo'
+livepatch: 'test_klp_callbacks_demo': initializing patching transition
+test_klp_callbacks_demo: pre_patch_callback: vmlinux
+livepatch: 'test_klp_callbacks_demo': starting patching transition
+livepatch: 'test_klp_callbacks_demo': completing patching transition
+test_klp_callbacks_demo: post_patch_callback: vmlinux
+livepatch: 'test_klp_callbacks_demo': patching complete
% insmod test_modules/$MOD_REPLACE.ko replace=1
livepatch: enabling patch '$MOD_REPLACE'
livepatch: '$MOD_REPLACE': initializing patching transition
@@ -149,6 +185,8 @@ livepatch: '$MOD_REPLACE': starting patching transition
livepatch: '$MOD_REPLACE': completing patching transition
livepatch: '$MOD_REPLACE': patching complete
$MOD_REPLACE: this has been live patched
+% rmmod test_klp_callbacks_demo
+% rmmod test_klp_syscall
% rmmod $MOD_LIVEPATCH
$MOD_REPLACE: this has been live patched
% echo 0 > /sys/kernel/livepatch/$MOD_REPLACE/enabled
---
base-commit: 6d69b6c12fce479fde7bc06f686212451688a102
change-id: 20240525-lp-atomic-replace-90b33ed018dc
Best regards,
--
Marcos Paulo de Souza <mpdesouza(a)suse.com>
Fixed MAC addresses help with debugging, as the last four bytes identify the
network namespace.
Signed-off-by: Lukasz Majewski <lukma(a)denx.de>
---
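The addresses assigned in the diff below follow a regular pattern,
00:11:22:00:<namespace index>:<interface index>. As a minimal sketch of that
pattern, assuming a hypothetical helper named hsr_mac (the patch itself
writes each address out literally), the same commands could be generated as
a dry run:
```shell
# hsr_mac NS IFACE: print a fixed MAC address following the pattern
# 00:11:22:00:<namespace index>:<interface index>.
# Hypothetical helper; not part of hsr_ping.sh.
hsr_mac() {
    printf '00:11:22:00:%02x:%02x\n' "$1" "$2"
}

# Dry run: print (do not execute) the ip commands for three namespaces
# with two interfaces each.
for ns in 1 2 3; do
    for ifc in 1 2; do
        echo "ip -net \"\$ns$ns\" link set address $(hsr_mac "$ns" "$ifc") dev ns${ns}eth${ifc}"
    done
done
```
Printing the commands rather than running them keeps the sketch usable
without root privileges or existing namespaces.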
tools/testing/selftests/net/hsr/hsr_ping.sh | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/tools/testing/selftests/net/hsr/hsr_ping.sh b/tools/testing/selftests/net/hsr/hsr_ping.sh
index 3684b813b0f6..f5d207fc770a 100755
--- a/tools/testing/selftests/net/hsr/hsr_ping.sh
+++ b/tools/testing/selftests/net/hsr/hsr_ping.sh
@@ -152,6 +152,15 @@ setup_hsr_interfaces()
ip -net "$ns3" addr add 100.64.0.3/24 dev hsr3
ip -net "$ns3" addr add dead:beef:1::3/64 dev hsr3 nodad
+ ip -net "$ns1" link set address 00:11:22:00:01:01 dev ns1eth1
+ ip -net "$ns1" link set address 00:11:22:00:01:02 dev ns1eth2
+
+ ip -net "$ns2" link set address 00:11:22:00:02:01 dev ns2eth1
+ ip -net "$ns2" link set address 00:11:22:00:02:02 dev ns2eth2
+
+ ip -net "$ns3" link set address 00:11:22:00:03:01 dev ns3eth1
+ ip -net "$ns3" link set address 00:11:22:00:03:02 dev ns3eth2
+
# All Links up
ip -net "$ns1" link set ns1eth1 up
ip -net "$ns1" link set ns1eth2 up
--
2.20.1
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
The function tracer is tested to see if pid filtering works. Add a test to
test function_graph tracer as well, but only if the function_graph tracer
is enabled for the top level or instance.
Cc: linux-kselftest(a)vger.kernel.org
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
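The "only if the function_graph tracer is enabled" condition from the commit
message boils down to a lookup in the available_tracers file. A minimal
sketch of such a lookup follows; tracer_available is a hypothetical helper
name, and the patch itself uses a plain "grep -s" instead. The sketch matches
whole words only, so a lookup for "function" does not succeed merely because
"function_graph" is listed.
```shell
# tracer_available NAME [FILE]: succeed if NAME appears as a whole word in
# FILE (default: available_tracers in the current directory).
# Hypothetical helper; not part of the ftrace selftests.
tracer_available() {
    # Put each space-separated tracer name on its own line, then require
    # an exact, fixed-string match of the full line.
    tr ' ' '\n' < "${2:-available_tracers}" | grep -qxF -- "$1"
}
```
Inside the test it could be used as
"if tracer_available function_graph; then do_test function_graph; fi".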
.../ftrace/test.d/ftrace/func-filter-pid.tc | 27 +++++++++++++++----
1 file changed, 22 insertions(+), 5 deletions(-)
diff --git a/tools/testing/selftests/ftrace/test.d/ftrace/func-filter-pid.tc b/tools/testing/selftests/ftrace/test.d/ftrace/func-filter-pid.tc
index 2f7211254529..c6fc9d31a496 100644
--- a/tools/testing/selftests/ftrace/test.d/ftrace/func-filter-pid.tc
+++ b/tools/testing/selftests/ftrace/test.d/ftrace/func-filter-pid.tc
@@ -14,6 +14,11 @@ if [ ! -f options/function-fork ]; then
echo "no option for function-fork found. Option will not be tested."
fi
+if [ ! -f options/funcgraph-proc ]; then
+ do_funcgraph_proc=0
+ echo "no option for funcgraph-proc found. Option will not be tested."
+fi
+
read PID _ < /proc/self/stat
if [ $do_function_fork -eq 1 ]; then
@@ -21,12 +26,18 @@ if [ $do_function_fork -eq 1 ]; then
orig_value=`grep function-fork trace_options`
fi
+if [ $do_funcgraph_proc -eq 1 ]; then
+ orig_value2=`cat options/funcgraph-proc`
+fi
+
do_reset() {
- if [ $do_function_fork -eq 0 ]; then
- return
+ if [ $do_function_fork -eq 1 ]; then
+ echo $orig_value > trace_options
fi
- echo $orig_value > trace_options
+ if [ $do_funcgraph_proc -eq 1 ]; then
+ echo $orig_value2 > options/funcgraph-proc
+ fi
}
fail() { # msg
@@ -36,13 +47,15 @@ fail() { # msg
}
do_test() {
+ TRACER=$1
+
disable_tracing
echo do_execve* > set_ftrace_filter
echo $FUNCTION_FORK >> set_ftrace_filter
echo $PID > set_ftrace_pid
- echo function > current_tracer
+ echo $TRACER > current_tracer
if [ $do_function_fork -eq 1 ]; then
# don't allow children to be traced
@@ -82,7 +95,11 @@ do_test() {
fi
}
-do_test
+do_test function
+if grep -s function_graph available_tracers; then
+ do_test function_graph
+fi
+
do_reset
exit 0
--
2.43.0
From: Matteo Croce <teknoraver(a)meta.com>
Some programs need to know the size of the network buffers to operate
correctly. Export the following sysctls read-only in network namespaces:
- net.core.rmem_default
- net.core.rmem_max
- net.core.wmem_default
- net.core.wmem_max
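A program that wants these values can read them from procfs. The following
is a minimal sketch, not the selftest added by this series: sysctl_path and
report_sysctl are hypothetical helper names, and the only assumption is the
usual mapping of a dotted sysctl name to a path under /proc/sys.
```shell
# sysctl_path NAME: map a dotted sysctl name to its path under /proc/sys.
sysctl_path() {
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr . /)"
}

# report_sysctl NAME: print the value, or a note when it is not exposed
# in the current environment.
report_sysctl() {
    path=$(sysctl_path "$1")
    if [ -r "$path" ]; then
        echo "$1 = $(cat "$path")"
    else
        echo "$1 is not readable here"
    fi
}

for name in net.core.rmem_default net.core.rmem_max \
            net.core.wmem_default net.core.wmem_max; do
    report_sysctl "$name"
done
```
Running the script inside a network namespace, for example through
"ip netns exec", would show whether the four values are visible there.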
Matteo Croce (2):
net: make net.core.{r,w}mem_{default,max} namespaced
selftests: net: tests net.core.{r,w}mem_{default,max} sysctls in a
netns
changes from v1:
- added SPDX header to test
- rewrite test with more detailed error messages
net/core/sysctl_net_core.c | 75 ++++++++++++---------
tools/testing/selftests/net/Makefile | 1 +
tools/testing/selftests/net/netns-sysctl.sh | 40 +++++++++++
3 files changed, 83 insertions(+), 33 deletions(-)
create mode 100755 tools/testing/selftests/net/netns-sysctl.sh
--
2.45.1
Add CFcommon.<arch> files for the architecture-specific configuration that
rcutorture needs.
According to [1] and [2], this patch
Fixes: a6fda6dab93c ("rcutorture: Tweak kvm options") by moving the
x86-specific kernel option CONFIG_HYPERVISOR_GUEST to CFcommon.x86.
[1] https://lore.kernel.org/all/20240427005626.1365935-1-zhouzhouyi@gmail.com/
[2] https://lore.kernel.org/all/059d36ce-6453-42be-a31e-895abd35d590@paulmck-la…
Tested in x86_64 and PPC VM of Open Source Lab of Oregon State University.
Signed-off-by: Zhouyi Zhou <zhouzhouyi(a)gmail.com>
---
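The architecture selection described in the commit message can be sketched
as a small mapping from the "uname -m" value to a fragment suffix. This is a
minimal sketch under stated assumptions: arch_fragment_suffix is a
hypothetical helper name, only the x86 mapping comes from the patch, and the
case patterns are stricter than the "grep -q 86" test the patch uses.
```shell
# arch_fragment_suffix MACHINE: print the CFcommon suffix for a "uname -m"
# value, or nothing when no architecture-specific fragment is defined.
# Hypothetical helper; not part of kvm-test-1-run.sh.
arch_fragment_suffix() {
    case "$1" in
        x86_64|i?86) echo x86 ;;
        *) ;;   # other architectures: no extra fragment yet
    esac
}

suffix=$(arch_fragment_suffix "$(uname -m)")
if [ -n "$suffix" ]; then
    echo "would also apply CFcommon.$suffix"
else
    echo "no architecture-specific fragment for $(uname -m)"
fi
```
Supporting another architecture would then amount to adding one case
pattern and the matching CFcommon file.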
Hi Paul,
I tried very hard to find a way in the Linux kernel to dig out
the x86-specific kernel option CONFIG_HYPERVISOR_GUEST before configcheck.sh
generates ConfigFragment.diags.
I could only find this functionality in scripts/kconfig/conf, which traverses
the Kconfig hierarchy.
But the output of scripts/kconfig/conf, which is .config,
is also one of the inputs of configcheck.sh:
```
kvm-recheck.sh: configcheck.sh $i/.config $i/ConfigFragment > $i/ConfigFragment.diags 2>&1
```
I sense a logical paradox in it ;-)
So I picked the simplest way.
One more thing: a recent change in include/linux/bitmap.h causes the
allmodconfig build to fail because of a warning on both x86 platforms. I am
going to look into it.
Thank you for your guidance
Zhouyi
--
tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh | 9 +++++++++
tools/testing/selftests/rcutorture/configs/rcu/CFcommon | 1 -
.../selftests/rcutorture/configs/rcu/CFcommon.x86 | 1 +
3 files changed, 10 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/rcutorture/configs/rcu/CFcommon.x86
diff --git a/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh b/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh
index b33cd8753689..5332224238ba 100755
--- a/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh
+++ b/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh
@@ -62,6 +62,15 @@ config_override_param () {
}
echo > $T/KcList
+if uname -m | grep -q 86
+# TODO: add other architecture-specific common configuration when needed
+then
+ if test -f $config_dir/CFcommon.x86
+ then
+ config_override_param "$config_dir/CFcommon.x86" KcList\
+ "`cat $config_dir/CFcommon.x86 2> /dev/null`"
+ fi
+fi
config_override_param "$config_dir/CFcommon" KcList "`cat $config_dir/CFcommon 2> /dev/null`"
config_override_param "$config_template" KcList "`cat $config_template 2> /dev/null`"
config_override_param "--gdb options" KcList "$TORTURE_KCONFIG_GDB_ARG"
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/CFcommon b/tools/testing/selftests/rcutorture/configs/rcu/CFcommon
index 0e92d85313aa..cf0387ae5358 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/CFcommon
+++ b/tools/testing/selftests/rcutorture/configs/rcu/CFcommon
@@ -1,6 +1,5 @@
CONFIG_RCU_TORTURE_TEST=y
CONFIG_PRINTK_TIME=y
-CONFIG_HYPERVISOR_GUEST=y
CONFIG_PARAVIRT=y
CONFIG_KVM_GUEST=y
CONFIG_KCSAN_ASSUME_PLAIN_WRITES_ATOMIC=n
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/CFcommon.x86 b/tools/testing/selftests/rcutorture/configs/rcu/CFcommon.x86
new file mode 100644
index 000000000000..2770560d56a0
--- /dev/null
+++ b/tools/testing/selftests/rcutorture/configs/rcu/CFcommon.x86
@@ -0,0 +1 @@
+CONFIG_HYPERVISOR_GUEST=y
--
2.25.1
Hi,
Here's a few fixes that are part of my effort to get all selftests
building cleanly under clang. Plus one that I noticed by inspection.
Changes since v2:
1) Added a sentence to the .PHONY patch, to show that it is removing
duplicate code.
2) Added the actual clang warning output to the commit description.
Changes since the first version:
1) Rebased onto Linux 6.10-rc1
2) Added Reviewed-by's.
...and it turns out that all three patches are still required, on -rc1,
in order to get a clean clang build.
Enjoy!
thanks,
John Hubbard
John Hubbard (3):
selftests/futex: don't redefine .PHONY targets (all, clean)
selftests/futex: don't pass a const char* to asprintf(3)
selftests/futex: pass _GNU_SOURCE without a value to the compiler
tools/testing/selftests/futex/Makefile | 2 --
tools/testing/selftests/futex/functional/Makefile | 2 +-
tools/testing/selftests/futex/functional/futex_requeue_pi.c | 2 +-
3 files changed, 2 insertions(+), 4 deletions(-)
base-commit: b050496579632f86ee1ef7e7501906db579f3457
--
2.45.1
The purpose of this series is to rethink how HID-BPF is invoked.
Currently it implies a jmp table, a prog fd bpf_map, a preloaded tracing
bpf program and a lot of manual work for handling the bpf program
lifetime and addition/removal.
OTOH, bpf_struct_ops take care of most of the bpf handling leaving us
with a simple list of ops pointers, and we can directly call the
struct_ops program from the kernel as a regular function.
The net gain right now is in terms of code simplicity and lines of code
removed (though it is an API breakage), but udev-hid-bpf is able to handle
such breakages.
In the near future, we will be able to extend the HID-BPF struct_ops
with entrypoints for hid_hw_raw_request() and hid_hw_output_report(),
allowing for covering all of the initial use cases:
- firewalling a HID device
- fixing all of the HID device interactions (not just device events as
it is right now).
The matching user-space loader (udev-hid-bpf) MR is at
https://gitlab.freedesktop.org/libevdev/udev-hid-bpf/-/merge_requests/86
I'll put it out of draft once this is merged.
Cheers,
Benjamin
Signed-off-by: Benjamin Tissoires <bentiss(a)kernel.org>
---
Benjamin Tissoires (13):
HID: rename struct hid_bpf_ops into hid_ops
HID: bpf: add hid_get/put_device() helpers
HID: bpf: implement HID-BPF through bpf_struct_ops
selftests/hid: convert the hid_bpf selftests with struct_ops
HID: samples: convert the 2 HID-BPF samples into struct_ops
HID: bpf: add defines for HID-BPF SEC in in-tree bpf fixes
HID: bpf: convert in-tree fixes into struct_ops
HID: bpf: remove tracing HID-BPF capability
selftests/hid: add subprog call test
Documentation: HID: amend HID-BPF for struct_ops
Documentation: HID: add a small blurb on udev-hid-bpf
HID: bpf: Artist24: remove unused variable
HID: bpf: error on warnings when compiling bpf objects
Documentation/hid/hid-bpf.rst | 162 +++---
drivers/hid/bpf/Makefile | 2 +-
drivers/hid/bpf/entrypoints/Makefile | 93 ----
drivers/hid/bpf/entrypoints/README | 4 -
drivers/hid/bpf/entrypoints/entrypoints.bpf.c | 25 -
drivers/hid/bpf/entrypoints/entrypoints.lskel.h | 248 ---------
drivers/hid/bpf/hid_bpf_dispatch.c | 266 +++-------
drivers/hid/bpf/hid_bpf_dispatch.h | 12 +-
drivers/hid/bpf/hid_bpf_jmp_table.c | 565 ---------------------
drivers/hid/bpf/hid_bpf_struct_ops.c | 246 +++++++++
drivers/hid/bpf/progs/FR-TEC__Raptor-Mach-2.bpf.c | 9 +-
drivers/hid/bpf/progs/HP__Elite-Presenter.bpf.c | 6 +-
drivers/hid/bpf/progs/Huion__Kamvas-Pro-19.bpf.c | 9 +-
.../hid/bpf/progs/IOGEAR__Kaliber-MMOmentum.bpf.c | 6 +-
drivers/hid/bpf/progs/Makefile | 2 +-
.../hid/bpf/progs/Microsoft__XBox-Elite-2.bpf.c | 6 +-
drivers/hid/bpf/progs/Wacom__ArtPen.bpf.c | 6 +-
drivers/hid/bpf/progs/XPPen__Artist24.bpf.c | 10 +-
drivers/hid/bpf/progs/XPPen__ArtistPro16Gen2.bpf.c | 24 +-
drivers/hid/bpf/progs/hid_bpf.h | 5 +
drivers/hid/hid-core.c | 6 +-
include/linux/hid_bpf.h | 109 ++--
samples/hid/Makefile | 5 +-
samples/hid/hid_bpf_attach.bpf.c | 18 -
samples/hid/hid_bpf_attach.h | 14 -
samples/hid/hid_mouse.bpf.c | 26 +-
samples/hid/hid_mouse.c | 39 +-
samples/hid/hid_surface_dial.bpf.c | 10 +-
samples/hid/hid_surface_dial.c | 53 +-
tools/testing/selftests/hid/hid_bpf.c | 100 +++-
tools/testing/selftests/hid/progs/hid.c | 100 +++-
31 files changed, 744 insertions(+), 1442 deletions(-)
---
base-commit: 70ec81c2e2b4005465ad0d042e90b36087c36104
change-id: 20240513-hid_bpf_struct_ops-e3212a224555
Best regards,
--
Benjamin Tissoires <bentiss(a)kernel.org>
Here are a couple of patches to fix issues related to the running environment
and kernel configuration.
Thank you,
---
Masami Hiramatsu (Google) (2):
selftests/tracing: Fix event filter test to retry up to 10 times
selftests/tracing: Fix to check the required syscall event
.../ftrace/test.d/dynevent/test_duplicates.tc | 2 +-
.../ftrace/test.d/filter/event-filter-function.tc | 20 +++++++++++++++++++-
2 files changed, 20 insertions(+), 2 deletions(-)
--
Masami Hiramatsu (Google) <mhiramat(a)kernel.org>
The kselftests may be built in a couple different ways:
make LLVM=1
make CC=clang
In order to handle both cases, set LLVM=1 if CC=clang. That way, the rest
of lib.mk, and any Makefiles that include lib.mk, can base decisions
solely on whether or not LLVM is set.
Then, build upon that to disable a pair of clang warnings that are
already silenced on gcc.
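The normalization described above can be illustrated outside of make. The
following is a minimal sketch written in shell purely for illustration; the
series implements the real logic in make, inside lib.mk, and effective_llvm
is a hypothetical helper name.
```shell
# effective_llvm CC LLVM: print the LLVM value that later build logic could
# rely on, given the user's CC and LLVM settings.
# Hypothetical helper; not the lib.mk implementation.
effective_llvm() {
    if [ -n "$2" ]; then
        echo "$2"        # LLVM was set explicitly: keep it
    elif [ "$1" = clang ]; then
        echo 1           # CC=clang implies LLVM=1
    fi
    # Otherwise print nothing: LLVM stays unset.
}
```
Later decisions, such as disabling clang-only warnings, then need to test a
single value instead of both variables.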
Doing it this way is much better than the piecemeal approach that I
started with in [1] and [2]. Thanks to Nathan Chancellor for the patch
reviews that led to this approach.
[1] https://lore.kernel.org/20240527214704.300444-1-jhubbard@nvidia.com
[2] https://lore.kernel.org/20240527213641.299458-1-jhubbard@nvidia.com
John Hubbard (2):
selftests/lib.mk: handle both LLVM=1 and CC=clang builds
selftests/lib.mk: silence some clang warnings that gcc already ignores
tools/testing/selftests/lib.mk | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
base-commit: e0cce98fe279b64f4a7d81b7f5c3a23d80b92fbc
--
2.45.1
Hi all,
This series makes a number of cleanups to resctrl_val() and
generalizes it by removing test name specific handling from the
function.
One of the changes improves MBA/MBM measurement by narrowing down the
period over which the resctrl FS derived memory bandwidth numbers are
measured. My feeling is that it didn't cause a noticeable difference in
the numbers because they're generally good anyway, except for a small
number of outliers. To see the impact on outliers, I'd need to set up a
test to run a large number of replications and do a statistical analysis,
which I've not spent my time on. Even without the statistical analysis,
the new way to measure seems obviously better and makes sense even if I
cannot see a major improvement with the setup I'm using.
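The idea of measuring only across the sleep can be sketched as two counter
reads placed directly around it. This is a minimal sketch, not the selftest
code: counter_delta is a hypothetical helper name, and FILE stands in for
any file holding a monotonically increasing byte counter.
```shell
# counter_delta FILE SECONDS: read a counter from FILE, sleep, read it
# again, and print the difference.
# Hypothetical helper; not part of the resctrl selftests.
counter_delta() (
    start=$(cat "$1")     # first sample, taken immediately before the sleep
    sleep "$2"
    end=$(cat "$1")       # second sample, taken immediately after the sleep
    echo $((end - start))
)
```
Keeping setup work, such as locating the counter file, outside the two
reads means the measured interval covers little more than the sleep itself.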
v4:
- Merged close fix into IMC READ+WRITE rework patch
- Add loop to reset imc_counters_config fds to -1 to be able to know which
need closing
- Introduce perf_close_imc_mem_bw() to close fds
- Open resctrl mem bw file (twice) beforehand to avoid opening it during
the test
- Remove MBM .mongrp setup
- Remove mongrp from CMT test
v3:
- Rename init functions to <testname>_init()
- Replace for loops with READ+WRITE statements for clarity
- Don't drop Return: entry from perf_open_imc_mem_bw() func comment
- New patch: Fix closing of IMC fds in case of error
- New patch: Make "bandwidth" consistent in comments & prints
- New patch: Simplify mem bandwidth file code
- Remove wrong comment
- Changed grp_name check to return -1 on fail (internal sanity check)
v2:
- Resolved conflicts with kselftest/next
- Spaces -> tabs correction
Ilpo Järvinen (16):
selftests/resctrl: Fix closing IMC fds on error and open-code R+W
instead of loops
selftests/resctrl: Calculate resctrl FS derived mem bw over sleep(1)
only
selftests/resctrl: Make "bandwidth" consistent in comments & prints
selftests/resctrl: Consolidate get_domain_id() into resctrl_val()
selftests/resctrl: Use correct type for pids
selftests/resctrl: Cleanup bm_pid and ppid usage & limit scope
selftests/resctrl: Rename measure_vals() to measure_mem_bw_vals() &
document
selftests/resctrl: Simplify mem bandwidth file code for MBA & MBM
tests
selftests/resctrl: Add ->measure() callback to resctrl_val_param
selftests/resctrl: Add ->init() callback into resctrl_val_param
selftests/resctrl: Simplify bandwidth report type handling
selftests/resctrl: Make some strings passed to resctrlfs functions
const
selftests/resctrl: Convert ctrlgrp & mongrp to pointers
selftests/resctrl: Remove mongrp from MBA test
selftests/resctrl: Remove mongrp from CMT test
selftests/resctrl: Remove test name comparing from
write_bm_pid_to_resctrl()
tools/testing/selftests/resctrl/cache.c | 6 +-
tools/testing/selftests/resctrl/cat_test.c | 5 +-
tools/testing/selftests/resctrl/cmt_test.c | 22 +-
tools/testing/selftests/resctrl/mba_test.c | 26 +-
tools/testing/selftests/resctrl/mbm_test.c | 26 +-
tools/testing/selftests/resctrl/resctrl.h | 49 ++-
tools/testing/selftests/resctrl/resctrl_val.c | 362 ++++++++----------
tools/testing/selftests/resctrl/resctrlfs.c | 64 ++--
8 files changed, 287 insertions(+), 273 deletions(-)
--
2.39.2
Hi,
Just a bunch of fixes for build errors and warnings that show up when
building with clang. Some of these depend on each other, so
I'm sending them as a series.
Changes since the first version:
1) Rebased onto Linux 6.10-rc1
Enjoy!
thanks,
John Hubbard
John Hubbard (6):
selftests/x86: build test_FISTTP.c with clang
selftests/x86: build fsgsbase_restore.c with clang
selftests/x86: build sysret_rip.c with clang
selftests/x86: avoid -no-pie warnings from clang during compilation
selftests/x86: remove (or use) unused variables and functions
selftests/x86: fix printk warnings reported by clang
tools/testing/selftests/x86/Makefile | 10 +++++++
tools/testing/selftests/x86/amx.c | 16 -----------
.../testing/selftests/x86/clang_helpers_32.S | 11 ++++++++
.../testing/selftests/x86/clang_helpers_64.S | 28 +++++++++++++++++++
tools/testing/selftests/x86/fsgsbase.c | 6 ----
.../testing/selftests/x86/fsgsbase_restore.c | 11 ++++----
tools/testing/selftests/x86/sigreturn.c | 2 +-
.../testing/selftests/x86/syscall_arg_fault.c | 1 -
tools/testing/selftests/x86/sysret_rip.c | 20 ++++---------
tools/testing/selftests/x86/test_FISTTP.c | 8 +++---
tools/testing/selftests/x86/test_vsyscall.c | 15 ++++------
tools/testing/selftests/x86/vdso_restorer.c | 2 ++
12 files changed, 72 insertions(+), 58 deletions(-)
create mode 100644 tools/testing/selftests/x86/clang_helpers_32.S
create mode 100644 tools/testing/selftests/x86/clang_helpers_64.S
base-commit: 2bfcfd584ff5ccc8bb7acde19b42570414bf880b
prerequisite-patch-id: 39d606b9b165077aa1a3a3b0a3b396dba0c20070
--
2.45.1
Hi,
Here's a few fixes that are part of my effort to get all selftests
building cleanly under clang. Plus one that I noticed by inspection.
Changes since the first version:
1) Rebased onto Linux 6.10-rc1
2) Added Reviewed-by's.
...and it turns out that all three patches are still required, on -rc1,
in order to get a clean clang build.
Enjoy!
thanks,
John Hubbard
John Hubbard (3):
selftests/futex: don't redefine .PHONY targets (all, clean)
selftests/futex: don't pass a const char* to asprintf(3)
selftests/futex: pass _GNU_SOURCE without a value to the compiler
tools/testing/selftests/futex/Makefile | 2 --
tools/testing/selftests/futex/functional/Makefile | 2 +-
tools/testing/selftests/futex/functional/futex_requeue_pi.c | 2 +-
3 files changed, 2 insertions(+), 4 deletions(-)
base-commit: e0cce98fe279b64f4a7d81b7f5c3a23d80b92fbc
--
2.45.1
The MBM (Memory Bandwidth Monitoring) and MBA (Memory Bandwidth Allocation)
features are not enabled for AMD systems. The reason was lack of perf
counters to compare the resctrl test results.
Starting with the commit
25e56847821f ("perf/x86/amd/uncore: Add memory controller support"), AMD
now supports the UMC (Unified Memory Controller) perf events. These events
can be used to compare the test results.
This series adds the support to detect the UMC events and enable MBM/MBA
tests for AMD systems.
Babu Moger (4):
selftests/resctrl: Rename variable imcs and num_of_imcs() to generic
names
selftests/resctrl: Pass sysfs controller name of the vendor
selftests/resctrl: Add support for MBM and MBA tests on AMD
selftests/resctrl: Skip the tests if iMC/UMC counters are unavailable
tools/testing/selftests/resctrl/resctrl.h | 1 +
.../testing/selftests/resctrl/resctrl_tests.c | 16 ++-
tools/testing/selftests/resctrl/resctrl_val.c | 105 ++++++++++++++----
3 files changed, 96 insertions(+), 26 deletions(-)
--
2.34.1
Currently if we request a feature that is not set in the Kernel
config we fail silently and return the available features. However, the
documentation indicates we should return an EINVAL.
We need to fix this issue since we can end up with a Kernel warning
should a program request the feature UFFD_FEATURE_WP_UNPOPULATED on
a kernel with the config not set with this feature.
Signed-off-by: Audra Mitchell <audra(a)redhat.com>
---
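The check added by this patch rejects a request when it contains any
feature bit that is not in the supported set. The bit arithmetic can be
sketched in shell as follows. This is a minimal sketch: check_features is a
hypothetical helper name, and the bit values are made up for the example
rather than being the real UFFD_FEATURE_* constants.
```shell
# check_features REQUESTED SUPPORTED: succeed only if every bit set in
# REQUESTED is also set in SUPPORTED.
# Hypothetical helper; the bit values below are illustrative only.
check_features() {
    [ $(( $1 & ~$2 )) -eq 0 ]
}

supported=$(( 0x1 | 0x2 | 0x4 ))

if check_features $(( 0x1 | 0x4 )) "$supported"; then
    echo "accepted"
fi
if ! check_features $(( 0x1 | 0x8 )) "$supported"; then
    echo "rejected (the kernel patch returns -EINVAL in this case)"
fi
```
The second request fails because bit 0x8 is not part of the supported mask,
which mirrors the "features & ~uffdio_api.features" test in the diff.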
fs/userfaultfd.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 60dcfafdc11a..17210558de79 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -2073,6 +2073,11 @@ static int userfaultfd_api(struct userfaultfd_ctx *ctx,
uffdio_api.features &= ~UFFD_FEATURE_WP_UNPOPULATED;
uffdio_api.features &= ~UFFD_FEATURE_WP_ASYNC;
#endif
+
+ ret = -EINVAL;
+ if (features & ~uffdio_api.features)
+ goto err_out;
+
uffdio_api.ioctls = UFFD_API_IOCTLS;
ret = -EFAULT;
if (copy_to_user(buf, &uffdio_api, sizeof(uffdio_api)))
--
2.44.0
When building with clang via:
make LLVM=1 -C tools/testing/selftests
two distinct failures occur:
1) gcc requires -static-libasan in order to ensure that Address
Sanitizer's library is the first one loaded. However, this leads to
build failures on clang, when building via:
make LLVM=1 -C tools/testing/selftests
However, clang already does the right thing by default: it statically
links the Address Sanitizer if -fsanitize is specified. Therefore, fix
this by simply omitting -static-libasan for clang builds. And leave
behind a comment, because the whole reason for static linking might not
be obvious.
2) clang won't accept invocations of this form, but gcc will:
$(CC) file1.c header2.h
Fix this by using selftests/lib.mk facilities for tracking local header
file dependencies: add them to LOCAL_HDRS, leaving only the .c files to
be passed to the compiler.
Cc: Ryan Roberts <ryan.roberts(a)arm.com>
Signed-off-by: John Hubbard <jhubbard(a)nvidia.com>
---
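The second failure comes from header files reaching the compiler command
line. As a minimal sketch of the idea, written in shell for illustration
only (the patch solves this in make through LOCAL_HDRS, and sources_only is
a hypothetical helper name), the headers can be filtered out of a file list
so that only the remaining files are passed to the compiler:
```shell
# sources_only FILE...: print the arguments that are not header files,
# one per line.
# Hypothetical helper; not part of the selftests build system.
sources_only() {
    for f in "$@"; do
        case "$f" in
            *.h) ;;             # headers are dependencies, not compiler inputs
            *) echo "$f" ;;
        esac
    done
}
```
For example, "sources_only openat2_test.c helpers.c helpers.h" prints the
two .c files and drops helpers.h.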
tools/testing/selftests/openat2/Makefile | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/openat2/Makefile b/tools/testing/selftests/openat2/Makefile
index 254d676a2689..185dc76ebb5f 100644
--- a/tools/testing/selftests/openat2/Makefile
+++ b/tools/testing/selftests/openat2/Makefile
@@ -1,8 +1,18 @@
# SPDX-License-Identifier: GPL-2.0-or-later
-CFLAGS += -Wall -O2 -g -fsanitize=address -fsanitize=undefined -static-libasan
+CFLAGS += -Wall -O2 -g -fsanitize=address -fsanitize=undefined
TEST_GEN_PROGS := openat2_test resolve_test rename_attack_test
+# gcc requires -static-libasan in order to ensure that Address Sanitizer's
+# library is the first one loaded. However, clang already statically links the
+# Address Sanitizer if -fsanitize is specified. Therefore, simply omit
+# -static-libasan for clang builds.
+ifeq ($(LLVM),)
+ CFLAGS += -static-libasan
+endif
+
+LOCAL_HDRS += helpers.h
+
include ../lib.mk
-$(TEST_GEN_PROGS): helpers.c helpers.h
+$(TEST_GEN_PROGS): helpers.c
base-commit: ddb4c3f25b7b95df3d6932db0b379d768a6ebdf7
prerequisite-patch-id: b901ece2a5b78503e2fb5480f20e304d36a0ea27
--
2.45.0
Fix build error on ppc64:
dev_in_maps.c: In function ‘get_file_dev_and_inode’:
dev_in_maps.c:60:59: error: format ‘%llu’ expects argument of type
‘long long unsigned int *’, but argument 7 has type ‘__u64 *’ {aka ‘long
unsigned int *’} [-Werror=format=]
By switching to unsigned long long for u64 for ppc64 builds.
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
---
tools/testing/selftests/filesystems/overlayfs/dev_in_maps.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/filesystems/overlayfs/dev_in_maps.c b/tools/testing/selftests/filesystems/overlayfs/dev_in_maps.c
index 759f86e7d263..2862aae58b79 100644
--- a/tools/testing/selftests/filesystems/overlayfs/dev_in_maps.c
+++ b/tools/testing/selftests/filesystems/overlayfs/dev_in_maps.c
@@ -1,5 +1,6 @@
// SPDX-License-Identifier: GPL-2.0
#define _GNU_SOURCE
+#define __SANE_USERSPACE_TYPES__ // Use ll64
#include <inttypes.h>
#include <unistd.h>
--
2.45.1
Fix warnings like:
openat2_test.c: In function ‘test_openat2_flags’:
openat2_test.c:303:73: warning: format ‘%llX’ expects argument of type
‘long long unsigned int’, but argument 5 has type ‘__u64’ {aka ‘long
unsigned int’} [-Wformat=]
By switching to unsigned long long for u64 for ppc64 builds.
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
---
tools/testing/selftests/openat2/openat2_test.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/openat2/openat2_test.c b/tools/testing/selftests/openat2/openat2_test.c
index 9024754530b2..5790ab446527 100644
--- a/tools/testing/selftests/openat2/openat2_test.c
+++ b/tools/testing/selftests/openat2/openat2_test.c
@@ -5,6 +5,7 @@
*/
#define _GNU_SOURCE
+#define __SANE_USERSPACE_TYPES__ // Use ll64
#include <fcntl.h>
#include <sched.h>
#include <sys/stat.h>
--
2.45.1
Fix warnings like:
test_cachestat.c: In function ‘print_cachestat’:
test_cachestat.c:30:38: warning: format ‘%llu’ expects argument of
type ‘long long unsigned int’, but argument 2 has type ‘__u64’ {aka
‘long unsigned int’} [-Wformat=]
By switching to unsigned long long for u64 for ppc64 builds.
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
---
tools/testing/selftests/cachestat/test_cachestat.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/cachestat/test_cachestat.c b/tools/testing/selftests/cachestat/test_cachestat.c
index b171fd53b004..632ab44737ec 100644
--- a/tools/testing/selftests/cachestat/test_cachestat.c
+++ b/tools/testing/selftests/cachestat/test_cachestat.c
@@ -1,5 +1,6 @@
// SPDX-License-Identifier: GPL-2.0
#define _GNU_SOURCE
+#define __SANE_USERSPACE_TYPES__ // Use ll64
#include <stdio.h>
#include <stdbool.h>
--
2.45.1
v9:
===
Major Changes:
--------------
GVE queue API has been merged. Submitting this version as non-RFC after
rebasing on top of the merged API, and dropped the out of tree queue API
I was carrying on github. Addressed the little feedback v8 has received.
Detailed changelog:
------------------
- Added new patch from David Wei to this series for
netdev_rx_queue_restart()
- Fixed sparse error.
- Removed CONFIG_ checks in netmem_is_net_iov()
- Flipped skb->readable to skb->unreadable
- Minor fixes to selftests & docs.
RFC v8:
=======
Major Changes:
--------------
- Fixed build error generated by patch-by-patch build.
- Applied docs suggestions from Randy.
RFC v7:
=======
Major Changes:
--------------
This revision largely rebases on top of net-next and addresses the feedback
RFCv6 received from folks, namely Jakub, Yunsheng, Arnd, David, & Pavel.
The series remains in RFC because the queue-API ndos defined in this
series are not yet implemented. I have a GVE implementation I carry out
of tree for my testing. An upstreamable GVE implementation is in the
works. Aside from that, in my estimation all the patches are ready for
review/merge. Please do take a look.
As usual, the full devmem TCP changes, including the full GVE driver
implementation, are here:
https://github.com/mina/linux/commits/tcpdevmem-v7/
Detailed changelog:
- Use admin-perm in netlink API.
- Addressed feedback from Jakub with regards to netlink API
implementation.
- Renamed devmem.c functions to something more appropriate for that
file.
- Improve the performance seen through the page_pool benchmark.
- Fix the value definition of all the SO_DEVMEM_* uapi.
- Various fixes to documentation.
Perf - page-pool benchmark:
---------------------------
Improved performance of bench_page_pool_simple.ko tests compared to v6:
https://pastebin.com/raw/v5dYRg8L
net-next base: 8 cycle fast path.
RFC v6: 10 cycle fast path.
RFC v7: 9 cycle fast path.
RFC v7 with CONFIG_DMA_SHARED_BUFFER disabled: 8 cycle fast path,
same as baseline.
Perf - Devmem TCP benchmark:
---------------------
Perf is about the same regardless of the changes in v7, namely the
removal of the static_branch_unlikely to improve the page_pool benchmark
performance:
189/200gbps bi-directional throughput with RX devmem TCP and regular TCP
TX i.e. ~95% line rate.
RFC v6:
=======
Major Changes:
--------------
This revision largely rebases on top of net-next and addresses the little
feedback RFCv5 received.
The series remains in RFC because the queue-API ndos defined in this
series are not yet implemented. I have a GVE implementation I carry out
of tree for my testing. An upstreamable GVE implementation is in the
works. Aside from that, in my estimation all the patches are ready for
review/merge. Please do take a look.
As usual, the full devmem TCP changes, including the full GVE driver
implementation, are here:
https://github.com/mina/linux/commits/tcpdevmem-v6/
This version also comes with some performance data recorded in the cover
letter (see below changelog).
Detailed changelog:
- Rebased on top of the merged netmem_ref changes.
- Converted skb->dmabuf to skb->readable (Pavel). Pavel's original
suggestion was to remove the skb->dmabuf flag entirely, but when I
looked into it closely, I found the issue that if we remove the flag
we have to dereference the shinfo(skb) pointer to obtain the first
frag to tell whether an skb is readable or not. This can cause a
performance regression if it dirties the cache line when the
shinfo(skb) was not really needed. Instead, I converted the skb->dmabuf
flag into a generic skb->readable flag which can be re-used by io_uring
0-copy RX.
- Squashed a few locking optimizations from Eric Dumazet in the RX path
and the DEVMEM_DONTNEED setsockopt.
- Expanded the tests a bit. Added validation for invalid scenarios and
added some more coverage.
Perf - page-pool benchmark:
---------------------------
bench_page_pool_simple.ko tests with and without these changes:
https://pastebin.com/raw/ncHDwAbn
AFAIK the number that really matters in the perf tests is the
'tasklet_page_pool01_fast_path Per elem'. This one measures at about 8
cycles without the changes but there is some 1 cycle noise in some
results.
With the patches this regresses to 9 cycles, again with about 1 cycle of
noise across repeated runs.
Lastly, I tried disabling the static_branch_unlikely() check in
netmem_is_net_iov(). To my surprise, disabling it reduces the fast path
back to 8 cycles, but the 1 cycle noise remains.
Perf - Devmem TCP benchmark:
---------------------
189/200gbps bi-directional throughput with RX devmem TCP and regular TCP
TX i.e. ~95% line rate.
Major changes in RFC v5:
========================
1. Rebased on top of 'Abstract page from net stack' series and used the
new netmem type to refer to LSB set pointers instead of re-using
struct page.
2. Downgraded this series back to RFC and called it RFC v5. This is
because this series is now dependent on 'Abstract page from net
stack'[1] and the queue API. Both are removed from the series to
reduce the patch count, as those bits are fairly independent or
prerequisite work.
3. Reworked the page_pool devmem support to use netmem and for some
more unified handling.
4. Reworked the reference counting of net_iov (renamed from
page_pool_iov) to use pp_ref_count for refcounting.
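The "LSB set pointers" idea in item 1 can be sketched in userspace as follows. This is a simplified model under stated assumptions, not the kernel's netmem definitions: every type and function name ending in _model is hypothetical, and the only point illustrated is that pointer alignment leaves bit 0 free to mark "this is not a struct page".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct page_model { int refcount; };
struct net_iov_model { int refcount; };

/* Opaque handle: a page pointer, or a net_iov pointer with bit 0 set. */
typedef uintptr_t netmem_model_t;

#define NET_IOV_TAG ((uintptr_t)1)

static netmem_model_t page_to_netmem_model(struct page_model *page)
{
	/* Alignment guarantees bit 0 is clear for a real pointer. */
	assert(((uintptr_t)page & NET_IOV_TAG) == 0);
	return (uintptr_t)page;
}

static netmem_model_t net_iov_to_netmem_model(struct net_iov_model *niov)
{
	assert(((uintptr_t)niov & NET_IOV_TAG) == 0);
	return (uintptr_t)niov | NET_IOV_TAG;
}

static bool netmem_model_is_net_iov(netmem_model_t netmem)
{
	return (netmem & NET_IOV_TAG) != 0;
}

/* Returns the untagged pointer, or NULL if the handle is a plain page. */
static struct net_iov_model *netmem_model_to_net_iov(netmem_model_t netmem)
{
	if (!netmem_model_is_net_iov(netmem))
		return NULL;
	return (struct net_iov_model *)(netmem & ~NET_IOV_TAG);
}
```

Because the handle is a distinct type rather than a struct page pointer, code that has not been taught about the tag cannot dereference it by accident.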
The full changes including the dependent series and GVE page pool
support is here:
https://github.com/mina/linux/commits/tcpdevmem-rfcv5/
[1] https://patchwork.kernel.org/project/netdevbpf/list/?series=810774
Major changes in v1:
====================
1. Implemented MVP queue API ndos to remove the userspace-visible
driver reset.
2. Fixed issues in the napi_pp_put_page() devmem frag unref path.
3. Removed RFC tag.
Many smaller addressed comments across all the patches (patches have
individual change log).
Full tree including the rest of the GVE driver changes:
https://github.com/mina/linux/commits/tcpdevmem-v1
Changes in RFC v3:
==================
1. Pulled in the memory-provider dependency from Jakub's RFC[1] to make the
series reviewable and mergeable.
2. Implemented multi-rx-queue binding which was a todo in v2.
3. Fix to cmsg handling.
The sticking point in RFC v2[2] was the device reset required to refill
the device rx-queues after the dmabuf bind/unbind. The solution
suggested as I understand is a subset of the per-queue management ops
Jakub suggested or similar:
https://lore.kernel.org/netdev/20230815171638.4c057dcd@kernel.org/
This is not addressed in this revision, because:
1. This point was discussed at netconf & netdev and there is openness to
using the current approach of requiring a device reset.
2. Implementing individual queue resetting seems to be difficult for my
test bed with GVE. My prototype to test this ran into issues with the
rx-queues not coming back up properly if reset individually. At the
moment I'm unsure if it's a mistake in the POC or a genuine issue in
the virtualization stack behind GVE, which currently doesn't test
individual rx-queue restart.
3. Our usecases are not bothered by requiring a device reset to refill
the buffer queues, and we'd like to support NICs that run into this
limitation with resetting individual queues.
My thought is that drivers that have trouble with per-queue configs can
use the support in this series, while drivers that support new netdev
ops to reset individual queues can automatically reset the queue as
part of the dma-buf bind/unbind.
The same approach with device resets is presented again for consideration
with other sticking points addressed.
This proposal includes the rx devmem path only proposed for merge. For a
snapshot of my entire tree which includes the GVE POC page pool support &
device memory support:
https://github.com/torvalds/linux/compare/master...mina:linux:tcpdevmem-v3
[1] https://lore.kernel.org/netdev/f8270765-a27b-6ccf-33ea-cda097168d79@redhat.…
[2] https://lore.kernel.org/netdev/CAHS8izOVJGJH5WF68OsRWFKJid1_huzzUK+hpKbLcL4…
Changes in RFC v2:
==================
The sticking point in RFC v1[1] was the dma-buf pages approach we used to
deliver the device memory to the TCP stack. RFC v2 is a proof-of-concept
that attempts to resolve this by implementing scatterlist support in the
networking stack, such that we can import the dma-buf scatterlist
directly. This is the approach proposed at a high level here[2].
Detailed changes:
1. Replaced dma-buf pages approach with importing scatterlist into the
page pool.
2. Replace the dma-buf pages centric API with a netlink API.
3. Removed the TX path implementation - there is no issue with
implementing the TX path with scatterlist approach, but leaving
out the TX path makes it easier to review.
4. Functionality is tested with this proposal, but I have not conducted
perf testing yet. I'm not sure there are regressions, but I removed
perf claims from the cover letter until they can be re-confirmed.
5. Added Signed-off-by: contributors to the implementation.
6. Fixed some bugs with the RX path since RFC v1.
Any feedback welcome, but specifically the biggest pending questions
needing feedback IMO are:
1. Feedback on the scatterlist-based approach in general.
2. Netlink API (Patch 1 & 2).
3. Approach to handle all the drivers that expect to receive pages from
the page pool (Patch 6).
[1] https://lore.kernel.org/netdev/dfe4bae7-13a0-3c5d-d671-f61b375cb0b4@gmail.c…
[2] https://lore.kernel.org/netdev/CAHS8izPm6XRS54LdCDZVd0C75tA1zHSu6jLVO8nzTLX…
==================
* TL;DR:
Device memory TCP (devmem TCP) is a proposal for transferring data to and/or
from device memory efficiently, without bouncing the data to a host memory
buffer.
* Problem:
A large number of data transfers have device memory as the source and/or
destination. Accelerators have drastically increased the volume of such transfers.
Some examples include:
- ML accelerators transferring large amounts of training data from storage into
GPU/TPU memory. In some cases ML training setup time can be as long as 50% of
TPU compute time; improving data transfer throughput & efficiency can help
improve GPU/TPU utilization.
- Distributed training, where ML accelerators, such as GPUs on different hosts,
exchange data among them.
- Distributed raw block storage applications transfer large amounts of data with
remote SSDs; much of this data does not require host processing.
Today, the majority of the Device-to-Device data transfers over the network are
implemented as the following low level operations: Device-to-Host copy,
Host-to-Host network transfer, and Host-to-Device copy.
The implementation is suboptimal, especially for bulk data transfers, and can
put significant strain on system resources, such as host memory bandwidth,
PCIe bandwidth, etc. One important reason behind the current state is the
kernel’s lack of semantics to express device to network transfers.
* Proposal:
In this patch series we attempt to optimize this use case by implementing
socket APIs that enable the user to:
1. send device memory across the network directly, and
2. receive incoming network packets directly into device memory.
Packet _payloads_ go directly from the NIC to device memory for receive and from
device memory to NIC for transmit.
Packet _headers_ go to/from host memory and are processed by the TCP/IP stack
normally. The NIC _must_ support header split to achieve this.
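As a purely conceptual illustration of header split, the toy function below separates one packet into a header buffer and a payload buffer. It is a sketch under stated assumptions: split_packet and its parameters are hypothetical, hdr_buf stands in for host memory and payload_buf for device memory, and memcpy() stands in for the DMA the NIC would perform (in the real data path the CPU never touches the payload).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Toy model of header split. Copies the first hdr_len bytes to hdr_buf
 * ("host memory") and the rest to payload_buf ("device memory").
 * Returns the payload length, or -1 if the inputs are inconsistent.
 */
static long split_packet(const unsigned char *pkt, size_t pkt_len, size_t hdr_len,
			 unsigned char *hdr_buf, size_t hdr_cap,
			 unsigned char *payload_buf, size_t payload_cap)
{
	size_t payload_len;

	assert(pkt && hdr_buf && payload_buf);
	if (hdr_len > pkt_len || hdr_len > hdr_cap)
		return -1;
	payload_len = pkt_len - hdr_len;
	if (payload_len > payload_cap)
		return -1;

	memcpy(hdr_buf, pkt, hdr_len);
	memcpy(payload_buf, pkt + hdr_len, payload_len);
	return (long)payload_len;
}
```

The TCP/IP stack only ever needs hdr_buf, which is why the payload can live in memory the host cannot read.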
Advantages:
- Alleviate host memory bandwidth pressure, compared to existing
network-transfer + device-copy semantics.
- Alleviate PCIe BW pressure, by limiting data transfer to the lowest level
of the PCIe tree, compared to traditional path which sends data through the
root complex.
* Patch overview:
** Part 1: netlink API
Gives the user the ability to bind a dma-buf to an RX queue.
** Part 2: scatterlist support
Currently the standard for device memory sharing is DMABUF, which doesn't
generate struct pages. On the other hand, the networking stack (skbs, drivers,
and page pool) operates on pages. We have 2 options:
1. Generate struct pages for dmabuf device memory, or,
2. Modify the networking stack to process scatterlist.
Approach #1 was attempted in RFC v1. RFC v2 implements approach #2.
** Part 3: page pool support
We piggyback on the page pool memory providers proposal:
https://github.com/kuba-moo/linux/tree/pp-providers
It allows the page pool to define a memory provider that provides the
page allocation and freeing. It helps abstract most of the device memory
TCP changes from the driver.
** Part 4: support for unreadable skb frags
Page pool iovs are not accessible by the host; we implement changes
throughout the networking stack to correctly handle skbs with unreadable
frags.
** Part 5: recvmsg() APIs
We define user APIs for the user to send and receive device memory.
Not included with this series is the GVE devmem TCP support, just to
simplify the review. Code available here if desired:
https://github.com/mina/linux/tree/tcpdevmem
This series is built on top of net-next with Jakub's pp-providers changes
cherry-picked.
* NIC dependencies:
1. (strict) Devmem TCP requires the NIC to support header split, i.e. the
capability to split incoming packets into a header + payload and to put
each into a separate buffer. Devmem TCP works by using device memory
for the packet payload, and host memory for the packet headers.
2. (optional) Devmem TCP works better with flow steering support & RSS support,
i.e. the NIC's ability to steer flows into certain rx queues. This allows the
sysadmin to enable devmem TCP on a subset of the rx queues, and steer
devmem TCP traffic onto these queues and non devmem TCP elsewhere.
The NIC I have access to with these properties is the GVE with DQO support
running in Google Cloud, but any NIC that supports these features would suffice.
I may be able to help reviewers bring up devmem TCP on their NICs.
* Testing:
The series includes a udmabuf kselftest that shows a simple use case of
devmem TCP and validates the entire data path end to end without
a dependency on a specific dmabuf provider.
** Test Setup
Kernel: net-next with this series and memory provider API cherry-picked
locally.
Hardware: Google Cloud A3 VMs.
NIC: GVE with header split & RSS & flow steering support.
Cc: Pavel Begunkov <asml.silence(a)gmail.com>
Cc: David Wei <dw(a)davidwei.uk>
Cc: Jason Gunthorpe <jgg(a)ziepe.ca>
Cc: Yunsheng Lin <linyunsheng(a)huawei.com>
Cc: Shailend Chand <shailend(a)google.com>
Cc: Harshitha Ramamurthy <hramamurthy(a)google.com>
Cc: Shakeel Butt <shakeel.butt(a)linux.dev>
Cc: Jeroen de Borst <jeroendb(a)google.com>
Cc: Praveen Kaligineedi <pkaligineedi(a)google.com>
Jakub Kicinski (1):
net: page_pool: create hooks for custom page providers
Mina Almasry (13):
netdev: add netdev_rx_queue_restart()
net: netdev netlink api to bind dma-buf to a net device
netdev: support binding dma-buf to netdevice
netdev: netdevice devmem allocator
page_pool: convert to use netmem
page_pool: devmem support
memory-provider: dmabuf devmem memory provider
net: support non paged skb frags
net: add support for skbs with unreadable frags
tcp: RX path for devmem TCP
net: add SO_DEVMEM_DONTNEED setsockopt to release RX frags
net: add devmem TCP documentation
selftests: add ncdevmem, netcat for devmem TCP
Documentation/netlink/specs/netdev.yaml | 57 +++
Documentation/networking/devmem.rst | 258 +++++++++++
Documentation/networking/index.rst | 1 +
arch/alpha/include/uapi/asm/socket.h | 6 +
arch/mips/include/uapi/asm/socket.h | 6 +
arch/parisc/include/uapi/asm/socket.h | 6 +
arch/sparc/include/uapi/asm/socket.h | 6 +
include/linux/skbuff.h | 61 ++-
include/linux/skbuff_ref.h | 11 +-
include/linux/socket.h | 1 +
include/net/devmem.h | 124 ++++++
include/net/netdev_rx_queue.h | 5 +
include/net/netmem.h | 230 +++++++++-
include/net/page_pool/helpers.h | 157 +++++--
include/net/page_pool/types.h | 33 +-
include/net/sock.h | 2 +
include/net/tcp.h | 5 +-
include/trace/events/page_pool.h | 29 +-
include/uapi/asm-generic/socket.h | 6 +
include/uapi/linux/netdev.h | 19 +
include/uapi/linux/uio.h | 17 +
net/bpf/test_run.c | 5 +-
net/core/Makefile | 3 +-
net/core/datagram.c | 6 +
net/core/dev.c | 6 +-
net/core/devmem.c | 376 ++++++++++++++++
net/core/gro.c | 8 +-
net/core/netdev-genl-gen.c | 23 +
net/core/netdev-genl-gen.h | 6 +
net/core/netdev-genl.c | 107 +++++
net/core/netdev_rx_queue.c | 74 ++++
net/core/page_pool.c | 364 +++++++++-------
net/core/skbuff.c | 83 +++-
net/core/sock.c | 61 +++
net/ipv4/esp4.c | 3 +-
net/ipv4/tcp.c | 254 ++++++++++-
net/ipv4/tcp_input.c | 13 +-
net/ipv4/tcp_ipv4.c | 10 +
net/ipv4/tcp_minisocks.c | 2 +
net/ipv4/tcp_output.c | 5 +-
net/ipv6/esp6.c | 3 +-
net/packet/af_packet.c | 4 +-
tools/include/uapi/linux/netdev.h | 19 +
tools/testing/selftests/net/.gitignore | 1 +
tools/testing/selftests/net/Makefile | 5 +
tools/testing/selftests/net/ncdevmem.c | 542 ++++++++++++++++++++++++
46 files changed, 2756 insertions(+), 267 deletions(-)
create mode 100644 Documentation/networking/devmem.rst
create mode 100644 include/net/devmem.h
create mode 100644 net/core/devmem.c
create mode 100644 net/core/netdev_rx_queue.c
create mode 100644 tools/testing/selftests/net/ncdevmem.c
--
2.45.0.118.g7fe29c98d7-goog
From: Geliang Tang <tanggeliang(a)kylinos.cn>
This patchset uses post_socket_cb and post_connect_cb callbacks of struct
network_helper_opts to refactor do_test() in bpf_tcp_ca.c to move dctcp
test dedicated code out of do_test() into test_dctcp().
v5:
- address Martin's comments in v4 (thanks)
- add patch 4, use start_server_str in test_dctcp_fallback too
- ASSERT_* is already used in settcpca, use this helper in cc_cb (patch 3)
and stg_post_socket_cb (patch 6)
- add ASSERT_* in stg_post_socket_cb in patch 6
v4:
- address Martin's comments in v3 (thanks).
- drop 2 patches, keep "type" as the individual arg to start_server_addr,
connect_to_addr and start_server_str.
v3:
- Add 4 new patches, 1-3 are cleanups. 4 adds a new helper.
- address Martin's comments in v2.
v2:
- rebased on commit "selftests/bpf: Add test for the use of new args in
cong_control"
Geliang Tang (7):
selftests/bpf: Drop struct post_socket_opts
selftests/bpf: Add start_server_str helper
selftests/bpf: Use post_socket_cb in connect_to_fd_opts
selftests/bpf: Use post_socket_cb in start_server_str
selftests/bpf: Use start_server_str in do_test in bpf_tcp_ca
selftests/bpf: Use connect_to_fd_opts in do_test in bpf_tcp_ca
selftests/bpf: Add post_connect_cb callback
tools/testing/selftests/bpf/network_helpers.c | 39 +++--
tools/testing/selftests/bpf/network_helpers.h | 9 +-
.../selftests/bpf/prog_tests/bpf_tcp_ca.c | 153 +++++++++++++-----
.../bpf/prog_tests/sockopt_inherit.c | 2 +-
.../bpf/test_tcp_check_syncookie_user.c | 4 +-
5 files changed, 145 insertions(+), 62 deletions(-)
--
2.43.0
In commit b5b73b26b3ca ("taprio: Fix allowing too small intervals"), a
comparison of user input against length_to_duration(q, ETH_ZLEN) was
introduced, to avoid RCU stalls due to frequent hrtimers.
The implementation of length_to_duration() depends on q->picos_per_byte
being set for the link speed. The blamed commit in the Fixes: tag has
moved this too late, so the checks introduced above are ineffective.
The q->picos_per_byte is zero at parse_taprio_schedule() ->
parse_sched_list() -> parse_sched_entry() -> fill_sched_entry() time.
Move the taprio_set_picos_per_byte() call as one of the first things in
taprio_change(), before the bulk of the netlink attribute parsing is
done. That's because it is needed there.
Add a selftest to make sure the issue doesn't get reintroduced.
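To see why a zero q->picos_per_byte defeats the check, here is a simplified userspace model of the arithmetic. It is a sketch, not the sch_taprio.c code: the function names and the *_MODEL constants are hypothetical, and the comparison only mirrors the general shape of the "interval shorter than a minimum-size frame" test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ETH_ZLEN_MODEL      60ULL   /* minimum Ethernet frame length, bytes */
#define PSEC_PER_NSEC_MODEL 1000ULL

/* Picoseconds needed to transmit one byte at the given link speed. */
static uint64_t picos_per_byte_for_speed(uint64_t speed_mbps)
{
	assert(speed_mbps != 0);
	/* 8 bits per byte; at 1 Mbit/s one bit takes 1,000,000 ps. */
	return (8ULL * 1000000ULL) / speed_mbps;
}

/* Time in nanoseconds to transmit len bytes. */
static uint64_t length_to_duration_ns(uint64_t picos_per_byte, uint64_t len)
{
	return (len * picos_per_byte) / PSEC_PER_NSEC_MODEL;
}

/* Accept only intervals long enough for one minimum-size frame. */
static bool interval_is_acceptable(uint64_t picos_per_byte, uint64_t interval_ns)
{
	return interval_ns >= length_to_duration_ns(picos_per_byte, ETH_ZLEN_MODEL);
}
```

In this model, a 1 Gbit/s link gives 8000 ps per byte, so a 60-byte frame needs 480 ns and a 300 ns interval is rejected. With picos_per_byte still zero, the minimum duration is 0 ns and every interval passes, which is the ineffective check described above.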
Fixes: 09dbdf28f9f9 ("net/sched: taprio: fix calculation of maximum gate durations")
Signed-off-by: Vladimir Oltean <vladimir.oltean(a)nxp.com>
---
net/sched/sch_taprio.c | 4 +++-
.../tc-testing/tc-tests/qdiscs/taprio.json | 22 +++++++++++++++++++
2 files changed, 25 insertions(+), 1 deletion(-)
diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index 1ab17e8a7260..118915055360 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -1848,6 +1848,9 @@ static int taprio_change(struct Qdisc *sch, struct nlattr *opt,
}
q->flags = taprio_flags;
+ /* Needed for length_to_duration() during netlink attribute parsing */
+ taprio_set_picos_per_byte(dev, q);
+
err = taprio_parse_mqprio_opt(dev, mqprio, extack, q->flags);
if (err < 0)
return err;
@@ -1907,7 +1910,6 @@ static int taprio_change(struct Qdisc *sch, struct nlattr *opt,
if (err < 0)
goto free_sched;
- taprio_set_picos_per_byte(dev, q);
taprio_update_queue_max_sdu(q, new_admin, stab);
if (FULL_OFFLOAD_IS_ENABLED(q->flags))
diff --git a/tools/testing/selftests/tc-testing/tc-tests/qdiscs/taprio.json b/tools/testing/selftests/tc-testing/tc-tests/qdiscs/taprio.json
index 12da0a939e3e..8f12f00a4f57 100644
--- a/tools/testing/selftests/tc-testing/tc-tests/qdiscs/taprio.json
+++ b/tools/testing/selftests/tc-testing/tc-tests/qdiscs/taprio.json
@@ -132,6 +132,28 @@
"echo \"1\" > /sys/bus/netdevsim/del_device"
]
},
+ {
+ "id": "6f62",
+ "name": "Add taprio Qdisc with too short interval",
+ "category": [
+ "qdisc",
+ "taprio"
+ ],
+ "plugins": {
+ "requires": "nsPlugin"
+ },
+ "setup": [
+ "echo \"1 1 8\" > /sys/bus/netdevsim/new_device"
+ ],
+ "cmdUnderTest": "$TC qdisc add dev $ETH root handle 1: taprio num_tc 2 queues 1@0 1@1 sched-entry S 01 300 sched-entry S 02 1700 clockid CLOCK_TAI",
+ "expExitCode": "2",
+ "verifyCmd": "$TC qdisc show dev $ETH",
+ "matchPattern": "qdisc taprio 1: root refcnt",
+ "matchCount": "0",
+ "teardown": [
+ "echo \"1\" > /sys/bus/netdevsim/del_device"
+ ]
+ },
{
"id": "3e1e",
"name": "Add taprio Qdisc with an invalid cycle-time",
--
2.34.1
Hi,
Here's a few fixes that are part of my effort to get all selftests
building cleanly under clang. Plus one that I noticed by inspection.
Enjoy!
thanks,
John Hubbard
John Hubbard (3):
selftests/futex: don't redefine .PHONY targets (all, clean)
selftests/futex: don't pass a const char* to asprintf(3)
selftests/futex: pass _GNU_SOURCE without a value to the compiler
tools/testing/selftests/futex/Makefile | 2 --
tools/testing/selftests/futex/functional/Makefile | 2 +-
tools/testing/selftests/futex/functional/futex_requeue_pi.c | 2 +-
3 files changed, 2 insertions(+), 4 deletions(-)
base-commit: f03359bca01bf4372cf2c118cd9a987a5951b1c8
prerequisite-patch-id: b901ece2a5b78503e2fb5480f20e304d36a0ea27
--
2.45.0
When building with clang, via:
make LLVM=1 -C tools/testing/selftests
...clang warns that "a variable sized type not at the end of a struct or
class is a GNU extension".
These cases are not easily changed, because they involve structs that
are part of the API. Fortunately, however, the tests seem to be doing
just fine (specifically, neither affected test runs any differently with
gcc vs. clang builds, on my test system) regardless of the warning. So,
all the warning is doing is preventing a clean build of selftests/net.
Fix this by suppressing this particular clang warning for the
selftests/net suite.
Signed-off-by: John Hubbard <jhubbard(a)nvidia.com>
---
Changes since the first version:
1) Rebased onto Linux 6.10-rc1
thanks,
John Hubbard
tools/testing/selftests/net/Makefile | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index bd01e4a0be2c..9a3b766c8781 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -6,6 +6,10 @@ CFLAGS += -I../../../../usr/include/ $(KHDR_INCLUDES)
# Additional include paths needed by kselftest.h
CFLAGS += -I../
+ifneq ($(LLVM),)
+ CFLAGS += -Wno-gnu-variable-sized-type-not-at-end
+endif
+
TEST_PROGS := run_netsocktests run_afpackettests test_bpf.sh netdevice.sh \
rtnetlink.sh xfrm_policy.sh test_blackhole_dev.sh
TEST_PROGS += fib_tests.sh fib-onlink-tests.sh pmtu.sh udpgso.sh ip_defrag.sh
base-commit: 2bfcfd584ff5ccc8bb7acde19b42570414bf880b
--
2.45.1
When building with clang, via:
make LLVM=1 -C tools/testing/selftests
...clang warns about "taking address of packed member 'write_index' ".
This is not particularly helpful, because the test code really wants to
write to exactly this location, and if it ends up being unaligned, then
the test won't work (and may segfault, depending on the CPU type).
If that ever comes up, it will be obvious and can be fixed. But all it's
doing now is preventing a clean clang build, so disable the warning.
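For readers unfamiliar with the warning, the sketch below shows the kind of layout that triggers it and one way to write the field without forming a misaligned pointer. It is illustrative only: struct reg_model and both helpers are hypothetical, not the user_events structures, and the patch deliberately keeps the existing test code and suppresses the warning instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical packed layout; write_index lands at an odd offset. */
struct __attribute__((packed)) reg_model {
	uint8_t  enable_bit;
	uint32_t write_index;
};

/*
 * Taking &reg->write_index as a uint32_t * is what the compiler warns
 * about, since that pointer may be misaligned. Copying through memcpy()
 * writes the same location without creating such a pointer.
 */
static void set_write_index(struct reg_model *reg, uint32_t value)
{
	assert(reg != NULL);
	memcpy((unsigned char *)reg + offsetof(struct reg_model, write_index),
	       &value, sizeof(value));
}

static uint32_t get_write_index(const struct reg_model *reg)
{
	uint32_t value;

	assert(reg != NULL);
	memcpy(&value,
	       (const unsigned char *)reg + offsetof(struct reg_model, write_index),
	       sizeof(value));
	return value;
}
```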
Signed-off-by: John Hubbard <jhubbard(a)nvidia.com>
---
Changes since the first version:
1) Rebased onto Linux 6.10-rc1
thanks,
John Hubbard
tools/testing/selftests/user_events/Makefile | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/tools/testing/selftests/user_events/Makefile b/tools/testing/selftests/user_events/Makefile
index 10fcd0066203..617e94344711 100644
--- a/tools/testing/selftests/user_events/Makefile
+++ b/tools/testing/selftests/user_events/Makefile
@@ -1,5 +1,10 @@
# SPDX-License-Identifier: GPL-2.0
CFLAGS += -Wl,-no-as-needed -Wall $(KHDR_INCLUDES)
+
+ifneq ($(LLVM),)
+ CFLAGS += -Wno-address-of-packed-member
+endif
+
LDLIBS += -lrt -lpthread -lm
TEST_GEN_PROGS = ftrace_test dyn_test perf_test abi_test
base-commit: 2bfcfd584ff5ccc8bb7acde19b42570414bf880b
--
2.45.1
Some subtests can be unstable, failing once every X runs. Fixing them
can take time: there could be an issue in the kernel or in the subtest,
and it is then important to do a proper analysis, not to hide real bugs.
To avoid creating noise on the different CIs where tests are more
unstable than on our side, some subtests have been marked as flaky. As a
result, errors with these subtests (if any) are ignored.
Note that the MPTCP CI will continue to track these flaky subtests. All
these unstable subtests are also tracked by our bug tracker.
These are fixes for the -net tree, because the instabilities are visible
there. The first patch introducing the flake support has no 'Fixes'
tags, mainly because it requires recent and important refactoring done
in all MPTCP selftests. Backporting that to old versions where the flaky
tests have been introduced would be too difficult, and probably not
worth it. The other patches, adding MPTCP_LIB_SUBTEST_FLAKY=1, have a
Fixes tag, simply to ease the backport of the future fixes removing them
along with the proper fix.
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
Matthieu Baerts (NGI0) (4):
selftests: mptcp: lib: support flaky subtests
selftests: mptcp: simult flows: mark 'unbalanced' tests as flaky
selftests: mptcp: join: mark 'fastclose' tests as flaky
selftests: mptcp: join: mark 'fail' tests as flaky
tools/testing/selftests/net/mptcp/mptcp_join.sh | 10 +++++++-
tools/testing/selftests/net/mptcp/mptcp_lib.sh | 30 +++++++++++++++++++++--
tools/testing/selftests/net/mptcp/simult_flows.sh | 6 ++---
3 files changed, 40 insertions(+), 6 deletions(-)
---
base-commit: 0b4f5add9fa59bfd42c1030f572db2e4c395181b
change-id: 20240524-upstream-net-20240524-selftests-mptcp-flaky-5b6836a59f72
Best regards,
--
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
When building with clang, via:
make LLVM=1 -C tools/testing/selftests
...clang warns about several cases of using a signed integer for the
priority argument to mq_receive(3), which expects an unsigned int.
Fix this by declaring the type as unsigned int in all cases.
Reviewed-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Reviewed-by: Ryan Roberts <ryan.roberts(a)arm.com>
Signed-off-by: John Hubbard <jhubbard(a)nvidia.com>
---
Changes since the first version:
1) Rebased onto Linux 6.10-rc1
2) Reviewed-by's added.
thanks,
John Hubbard
tools/testing/selftests/mqueue/mq_perf_tests.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/mqueue/mq_perf_tests.c b/tools/testing/selftests/mqueue/mq_perf_tests.c
index 5c16159d0bcd..fb898850867c 100644
--- a/tools/testing/selftests/mqueue/mq_perf_tests.c
+++ b/tools/testing/selftests/mqueue/mq_perf_tests.c
@@ -323,7 +323,8 @@ void *fake_cont_thread(void *arg)
void *cont_thread(void *arg)
{
char buff[MSG_SIZE];
- int i, priority;
+ int i;
+ unsigned int priority;
for (i = 0; i < num_cpus_to_pin; i++)
if (cpu_threads[i] == pthread_self())
@@ -425,7 +426,8 @@ struct test test2[] = {
void *perf_test_thread(void *arg)
{
char buff[MSG_SIZE];
- int prio_out, prio_in;
+ int prio_out;
+ unsigned int prio_in;
int i;
clockid_t clock;
pthread_t *t;
base-commit: 2bfcfd584ff5ccc8bb7acde19b42570414bf880b
--
2.45.1
From: "Alessandro Carminati (Red Hat)" <alessandro.carminati(a)gmail.com>
[ Upstream commit f803bcf9208a2540acb4c32bdc3616673169f490 ]
On some systems, the netcat server can be slow to start listening.
When this happens, the test can randomly fail at various points.
This is an example error message:
# ip gre none gso
# encap 192.168.1.1 to 192.168.1.2, type gre, mac none len 2000
# test basic connectivity
# Ncat: Connection refused.
The issue stems from a race condition between the netcat client and server.
The test author had addressed this problem by implementing a sleep, which
I have removed in this patch.
This patch introduces a function capable of sleeping for up to two seconds.
However, it can terminate the waiting period early if the port is reported
to be listening.
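The bounded-polling pattern described above (retry a readiness probe, stop early on success, give up after a fixed number of attempts) can be sketched in C as follows. This is not the shell function from the patch: wait_until_ready, ready_fn, and the countdown probe are hypothetical names, and the probe is a callback rather than a check of listening sockets.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* for nanosleep() */
#endif
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical readiness probe: returns true once the condition holds. */
typedef bool (*ready_fn)(void *ctx);

/*
 * Calls ready() up to `attempts` times, sleeping delay_ms after each
 * failed probe. Returns 0 as soon as the probe succeeds and 1 if it
 * never does, matching the shell function's return values.
 */
static int wait_until_ready(ready_fn ready, void *ctx, int attempts, long delay_ms)
{
	struct timespec delay = {
		.tv_sec = delay_ms / 1000,
		.tv_nsec = (delay_ms % 1000) * 1000000L,
	};
	int i;

	assert(ready != NULL);
	for (i = 0; i < attempts; i++) {
		if (ready(ctx))
			return 0;
		nanosleep(&delay, NULL);
	}
	return 1;
}

/* Example probe for illustration: becomes ready after a set number of calls. */
struct countdown { int remaining; };

static bool countdown_ready(void *ctx)
{
	struct countdown *c = ctx;

	return c->remaining-- <= 0;
}
```

With 20 attempts and a 100 ms delay this waits at most about two seconds, the same bound as the patch, while returning immediately in the common case where the server is already up.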
Signed-off-by: Alessandro Carminati (Red Hat) <alessandro.carminati(a)gmail.com>
Signed-off-by: Andrii Nakryiko <andrii(a)kernel.org>
Link: https://lore.kernel.org/bpf/20240314105911.213411-1-alessandro.carminati@gm…
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/bpf/test_tc_tunnel.sh | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/test_tc_tunnel.sh b/tools/testing/selftests/bpf/test_tc_tunnel.sh
index 7c76b841b17bb..21bde60c95230 100755
--- a/tools/testing/selftests/bpf/test_tc_tunnel.sh
+++ b/tools/testing/selftests/bpf/test_tc_tunnel.sh
@@ -71,7 +71,6 @@ cleanup() {
server_listen() {
ip netns exec "${ns2}" nc "${netcat_opt}" -l -p "${port}" > "${outfile}" &
server_pid=$!
- sleep 0.2
}
client_connect() {
@@ -92,6 +91,16 @@ verify_data() {
fi
}
+wait_for_port() {
+ for i in $(seq 20); do
+ if ip netns exec "${ns2}" ss ${2:--4}OHntl | grep -q "$1"; then
+ return 0
+ fi
+ sleep 0.1
+ done
+ return 1
+}
+
set -e
# no arguments: automated test, run all
@@ -183,6 +192,7 @@ setup
# basic communication works
echo "test basic connectivity"
server_listen
+wait_for_port ${port} ${netcat_opt}
client_connect
verify_data
@@ -194,6 +204,7 @@ ip netns exec "${ns1}" tc filter add dev veth1 egress \
section "encap_${tuntype}_${mac}"
echo "test bpf encap without decap (expect failure)"
server_listen
+wait_for_port ${port} ${netcat_opt}
! client_connect
if [[ "$tuntype" =~ "udp" ]]; then
--
2.43.0
From: "Alessandro Carminati (Red Hat)" <alessandro.carminati(a)gmail.com>
[ Upstream commit f803bcf9208a2540acb4c32bdc3616673169f490 ]
In some systems, the netcat server can incur in delay to start listening.
When this happens, the test can randomly fail in various points.
This is an example error message:
# ip gre none gso
# encap 192.168.1.1 to 192.168.1.2, type gre, mac none len 2000
# test basic connectivity
# Ncat: Connection refused.
The issue stems from a race condition between the netcat client and server.
The test author had addressed this problem by implementing a sleep, which
I have removed in this patch.
This patch introduces a function capable of sleeping for up to two seconds.
However, it can terminate the waiting period early if the port is reported
to be listening.
Signed-off-by: Alessandro Carminati (Red Hat) <alessandro.carminati(a)gmail.com>
Signed-off-by: Andrii Nakryiko <andrii(a)kernel.org>
Link: https://lore.kernel.org/bpf/20240314105911.213411-1-alessandro.carminati@gm…
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/bpf/test_tc_tunnel.sh | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/test_tc_tunnel.sh b/tools/testing/selftests/bpf/test_tc_tunnel.sh
index 7c76b841b17bb..21bde60c95230 100755
--- a/tools/testing/selftests/bpf/test_tc_tunnel.sh
+++ b/tools/testing/selftests/bpf/test_tc_tunnel.sh
@@ -71,7 +71,6 @@ cleanup() {
server_listen() {
ip netns exec "${ns2}" nc "${netcat_opt}" -l -p "${port}" > "${outfile}" &
server_pid=$!
- sleep 0.2
}
client_connect() {
@@ -92,6 +91,16 @@ verify_data() {
fi
}
+wait_for_port() {
+ for i in $(seq 20); do
+ if ip netns exec "${ns2}" ss ${2:--4}OHntl | grep -q "$1"; then
+ return 0
+ fi
+ sleep 0.1
+ done
+ return 1
+}
+
set -e
# no arguments: automated test, run all
@@ -183,6 +192,7 @@ setup
# basic communication works
echo "test basic connectivity"
server_listen
+wait_for_port ${port} ${netcat_opt}
client_connect
verify_data
@@ -194,6 +204,7 @@ ip netns exec "${ns1}" tc filter add dev veth1 egress \
section "encap_${tuntype}_${mac}"
echo "test bpf encap without decap (expect failure)"
server_listen
+wait_for_port ${port} ${netcat_opt}
! client_connect
if [[ "$tuntype" =~ "udp" ]]; then
--
2.43.0
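The bounded-polling idea behind wait_for_port() above (up to 20 checks, 0.1s apart, returning early on success) can be sketched in C as follows. This is only an illustration under stated assumptions, not what the selftest does: the helper name wait_for_port_c() is hypothetical, and it probes with connect() on the loopback address instead of parsing `ss` output inside a network namespace.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the patch): return 0 as soon as a TCP
 * connect() to 127.0.0.1:port succeeds, or -1 after `tries` failed
 * attempts spaced `interval_ms` milliseconds apart.
 */
static int wait_for_port_c(uint16_t port, int tries, long interval_ms)
{
	struct sockaddr_in addr = {
		.sin_family = AF_INET,
		.sin_port = htons(port),
		.sin_addr.s_addr = htonl(INADDR_LOOPBACK),
	};
	struct timespec pause = {
		.tv_sec = interval_ms / 1000,
		.tv_nsec = (interval_ms % 1000) * 1000000L,
	};
	int i;

	assert(tries > 0 && interval_ms >= 0);

	for (i = 0; i < tries; i++) {
		int fd = socket(AF_INET, SOCK_STREAM, 0);

		if (fd < 0)
			return -1;
		if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0) {
			close(fd);	/* port is listening: stop waiting early */
			return 0;
		}
		close(fd);
		nanosleep(&pause, NULL);	/* not ready yet: back off and retry */
	}
	return -1;	/* gave up after tries * interval_ms */
}
```

Calling wait_for_port_c(port, 20, 100) mirrors the two-second budget of the shell version. Note that a connect() probe may use up the one connection a single-shot listener accepts, which is a reason to prefer a passive check such as `ss` in the script itself.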
From: "Alessandro Carminati (Red Hat)" <alessandro.carminati(a)gmail.com>
[ Upstream commit f803bcf9208a2540acb4c32bdc3616673169f490 ]
On some systems, the netcat server can be slow to start listening.
When this happens, the test can randomly fail at various points.
This is an example error message:
# ip gre none gso
# encap 192.168.1.1 to 192.168.1.2, type gre, mac none len 2000
# test basic connectivity
# Ncat: Connection refused.
The issue stems from a race condition between the netcat client and server.
The test author had addressed this problem by implementing a sleep, which
I have removed in this patch.
This patch introduces a function capable of sleeping for up to two seconds.
However, it can terminate the waiting period early if the port is reported
to be listening.
Signed-off-by: Alessandro Carminati (Red Hat) <alessandro.carminati(a)gmail.com>
Signed-off-by: Andrii Nakryiko <andrii(a)kernel.org>
Link: https://lore.kernel.org/bpf/20240314105911.213411-1-alessandro.carminati@gm…
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/bpf/test_tc_tunnel.sh | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/test_tc_tunnel.sh b/tools/testing/selftests/bpf/test_tc_tunnel.sh
index 088fcad138c98..38c6e9f16f41e 100755
--- a/tools/testing/selftests/bpf/test_tc_tunnel.sh
+++ b/tools/testing/selftests/bpf/test_tc_tunnel.sh
@@ -71,7 +71,6 @@ cleanup() {
server_listen() {
ip netns exec "${ns2}" nc "${netcat_opt}" -l "${port}" > "${outfile}" &
server_pid=$!
- sleep 0.2
}
client_connect() {
@@ -92,6 +91,16 @@ verify_data() {
fi
}
+wait_for_port() {
+ for i in $(seq 20); do
+ if ip netns exec "${ns2}" ss ${2:--4}OHntl | grep -q "$1"; then
+ return 0
+ fi
+ sleep 0.1
+ done
+ return 1
+}
+
set -e
# no arguments: automated test, run all
@@ -189,6 +198,7 @@ setup
# basic communication works
echo "test basic connectivity"
server_listen
+wait_for_port ${port} ${netcat_opt}
client_connect
verify_data
@@ -200,6 +210,7 @@ ip netns exec "${ns1}" tc filter add dev veth1 egress \
section "encap_${tuntype}_${mac}"
echo "test bpf encap without decap (expect failure)"
server_listen
+wait_for_port ${port} ${netcat_opt}
! client_connect
if [[ "$tuntype" =~ "udp" ]]; then
--
2.43.0
From: "Alessandro Carminati (Red Hat)" <alessandro.carminati(a)gmail.com>
[ Upstream commit f803bcf9208a2540acb4c32bdc3616673169f490 ]
On some systems, the netcat server can be slow to start listening.
When this happens, the test can randomly fail at various points.
This is an example error message:
# ip gre none gso
# encap 192.168.1.1 to 192.168.1.2, type gre, mac none len 2000
# test basic connectivity
# Ncat: Connection refused.
The issue stems from a race condition between the netcat client and server.
The test author had addressed this problem by implementing a sleep, which
I have removed in this patch.
This patch introduces a function capable of sleeping for up to two seconds.
However, it can terminate the waiting period early if the port is reported
to be listening.
Signed-off-by: Alessandro Carminati (Red Hat) <alessandro.carminati(a)gmail.com>
Signed-off-by: Andrii Nakryiko <andrii(a)kernel.org>
Link: https://lore.kernel.org/bpf/20240314105911.213411-1-alessandro.carminati@gm…
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/bpf/test_tc_tunnel.sh | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/test_tc_tunnel.sh b/tools/testing/selftests/bpf/test_tc_tunnel.sh
index 334bdfeab9403..365a2c7a89bad 100755
--- a/tools/testing/selftests/bpf/test_tc_tunnel.sh
+++ b/tools/testing/selftests/bpf/test_tc_tunnel.sh
@@ -72,7 +72,6 @@ cleanup() {
server_listen() {
ip netns exec "${ns2}" nc "${netcat_opt}" -l "${port}" > "${outfile}" &
server_pid=$!
- sleep 0.2
}
client_connect() {
@@ -93,6 +92,16 @@ verify_data() {
fi
}
+wait_for_port() {
+ for i in $(seq 20); do
+ if ip netns exec "${ns2}" ss ${2:--4}OHntl | grep -q "$1"; then
+ return 0
+ fi
+ sleep 0.1
+ done
+ return 1
+}
+
set -e
# no arguments: automated test, run all
@@ -190,6 +199,7 @@ setup
# basic communication works
echo "test basic connectivity"
server_listen
+wait_for_port ${port} ${netcat_opt}
client_connect
verify_data
@@ -201,6 +211,7 @@ ip netns exec "${ns1}" tc filter add dev veth1 egress \
section "encap_${tuntype}_${mac}"
echo "test bpf encap without decap (expect failure)"
server_listen
+wait_for_port ${port} ${netcat_opt}
! client_connect
if [[ "$tuntype" =~ "udp" ]]; then
--
2.43.0
From: Yonghong Song <yonghong.song(a)linux.dev>
[ Upstream commit 14bb1e8c8d4ad5d9d2febb7d19c70a3cf536e1e5 ]
Recently, I frequently hit the following test failure:
[root@arch-fb-vm1 bpf]# ./test_progs -n 33/1
test_lookup_update:PASS:skel_open 0 nsec
[...]
test_lookup_update:PASS:sync_rcu 0 nsec
test_lookup_update:FAIL:map1_leak inner_map1 leaked!
#33/1 btf_map_in_map/lookup_update:FAIL
#33 btf_map_in_map:FAIL
In the test, after the map is closed and two RCU grace periods have elapsed,
it is assumed that map_id is no longer available to user space.
But the above assumption cannot be guaranteed. After zero, one,
or two RCU grace periods, depending on the situation, the actual
map-freeing work is put into a workqueue. Later on, when the work
is dequeued, the map is actually freed.
See bpf_map_put() in kernel/bpf/syscall.c.
Because a workqueue is used, there is no guarantee that the map will
actually be freed after a couple of RCU grace periods. This patch removes
the map leak detection so that the test passes consistently.
Signed-off-by: Yonghong Song <yonghong.song(a)linux.dev>
Signed-off-by: Daniel Borkmann <daniel(a)iogearbox.net>
Link: https://lore.kernel.org/bpf/20240322061353.632136-1-yonghong.song@linux.dev
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
.../selftests/bpf/prog_tests/btf_map_in_map.c | 26 +------------------
1 file changed, 1 insertion(+), 25 deletions(-)
diff --git a/tools/testing/selftests/bpf/prog_tests/btf_map_in_map.c b/tools/testing/selftests/bpf/prog_tests/btf_map_in_map.c
index a8b53b8736f01..f66ceccd7029c 100644
--- a/tools/testing/selftests/bpf/prog_tests/btf_map_in_map.c
+++ b/tools/testing/selftests/bpf/prog_tests/btf_map_in_map.c
@@ -25,7 +25,7 @@ static void test_lookup_update(void)
int map1_fd, map2_fd, map3_fd, map4_fd, map5_fd, map1_id, map2_id;
int outer_arr_fd, outer_hash_fd, outer_arr_dyn_fd;
struct test_btf_map_in_map *skel;
- int err, key = 0, val, i, fd;
+ int err, key = 0, val, i;
skel = test_btf_map_in_map__open_and_load();
if (CHECK(!skel, "skel_open", "failed to open&load skeleton\n"))
@@ -102,30 +102,6 @@ static void test_lookup_update(void)
CHECK(map1_id == 0, "map1_id", "failed to get ID 1\n");
CHECK(map2_id == 0, "map2_id", "failed to get ID 2\n");
- test_btf_map_in_map__destroy(skel);
- skel = NULL;
-
- /* we need to either wait for or force synchronize_rcu(), before
- * checking for "still exists" condition, otherwise map could still be
- * resolvable by ID, causing false positives.
- *
- * Older kernels (5.8 and earlier) freed map only after two
- * synchronize_rcu()s, so trigger two, to be entirely sure.
- */
- CHECK(kern_sync_rcu(), "sync_rcu", "failed\n");
- CHECK(kern_sync_rcu(), "sync_rcu", "failed\n");
-
- fd = bpf_map_get_fd_by_id(map1_id);
- if (CHECK(fd >= 0, "map1_leak", "inner_map1 leaked!\n")) {
- close(fd);
- goto cleanup;
- }
- fd = bpf_map_get_fd_by_id(map2_id);
- if (CHECK(fd >= 0, "map2_leak", "inner_map2 leaked!\n")) {
- close(fd);
- goto cleanup;
- }
-
cleanup:
test_btf_map_in_map__destroy(skel);
}
--
2.43.0
From: "Alessandro Carminati (Red Hat)" <alessandro.carminati(a)gmail.com>
[ Upstream commit f803bcf9208a2540acb4c32bdc3616673169f490 ]
On some systems, the netcat server can be slow to start listening.
When this happens, the test can randomly fail at various points.
This is an example error message:
# ip gre none gso
# encap 192.168.1.1 to 192.168.1.2, type gre, mac none len 2000
# test basic connectivity
# Ncat: Connection refused.
The issue stems from a race condition between the netcat client and server.
The test author had addressed this problem by implementing a sleep, which
I have removed in this patch.
This patch introduces a function capable of sleeping for up to two seconds.
However, it can terminate the waiting period early if the port is reported
to be listening.
Signed-off-by: Alessandro Carminati (Red Hat) <alessandro.carminati(a)gmail.com>
Signed-off-by: Andrii Nakryiko <andrii(a)kernel.org>
Link: https://lore.kernel.org/bpf/20240314105911.213411-1-alessandro.carminati@gm…
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/bpf/test_tc_tunnel.sh | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/test_tc_tunnel.sh b/tools/testing/selftests/bpf/test_tc_tunnel.sh
index 910044f08908a..7989ec6084545 100755
--- a/tools/testing/selftests/bpf/test_tc_tunnel.sh
+++ b/tools/testing/selftests/bpf/test_tc_tunnel.sh
@@ -72,7 +72,6 @@ cleanup() {
server_listen() {
ip netns exec "${ns2}" nc "${netcat_opt}" -l "${port}" > "${outfile}" &
server_pid=$!
- sleep 0.2
}
client_connect() {
@@ -93,6 +92,16 @@ verify_data() {
fi
}
+wait_for_port() {
+ for i in $(seq 20); do
+ if ip netns exec "${ns2}" ss ${2:--4}OHntl | grep -q "$1"; then
+ return 0
+ fi
+ sleep 0.1
+ done
+ return 1
+}
+
set -e
# no arguments: automated test, run all
@@ -193,6 +202,7 @@ setup
# basic communication works
echo "test basic connectivity"
server_listen
+wait_for_port ${port} ${netcat_opt}
client_connect
verify_data
@@ -204,6 +214,7 @@ ip netns exec "${ns1}" tc filter add dev veth1 egress \
section "encap_${tuntype}_${mac}"
echo "test bpf encap without decap (expect failure)"
server_listen
+wait_for_port ${port} ${netcat_opt}
! client_connect
if [[ "$tuntype" =~ "udp" ]]; then
--
2.43.0
From: Jakub Kicinski <kuba(a)kernel.org>
[ Upstream commit 2d3b8dfd82d76b1295167c6453d683ab99e50794 ]
On slow machines the SND timestamp sometimes doesn't arrive before
we quit. The test only waits as long as the packet delay, so it's
easy for a race condition to happen.
Double the wait but poll during it: once the SND timestamp
arrives there's no point in waiting any longer.
This fixes the "TXTIME abs" failures on debug kernels, like:
Case ICMPv4 - TXTIME abs returned '', expected 'OK'
Reviewed-by: Willem de Bruijn <willemb(a)google.com>
Link: https://lore.kernel.org/r/20240510005705.43069-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/net/cmsg_sender.c | 20 +++++++++++++++-----
1 file changed, 15 insertions(+), 5 deletions(-)
diff --git a/tools/testing/selftests/net/cmsg_sender.c b/tools/testing/selftests/net/cmsg_sender.c
index c79e65581dc37..161db24e3c409 100644
--- a/tools/testing/selftests/net/cmsg_sender.c
+++ b/tools/testing/selftests/net/cmsg_sender.c
@@ -333,16 +333,17 @@ static const char *cs_ts_info2str(unsigned int info)
return "unknown";
}
-static void
+static unsigned long
cs_read_cmsg(int fd, struct msghdr *msg, char *cbuf, size_t cbuf_sz)
{
struct sock_extended_err *see;
struct scm_timestamping *ts;
+ unsigned long ts_seen = 0;
struct cmsghdr *cmsg;
int i, err;
if (!opt.ts.ena)
- return;
+ return 0;
msg->msg_control = cbuf;
msg->msg_controllen = cbuf_sz;
@@ -396,8 +397,11 @@ cs_read_cmsg(int fd, struct msghdr *msg, char *cbuf, size_t cbuf_sz)
printf(" %5s ts%d %lluus\n",
cs_ts_info2str(see->ee_info),
i, rel_time);
+ ts_seen |= 1 << see->ee_info;
}
}
+
+ return ts_seen;
}
static void ca_set_sockopts(int fd)
@@ -509,10 +513,16 @@ int main(int argc, char *argv[])
err = ERN_SUCCESS;
if (opt.ts.ena) {
- /* Make sure all timestamps have time to loop back */
- usleep(opt.txtime.delay);
+ unsigned long seen;
+ int i;
- cs_read_cmsg(fd, &msg, cbuf, sizeof(cbuf));
+ /* Make sure all timestamps have time to loop back */
+ for (i = 0; i < 40; i++) {
+ seen = cs_read_cmsg(fd, &msg, cbuf, sizeof(cbuf));
+ if (seen & (1 << SCM_TSTAMP_SND))
+ break;
+ usleep(opt.txtime.delay / 20);
+ }
}
err_out:
--
2.43.0
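The loop added to main() above reads in small steps and stops as soon as the wanted timestamp type has been reported through a bitmask. A minimal stand-alone sketch of that pattern follows. All names here (poll_for_event, fake_source, fake_read, EVENT_SND) are hypothetical stand-ins, the reader is a fake rather than a socket error queue, and unlike the patch the sketch accumulates the mask across reads.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical event bit, standing in for (1 << SCM_TSTAMP_SND). */
#define EVENT_SND (1UL << 0)

/* A reader returns a bitmask of the event types it saw on this call. */
typedef unsigned long (*read_events_fn)(void *ctx);

/* Fake event source for illustration: reports EVENT_SND from the
 * `ready_after`-th call onwards. */
struct fake_source {
	int calls;
	int ready_after;
};

static unsigned long fake_read(void *ctx)
{
	struct fake_source *src = ctx;

	src->calls++;
	return src->calls >= src->ready_after ? EVENT_SND : 0;
}

/*
 * Read up to 40 times, sleeping delay_us / 20 between attempts (so at most
 * twice delay_us in total), and stop early once `wanted` has been seen.
 * Returns the number of reads performed; the accumulated mask is stored
 * in *seen_out.
 */
static int poll_for_event(read_events_fn read_events, void *ctx,
			  unsigned long wanted, unsigned int delay_us,
			  unsigned long *seen_out)
{
	unsigned long seen = 0;
	int reads = 0;

	assert(read_events != NULL && seen_out != NULL);

	while (reads < 40) {
		seen |= read_events(ctx);
		reads++;
		if (seen & wanted)
			break;		/* got it, no point waiting longer */
		usleep(delay_us / 20);
	}
	*seen_out = seen;
	return reads;
}
```

With a source that becomes ready on its third read, poll_for_event() returns after three reads instead of sleeping for the whole budget; with a source that never becomes ready it gives up after 40 reads.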
From: "Jose E. Marchesi" <jose.marchesi(a)oracle.com>
[ Upstream commit cd3fc3b9782130a5bc1dc3dfccffbc1657637a93 ]
[Changes from V1:
- The warning to disable is -Wmaybe-uninitialized, not -Wuninitialized.
- This warning is only supported in GCC.]
The BPF selftest verifier_global_subprogs.c contains code that
purposely performs out-of-bounds memory accesses, to check whether
the kernel verifier is able to catch them. For example:
__noinline int global_unsupp(const int *mem)
{
if (!mem)
return 0;
return mem[100]; /* BOOM */
}
With -O1 and higher and no inlining, GCC notices this fact and emits a
"maybe uninitialized" warning. This is by design. Note that the
emission of these warnings is highly dependent on the precise
optimizations that are performed.
This patch adds a compiler pragma to verifier_global_subprogs.c to
ignore these warnings.
Tested in bpf-next master.
No regressions.
Signed-off-by: Jose E. Marchesi <jose.marchesi(a)oracle.com>
Cc: david.faust(a)oracle.com
Cc: cupertino.miranda(a)oracle.com
Cc: Yonghong Song <yonghong.song(a)linux.dev>
Cc: Eduard Zingerman <eddyz87(a)gmail.com>
Acked-by: Yonghong Song <yonghong.song(a)linux.dev>
Link: https://lore.kernel.org/r/20240507184756.1772-1-jose.marchesi@oracle.com
Signed-off-by: Alexei Starovoitov <ast(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
.../testing/selftests/bpf/progs/verifier_global_subprogs.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/tools/testing/selftests/bpf/progs/verifier_global_subprogs.c b/tools/testing/selftests/bpf/progs/verifier_global_subprogs.c
index 67dddd9418911..27f4b2da131b1 100644
--- a/tools/testing/selftests/bpf/progs/verifier_global_subprogs.c
+++ b/tools/testing/selftests/bpf/progs/verifier_global_subprogs.c
@@ -8,6 +8,13 @@
#include "xdp_metadata.h"
#include "bpf_kfuncs.h"
+/* The compiler may be able to detect the access to uninitialized
+ memory in the routines performing out of bound memory accesses and
+ emit warnings about it. This is the case of GCC. */
+#if !defined(__clang__)
+#pragma GCC diagnostic ignored "-Wmaybe-uninitialized"
+#endif
+
int arr[1];
int unkn_idx;
const volatile bool call_dead_subprog = false;
--
2.43.0
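The patch above silences the warning for the whole file. As a hedged alternative sketch, the same pragma can be limited to one function with GCC's diagnostic push/pop pragmas. The function name read_far_element() and the self_check() helper are hypothetical, and whether a late middle-end warning such as -Wmaybe-uninitialized honours the narrower scope in every case is an assumption to verify with the compiler in use.

```c
#include <assert.h>
#include <stddef.h>

/* Only GCC knows -Wmaybe-uninitialized, so keep the pragmas away from clang. */
#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wmaybe-uninitialized"
#endif

/* Hypothetical routine shaped like the selftest example: it reads far past
 * the start of `mem`, which is only valid if the caller's buffer is large
 * enough. */
static int read_far_element(const int *mem)
{
	if (!mem)
		return 0;
	return mem[100];
}

#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC diagnostic pop	/* restore the previous warning state */
#endif

/* Small self-check showing the routine behaves normally on valid input. */
static void self_check(void)
{
	int buf[101] = {0};

	buf[100] = 7;
	assert(read_far_element(NULL) == 0);
	assert(read_far_element(buf) == 7);
}
```

Scoping the pragma this way keeps the warning active for the rest of the translation unit, at the cost of repeating the push/pop pair around every routine that needs it.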
From: "Jose E. Marchesi" <jose.marchesi(a)oracle.com>
[ Upstream commit cd3fc3b9782130a5bc1dc3dfccffbc1657637a93 ]
[Changes from V1:
- The warning to disable is -Wmaybe-uninitialized, not -Wuninitialized.
- This warning is only supported in GCC.]
The BPF selftest verifier_global_subprogs.c contains code that
purposely performs out-of-bounds memory accesses, to check whether
the kernel verifier is able to catch them. For example:
__noinline int global_unsupp(const int *mem)
{
if (!mem)
return 0;
return mem[100]; /* BOOM */
}
With -O1 and higher and no inlining, GCC notices this fact and emits a
"maybe uninitialized" warning. This is by design. Note that the
emission of these warnings is highly dependent on the precise
optimizations that are performed.
This patch adds a compiler pragma to verifier_global_subprogs.c to
ignore these warnings.
Tested in bpf-next master.
No regressions.
Signed-off-by: Jose E. Marchesi <jose.marchesi(a)oracle.com>
Cc: david.faust(a)oracle.com
Cc: cupertino.miranda(a)oracle.com
Cc: Yonghong Song <yonghong.song(a)linux.dev>
Cc: Eduard Zingerman <eddyz87(a)gmail.com>
Acked-by: Yonghong Song <yonghong.song(a)linux.dev>
Link: https://lore.kernel.org/r/20240507184756.1772-1-jose.marchesi@oracle.com
Signed-off-by: Alexei Starovoitov <ast(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
.../testing/selftests/bpf/progs/verifier_global_subprogs.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/tools/testing/selftests/bpf/progs/verifier_global_subprogs.c b/tools/testing/selftests/bpf/progs/verifier_global_subprogs.c
index baff5ffe94051..a9fc30ed4d732 100644
--- a/tools/testing/selftests/bpf/progs/verifier_global_subprogs.c
+++ b/tools/testing/selftests/bpf/progs/verifier_global_subprogs.c
@@ -8,6 +8,13 @@
#include "xdp_metadata.h"
#include "bpf_kfuncs.h"
+/* The compiler may be able to detect the access to uninitialized
+ memory in the routines performing out of bound memory accesses and
+ emit warnings about it. This is the case of GCC. */
+#if !defined(__clang__)
+#pragma GCC diagnostic ignored "-Wmaybe-uninitialized"
+#endif
+
int arr[1];
int unkn_idx;
const volatile bool call_dead_subprog = false;
--
2.43.0
Add several new test cases that exercise corner cases of the eventfd
mechanism, for example supplying a buffer smaller than 8 bytes or
attempting to write a value that is too large.
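The two write-side corner cases mentioned here can be reproduced outside the kselftest harness with a few lines of C. This is a minimal sketch under assumptions: the helper eventfd_write_errno() is hypothetical, and it uses the glibc eventfd() wrapper where the test file calls eventfd2 through syscall().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Hypothetical helper: attempt a write of `len` bytes to an eventfd and
 * return 0 on success or the errno value on failure.
 */
static int eventfd_write_errno(int fd, const void *buf, size_t len)
{
	ssize_t n;

	assert(buf != NULL);

	errno = 0;
	n = write(fd, buf, len);
	return n < 0 ? errno : 0;
}
```

On an eventfd created with eventfd(0, 0), writing only 4 bytes of a uint64_t is expected to fail with EINVAL, writing the full 8 bytes of the value 1 succeeds, and writing 0xffffffffffffffff fails with EINVAL again.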
./eventfd_test
# Starting 9 tests from 1 test cases.
# RUN global.eventfd_check_flag_rdwr ...
# OK global.eventfd_check_flag_rdwr
ok 1 global.eventfd_check_flag_rdwr
# RUN global.eventfd_check_flag_cloexec ...
# OK global.eventfd_check_flag_cloexec
ok 2 global.eventfd_check_flag_cloexec
# RUN global.eventfd_check_flag_nonblock ...
# OK global.eventfd_check_flag_nonblock
ok 3 global.eventfd_check_flag_nonblock
# RUN global.eventfd_chek_flag_cloexec_and_nonblock ...
# OK global.eventfd_chek_flag_cloexec_and_nonblock
ok 4 global.eventfd_chek_flag_cloexec_and_nonblock
# RUN global.eventfd_check_flag_semaphore ...
# OK global.eventfd_check_flag_semaphore
ok 5 global.eventfd_check_flag_semaphore
# RUN global.eventfd_check_write ...
# OK global.eventfd_check_write
ok 6 global.eventfd_check_write
# RUN global.eventfd_check_read ...
# OK global.eventfd_check_read
ok 7 global.eventfd_check_read
# RUN global.eventfd_check_read_with_nonsemaphore ...
# OK global.eventfd_check_read_with_nonsemaphore
ok 8 global.eventfd_check_read_with_nonsemaphore
# RUN global.eventfd_check_read_with_semaphore ...
# OK global.eventfd_check_read_with_semaphore
ok 9 global.eventfd_check_read_with_semaphore
# PASSED: 9 / 9 tests passed.
# Totals: pass:9 fail:0 xfail:0 xpass:0 skip:0 error:0
Signed-off-by: Wen Yang <wen.yang(a)linux.dev>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Christian Brauner <brauner(a)kernel.org>
Cc: Andrei Vagin <avagin(a)google.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Steven Rostedt <rostedt(a)goodmis.org>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Dave Young <dyoung(a)redhat.com>
Cc: Tim Bird <tim.bird(a)sony.com>
Cc: linux-kselftest(a)vger.kernel.org
Cc: linux-kernel(a)vger.kernel.org
---
v2: use strings which indicate what is being tested, that are useful to a human
.../filesystems/eventfd/eventfd_test.c | 136 +++++++++++++++++-
1 file changed, 131 insertions(+), 5 deletions(-)
diff --git a/tools/testing/selftests/filesystems/eventfd/eventfd_test.c b/tools/testing/selftests/filesystems/eventfd/eventfd_test.c
index f142a137526c..85acb4e3ef00 100644
--- a/tools/testing/selftests/filesystems/eventfd/eventfd_test.c
+++ b/tools/testing/selftests/filesystems/eventfd/eventfd_test.c
@@ -13,6 +13,8 @@
#include <sys/eventfd.h>
#include "../../kselftest_harness.h"
+#define EVENTFD_TEST_ITERATIONS 100000UL
+
struct error {
int code;
char msg[512];
@@ -40,7 +42,7 @@ static inline int sys_eventfd2(unsigned int count, int flags)
return syscall(__NR_eventfd2, count, flags);
}
-TEST(eventfd01)
+TEST(eventfd_check_flag_rdwr)
{
int fd, flags;
@@ -54,7 +56,7 @@ TEST(eventfd01)
close(fd);
}
-TEST(eventfd02)
+TEST(eventfd_check_flag_cloexec)
{
int fd, flags;
@@ -68,7 +70,7 @@ TEST(eventfd02)
close(fd);
}
-TEST(eventfd03)
+TEST(eventfd_check_flag_nonblock)
{
int fd, flags;
@@ -83,7 +85,7 @@ TEST(eventfd03)
close(fd);
}
-TEST(eventfd04)
+TEST(eventfd_chek_flag_cloexec_and_nonblock)
{
int fd, flags;
@@ -161,7 +163,7 @@ static int verify_fdinfo(int fd, struct error *err, const char *prefix,
return 0;
}
-TEST(eventfd05)
+TEST(eventfd_check_flag_semaphore)
{
struct error err = {0};
int fd, ret;
@@ -183,4 +185,128 @@ TEST(eventfd05)
close(fd);
}
+/*
+ * A write(2) fails with the error EINVAL if the size of the supplied buffer
+ * is less than 8 bytes, or if an attempt is made to write the value
+ * 0xffffffffffffffff.
+ */
+TEST(eventfd_check_write)
+{
+ uint64_t value = 1;
+ ssize_t size;
+ int fd;
+
+ fd = sys_eventfd2(0, 0);
+ ASSERT_GE(fd, 0);
+
+ size = write(fd, &value, sizeof(int));
+ EXPECT_EQ(size, -1);
+ EXPECT_EQ(errno, EINVAL);
+
+ size = write(fd, &value, sizeof(value));
+ EXPECT_EQ(size, sizeof(value));
+
+ value = (uint64_t)-1;
+ size = write(fd, &value, sizeof(value));
+ EXPECT_EQ(size, -1);
+ EXPECT_EQ(errno, EINVAL);
+
+ close(fd);
+}
+
+/*
+ * A read(2) fails with the error EINVAL if the size of the supplied buffer is
+ * less than 8 bytes.
+ */
+TEST(eventfd_check_read)
+{
+ uint64_t value;
+ ssize_t size;
+ int fd;
+
+ fd = sys_eventfd2(1, 0);
+ ASSERT_GE(fd, 0);
+
+ size = read(fd, &value, sizeof(int));
+ EXPECT_EQ(size, -1);
+ EXPECT_EQ(errno, EINVAL);
+
+ size = read(fd, &value, sizeof(value));
+ EXPECT_EQ(size, sizeof(value));
+ EXPECT_EQ(value, 1);
+
+ close(fd);
+}
+
+
+/*
+ * If EFD_SEMAPHORE was not specified and the eventfd counter has a nonzero
+ * value, then a read(2) returns 8 bytes containing that value, and the
+ * counter's value is reset to zero.
+ * If the eventfd counter is zero at the time of the call to read(2), then the
+ * call fails with the error EAGAIN if the file descriptor has been made nonblocking.
+ */
+TEST(eventfd_check_read_with_nonsemaphore)
+{
+ uint64_t value;
+ ssize_t size;
+ int fd;
+ int i;
+
+ fd = sys_eventfd2(0, EFD_NONBLOCK);
+ ASSERT_GE(fd, 0);
+
+ value = 1;
+ for (i = 0; i < EVENTFD_TEST_ITERATIONS; i++) {
+ size = write(fd, &value, sizeof(value));
+ EXPECT_EQ(size, sizeof(value));
+ }
+
+ size = read(fd, &value, sizeof(value));
+ EXPECT_EQ(size, sizeof(uint64_t));
+ EXPECT_EQ(value, EVENTFD_TEST_ITERATIONS);
+
+ size = read(fd, &value, sizeof(value));
+ EXPECT_EQ(size, -1);
+ EXPECT_EQ(errno, EAGAIN);
+
+ close(fd);
+}
+
+/*
+ * If EFD_SEMAPHORE was specified and the eventfd counter has a nonzero value,
+ * then a read(2) returns 8 bytes containing the value 1, and the counter's
+ * value is decremented by 1.
+ * If the eventfd counter is zero at the time of the call to read(2), then the
+ * call fails with the error EAGAIN if the file descriptor has been made nonblocking.
+ */
+TEST(eventfd_check_read_with_semaphore)
+{
+ uint64_t value;
+ ssize_t size;
+ int fd;
+ int i;
+
+ fd = sys_eventfd2(0, EFD_SEMAPHORE|EFD_NONBLOCK);
+ ASSERT_GE(fd, 0);
+
+ value = 1;
+ for (i = 0; i < EVENTFD_TEST_ITERATIONS; i++) {
+ size = write(fd, &value, sizeof(value));
+ EXPECT_EQ(size, sizeof(value));
+ }
+
+ for (i = 0; i < EVENTFD_TEST_ITERATIONS; i++) {
+ size = read(fd, &value, sizeof(value));
+ EXPECT_EQ(size, sizeof(value));
+ EXPECT_EQ(value, 1);
+ }
+
+ size = read(fd, &value, sizeof(value));
+ EXPECT_EQ(size, -1);
+ EXPECT_EQ(errno, EAGAIN);
+
+ close(fd);
+}
+
TEST_HARNESS_MAIN
--
2.25.1
Commit 1b151e2435fc ("block: Remove special-casing of compound
pages") caused a change in behaviour when releasing the pages
if the buffer does not start at the beginning of the page. This
was because the calculation of the number of pages to release
was incorrect.
This was fixed by commit 38b43539d64b ("block: Fix page refcounts
for unaligned buffers in __bio_release_pages()").
We pin the user buffer during direct I/O writes. If this buffer is a
hugepage, bio_release_page() will unpin it and decrement all references
and pin counts at ->bi_end_io. However, if any references to the hugepage
remain post-I/O, the hugepage will not be freed upon unmap, leading
to a memory leak.
This patch verifies that a hugepage, used as a user buffer for DIO
operations, is correctly freed upon unmapping, regardless of whether
the offsets are aligned or unaligned w.r.t page boundary.
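The before/after comparison relies on reading the number of free huge pages. The test takes it from the get_free_hugepages() helper in vm_util.h, whose body is not shown here. A minimal sketch of how such a count could be obtained, assuming it comes from the HugePages_Free field of /proc/meminfo (both function names below are hypothetical and are not the selftest helpers):

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: extract the HugePages_Free value from a buffer in
 * /proc/meminfo format. Returns -1 if the field is missing or malformed.
 */
long parse_free_hugepages(const char *meminfo)
{
	const char *key = "HugePages_Free:";
	const char *p = strstr(meminfo, key);
	long val;

	if (!p)
		return -1;
	if (sscanf(p + strlen(key), "%ld", &val) != 1)
		return -1;
	return val;
}

/* Hypothetical helper: read /proc/meminfo and return the current count. */
long read_free_hugepages(void)
{
	char buf[8192];
	size_t n;
	FILE *f = fopen("/proc/meminfo", "r");

	if (!f)
		return -1;
	n = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);
	buf[n] = '\0';
	return parse_free_hugepages(buf);
}
```

Calling read_free_hugepages() once before the mmap() and once after the munmap() gives the two values that the test compares.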
Test Result Fail Scenario (Without the fix)
--------------------------------------------------------
[]# ./hugetlb_dio
TAP version 13
1..4
No. Free pages before allocation : 7
No. Free pages after munmap : 7
ok 1 : Huge pages freed successfully !
No. Free pages before allocation : 7
No. Free pages after munmap : 7
ok 2 : Huge pages freed successfully !
No. Free pages before allocation : 7
No. Free pages after munmap : 7
ok 3 : Huge pages freed successfully !
No. Free pages before allocation : 7
No. Free pages after munmap : 6
not ok 4 : Huge pages not freed!
Totals: pass:3 fail:1 xfail:0 xpass:0 skip:0 error:0
Test Result PASS Scenario (With the fix)
---------------------------------------------------------
[]# ./hugetlb_dio
TAP version 13
1..4
No. Free pages before allocation : 7
No. Free pages after munmap : 7
ok 1 : Huge pages freed successfully !
No. Free pages before allocation : 7
No. Free pages after munmap : 7
ok 2 : Huge pages freed successfully !
No. Free pages before allocation : 7
No. Free pages after munmap : 7
ok 3 : Huge pages freed successfully !
No. Free pages before allocation : 7
No. Free pages after munmap : 7
ok 4 : Huge pages freed successfully !
Totals: pass:4 fail:0 xfail:0 xpass:0 skip:0 error:0
Signed-off-by: Donet Tom <donettom(a)linux.ibm.com>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list(a)gmail.com>
---
tools/testing/selftests/mm/Makefile | 1 +
tools/testing/selftests/mm/hugetlb_dio.c | 118 +++++++++++++++++++++++
2 files changed, 119 insertions(+)
create mode 100644 tools/testing/selftests/mm/hugetlb_dio.c
diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile
index eb5f39a2668b..87d8130b3376 100644
--- a/tools/testing/selftests/mm/Makefile
+++ b/tools/testing/selftests/mm/Makefile
@@ -71,6 +71,7 @@ TEST_GEN_FILES += ksm_functional_tests
TEST_GEN_FILES += mdwe_test
TEST_GEN_FILES += hugetlb_fault_after_madv
TEST_GEN_FILES += hugetlb_madv_vs_map
+TEST_GEN_FILES += hugetlb_dio
ifneq ($(ARCH),arm64)
TEST_GEN_FILES += soft-dirty
diff --git a/tools/testing/selftests/mm/hugetlb_dio.c b/tools/testing/selftests/mm/hugetlb_dio.c
new file mode 100644
index 000000000000..6f6587c7913c
--- /dev/null
+++ b/tools/testing/selftests/mm/hugetlb_dio.c
@@ -0,0 +1,118 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * This program tests for hugepage leaks after DIO writes to a file using a
+ * hugepage as the user buffer. During DIO, the user buffer is pinned and
+ * should be properly unpinned upon completion. This patch verifies that the
+ * kernel correctly unpins the buffer at DIO completion for both aligned and
+ * unaligned user buffer offsets (w.r.t page boundary), ensuring the hugepage
+ * is freed upon unmapping.
+ */
+
+#define _GNU_SOURCE
+#include <stdio.h>
+#include <sys/stat.h>
+#include <stdlib.h>
+#include <fcntl.h>
+#include <stdint.h>
+#include <unistd.h>
+#include <string.h>
+#include <sys/mman.h>
+#include "vm_util.h"
+#include "../kselftest.h"
+
+void run_dio_using_hugetlb(unsigned int start_off, unsigned int end_off)
+{
+ int fd;
+ char *buffer = NULL;
+ char *orig_buffer = NULL;
+ size_t h_pagesize = 0;
+ size_t writesize;
+ int free_hpage_b = 0;
+ int free_hpage_a = 0;
+
+ writesize = end_off - start_off;
+
+ /* Get the default huge page size */
+ h_pagesize = default_huge_page_size();
+ if (!h_pagesize)
+ ksft_exit_fail_msg("Unable to determine huge page size\n");
+
+ /* Open the file to DIO */
+ fd = open("/tmp", O_TMPFILE | O_RDWR | O_DIRECT);
+ if (fd < 0)
+ ksft_exit_fail_msg("Error opening file");
+
+ /* Get the free huge pages before allocation */
+ free_hpage_b = get_free_hugepages();
+ if (free_hpage_b == 0) {
+ close(fd);
+ ksft_exit_skip("No free hugepage, exiting!\n");
+ }
+
+ /* Allocate a hugetlb page */
+ orig_buffer = mmap(NULL, h_pagesize, PROT_READ | PROT_WRITE, MAP_PRIVATE
+ | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
+ if (orig_buffer == MAP_FAILED) {
+ close(fd);
+ ksft_exit_fail_msg("Error mapping memory");
+ }
+ buffer = orig_buffer;
+ buffer += start_off;
+
+ memset(buffer, 'A', writesize);
+
+ /* Write the buffer to the file */
+ if (write(fd, buffer, writesize) != (writesize)) {
+ munmap(orig_buffer, h_pagesize);
+ close(fd);
+ ksft_exit_fail_msg("Error writing to file");
+ }
+
+ /* unmap the huge page */
+ munmap(orig_buffer, h_pagesize);
+ close(fd);
+
+ /* Get the free huge pages after unmap*/
+ free_hpage_a = get_free_hugepages();
+
+ /*
+ * If the no. of free hugepages before allocation and after unmap does
+ * not match - that means there could still be a page which is pinned.
+ */
+ if (free_hpage_a != free_hpage_b) {
+ printf("No. Free pages before allocation : %d\n", free_hpage_b);
+ printf("No. Free pages after munmap : %d\n", free_hpage_a);
+ ksft_test_result_fail(": Huge pages not freed!\n");
+ } else {
+ printf("No. Free pages before allocation : %d\n", free_hpage_b);
+ printf("No. Free pages after munmap : %d\n", free_hpage_a);
+ ksft_test_result_pass(": Huge pages freed successfully !\n");
+ }
+}
+
+int main(void)
+{
+ size_t pagesize = 0;
+
+ ksft_print_header();
+ ksft_set_plan(4);
+
+ /* Get base page size */
+ pagesize = psize();
+
+ /* start and end is aligned to pagesize */
+ run_dio_using_hugetlb(0, (pagesize * 3));
+
+ /* start is aligned but end is not aligned */
+ run_dio_using_hugetlb(0, (pagesize * 3) - (pagesize / 2));
+
+ /* start is unaligned and end is aligned */
+ run_dio_using_hugetlb(pagesize / 2, (pagesize * 3));
+
+ /* both start and end are unaligned */
+ run_dio_using_hugetlb(pagesize / 2, (pagesize * 3) + (pagesize / 2));
+
+ ksft_finished();
+ return 0;
+}
+
--
2.39.3
Here are a couple of patches to fix some issues related to kconfig.
I found these issues when I built the kernel with
tools/testing/selftests/ftrace/config.
Thank you,
---
Masami Hiramatsu (Google) (2):
selftests/ftrace: Fix to check required event file
selftests/ftrace: Update required config
tools/testing/selftests/ftrace/config | 26 +++++++++++++++-----
.../ftrace/test.d/dynevent/test_duplicates.tc | 2 +-
2 files changed, 20 insertions(+), 8 deletions(-)
--
Masami Hiramatsu (Google) <mhiramat(a)kernel.org>
From: Geliang Tang <tanggeliang(a)kylinos.cn>
This patchset uses post_socket_cb and post_connect_cb callbacks of struct
network_helper_opts to refactor do_test() in bpf_tcp_ca.c to move dctcp
test dedicated code out of do_test() into test_dctcp().
v4:
- address Martin's comments in v3 (thanks).
- drop 2 patches, keep "type" as the individual arg to start_server_addr,
connect_to_addr and start_server_str.
v3:
- Add 4 new patches, 1-3 are cleanups. 4 adds a new helper.
- address Martin's comments in v2.
v2:
- rebased on commit "selftests/bpf: Add test for the use of new args in
cong_control"
Geliang Tang (6):
selftests/bpf: Drop struct post_socket_opts
selftests/bpf: Add start_server_str helper
selftests/bpf: Use post_socket_cb in connect_to_fd_opts
selftests/bpf: Use start_server_str in bpf_tcp_ca
selftests/bpf: Use connect_to_fd_opts in do_test in bpf_tcp_ca
selftests/bpf: Add post_connect_cb callback
tools/testing/selftests/bpf/network_helpers.c | 39 +++--
tools/testing/selftests/bpf/network_helpers.h | 9 +-
.../selftests/bpf/prog_tests/bpf_tcp_ca.c | 138 +++++++++++++-----
.../bpf/prog_tests/sockopt_inherit.c | 2 +-
.../bpf/test_tcp_check_syncookie_user.c | 4 +-
5 files changed, 133 insertions(+), 59 deletions(-)
--
2.43.0
Sending out v3 for CPU-assisted riscv user mode control flow integrity.
v2 [9] was sent a week ago for this riscv usermode control flow integrity
enabling. The RFC patchset (v1) was sent early this year (January) [7].
changes in v3
--------------
envcfg:
logic to pick up base envcfg had a bug where `ENVCFG_CBZE` could have been
picked on per task basis, even though CPU didn't implement it. Fixed in
this series.
dt-bindings:
As suggested, split into separate commit. fixed the messaging that spec is
in public review
arch_is_shadow_stack change:
arch_is_shadow_stack changed to vma_is_shadow_stack
hwprobe:
zicfiss / zicfilp if present will get enumerated in hwprobe
selftests:
As suggested, added object and binary filenames to .gitignore
The selftest binary needs to be compiled with a cfi enabled compiler anyway,
which makes sure that landing pad and shadow stack are enabled. Thus removed
the separate enable/disable tests. Cleaned up the tests a bit.
changes in v2
---------------
As part of the testing effort, compiled a rootfs with shadow stack and landing
pad enabled (libraries and binaries) and booted to a shell. As part of long
running tests, I have been able to run some spec 2006 benchmarks [8] (the
link is provided only for the list of benchmarks that were used in the long
running tests; the spreadsheet itself holds static stats such as code size
growth on the spec binaries). Thus converting from RFC to a regular
patchset.
Securing control-flow integrity for usermode requires the following:
- Securing forward control flow : All callsites must reach
a target that they actually intend to reach.
- Securing backward control flow : All function returns must
return to the location where they were called from.
This patch series uses the riscv cpu extension `zicfilp` [2] to secure forward
control flow and `zicfiss` [2] to secure backward control flow. `zicfilp`
enforces that all indirect calls or jmps must land on a landing pad instr,
and that the label embedded in the landing pad instr must match a value
programmed in the `x7` register (at the callsite, via the compiler). `zicfiss`
introduces a shadow stack which is writable only via shadow stack instructions
(sspush and ssamoswap) and thus can't be tampered with via inadvertent stores.
More details about the extensions can be read from [2] and there are details
in the documentation as well (in this patch series).
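To make the backward-edge protection concrete, here is a toy software model of a shadow stack. It is only an illustration: the struct, the function names and the fixed depth are made up for this sketch, and on real hardware the push and the pop-and-compare are done by shadow stack instructions on memory that ordinary stores cannot modify.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SS_DEPTH 64	/* arbitrary depth chosen for this model */

struct shadow_stack {
	uintptr_t slot[SS_DEPTH];
	size_t top;	/* number of valid entries */
};

/* Model of a push at function entry: record the return address. */
bool ss_push(struct shadow_stack *ss, uintptr_t ra)
{
	if (ss->top == SS_DEPTH)
		return false;
	ss->slot[ss->top++] = ra;
	return true;
}

/*
 * Model of a pop-and-compare before return: the saved copy must equal the
 * return address the function is about to use. A false result marks the
 * point where hardware would report a control flow violation.
 */
bool ss_popchk(struct shadow_stack *ss, uintptr_t ra)
{
	if (ss->top == 0)
		return false;
	return ss->slot[--ss->top] == ra;
}
```

If the return address on the regular stack is overwritten between the push and the check, the comparison fails and the corrupted return is never taken.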
Using config `CONFIG_RISCV_USER_CFI`, kernel support for riscv control flow
integrity for user mode programs can be compiled in the kernel.
Enabling of control flow integrity for user programs is left to the user runtime
(specifically expected from the dynamic loader). There has been a lot of earlier
discussion on the enabling topic around x86 shadow stack enabling [3, 4, 5] and
the overall consensus has been to let the dynamic loader (or usermode) decide
whether to enable the feature.
This patch series introduces arch agnostic `prctls` to enable shadow stack
and indirect branch tracking, and implements them on riscv. arm64 is expected
to implement the shadow stack part of these arch agnostic `prctls` [6].
Changes since last time
***********************
Spec changes
------------
- Forward cfi spec has become much simpler. `lpad` instruction is pseudo for
`auipc rd, <20bit_imm>`. `lpad` checks x7 against 20bit embedded in instr.
Thus label width is 20bit.
- Shadow stack management instructions are reduced to
sspush - to push x1/x5 on shadow stack
sspopchk - pops from shadow stack and compares with x1/x5.
ssamoswap - atomically swap value on shadow stack.
rdssp - reads current shadow stack pointer
- Shadow stack accesses on readonly memory always raise an AMO/store page fault.
`sspopchk` is a load, but if the underlying page is readonly, it'll raise a
store page fault. This simplifies hardware and kernel COW handling for shadow
stack pages.
- riscv defines a new exception type `software check exception` and control flow
violations raise software check exception.
- enabling controls for shadow stack and landing pad are in the xenvcfg CSR and
control enabling for the lower privilege mode. As an example, senvcfg controls
enabling for U mode and menvcfg controls enabling for S mode.
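As an illustration of the forward-edge label check described in the first item above, the sketch below encodes an `lpad` as an `auipc` instruction word and compares its 20-bit label with the value in x7. Two points are assumptions of this sketch, not statements from this cover letter: that rd is x0, and that x7 carries the label in bits 31:12. The function names are made up, and any special cases the spec defines are not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

#define AUIPC_OPCODE	0x17u		/* RISC-V AUIPC major opcode */
#define LABEL_MASK	0xfffffu	/* 20-bit landing pad label */

/* Encode `lpad label` as auipc with rd = x0 (assumed) and imm = label. */
uint32_t lpad_encode(uint32_t label)
{
	return ((label & LABEL_MASK) << 12) | AUIPC_OPCODE;
}

/* Extract the 20-bit label embedded in an lpad instruction word. */
uint32_t lpad_label(uint32_t insn)
{
	return insn >> 12;
}

/*
 * Model of the check done when an indirect branch lands on an lpad:
 * the label in the instruction must match the label the caller put in x7
 * (assumed here to sit in bits 31:12 of the register).
 */
bool lpad_check(uint32_t insn, uint64_t x7)
{
	return lpad_label(insn) == ((x7 >> 12) & LABEL_MASK);
}
```

A caller that loads label 0x12345 into x7 passes the check only on landing pads emitted with the same label; a mismatch is where the software check exception would be raised.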
core mm shadow stack enabling
-----------------------------
Shadow stack for x86 usermode is now in mainline and thus this patch
series builds on top of that for arch-agnostic mm related changes. Big
thanks and shout out to Rick Edgecombe for that.
selftests
---------
Created some minimal selftests to test the patch series.
[1] - https://lore.kernel.org/lkml/20230213045351.3945824-1-debug@rivosinc.com/
[2] - https://github.com/riscv/riscv-cfi
[3] - https://lore.kernel.org/lkml/ZWHcBq0bJ+15eeKs@finisterre.sirena.org.uk/T/#m…
[4] - https://lore.kernel.org/all/20220130211838.8382-1-rick.p.edgecombe@intel.co…
[5] - https://lore.kernel.org/lkml/CAHk-=wgP5mk3poVeejw16Asbid0ghDt4okHnWaWKLBkRh…
[6] - https://lore.kernel.org/linux-mm/20231122-arm64-gcs-v7-2-201c483bd775@kerne…
[7] - https://lore.kernel.org/lkml/20240125062739.1339782-1-debug@rivosinc.com/
[8] - https://docs.google.com/spreadsheets/d/1_cHGH4ctNVvFRiS7hW9dEGKtXLAJ3aX4Z_i…
[9] - https://lore.kernel.org/lkml/20240329044459.3990638-1-debug@rivosinc.com/
From: Jeff Xu <jeffxu(a)chromium.org>
This is the V10 version; it rebases the v9 patch onto 6.9-rc3.
We also applied and tested mseal() in chrome and chromebook.
------------------------------------------------------------------
This patchset proposes a new mseal() syscall for the Linux kernel.
In a nutshell, mseal() protects the VMAs of a given virtual memory
range against modifications, such as changes to their permission bits.
Modern CPUs support memory permissions, such as the read/write (RW)
and no-execute (NX) bits. Linux has supported NX since the release of
kernel version 2.6.8 in August 2004 [1]. The memory permission feature
improves the security stance on memory corruption bugs, as an attacker
cannot simply write to arbitrary memory and point the code to it. The
memory must be marked with the X bit, or else an exception will occur.
Internally, the kernel maintains the memory permissions in a data
structure called VMA (vm_area_struct). mseal() additionally protects
the VMA itself against modifications of the selected seal type.
Memory sealing is useful to mitigate memory corruption issues where a
corrupted pointer is passed to a memory management system. For
example, such an attacker primitive can break control-flow integrity
guarantees since read-only memory that is supposed to be trusted can
become writable or .text pages can get remapped. Memory sealing can
automatically be applied by the runtime loader to seal .text and
.rodata pages and applications can additionally seal security critical
data at runtime. A similar feature already exists in the XNU kernel
with the VM_FLAGS_PERMANENT [3] flag and on OpenBSD with the
mimmutable syscall [4]. Also, Chrome wants to adopt this feature for
their CFI work [2] and this patchset has been designed to be
compatible with the Chrome use case.
Two system calls are involved in sealing the map: mmap() and mseal().
The new mseal() is a syscall on 64 bit CPUs, with the
following signature:
int mseal(void *addr, size_t len, unsigned long flags)
addr/len: memory range.
flags: reserved.
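A minimal sketch of calling the new syscall from userspace, assuming there is no libc wrapper so that syscall(2) is used directly. The fallback number 462 is an assumption (the x86_64 and asm-generic number in mainline) and the helper names are made up for this example. The probe seals one read-only anonymous page and then checks that a later mprotect() is refused with EPERM.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef __NR_mseal
#define __NR_mseal 462	/* assumption: x86_64 / asm-generic syscall number */
#endif

/* Thin wrapper around the raw syscall. */
int sys_mseal(void *addr, size_t len, unsigned long flags)
{
	return syscall(__NR_mseal, addr, len, flags);
}

/*
 * Returns 1 if the page was sealed and a later mprotect() was refused,
 * 0 if mseal() could not be used on this system, -1 on unexpected results.
 */
int seal_and_probe(void)
{
	size_t len = (size_t)sysconf(_SC_PAGESIZE);
	void *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return -1;

	if (sys_mseal(p, len, 0) != 0) {
		/* e.g. ENOSYS on kernels without mseal() */
		munmap(p, len);
		return 0;
	}

	/* The mapping is sealed: changing its permissions must fail. */
	if (mprotect(p, len, PROT_READ | PROT_WRITE) == 0)
		return -1;
	return errno == EPERM ? 1 : -1;
}
```

Note that a sealed mapping cannot be unmapped either, so the page in this sketch stays mapped until the process exits.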
mseal() blocks the following operations for the given memory range:
1> Unmapping, moving to another location, and shrinking the size,
via munmap() and mremap(). These can leave an empty space, which can
then be replaced with a VMA with a new set of attributes.
2> Moving or expanding a different VMA into the current location,
via mremap().
3> Modifying a VMA via mmap(MAP_FIXED).
4> Size expansion, via mremap(), does not appear to pose any specific
risks to sealed VMAs. It is included anyway because the use case is
unclear. In any case, users can rely on merging to expand a sealed VMA.
5> mprotect() and pkey_mprotect().
6> Some destructive madvise() behaviors (e.g. MADV_DONTNEED) for anonymous
memory, when users don't have write permission to the memory. Those
behaviors can alter region contents by discarding pages, effectively a
memset(0) for anonymous memory.
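The "effectively a memset(0)" effect in item 6 can be observed without mseal() at all. The sketch below (the function name is made up) fills a private anonymous page, discards it with MADV_DONTNEED and reads it back:

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Returns the first byte of the page after the discard (0 is expected for
 * private anonymous memory), or -1 if the setup fails.
 */
int discard_and_read(void)
{
	size_t len = (size_t)sysconf(_SC_PAGESIZE);
	unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
				MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	int first;

	if (p == MAP_FAILED)
		return -1;

	memset(p, 'A', len);
	if (madvise(p, len, MADV_DONTNEED) != 0) {
		munmap(p, len);
		return -1;
	}

	first = p[0];	/* the next access faults in a fresh zero page */
	munmap(p, len);
	return first;
}
```

This is why the operation counts as destructive: the contents change even though no write to the range ever took place.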
The idea that inspired this patch comes from Stephen Röttger’s work in
V8 CFI [5]. Chrome browser in ChromeOS will be the first user of this
API.
Indeed, the Chrome browser has very specific requirements for sealing,
which are distinct from those of most applications. For example, in
the case of libc, sealing is only applied to read-only (RO) or
read-execute (RX) memory segments (such as .text and .RELRO) to
prevent them from becoming writable, and the lifetime of those mappings
is tied to the lifetime of the process.
Chrome wants to seal two large address space reservations that are
managed by different allocators. The memory is mapped RW- and RWX
respectively but write access to it is restricted using pkeys (or in
the future ARM permission overlay extensions). The lifetime of those
mappings is not tied to the lifetime of the process. Therefore, while
the memory is sealed, the allocators still need to free or discard the
unused memory, for example with madvise(DONTNEED).
However, always allowing madvise(DONTNEED) on this range poses a
security risk. For example if a jump instruction crosses a page
boundary and the second page gets discarded, it will overwrite the
target bytes with zeros and change the control flow. Checking
write-permission before the discard operation allows us to control
when the operation is valid. In this case, the madvise will only
succeed if the executing thread has PKEY write permissions and PKRU
changes are protected in software by control-flow integrity.
Although the initial version of this patch series is targeting the
Chrome browser as its first user, it became evident during upstream
discussions that we would also want to ensure that the patch set
eventually is a complete solution for memory sealing and compatible
with other use cases. The specific scenario currently in mind is
glibc's use case of loading and sealing ELF executables. To this end,
Stephen is working on a change to glibc to add sealing support to the
dynamic linker, which will seal all non-writable segments at startup.
Once this work is completed, all applications will be able to
automatically benefit from these new protections.
In closing, I would like to formally acknowledge the valuable
contributions received during the RFC process, which were instrumental
in shaping this patch:
Jann Horn: raising awareness and providing valuable insights on the
destructive madvise operations.
Liam R. Howlett: perf optimization.
Linus Torvalds: assisting in defining system call signature and scope.
Theo de Raadt: sharing the experiences and insight gained from
implementing mimmutable() in OpenBSD.
MM perf benchmarks
==================
This patch adds a loop in the mprotect/munmap/madvise(DONTNEED) to
check the VMAs’ sealing flag, so that no partial update can be made,
when any segment within the given memory range is sealed.
To measure the performance impact of this loop, two tests were developed [8].
The first measures the time taken for a particular system call,
by using clock_gettime(CLOCK_MONOTONIC). The second uses
PERF_COUNT_HW_REF_CPU_CYCLES (excluding user space). Both tests have
similar results.
The tests roughly follow the sequence below:
for (i = 0; i < 1000; i++)
    create 1000 mappings (1 page per VMA)
    start the sampling
    for (j = 0; j < 1000; j++)
        mprotect one mapping
    stop and save the sample
    delete 1000 mappings
calculate all samples.
The tests below were performed on an Intel(R) Pentium(R) Gold 7505 @ 2.00GHz,
4G memory, Chromebook.
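For readers who want to reproduce the first kind of measurement, here is a simplified sketch of one sampling round that uses clock_gettime(CLOCK_MONOTONIC). It is not the code from [8]: the function names and NR_MAPPINGS are made up for this sketch, it takes one sample instead of 1000, and it does not stop the kernel from merging adjacent mappings, so it does not guarantee one page per VMA.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

#define NR_MAPPINGS 1000

static int64_t ns_between(const struct timespec *a, const struct timespec *b)
{
	return (int64_t)(b->tv_sec - a->tv_sec) * 1000000000LL +
	       (b->tv_nsec - a->tv_nsec);
}

/*
 * Create NR_MAPPINGS one-page mappings, time one mprotect() per mapping and
 * return the average cost of a call in nanoseconds, or -1 on failure.
 */
int64_t time_mprotect_ns(void)
{
	static void *map[NR_MAPPINGS];
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	struct timespec start, end;
	int64_t ret = -1;
	int i, n;

	for (n = 0; n < NR_MAPPINGS; n++) {
		map[n] = mmap(NULL, page, PROT_READ | PROT_WRITE,
			      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (map[n] == MAP_FAILED)
			goto out;
	}

	/* Sampling window: only the mprotect() calls are timed. */
	clock_gettime(CLOCK_MONOTONIC, &start);
	for (i = 0; i < NR_MAPPINGS; i++)
		mprotect(map[i], page, PROT_READ);
	clock_gettime(CLOCK_MONOTONIC, &end);

	ret = ns_between(&start, &end) / NR_MAPPINGS;
out:
	for (i = 0; i < n; i++)
		munmap(map[i], page);
	return ret;
}
```

Running the same function on a kernel with and without the sealing check gives per-call numbers that can be compared in the same way as the t and t_mseal columns of the tables.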
Based on the latest upstream code:
The first test (measuring time)
syscall__    vmas        t  t_mseal  delta_ns per_vma     %
munmap__        1      909      944        35      35  104%
munmap__        2     1398     1502       104      52  107%
munmap__        4     2444     2594       149      37  106%
munmap__        8     4029     4323       293      37  107%
munmap__       16     6647     6935       288      18  104%
munmap__       32    11811    12398       587      18  105%
mprotect        1      439      465        26      26  106%
mprotect        2     1659     1745        86      43  105%
mprotect        4     3747     3889       142      36  104%
mprotect        8     6755     6969       215      27  103%
mprotect       16    13748    14144       396      25  103%
mprotect       32    27827    28969      1142      36  104%
madvise_        1      240      262        22      22  109%
madvise_        2      366      442        76      38  121%
madvise_        4      623      751       128      32  121%
madvise_        8     1110     1324       215      27  119%
madvise_       16     2127     2451       324      20  115%
madvise_       32     4109     4642       534      17  113%
The second test (measuring cpu cycle)
syscall__    vmas      cpu   cmseal delta_cpu per_vma     %
munmap__        1     1790     1890       100     100  106%
munmap__        2     2819     3033       214     107  108%
munmap__        4     4959     5271       312      78  106%
munmap__        8     8262     8745       483      60  106%
munmap__       16    13099    14116      1017      64  108%
munmap__       32    23221    24785      1565      49  107%
mprotect        1      906      967        62      62  107%
mprotect        2     3019     3203       184      92  106%
mprotect        4     6149     6569       420     105  107%
mprotect        8     9978    10524       545      68  105%
mprotect       16    20448    21427       979      61  105%
mprotect       32    40972    42935      1963      61  105%
madvise_        1      434      497        63      63  115%
madvise_        2      752      899       147      74  120%
madvise_        4     1313     1513       200      50  115%
madvise_        8     2271     2627       356      44  116%
madvise_       16     4312     4883       571      36  113%
madvise_       32     8376     9319       943      29  111%
Based on these results, for the 6.8 kernel, the sealing check adds
20-40 nanoseconds, or around 50-100 CPU cycles, per VMA.
In addition, I applied the sealing to the 5.10 kernel:
The first test (measuring time)
syscall__    vmas        t  t_mseal  delta_ns per_vma     %
munmap__        1      357      390        33      33  109%
munmap__        2      442      463        21      11  105%
munmap__        4      614      634        20       5  103%
munmap__        8     1017     1137       120      15  112%
munmap__       16     1889     2153       263      16  114%
munmap__       32     4109     4088       -21      -1   99%
mprotect        1      235      227        -7      -7   97%
mprotect        2      495      464       -30     -15   94%
mprotect        4      741      764        24       6  103%
mprotect        8     1434     1437         2       0  100%
mprotect       16     2958     2991        33       2  101%
mprotect       32     6431     6608       177       6  103%
madvise_        1      191      208        16      16  109%
madvise_        2      300      324        24      12  108%
madvise_        4      450      473        23       6  105%
madvise_        8      753      806        53       7  107%
madvise_       16     1467     1592       125       8  108%
madvise_       32     2795     3405       610      19  122%
The second test (measuring cpu cycle)
syscall__    vmas      cpu   cmseal delta_cpu per_vma     %
munmap__        1      684      715        31      31  105%
munmap__        2      861      898        38      19  104%
munmap__        4     1183     1235        51      13  104%
munmap__        8     1999     2045        46       6  102%
munmap__       16     3839     3816       -23      -1   99%
munmap__       32     7672     7887       216       7  103%
mprotect        1      397      443        46      46  112%
mprotect        2      738      788        50      25  107%
mprotect        4     1221     1256        35       9  103%
mprotect        8     2356     2429        72       9  103%
mprotect       16     4961     4935       -26      -2   99%
mprotect       32     9882    10172       291       9  103%
madvise_        1      351      380        29      29  108%
madvise_        2      565      615        49      25  109%
madvise_        4      872      933        61      15  107%
madvise_        8     1508     1640       132      16  109%
madvise_       16     3078     3323       245      15  108%
madvise_       32     5893     6704       811      25  114%
For the 5.10 kernel, the sealing check adds 0-15 ns in time, or 10-30
CPU cycles; there is even a decrease in some cases.
It might be interesting to compare the 5.10 and 6.8 kernels:
The first test (measuring time)
syscall__    vmas   t_5_10    t_6_8  delta_ns per_vma     %
munmap__        1      357      909       552     552  254%
munmap__        2      442     1398       956     478  316%
munmap__        4      614     2444      1830     458  398%
munmap__        8     1017     4029      3012     377  396%
munmap__       16     1889     6647      4758     297  352%
munmap__       32     4109    11811      7702     241  287%
mprotect        1      235      439       204     204  187%
mprotect        2      495     1659      1164     582  335%
mprotect        4      741     3747      3006     752  506%
mprotect        8     1434     6755      5320     665  471%
mprotect       16     2958    13748     10790     674  465%
mprotect       32     6431    27827     21397     669  433%
madvise_        1      191      240        49      49  125%
madvise_        2      300      366        67      33  122%
madvise_        4      450      623       173      43  138%
madvise_        8      753     1110       357      45  147%
madvise_       16     1467     2127       660      41  145%
madvise_       32     2795     4109      1314      41  147%
The second test (measuring cpu cycle)
syscall__    vmas cpu_5_10    c_6_8 delta_cpu per_vma     %
munmap__        1      684     1790      1106    1106  262%
munmap__        2      861     2819      1958     979  327%
munmap__        4     1183     4959      3776     944  419%
munmap__        8     1999     8262      6263     783  413%
munmap__       16     3839    13099      9260     579  341%
munmap__       32     7672    23221     15549     486  303%
mprotect        1      397      906       509     509  228%
mprotect        2      738     3019      2281    1140  409%
mprotect        4     1221     6149      4929    1232  504%
mprotect        8     2356     9978      7622     953  423%
mprotect       16     4961    20448     15487     968  412%
mprotect       32     9882    40972     31091     972  415%
madvise_        1      351      434        82      82  123%
madvise_        2      565      752       186      93  133%
madvise_        4      872     1313       442     110  151%
madvise_        8     1508     2271       763      95  151%
madvise_       16     3078     4312      1234      77  140%
madvise_       32     5893     8376      2483      78  142%
From 5.10 to 6.8
munmap: added 250-550 ns in time, or 500-1100 in cpu cycle, per vma.
mprotect: added 200-750 ns in time, or 500-1200 in cpu cycle, per vma.
madvise: added 33-50 ns in time, or 70-110 in cpu cycle, per vma.
In comparison to mseal, which adds 20-40 ns or 50-100 CPU cycles, the
increase from 5.10 to 6.8 is significantly larger, approximately ten
times greater for munmap and mprotect.
When I discussed the mm performance with Brian Makin, an engineer who works
on performance, it was brought to my attention that such performance
benchmarks, which measure millions of mm syscalls in a tight loop, may
not accurately reflect real-world scenarios, such as that of a database
service. Also, this was tested on a single HW running ChromeOS; the data
from another HW or distribution might be different. It might be best
to take this data with a grain of salt.
Change history:
===============
V10:
- rebase to 6.9.rc3 (no code change, resolve conflict only)
- Stephen Röttger applied mseal() in Chrome code, and I tested it on
chromebook, the mseal() is working as designed.
V9:
- remove mmap(PROT_SEAL) and mmap(MAP_SEALABLE) (Linus, Theo de Raadt)
- Update mseal_test to check for prot bit (Liam R. Howlett)
- Update documentation to give more detail on sealing check (Liam R. Howlett)
- Add seal_elf test.
- Add performance measure data.
- mseal_test: fix arm build.
https://lore.kernel.org/all/20240214151130.616240-1-jeffxu@chromium.org/
V8:
- perf optimization in mmap. (Liam R. Howlett)
- add one testcase (test_seal_zero_address)
- Update mseal.rst to add note for MAP_SEALABLE.
https://lore.kernel.org/lkml/20240131175027.3287009-1-jeffxu@chromium.org/
V7:
- fix index.rst (Randy Dunlap)
- fix arm build (Randy Dunlap)
- return EPERM for blocked operations (Theo de Raadt)
https://lore.kernel.org/linux-mm/20240122152905.2220849-2-jeffxu@chromium.o…
V6:
- Drop RFC from subject, Given Linus's general approval.
- Adjust syscall number for mseal (main Jan.11/2024)
- Code style fix (Matthew Wilcox)
- selftest: use ksft macros (Muhammad Usama Anjum)
- Document fix. (Randy Dunlap)
https://lore.kernel.org/all/20240111234142.2944934-1-jeffxu@chromium.org/
V5:
- fix build issue in mseal-Wire-up-mseal-syscall
(Suggested by Linus Torvalds, and Greg KH)
- updates on selftest.
https://lore.kernel.org/lkml/20240109154547.1839886-1-jeffxu@chromium.org/#r
V4:
(Suggested by Linus Torvalds)
- new signature: mseal(start,len,flags)
- 32 bit is not supported. vm_seal is removed, use vm_flags instead.
- single bit in vm_flags for sealed state.
- CONFIG_MSEAL kernel config is removed.
- single bit of PROT_SEAL in the "Prot" field of mmap().
Other changes:
- update selftest (Suggested by Muhammad Usama Anjum)
- update documentation.
https://lore.kernel.org/all/20240104185138.169307-1-jeffxu@chromium.org/
V3:
- Abandon per-syscall approach, (Suggested by Linus Torvalds).
- Organize sealing types around their functionality, such as
MM_SEAL_BASE, MM_SEAL_PROT_PKEY.
- Extend the scope of sealing from calls originated in userspace to
both kernel and userspace. (Suggested by Linus Torvalds)
- Add seal type support in mmap(). (Suggested by Pedro Falcato)
- Add a new sealing type: MM_SEAL_DISCARD_RO_ANON to prevent
destructive operations of madvise. (Suggested by Jann Horn and
Stephen Röttger)
- Make sealed VMAs mergeable. (Suggested by Jann Horn)
- Add MAP_SEALABLE to mmap()
- Add documentation - mseal.rst
https://lore.kernel.org/linux-mm/20231212231706.2680890-2-jeffxu@chromium.o…
v2:
Use _BITUL to define MM_SEAL_XX type.
Use unsigned long for seal type in sys_mseal() and other functions.
Remove internal VM_SEAL_XX type and convert_user_seal_type().
Remove MM_ACTION_XX type.
Remove caller_origin(ON_BEHALF_OF_XX) and replace with sealing bitmask.
Add more comments in code.
Add a detailed commit message.
https://lore.kernel.org/lkml/20231017090815.1067790-1-jeffxu@chromium.org/
v1:
https://lore.kernel.org/lkml/20231016143828.647848-1-jeffxu@chromium.org/
----------------------------------------------------------------
[1] https://kernelnewbies.org/Linux_2_6_8
[2] https://v8.dev/blog/control-flow-integrity
[3] https://github.com/apple-oss-distributions/xnu/blob/1031c584a5e37aff177559b…
[4] https://man.openbsd.org/mimmutable.2
[5] https://docs.google.com/document/d/1O2jwK4dxI3nRcOJuPYkonhTkNQfbmwdvxQMyXge…
[6] https://lore.kernel.org/lkml/CAG48ez3ShUYey+ZAFsU2i1RpQn0a5eOs2hzQ426Fkcgnf…
[7] https://lore.kernel.org/lkml/20230515130553.2311248-1-jeffxu@chromium.org/
[8] https://github.com/peaktocreek/mmperf
Jeff Xu (5):
mseal: Wire up mseal syscall
mseal: add mseal syscall
selftest mm/mseal memory sealing
mseal:add documentation
selftest mm/mseal read-only elf memory segment
Documentation/userspace-api/index.rst | 1 +
Documentation/userspace-api/mseal.rst | 199 ++
arch/alpha/kernel/syscalls/syscall.tbl | 1 +
arch/arm/tools/syscall.tbl | 1 +
arch/arm64/include/asm/unistd.h | 2 +-
arch/arm64/include/asm/unistd32.h | 2 +
arch/m68k/kernel/syscalls/syscall.tbl | 1 +
arch/microblaze/kernel/syscalls/syscall.tbl | 1 +
arch/mips/kernel/syscalls/syscall_n32.tbl | 1 +
arch/mips/kernel/syscalls/syscall_n64.tbl | 1 +
arch/mips/kernel/syscalls/syscall_o32.tbl | 1 +
arch/parisc/kernel/syscalls/syscall.tbl | 1 +
arch/powerpc/kernel/syscalls/syscall.tbl | 1 +
arch/s390/kernel/syscalls/syscall.tbl | 1 +
arch/sh/kernel/syscalls/syscall.tbl | 1 +
arch/sparc/kernel/syscalls/syscall.tbl | 1 +
arch/x86/entry/syscalls/syscall_32.tbl | 1 +
arch/x86/entry/syscalls/syscall_64.tbl | 1 +
arch/xtensa/kernel/syscalls/syscall.tbl | 1 +
include/linux/syscalls.h | 1 +
include/uapi/asm-generic/unistd.h | 5 +-
kernel/sys_ni.c | 1 +
mm/Makefile | 4 +
mm/internal.h | 37 +
mm/madvise.c | 12 +
mm/mmap.c | 31 +-
mm/mprotect.c | 10 +
mm/mremap.c | 31 +
mm/mseal.c | 307 ++++
tools/testing/selftests/mm/.gitignore | 2 +
tools/testing/selftests/mm/Makefile | 2 +
tools/testing/selftests/mm/mseal_test.c | 1836 +++++++++++++++++++
tools/testing/selftests/mm/seal_elf.c | 183 ++
33 files changed, 2678 insertions(+), 3 deletions(-)
create mode 100644 Documentation/userspace-api/mseal.rst
create mode 100644 mm/mseal.c
create mode 100644 tools/testing/selftests/mm/mseal_test.c
create mode 100644 tools/testing/selftests/mm/seal_elf.c
--
2.44.0.683.g7961c838ac-goog
Hello,
this was reported in https://lore.kernel.org/all/202404151340.5b152d96-lkp@intel.com/
since we still observed same failure after the commit is merged in mainline,
we just report again FYI.
kernel test robot noticed "kunit.VCAP_API_DebugFS_Testsuite.vcap_api_show_admin_raw_test.fail" on:
commit: 3a35c13007dea132a65f07de05c26b87837fadc2 ("kunit: Handle test faults")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
[test failed in linus/master 6e51b4b5bbc07e52b226017936874715629932d1]
[test failed on linux-next/master 632483ea8004edfadd035de36e1ab2c7c4f53158]
in testcase: kunit
version:
with following parameters:
group: group-03
compiler: gcc-13
test machine: 4 threads 1 sockets Intel(R) Core(TM) i3-3220 CPU @ 3.30GHz (Ivy Bridge) with 8G memory
(please refer to attached dmesg/kmsg for entire log/backtrace)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang(a)intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202405241710.148db8b0-oliver.sang@intel.com
[ 116.216583] # vcap_api_show_admin_raw_test: EXPECTATION FAILED at drivers/net/ethernet/microchip/vcap/vcap_api_debugfs_kunit.c:377
Expected test_expected == test_pr_buffer[0], but
test_expected == " addr: 786, X6 rule, keysets: VCAP_KFS_MAC_ETYPE
"
test_pr_buffer[0] == ""
[ 116.222467] not ok 2 vcap_api_show_admin_raw_test
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240524/202405241710.148db8b0-oliv…
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
From: Geliang Tang <tanggeliang(a)kylinos.cn>
This patchset uses post_socket_cb and post_connect_cb callbacks of struct
network_helper_opts to refactor do_test() in bpf_tcp_ca.c to move dctcp
test dedicated code out of do_test() into test_dctcp().
v3:
- Add 4 new patches, 1-3 are cleanups. 4 adds a new helper.
- address Martin's comments in v2.
v2:
- rebased on commit "selftests/bpf: Add test for the use of new args in
cong_control"
Geliang Tang (8):
selftests/bpf: Drop struct post_socket_opts
selftests/bpf: Drop type parameter of start_server_addr
selftests/bpf: Drop type parameter of connect_to_addr
selftests/bpf: Add start_server_str helper
selftests/bpf: Use post_socket_cb in connect_to_fd_opts
selftests/bpf: Use start_server_str in bpf_tcp_ca
selftests/bpf: Use connect_to_fd_opts in do_test in bpf_tcp_ca
selftests/bpf: Add post_connect_cb callback
tools/testing/selftests/bpf/network_helpers.c | 56 ++++---
tools/testing/selftests/bpf/network_helpers.h | 13 +-
.../selftests/bpf/prog_tests/bpf_tcp_ca.c | 138 +++++++++++++-----
.../selftests/bpf/prog_tests/cls_redirect.c | 7 +-
.../testing/selftests/bpf/prog_tests/mptcp.c | 2 +-
.../selftests/bpf/prog_tests/sk_assign.c | 13 +-
.../selftests/bpf/prog_tests/sock_addr.c | 23 ++-
.../bpf/prog_tests/sockopt_inherit.c | 4 +-
.../bpf/test_tcp_check_syncookie_user.c | 10 +-
9 files changed, 179 insertions(+), 87 deletions(-)
--
2.43.0
Currently a 32 bit 1u value is being shifted by up to 63 bits, causing
undefined behaviour and incorrect checking of bits 32-63. Fix this by using
the BIT_ULL macro for shifting bits.
Detected by cppcheck:
sev_init2_tests.c:108:34: error: Shifting 32-bit value by 63 bits is
undefined behaviour [shiftTooManyBits]
Fixes: dfc083a181ba ("selftests: kvm: add tests for KVM_SEV_INIT2")
Signed-off-by: Colin Ian King <colin.i.king(a)gmail.com>
---
tools/testing/selftests/kvm/x86_64/sev_init2_tests.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/kvm/x86_64/sev_init2_tests.c b/tools/testing/selftests/kvm/x86_64/sev_init2_tests.c
index 7a4a61be119b..ea09f7a06aa4 100644
--- a/tools/testing/selftests/kvm/x86_64/sev_init2_tests.c
+++ b/tools/testing/selftests/kvm/x86_64/sev_init2_tests.c
@@ -105,11 +105,11 @@ void test_features(uint32_t vm_type, uint64_t supported_features)
int i;
for (i = 0; i < 64; i++) {
- if (!(supported_features & (1u << i)))
+ if (!(supported_features & BIT_ULL(i)))
test_init2_invalid(vm_type,
&(struct kvm_sev_init){ .vmsa_features = BIT_ULL(i) },
"unknown feature");
- else if (KNOWN_FEATURES & (1u << i))
+ else if (KNOWN_FEATURES & BIT_ULL(i))
test_init2(vm_type,
&(struct kvm_sev_init){ .vmsa_features = BIT_ULL(i) });
}
--
2.39.2
Dear Kernel Community,
This patch introduces a `.gitlab-ci.yml` file along with a `ci/` folder, defining a
basic test pipeline triggered by code pushes to a GitLab-CI instance. This
initial version includes static checks (checkpatch and smatch for now) and build
tests across various architectures and configurations. It leverages an
integrated cache for efficient build times and introduces a flexible 'scenarios'
mechanism for subsystem-specific extensions.
tl;dr: check this video to see a quick demo: https://youtu.be/TWiTjhjOuzg,
but don't forget to check the "Motivation for this work" below. Your feedback,
whether a simple thumbs up or down, is crucial to determine if it is worthwhile
to pursue this initiative.
GitLab is an Open Source platform that includes integrated CI/CD. The pipeline
provided in this patch is designed to work out-of-the-box with any GitLab
instance, including the gitlab.com Free Tier. If you reach the limits of the
Free Tier, consider using community instances like https://gitlab.freedesktop.org/.
Alternatively, you can set up a local runner for more flexibility. The
bootstrap-gitlab-runner.sh script included with this patch simplifies this
process, enabling you to run tests on your preferred infrastructure, including
your own machine.
For detailed information, please refer to the documentation included in the
patch, or check the rendered version here: https://koike.pages.collabora.com/-/linux/-/jobs/298498/artifacts/artifacts… .
Motivation for this Work
========================
We all know tests are a major topic in the community, so let's mention the
specificities of this approach:
1. **Built-in User Interface:** GitLab CI/CD is growing in popularity and has a
user-friendly interface. Our experience with the upstream DRM-CI in the kernel
tree (see this blog post [https://www.collabora.com/news-and-blog/blog/2024/02/08/drm-ci-a-gitlab-ci-…] )
has provided insights into how such a system can benefit the wider community.
2. **Distributed Infrastructure:**
The proposed GitLab-CI pipeline is designed with a distributed infrastructure
model, making it possible to run on any GitLab instance.
3. **Reduce regressions:** Fostering a culture where people habitually run
validated tests and post their results can prevent many issues in post-merge
tests.
4. **Collaborative Testing Environment:** The kernel community is already
engaged in numerous testing efforts, including various GitLab-CI pipelines such
as DRM-CI, which I maintain, along with other solutions like KernelCI and
BPF-CI. This proposal is designed to further stimulate contributions to the
evolving testing landscape. Our goal is to establish a comprehensive suite of
common tools and files.
5. **Ownership of QA:**
Discrepancies between kernel code and outdated tests often lead to misattributed
failures, complicating regression tracking. This issue, often arising from
neglected or deprioritized test updates, creates uncertainty about the source of
failures. Adopting an "always green pipeline" approach, as detailed in this
patch's documentation, encourages timely maintenance and validation of tests.
This ensures that testing accurately reflects the current state of the kernel,
thereby improving the effectiveness of our QA processes.
Additionally, if we discover that this method isn't working for us, we can
easily remove it from the codebase, as it is primarily contained within the ci/
folder.
Future Work
===========
**Expanding Static Checks:**
We have the opportunity to integrate a variety of static analysis tools,
including:
- dtbs_checks
- sparse
- yamllint
- dt-doc-validate
- coccicheck
**Adding Userspace Tests on VMs:**
To further our testing, we can implement userspace tests that run on virtual
machines (VMs), such as:
- kselftests
- kunit tests
- Subsystem-specific tests, customizable in the scenarios.
**Leveraging External Test Labs:**
We can extend our testing to external labs, similar to what DRM-CI currently
does. This includes:
- Lava labs
- Bare metal labs
- Using KernelCI-provided labs
**Other integrations**
- Submit results to KCIDB
**Lightweight Implementation for All Developers:**
We aim to design these tests to be lightweight, ensuring developers with limited
computing resources can still run essential tests. Resource-intensive tests can
be set to trigger manually, rather than automatically, to accommodate diverse
development environments.
Chat Discussions
================
For those interested in further discussions:
**Join Our Slack Channel:**
We have a Slack channel, #gitlab-ci, on the KernelCI Slack instance https://kernelci.slack.com/ .
Feel free to join and contribute to the conversation. The KernelCI team has
weekly calls where we also discuss the GitLab-CI pipeline.
**Acknowledgments:**
A special thanks to Nikolai Kondrashov, Tales da Aparecida - both from Red Hat -
and KernelCI community for their valuable feedback and support in this proposal.
I eagerly await your thoughts and suggestions on this initiative.
Also, if you want to see this initiative move faster, we are happy to discuss
funding options.
Best regards,
Helen Koike
Helen Koike (3):
kci-gitlab: Introducing GitLab-CI Pipeline for Kernel Testing
kci-gitlab: Add documentation
kci-gitlab: docs: Add images
.gitlab-ci.yml | 2 +
Documentation/ci/gitlab-ci/gitlab-ci.rst | 404 ++++++++++++++++++
.../ci/gitlab-ci/images/job-matrix.png | Bin 0 -> 159752 bytes
.../gitlab-ci/images/new-project-runner.png | Bin 0 -> 607737 bytes
.../ci/gitlab-ci/images/pipelines-on-push.png | Bin 0 -> 532143 bytes
.../ci/gitlab-ci/images/the-pipeline.png | Bin 0 -> 91675 bytes
.../ci/gitlab-ci/images/variables.png | Bin 0 -> 277518 bytes
Documentation/index.rst | 7 +
MAINTAINERS | 9 +
ci/gitlab-ci/bootstrap-gitlab-runner.sh | 55 +++
ci/gitlab-ci/ci-scripts/build-docs.sh | 35 ++
ci/gitlab-ci/ci-scripts/build-kernel.sh | 35 ++
ci/gitlab-ci/ci-scripts/ici-functions.sh | 104 +++++
ci/gitlab-ci/ci-scripts/install-smatch.sh | 13 +
.../ci-scripts/parse_commit_message.sh | 27 ++
ci/gitlab-ci/ci-scripts/run-checkpatch.sh | 19 +
ci/gitlab-ci/ci-scripts/run-smatch.sh | 45 ++
ci/gitlab-ci/docker-compose.yaml | 18 +
ci/gitlab-ci/linux.code-workspace | 11 +
ci/gitlab-ci/yml/build.yml | 43 ++
ci/gitlab-ci/yml/cache.yml | 26 ++
ci/gitlab-ci/yml/container.yml | 36 ++
ci/gitlab-ci/yml/gitlab-ci.yml | 71 +++
ci/gitlab-ci/yml/kernel-combinations.yml | 18 +
ci/gitlab-ci/yml/scenarios.yml | 12 +
ci/gitlab-ci/yml/scenarios/file-systems.yml | 21 +
ci/gitlab-ci/yml/scenarios/media.yml | 21 +
ci/gitlab-ci/yml/scenarios/network.yml | 21 +
ci/gitlab-ci/yml/static-checks.yml | 21 +
29 files changed, 1074 insertions(+)
create mode 100644 .gitlab-ci.yml
create mode 100644 Documentation/ci/gitlab-ci/gitlab-ci.rst
create mode 100644 Documentation/ci/gitlab-ci/images/job-matrix.png
create mode 100644 Documentation/ci/gitlab-ci/images/new-project-runner.png
create mode 100644 Documentation/ci/gitlab-ci/images/pipelines-on-push.png
create mode 100644 Documentation/ci/gitlab-ci/images/the-pipeline.png
create mode 100644 Documentation/ci/gitlab-ci/images/variables.png
create mode 100755 ci/gitlab-ci/bootstrap-gitlab-runner.sh
create mode 100755 ci/gitlab-ci/ci-scripts/build-docs.sh
create mode 100755 ci/gitlab-ci/ci-scripts/build-kernel.sh
create mode 100644 ci/gitlab-ci/ci-scripts/ici-functions.sh
create mode 100755 ci/gitlab-ci/ci-scripts/install-smatch.sh
create mode 100755 ci/gitlab-ci/ci-scripts/parse_commit_message.sh
create mode 100755 ci/gitlab-ci/ci-scripts/run-checkpatch.sh
create mode 100755 ci/gitlab-ci/ci-scripts/run-smatch.sh
create mode 100644 ci/gitlab-ci/docker-compose.yaml
create mode 100644 ci/gitlab-ci/linux.code-workspace
create mode 100644 ci/gitlab-ci/yml/build.yml
create mode 100644 ci/gitlab-ci/yml/cache.yml
create mode 100644 ci/gitlab-ci/yml/container.yml
create mode 100644 ci/gitlab-ci/yml/gitlab-ci.yml
create mode 100644 ci/gitlab-ci/yml/kernel-combinations.yml
create mode 100644 ci/gitlab-ci/yml/scenarios.yml
create mode 100644 ci/gitlab-ci/yml/scenarios/file-systems.yml
create mode 100644 ci/gitlab-ci/yml/scenarios/media.yml
create mode 100644 ci/gitlab-ci/yml/scenarios/network.yml
create mode 100644 ci/gitlab-ci/yml/static-checks.yml
--
2.40.1
From: Mark Brown <broonie(a)kernel.org>
[ Upstream commit 907f33028871fa7c9a3db1efd467b78ef82cce20 ]
The standard library perror() function provides a convenient way to print
an error message based on the current errno but this doesn't play nicely
with KTAP output. Provide a helper which does an equivalent thing in a KTAP
compatible format.
nolibc doesn't have a strerror() and adding the table of strings required
doesn't seem like a good fit for what it's trying to do, so when we're using
nolibc only print the errno.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Reviewed-by: Kees Cook <keescook(a)chromium.org>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Stable-dep-of: 071af0c9e582 ("selftests: timers: Convert posix_timers test to generate KTAP output")
Signed-off-by: Edward Liaw <edliaw(a)google.com>
---
tools/testing/selftests/kselftest.h | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/tools/testing/selftests/kselftest.h b/tools/testing/selftests/kselftest.h
index e8eecbc83a60..ad7b97e16f37 100644
--- a/tools/testing/selftests/kselftest.h
+++ b/tools/testing/selftests/kselftest.h
@@ -48,6 +48,7 @@
#include <stdlib.h>
#include <unistd.h>
#include <stdarg.h>
+#include <string.h>
#include <stdio.h>
#include <sys/utsname.h>
#endif
@@ -156,6 +157,19 @@ static inline void ksft_print_msg(const char *msg, ...)
va_end(args);
}
+static inline void ksft_perror(const char *msg)
+{
+#ifndef NOLIBC
+ ksft_print_msg("%s: %s (%d)\n", msg, strerror(errno), errno);
+#else
+ /*
+ * nolibc doesn't provide strerror() and it seems
+ * inappropriate to add one, just print the errno.
+ */
+ ksft_print_msg("%s: %d\n", msg, errno);
+#endif
+}
+
static inline void ksft_test_result_pass(const char *msg, ...)
{
int saved_errno = errno;
--
2.45.0.215.g3402c0e53f-goog
Currently array buff is not being initialized and so garbage values
on the stack are being used in the mq_send calls. Initialize the
values in the array to zero.
Cleans up cppcheck warning:
mq_perf_tests.c:334:25: error: Uninitialized variable: buff [uninitvar]
Signed-off-by: Colin Ian King <colin.i.king(a)gmail.com>
---
tools/testing/selftests/mqueue/mq_perf_tests.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/mqueue/mq_perf_tests.c b/tools/testing/selftests/mqueue/mq_perf_tests.c
index 5c16159d0bcd..bd561dc785d8 100644
--- a/tools/testing/selftests/mqueue/mq_perf_tests.c
+++ b/tools/testing/selftests/mqueue/mq_perf_tests.c
@@ -322,7 +322,7 @@ void *fake_cont_thread(void *arg)
void *cont_thread(void *arg)
{
- char buff[MSG_SIZE];
+ char buff[MSG_SIZE] = { };
int i, priority;
for (i = 0; i < num_cpus_to_pin; i++)
--
2.39.2
Testing a network device whose byte/packet counters are large may
overflow the 32bit 'stats' counters. Using 'stats64' when comparing fixes
this problem.
I tripped on this while iterating on a qstats patch for mlx5. See below
for confirmation without my added code that this is a bug.
Before this patch (with added debugging output):
$ NETIF=eth0 tools/testing/selftests/drivers/net/stats.py
KTAP version 1
1..4
ok 1 stats.check_pause
ok 2 stats.check_fec
rstat: 481708634 qstat: 666201639514 key: tx-bytes
not ok 3 stats.pkt_byte_sum
ok 4 stats.qstat_by_ifindex
Note the huge delta above ^^^ in the rtnl vs qstats.
After this patch:
$ NETIF=eth0 tools/testing/selftests/drivers/net/stats.py
KTAP version 1
1..4
ok 1 stats.check_pause
ok 2 stats.check_fec
ok 3 stats.pkt_byte_sum
ok 4 stats.qstat_by_ifindex
It looks like rtnl_fill_stats in net/core/rtnetlink.c will attempt to
copy the 64bit stats into a 32bit structure which is probably why this
behavior is occurring.
To show this is happening, you can get the underlying stats that the
stats.py test uses like this:
$ ./cli.py --spec ../../../Documentation/netlink/specs/rt_link.yaml \
--do getlink --json '{"ifi-index": 7}'
And examine the output (heavily snipped to show relevant fields):
'stats': {
'multicast': 3739197,
'rx-bytes': 1201525399,
'rx-packets': 56807158,
'tx-bytes': 492404458,
'tx-packets': 1200285371,
'stats64': {
'multicast': 3739197,
'rx-bytes': 35561263767,
'rx-packets': 56807158,
'tx-bytes': 666212335338,
'tx-packets': 1200285371,
The stats.py test prior to this patch was using the 'stats' structure
above, which matches the failure output on my system.
Comparing side by side, rx-bytes and tx-bytes, and getting ethtool -S
output:
rx-bytes stats: 1201525399
rx-bytes stats64: 35561263767
rx-bytes ethtool: 36203402638
tx-bytes stats: 492404458
tx-bytes stats64: 666212335338
tx-bytes ethtool: 666215360113
Note that the above was taken from a system with an mlx5 NIC, which only
exposes ndo_get_stats64.
Based on the ethtool output and qstat output, it appears that stats.py
should be updated to use the 'stats64' structure for accurate
comparisons when packet/byte counters get very large.
To confirm that this was not related to the qstats code I was iterating
on, I booted a kernel without my driver changes and re-ran the test
which shows the qstats are skipped (as they don't exist for mlx5):
NETIF=eth0 tools/testing/selftests/drivers/net/stats.py
KTAP version 1
1..4
ok 1 stats.check_pause
ok 2 stats.check_fec
ok 3 stats.pkt_byte_sum # SKIP qstats not supported by the device
ok 4 stats.qstat_by_ifindex # SKIP No ifindex supports qstats
But, fetching the stats using the CLI
$ ./cli.py --spec ../../../Documentation/netlink/specs/rt_link.yaml \
--do getlink --json '{"ifi-index": 7}'
Shows the same issue (heavily snipped for relevant fields only):
'stats': {
'multicast': 105489,
'rx-bytes': 530879526,
'rx-packets': 751415,
'tx-bytes': 2510191396,
'tx-packets': 27700323,
'stats64': {
'multicast': 105489,
'rx-bytes': 530879526,
'rx-packets': 751415,
'tx-bytes': 15395093284,
'tx-packets': 27700323,
Comparing side by side with ethtool -S on the unmodified mlx5 driver:
tx-bytes stats: 2510191396
tx-bytes stats64: 15395093284
tx-bytes ethtool: 17718435810
Fixes: f0e6c86e4bab ("testing: net-drv: add a driver test for stats reporting")
Signed-off-by: Joe Damato <jdamato(a)fastly.com>
---
tools/testing/selftests/drivers/net/stats.py | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/drivers/net/stats.py b/tools/testing/selftests/drivers/net/stats.py
index 7a7b16b180e2..820b8e0a22c6 100755
--- a/tools/testing/selftests/drivers/net/stats.py
+++ b/tools/testing/selftests/drivers/net/stats.py
@@ -69,7 +69,7 @@ def pkt_byte_sum(cfg) -> None:
return 0
for _ in range(10):
- rtstat = rtnl.getlink({"ifi-index": cfg.ifindex})['stats']
+ rtstat = rtnl.getlink({"ifi-index": cfg.ifindex})['stats64']
if stat_cmp(rtstat, qstat) < 0:
raise Exception("RTNL stats are lower, fetched later")
qstat = get_qstat(cfg)
--
2.25.1
The cbo and which-cpu hwprobe selftests leave their artifacts in the
kernel tree and end up being tracked by git. Add the binaries to the
hwprobe selftest .gitignore so this no longer happens.
Signed-off-by: Charlie Jenkins <charlie(a)rivosinc.com>
Fixes: a29e2a48afe3 ("RISC-V: selftests: Add CBO tests")
Fixes: ef7d6abb2cf5 ("RISC-V: selftests: Add which-cpus hwprobe test")
---
tools/testing/selftests/riscv/hwprobe/.gitignore | 2 ++
1 file changed, 2 insertions(+)
diff --git a/tools/testing/selftests/riscv/hwprobe/.gitignore b/tools/testing/selftests/riscv/hwprobe/.gitignore
index 8113dc3bdd03..6e384e80ea1a 100644
--- a/tools/testing/selftests/riscv/hwprobe/.gitignore
+++ b/tools/testing/selftests/riscv/hwprobe/.gitignore
@@ -1 +1,3 @@
hwprobe
+cbo
+which-cpus
---
base-commit: ed30a4a51bb196781c8058073ea720133a65596f
change-id: 20240425-gitignore_hwprobe_artifacts-fb0f2cd3509c
--
- Charlie
Hi all,
We are students from the State University of Campinas with an interest in contributing to the kernel. We are part of LKCAMP, a student group that focuses on researching and contributing to open source software. Our group has organized kernel hackathons in the past [1] that resulted in successful contributions, and we would like to continue the effort this year.
This time, we were thinking about writing KUnit tests for data structures in `lib/` (or converting existing lib test code), similarly to our previous hackathon. We are currently considering a few candidates:
- lib/kfifo.c
- lib/llist.c
- tools/testing/scatterlist
- tools/testing/radix-tree
We would like to know if these are good candidates, and also ask for suggestions of other code that could benefit from having KUnit tests.
Thanks!
Artur Alves
[1] https://lore.kernel.org/dri-devel/20211011152333.gm5jkaog6b6nbv5w@notapiano/
From: Geliang Tang <tanggeliang(a)kylinos.cn>
bpf_prog5 and bpf_prog7 are removed from progs/test_sockmap_kern.h in
commit d79a32129b21 ("bpf: Selftests, remove prints from sockmap tests"),
now there are only 9 progs in it, not 11:
SEC("sk_skb1")
int bpf_prog1(struct __sk_buff *skb)
SEC("sk_skb2")
int bpf_prog2(struct __sk_buff *skb)
SEC("sk_skb3")
int bpf_prog3(struct __sk_buff *skb)
SEC("sockops")
int bpf_sockmap(struct bpf_sock_ops *skops)
SEC("sk_msg1")
int bpf_prog4(struct sk_msg_md *msg)
SEC("sk_msg2")
int bpf_prog6(struct sk_msg_md *msg)
SEC("sk_msg3")
int bpf_prog8(struct sk_msg_md *msg)
SEC("sk_msg4")
int bpf_prog9(struct sk_msg_md *msg)
SEC("sk_msg5")
int bpf_prog10(struct sk_msg_md *msg)
This patch updates the array sizes of prog_fd[], prog_attach_type[] and
prog_type[] from 11 to 9 accordingly.
Fixes: d79a32129b21 ("bpf: Selftests, remove prints from sockmap tests")
Signed-off-by: Geliang Tang <tanggeliang(a)kylinos.cn>
---
tools/testing/selftests/bpf/test_sockmap.c | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)
diff --git a/tools/testing/selftests/bpf/test_sockmap.c b/tools/testing/selftests/bpf/test_sockmap.c
index 92752f5eeded..4499b3cfc3a6 100644
--- a/tools/testing/selftests/bpf/test_sockmap.c
+++ b/tools/testing/selftests/bpf/test_sockmap.c
@@ -63,7 +63,7 @@ int passed;
int failed;
int map_fd[9];
struct bpf_map *maps[9];
-int prog_fd[11];
+int prog_fd[9];
int txmsg_pass;
int txmsg_redir;
@@ -1793,8 +1793,6 @@ int prog_attach_type[] = {
BPF_SK_MSG_VERDICT,
BPF_SK_MSG_VERDICT,
BPF_SK_MSG_VERDICT,
- BPF_SK_MSG_VERDICT,
- BPF_SK_MSG_VERDICT,
};
int prog_type[] = {
@@ -1807,8 +1805,6 @@ int prog_type[] = {
BPF_PROG_TYPE_SK_MSG,
BPF_PROG_TYPE_SK_MSG,
BPF_PROG_TYPE_SK_MSG,
- BPF_PROG_TYPE_SK_MSG,
- BPF_PROG_TYPE_SK_MSG,
};
static int populate_progs(char *bpf_file)
--
2.43.0
The va_high_addr_switch memory selftest tests out some corner cases
related to allocation and page/hugepage faulting around the switch
boundary. Currently, the page size and hugepage size have been statically
defined. Post FEAT_LPA2, the Aarch64 Linux kernel adds support for 4k and
16k translation granules on higher addresses; we restructure the test to
support the same. In addition, we avoid invocation of the binary twice,
in the shell script, to reduce test noise.
Dev Jain (2):
selftests/mm: va_high_addr_switch: Reduce test noise
selftests/mm: va_high_addr_switch: Dynamically initialize testcases to
enable LPA2 testing
.../selftests/mm/va_high_addr_switch.c | 454 +++++++++---------
.../selftests/mm/va_high_addr_switch.sh | 4 -
2 files changed, 232 insertions(+), 226 deletions(-)
--
2.34.1
From: donsheng <dongsheng.x.zhang(a)intel.com>
If the host was booted with the "default_hugepagesz=1G" kernel command-line
parameter, running the NX hugepage test will fail with error "Invalid argument"
at the TEST_ASSERT line in kvm_util.c's __vm_mem_region_delete() function:
static void __vm_mem_region_delete(struct kvm_vm *vm,
struct userspace_mem_region *region,
bool unlink)
{
int ret;
...
ret = munmap(region->mmap_start, region->mmap_size);
TEST_ASSERT(!ret, __KVM_SYSCALL_ERROR("munmap()", ret));
...
}
NX hugepage test creates a VM with a data slot of 6M size backed with huge
pages. If the default hugetlb page size is set to 1G, calling mmap() with
MAP_HUGETLB and a length of 6M will succeed but calling its matching munmap()
will fail. Documentation/admin-guide/mm/hugetlbpage.rst specifies this behavior:
"Syscalls that operate on memory backed by hugetlb pages only have their lengths
aligned to the native page size of the processor; they will normally fail with
errno set to EINVAL or exclude hugetlb pages that extend beyond the length if
not hugepage aligned. For example, munmap(2) will fail if memory is backed by
a hugetlb page and the length is smaller than the hugepage size."
Explicitly use MAP_HUGE_2MB in conjunction with MAP_HUGETLB to fix the issue.
Signed-off-by: donsheng <dongsheng.x.zhang(a)intel.com>
Suggested-by: Zide Chen <zide.chen(a)intel.com>
---
tools/testing/selftests/kvm/x86_64/nx_huge_pages_test.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/kvm/x86_64/nx_huge_pages_test.c b/tools/testing/selftests/kvm/x86_64/nx_huge_pages_test.c
index 17bbb96fc4df..146e9033e206 100644
--- a/tools/testing/selftests/kvm/x86_64/nx_huge_pages_test.c
+++ b/tools/testing/selftests/kvm/x86_64/nx_huge_pages_test.c
@@ -129,7 +129,7 @@ void run_test(int reclaim_period_ms, bool disable_nx_huge_pages,
vcpu = vm_vcpu_add(vm, 0, guest_code);
- vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS_HUGETLB,
+ vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS_HUGETLB_2MB,
HPAGE_GPA, HPAGE_SLOT,
HPAGE_SLOT_NPAGES, 0);
--
2.43.0
From: Geliang Tang <tanggeliang(a)kylinos.cn>
This patchset uses post_socket_cb and post_connect_cb callbacks of struct
network_helper_opts to refactor do_test() in bpf_tcp_ca.c to move dctcp
test dedicated code out of do_test() into test_dctcp().
Patch 3 adds a new member in post_socket_opts and patch 4 adds a new
callback in network_helper_opts. I'm not sure if this is going too far.
v2:
- rebased on commit "selftests/bpf: Add test for the use of new args in
cong_control"
Geliang Tang (4):
selftests/bpf: Use post_socket_cb in connect_to_fd_opts
selftests/bpf: Use start_server_addr in bpf_tcp_ca
selftests/bpf: Use connect_to_fd_opts in do_test in bpf_tcp_ca
selftests/bpf: Add post_connect_cb callback
tools/testing/selftests/bpf/network_helpers.c | 13 +-
tools/testing/selftests/bpf/network_helpers.h | 8 +-
.../selftests/bpf/prog_tests/bpf_tcp_ca.c | 111 ++++++++++++------
3 files changed, 86 insertions(+), 46 deletions(-)
--
2.43.0
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
The kprobe_eventname.tc test checks if a function with .isra. can have a
kprobe attached to it. It loops through the kallsyms file for all the
functions that have the .isra. name, and checks if it exists in the
available_filter_functions file, and if it does, it uses it to attach a
kprobe to it.
The issue is that kprobes can not attach to functions that are listed more
than once in available_filter_functions. With the latest kernel, the
function that is found is: rapl_event_update.isra.0
# grep rapl_event_update.isra.0 /sys/kernel/tracing/available_filter_functions
rapl_event_update.isra.0
rapl_event_update.isra.0
It is listed twice. This causes attaching the kprobe to it to fail, which in
turn fails the test. Instead of just picking the first function that is
found in available_filter_functions, pick the first one that is listed
only once in available_filter_functions.
Cc: stable(a)vger.kernel.org
Fixes: 604e3548236de ("selftests/ftrace: Select an existing function in kprobe_eventname test")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
.../testing/selftests/ftrace/test.d/kprobe/kprobe_eventname.tc | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_eventname.tc b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_eventname.tc
index 1f6981ef7afa..ba19b81cef39 100644
--- a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_eventname.tc
+++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_eventname.tc
@@ -30,7 +30,8 @@ find_dot_func() {
fi
grep " [tT] .*\.isra\..*" /proc/kallsyms | cut -f 3 -d " " | while read f; do
- if grep -s $f available_filter_functions; then
+ cnt=`grep -s $f available_filter_functions | wc -l`;
+ if [ $cnt -eq 1 ]; then
echo $f
break
fi
--
2.43.0
Post FEAT_LPA2, Aarch64 extends the 4KB and 16KB translation granule to
large virtual addresses. Currently, the test is being skipped for said
granule sizes, because the page sizes have been statically defined; to
work around that would mean breaking the nice array of structs used for
adding testcases. Instead, don't skip the test, and encourage the user
to manually change the macros.
Signed-off-by: Dev Jain <dev.jain(a)arm.com>
---
.../testing/selftests/mm/va_high_addr_switch.c | 17 ++++++++++++++++-
1 file changed, 16 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/mm/va_high_addr_switch.c b/tools/testing/selftests/mm/va_high_addr_switch.c
index cfbc501290d3..ba862f51d395 100644
--- a/tools/testing/selftests/mm/va_high_addr_switch.c
+++ b/tools/testing/selftests/mm/va_high_addr_switch.c
@@ -292,12 +292,24 @@ static int supported_arch(void)
#elif defined(__x86_64__)
return 1;
#elif defined(__aarch64__)
- return getpagesize() == PAGE_SIZE;
+ return 1;
#else
return 0;
#endif
}
+#if defined(__aarch64__)
+void failure_message(void)
+{
+ printf("TEST MAY FAIL: Are you running on a pagesize other than 64K?\n");
+ printf("If yes, please change macros manually. Ensure to change the\n");
+ printf("address macros too if running defconfig on 16K pagesize,\n");
+ printf("since userspace VA = 47 bits post FEAT_LPA2.\n");
+}
+#else
+void failure_message(void) {}
+#endif
+
int main(int argc, char **argv)
{
int ret;
@@ -308,5 +320,8 @@ int main(int argc, char **argv)
ret = run_test(testcases, ARRAY_SIZE(testcases));
if (argc == 2 && !strcmp(argv[1], "--run-hugetlb"))
ret = run_test(hugetlb_testcases, ARRAY_SIZE(hugetlb_testcases));
+
+ if (ret)
+ failure_message();
return ret;
}
--
2.39.2
The compaction_test memory selftest introduces fragmentation in memory
and then tries to allocate as many hugepages as possible. This series
addresses some problems.
On Aarch64, if nr_hugepages == 0, the test trivially succeeds: no
division by zero exception is raised, so compaction_index becomes 0,
which is less than 3. We fix that by explicitly checking for a zero
divisor.
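A minimal sketch of such a guard follows. It is not the code from the patch: the helper name, the parameter names, and the exact pass criterion (index no greater than 3) are assumptions made for illustration.

```c
#include <stdbool.h>

/* Assumed pass threshold, following the "less than 3" remark above. */
#define COMPACTION_INDEX_LIMIT 3UL

/*
 * Hypothetical helper: report success only if hugepages were actually
 * allocated and the ratio of free memory to hugepage-backed memory
 * stays within the limit.
 */
bool compaction_succeeded(unsigned long mem_free,
			  unsigned long nr_hugepages,
			  unsigned long hugepage_size)
{
	unsigned long index;

	/* Check the divisor first; dividing by zero is undefined in C. */
	if (nr_hugepages == 0 || hugepage_size == 0)
		return false;

	index = mem_free / (nr_hugepages * hugepage_size);
	return index <= COMPACTION_INDEX_LIMIT;
}
```

With the check in place, zero allocated hugepages is reported as a failure on every architecture instead of depending on how the hardware handles the division.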
Secondly, correctly set the number of hugepages to zero before trying
to set a large number of them.
Now, consider a situation in which, at the start of the test, a non-zero
number of hugepages have been already set (while running the entire
selftests/mm suite, or manually by the admin). The test operates on 80%
of memory to avoid OOM-killer invocation, and because some memory is
already blocked by hugepages, it would increase the chance of OOM-killing.
Also, since mem_free used in check_compaction() is the value before we
set nr_hugepages to zero, the chance that the compaction_index will
be small is very high if the preset nr_hugepages was high, leading to a
bogus test success.
This series applies on top of the stable 6.9 kernel.
Changes in v2:
- Handle an unsigned long number of hugepages
- Combine the first patch (previously standalone) with this series
Link to v1:
https://lore.kernel.org/all/20240513082842.4117782-1-dev.jain@arm.com/
https://lore.kernel.org/all/20240515093633.54814-1-dev.jain@arm.com/
Dev Jain (3):
selftests/mm: compaction_test: Fix bogus test success on Aarch64
selftests/mm: compaction_test: Fix incorrect write of zero to
nr_hugepages
selftests/mm: compaction_test: Fix bogus test success and reduce
probability of OOM-killer invocation
tools/testing/selftests/mm/compaction_test.c | 85 ++++++++++++++------
1 file changed, 60 insertions(+), 25 deletions(-)
--
2.34.1
The upcoming new Idle HLT Intercept feature allows for the HLT
instruction execution by a vCPU to be intercepted by the hypervisor
only if there are no pending V_INTR and V_NMI events for the vCPU.
When the vCPU is expected to service the pending V_INTR and V_NMI
events, the Idle HLT intercept won’t trigger. The feature allows the
hypervisor to determine if the vCPU is actually idle and reduces
wasteful VMEXITs.
Presence of the Idle HLT Intercept feature is indicated via CPUID
function Fn8000_000A_EDX[30].
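As a rough illustration of the two statements above, the sketch below tests the feature bit and models the intercept condition. Both function names are hypothetical, and obtaining the EDX value from CPUID function 0x8000000A is left to the caller.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit position as stated above: Fn8000_000A_EDX[30]. */
#define IDLE_HLT_EDX_BIT 30

/* edx is the EDX value returned by CPUID function 0x8000000A. */
bool cpu_has_idle_hlt(uint32_t edx)
{
	return (edx >> IDLE_HLT_EDX_BIT) & 1u;
}

/*
 * Simplified model of the described behaviour: HLT is intercepted only
 * when neither a V_INTR nor a V_NMI event is pending for the vCPU.
 */
bool idle_hlt_intercepts(bool v_intr_pending, bool v_nmi_pending)
{
	return !v_intr_pending && !v_nmi_pending;
}
```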
Documentation for the Idle HLT intercept feature is available at [1].
[1]: AMD64 Architecture Programmer's Manual Pub. 24593, April 2024,
Vol 2, 15.9 Instruction Intercepts (Table 15-7: IDLE_HLT).
https://bugzilla.kernel.org/attachment.cgi?id=306250
Testing Done:
Added a selftest to test the Idle HLT intercept functionality.
Tested SEV and SEV-ES guest for the Idle HLT intercept functionality.
v1 -> v2
- Made changes to svm_idle_hlt_test based on the review comments from Sean.
- Added an enum-based approach to get binary stats in vcpu_get_stat(), which
doesn't use a string to get stat data, based on the comments from Sean.
- Added safe_halt() and cli() helpers based on the comments from Sean.
Manali Shukla (5):
x86/cpufeatures: Add CPUID feature bit for Idle HLT intercept
KVM: SVM: Add Idle HLT intercept support
KVM: selftests: Add safe_halt() and cli() helpers to common code
KVM: selftests: Add an interface to read the data of named vcpu stat
KVM: selftests: KVM: SVM: Add Idle HLT intercept test
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/svm.h | 1 +
arch/x86/include/uapi/asm/svm.h | 2 +
arch/x86/kvm/svm/svm.c | 15 +++-
tools/testing/selftests/kvm/Makefile | 1 +
.../testing/selftests/kvm/include/kvm_util.h | 66 ++++++++++++++
.../selftests/kvm/include/x86_64/processor.h | 18 ++++
tools/testing/selftests/kvm/lib/kvm_util.c | 32 +++++++
.../selftests/kvm/x86_64/svm_idle_hlt_test.c | 87 +++++++++++++++++++
9 files changed, 220 insertions(+), 3 deletions(-)
create mode 100644 tools/testing/selftests/kvm/x86_64/svm_idle_hlt_test.c
base-commit: 2489e6c9ebb57d6d0e98936479b5f586201379c7
--
2.34.1
Hi Linus,
Please pull this urgent kselftest fixes update for Linux 6.10-rc1.
This kselftest fixes update for Linux 6.10-rc1 consists of reverts of
the framework change that added -D_GNU_SOURCE to KHDR_INCLUDES in
Makefile, lib.mk, and kselftest_harness.h, and of the follow-on
changes to the cgroup and sgx tests, as they are causing build
failures and warnings.
These three reverts have been in next for a few days prior
to a rebase before generating the pull request.
The diff for the pull request is attached.
thanks,
-- Shuah
----------------------------------------------------------------
The following changes since commit ea5f6ad9ad9645733b72ab53a98e719b460d36a6:
Merge tag 'platform-drivers-x86-v6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 (2024-05-16 09:14:50 -0700)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest tags/linux_kselftest-next-6.10-rc1-fixes
for you to fetch changes up to a97853f25b06f71c23b2d7a59fbd40f3f42d55ac:
Revert "selftests/cgroup: Drop define _GNU_SOURCE" (2024-05-20 09:00:15 -0600)
----------------------------------------------------------------
linux_kselftest-next-6.10-rc1-fixes
This kselftest fixes update for Linux 6.10-rc1 consists of
reverts framework change to add D_GNU_SOURCE to KHDR_INCLUDES
to Makefile, lib.mk, and kselftest_harness.h and follow-on
changes to cgroup and sgx test as they are causing build
failures and warnings.
----------------------------------------------------------------
Shuah Khan (3):
Revert "selftests: Compile kselftest headers with -D_GNU_SOURCE"
Revert "selftests/sgx: Include KHDR_INCLUDES in Makefile"
Revert "selftests/cgroup: Drop define _GNU_SOURCE"
tools/testing/selftests/Makefile | 4 ++--
tools/testing/selftests/cgroup/cgroup_util.c | 3 +++
tools/testing/selftests/cgroup/test_core.c | 2 ++
tools/testing/selftests/cgroup/test_cpu.c | 2 ++
tools/testing/selftests/cgroup/test_hugetlb_memcg.c | 2 ++
tools/testing/selftests/cgroup/test_kmem.c | 2 ++
tools/testing/selftests/cgroup/test_memcontrol.c | 2 ++
tools/testing/selftests/cgroup/test_zswap.c | 2 ++
tools/testing/selftests/kselftest_harness.h | 2 +-
tools/testing/selftests/lib.mk | 2 +-
tools/testing/selftests/sgx/Makefile | 2 +-
tools/testing/selftests/sgx/sigstruct.c | 1 +
12 files changed, 21 insertions(+), 5 deletions(-)
----------------------------------------------------------------
The pcmtest driver tests use the kselftest harness, which requires that
_GNU_SOURCE is defined, but nothing causes it to be defined. Since the
KHDR_INCLUDES Makefile variable has had the required define added, let's
use that; this should provide some futureproofing.
Fixes: daef47b89efd ("selftests: Compile kselftest headers with -D_GNU_SOURCE")
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
tools/testing/selftests/alsa/Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/alsa/Makefile b/tools/testing/selftests/alsa/Makefile
index 5af9ba8a4645..c1ce39874e2b 100644
--- a/tools/testing/selftests/alsa/Makefile
+++ b/tools/testing/selftests/alsa/Makefile
@@ -1,7 +1,7 @@
# SPDX-License-Identifier: GPL-2.0
#
-CFLAGS += $(shell pkg-config --cflags alsa)
+CFLAGS += $(shell pkg-config --cflags alsa) $(KHDR_INCLUDES)
LDLIBS += $(shell pkg-config --libs alsa)
ifeq ($(LDLIBS),)
LDLIBS += -lasound
---
base-commit: 3c999d1ae3c75991902a1a7dad0cb62c2a3008b4
change-id: 20240516-kselftest-fix-gnu-source-81ddd00870a8
Best regards,
--
Mark Brown <broonie(a)kernel.org>
Hi all,
This series does a number of cleanups to resctrl_val() and
generalizes it by removing test name specific handling from the
function.
One of the changes improves MBA/MBM measurement by narrowing down the
period the resctrl FS derived memory bandwidth numbers are measured
over. My impression is that it didn't cause a noticeable difference in
the numbers because they're generally good anyway except for the small
number of outliers. To see the impact on outliers, I'd need to set up a
test to run a large number of replications and do a statistical analysis,
which I've not spent my time on. Even without the statistical analysis,
the new way to measure seems obviously better and makes sense even if I
cannot see a major improvement with the setup I'm using.
v4:
- Merged close fix into IMC READ+WRITE rework patch
- Add loop to reset imc_counters_config fds to -1 to be able to know which
need closing
- Introduce perf_close_imc_mem_bw() to close fds
- Open resctrl mem bw file (twice) beforehand to avoid opening it during
the test
- Remove MBM .mongrp setup
- Remove mongrp from CMT test
v3:
- Rename init functions to <testname>_init()
- Replace for loops with READ+WRITE statements for clarity
- Don't drop Return: entry from perf_open_imc_mem_bw() func comment
- New patch: Fix closing of IMC fds in case of error
- New patch: Make "bandwidth" consistent in comments & prints
- New patch: Simplify mem bandwidth file code
- Remove wrong comment
- Changed grp_name check to return -1 on fail (internal sanity check)
v2:
- Resolved conflicts with kselftest/next
- Spaces -> tabs correction
Ilpo Järvinen (16):
selftests/resctrl: Fix closing IMC fds on error and open-code R+W
instead of loops
selftests/resctrl: Calculate resctrl FS derived mem bw over sleep(1)
only
selftests/resctrl: Make "bandwidth" consistent in comments & prints
selftests/resctrl: Consolidate get_domain_id() into resctrl_val()
selftests/resctrl: Use correct type for pids
selftests/resctrl: Cleanup bm_pid and ppid usage & limit scope
selftests/resctrl: Rename measure_vals() to measure_mem_bw_vals() &
document
selftests/resctrl: Simplify mem bandwidth file code for MBA & MBM
tests
selftests/resctrl: Add ->measure() callback to resctrl_val_param
selftests/resctrl: Add ->init() callback into resctrl_val_param
selftests/resctrl: Simplify bandwidth report type handling
selftests/resctrl: Make some strings passed to resctrlfs functions
const
selftests/resctrl: Convert ctrlgrp & mongrp to pointers
selftests/resctrl: Remove mongrp from MBA test
selftests/resctrl: Remove mongrp from CMT test
selftests/resctrl: Remove test name comparing from
write_bm_pid_to_resctrl()
tools/testing/selftests/resctrl/cache.c | 6 +-
tools/testing/selftests/resctrl/cat_test.c | 5 +-
tools/testing/selftests/resctrl/cmt_test.c | 22 +-
tools/testing/selftests/resctrl/mba_test.c | 26 +-
tools/testing/selftests/resctrl/mbm_test.c | 26 +-
tools/testing/selftests/resctrl/resctrl.h | 49 ++-
tools/testing/selftests/resctrl/resctrl_val.c | 362 ++++++++----------
tools/testing/selftests/resctrl/resctrlfs.c | 64 ++--
8 files changed, 287 insertions(+), 273 deletions(-)
--
2.39.2
amt.sh requires smcrouted for multicast routing, so it starts
smcrouted before the forwarding tests.
smcrouted must be stopped after all tests, but it isn't.
To fix this issue, kill smcrouted in the cleanup logic.
Fixes: c08e8baea78e ("selftests: add amt interface selftest script")
Signed-off-by: Taehee Yoo <ap420073(a)gmail.com>
---
The v1 patch is here:
https://lore.kernel.org/netdev/20240508040643.229383-1-ap420073@gmail.com/
v3
- Do not change shebang.
v2
- Headline change.
- Kill smcrouted process only if amt.pid exists.
- Do not remove the return value.
- Remove timeout logic because it was already fixed by following commit
4c639b6a7b9d ("selftests: net: move amt to socat for better compatibility")
- Fix shebang.
tools/testing/selftests/net/amt.sh | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/amt.sh b/tools/testing/selftests/net/amt.sh
index 5175a42cbe8a..7e7ed6c558da 100755
--- a/tools/testing/selftests/net/amt.sh
+++ b/tools/testing/selftests/net/amt.sh
@@ -77,6 +77,7 @@ readonly LISTENER=$(mktemp -u listener-XXXXXXXX)
readonly GATEWAY=$(mktemp -u gateway-XXXXXXXX)
readonly RELAY=$(mktemp -u relay-XXXXXXXX)
readonly SOURCE=$(mktemp -u source-XXXXXXXX)
+readonly SMCROUTEDIR="$(mktemp -d)"
ERR=4
err=0
@@ -85,6 +86,11 @@ exit_cleanup()
for ns in "$@"; do
ip netns delete "${ns}" 2>/dev/null || true
done
+ if [ -f "$SMCROUTEDIR/amt.pid" ]; then
+ smcpid=$(< $SMCROUTEDIR/amt.pid)
+ kill $smcpid
+ fi
+ rm -rf $SMCROUTEDIR
exit $ERR
}
@@ -167,7 +173,7 @@ setup_iptables()
setup_mcast_routing()
{
- ip netns exec "${RELAY}" smcrouted
+ ip netns exec "${RELAY}" smcrouted -P $SMCROUTEDIR/amt.pid
ip netns exec "${RELAY}" smcroutectl a relay_src \
172.17.0.2 239.0.0.1 amtr
ip netns exec "${RELAY}" smcroutectl a relay_src \
--
2.34.1
The compaction_test memory selftest introduces fragmentation in memory
and then tries to allocate as many hugepages as possible. This series
addresses some problems.
First off, correctly set the number of hugepages to zero before trying
to set a large number of them.
Now, consider a situation in which, at the start of the test, a non-zero
number of hugepages have been already set (while running the entire
selftests/mm suite, or manually by the admin). The test operates on 80%
of memory to avoid OOM-killer invocation, and because some memory is
already blocked by hugepages, it would increase the chance of OOM-killing.
Also, since mem_free used in check_compaction() is the value before we
set nr_hugepages to zero, the chance that the compaction_index will
be small is very high if the preset nr_hugepages was high, leading to a
bogus test success.
This series applies on top of the stable 6.9 kernel.
Dev Jain (2):
selftests/mm: compaction_test: Fix incorrect write of zero to
nr_hugepages
selftests/mm: compaction_test: Fix trivial test success and reduce
probability of OOM-killer invocation
tools/testing/selftests/mm/compaction_test.c | 70 ++++++++++++++------
1 file changed, 50 insertions(+), 20 deletions(-)
--
2.30.2