KVM_GET_REG_LIST will dump all register IDs that are available to
KVM_GET/SET_ONE_REG and It's very useful to identify some platform
regression issue during VM migration.
Patch 1-7 re-structured the get-reg-list test in aarch64 to make some
of the code as common test framework that can be shared by riscv.
Patch 8 move reject_set check logic to a function so as to check for
different errno for different registers.
Patch 9 move finalize_vcpu back to run_test so that riscv can implement
its specific operation.
Patch 10 change to do the get/set operation only on present-blessed list.
Patch 11 add the skip_set facilities so that riscv can skip set operation
on some registers.
Patch 12 enabled the KVM_GET_REG_LIST API in riscv.
patch 13 added the corresponding kselftest for checking possible
register regressions.
The get-reg-list kvm selftest was ported from aarch64 and tested with
Linux v6.5-rc3 on a Qemu riscv64 virt machine.
---
Changed since v5:
* Rebase to v6.5-rc3
* Minor fix for Andrew's comments
Andrew Jones (7):
KVM: arm64: selftests: Replace str_with_index with strdup_printf
KVM: arm64: selftests: Drop SVE cap check in print_reg
KVM: arm64: selftests: Remove print_reg's dependency on vcpu_config
KVM: arm64: selftests: Rename vcpu_config and add to kvm_util.h
KVM: arm64: selftests: Delete core_reg_fixup
KVM: arm64: selftests: Split get-reg-list test code
KVM: arm64: selftests: Finish generalizing get-reg-list
Haibo Xu (6):
KVM: arm64: selftests: Move reject_set check logic to a function
KVM: arm64: selftests: Move finalize_vcpu back to run_test
KVM: selftests: Only do get/set tests on present blessed list
KVM: selftests: Add skip_set facility to get_reg_list test
KVM: riscv: Add KVM_GET_REG_LIST API support
KVM: riscv: selftests: Add get-reg-list test
Documentation/virt/kvm/api.rst | 2 +-
arch/riscv/kvm/vcpu.c | 375 +++++++++
tools/testing/selftests/kvm/Makefile | 13 +-
.../selftests/kvm/aarch64/get-reg-list.c | 554 ++-----------
tools/testing/selftests/kvm/get-reg-list.c | 401 +++++++++
.../selftests/kvm/include/kvm_util_base.h | 21 +
.../selftests/kvm/include/riscv/processor.h | 3 +
.../testing/selftests/kvm/include/test_util.h | 2 +
tools/testing/selftests/kvm/lib/test_util.c | 15 +
.../selftests/kvm/riscv/get-reg-list.c | 780 ++++++++++++++++++
10 files changed, 1670 insertions(+), 496 deletions(-)
create mode 100644 tools/testing/selftests/kvm/get-reg-list.c
create mode 100644 tools/testing/selftests/kvm/riscv/get-reg-list.c
--
2.34.1
Hello,
This patch series adds a new x86 arch specific BPF helper, bpf_rdtsc()
which can be used for reading the hardware time stamp counter (TSC.)
Currently the same counter is directly accessible from userspace
(using RDTSC instruction), and kernel space using various rdtsc_*()
APIs, however eBPF lacks the support.
The main usage for the TSC counter is for various profiling and timing
purposes, getting accurate cycle counter values. The counter can be
currently read from BPF programs by using the existing perf subsystem
services (bpf_perf_event_read()), however its usage is cumbersome at
best. Additionally, the perf subsystem provides relative value only
for the counter, but absolute values are desired by some use cases
like Wult [1]. The absolute value of TSC can be read with BPF programs
currently via some kprobe / bpf_core_read() magic (see [2], [3], [4] for
example), but this relies on accessing kernel internals and is not
stable API, and is pretty cumbersome. Thus, this patch proposes a new
arch x86 specific BPF helper to avoid the above issues.
-Tero
[1] https://github.com/intel/wult
[2] https://github.com/intel/wult/blob/c92237c95b898498faf41e6644983102d1fe5156…
[3] https://github.com/intel/wult/blob/c92237c95b898498faf41e6644983102d1fe5156…
[4] https://github.com/intel/wult/blob/c92237c95b898498faf41e6644983102d1fe5156…
This extension allows to use F_UNLCK on query, which currently returns
EINVAL. Instead it can be used to query the locks on a particular fd -
something that is not currently possible. The basic idea is that on
F_OFD_GETLK, F_UNLCK would "conflict" with (or query) any types of the
lock on the same fd, and ignore any locks on other fds.
Use-cases:
1. CRIU-alike scenario when you want to read the locking info from an
fd for the later reconstruction. This can now be done by setting
l_start and l_len to 0 to cover entire file range, and do F_OFD_GETLK.
In the loop you need to advance l_start past the returned lock ranges,
to eventually collect all locked ranges.
2. Implementing the lock checking/enforcing policy.
Say you want to implement an "auditor" module in your program,
that checks that the I/O is done only after the proper locking is
applied on a file region. In this case you need to know if the
particular region is locked on that fd, and if so - with what type
of the lock. If you would do that currently (without this extension)
then you can only check for the write locks, and for that you need to
probe the lock on your fd and then open the same file via another fd and
probe there. That way you can identify the write lock on a particular
fd, but such trick is non-atomic and complex. As for finding out the
read lock on a particular fd - impossible.
This extension allows to do such queries without any extra efforts.
3. Implementing the mandatory locking policy.
Suppose you want to make a policy where the write lock inhibits any
unlocked readers and writers. Currently you need to check if the
write lock is present on some other fd, and if it is not there - allow
the I/O operation. But because the write lock can appear at any moment,
you need to do that under some global lock, which can be released only
when the I/O operation is finished.
With the proposed extension you can instead just check the write lock
on your own fd first, and if it is there - allow the I/O operation on
that fd without using any global lock. Only if there is no write lock
on this fd, then you need to take global lock and check for a write
lock on other fds.
The second patch adds a test-case for OFD locks.
It tests both the generic things and the proposed extension.
The third patch is a proposed man page update for fcntl(2)
(not for the linux source tree)
Changes in v2:
- Dropped the l_pid extension patch and updated test-case accordingly.
Stas Sergeev (2):
fs/locks: F_UNLCK extension for F_OFD_GETLK
selftests: add OFD lock tests
fs/locks.c | 23 +++-
tools/testing/selftests/locking/Makefile | 2 +
tools/testing/selftests/locking/ofdlocks.c | 132 +++++++++++++++++++++
3 files changed, 154 insertions(+), 3 deletions(-)
create mode 100644 tools/testing/selftests/locking/ofdlocks.c
CC: Jeff Layton <jlayton(a)kernel.org>
CC: Chuck Lever <chuck.lever(a)oracle.com>
CC: Alexander Viro <viro(a)zeniv.linux.org.uk>
CC: Christian Brauner <brauner(a)kernel.org>
CC: linux-fsdevel(a)vger.kernel.org
CC: linux-kernel(a)vger.kernel.org
CC: Shuah Khan <shuah(a)kernel.org>
CC: linux-kselftest(a)vger.kernel.org
CC: linux-api(a)vger.kernel.org
--
2.39.2
A missing break in kms_tests leads to kselftest hang when the
parameter -s is used.
In current code flow because of missing break in -s, -t parses
args spilled from -s and as -t accepts only valid values as 0,1
so any arg in -s >1 or <0, gets in ksm_test failure
This went undetected since, before the addition of option -t,
the next case -M would immediately break out of the switch
statement but that is no longer the case
Add the missing break statement.
----Before----
./ksm_tests -H -s 100
Invalid merge type
----After----
./ksm_tests -H -s 100
Number of normal pages: 0
Number of huge pages: 50
Total size: 100 MiB
Total time: 0.401732682 s
Average speed: 248.922 MiB/s
Fixes: 07115fcc15b4 ("selftests/mm: add new selftests for KSM")
Signed-off-by: Ayush Jain <ayush.jain3(a)amd.com>
Reviewed-by: David Hildenbrand <david(a)redhat.com>
---
v1 -> v2
collect Reviewed-by from David
Updated Fixes tag from commit 9e7cb94ca218 to 07115fcc15b4
tools/testing/selftests/mm/ksm_tests.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/mm/ksm_tests.c b/tools/testing/selftests/mm/ksm_tests.c
index 435acebdc325..380b691d3eb9 100644
--- a/tools/testing/selftests/mm/ksm_tests.c
+++ b/tools/testing/selftests/mm/ksm_tests.c
@@ -831,6 +831,7 @@ int main(int argc, char *argv[])
printf("Size must be greater than 0\n");
return KSFT_FAIL;
}
+ break;
case 't':
{
int tmp = atoi(optarg);
--
2.34.1
We want to replace iptables TPROXY with a BPF program at TC ingress.
To make this work in all cases we need to assign a SO_REUSEPORT socket
to an skb, which is currently prohibited. This series adds support for
such sockets to bpf_sk_assing.
I did some refactoring to cut down on the amount of duplicate code. The
key to this is to use INDIRECT_CALL in the reuseport helpers. To show
that this approach is not just beneficial to TC sk_assign I removed
duplicate code for bpf_sk_lookup as well.
Joint work with Daniel Borkmann.
Signed-off-by: Lorenz Bauer <lmb(a)isovalent.com>
---
Changes in v6:
- Reject unhashed UDP sockets in bpf_sk_assign to avoid ref leak
- Link to v5: https://lore.kernel.org/r/20230613-so-reuseport-v5-0-f6686a0dbce0@isovalent…
Changes in v5:
- Drop reuse_sk == sk check in inet[6]_steal_stock (Kuniyuki)
- Link to v4: https://lore.kernel.org/r/20230613-so-reuseport-v4-0-4ece76708bba@isovalent…
Changes in v4:
- WARN_ON_ONCE if reuseport socket is refcounted (Kuniyuki)
- Use inet[6]_ehashfn_t to shorten function declarations (Kuniyuki)
- Shuffle documentation patch around (Kuniyuki)
- Update commit message to explain why IPv6 needs EXPORT_SYMBOL
- Link to v3: https://lore.kernel.org/r/20230613-so-reuseport-v3-0-907b4cbb7b99@isovalent…
Changes in v3:
- Fix warning re udp_ehashfn and udp6_ehashfn (Simon)
- Return higher scoring connected UDP reuseport sockets (Kuniyuki)
- Fix ipv6 module builds
- Link to v2: https://lore.kernel.org/r/20230613-so-reuseport-v2-0-b7c69a342613@isovalent…
Changes in v2:
- Correct commit abbrev length (Kuniyuki)
- Reduce duplication (Kuniyuki)
- Add checks on sk_state (Martin)
- Split exporting inet[6]_lookup_reuseport into separate patch (Eric)
---
Daniel Borkmann (1):
selftests/bpf: Test that SO_REUSEPORT can be used with sk_assign helper
Lorenz Bauer (7):
udp: re-score reuseport groups when connected sockets are present
bpf: reject unhashed sockets in bpf_sk_assign
net: export inet_lookup_reuseport and inet6_lookup_reuseport
net: remove duplicate reuseport_lookup functions
net: document inet[6]_lookup_reuseport sk_state requirements
net: remove duplicate sk_lookup helpers
bpf, net: Support SO_REUSEPORT sockets with bpf_sk_assign
include/net/inet6_hashtables.h | 81 ++++++++-
include/net/inet_hashtables.h | 74 +++++++-
include/net/sock.h | 7 +-
include/uapi/linux/bpf.h | 3 -
net/core/filter.c | 4 +-
net/ipv4/inet_hashtables.c | 68 ++++---
net/ipv4/udp.c | 88 ++++-----
net/ipv6/inet6_hashtables.c | 71 +++++---
net/ipv6/udp.c | 98 ++++------
tools/include/uapi/linux/bpf.h | 3 -
tools/testing/selftests/bpf/network_helpers.c | 3 +
.../selftests/bpf/prog_tests/assign_reuse.c | 197 +++++++++++++++++++++
.../selftests/bpf/progs/test_assign_reuse.c | 142 +++++++++++++++
13 files changed, 660 insertions(+), 179 deletions(-)
---
base-commit: 6f5a630d7c57cd79b1f526a95e757311e32d41e5
change-id: 20230613-so-reuseport-e92c526173ee
Best regards,
--
Lorenz Bauer <lmb(a)isovalent.com>
From: Zi Yan <ziy(a)nvidia.com>
Hi all,
File folio supports any order and people would like to support flexible orders
for anonymous folio[1] too. Currently, split_huge_page() only splits a huge
page to order-0 pages, but splitting to orders higher than 0 is also useful.
This patchset adds support for splitting a huge page to any lower order pages
and uses it during folio truncate operations.
The patchset is on top of mm-everything-2023-03-27-21-20.
Changelog from v1
===
1. Changed split_page_memcg() and split_page_owner() parameter to use order
2. Used folio_test_pmd_mappable() in place of the equivalent code
Details
===
* Patch 1 changes split_page_memcg() to use order instead of nr_pages
* Patch 2 changes split_page_owner() to use order instead of nr_pages
* Patch 3 and 4 add new_order parameter split_page_memcg() and
split_page_owner() and prepare for upcoming changes.
* Patch 5 adds split_huge_page_to_list_to_order() to split a huge page
to any lower order. The original split_huge_page_to_list() calls
split_huge_page_to_list_to_order() with new_order = 0.
* Patch 6 uses split_huge_page_to_list_to_order() in large pagecache folio
truncation instead of split the large folio all the way down to order-0.
* Patch 7 adds a test API to debugfs and test cases in
split_huge_page_test selftests.
Comments and/or suggestions are welcome.
[1] https://lore.kernel.org/linux-mm/Y%2FblF0GIunm+pRIC@casper.infradead.org/
Zi Yan (7):
mm/memcg: use order instead of nr in split_page_memcg()
mm/page_owner: use order instead of nr in split_page_owner()
mm: memcg: make memcg huge page split support any order split.
mm: page_owner: add support for splitting to any order in split
page_owner.
mm: thp: split huge page to any lower order pages.
mm: truncate: split huge page cache page to a non-zero order if
possible.
mm: huge_memory: enable debugfs to split huge pages to any order.
include/linux/huge_mm.h | 10 +-
include/linux/memcontrol.h | 4 +-
include/linux/page_owner.h | 10 +-
mm/huge_memory.c | 137 ++++++++---
mm/memcontrol.c | 10 +-
mm/page_alloc.c | 8 +-
mm/page_owner.c | 10 +-
mm/truncate.c | 21 +-
.../selftests/mm/split_huge_page_test.c | 225 +++++++++++++++++-
9 files changed, 366 insertions(+), 69 deletions(-)
--
2.39.2
In the Segment Routing (SR) architecture a list of instructions, called
segments, can be added to the packet headers to influence the forwarding and
processing of the packets in an SR enabled network.
Considering the Segment Routing over IPv6 data plane (SRv6) [1], the segment
identifiers (SIDs) are IPv6 addresses (128 bits) and the segment list (SID
List) is carried in the Segment Routing Header (SRH). A segment may correspond
to a "behavior" that is executed by a node when the packet is received.
The Linux kernel currently supports a large subset of the behaviors described
in [2] (e.g., End, End.X, End.T and so on).
In some SRv6 scenarios, the number of segments carried by the SID List may
increase dramatically, reducing the MTU (Maximum Transfer Unit) size and/or
limiting the processing power of legacy hardware devices (due to longer IPv6
headers).
The NEXT-C-SID mechanism [3] extends the SRv6 architecture by providing several
ways to efficiently represent the SID List.
By leveraging the NEXT-C-SID, is it possible to encode several SRv6 segments
within a single 128 bit SID address (also referenced as Compressed SID
Container). In this way, the length of the SID List can be drastically reduced.
The NEXT-C-SID mechanism is built upon the "flavors" framework defined in [2].
This framework is already supported by the Linux SRv6 subsystem and is used to
modify and/or extend a subset of existing behaviors.
In this patchset, we extend the SRv6 End.X behavior in order to support the
NEXT-C-SID mechanism.
In details, the patchset is made of:
- patch 1/2: add NEXT-C-SID support for SRv6 End.X behavior;
- patch 2/2: add selftest for NEXT-C-SID in SRv6 End.X behavior.
From the user space perspective, we do not need to change the iproute2 code to
support the NEXT-C-SID flavor for the SRv6 End.X behavior. However, we will
update the man page considering the NEXT-C-SID flavor applied to the SRv6 End.X
behavior in a separate patch.
Comments, improvements and suggestions are always appreciated.
Thank you all,
Andrea
[1] - https://datatracker.ietf.org/doc/html/rfc8754
[2] - https://datatracker.ietf.org/doc/html/rfc8986
[3] - https://datatracker.ietf.org/doc/html/draft-ietf-spring-srv6-srh-compression
Andrea Mayer (1):
seg6: add NEXT-C-SID support for SRv6 End.X behavior
Paolo Lungaroni (1):
selftests: seg6: add selftest for NEXT-C-SID flavor in SRv6 End.X
behavior
net/ipv6/seg6_local.c | 108 +-
tools/testing/selftests/net/Makefile | 1 +
.../net/srv6_end_x_next_csid_l3vpn_test.sh | 1213 +++++++++++++++++
3 files changed, 1302 insertions(+), 20 deletions(-)
create mode 100755 tools/testing/selftests/net/srv6_end_x_next_csid_l3vpn_test.sh
--
2.20.1
Make sv48 the default address space for mmap as some applications
currently depend on this assumption. Users can now select a
desired address space using a non-zero hint address to mmap. Previously,
requesting the default address space from mmap by passing zero as the hint
address would result in using the largest address space possible. Some
applications depend on empty bits in the virtual address space, like Go and
Java, so this patch provides more flexibility for application developers.
-Charlie
---
v8:
- Fix RV32 and the RV32 compat mode of RV64
- Extract out addr and base from the mmap macros
v7:
- Changing RLIMIT_STACK inside of an executing program does not trigger
arch_pick_mmap_layout(), so rewrite tests to change RLIMIT_STACK from a
script before executing tests. RLIMIT_STACK of infinity forces bottomup
mmap allocation.
- Make arch_get_mmap_base macro more readible by extracting out the rnd
calculation.
- Use MMAP_MIN_VA_BITS in TASK_UNMAPPED_BASE to support case when mmap
attempts to allocate address smaller than DEFAULT_MAP_WINDOW.
- Fix incorrect wording in documentation.
v6:
- Rebase onto the correct base
v5:
- Minor wording change in documentation
- Change some parenthesis in arch_get_mmap_ macros
- Added case for addr==0 in arch_get_mmap_ because without this, programs would
crash if RLIMIT_STACK was modified before executing the program. This was
tested using the libhugetlbfs tests.
v4:
- Split testcases/document patch into test cases, in-code documentation, and
formal documentation patches
- Modified the mmap_base macro to be more legible and better represent memory
layout
- Fixed documentation to better reflect the implmentation
- Renamed DEFAULT_VA_BITS to MMAP_VA_BITS
- Added additional test case for rlimit changes
---
Charlie Jenkins (4):
RISC-V: mm: Restrict address space for sv39,sv48,sv57
RISC-V: mm: Add tests for RISC-V mm
RISC-V: mm: Update pgtable comment documentation
RISC-V: mm: Document mmap changes
Documentation/riscv/vm-layout.rst | 22 +++++++
arch/riscv/include/asm/elf.h | 2 +-
arch/riscv/include/asm/pgtable.h | 28 ++++++--
arch/riscv/include/asm/processor.h | 52 +++++++++++++--
tools/testing/selftests/riscv/Makefile | 2 +-
tools/testing/selftests/riscv/mm/.gitignore | 2 +
tools/testing/selftests/riscv/mm/Makefile | 15 +++++
.../riscv/mm/testcases/mmap_bottomup.c | 35 ++++++++++
.../riscv/mm/testcases/mmap_default.c | 35 ++++++++++
.../selftests/riscv/mm/testcases/mmap_test.h | 64 +++++++++++++++++++
.../selftests/riscv/mm/testcases/run_mmap.sh | 12 ++++
11 files changed, 257 insertions(+), 12 deletions(-)
create mode 100644 tools/testing/selftests/riscv/mm/.gitignore
create mode 100644 tools/testing/selftests/riscv/mm/Makefile
create mode 100644 tools/testing/selftests/riscv/mm/testcases/mmap_bottomup.c
create mode 100644 tools/testing/selftests/riscv/mm/testcases/mmap_default.c
create mode 100644 tools/testing/selftests/riscv/mm/testcases/mmap_test.h
create mode 100755 tools/testing/selftests/riscv/mm/testcases/run_mmap.sh
--
2.41.0
Here is a series with some fixes and cleanups to resctrl selftests.
v5:
- Improve changelogs
- Close fd_lm only in cat_val()
- Improve unmount error handling
v4:
- Move resctrlfs (unconditional) umount after resctrl fs support check
v3:
- Don't include rewritten CAT test into this series!
- Tweak wildcard style in Makefile
- Fix many changelog typos, remove some wrong claims, and generally
improve them.
- Add fix to PARENT_EXIT() to unmount resctrl FS
- Add unmounting resctrl FS before starting any tests
- Add fix for buf leak
- Add fix for perf fd closing
- Split mount/remount/umount patches differently
- Use size_t and %zu for span
- Keep MBM print as MB, only internally use span in bytes
- Drop start_buf global from fill_buf
v2 (was sent with CAT test rewrite which is no longer included in v3):
- Rebased on top of next to solve the conflicts
- Added 2 patches related to resctrl FS mount/umount (fix + cleanup)
- Consistently use "alloc" in cache_alloc_size()
- CAT test error handling tweaked
- Remove a spurious newline change from the CAT patch
- Small improvements to changelogs
Ilpo Järvinen (19):
selftests/resctrl: Add resctrl.h into build deps
selftests/resctrl: Don't leak buffer in fill_cache()
selftests/resctrl: Unmount resctrl FS if child fails to run benchmark
selftests/resctrl: Close perf value read fd on errors
selftests/resctrl: Unmount resctrl FS before starting the first test
selftests/resctrl: Move resctrl FS mount/umount to higher level
selftests/resctrl: Refactor remount_resctrl(bool mum_resctrlfs) to
mount_resctrl()
selftests/resctrl: Remove mum_resctrlfs from struct resctrl_val_param
selftests/resctrl: Convert span to size_t
selftests/resctrl: Express span internally in bytes
selftests/resctrl: Remove duplicated preparation for span arg
selftests/resctrl: Remove "malloc_and_init_memory" param from
run_fill_buf()
selftests/resctrl: Remove unnecessary startptr global from fill_buf
selftests/resctrl: Improve parameter consistency in fill_buf
selftests/resctrl: Don't pass test name to fill_buf
selftests/resctrl: Don't use variable argument list for ->setup()
selftests/resctrl: Move CAT/CMT test global vars to function they are
used in
selftests/resctrl: Pass the real number of tests to show_cache_info()
selftests/resctrl: Remove test type checks from cat_val()
tools/testing/selftests/resctrl/Makefile | 2 +-
tools/testing/selftests/resctrl/cache.c | 66 +++++++-------
tools/testing/selftests/resctrl/cat_test.c | 28 ++----
tools/testing/selftests/resctrl/cmt_test.c | 29 ++-----
tools/testing/selftests/resctrl/fill_buf.c | 87 +++++++------------
tools/testing/selftests/resctrl/mba_test.c | 9 +-
tools/testing/selftests/resctrl/mbm_test.c | 17 ++--
tools/testing/selftests/resctrl/resctrl.h | 17 ++--
.../testing/selftests/resctrl/resctrl_tests.c | 83 ++++++++++++------
tools/testing/selftests/resctrl/resctrl_val.c | 7 +-
tools/testing/selftests/resctrl/resctrlfs.c | 64 +++++++-------
11 files changed, 178 insertions(+), 231 deletions(-)
--
2.30.2