This series introduces a new PMU scheme on ARM: a partitioned PMU that
reserves a subset of counters for more direct guest access, significantly
reducing overhead. More details, including performance benchmarks, can be
found in the v1 cover letter linked below.
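For readers unfamiliar with the mechanism, the hardware hook behind the
partition is MDCR_EL2.HPMN: counters below HPMN are accessible from
EL1/EL0 and can be handed to the guest, while counters at or above it
stay reserved for the host at EL2. A purely illustrative sketch of that
split (the names are hypothetical, not code from this series):

/*
 * Illustrative only: how MDCR_EL2.HPMN partitions the counter space.
 * Counters [0, hpmn) are accessible from EL1/EL0 (guest-owned here);
 * counters [hpmn, nr_counters) remain host-only at EL2.
 */
static inline bool counter_in_guest_partition(unsigned int idx,
					      unsigned int hpmn)
{
	return idx < hpmn;
}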
v4:
* Apply Mark Brown's non-UNDEF FGT control commit to the PMU FGT
controls and calculate those controls with the others in
kvm_calculate_traps()
* Introduce lazy context swapping, which only turns on for guests that
have enabled partitioning and accessed PMU registers.
* Rename pmu-part.c to pmu-direct.c because future features might
achieve direct PMU access without partitioning.
* Better explain certain commits, such as why the newly untrapped
registers are safe to leave untrapped.
* Reduce the PMU include cleanup down to only what is still necessary
and explain why.
v3:
https://lore.kernel.org/kvm/20250626200459.1153955-1-coltonlewis@google.com/
v2:
https://lore.kernel.org/kvm/20250620221326.1261128-1-coltonlewis@google.com/
v1:
https://lore.kernel.org/kvm/20250602192702.2125115-1-coltonlewis@google.com/
Colton Lewis (21):
arm64: cpufeature: Add cpucap for HPMN0
KVM: arm64: Reorganize PMU functions
perf: arm_pmuv3: Introduce method to partition the PMU
perf: arm_pmuv3: Generalize counter bitmasks
perf: arm_pmuv3: Keep out of guest counter partition
KVM: arm64: Account for partitioning in kvm_pmu_get_max_counters()
KVM: arm64: Set up FGT for Partitioned PMU
KVM: arm64: Writethrough trapped PMEVTYPER register
KVM: arm64: Use physical PMSELR for PMXEVTYPER if partitioned
KVM: arm64: Writethrough trapped PMOVS register
KVM: arm64: Write fast path PMU register handlers
KVM: arm64: Setup MDCR_EL2 to handle a partitioned PMU
KVM: arm64: Account for partitioning in PMCR_EL0 access
KVM: arm64: Context swap Partitioned PMU guest registers
KVM: arm64: Enforce PMU event filter at vcpu_load()
KVM: arm64: Extract enum debug_owner to enum vcpu_register_owner
KVM: arm64: Implement lazy PMU context swaps
perf: arm_pmuv3: Handle IRQs for Partitioned PMU guest counters
KVM: arm64: Inject recorded guest interrupts
KVM: arm64: Add ioctl to partition the PMU when supported
KVM: arm64: selftests: Add test case for partitioned PMU
Marc Zyngier (1):
KVM: arm64: Reorganize PMU includes
Mark Brown (1):
KVM: arm64: Introduce non-UNDEF FGT control
Documentation/virt/kvm/api.rst | 21 +
arch/arm/include/asm/arm_pmuv3.h | 38 +
arch/arm64/include/asm/arm_pmuv3.h | 61 +-
arch/arm64/include/asm/kvm_host.h | 34 +-
arch/arm64/include/asm/kvm_pmu.h | 123 +++
arch/arm64/include/asm/kvm_types.h | 7 +-
arch/arm64/kernel/cpufeature.c | 8 +
arch/arm64/kvm/Makefile | 2 +-
arch/arm64/kvm/arm.c | 22 +
arch/arm64/kvm/debug.c | 33 +-
arch/arm64/kvm/hyp/include/hyp/debug-sr.h | 6 +-
arch/arm64/kvm/hyp/include/hyp/switch.h | 181 ++++-
arch/arm64/kvm/pmu-direct.c | 395 ++++++++++
arch/arm64/kvm/pmu-emul.c | 674 +---------------
arch/arm64/kvm/pmu.c | 725 ++++++++++++++++++
arch/arm64/kvm/sys_regs.c | 137 +++-
arch/arm64/tools/cpucaps | 1 +
arch/arm64/tools/sysreg | 6 +-
drivers/perf/arm_pmuv3.c | 128 +++-
include/linux/perf/arm_pmu.h | 1 +
include/linux/perf/arm_pmuv3.h | 14 +-
include/uapi/linux/kvm.h | 4 +
tools/include/uapi/linux/kvm.h | 2 +
.../selftests/kvm/arm64/vpmu_counter_access.c | 62 +-
24 files changed, 1910 insertions(+), 775 deletions(-)
create mode 100644 arch/arm64/kvm/pmu-direct.c
base-commit: 79150772457f4d45e38b842d786240c36bb1f97f
--
2.50.0.727.gbf7dc18ff4-goog
This patchset uses kpageflags to get after-split folio orders for a better
split_huge_page_test result check[1]. The added gather_folio_orders() scans
a VPN range and counts the folios present at each order.
check_folio_orders() compares the result of gather_folio_orders() against
a given list of expected per-order counts.
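For context, here is a rough sketch of the kind of walk
gather_folio_orders() does, using /proc/self/pagemap to translate vaddrs
to PFNs and /proc/kpageflags to spot compound tails. This is illustrative
only (it needs root); the names and details are mine, not the selftest's:

/*
 * Illustrative sketch (not the selftest code): print the order of each
 * folio mapped in [vaddr, vaddr + len). Assumes vaddr starts at a folio
 * boundary; stepping correctly when a tail page is hit first is the
 * subtle part mentioned in the changelog below.
 */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#define PM_PFN_MASK		((1ULL << 55) - 1)	/* pagemap bits 0-54 */
#define PM_PRESENT		(1ULL << 63)
#define KPF_COMPOUND_TAIL	16

static uint64_t read_slot(int fd, uint64_t idx)
{
	uint64_t val = 0;

	pread(fd, &val, sizeof(val), idx * sizeof(val));
	return val;
}

static void dump_folio_orders(char *vaddr, size_t len)
{
	int pagemap = open("/proc/self/pagemap", O_RDONLY);
	int kflags = open("/proc/kpageflags", O_RDONLY);
	long psize = sysconf(_SC_PAGESIZE);
	size_t off = 0;

	while (off < len) {
		uint64_t pme = read_slot(pagemap, ((uintptr_t)vaddr + off) / psize);
		uint64_t pfn = pme & PM_PFN_MASK;
		size_t nr = 1;

		if (!(pme & PM_PRESENT)) {
			off += psize;
			continue;
		}
		/* count the tail pages physically following this head */
		while (off + nr * psize < len &&
		       (read_slot(kflags, pfn + nr) & (1ULL << KPF_COMPOUND_TAIL)))
			nr++;
		printf("%p: order %d\n", vaddr + off, __builtin_ctzll(nr));
		off += nr * psize;	/* step over the whole folio */
	}
	close(pagemap);
	close(kflags);
}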
This patchset also adds the new order and in-folio offset to the split
huge page debugfs interface's pr_debug() output.
Changelog
===
From V2[3]:
1. Added two missing free()s in check_folio_orders().
2. Reimplemented is_backed_by_thp() to use kpageflags to get precise
folio order information and renamed it to is_backed_by_folio() in new
Patch 3.
3. Renamed *_file to *_fd in Patch 2.
4. Indentation fixes.
5. Fixed vaddr stepping issue in gather_folio_orders() when a compound
tail page is encountered.
6. Used pmd_order in place of max_order in split_huge_page_test.c.
7. Documented gather_folio_orders().
From V1[2]:
1. Dropped the split_huge_pages_pid() for-loop step change to avoid
interfering with PTE-mapped THP handling. split_huge_page_test.c is
changed to perform the split on the [addr, addr + pagesize) range to
limit it to one folio_split() per folio.
2. Moved pr_debug changes in Patch 2 to Patch 1.
3. Moved KPF_* to vm_util.h and used PAGEMAP_PFN instead of local PFN_MASK.
4. Used pagemap_get_pfn() helper.
5. Used char *vaddr and size_t len as inputs to gather_folio_orders() and
check_folio_orders() instead of vpn and nr_pages.
6. Removed variable-length arrays and used malloc() instead.
[1] https://lore.kernel.org/linux-mm/e2f32bdb-e4a4-447c-867c-31405cbba151@redha…
[2] https://lore.kernel.org/linux-mm/20250806022045.342824-1-ziy@nvidia.com/
[3] https://lore.kernel.org/linux-mm/20250808190144.797076-1-ziy@nvidia.com/
Zi Yan (4):
mm/huge_memory: add new_order and offset to split_huge_pages*()
pr_debug.
selftests/mm: add check_folio_orders() helper.
selftests/mm: reimplement is_backed_by_thp() with more precise check
selftests/mm: check after-split folio orders in split_huge_page_test.
mm/huge_memory.c | 8 +-
.../selftests/mm/split_huge_page_test.c | 154 +++++++++++-----
tools/testing/selftests/mm/vm_util.c | 173 ++++++++++++++++++
tools/testing/selftests/mm/vm_util.h | 8 +
4 files changed, 292 insertions(+), 51 deletions(-)
--
2.47.2
With /proc/pid/maps now being read under per-vma lock protection, we can
reuse parts of that code to execute the PROCMAP_QUERY ioctl without taking
mmap_lock either. The change is designed to reduce mmap_lock contention
and prevent PROCMAP_QUERY ioctl calls from blocking address space updates.
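For context, this is roughly what a PROCMAP_QUERY caller looks like. A
minimal sketch against the struct procmap_query UAPI in <linux/fs.h>;
error handling and the optional name/build-id buffers are omitted:

/* Minimal sketch: look up the VMA covering an address in our own maps. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/fs.h>

int main(void)
{
	static char probe;
	int fd = open("/proc/self/maps", O_RDONLY);
	struct procmap_query q;

	memset(&q, 0, sizeof(q));
	q.size = sizeof(q);
	q.query_addr = (unsigned long)&probe;
	q.query_flags = 0;	/* 0: query_addr must fall inside a VMA */

	if (ioctl(fd, PROCMAP_QUERY, &q) == 0)
		printf("vma: [0x%llx, 0x%llx)\n",
		       (unsigned long long)q.vma_start,
		       (unsigned long long)q.vma_end);
	return 0;
}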
This patchset was split out of the original patchset [1] that introduced
per-vma lock usage for /proc/pid/maps reading. It contains PROCMAP_QUERY
tests, a code refactoring patch to simplify the main change, and the
actual transition to per-vma locks.
Changes since v3 [2]
- change lock_vma_range()/unlock_vma_range() parameters,
per Lorenzo Stoakes
- minimize priv->lock_ctx dereferences by storing it in a local variable,
per Lorenzo Stoakes
- rename unlock_vma to unlock_ctx_vma, per Lorenzo Stoakes
- factored out reset_lock_ctx(), per Lorenzo Stoakes
- reset lock_ctx->mmap_locked inside query_vma_teardown(),
per Lorenzo Stoakes
- add clarifying comments in query_vma_find_by_addr() and
procfs_procmap_ioctl(), per Lorenzo Stoakes
- refactored error handling code inside query_vma_find_by_addr(),
per Lorenzo Stoakes
- add Acked-by as changes were cosmetic, per SeongJae Park
[1] https://lore.kernel.org/all/20250704060727.724817-1-surenb@google.com/
[2] https://lore.kernel.org/all/20250806155905.824388-1-surenb@google.com/
Suren Baghdasaryan (3):
selftests/proc: test PROCMAP_QUERY ioctl while vma is concurrently
modified
fs/proc/task_mmu: factor out proc_maps_private fields used by
PROCMAP_QUERY
fs/proc/task_mmu: execute PROCMAP_QUERY ioctl under per-vma locks
fs/proc/internal.h | 15 +-
fs/proc/task_mmu.c | 184 ++++++++++++------
fs/proc/task_nommu.c | 14 +-
tools/testing/selftests/proc/proc-maps-race.c | 65 +++++++
4 files changed, 210 insertions(+), 68 deletions(-)
base-commit: c2144e09b922d422346a44d72b674bf61dbd84c0
--
2.50.1.703.g449372360f-goog
From: Xu Kuohai <xukuohai@huawei.com>
When the bpf ring buffer is full, new events cannot be recorded until
the consumer consumes some events to free space. This may cause critical
events to be discarded, for example in fault diagnosis, where recent
events are more important than older ones.
So add an overwrite mode for the bpf ring buffer. In this mode, a new
event overwrites the oldest event when the buffer is full.
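To make the intended semantics concrete, here is a plain userspace model
of overwrite behaviour (only an illustration of what "overwrite the
oldest event" means, not the kernel ringbuf code or the new UAPI):

#include <stdio.h>

#define RING_SLOTS 4

struct ring {
	int slots[RING_SLOTS];
	unsigned int head;	/* next slot the producer writes */
	unsigned int tail;	/* oldest unconsumed record */
};

static void ring_push_overwrite(struct ring *r, int val)
{
	if (r->head - r->tail == RING_SLOTS)
		r->tail++;		/* full: evict the oldest record */
	r->slots[r->head++ % RING_SLOTS] = val;
}

int main(void)
{
	struct ring r = { 0 };

	for (int i = 1; i <= 6; i++)
		ring_push_overwrite(&r, i);
	/* records 1 and 2 were overwritten; 3..6 remain */
	for (; r.tail != r.head; r.tail++)
		printf("%d\n", r.slots[r.tail % RING_SLOTS]);
	return 0;
}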
Xu Kuohai (4):
bpf: Add overwrite mode for bpf ring buffer
libbpf: ringbuf: Add overwrite ring buffer process
selftests/bpf: Add test for overwrite ring buffer
selftests/bpf/benchs: Add overwrite mode bench for rb-libbpf
include/uapi/linux/bpf.h | 4 +
kernel/bpf/ringbuf.c | 159 +++++++++++++++---
tools/include/uapi/linux/bpf.h | 4 +
tools/lib/bpf/ringbuf.c | 103 +++++++++++-
tools/testing/selftests/bpf/Makefile | 3 +-
.../selftests/bpf/benchs/bench_ringbufs.c | 22 ++-
.../bpf/benchs/run_bench_ringbufs.sh | 4 +
.../selftests/bpf/prog_tests/ringbuf.c | 74 ++++++++
.../bpf/progs/test_ringbuf_overwrite.c | 98 +++++++++++
9 files changed, 442 insertions(+), 29 deletions(-)
create mode 100644 tools/testing/selftests/bpf/progs/test_ringbuf_overwrite.c
--
2.43.0
Currently, the TCP listening socket lookup in inet_lhash2_lookup()
terminates as soon as it finds a reuseport socket, so the returned socket
may not be the best match.
For example, socket1 and socket2 both listen on "0.0.0.0:1234", but
socket1 is bound to "eth0". We create socket1 first, and then socket2.
Then, all connections will go to socket2, which is not expected, as
socket1 has higher priority.
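A minimal sketch of that setup (two SO_REUSEPORT listeners on the same
port, the first additionally bound to a device with SO_BINDTODEVICE,
which needs CAP_NET_RAW); error handling is omitted and the device name
is just an example:

#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <sys/socket.h>
#include <netinet/in.h>

static int make_listener(const char *dev)
{
	int one = 1;
	int fd = socket(AF_INET, SOCK_STREAM, 0);
	struct sockaddr_in addr = {
		.sin_family = AF_INET,
		.sin_port = htons(1234),
		.sin_addr.s_addr = htonl(INADDR_ANY),
	};

	setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one));
	if (dev)	/* bind this listener to a specific device */
		setsockopt(fd, SOL_SOCKET, SO_BINDTODEVICE, dev, strlen(dev));
	bind(fd, (struct sockaddr *)&addr, sizeof(addr));
	listen(fd, 16);
	return fd;
}

int main(void)
{
	int sk1 = make_listener("eth0");	/* device-bound listener */
	int sk2 = make_listener(NULL);		/* wildcard listener */

	/* connections arriving via eth0 should land on sk1, not sk2 */
	for (;;)
		pause();
}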
The 1st patch fixes this problem, and the 2nd patch adds a selftest for
it. Without the 1st patch, the selftest fails with:
$ ./tcp_reuseport.py
TAP version 13
1..1
FAIL: wrong assignment
not ok 1 tcp_reuseport.test_reuseport_select
Totals: pass:0 fail:1 xfail:0 xpass:0 skip:0 error:0
With the 1st patch, it succeeds:
$ ./tcp_reuseport.py
TAP version 13
1..1
SUCCESS: assigned properly: (<socket.socket fd=6, family=2, type=1, proto=0, laddr=('127.0.0.1', 33787), raddr=('127.0.0.1', 43140)>, ('127.0.0.1', 43140))
SUCCESS: assigned properly: (<socket.socket fd=5, family=2, type=1, proto=0, laddr=('127.0.0.1', 33787), raddr=('127.0.0.1', 43146)>, ('127.0.0.1', 43146))
SUCCESS: assigned properly: (<socket.socket fd=6, family=2, type=1, proto=0, laddr=('127.0.0.1', 33787), raddr=('127.0.0.1', 43162)>, ('127.0.0.1', 43162))
ok 1 tcp_reuseport.test_reuseport_select
Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
Changes since V2:
* use the approach in V1
* add the Fixes tag in the 1st patch
* introduce the selftests
Menglong Dong (2):
net: tcp: lookup the best matched listen socket
selftests/net: test TCP reuseport socket selection
net/ipv4/inet_hashtables.c | 13 +++----
net/ipv6/inet6_hashtables.c | 13 +++----
tools/testing/selftests/net/Makefile | 1 +
tools/testing/selftests/net/tcp_reuseport.py | 36 ++++++++++++++++++++
4 files changed, 51 insertions(+), 12 deletions(-)
create mode 100755 tools/testing/selftests/net/tcp_reuseport.py
--
2.50.1
This is series 2a/5 of the migration to `core::ffi::CStr`[0].
This series depends on the prior series[0] and is intended to go through
the rust tree to reduce the number of release cycles required to
complete the work.
Subsystem maintainers: I would appreciate your `Acked-by`s so that this
can be taken through Miguel's tree (where the other series must go).
[0] https://lore.kernel.org/all/20250704-core-cstr-prepare-v1-0-a91524037783@gm…
Signed-off-by: Tamir Duberstein <tamird@gmail.com>
---
Changes in v3:
- Add a patch to address new code in device.rs.
- Drop incorrectly applied Acked-by tags from Danilo.
- Link to v2: https://lore.kernel.org/r/20250719-core-cstr-fanout-1-v2-0-1ab5ba189c6e@gma…
Changes in v2:
- Rebase on rust-next.
- Drop pin-init patch, which is no longer needed.
- Link to v1: https://lore.kernel.org/r/20250709-core-cstr-fanout-1-v1-0-64308e7203fc@gma…
---
Tamir Duberstein (9):
gpu: nova-core: use `kernel::{fmt,prelude::fmt!}`
rust: alloc: use `kernel::{fmt,prelude::fmt!}`
rust: block: use `kernel::{fmt,prelude::fmt!}`
rust: device: use `kernel::{fmt,prelude::fmt!}`
rust: file: use `kernel::{fmt,prelude::fmt!}`
rust: kunit: use `kernel::{fmt,prelude::fmt!}`
rust: seq_file: use `kernel::{fmt,prelude::fmt!}`
rust: sync: use `kernel::{fmt,prelude::fmt!}`
rust: device: use `kernel::{fmt,prelude::fmt!}`
drivers/block/rnull.rs | 2 +-
drivers/gpu/nova-core/gpu.rs | 3 +--
drivers/gpu/nova-core/regs/macros.rs | 6 +++---
rust/kernel/alloc/kbox.rs | 2 +-
rust/kernel/alloc/kvec.rs | 2 +-
rust/kernel/alloc/kvec/errors.rs | 2 +-
rust/kernel/block/mq.rs | 2 +-
rust/kernel/block/mq/gen_disk.rs | 2 +-
rust/kernel/block/mq/raw_writer.rs | 3 +--
rust/kernel/device.rs | 6 +++---
rust/kernel/device/property.rs | 23 ++++++++++++-----------
rust/kernel/fs/file.rs | 5 +++--
rust/kernel/kunit.rs | 8 ++++----
rust/kernel/seq_file.rs | 6 +++---
rust/kernel/sync/arc.rs | 2 +-
scripts/rustdoc_test_gen.rs | 2 +-
16 files changed, 38 insertions(+), 38 deletions(-)
---
base-commit: 8f5ae30d69d7543eee0d70083daf4de8fe15d585
change-id: 20250709-core-cstr-fanout-1-f20611832272
Best regards,
--
Tamir Duberstein <tamird@gmail.com>