From: Hui Zhu <zhuhui(a)kylinos.cn>
This series adds BPF struct_ops support to the memory controller,
enabling dynamic control over memory pressure through the
memcg_nr_pages_over_high mechanism. This allows administrators to
suppress low-priority cgroups' memory usage based on custom
policies implemented in BPF programs.
Background and Motivation
The memory controller provides memory.high as a soft limit: cgroups
that exceed it are throttled. However, the current implementation
applies the same policy to every cgroup, without considering priority
or workload characteristics.
This series introduces a BPF hook that allows reporting
additional "pages over high" for specific cgroups, effectively
increasing memory pressure and throttling for lower-priority
workloads when higher-priority cgroups need resources.
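As a rough sketch (not taken from the patches themselves), a policy program
attaching to such a hook could look like the following. The hook prototype,
the struct memcg_bpf_ops layout, and the pressure_on/extra_pages globals are
assumptions for illustration only:

// SPDX-License-Identifier: GPL-2.0
/* Hypothetical sketch: the prototype and ops layout are assumed, not taken
 * from the actual patches; requires a vmlinux.h generated from a kernel
 * carrying this series. */
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

char LICENSE[] SEC("license") = "GPL";

/* Knobs updated from user space or by a tracepoint program. */
bool pressure_on;
__u64 extra_pages;

SEC("struct_ops/memcg_nr_pages_over_high")
unsigned long BPF_PROG(over_high, struct mem_cgroup *memcg)
{
	/* A real policy would check which memcg this is (e.g. by cgroup id)
	 * and only penalize the low-priority subtree; this sketch simply
	 * reports a fixed extra amount whenever pressure is flagged. */
	return pressure_on ? extra_pages : 0;
}

SEC(".struct_ops.link")
struct memcg_bpf_ops memcg_prio_ops = {
	.memcg_nr_pages_over_high = (void *)over_high,
};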
Use Case: Priority-Based Memory Management
Consider a system running both latency-sensitive services and
batch processing workloads. When the high-priority service
experiences memory pressure (detected via page scan events),
the BPF program can artificially inflate the "over high" count
for low-priority cgroups, causing them to be throttled more
aggressively and freeing up memory for the critical workload.
Implementation
This series builds upon Roman Gushchin's BPF OOM patch series in [1].
The implementation adds:
1. A memcg_bpf_ops struct_ops type with memcg_nr_pages_over_high
hook
2. Integration into memory pressure calculation paths
3. Cgroup hierarchy management (inheritance during online/offline)
4. SRCU protection for safe concurrent access
Why Not PSI?
This implementation does not use PSI for triggering, as discussed
in [2].
Instead, the sample code monitors PGSCAN events via tracepoints,
which provides more direct feedback on memory pressure.
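Continuing the sketch above (same BPF object), an illustrative trigger of this
kind might hook a vmscan tracepoint; the tracepoint choice and the
"any scanning means pressure" policy are assumptions, not the actual sample
code from patch 3:

/* Illustrative PGSCAN-style trigger; the real sample may use a different
 * tracepoint and a smarter threshold. */
SEC("tp_btf/mm_vmscan_lru_shrink_inactive")
int BPF_PROG(on_lru_shrink, int nid, unsigned long nr_scanned)
{
	if (nr_scanned)
		pressure_on = true;
	return 0;
}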
Example Results
Testing on x86_64 QEMU (10 CPU, 4GB RAM, cache=none swap):
root@ubuntu:~# cat /proc/sys/vm/swappiness
60
root@ubuntu:~# mkdir /sys/fs/cgroup/high
root@ubuntu:~# mkdir /sys/fs/cgroup/low
root@ubuntu:~# ./memcg /sys/fs/cgroup/low /sys/fs/cgroup/high 100 1024
Successfully attached!
root@ubuntu:~# cgexec -g memory:low stress-ng --vm 4 --vm-keep --vm-bytes 80% \
--vm-method all --seed 2025 --metrics -t 60 \
& cgexec -g memory:high stress-ng --vm 4 --vm-keep --vm-bytes 80% \
--vm-method all --seed 2025 --metrics -t 60
[1] 1075
stress-ng: info: [1075] setting to a 1 min, 0 secs run per stressor
stress-ng: info: [1076] setting to a 1 min, 0 secs run per stressor
stress-ng: info: [1075] dispatching hogs: 4 vm
stress-ng: info: [1076] dispatching hogs: 4 vm
stress-ng: metrc: [1076] stressor bogo ops real time usr time sys time bogo ops/s bogo ops/s CPU used per RSS Max
stress-ng: metrc: [1076] (secs) (secs) (secs) (real time) (usr+sys time) instance (%) (KB)
stress-ng: metrc: [1076] vm 21033377 60.47 158.04 3.66 347825.55 130076.67 66.85 834836
stress-ng: info: [1076] skipped: 0
stress-ng: info: [1076] passed: 4: vm (4)
stress-ng: info: [1076] failed: 0
stress-ng: info: [1076] metrics untrustworthy: 0
stress-ng: info: [1076] successful run completed in 1 min, 0.72 secs
root@ubuntu:~# stress-ng: metrc: [1075] stressor bogo ops real time usr time sys time bogo ops/s bogo ops/s CPU used per RSS Max
stress-ng: metrc: [1075] (secs) (secs) (secs) (real time) (usr+sys time) instance (%) (KB)
stress-ng: metrc: [1075] vm 11568 65.05 0.00 0.21 177.83 56123.74 0.08 3200
stress-ng: info: [1075] skipped: 0
stress-ng: info: [1075] passed: 4: vm (4)
stress-ng: info: [1075] failed: 0
stress-ng: info: [1075] metrics untrustworthy: 0
stress-ng: info: [1075] successful run completed in 1 min, 5.06 secs
Results show the low-priority cgroup (/sys/fs/cgroup/low) was
significantly throttled:
- High-priority cgroup: 21,033,377 bogo ops at 347,825 ops/s
- Low-priority cgroup: 11,568 bogo ops at 177 ops/s
The stress-ng process in the low-priority cgroup experienced a
~99.9% slowdown in memory operations compared to the
high-priority cgroup, demonstrating effective priority
enforcement through BPF-controlled memory pressure.
Patch Overview
PATCH 1/3: Core kernel implementation
- Adds memcg_bpf_ops struct_ops support
- Implements cgroup lifecycle management
- Integrates hook into pressure calculation
PATCH 2/3: Selftest suite
- Validates attach/detach behavior
- Tests hierarchy inheritance
- Verifies throttling effectiveness
PATCH 3/3: Sample programs
- Demonstrates PGSCAN-based triggering
- Shows priority-based throttling
- Provides reference implementation
Changelog:
v2:
Following comments from Tejun Heo, rebased on Roman Gushchin's BPF
OOM patch series [1] and added hierarchical delegation support.
Following comments from Roman Gushchin and Michal Hocko, designed
concrete use-case scenarios and provided test results.
[1] https://lore.kernel.org/lkml/20251027231727.472628-1-roman.gushchin@linux.d…
[2] https://lore.kernel.org/lkml/1d9a162605a3f32ac215430131f7745488deaa34@linux…
Hui Zhu (3):
mm: memcontrol: Add BPF struct_ops for memory pressure control
selftests/bpf: Add tests for memcg_bpf_ops
samples/bpf: Add memcg priority control example
MAINTAINERS | 5 +
include/linux/memcontrol.h | 2 +
mm/bpf_memcontrol.c | 241 ++++++++++++-
mm/bpf_memcontrol.h | 73 ++++
mm/memcontrol.c | 27 +-
samples/bpf/.gitignore | 1 +
samples/bpf/Makefile | 9 +-
samples/bpf/memcg.bpf.c | 95 +++++
samples/bpf/memcg.c | 204 +++++++++++
.../selftests/bpf/prog_tests/memcg_ops.c | 340 ++++++++++++++++++
.../selftests/bpf/progs/memcg_ops_over_high.c | 95 +++++
11 files changed, 1082 insertions(+), 10 deletions(-)
create mode 100644 mm/bpf_memcontrol.h
create mode 100644 samples/bpf/memcg.bpf.c
create mode 100644 samples/bpf/memcg.c
create mode 100644 tools/testing/selftests/bpf/prog_tests/memcg_ops.c
create mode 100644 tools/testing/selftests/bpf/progs/memcg_ops_over_high.c
--
2.43.0
Hi Linus,
Please pull the kselftest fixes update for Linux 6.19-rc4.
linux_kselftest-fixes-6.19-rc4
-- Fix for build failures in tests that use an empty FIXTURE(), seen in
   Android's build environment, which uses -D_FORTIFY_SOURCE=3.
-- Fix intermittent func_traceonoff_triggers.tc failures on the Kunpeng-920
   board caused by including a transient trace file name in the checksum
   comparison.
-- Remove the available_events requirement from toplevel-enable so the test
   can also run for instances, as it isn't a valid requirement for this test.
diff is attached.
thanks,
-- Shuah
----------------------------------------------------------------
The following changes since commit 8f0b4cce4481fb22653697cced8d0d04027cb1e8:
Linux 6.19-rc1 (2025-12-14 16:05:07 +1200)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest tags/linux_kselftest-fixes-6.19-rc4
for you to fetch changes up to 19b8a76cd99bde6d299e60490f3e62b8d3df3997:
kselftest/harness: Use helper to avoid zero-size memset warning (2025-12-31 13:27:36 -0700)
----------------------------------------------------------------
linux_kselftest-fixes-6.19-rc4
-- Fix for build failures in tests that use an empty FIXTURE(), seen in
   Android's build environment, which uses -D_FORTIFY_SOURCE=3.
-- Fix intermittent func_traceonoff_triggers.tc failures on the Kunpeng-920
   board caused by including a transient trace file name in the checksum
   comparison.
-- Remove the available_events requirement from toplevel-enable so the test
   can also run for instances, as it isn't a valid requirement for this test.
----------------------------------------------------------------
Wake Liu (1):
kselftest/harness: Use helper to avoid zero-size memset warning
Yipeng Zou (1):
selftests/ftrace: traceonoff_triggers: strip off names
Zheng Yejian (1):
selftests/ftrace: Test toplevel-enable for instance
tools/testing/selftests/ftrace/test.d/event/toplevel-enable.tc | 3 ++-
.../selftests/ftrace/test.d/ftrace/func_traceonoff_triggers.tc | 5 +++--
tools/testing/selftests/kselftest_harness.h | 8 +++++++-
3 files changed, 12 insertions(+), 4 deletions(-)
----------------------------------------------------------------
This is a simple ipvtap test that exercises IP-address add/remove
handling on an ipvlan interface.
It creates a veth interface and then creates several network
namespaces, each with an ipvlan0 interface linked to the veth.
It then adds and removes addresses on the ipvlan0 interfaces from
several threads.
When finished, it checks that no duplicate addresses exist.
Signed-off-by: Dmitry Skorodumov <skorodumov.dmitry(a)huawei.com>
---
v4:
- Removed unneeded modprobe
- Use 8 threads if KSFT_MACHINE_SLOW==yes; this is needed because on a
  debug build the test may take more than 15 minutes.
- The veth is now created in its own namespace
- Added a comment about why the test adds/removes random IPs
tools/testing/selftests/net/Makefile | 1 +
tools/testing/selftests/net/config | 2 +
tools/testing/selftests/net/ipvtap_test.sh | 166 +++++++++++++++++++++
3 files changed, 169 insertions(+)
create mode 100755 tools/testing/selftests/net/ipvtap_test.sh
diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index b66ba04f19d9..45c4ea381bc3 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -48,6 +48,7 @@ TEST_PROGS := \
ipv6_flowlabel.sh \
ipv6_force_forwarding.sh \
ipv6_route_update_soft_lockup.sh \
+ ipvtap_test.sh \
l2_tos_ttl_inherit.sh \
l2tp.sh \
link_netns.py \
diff --git a/tools/testing/selftests/net/config b/tools/testing/selftests/net/config
index 1e1f253118f5..5702ab8fa5ad 100644
--- a/tools/testing/selftests/net/config
+++ b/tools/testing/selftests/net/config
@@ -48,6 +48,7 @@ CONFIG_IPV6_SEG6_LWTUNNEL=y
CONFIG_IPV6_SIT=y
CONFIG_IPV6_VTI=y
CONFIG_IPVLAN=m
+CONFIG_IPVTAP=m
CONFIG_KALLSYMS=y
CONFIG_L2TP=m
CONFIG_L2TP_ETH=m
@@ -122,6 +123,7 @@ CONFIG_TEST_BPF=m
CONFIG_TLS=m
CONFIG_TRACEPOINTS=y
CONFIG_TUN=y
+CONFIG_TAP=m
CONFIG_USER_NS=y
CONFIG_VETH=y
CONFIG_VLAN_8021Q=y
diff --git a/tools/testing/selftests/net/ipvtap_test.sh b/tools/testing/selftests/net/ipvtap_test.sh
new file mode 100755
index 000000000000..b4e18fc7ada0
--- /dev/null
+++ b/tools/testing/selftests/net/ipvtap_test.sh
@@ -0,0 +1,167 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Simple tests for ipvtap
+
+
+#
+# The testing environment looks this way:
+#
+# |------HNS-------| |------PHY-------|
+# | veth<----------------->veth |
+# |------|--|------| |----------------|
+# | |
+# | | |-----TST0-------|
+# | |------------|----ipvlan |
+# | |----------------|
+# |
+# | |-----TST1-------|
+# |---------------|----ipvlan |
+# |----------------|
+#
+
+ALL_TESTS="
+ test_ip_set
+"
+
+source lib.sh
+
+DEBUG=0
+
+VETH_HOST=vethtst.h
+VETH_PHY=vethtst.p
+
+NS_COUNT=32
+IP_ITERATIONS=1024
+[ "$KSFT_MACHINE_SLOW" = "yes" ] && NS_COUNT=8
+
+ns_run() {
+ ns=$1
+ shift
+ if [[ "$ns" == "global" ]]; then
+ "$@" >/dev/null
+ else
+ ip netns exec "$ns" "$@" >/dev/null
+ fi
+}
+
+test_ip_setup_env() {
+ setup_ns NS_PHY
+ setup_ns HST_NS
+
+ # setup simulated other-host (phy) and host itself
+ ns_run "$HST_NS" ip link add $VETH_HOST type veth peer name $VETH_PHY \
+ netns "$NS_PHY" >/dev/null
+ ns_run "$HST_NS" ip link set $VETH_HOST up
+ ns_run "$NS_PHY" ip link set $VETH_PHY up
+
+ for ((i=0; i<NS_COUNT; i++)); do
+ setup_ns ipvlan_ns_$i
+ ns="ipvlan_ns_$i"
+ if [ "$DEBUG" = "1" ]; then
+ echo "created NS ${!ns}"
+ fi
+ if ! ns_run "$HST_NS" ip link add netns ${!ns} ipvlan0 \
+ link $VETH_HOST \
+ type ipvtap mode l2 bridge; then
+ exit_error "FAIL: Failed to configure ipvlan link."
+ fi
+ done
+}
+
+test_ip_cleanup_env() {
+ ns_run "$HST_NS" ip link del $VETH_HOST
+ cleanup_all_ns
+}
+
+exit_error() {
+ echo "$1"
+ exit $ksft_fail
+}
+
+rnd() {
+ echo $(( RANDOM % 32 + 16 ))
+}
+
+test_ip_set_thread() {
+ # Here we are trying to create IP conflicts between namespaces.
+ # If we just add and remove the same IP, nothing interesting happens.
+ # But if we add a random IP and then remove a random IP,
+ # conflicts eventually start to appear.
+ ip link set ipvlan0 up
+ for ((i=0; i<IP_ITERATIONS; i++)); do
+ v=$(rnd)
+ ip a a "172.25.0.$v/24" dev ipvlan0 2>/dev/null
+ ip a a "fc00::$v/64" dev ipvlan0 2>/dev/null
+ v=$(rnd)
+ ip a d "172.25.0.$v/24" dev ipvlan0 2>/dev/null
+ ip a d "fc00::$v/64" dev ipvlan0 2>/dev/null
+ done
+}
+
+test_ip_set() {
+ RET=0
+
+ trap test_ip_cleanup_env EXIT
+
+ test_ip_setup_env
+
+ declare -A ns_pids
+ for ((i=0; i<NS_COUNT; i++)); do
+ ns="ipvlan_ns_$i"
+ ns_run ${!ns} bash -c "$0 test_ip_set_thread"&
+ ns_pids[$i]=$!
+ done
+
+ for ((i=0; i<NS_COUNT; i++)); do
+ wait "${ns_pids[$i]}"
+ done
+
+ declare -A all_ips
+ for ((i=0; i<NS_COUNT; i++)); do
+ ns="ipvlan_ns_$i"
+ ip_output=$(ip netns exec ${!ns} ip a l dev ipvlan0 | grep inet)
+ while IFS= read -r nsip_out; do
+ if [[ -z $nsip_out ]]; then
+ continue;
+ fi
+ nsip=$(awk '{print $2}' <<< "$nsip_out")
+ if [[ -v all_ips[$nsip] ]]; then
+ RET=$ksft_fail
+ log_test "conflict for $nsip"
+ return "$RET"
+ else
+ all_ips[$nsip]=$i
+ fi
+ done <<< "$ip_output"
+ done
+
+ if [ "$DEBUG" = "1" ]; then
+ for key in "${!all_ips[@]}"; do
+ echo "$key: ${all_ips[$key]}"
+ done
+ fi
+
+ trap - EXIT
+ test_ip_cleanup_env
+
+ log_test "test multithreaded ip set"
+}
+
+if [[ "$1" == "-d" ]]; then
+ DEBUG=1
+ shift
+fi
+
+if [[ "$1" == "-t" ]]; then
+ shift
+ TESTS="$*"
+fi
+
+if [[ "$1" == "test_ip_set_thread" ]]; then
+ test_ip_set_thread
+else
+ require_command ip
+
+ tests_run
+fi
--
2.43.0
struct bpf_struct_ops's cfi_stubs field is used as a readonly pointer
but has type void *. Change its type to void const * to allow it to
point to readonly global memory. Update the struct_ops implementations
to declare their cfi_stubs global variables as const.
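A minimal before/after sketch of the idea (field names other than cfi_stubs
elided; not the actual hunks):

/* Sketch only: illustrates the type change, not the real patch content. */
struct bpf_struct_ops {
	/* ... other callbacks and metadata ... */
	const void *cfi_stubs;		/* was: void *cfi_stubs */
};

/* Implementations can then place their CFI stub tables in .rodata, e.g.: */
static const struct sched_ext_ops __bpf_ops_sched_ext_ops = {
	/* CFI stub function pointers ... */
};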
Caleb Sander Mateos (5):
bpf: use const pointer for struct_ops cfi_stubs
HID: bpf: make __bpf_hid_bpf_ops const
sched_ext: make __bpf_ops_sched_ext_ops const
net: make cfi_stubs globals const
selftests/bpf: make cfi_stubs globals const
drivers/hid/bpf/hid_bpf_struct_ops.c | 2 +-
include/linux/bpf.h | 2 +-
kernel/bpf/bpf_struct_ops.c | 6 +++---
kernel/sched/ext.c | 2 +-
net/bpf/bpf_dummy_struct_ops.c | 2 +-
net/ipv4/bpf_tcp_ca.c | 2 +-
net/sched/bpf_qdisc.c | 2 +-
net/smc/smc_hs_bpf.c | 2 +-
.../testing/selftests/bpf/test_kmods/bpf_test_no_cfi.c | 2 +-
tools/testing/selftests/bpf/test_kmods/bpf_testmod.c | 10 +++++-----
10 files changed, 16 insertions(+), 16 deletions(-)
--
2.45.2
From: Fred Griffoul <fgriffo(a)amazon.co.uk>
This patch series addresses both performance and correctness issues in
nested VMX when handling guest memory.
During nested VMX operations, L0 (KVM) accesses specific L1 guest pages
to manage L2 execution. These pages fall into two categories: pages
accessed only by L0 (such as the L1 MSR bitmap page or the eVMCS page),
and pages passed to the L2 guest via vmcs02 (such as APIC access,
virtual APIC, and posted interrupt descriptor pages).
The current implementation uses kvm_vcpu_map/unmap, which causes two
issues.
First, the current approach is missing proper invalidation handling in
critical scenarios. Enlightened VMCS (eVMCS) pages can become stale when
memslots are modified, as there is no mechanism to invalidate the cached
mappings. Similarly, APIC access and virtual APIC pages can be migrated
by the host, but without proper notification through mmu_notifier
callbacks, the mappings become invalid and can lead to incorrect
behavior.
Second, for unmanaged guest memory (memory not directly mapped by the
kernel, such as memory passed with the mem= parameter or guest_memfd for
non-CoCo VMs), this workflow invokes expensive memremap/memunmap
operations on every L2 VM entry/exit cycle. This creates significant
overhead that impacts nested virtualization performance.
This series replaces kvm_host_map with gfn_to_pfn_cache in nested VMX.
The pfncache infrastructure maintains persistent mappings as long as the
page GPA does not change, eliminating the memremap/memunmap overhead on
every VM entry/exit cycle. Additionally, pfncache provides proper
invalidation handling via mmu_notifier callbacks and memslots generation
check, ensuring that mappings are correctly updated during both memslot
updates and page migration events.
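For context, the typical pfncache access pattern looks roughly like the
sketch below (signatures per current mainline; this series extends the API,
so details may differ, and access_page() merely stands in for whatever the
nested-VMX code does with the mapping):

/* Sketch: init/activate normally happen once at setup; they are shown
 * inline here only to keep the example self-contained. */
static int read_cached_guest_page(struct kvm *kvm,
				  struct gfn_to_pfn_cache *gpc, gpa_t gpa,
				  void (*access_page)(void *khva))
{
	int ret;

	kvm_gpc_init(gpc, kvm);
	ret = kvm_gpc_activate(gpc, gpa, PAGE_SIZE);
	if (ret)
		return ret;

	read_lock(&gpc->lock);
	while (!kvm_gpc_check(gpc, PAGE_SIZE)) {
		read_unlock(&gpc->lock);
		ret = kvm_gpc_refresh(gpc, PAGE_SIZE);	/* re-map after invalidation */
		if (ret)
			return ret;
		read_lock(&gpc->lock);
	}

	/* gpc->khva stays valid while the lock is held; mmu_notifier and
	 * memslot-generation checks invalidate the cache instead of leaving
	 * a stale mapping behind, unlike kvm_vcpu_map/unmap. */
	access_page(gpc->khva);
	read_unlock(&gpc->lock);
	return 0;
}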
As an example, a microbenchmark using memslot_perf_test with 8192
memslots demonstrates huge improvements in nested VMX operations with
unmanaged guest memory (this is a synthetic benchmark run on
AWS EC2 Nitro instances, and the results are not representative of
typical nested virtualization workloads):
               Before    After     Improvement
map:           26.12s    1.54s     ~17x faster
unmap:         40.00s    0.017s    ~2353x faster
unmap chunked: 10.07s    0.005s    ~2014x faster
The series is organized as follows:
Patches 1-5 handle the L1 MSR bitmap page and system pages (APIC access,
virtual APIC, and posted interrupt descriptor). Patch 1 converts the MSR
bitmap to use gfn_to_pfn_cache. Patches 2-3 restore and complete
"guest-uses-pfn" support in pfncache. Patch 4 converts the system pages
to use gfn_to_pfn_cache. Patch 5 adds a selftest for cache invalidation
and memslot updates.
Patches 6-7 add enlightened VMCS support. Patch 6 avoids accessing eVMCS
fields after they are copied into the cached vmcs12 structure. Patch 7
converts eVMCS page mapping to use gfn_to_pfn_cache.
Patches 8-10 implement persistent nested context to handle L2 vCPU
multiplexing and migration between L1 vCPUs. Patch 8 introduces the
nested context management infrastructure. Patch 9 integrates pfncache
with persistent nested context. Patch 10 adds a selftest for this L2
vCPU context switching.
v4:
- Rebase on kvm/next required additional vapic handling in patch 4
and a tiny fix in patch 5.
- Fix patch 9 to re-assign vcpu to pfncache if the nested
context has been recycled, and to clear the vcpu context in
free_nested().
v3:
- fixed warnings reported by kernel test robot in patches 7 and 8.
v2:
- Extended series to support enlightened VMCS (eVMCS).
- Added persistent nested context for improved L2 vCPU handling.
- Added additional selftests.
Suggested-by: dwmw(a)amazon.co.uk
Fred Griffoul (10):
KVM: nVMX: Implement cache for L1 MSR bitmap
KVM: pfncache: Restore guest-uses-pfn support
KVM: x86: Add nested state validation for pfncache support
KVM: nVMX: Implement cache for L1 APIC pages
KVM: selftests: Add nested VMX APIC cache invalidation test
KVM: nVMX: Cache evmcs fields to ensure consistency during VM-entry
KVM: nVMX: Replace evmcs kvm_host_map with pfncache
KVM: x86: Add nested context management
KVM: nVMX: Use nested context for pfncache persistence
KVM: selftests: Add L2 vcpu context switch test
arch/x86/include/asm/kvm_host.h | 32 ++
arch/x86/include/uapi/asm/kvm.h | 2 +
arch/x86/kvm/Makefile | 2 +-
arch/x86/kvm/nested.c | 199 +++++++
arch/x86/kvm/vmx/hyperv.c | 5 +-
arch/x86/kvm/vmx/hyperv.h | 33 +-
arch/x86/kvm/vmx/nested.c | 499 ++++++++++++++----
arch/x86/kvm/vmx/vmx.c | 8 +
arch/x86/kvm/vmx/vmx.h | 16 +-
arch/x86/kvm/x86.c | 19 +-
include/linux/kvm_host.h | 34 +-
include/linux/kvm_types.h | 1 +
tools/testing/selftests/kvm/Makefile.kvm | 2 +
.../selftests/kvm/x86/vmx_apic_update_test.c | 302 +++++++++++
.../selftests/kvm/x86/vmx_l2_switch_test.c | 416 +++++++++++++++
virt/kvm/kvm_main.c | 3 +-
virt/kvm/kvm_mm.h | 6 +-
virt/kvm/pfncache.c | 43 +-
18 files changed, 1496 insertions(+), 126 deletions(-)
create mode 100644 arch/x86/kvm/nested.c
create mode 100644 tools/testing/selftests/kvm/x86/vmx_apic_update_test.c
create mode 100644 tools/testing/selftests/kvm/x86/vmx_l2_switch_test.c
base-commit: 0499add8efd72456514c6218c062911ccc922a99
--
2.43.0
The cache parameter of getcpu() is useless nowadays for various reasons.
* It is never passed by userspace for either the vDSO or syscalls.
* It is never used by the kernel.
* It could not be made to work on the current vDSO architecture.
* The structure definition is not part of the UAPI headers.
* vdso_getcpu() is superseded by restartable sequences in any case.
Remove the struct and its header.
As a side-effect we get rid of an unwanted inclusion of the linux/
header namespace from vDSO code.
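For illustration, callers already pass NULL (or nothing at all, via glibc's
two-argument getcpu() wrapper) for the cache argument; a minimal userspace
check using only the raw syscall:

/* Userspace sketch: the third getcpu() argument is ignored by the kernel,
 * so NULL has always been the right thing to pass. */
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>

int main(void)
{
	unsigned int cpu, node;

	if (syscall(SYS_getcpu, &cpu, &node, NULL))
		return 1;
	printf("cpu=%u node=%u\n", cpu, node);
	return 0;
}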
Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
---
Changes in v3:
- Rebase on v6.19-rc1
- Fix conflict with UML vdso_getcpu() removal
- Flesh out commit message
- Link to v2: https://lore.kernel.org/r/20251013-getcpu_cache-v2-1-880fbfa3b7cc@linutroni…
Changes in v2:
- Rebase on v6.18-rc1
- Link to v1: https://lore.kernel.org/r/20250826-getcpu_cache-v1-1-8748318f6141@linutroni…
---
We could also completely remove the parameter, but I am not sure if
that is a good idea for syscalls and vDSO entrypoints.
---
arch/loongarch/vdso/vgetcpu.c | 5 ++---
arch/s390/kernel/vdso/getcpu.c | 3 +--
arch/s390/kernel/vdso/vdso.h | 4 +---
arch/x86/entry/vdso/vgetcpu.c | 5 ++---
arch/x86/include/asm/vdso/processor.h | 4 +---
include/linux/getcpu.h | 19 -------------------
include/linux/syscalls.h | 3 +--
kernel/sys.c | 4 +---
tools/testing/selftests/vDSO/vdso_test_getcpu.c | 4 +---
9 files changed, 10 insertions(+), 41 deletions(-)
diff --git a/arch/loongarch/vdso/vgetcpu.c b/arch/loongarch/vdso/vgetcpu.c
index 73af49242ecd..6f054ec898c7 100644
--- a/arch/loongarch/vdso/vgetcpu.c
+++ b/arch/loongarch/vdso/vgetcpu.c
@@ -4,7 +4,6 @@
*/
#include <asm/vdso.h>
-#include <linux/getcpu.h>
static __always_inline int read_cpu_id(void)
{
@@ -28,8 +27,8 @@ static __always_inline int read_cpu_id(void)
}
extern
-int __vdso_getcpu(unsigned int *cpu, unsigned int *node, struct getcpu_cache *unused);
-int __vdso_getcpu(unsigned int *cpu, unsigned int *node, struct getcpu_cache *unused)
+int __vdso_getcpu(unsigned int *cpu, unsigned int *node, void *unused);
+int __vdso_getcpu(unsigned int *cpu, unsigned int *node, void *unused)
{
int cpu_id;
diff --git a/arch/s390/kernel/vdso/getcpu.c b/arch/s390/kernel/vdso/getcpu.c
index 5c5d4a848b76..1e17665616c5 100644
--- a/arch/s390/kernel/vdso/getcpu.c
+++ b/arch/s390/kernel/vdso/getcpu.c
@@ -2,11 +2,10 @@
/* Copyright IBM Corp. 2020 */
#include <linux/compiler.h>
-#include <linux/getcpu.h>
#include <asm/timex.h>
#include "vdso.h"
-int __s390_vdso_getcpu(unsigned *cpu, unsigned *node, struct getcpu_cache *unused)
+int __s390_vdso_getcpu(unsigned *cpu, unsigned *node, void *unused)
{
union tod_clock clk;
diff --git a/arch/s390/kernel/vdso/vdso.h b/arch/s390/kernel/vdso/vdso.h
index 8cff033dd854..1fe52a6f5a56 100644
--- a/arch/s390/kernel/vdso/vdso.h
+++ b/arch/s390/kernel/vdso/vdso.h
@@ -4,9 +4,7 @@
#include <vdso/datapage.h>
-struct getcpu_cache;
-
-int __s390_vdso_getcpu(unsigned *cpu, unsigned *node, struct getcpu_cache *unused);
+int __s390_vdso_getcpu(unsigned *cpu, unsigned *node, void *unused);
int __s390_vdso_gettimeofday(struct __kernel_old_timeval *tv, struct timezone *tz);
int __s390_vdso_clock_gettime(clockid_t clock, struct __kernel_timespec *ts);
int __s390_vdso_clock_getres(clockid_t clock, struct __kernel_timespec *ts);
diff --git a/arch/x86/entry/vdso/vgetcpu.c b/arch/x86/entry/vdso/vgetcpu.c
index e4640306b2e3..6381b472b7c5 100644
--- a/arch/x86/entry/vdso/vgetcpu.c
+++ b/arch/x86/entry/vdso/vgetcpu.c
@@ -6,17 +6,16 @@
*/
#include <linux/kernel.h>
-#include <linux/getcpu.h>
#include <asm/segment.h>
#include <vdso/processor.h>
notrace long
-__vdso_getcpu(unsigned *cpu, unsigned *node, struct getcpu_cache *unused)
+__vdso_getcpu(unsigned *cpu, unsigned *node, void *unused)
{
vdso_read_cpunode(cpu, node);
return 0;
}
-long getcpu(unsigned *cpu, unsigned *node, struct getcpu_cache *tcache)
+long getcpu(unsigned *cpu, unsigned *node, void *tcache)
__attribute__((weak, alias("__vdso_getcpu")));
diff --git a/arch/x86/include/asm/vdso/processor.h b/arch/x86/include/asm/vdso/processor.h
index 7000aeb59aa2..93e0e24e5cb4 100644
--- a/arch/x86/include/asm/vdso/processor.h
+++ b/arch/x86/include/asm/vdso/processor.h
@@ -18,9 +18,7 @@ static __always_inline void cpu_relax(void)
native_pause();
}
-struct getcpu_cache;
-
-notrace long __vdso_getcpu(unsigned *cpu, unsigned *node, struct getcpu_cache *unused);
+notrace long __vdso_getcpu(unsigned *cpu, unsigned *node, void *unused);
#endif /* __ASSEMBLER__ */
diff --git a/include/linux/getcpu.h b/include/linux/getcpu.h
deleted file mode 100644
index c304dcdb4eac..000000000000
--- a/include/linux/getcpu.h
+++ /dev/null
@@ -1,19 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _LINUX_GETCPU_H
-#define _LINUX_GETCPU_H 1
-
-/* Cache for getcpu() to speed it up. Results might be a short time
- out of date, but will be faster.
-
- User programs should not refer to the contents of this structure.
- I repeat they should not refer to it. If they do they will break
- in future kernels.
-
- It is only a private cache for vgetcpu(). It will change in future kernels.
- The user program must store this information per thread (__thread)
- If you want 100% accurate information pass NULL instead. */
-struct getcpu_cache {
- unsigned long blob[128 / sizeof(long)];
-};
-
-#endif
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index cf84d98964b2..23704e006afd 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -59,7 +59,6 @@ struct compat_stat;
struct old_timeval32;
struct robust_list_head;
struct futex_waitv;
-struct getcpu_cache;
struct old_linux_dirent;
struct perf_event_attr;
struct file_handle;
@@ -718,7 +717,7 @@ asmlinkage long sys_getrusage(int who, struct rusage __user *ru);
asmlinkage long sys_umask(int mask);
asmlinkage long sys_prctl(int option, unsigned long arg2, unsigned long arg3,
unsigned long arg4, unsigned long arg5);
-asmlinkage long sys_getcpu(unsigned __user *cpu, unsigned __user *node, struct getcpu_cache __user *cache);
+asmlinkage long sys_getcpu(unsigned __user *cpu, unsigned __user *node, void __user *cache);
asmlinkage long sys_gettimeofday(struct __kernel_old_timeval __user *tv,
struct timezone __user *tz);
asmlinkage long sys_settimeofday(struct __kernel_old_timeval __user *tv,
diff --git a/kernel/sys.c b/kernel/sys.c
index 8b58eece4e58..f1780ab132a3 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -31,7 +31,6 @@
#include <linux/tty.h>
#include <linux/signal.h>
#include <linux/cn_proc.h>
-#include <linux/getcpu.h>
#include <linux/task_io_accounting_ops.h>
#include <linux/seccomp.h>
#include <linux/cpu.h>
@@ -2876,8 +2875,7 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
return error;
}
-SYSCALL_DEFINE3(getcpu, unsigned __user *, cpup, unsigned __user *, nodep,
- struct getcpu_cache __user *, unused)
+SYSCALL_DEFINE3(getcpu, unsigned __user *, cpup, unsigned __user *, nodep, void __user *, unused)
{
int err = 0;
int cpu = raw_smp_processor_id();
diff --git a/tools/testing/selftests/vDSO/vdso_test_getcpu.c b/tools/testing/selftests/vDSO/vdso_test_getcpu.c
index bea8ad54da11..3fe49cbdae98 100644
--- a/tools/testing/selftests/vDSO/vdso_test_getcpu.c
+++ b/tools/testing/selftests/vDSO/vdso_test_getcpu.c
@@ -16,9 +16,7 @@
#include "vdso_config.h"
#include "vdso_call.h"
-struct getcpu_cache;
-typedef long (*getcpu_t)(unsigned int *, unsigned int *,
- struct getcpu_cache *);
+typedef long (*getcpu_t)(unsigned int *, unsigned int *, void *);
int main(int argc, char **argv)
{
---
base-commit: 8f0b4cce4481fb22653697cced8d0d04027cb1e8
change-id: 20250825-getcpu_cache-3abcd2e65437
Best regards,
--
Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
The `FIXTURE(args)` macro defines an empty `struct _test_data_args`,
leading to `sizeof(struct _test_data_args)` evaluating to 0. This
caused a build error due to a compiler warning on a `memset` call
with a zero size argument.
Adding a dummy member to the struct ensures its size is non-zero,
resolving the build issue.
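A standalone illustration of the underlying issue (not code from the patch):

#include <string.h>

/* In GNU C an empty struct has size 0, so zeroing it is a zero-length
 * memset; fortified builds (-D_FORTIFY_SOURCE=3) can turn the resulting
 * warning into a build failure. */
struct empty { };			/* sizeof(struct empty) == 0 */
struct padded { char dummy; };		/* sizeof(struct padded) >= 1 */

static inline void init_fixture(struct padded *self)
{
	memset(self, 0, sizeof(*self));	/* non-zero size: no warning */
}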
Signed-off-by: Wake Liu <wakel(a)google.com>
---
tools/testing/selftests/futex/functional/futex_requeue_pi.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/futex/functional/futex_requeue_pi.c b/tools/testing/selftests/futex/functional/futex_requeue_pi.c
index f299d75848cd..000fec468835 100644
--- a/tools/testing/selftests/futex/functional/futex_requeue_pi.c
+++ b/tools/testing/selftests/futex/functional/futex_requeue_pi.c
@@ -52,6 +52,7 @@ struct thread_arg {
FIXTURE(args)
{
+ char dummy;
};
FIXTURE_SETUP(args)
--
2.52.0.rc1.455.g30608eb744-goog
'available_events' is not actually required by
'test.d/event/toplevel-enable.tc', and its existence is already tested
in 'test.d/00basic/basic4.tc'.
So the 'available_events' requirement can be dropped, and the
'instance' flag can then be added so that
'test.d/event/toplevel-enable.tc' also runs for instances.
Test results are shown below:
# ./ftracetest test.d/event/toplevel-enable.tc
=== Ftrace unit tests ===
[1] event tracing - enable/disable with top level files [PASS]
[2] (instance) event tracing - enable/disable with top level files [PASS]
# of passed: 2
# of failed: 0
# of unresolved: 0
# of untested: 0
# of unsupported: 0
# of xfailed: 0
# of undefined(test bug): 0
Signed-off-by: Zheng Yejian <zhengyejian1(a)huawei.com>
---
tools/testing/selftests/ftrace/test.d/event/toplevel-enable.tc | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/ftrace/test.d/event/toplevel-enable.tc b/tools/testing/selftests/ftrace/test.d/event/toplevel-enable.tc
index 93c10ea42a68..8b8e1aea985b 100644
--- a/tools/testing/selftests/ftrace/test.d/event/toplevel-enable.tc
+++ b/tools/testing/selftests/ftrace/test.d/event/toplevel-enable.tc
@@ -1,7 +1,8 @@
#!/bin/sh
# SPDX-License-Identifier: GPL-2.0
# description: event tracing - enable/disable with top level files
-# requires: available_events set_event events/enable
+# requires: set_event events/enable
+# flags: instance
do_reset() {
echo > set_event
--
2.25.1