When using GCC on x86-64 to compile an usdt prog with -O1 or higher
optimization, the compiler will generate SIB addressing mode for global
array and PC-relative addressing mode for global variable,
e.g. "1@-96(%rbp,%rax,8)" and "-1@4+t1(%rip)".
The current USDT implementation in libbpf cannot parse these two formats,
causing `bpf_program__attach_usdt()` to fail with -ENOENT
(unrecognized register).
This patch series adds support for SIB addressing mode in USDT probes.
The main changes include:
- add correct handling logic for SIB-addressed arguments in
`parse_usdt_arg`.
- force -O2 optimization for usdt.test.o to generate SIB addressing usdt
argument spec.
- change the global variable t1 to a local variable, to avoid compiler
generating PC-relative addressing mode for it.
Testing shows that the SIB probe correctly generates 8@(%rcx,%rax,8)
argument spec and passes all validation checks.
The modification history of this patch series:
Change since v1:
- refactor the code to make it more readable
- modify the commit message to explain why and how
Change since v2:
- fix the `scale` uninitialized error
Change since v3:
- force -O2 optimization for usdt.test.o to generate SIB addressing usdt
and pass all test cases.
Change since v4:
- split the patch into two parts, one for the fix and the other for the
test
Change since v5:
- Only enable optimization for x86 architecture to generate SIB addressing
usdt argument spec.
Do we need to add support for PC-relative USDT argument spec handling in
libbpf? I have some interest in this question, but currently have no
ideas. Getting offsets based on symbols requires dependency on the symbol
table. However, once the binary file is stripped, the symtab will also be
removed, which will cause this approach to fail. Does anyone have any
thoughts on this?
Jiawei Zhao (2):
libbpf: fix USDT SIB argument handling causing unrecognized register
error
selftests/bpf: Force -O2 for USDT selftests to cover SIB handling
logic
tools/lib/bpf/usdt.bpf.h | 33 +++++++++++++-
tools/lib/bpf/usdt.c | 43 ++++++++++++++++---
tools/testing/selftests/bpf/Makefile | 8 ++++
tools/testing/selftests/bpf/prog_tests/usdt.c | 18 +++++---
4 files changed, 89 insertions(+), 13 deletions(-)
--
2.43.0
Userspace generally expects APIs that return -EMSGSIZE to allow for them
to adjust their buffer size and retry the operation. However, the
fscontext log would previously clear the message even in the -EMSGSIZE
case.
Given that it is very cheap for us to check whether the buffer is too
small before we remove the message from the ring buffer, let's just do
that instead. While we're at it, refactor some fscontext_read() into a
separate helper to make the ring buffer logic a bit easier to read.
Fixes: 007ec26cdc9f ("vfs: Implement logging through fs_context")
Signed-off-by: Aleksa Sarai <cyphar(a)cyphar.com>
---
Changes in v2:
- Refactor message fetching to fetch_message_locked() which returns
ERR_PTR() in error cases. [Al Viro]
- v1: <https://lore.kernel.org/r/20250806-fscontext-log-cleanups-v1-0-880597d42a5a…>
---
Aleksa Sarai (2):
fscontext: do not consume log entries when returning -EMSGSIZE
selftests/filesystems: add basic fscontext log tests
fs/fsopen.c | 54 +++++-----
tools/testing/selftests/filesystems/.gitignore | 1 +
tools/testing/selftests/filesystems/Makefile | 2 +-
tools/testing/selftests/filesystems/fclog.c | 135 +++++++++++++++++++++++++
4 files changed, 167 insertions(+), 25 deletions(-)
---
base-commit: 66639db858112bf6b0f76677f7517643d586e575
change-id: 20250806-fscontext-log-cleanups-50f0143674ae
Best regards,
--
Aleksa Sarai <cyphar(a)cyphar.com>
Userspace generally expects APIs that return EMSGSIZE to allow for them
to adjust their buffer size and retry the operation. However, the
fscontext log would previously clear the message even in the EMSGSIZE
case.
Given that it is very cheap for us to check whether the buffer is too
small before we remove the message from the ring buffer, let's just do
that instead.
Fixes: 007ec26cdc9f ("vfs: Implement logging through fs_context")
Signed-off-by: Aleksa Sarai <cyphar(a)cyphar.com>
---
Aleksa Sarai (2):
fscontext: do not consume log entries for -EMSGSIZE case
selftests/filesystems: add basic fscontext log tests
fs/fsopen.c | 22 ++--
tools/testing/selftests/filesystems/.gitignore | 1 +
tools/testing/selftests/filesystems/Makefile | 2 +-
tools/testing/selftests/filesystems/fclog.c | 137 +++++++++++++++++++++++++
4 files changed, 153 insertions(+), 9 deletions(-)
---
base-commit: 66639db858112bf6b0f76677f7517643d586e575
change-id: 20250806-fscontext-log-cleanups-50f0143674ae
Best regards,
--
Aleksa Sarai <cyphar(a)cyphar.com>
With /proc/pid/maps now being read under per-vma lock protection we can
reuse parts of that code to execute PROCMAP_QUERY ioctl also without
taking mmap_lock. The change is designed to reduce mmap_lock contention
and prevent PROCMAP_QUERY ioctl calls from blocking address space updates.
This patchset was split out of the original patchset [1] that introduced
per-vma lock usage for /proc/pid/maps reading. It contains PROCMAP_QUERY
tests, code refactoring patch to simplify the main change and the actual
transition to per-vma lock.
Changes since v1 [2]
- Added Tested-by and Acked-by, per SeongJae Park
- Fixed NOMMU case, per Vlastimil Babka
- Renamed proc_maps_query_data to proc_maps_locking_ctx,
per Vlastimil Babka
[1] https://lore.kernel.org/all/20250704060727.724817-1-surenb@google.com/
[2] https://lore.kernel.org/all/20250731220024.702621-1-surenb@google.com/
Suren Baghdasaryan (3):
selftests/proc: test PROCMAP_QUERY ioctl while vma is concurrently
modified
fs/proc/task_mmu: factor out proc_maps_private fields used by
PROCMAP_QUERY
fs/proc/task_mmu: execute PROCMAP_QUERY ioctl under per-vma locks
fs/proc/internal.h | 15 +-
fs/proc/task_mmu.c | 149 ++++++++++++------
fs/proc/task_nommu.c | 14 +-
tools/testing/selftests/proc/proc-maps-race.c | 65 ++++++++
4 files changed, 181 insertions(+), 62 deletions(-)
base-commit: 01da54f10fddf3b01c5a3b80f6b16bbad390c302
--
2.50.1.565.gc32cd1483b-goog
With joint effort from the upstream KVM community, we come up with the
4th version of mediated vPMU for x86. We have made the following changes
on top of the previous RFC v3.
v3 -> v4
- Rebase whole patchset on 6.14-rc3 base.
- Address Peter's comments on Perf part.
- Address Sean's comments on KVM part.
* Change key word "passthrough" to "mediated" in all patches
* Change static enabling to user space dynamic enabling via KVM_CAP_PMU_CAPABILITY.
* Only support GLOBAL_CTRL save/restore with VMCS exec_ctrl, drop the MSR
save/retore list support for GLOBAL_CTRL, thus the support of mediated
vPMU is constrained to SapphireRapids and later CPUs on Intel side.
* Merge some small changes into a single patch.
- Address Sandipan's comment on invalid pmu pointer.
- Add back "eventsel_hw" and "fixed_ctr_ctrl_hw" to avoid to directly
manipulate pmc->eventsel and pmu->fixed_ctr_ctrl.
Testing (Intel side):
- Perf-based legacy vPMU (force emulation on/off)
* Kselftests pmu_counters_test, pmu_event_filter_test and
vmx_pmu_caps_test pass.
* KUT PMU tests pmu, pmu_lbr, pmu_pebs pass.
* Basic perf counting/sampling tests in 3 scenarios, guest-only,
host-only and host-guest coexistence all pass.
- Mediated vPMU (force emulation on/off)
* Kselftests pmu_counters_test, pmu_event_filter_test and
vmx_pmu_caps_test pass.
* KUT PMU tests pmu, pmu_lbr, pmu_pebs pass.
* Basic perf counting/sampling tests in 3 scenarios, guest-only,
host-only and host-guest coexistence all pass.
- Failures. All above tests passed on Intel Granite Rapids as well
except a failure on KUT/pmu_pebs.
* GP counter 0 (0xfffffffffffe): PEBS record (written seq 0)
is verified (including size, counters and cfg).
* The pebs_data_cfg (0xb500000000) doesn't match with the
effective MSR_PEBS_DATA_CFG (0x0).
* This failure has nothing to do with this mediated vPMU patch set. The
failure is caused by Granite Rapids supported timed PEBS which needs
extra support on Qemu and KUT/pmu_pebs. These extra support would be
sent in separate patches later.
Testing (AMD side):
- Kselftests pmu_counters_test, pmu_event_filter_test and
vmx_pmu_caps_test all pass
- legacy guest with KUT/pmu:
* qmeu option: -cpu host, -perfctr-core
* when set force_emulation_prefix=1, passes
* when set force_emulation_prefix=0, passes
- perfmon-v1 guest with KUT/pmu:
* qmeu option: -cpu host, -perfmon-v2
* when set force_emulation_prefix=1, passes
* when set force_emulation_prefix=0, passes
- perfmon-v2 guest with KUT/pmu:
* qmeu option: -cpu host
* when set force_emulation_prefix=1, passes
* when set force_emulation_prefix=0, passes
- perf_fuzzer (perfmon-v2):
* fails with soft lockup in guest in current version.
* culprit could be between 6.13 ~ 6.14-rc3 within KVM
* Series tested on 6.12 and 6.13 without issue.
Note: a QEMU series is needed to run mediated vPMU v4:
- https://lore.kernel.org/all/20250324123712.34096-1-dapeng1.mi@linux.intel.c…
History:
- RFC v3: https://lore.kernel.org/all/20240801045907.4010984-1-mizhang@google.com/
- RFC v2: https://lore.kernel.org/all/20240506053020.3911940-1-mizhang@google.com/
- RFC v1: https://lore.kernel.org/all/20240126085444.324918-1-xiong.y.zhang@linux.int…
Dapeng Mi (18):
KVM: x86/pmu: Introduce enable_mediated_pmu global parameter
KVM: x86/pmu: Check PMU cpuid configuration from user space
KVM: x86: Rename vmx_vmentry/vmexit_ctrl() helpers
KVM: x86/pmu: Add perf_capabilities field in struct kvm_host_values{}
KVM: x86/pmu: Move PMU_CAP_{FW_WRITES,LBR_FMT} into msr-index.h header
KVM: VMX: Add macros to wrap around
{secondary,tertiary}_exec_controls_changebit()
KVM: x86/pmu: Check if mediated vPMU can intercept rdpmc
KVM: x86/pmu/vmx: Save/load guest IA32_PERF_GLOBAL_CTRL with
vm_exit/entry_ctrl
KVM: x86/pmu: Optimize intel/amd_pmu_refresh() helpers
KVM: x86/pmu: Setup PMU MSRs' interception mode
KVM: x86/pmu: Handle PMU MSRs interception and event filtering
KVM: x86/pmu: Switch host/guest PMU context at vm-exit/vm-entry
KVM: x86/pmu: Handle emulated instruction for mediated vPMU
KVM: nVMX: Add macros to simplify nested MSR interception setting
KVM: selftests: Add mediated vPMU supported for pmu tests
KVM: Selftests: Support mediated vPMU for vmx_pmu_caps_test
KVM: Selftests: Fix pmu_counters_test error for mediated vPMU
KVM: x86/pmu: Expose enable_mediated_pmu parameter to user space
Kan Liang (8):
perf: Support get/put mediated PMU interfaces
perf: Skip pmu_ctx based on event_type
perf: Clean up perf ctx time
perf: Add a EVENT_GUEST flag
perf: Add generic exclude_guest support
perf: Add switch_guest_ctx() interface
perf/x86: Support switch_guest_ctx interface
perf/x86/intel: Support PERF_PMU_CAP_MEDIATED_VPMU
Mingwei Zhang (5):
perf/x86: Forbid PMI handler when guest own PMU
perf/x86/core: Plumb mediated PMU capability from x86_pmu to
x86_pmu_cap
KVM: x86/pmu: Exclude PMU MSRs in vmx_get_passthrough_msr_slot()
KVM: x86/pmu: introduce eventsel_hw to prepare for pmu event filtering
KVM: nVMX: Add nested virtualization support for mediated PMU
Sandipan Das (4):
perf/x86/core: Do not set bit width for unavailable counters
KVM: x86/pmu: Add AMD PMU registers to direct access list
KVM: x86/pmu/svm: Set GuestOnly bit and clear HostOnly bit when guest
write to event selectors
perf/x86/amd: Support PERF_PMU_CAP_MEDIATED_VPMU for AMD host
Xiong Zhang (3):
x86/irq: Factor out common code for installing kvm irq handler
perf: core/x86: Register a new vector for KVM GUEST PMI
KVM: x86/pmu: Register KVM_GUEST_PMI_VECTOR handler
arch/x86/events/amd/core.c | 2 +
arch/x86/events/core.c | 40 +-
arch/x86/events/intel/core.c | 5 +
arch/x86/include/asm/hardirq.h | 1 +
arch/x86/include/asm/idtentry.h | 1 +
arch/x86/include/asm/irq.h | 2 +-
arch/x86/include/asm/irq_vectors.h | 5 +-
arch/x86/include/asm/kvm-x86-pmu-ops.h | 2 +
arch/x86/include/asm/kvm_host.h | 10 +
arch/x86/include/asm/msr-index.h | 18 +-
arch/x86/include/asm/perf_event.h | 1 +
arch/x86/include/asm/vmx.h | 1 +
arch/x86/kernel/idt.c | 1 +
arch/x86/kernel/irq.c | 39 +-
arch/x86/kvm/cpuid.c | 15 +
arch/x86/kvm/pmu.c | 254 ++++++++-
arch/x86/kvm/pmu.h | 45 ++
arch/x86/kvm/svm/pmu.c | 148 ++++-
arch/x86/kvm/svm/svm.c | 26 +
arch/x86/kvm/svm/svm.h | 2 +-
arch/x86/kvm/vmx/capabilities.h | 11 +-
arch/x86/kvm/vmx/nested.c | 68 ++-
arch/x86/kvm/vmx/pmu_intel.c | 224 ++++++--
arch/x86/kvm/vmx/vmx.c | 89 +--
arch/x86/kvm/vmx/vmx.h | 11 +-
arch/x86/kvm/x86.c | 63 ++-
arch/x86/kvm/x86.h | 2 +
include/linux/perf_event.h | 47 +-
kernel/events/core.c | 519 ++++++++++++++----
.../beauty/arch/x86/include/asm/irq_vectors.h | 5 +-
.../selftests/kvm/include/kvm_test_harness.h | 13 +
.../testing/selftests/kvm/include/kvm_util.h | 3 +
.../selftests/kvm/include/x86/processor.h | 8 +
tools/testing/selftests/kvm/lib/kvm_util.c | 23 +
.../selftests/kvm/x86/pmu_counters_test.c | 24 +-
.../selftests/kvm/x86/pmu_event_filter_test.c | 8 +-
.../selftests/kvm/x86/vmx_pmu_caps_test.c | 2 +-
37 files changed, 1480 insertions(+), 258 deletions(-)
base-commit: 0ad2507d5d93f39619fc42372c347d6006b64319
--
2.49.0.395.g12beb8f557-goog
We keep seeing flakes on packetdrill on debug kernels, while
non-debug kernels are stable, not a single flake in 200 runs.
Time to give up, debug kernels appear to suffer from 10msec
latency spikes and any timing-sensitive test is bound to flake.
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
---
CC: shuah(a)kernel.org
CC: willemb(a)google.com
CC: matttbe(a)kernel.org
CC: linux-kselftest(a)vger.kernel.org
---
.../selftests/net/packetdrill/ksft_runner.sh | 19 +------------------
1 file changed, 1 insertion(+), 18 deletions(-)
diff --git a/tools/testing/selftests/net/packetdrill/ksft_runner.sh b/tools/testing/selftests/net/packetdrill/ksft_runner.sh
index c5b01e1bd4c7..a7e790af38ff 100755
--- a/tools/testing/selftests/net/packetdrill/ksft_runner.sh
+++ b/tools/testing/selftests/net/packetdrill/ksft_runner.sh
@@ -35,24 +35,7 @@ failfunc=ktap_test_fail
if [[ -n "${KSFT_MACHINE_SLOW}" ]]; then
optargs+=('--tolerance_usecs=14000')
-
- # xfail tests that are known flaky with dbg config, not fixable.
- # still run them for coverage (and expect 100% pass without dbg).
- declare -ar xfail_list=(
- "tcp_blocking_blocking-connect.pkt"
- "tcp_blocking_blocking-read.pkt"
- "tcp_eor_no-coalesce-retrans.pkt"
- "tcp_fast_recovery_prr-ss.*.pkt"
- "tcp_sack_sack-route-refresh-ip-tos.pkt"
- "tcp_slow_start_slow-start-after-win-update.pkt"
- "tcp_timestamping.*.pkt"
- "tcp_user_timeout_user-timeout-probe.pkt"
- "tcp_zerocopy_cl.*.pkt"
- "tcp_zerocopy_epoll_.*.pkt"
- "tcp_tcp_info_tcp-info-.*-limited.pkt"
- )
- readonly xfail_regex="^($(printf '%s|' "${xfail_list[@]}"))$"
- [[ "$script" =~ ${xfail_regex} ]] && failfunc=ktap_test_xfail
+ failfunc=ktap_test_xfail
fi
ktap_print_header
--
2.50.1
This change resolves non literal string format warning invoked
for proc-maps-race.c while compiling.
proc-maps-race.c:205:17: warning: format not a string literal and no format arguments [-Wformat-security]
205 | printf(text);
| ^~~~~~
proc-maps-race.c:209:17: warning: format not a string literal and no format arguments [-Wformat-security]
209 | printf(text);
| ^~~~~~
proc-maps-race.c: In function ‘print_last_lines’:
proc-maps-race.c:224:9: warning: format not a string literal and no format arguments [-Wformat-security]
224 | printf(start);
| ^~~~~~
Added string format specifier %s for the printf calls
in both print_first_lines() and print_last_lines() thus
resolving the warnings invoked.
The test executes fine after this change thus causing no
affect to the functional behavior of the test.
Fixes: aadc099c480f ("selftests/proc: add verbose mode for /proc/pid/maps tearing tests")
Signed-off-by: Sukrut Heroorkar <hsukrut3(a)gmail.com>
Acked-by: Suren Baghdasaryan <surenb(a)google.com>
---
Changes since v1:
- Added Fixes tag
- Included Acked-by Suren Baghdasaryan
https://lore.kernel.org/all/CAHCkknoxpKV80-S3jByY1xnRXd1Pr=v=D2a0ZcgnY0-Hny…
tools/testing/selftests/proc/proc-maps-race.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/proc/proc-maps-race.c b/tools/testing/selftests/proc/proc-maps-race.c
index 66773685a047..94bba4553130 100644
--- a/tools/testing/selftests/proc/proc-maps-race.c
+++ b/tools/testing/selftests/proc/proc-maps-race.c
@@ -202,11 +202,11 @@ static void print_first_lines(char *text, int nr)
int offs = end - text;
text[offs] = '\0';
- printf(text);
+ printf("%s", text);
text[offs] = '\n';
printf("\n");
} else {
- printf(text);
+ printf("%s", text);
}
}
@@ -221,7 +221,7 @@ static void print_last_lines(char *text, int nr)
nr--;
start--;
}
- printf(start);
+ printf("%s", start);
}
static void print_boundaries(const char *title, FIXTURE_DATA(proc_maps_race) *self)
--
2.43.0
This change resolves non literal string format warning invoked
for proc-maps-race.c while compiling.
proc-maps-race.c:205:17: warning: format not a string literal and no format arguments [-Wformat-security]
205 | printf(text);
| ^~~~~~
proc-maps-race.c:209:17: warning: format not a string literal and no format arguments [-Wformat-security]
209 | printf(text);
| ^~~~~~
proc-maps-race.c: In function ‘print_last_lines’:
proc-maps-race.c:224:9: warning: format not a string literal and no format arguments [-Wformat-security]
224 | printf(start);
| ^~~~~~
Added string format specifier %s for the printf calls
in both print_first_lines() and print_last_lines() thus
resolving the warnings invoked.
The test executes fine after this change thus causing no
affect to the functional behavior of the test.
Signed-off-by: Sukrut Heroorkar <hsukrut3(a)gmail.com>
---
tools/testing/selftests/proc/proc-maps-race.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/proc/proc-maps-race.c b/tools/testing/selftests/proc/proc-maps-race.c
index 66773685a047..94bba4553130 100644
--- a/tools/testing/selftests/proc/proc-maps-race.c
+++ b/tools/testing/selftests/proc/proc-maps-race.c
@@ -202,11 +202,11 @@ static void print_first_lines(char *text, int nr)
int offs = end - text;
text[offs] = '\0';
- printf(text);
+ printf("%s", text);
text[offs] = '\n';
printf("\n");
} else {
- printf(text);
+ printf("%s", text);
}
}
@@ -221,7 +221,7 @@ static void print_last_lines(char *text, int nr)
nr--;
start--;
}
- printf(start);
+ printf("%s", start);
}
static void print_boundaries(const char *title, FIXTURE_DATA(proc_maps_race) *self)
--
2.43.0
From: Chia-Yu Chang <chia-yu.chang(a)nokia-bell-labs.com>
Hello,
Please find DUALPI2 iproute2 patch v11.
For more details of DualPI2, please refer IETF RFC9332
(https://datatracker.ietf.org/doc/html/rfc9332).
Best Regards,
Chia-Yu
---
v11 (18-Jul-2025)
- Replace TCA_DUALPI2 prefix with TC_DUALPI2 prefix for enums (Jakub Kicinski <kuba(a)kernel.org>)
v10 (02-Jul-2025)
- Replace STEP_THRESH and STEP_PACKETS w/ STEP_THRESH_PKTS and STEP_THRESH_US of net-next patch (Jakub Kicinski <kuba(a)kernel.org>)
v9 (13-Jun-2025)
- Fix space issue and typos (ALOK TIWARI <alok.a.tiwari(a)oracle.com>)
- Change 'rtt_typical' to 'typical_rtt' in tc/q_dualpi2.c (ALOK TIWARI <alok.a.tiwari(a)oracle.com>)
- Add the num of enum used by DualPI2 in pkt_sched.h
v8 (09-May-2025)
- Update pkt_sched.h with the one in nex-next
- Correct a typo in the comment within pkt_sched.h (ALOK TIWARI <alok.a.tiwari(a)oracle.com>)
- Update manual content in man/man8/tc-dualpi2.8 (ALOK TIWARI <alok.a.tiwari(a)oracle.com>)
- Update tc/q_dualpi2.c to fix missing blank lines and add missing case (ALOK TIWARI <alok.a.tiwari(a)oracle.com>)
v7 (05-May-2025)
- Align pkt_sched.h with the v14 version of net-next due to spec modification in tc.yaml
- Reorganize dualpi2_print_opt() to match the order in tc.yaml
- Remove credit-queue in PRINT_JSON
v6 (26-Apr-2025)
- Update JSON file output due to spec modification in tc.yaml of net-next
v5 (25-Mar-2025)
- Use matches() to replace current strcmp() (Stephen Hemminger <stephen(a)networkplumber.org>)
- Use general parse_percent() for handling scaled percentage values (Stephen Hemminger <stephen(a)networkplumber.org>)
- Add print function for JSON of dualpi2 stats (Stephen Hemminger <stephen(a)networkplumber.org>)
v4 (16-Mar-2025)
- Add min_qlen_step to the dualpi2 attribute as the minimum queue length in number of packets in the L-queue to start step marking.
v3 (21-Feb-2025)
- Add memlimit to the dualpi2 attribute, and add memory_used, max_memory_used, and memory_limit in dualpi2 stats (Dave Taht <dave.taht(a)gmail.com>)
- Update the manual to align with the latest implementation and clarify the queue naming and default unit
- Use common "get_scaled_alpha_beta" and clean print_opt for Dualpi2
v2 (23-Oct-2024)
- Rename get_float in dualpi2 to get_float_min_max in utils.c
- Move get_float from iplink_can.c in utils.c (Stephen Hemminger <stephen(a)networkplumber.org>)
- Add print function for JSON of dualpi2 (Stephen Hemminger <stephen(a)networkplumber.org>)
---
Chia-Yu Chang (1):
tc: add dualpi2 scheduler module
bash-completion/tc | 11 +-
include/uapi/linux/pkt_sched.h | 68 +++++
include/utils.h | 2 +
ip/iplink_can.c | 14 -
lib/utils.c | 30 ++
man/man8/tc-dualpi2.8 | 249 ++++++++++++++++
tc/Makefile | 1 +
tc/q_dualpi2.c | 528 +++++++++++++++++++++++++++++++++
8 files changed, 888 insertions(+), 15 deletions(-)
create mode 100644 man/man8/tc-dualpi2.8
create mode 100644 tc/q_dualpi2.c
--
2.34.1
From: Benjamin Berg <benjamin.berg(a)intel.com>
Hi,
This patchset adds signal handling to nolibc. Initially, I would like to
use this for tests. But in the long run, the goal is to use nolibc for
the UML kernel itself. In both cases, signal handling will be needed.
With v3 everything is now included in nolibc instead of trying to use
the messy kernel headers.
Benjamin
Benjamin Berg (4):
selftests/nolibc: fix EXPECT_NZ macro
selftests/nolibc: remove outdated comment about construct order
tools/nolibc: add more generic bitmask macros for FD_*
tools/nolibc: add signal support
tools/include/nolibc/Makefile | 1 +
tools/include/nolibc/arch-s390.h | 4 +-
tools/include/nolibc/asm-signal.h | 237 +++++++++++++++++++
tools/include/nolibc/signal.h | 179 ++++++++++++++
tools/include/nolibc/sys.h | 2 +-
tools/include/nolibc/sys/wait.h | 1 +
tools/include/nolibc/time.h | 2 +-
tools/include/nolibc/types.h | 81 ++++---
tools/testing/selftests/nolibc/nolibc-test.c | 139 ++++++++++-
9 files changed, 608 insertions(+), 38 deletions(-)
create mode 100644 tools/include/nolibc/asm-signal.h
--
2.50.1
Currently all test cases are linked with thp_settings, while only 6
out of 50+ targets rely on it.
Instead of making thp_settings as a common dependency, link it only
when necessary.
Signed-off-by: Wei Yang <richard.weiyang(a)gmail.com>
Cc: Ryan Roberts <ryan.roberts(a)arm.com>
---
tools/testing/selftests/mm/Makefile | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile
index d4f19f87053b..eea4881c918a 100644
--- a/tools/testing/selftests/mm/Makefile
+++ b/tools/testing/selftests/mm/Makefile
@@ -158,14 +158,19 @@ TEST_FILES += write_hugetlb_memory.sh
include ../lib.mk
-$(TEST_GEN_PROGS): vm_util.c thp_settings.c
-$(TEST_GEN_FILES): vm_util.c thp_settings.c
+$(TEST_GEN_PROGS): vm_util.c
+$(TEST_GEN_FILES): vm_util.c
$(OUTPUT)/uffd-stress: uffd-common.c
$(OUTPUT)/uffd-unit-tests: uffd-common.c
-$(OUTPUT)/uffd-wp-mremap: uffd-common.c
+$(OUTPUT)/uffd-wp-mremap: uffd-common.c thp_settings.c
$(OUTPUT)/protection_keys: pkey_util.c
$(OUTPUT)/pkey_sighandler_tests: pkey_util.c
+$(OUTPUT)/cow: thp_settings.c
+$(OUTPUT)/migration: thp_settings.c
+$(OUTPUT)/khugepaged: thp_settings.c
+$(OUTPUT)/ksm_tests: thp_settings.c
+$(OUTPUT)/soft-dirty: thp_settings.c
ifeq ($(ARCH),x86_64)
BINARIES_32 := $(patsubst %,$(OUTPUT)/%,$(BINARIES_32))
--
2.34.1
When using GCC on x86-64 to compile an usdt prog with -O1 or higher
optimization, the compiler will generate SIB addressing mode for global
array and PC-relative addressing mode for global variable,
e.g. "1@-96(%rbp,%rax,8)" and "-1@4+t1(%rip)".
The current USDT implementation in libbpf cannot parse these two formats,
causing `bpf_program__attach_usdt()` to fail with -ENOENT
(unrecognized register).
This patch series adds support for SIB addressing mode in USDT probes.
The main changes include:
- add correct handling logic for SIB-addressed arguments in
`parse_usdt_arg`.
- force -O2 optimization for usdt.test.o to generate SIB addressing usdt
argument spec.
- change the global variable t1 to a local variable, to avoid compiler
generating PC-relative addressing mode for it.
Testing shows that the SIB probe correctly generates 8@(%rcx,%rax,8)
argument spec and passes all validation checks.
The modification history of this patch series:
Change since v1:
- refactor the code to make it more readable
- modify the commit message to explain why and how
Change since v2:
- fix the `scale` uninitialized error
Change since v3:
- force -O2 optimization for usdt.test.o to generate SIB addressing usdt
and pass all test cases.
Change since v4:
- split the patch into two parts, one for the fix and the other for the
test
Do we need to add support for PC-relative USDT argument spec handling in
libbpf? I have some interest in this question, but currently have no
ideas. Getting offsets based on symbols requires dependency on the symbol
table. However, once the binary file is stripped, the symtab will also be
removed, which will cause this approach to fail. Does anyone have any
thoughts on this?
Jiawei Zhao (2):
libbpf: fix USDT SIB argument handling causing unrecognized register
error
selftests/bpf: Force -O2 for USDT selftests to cover SIB handling
logic
tools/lib/bpf/usdt.bpf.h | 33 +++++++++++++-
tools/lib/bpf/usdt.c | 43 ++++++++++++++++---
tools/testing/selftests/bpf/Makefile | 5 +++
tools/testing/selftests/bpf/prog_tests/usdt.c | 18 +++++---
4 files changed, 86 insertions(+), 13 deletions(-)
--
2.43.0
When using GCC on x86-64 to compile an usdt prog with -O1 or higher
optimization, the compiler will generate SIB addressing mode for global
array and PC-relative addressing mode for global variable,
e.g. "1@-96(%rbp,%rax,8)" and "-1@4+t1(%rip)".
The current USDT implementation in libbpf cannot parse these two formats,
causing `bpf_program__attach_usdt()` to fail with -ENOENT
(unrecognized register).
This patch series adds support for SIB addressing mode in USDT probes.
The main changes include:
- add correct handling logic for SIB-addressed arguments in
`parse_usdt_arg`.
- force -O2 optimization for usdt.test.o to generate SIB addressing usdt
argument spec.
- change the global variable t1 to a local variable, to avoid compiler
generating PC-relative addressing mode for it.
Testing shows that the SIB probe correctly generates 8@(%rcx,%rax,8)
argument spec and passes all validation checks.
The modification history of this patch series:
Change since v1:
- refactor the code to make it more readable
- modify the commit message to explain why and how
Change since v2:
- fix the `scale` uninitialized error
Change since v3:
- force -O2 optimization for usdt.test.o to generate SIB addressing usdt
and pass all test cases.
Do we need to add support for PC-relative USDT argument spec handling in libbpf?
I have some interest in this question, but currently have no ideas. Getting offsets
based on symbols requires dependency on the symbol table. However, once the binary
file is stripped, the symtab will also be removed, which will cause this approach
to fail. Does anyone have any thoughts on this?
Jiawei Zhao (1):
libbpf: fix USDT SIB argument handling causing unrecognized register
error
tools/lib/bpf/usdt.bpf.h | 33 +++++++++++++-
tools/lib/bpf/usdt.c | 43 ++++++++++++++++---
tools/testing/selftests/bpf/Makefile | 5 +++
tools/testing/selftests/bpf/prog_tests/usdt.c | 18 +++++---
4 files changed, 86 insertions(+), 13 deletions(-)
--
2.43.0
With /proc/pid/maps now being read under per-vma lock protection we can
reuse parts of that code to execute PROCMAP_QUERY ioctl also without
taking mmap_lock. The change is designed to reduce mmap_lock contention
and prevent PROCMAP_QUERY ioctl calls from blocking address space updates.
This patchset was split out of the original patchset [1] that introduced
per-vma lock usage for /proc/pid/maps reading. It contains PROCMAP_QUERY
tests, code refactoring patch to simplify the main change and the actual
transition to per-vma lock.
[1] https://lore.kernel.org/all/20250704060727.724817-1-surenb@google.com/
Suren Baghdasaryan (3):
selftests/proc: test PROCMAP_QUERY ioctl while vma is concurrently
modified
fs/proc/task_mmu: factor out proc_maps_private fields used by
PROCMAP_QUERY
fs/proc/task_mmu: execute PROCMAP_QUERY ioctl under per-vma locks
fs/proc/internal.h | 15 +-
fs/proc/task_mmu.c | 149 ++++++++++++------
tools/testing/selftests/proc/proc-maps-race.c | 65 ++++++++
3 files changed, 174 insertions(+), 55 deletions(-)
base-commit: 01da54f10fddf3b01c5a3b80f6b16bbad390c302
--
2.50.1.565.gc32cd1483b-goog
The step_after_suspend_test verifies that the system successfully
suspended and resumed by setting a timerfd and checking whether the
timer fully expired. However, this method is unreliable due to timing
races.
In practice, the system may take time to enter suspend, during which the
timer may expire just before or during the transition. As a result,
the remaining time after resume may show non-zero nanoseconds, even if
suspend/resume completed successfully. This leads to false test failures.
Replace the timer-based check with a read from
/sys/power/suspend_stats/success. This counter is incremented only
after a full suspend/resume cycle, providing a reliable and race-free
indicator.
Also remove the unused file descriptor for /sys/power/state, which
remained after switching to a system() call to trigger suspend [1].
[1] https://lore.kernel.org/all/20240930224025.2858767-1-yifei.l.liu@oracle.com/
Fixes: c66be905cda2 ("selftests: breakpoints: use remaining time to check if suspend succeed")
Signed-off-by: Moon Hee Lee <moonhee.lee.ca(a)gmail.com>
---
.../breakpoints/step_after_suspend_test.c | 41 ++++++++++++++-----
1 file changed, 31 insertions(+), 10 deletions(-)
diff --git a/tools/testing/selftests/breakpoints/step_after_suspend_test.c b/tools/testing/selftests/breakpoints/step_after_suspend_test.c
index 8d275f03e977..8d233ac95696 100644
--- a/tools/testing/selftests/breakpoints/step_after_suspend_test.c
+++ b/tools/testing/selftests/breakpoints/step_after_suspend_test.c
@@ -127,22 +127,42 @@ int run_test(int cpu)
return KSFT_PASS;
}
+/*
+ * Reads the suspend success count from sysfs.
+ * Returns the count on success or exits on failure.
+ */
+static int get_suspend_success_count_or_fail(void)
+{
+ FILE *fp;
+ int val;
+
+ fp = fopen("/sys/power/suspend_stats/success", "r");
+ if (!fp)
+ ksft_exit_fail_msg(
+ "Failed to open suspend_stats/success: %s\n",
+ strerror(errno));
+
+ if (fscanf(fp, "%d", &val) != 1) {
+ fclose(fp);
+ ksft_exit_fail_msg(
+ "Failed to read suspend success count\n");
+ }
+
+ fclose(fp);
+ return val;
+}
+
void suspend(void)
{
- int power_state_fd;
int timerfd;
int err;
+ int count_before;
+ int count_after;
struct itimerspec spec = {};
if (getuid() != 0)
ksft_exit_skip("Please run the test as root - Exiting.\n");
- power_state_fd = open("/sys/power/state", O_RDWR);
- if (power_state_fd < 0)
- ksft_exit_fail_msg(
- "open(\"/sys/power/state\") failed %s)\n",
- strerror(errno));
-
timerfd = timerfd_create(CLOCK_BOOTTIME_ALARM, 0);
if (timerfd < 0)
ksft_exit_fail_msg("timerfd_create() failed\n");
@@ -152,14 +172,15 @@ void suspend(void)
if (err < 0)
ksft_exit_fail_msg("timerfd_settime() failed\n");
+ count_before = get_suspend_success_count_or_fail();
+
system("(echo mem > /sys/power/state) 2> /dev/null");
- timerfd_gettime(timerfd, &spec);
- if (spec.it_value.tv_sec != 0 || spec.it_value.tv_nsec != 0)
+ count_after = get_suspend_success_count_or_fail();
+ if (count_after <= count_before)
ksft_exit_fail_msg("Failed to enter Suspend state\n");
close(timerfd);
- close(power_state_fd);
}
int main(int argc, char **argv)
--
2.43.0
This patch fixes unstable LACP negotiation when bonding is configured in
passive mode (`lacp_active=off`).
Previously, the actor would stop sending LACPDUs after initial negotiation
succeeded, leading to the partner timing out and restarting the negotiation
cycle. This resulted in continuous LACP state flapping.
The fix ensures the passive actor starts sending periodic LACPDUs after
receiving the first LACPDU from the partner, in accordance with IEEE
802.1AX-2020 section 6.4.1.
Out of topic:
Although this patch addresses a functional bug and could be considered for
`net`, I'm slightly concerned about potential regressions, as it changes
the current bonding LACP protocol behavior.
It might be safer to merge this through `net-next` first to allow broader
testing. Thoughts?
Hangbin Liu (2):
bonding: send LACPDUs periodically in passive mode after receiving
partner's LACPDU
selftests: bonding: add test for passive LACP mode
drivers/net/bonding/bond_3ad.c | 72 ++++++++++----
drivers/net/bonding/bond_options.c | 1 +
include/net/bond_3ad.h | 1 +
.../selftests/drivers/net/bonding/Makefile | 3 +-
.../drivers/net/bonding/bond_passive_lacp.sh | 93 +++++++++++++++++++
5 files changed, 151 insertions(+), 19 deletions(-)
create mode 100755 tools/testing/selftests/drivers/net/bonding/bond_passive_lacp.sh
--
2.46.0
There are several situations where VMM is involved when handling
synchronous external instruction or data aborts, and often VMM
needs to inject external aborts to guest. In addition to manipulating
individual registers with KVM_SET_ONE_REG API, an easier way is to
use the KVM_SET_VCPU_EVENTS API.
This patchset adds two new features to the KVM_SET_VCPU_EVENTS API.
1. Extend KVM_SET_VCPU_EVENTS to support external instruction abort.
2. Allow userspace to emulate ESR_ELx.ISS by supplying ESR_ELx.
In this way, we can also allow userspace to emulate ESR_ELx.ISS2
in future.
The UAPI change for #1 is straightforward. However, I would appreciate
some feedback on the ABI change for #2:
struct kvm_vcpu_events {
struct {
__u8 serror_pending;
__u8 serror_has_esr;
__u8 ext_dabt_pending;
__u8 ext_iabt_pending;
__u8 ext_abt_has_esr;
__u8 pad[3];
__u64 serror_esr;
__u64 ext_abt_esr; // <= +8 bytes
} exception;
__u32 reserved[10]; // <= -8 bytes
};
The offset to kvm_vcpu_events.reserved changes, and the size of
exception changes. I think we can't say userspace will never access
reserved, or they will never use sizeof(exception). Theoretically this
is an ABI break and I want to call it out and ask if a new ABI is needed
for feature #2. For example, is it worthy to introduce exception_v2
or kvm_vcpu_events_v2.
Based on commit 7b8346bd9fce6 ("KVM: arm64: Don't attempt vLPI mappings
when vPE allocation is disabled")
Jiaqi Yan (3):
KVM: arm64: Allow userspace to supply ESR when injecting SEA
KVM: selftests: Test injecting external abort with ISS
Documentation: kvm: update UAPI for injecting SEA
Raghavendra Rao Ananta (1):
KVM: arm64: Allow userspace to inject external instruction abort
Documentation/virt/kvm/api.rst | 48 +++--
arch/arm64/include/asm/kvm_emulate.h | 9 +-
arch/arm64/include/uapi/asm/kvm.h | 7 +-
arch/arm64/kvm/arm.c | 1 +
arch/arm64/kvm/emulate-nested.c | 6 +-
arch/arm64/kvm/guest.c | 42 ++--
arch/arm64/kvm/inject_fault.c | 16 +-
include/uapi/linux/kvm.h | 1 +
tools/arch/arm64/include/uapi/asm/kvm.h | 7 +-
.../selftests/kvm/arm64/external_aborts.c | 191 +++++++++++++++---
.../testing/selftests/kvm/arm64/inject_iabt.c | 98 +++++++++
11 files changed, 352 insertions(+), 74 deletions(-)
create mode 100644 tools/testing/selftests/kvm/arm64/inject_iabt.c
--
2.50.1.565.gc32cd1483b-goog
Problem
=======
When host APEI is unable to claim synchronous external abort (SEA)
during stage-2 guest abort, today KVM directly injects an async SError
into the VCPU then resumes it. The injected SError usually results in
unpleasant guest kernel panic.
One of the major situation of guest SEA is when VCPU consumes recoverable
uncorrected memory error (UER), which is not uncommon at all in modern
datacenter servers with large amounts of physical memory. Although SError
and guest panic is sufficient to stop the propagation of corrupted memory
there is room to recover from an UER in a more graceful manner.
Proposed Solution
=================
Alternatively KVM can replay the SEA to the faulting VCPU, via existing
KVM_SET_VCPU_EVENTS API. If the memory poison consumption or the fault
that cause SEA is not from guest kernel, the blast radius can be limited
to the consuming or faulting guest userspace process, so the VM can keep
running.
In addition, instead of doing under the hood without involving userspace,
there are benefits to redirect the SEA to VMM:
- VM customers care about the disruptions caused by memory errors, and
VMM usually has the responsibility to start the process of notifying
the customers of memory error events in their VMs. For example some
cloud provider emits a critical log in their observability UI [1], and
provides playbook for customers on how to mitigate disruptions to
their workloads.
- VMM can protect future memory error consumption by unmapping the poisoned
pages from stage-2 page table with KVM userfault, or by splitting the
memslot that contains the poisoned guest pages [2].
- VMM can keep track of SEA events in the VM. When VMM thinks the status
on the host or the VM is bad enough, e.g. number of distinct SEAs
exceeds a threshold, it can restart the VM on another healthy host.
- Behavior parity with x86 architecture. When machine check exception
(MCE) is caused by VCPU, kernel or KVM signals userspace SIGBUS to
let VMM either recover from the MCE, or terminate itself with VM.
The prior RFC proposes to implement SIGBUS on arm64 as well, but
Marc preferred VCPU exit over signal [3]. However, implementation
aside, returning SEA to VMM is on par with returning MCE to VMM.
Once SEA is redirected to VMM, among other actions, VMM is encouraged
to inject external aborts into the faulting VCPU, which is already
supported by KVM on arm64. We notice injecting instruction abort is not
fully supported by KVM_SET_VCPU_EVENTS. Complement it in the patchset.
New UAPIs
=========
This patchset introduces following userspace-visiable changes to empower
VMM to control what happens next for SEA on guest memory:
- KVM_CAP_ARM_SEA_TO_USER. While taking SEA, if userspace has enabled
this new capability at VM creation, and the SEA is not caused by
memory allocated for stage-2 translation table, instead of injecting
SError, return KVM_EXIT_ARM_SEA to userspace.
- KVM_EXIT_ARM_SEA. This is the VM exit reason VMM gets. The details
about the SEA is provided in arm_sea as much as possible, including
sanitized ESR value at EL2, if guest virtual and physical addresses
(GPA and GVA) are available and the values if available.
- KVM_CAP_ARM_INJECT_EXT_IABT. VMM today can inject external data abort
to VCPU via KVM_SET_VCPU_EVENTS API. However, in case of instruction
abort, VMM cannot inject it via KVM_SET_VCPU_EVENTS.
KVM_CAP_ARM_INJECT_EXT_IABT is just a natural extend to
KVM_CAP_ARM_INJECT_EXT_DABT that tells VMM KVM_SET_VCPU_EVENTS now
supports external instruction abort.
* From v1 [4]:
- Rebased on commit 4d62121ce9b5 ("KVM: arm64: vgic-debug: Avoid
dereferencing NULL ITE pointer").
- Sanitize ESR_EL2 before reporting it to userspace.
- Do not do KVM_EXIT_ARM_SEA when SEA is caused by memory allocated to
stage-2 translation table.
[1] https://cloud.google.com/solutions/sap/docs/manage-host-errors
[2] https://lore.kernel.org/kvm/20250109204929.1106563-1-jthoughton@google.com
[3] https://lore.kernel.org/kvm/86pljbqqh0.wl-maz@kernel.org
[4] https://lore.kernel.org/kvm/20250505161412.1926643-1-jiaqiyan@google.com
Jiaqi Yan (5):
KVM: arm64: VM exit to userspace to handle SEA
KVM: arm64: Set FnV for VCPU when FAR_EL2 is invalid
KVM: selftests: Test for KVM_EXIT_ARM_SEA and KVM_CAP_ARM_SEA_TO_USER
KVM: selftests: Test for KVM_CAP_INJECT_EXT_IABT
Documentation: kvm: new uAPI for handling SEA
Raghavendra Rao Ananta (1):
KVM: arm64: Allow userspace to inject external instruction aborts
Documentation/virt/kvm/api.rst | 128 ++++++-
arch/arm64/include/asm/kvm_emulate.h | 67 ++++
arch/arm64/include/asm/kvm_host.h | 8 +
arch/arm64/include/asm/kvm_ras.h | 2 +-
arch/arm64/include/uapi/asm/kvm.h | 3 +-
arch/arm64/kvm/arm.c | 6 +
arch/arm64/kvm/guest.c | 13 +-
arch/arm64/kvm/inject_fault.c | 3 +
arch/arm64/kvm/mmu.c | 59 ++-
include/uapi/linux/kvm.h | 12 +
tools/arch/arm64/include/asm/esr.h | 2 +
tools/arch/arm64/include/uapi/asm/kvm.h | 3 +-
tools/testing/selftests/kvm/Makefile.kvm | 2 +
.../testing/selftests/kvm/arm64/inject_iabt.c | 98 +++++
.../testing/selftests/kvm/arm64/sea_to_user.c | 340 ++++++++++++++++++
tools/testing/selftests/kvm/lib/kvm_util.c | 1 +
16 files changed, 718 insertions(+), 29 deletions(-)
create mode 100644 tools/testing/selftests/kvm/arm64/inject_iabt.c
create mode 100644 tools/testing/selftests/kvm/arm64/sea_to_user.c
--
2.49.0.1266.g31b7d2e469-goog
Vishal!
On Wed, Jul 30 2025 at 23:35, Vishal Parmar wrote:
Please do not top-post and trim your replies.
> The intent behind this change is to make output useful as is.
> for example, to provide a performance report in case of regression.
The point John was making:
>> So it might be worth looking into getting the output to be happy with
>> TAP while you're tweaking things here.
The kernel selftests are converting over to standardized TAP output
format, which is intended to aid automated testing.
So if we change the outpot format of this test, then we switch it over to
TAP format and do not invent yet another randomized output scheme.
> CSV format is also a good alternative if the maintainer prefers that.
The most important information is whether the test succeeded or not and
CSV format is not helping either to conform with the test output
standards.
For the success case, the actual numbers are uninteresting. In the
failure case it's sufficient to emit:
ksft_test_result_fail("Req: NNNN, Exp: $MMMM, Res: $LLLL\n", ...);
In case of regressions (fail), a report providing this output is good
enough for the relevant maintainer/developer to start investigating.
No?
Thanks,
tglx
Hi!
Does anyone have ideas about crediting test authors or tests for bugs
discovered? We increasingly see situations where someone adds a test
then our subsystem CI uncovers a (1 in a 100 runs) bug using that test.
Using reported-by doesn't feel right. But credit should go to the
person who wrote the test. Is anyone else having this dilemma?
Ever since the introduction of pid namespaces, procfs has had very
implicit behaviour surrounding them (the pidns used by a procfs mount is
auto-selected based on the mounting process's active pidns, and the
pidns itself is basically hidden once the mount has been constructed).
/* pidns mount option for procfs */
This implicit behaviour has historically meant that userspace was
required to do some special dances in order to configure the pidns of a
procfs mount as desired. Examples include:
* In order to bypass the mnt_too_revealing() check, Kubernetes creates
a procfs mount from an empty pidns so that user namespaced containers
can be nested (without this, the nested containers would fail to
mount procfs). But this requires forking off a helper process because
you cannot just one-shot this using mount(2).
* Container runtimes in general need to fork into a container before
configuring its mounts, which can lead to security issues in the case
of shared-pidns containers (a privileged process in the pidns can
interact with your container runtime process). While
SUID_DUMP_DISABLE and user namespaces make this less of an issue, the
strict need for this due to a minor uAPI wart is kind of unfortunate.
Things would be much easier if there was a way for userspace to just
specify the pidns they want. Patch 1 implements a new "pidns" argument
which can be set using fsconfig(2):
fsconfig(procfd, FSCONFIG_SET_FD, "pidns", NULL, nsfd);
fsconfig(procfd, FSCONFIG_SET_STRING, "pidns", "/proc/self/ns/pid", 0);
or classic mount(2) / mount(8):
// mount -t proc -o pidns=/proc/self/ns/pid proc /tmp/proc
mount("proc", "/tmp/proc", "proc", MS_..., "pidns=/proc/self/ns/pid");
The initial security model I have in this RFC is to be as conservative
as possible and just mirror the security model for setns(2) -- which
means that you can only set pidns=... to pid namespaces that your
current pid namespace is a direct ancestor of and you have CAP_SYS_ADMIN
privileges over the pid namespace. This fulfils the requirements of
container runtimes, but I suspect that this may be too strict for some
usecases.
The pidns argument is not displayed in mountinfo -- it's not clear to me
what value it would make sense to show (maybe we could just use ns_dname
to provide an identifier for the namespace, but this number would be
fairly useless to userspace). I'm open to suggestions. Note that
PROCFS_GET_PID_NAMESPACE (see below) does at least let userspace get
information about this outside of mountinfo.
/* ioctl(PROCFS_GET_PID_NAMESPACE) */
In addition, being able to figure out what pid namespace is being used
by a procfs mount is quite useful when you have an administrative
process (such as a container runtime) which wants to figure out the
correct way of mapping PIDs between its own namespace and the namespace
for procfs (using NS_GET_{PID,TGID}_{IN,FROM}_PIDNS). There are
alternative ways to do this, but they all rely on ancillary information
that third-party libraries and tools do not necessarily have access to.
To make this easier, add a new ioctl (PROCFS_GET_PID_NAMESPACE) which
can be used to get a reference to the pidns that a procfs is using.
It's not quite clear what is the correct security model for this API,
but the current approach I've taken is to:
* Make the ioctl only valid on the root (meaning that a process without
access to the procfs root -- such as only having an fd to a procfs
file or some open_tree(2)-like subset -- cannot use this API).
* Require that the process requesting either has access to
/proc/1/ns/pid anyway (i.e. has ptrace-read access to the pidns
pid1), has CAP_SYS_ADMIN access to the pidns (i.e. has administrative
access to it and can join it if they had a handle), or is in a pidns
that is a direct ancestor of the target pidns (i.e. all of the pids
are already visible in the procfs for the current process's pidns).
The security model for this is a little loose, as it seems to me that
all of the cases mentioned are valid cases to allow access, but I'm open
to suggestions for whether we need to make this stricter or looser.
Signed-off-by: Aleksa Sarai <cyphar(a)cyphar.com>
---
Changes in v2:
- #ifdef CONFIG_PID_NS
- Improve cover letter wording to make it clear we're talking about two
separate features with different permission models. [Andy Lutomirski]
- Fix build warnings in pidns_is_ancestor() patch. [kernel test robot]
- v1: <https://lore.kernel.org/r/20250721-procfs-pidns-api-v1-0-5cd9007e512d@cypha…>
---
Aleksa Sarai (4):
pidns: move is-ancestor logic to helper
procfs: add "pidns" mount option
procfs: add PROCFS_GET_PID_NAMESPACE ioctl
selftests/proc: add tests for new pidns APIs
Documentation/filesystems/proc.rst | 10 ++
fs/proc/root.c | 144 ++++++++++++++-
include/linux/pid_namespace.h | 9 +
include/uapi/linux/fs.h | 3 +
kernel/pid_namespace.c | 23 ++-
tools/testing/selftests/proc/.gitignore | 1 +
tools/testing/selftests/proc/Makefile | 1 +
tools/testing/selftests/proc/proc-pidns.c | 286 ++++++++++++++++++++++++++++++
8 files changed, 461 insertions(+), 16 deletions(-)
---
base-commit: 4c838c7672c39ec6ec48456c6ce22d14a68f4cda
change-id: 20250717-procfs-pidns-api-8ed1583431f0
Best regards,
--
Aleksa Sarai <cyphar(a)cyphar.com>
Currently the RISC-V qemu architecture config has some code to check for
the presence of the BIOS files before loading, printing a message and
quitting if it's not present.
However, this prevents us from loading the architecture file even just
to inspect it if the dependencies are not present. Instead, have kunit.py
look for a check_dependencies function, and call it if present only when
the architecture config is being used.
This is necessary for future changes which enumerate or automatically
select an architecture.
Signed-off-by: David Gow <davidgow(a)google.com>
---
tools/testing/kunit/kunit_kernel.py | 5 +++++
tools/testing/kunit/qemu_configs/riscv.py | 10 ++++++----
2 files changed, 11 insertions(+), 4 deletions(-)
diff --git a/tools/testing/kunit/kunit_kernel.py b/tools/testing/kunit/kunit_kernel.py
index 260d8d9aa1db..c3201a76da24 100644
--- a/tools/testing/kunit/kunit_kernel.py
+++ b/tools/testing/kunit/kunit_kernel.py
@@ -230,6 +230,11 @@ def _get_qemu_ops(config_path: str,
assert isinstance(spec.loader, importlib.abc.Loader)
spec.loader.exec_module(config)
+ # Check for any per-architecture dependencies
+ if hasattr(config, 'check_dependencies'):
+ if not config.check_dependencies():
+ raise ValueError('Missing dependencies for ' + config_path)
+
if not hasattr(config, 'QEMU_ARCH'):
raise ValueError('qemu_config module missing "QEMU_ARCH": ' + config_path)
params: qemu_config.QemuArchParams = config.QEMU_ARCH
diff --git a/tools/testing/kunit/qemu_configs/riscv.py b/tools/testing/kunit/qemu_configs/riscv.py
index c87758030ff7..3c271d1005d9 100644
--- a/tools/testing/kunit/qemu_configs/riscv.py
+++ b/tools/testing/kunit/qemu_configs/riscv.py
@@ -6,10 +6,12 @@ import sys
OPENSBI_FILE = 'opensbi-riscv64-generic-fw_dynamic.bin'
OPENSBI_PATH = '/usr/share/qemu/' + OPENSBI_FILE
-if not os.path.isfile(OPENSBI_PATH):
- print('\n\nOpenSBI bios was not found in "' + OPENSBI_PATH + '".\n'
- 'Please ensure that qemu-system-riscv is installed, or edit the path in "qemu_configs/riscv.py"\n')
- sys.exit()
+def check_dependencies() -> bool:
+ if not os.path.isfile(OPENSBI_PATH):
+ print('\n\nOpenSBI bios was not found in "' + OPENSBI_PATH + '".\n'
+ 'Please ensure that qemu-system-riscv is installed, or edit the path in "qemu_configs/riscv.py"\n')
+ return False
+ return True
QEMU_ARCH = QemuArchParams(linux_arch='riscv',
kconfig='''
--
2.50.1.552.g942d659e1b-goog
Hi Thadeu,,
On Sun, 15 Jun 2025 at 23:31, Thadeu Lima de Souza Cascardo
<cascardo(a)igalia.com> wrote:
>
> Add test cases for static and dynamic minor number allocation and
> deallocation.
>
> While at it, improve description and test suite name.
>
> Some of the cases include:
>
> - that static and dynamic allocation reserved the expected minors.
>
> - that registering duplicate minors or duplicate names will fail.
>
> - that failing to create a sysfs file (due to duplicate names) will
> deallocate the dynamic minor correctly.
>
> - that dynamic allocation does not allocate a minor number in the static
> range.
>
> - that there are no collisions when mixing dynamic and static allocations.
>
> - that opening devices with various minor device numbers work.
>
> - that registering a static number in the dynamic range won't conflict with
> a dynamic allocation.
>
> This last test verifies the bug fixed by commit 6d04d2b554b1 ("misc:
> misc_minor_alloc to use ida for all dynamic/misc dynamic minors") has not
> regressed.
>
> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo(a)igalia.com>
Thanks for your patch, which is now commit 74d8361be3441dff ("char:
misc: add test cases") in linus/master stable/master
> Changes in v5:
> - Make miscdevice unit test built-in only
> - Make unit test require CONFIG_KUNIT=y
Why were these changes made? This means the test is no longer available
if KUNIT=m, and I can no longer just load the module when I want to
run the test.
> - Link to v4: https://lore.kernel.org/r/20250423-misc-dynrange-v4-0-133b5ae4ca18@igalia.c…
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -2506,8 +2506,8 @@ config TEST_IDA
> tristate "Perform selftest on IDA functions"
>
> config TEST_MISC_MINOR
> - tristate "miscdevice KUnit test" if !KUNIT_ALL_TESTS
> - depends on KUNIT
> + bool "miscdevice KUnit test" if !KUNIT_ALL_TESTS
> + depends on KUNIT=y
> default KUNIT_ALL_TESTS
> help
> Kunit test for miscdevice API, specially its behavior in respect to
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert(a)linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
If WAIT_KILLABLE_RECV was specified, and an event is received, the
tracee's syscall is not supposed to be interruptible. This was not properly
ensured if the reply was sent too fast, and an interrupting signal was
received before the reply was processed on the tracee side.
This series fixes the bug and adds a test case for it to the selftests.
Signed-off-by: Johannes Nixdorf <johannes(a)nixdorf.dev>
---
Changes in v2:
- Added a selftest for the bug.
- Link to v1: https://lore.kernel.org/r/20250723-seccomp-races-v1-1-bef5667ce30a@nixdorf.…
---
Johannes Nixdorf (2):
seccomp: Fix a race with WAIT_KILLABLE_RECV if the tracer replies too fast
selftests/seccomp: Add a test for the WAIT_KILLABLE_RECV fast reply race
kernel/seccomp.c | 13 ++-
tools/testing/selftests/seccomp/seccomp_bpf.c | 130 ++++++++++++++++++++++++++
2 files changed, 136 insertions(+), 7 deletions(-)
---
base-commit: 89be9a83ccf1f88522317ce02f854f30d6115c41
change-id: 20250721-seccomp-races-e97897d6d94b
Best regards,
--
Johannes Nixdorf <johannes(a)nixdorf.dev>
Hi Linus,
Please pull this kselftest next update for Linux 6.17-rc1.
Fixes
- false failure of subsystem event test
- glob filter test to use mutex_unlock() instead of mutex_trylock()
- several spelling errors in tests
- test_kexec_jump build errors
- pidfd test duplicate-symbol warnings for SCHED_ CPP symbols
Adds a reliable check for suspend to breakpoints suspend test
Improvements to ipc test
diff is attached.
thanks,
-- Shuah
----------------------------------------------------------------
The following changes since commit 19272b37aa4f83ca52bdf9c16d5d81bdd1354494:
Linux 6.16-rc1 (2025-06-08 13:44:43 -0700)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest tags/linux_kselftest-next-6.17-rc1
for you to fetch changes up to 30fb5e134f05800dc424f8aa1d69841a6bdd9a54:
selftests/pidfd: Fix duplicate-symbol warnings for SCHED_ CPP symbols (2025-07-24 16:14:45 -0600)
----------------------------------------------------------------
linux_kselftest-next-6.17-rc1
Fixes
- false failure of subsystem event test
- glob filter test to use mutex_unlock() instead of mutex_trylock()
- several spelling errors in tests
- test_kexec_jump build errors
- pidfd test duplicate-symbol warnings for SCHED_ CPP symbols
Adds a reliable check for suspend to breakpoints suspend test
Improvements to ipc test
----------------------------------------------------------------
Ankit Chauhan (1):
selftests/ptrace: Fix spelling mistake "multible" -> "multiple"
Jihed Chaibi (1):
selftests/cpu-hotplug: fix typo in hotplaggable_offline_cpus function name
Masami Hiramatsu (Google) (1):
selftests: tracing: Use mutex_unlock for testing glob filter
Moon Hee Lee (2):
selftests: breakpoints: use suspend_stats to reliably check suspend success
selftests/kexec: fix test_kexec_jump build
Nick Huang (1):
selftests: ipc: Replace fail print statements with ksft_test_result_fail
Paul E. McKenney (1):
selftests/pidfd: Fix duplicate-symbol warnings for SCHED_ CPP symbols
Shuah Khan (1):
selftests: print installation complete message
Steven Rostedt (1):
selftests/tracing: Fix false failure of subsystem event test
Tianyi Cui (1):
selftests: Add version file to kselftest installation dir
tools/testing/selftests/Makefile | 8 ++++
.../breakpoints/step_after_suspend_test.c | 41 ++++++++++++++-----
.../selftests/cpu-hotplug/cpu-on-off-test.sh | 4 +-
.../ftrace/test.d/event/subsystem-enable.tc | 28 ++++++++++++-
.../ftrace/test.d/ftrace/func-filter-glob.tc | 2 +-
tools/testing/selftests/ipc/msgque.c | 47 +++++++++++-----------
tools/testing/selftests/kexec/Makefile | 2 +-
tools/testing/selftests/pidfd/pidfd.h | 9 +++++
tools/testing/selftests/ptrace/peeksiginfo.c | 2 +-
9 files changed, 102 insertions(+), 41 deletions(-)
----------------------------------------------------------------
Hi Linus,
Please pull the following kunit next update for Linux 6.17-rc1.
Corrects MODULE_IMPORT_NS() syntax documentation, makes kunit_test timeout
configurable via a module parameter and a Kconfig option, fixes longest
symbol length test, adds a test for static stub, and adjusts kunit_test
timeout based on test_{suite,case} speed.
diff is attached.
thanks,
-- Shuah
----------------------------------------------------------------
The following changes since commit 19272b37aa4f83ca52bdf9c16d5d81bdd1354494:
Linux 6.16-rc1 (2025-06-08 13:44:43 -0700)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest tags/linux_kselftest-kunit-6.17-rc1
for you to fetch changes up to 34db4fba81916a2001d7a503dfcf718c08ed5c42:
kunit: fix longest symbol length test (2025-07-10 14:02:07 -0600)
----------------------------------------------------------------
linux_kselftest-kunit-6.17-rc1
Corrects MODULE_IMPORT_NS() syntax documentation, makes kunit_test timeout
configurable via a module parameter and a Kconfig option, fixes longest
symbol length test, adds a test for static stub, and adjusts kunit_test
timeout based on test_{suite,case} speed.
----------------------------------------------------------------
Brian Norris (1):
Documentation: kunit: Correct MODULE_IMPORT_NS() syntax
Marie Zhussupova (1):
kunit: Make default kunit_test timeout configurable via both a module parameter and a Kconfig option
Sergio González Collado (1):
kunit: fix longest symbol length test
Tzung-Bi Shih (1):
kunit: Add test for static stub
Ujwal Jain (1):
kunit: Adjust kunit_test timeout based on test_{suite,case} speed
Documentation/dev-tools/kunit/usage.rst | 2 +-
include/kunit/try-catch.h | 1 +
lib/Kconfig.debug | 1 +
lib/kunit/Kconfig | 13 ++++++++
lib/kunit/kunit-test.c | 55 ++++++++++++++++++++++++++++++---
lib/kunit/test.c | 47 ++++++++++++++++++++++++++--
lib/kunit/try-catch-impl.h | 4 ++-
lib/kunit/try-catch.c | 29 ++---------------
lib/tests/longest_symbol_kunit.c | 3 +-
9 files changed, 118 insertions(+), 37 deletions(-)
----------------------------------------------------------------
Introduce SW acceleration for IPIP tunnels in the netfilter flowtable
infrastructure.
---
Changes in v5:
- Rely on __ipv4_addr_hash() to compute the hash used as encap ID
- Remove unnecessary pskb_may_pull() in nf_flow_tuple_encap()
- Add nf_flow_ip4_ecanp_pop utility routine
- Link to v4: https://lore.kernel.org/r/20250718-nf-flowtable-ipip-v4-0-f8bb1c18b986@kern…
Changes in v4:
- Use the hash value of the saddr, daddr and protocol of outer IP header as
encapsulation id.
- Link to v3: https://lore.kernel.org/r/20250703-nf-flowtable-ipip-v3-0-880afd319b9f@kern…
Changes in v3:
- Add outer IP header sanity checks
- target nf-next tree instead of net-next
- Link to v2: https://lore.kernel.org/r/20250627-nf-flowtable-ipip-v2-0-c713003ce75b@kern…
Changes in v2:
- Introduce IPIP flowtable selftest
- Link to v1: https://lore.kernel.org/r/20250623-nf-flowtable-ipip-v1-1-2853596e3941@kern…
---
Lorenzo Bianconi (2):
net: netfilter: Add IPIP flowtable SW acceleration
selftests: netfilter: nft_flowtable.sh: Add IPIP flowtable selftest
include/linux/netdevice.h | 1 +
net/ipv4/ipip.c | 28 +++++++++++
net/netfilter/nf_flow_table_ip.c | 56 +++++++++++++++++++++-
net/netfilter/nft_flow_offload.c | 1 +
.../selftests/net/netfilter/nft_flowtable.sh | 40 ++++++++++++++++
5 files changed, 124 insertions(+), 2 deletions(-)
---
base-commit: dd500e4aecf25e48e874ca7628697969df679493
change-id: 20250623-nf-flowtable-ipip-1b3d7b08d067
Best regards,
--
Lorenzo Bianconi <lorenzo(a)kernel.org>
Show precise rejected function when attaching fexit/fmod_ret to __noreturn
functions.
Add log for attaching tracing programs to functions in deny list.
Add selftest for attaching tracing programs to functions in deny list.
Migrate fexit_noreturns case into tracing_failure test suite.
changes:
v4:
- change tracing_deny case attaching function (Yonghong Song)
- add Acked-by: Yafang Shao and Yonghong Song
v3:
- add tracing_deny case into existing files (Alexei)
- migrate fexit_noreturns into tracing_failure
- change SOB
https://lore.kernel.org/bpf/20250722153434.20571-1-kafai.wan@linux.dev/
v2:
- change verifier log message (Alexei)
- add missing Suggested-by
https://lore.kernel.org/bpf/20250714120408.1627128-1-mannkafai@gmail.com/
v1:
https://lore.kernel.org/all/20250710162717.3808020-1-mannkafai@gmail.com/
---
KaFai Wan (4):
bpf: Show precise rejected function when attaching fexit/fmod_ret to
__noreturn functions
bpf: Add log for attaching tracing programs to functions in deny list
selftests/bpf: Add selftest for attaching tracing programs to
functions in deny list
selftests/bpf: Migrate fexit_noreturns case into tracing_failure test
suite
kernel/bpf/verifier.c | 5 +-
.../bpf/prog_tests/fexit_noreturns.c | 9 ----
.../bpf/prog_tests/tracing_failure.c | 52 +++++++++++++++++++
.../selftests/bpf/progs/fexit_noreturns.c | 15 ------
.../selftests/bpf/progs/tracing_failure.c | 12 +++++
5 files changed, 68 insertions(+), 25 deletions(-)
delete mode 100644 tools/testing/selftests/bpf/prog_tests/fexit_noreturns.c
delete mode 100644 tools/testing/selftests/bpf/progs/fexit_noreturns.c
--
2.43.0