When using GCC on x86-64 to compile an usdt prog with -O1 or higher
optimization, the compiler will generate SIB addressing mode for global
array and PC-relative addressing mode for global variable,
e.g. "1@-96(%rbp,%rax,8)" and "-1@4+t1(%rip)".
The current USDT implementation in libbpf cannot parse these two formats,
causing `bpf_program__attach_usdt()` to fail with -ENOENT
(unrecognized register).
This patch series adds support for SIB addressing mode in USDT probes.
The main changes include:
- add correct handling logic for SIB-addressed arguments in
`parse_usdt_arg`.
- force -O2 optimization for usdt.test.o to generate SIB addressing usdt
argument spec.
- change the global variable t1 to a local variable, to avoid compiler
generating PC-relative addressing mode for it.
Testing shows that the SIB probe correctly generates 8@(%rcx,%rax,8)
argument spec and passes all validation checks.
The modification history of this patch series:
Change since v1:
- refactor the code to make it more readable
- modify the commit message to explain why and how
Change since v2:
- fix the `scale` uninitialized error
Change since v3:
- force -O2 optimization for usdt.test.o to generate SIB addressing usdt
and pass all test cases.
Do we need to add support for PC-relative USDT argument spec handling in libbpf?
I have some interest in this question, but currently have no ideas. Getting offsets
based on symbols requires dependency on the symbol table. However, once the binary
file is stripped, the symtab will also be removed, which will cause this approach
to fail. Does anyone have any thoughts on this?
Jiawei Zhao (1):
libbpf: fix USDT SIB argument handling causing unrecognized register
error
tools/lib/bpf/usdt.bpf.h | 33 +++++++++++++-
tools/lib/bpf/usdt.c | 43 ++++++++++++++++---
tools/testing/selftests/bpf/Makefile | 5 +++
tools/testing/selftests/bpf/prog_tests/usdt.c | 18 +++++---
4 files changed, 86 insertions(+), 13 deletions(-)
--
2.43.0
With /proc/pid/maps now being read under per-vma lock protection we can
reuse parts of that code to execute PROCMAP_QUERY ioctl also without
taking mmap_lock. The change is designed to reduce mmap_lock contention
and prevent PROCMAP_QUERY ioctl calls from blocking address space updates.
This patchset was split out of the original patchset [1] that introduced
per-vma lock usage for /proc/pid/maps reading. It contains PROCMAP_QUERY
tests, code refactoring patch to simplify the main change and the actual
transition to per-vma lock.
[1] https://lore.kernel.org/all/20250704060727.724817-1-surenb@google.com/
Suren Baghdasaryan (3):
selftests/proc: test PROCMAP_QUERY ioctl while vma is concurrently
modified
fs/proc/task_mmu: factor out proc_maps_private fields used by
PROCMAP_QUERY
fs/proc/task_mmu: execute PROCMAP_QUERY ioctl under per-vma locks
fs/proc/internal.h | 15 +-
fs/proc/task_mmu.c | 149 ++++++++++++------
tools/testing/selftests/proc/proc-maps-race.c | 65 ++++++++
3 files changed, 174 insertions(+), 55 deletions(-)
base-commit: 01da54f10fddf3b01c5a3b80f6b16bbad390c302
--
2.50.1.565.gc32cd1483b-goog
The step_after_suspend_test verifies that the system successfully
suspended and resumed by setting a timerfd and checking whether the
timer fully expired. However, this method is unreliable due to timing
races.
In practice, the system may take time to enter suspend, during which the
timer may expire just before or during the transition. As a result,
the remaining time after resume may show non-zero nanoseconds, even if
suspend/resume completed successfully. This leads to false test failures.
Replace the timer-based check with a read from
/sys/power/suspend_stats/success. This counter is incremented only
after a full suspend/resume cycle, providing a reliable and race-free
indicator.
Also remove the unused file descriptor for /sys/power/state, which
remained after switching to a system() call to trigger suspend [1].
[1] https://lore.kernel.org/all/20240930224025.2858767-1-yifei.l.liu@oracle.com/
Fixes: c66be905cda2 ("selftests: breakpoints: use remaining time to check if suspend succeed")
Signed-off-by: Moon Hee Lee <moonhee.lee.ca(a)gmail.com>
---
.../breakpoints/step_after_suspend_test.c | 41 ++++++++++++++-----
1 file changed, 31 insertions(+), 10 deletions(-)
diff --git a/tools/testing/selftests/breakpoints/step_after_suspend_test.c b/tools/testing/selftests/breakpoints/step_after_suspend_test.c
index 8d275f03e977..8d233ac95696 100644
--- a/tools/testing/selftests/breakpoints/step_after_suspend_test.c
+++ b/tools/testing/selftests/breakpoints/step_after_suspend_test.c
@@ -127,22 +127,42 @@ int run_test(int cpu)
return KSFT_PASS;
}
+/*
+ * Reads the suspend success count from sysfs.
+ * Returns the count on success or exits on failure.
+ */
+static int get_suspend_success_count_or_fail(void)
+{
+ FILE *fp;
+ int val;
+
+ fp = fopen("/sys/power/suspend_stats/success", "r");
+ if (!fp)
+ ksft_exit_fail_msg(
+ "Failed to open suspend_stats/success: %s\n",
+ strerror(errno));
+
+ if (fscanf(fp, "%d", &val) != 1) {
+ fclose(fp);
+ ksft_exit_fail_msg(
+ "Failed to read suspend success count\n");
+ }
+
+ fclose(fp);
+ return val;
+}
+
void suspend(void)
{
- int power_state_fd;
int timerfd;
int err;
+ int count_before;
+ int count_after;
struct itimerspec spec = {};
if (getuid() != 0)
ksft_exit_skip("Please run the test as root - Exiting.\n");
- power_state_fd = open("/sys/power/state", O_RDWR);
- if (power_state_fd < 0)
- ksft_exit_fail_msg(
- "open(\"/sys/power/state\") failed %s)\n",
- strerror(errno));
-
timerfd = timerfd_create(CLOCK_BOOTTIME_ALARM, 0);
if (timerfd < 0)
ksft_exit_fail_msg("timerfd_create() failed\n");
@@ -152,14 +172,15 @@ void suspend(void)
if (err < 0)
ksft_exit_fail_msg("timerfd_settime() failed\n");
+ count_before = get_suspend_success_count_or_fail();
+
system("(echo mem > /sys/power/state) 2> /dev/null");
- timerfd_gettime(timerfd, &spec);
- if (spec.it_value.tv_sec != 0 || spec.it_value.tv_nsec != 0)
+ count_after = get_suspend_success_count_or_fail();
+ if (count_after <= count_before)
ksft_exit_fail_msg("Failed to enter Suspend state\n");
close(timerfd);
- close(power_state_fd);
}
int main(int argc, char **argv)
--
2.43.0
This patch fixes unstable LACP negotiation when bonding is configured in
passive mode (`lacp_active=off`).
Previously, the actor would stop sending LACPDUs after initial negotiation
succeeded, leading to the partner timing out and restarting the negotiation
cycle. This resulted in continuous LACP state flapping.
The fix ensures the passive actor starts sending periodic LACPDUs after
receiving the first LACPDU from the partner, in accordance with IEEE
802.1AX-2020 section 6.4.1.
Out of topic:
Although this patch addresses a functional bug and could be considered for
`net`, I'm slightly concerned about potential regressions, as it changes
the current bonding LACP protocol behavior.
It might be safer to merge this through `net-next` first to allow broader
testing. Thoughts?
Hangbin Liu (2):
bonding: send LACPDUs periodically in passive mode after receiving
partner's LACPDU
selftests: bonding: add test for passive LACP mode
drivers/net/bonding/bond_3ad.c | 72 ++++++++++----
drivers/net/bonding/bond_options.c | 1 +
include/net/bond_3ad.h | 1 +
.../selftests/drivers/net/bonding/Makefile | 3 +-
.../drivers/net/bonding/bond_passive_lacp.sh | 93 +++++++++++++++++++
5 files changed, 151 insertions(+), 19 deletions(-)
create mode 100755 tools/testing/selftests/drivers/net/bonding/bond_passive_lacp.sh
--
2.46.0
There are several situations where VMM is involved when handling
synchronous external instruction or data aborts, and often VMM
needs to inject external aborts to guest. In addition to manipulating
individual registers with KVM_SET_ONE_REG API, an easier way is to
use the KVM_SET_VCPU_EVENTS API.
This patchset adds two new features to the KVM_SET_VCPU_EVENTS API.
1. Extend KVM_SET_VCPU_EVENTS to support external instruction abort.
2. Allow userspace to emulate ESR_ELx.ISS by supplying ESR_ELx.
In this way, we can also allow userspace to emulate ESR_ELx.ISS2
in future.
The UAPI change for #1 is straightforward. However, I would appreciate
some feedback on the ABI change for #2:
struct kvm_vcpu_events {
struct {
__u8 serror_pending;
__u8 serror_has_esr;
__u8 ext_dabt_pending;
__u8 ext_iabt_pending;
__u8 ext_abt_has_esr;
__u8 pad[3];
__u64 serror_esr;
__u64 ext_abt_esr; // <= +8 bytes
} exception;
__u32 reserved[10]; // <= -8 bytes
};
The offset to kvm_vcpu_events.reserved changes, and the size of
exception changes. I think we can't say userspace will never access
reserved, or they will never use sizeof(exception). Theoretically this
is an ABI break and I want to call it out and ask if a new ABI is needed
for feature #2. For example, is it worthy to introduce exception_v2
or kvm_vcpu_events_v2.
Based on commit 7b8346bd9fce6 ("KVM: arm64: Don't attempt vLPI mappings
when vPE allocation is disabled")
Jiaqi Yan (3):
KVM: arm64: Allow userspace to supply ESR when injecting SEA
KVM: selftests: Test injecting external abort with ISS
Documentation: kvm: update UAPI for injecting SEA
Raghavendra Rao Ananta (1):
KVM: arm64: Allow userspace to inject external instruction abort
Documentation/virt/kvm/api.rst | 48 +++--
arch/arm64/include/asm/kvm_emulate.h | 9 +-
arch/arm64/include/uapi/asm/kvm.h | 7 +-
arch/arm64/kvm/arm.c | 1 +
arch/arm64/kvm/emulate-nested.c | 6 +-
arch/arm64/kvm/guest.c | 42 ++--
arch/arm64/kvm/inject_fault.c | 16 +-
include/uapi/linux/kvm.h | 1 +
tools/arch/arm64/include/uapi/asm/kvm.h | 7 +-
.../selftests/kvm/arm64/external_aborts.c | 191 +++++++++++++++---
.../testing/selftests/kvm/arm64/inject_iabt.c | 98 +++++++++
11 files changed, 352 insertions(+), 74 deletions(-)
create mode 100644 tools/testing/selftests/kvm/arm64/inject_iabt.c
--
2.50.1.565.gc32cd1483b-goog