Hi,
===== START =====
TEST: enq_last_no_enq_fails
DESCRIPTION: Verify we fail to load a scheduler if we specify the SCX_OPS_ENQ_LAST flag without defining ops.enqueue()
OUTPUT:
ERR: enq_last_no_enq_fails.c:35
Incorrectly succeeded in to attaching scheduler
not ok 2 enq_last_no_enq_fails #
===== END =====
The above selftest fails even though the BPF scheduler is not loaded into the kernel.
Below is a snippet from dmesg verifying that the BPF program was not loaded:
sched_ext: enq_last_no_enq_fails: SCX_OPS_ENQ_LAST requires ops.enqueue() to be implemented
scx_ops_enable.isra.0+0xde8/0xe30
bpf_struct_ops_link_create+0x1ac/0x240
link_create+0x178/0x400
__sys_bpf+0x7ac/0xd50
sys_bpf+0x2c/0x70
system_call_exception+0x148/0x310
system_call_vectored_common+0x15c/0x2ec
sched_ext: "enq_select_cpu_fails" does not implement cgroup cpu.weight
sched_ext: BPF scheduler "enq_select_cpu_fails" enabled
sched_ext: BPF scheduler "enq_select_cpu_fails" disabled (runtime error)
static int scx_ops_enable(struct sched_ext_ops *ops, struct bpf_link *link)
{
	...
	ret = validate_ops(ops);
	if (ret)
		goto err_disable;
	...
err_disable:
	mutex_unlock(&scx_ops_enable_mutex);
	/*
	 * Returning an error code here would not pass all the error information
	 * to userspace. Record errno using scx_ops_error() for cases
	 * scx_ops_error() wasn't already invoked and exit indicating success so
	 * that the error is notified through ops.exit() with all the details.
	 *
	 * Flush scx_ops_disable_work to ensure that error is reported before
	 * init completion.
	 */
	scx_ops_error("scx_ops_enable() failed (%d)", ret);
	kthread_flush_work(&scx_ops_disable_work);
	return 0;
}
validate_ops() correctly reports the error, but the err_disable path
ultimately returns zero, so the attach appears to succeed from userspace.
From enq_last_no_enq_fails.c:
static enum scx_test_status run(void *ctx)
{
	struct enq_last_no_enq_fails *skel = ctx;
	struct bpf_link *link;

	link = bpf_map__attach_struct_ops(skel->maps.enq_last_no_enq_fails_ops);
	if (link) {
		SCX_ERR("Incorrectly succeeded in to attaching scheduler");
		return SCX_TEST_FAIL;
	}
	bpf_link__destroy(link);
	return SCX_TEST_PASS;
}
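To make the mismatch easier to see outside the kernel, here is a small
userspace model of the control flow described above. It is only a sketch:
mock_ops, mock_exit_info, mock_validate_ops(), mock_enable() and
mock_load_was_rejected() are hypothetical stand-ins, not sched_ext or libbpf
symbols. It models an enable path that returns 0 even when validation fails
and records the error separately, so a caller has to inspect the recorded
error rather than the return value.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flag and types, for illustration only. */
#define MOCK_OPS_ENQ_LAST 0x1u

struct mock_ops {
	unsigned int flags;
	bool has_enqueue;
};

struct mock_exit_info {
	int err;	/* 0 when no error was recorded */
	char msg[64];
};

/* Rejects ENQ_LAST without an enqueue callback, like the dmesg line above. */
int mock_validate_ops(const struct mock_ops *ops)
{
	if ((ops->flags & MOCK_OPS_ENQ_LAST) && !ops->has_enqueue)
		return -EINVAL;
	return 0;
}

/*
 * Models the err_disable path: the error is recorded in *ei, but the
 * function still returns 0 to its caller.
 */
int mock_enable(const struct mock_ops *ops, struct mock_exit_info *ei)
{
	int ret = mock_validate_ops(ops);

	memset(ei, 0, sizeof(*ei));
	if (ret) {
		ei->err = ret;
		snprintf(ei->msg, sizeof(ei->msg),
			 "mock_enable() failed (%d)", ret);
	}
	return 0;
}

/* A check based on the recorded error instead of the return value. */
bool mock_load_was_rejected(const struct mock_ops *ops)
{
	struct mock_exit_info ei;
	int ret = mock_enable(ops, &ei);

	assert(ret == 0);	/* the return value carries no information */
	return ei.err != 0;
}
```

In this model, a test that only looks at the return value of mock_enable()
cannot tell the failing case from the working one, which is the situation the
selftest above is in.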
From: Jeff Xu <jeffxu(a)google.com>
Two fixes for madvise(MADV_DONTNEED) when sealed.
For PROT_NONE mappings, the previous blocking of
madvise(MADV_DONTNEED) is unnecessary. As PROT_NONE already prohibits
memory access, madvise(MADV_DONTNEED) should be allowed to proceed in
order to free the page.
For file-backed, private, read-only memory mappings, we previously did
not block the madvise(MADV_DONTNEED). This was based on
the assumption that the memory's content, being file-backed, could be
retrieved from the file if accessed again. However, this assumption
failed to consider scenarios where a mapping is initially created as
read-write, modified, and subsequently changed to read-only. The newly
introduced VM_WASWRITE flag addresses this oversight.
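The read-write-then-read-only scenario can be reproduced without mseal at
all. The sketch below is not taken from the patches; the function name
byte_after_dontneed() and the temporary file path are made up for
illustration. It maps a file privately, modifies the mapping, makes it
read-only and then calls madvise(MADV_DONTNEED), which discards the private
modification so that the next read sees the file content again.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Returns the first byte of the mapping as seen after MADV_DONTNEED,
 * or -1 on error. The backing file is filled with 'A'.
 */
int byte_after_dontneed(void)
{
	char path[] = "/tmp/dontneed-demo-XXXXXX";	/* hypothetical path */
	long page = sysconf(_SC_PAGESIZE);
	int result = -1;
	char *fill;
	char *map;
	int fd;

	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);

	fill = malloc((size_t)page);
	if (!fill)
		goto out_close;
	memset(fill, 'A', (size_t)page);
	if (write(fd, fill, (size_t)page) != (ssize_t)page) {
		free(fill);
		goto out_close;
	}
	free(fill);

	map = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE, fd, 0);
	if (map == MAP_FAILED)
		goto out_close;

	/* Private (copy-on-write) modification; the file still holds 'A'. */
	map[0] = 'B';
	assert(map[0] == 'B');

	/* Now read-only, but the page content differs from the file. */
	if (mprotect(map, (size_t)page, PROT_READ) == 0 &&
	    madvise(map, (size_t)page, MADV_DONTNEED) == 0)
		result = map[0];	/* refaulted from the file */

	munmap(map, (size_t)page);
out_close:
	close(fd);
	return result;
}
```

The returned byte is the file's 'A', not the 'B' that was written, which is
why a read-only mapping that was once writable cannot be treated as safely
discardable.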
Jeff Xu (2):
mseal: Two fixes for madvise(MADV_DONTNEED) when sealed
selftest/mseal: Add tests for madvise
include/linux/mm.h | 2 +
mm/mprotect.c | 3 +
mm/mseal.c | 42 +++++++--
tools/testing/selftests/mm/mseal_test.c | 118 +++++++++++++++++++++++-
4 files changed, 157 insertions(+), 8 deletions(-)
--
2.47.0.rc1.288.g06298d1525-goog
Hi
Note for V12:
There was a small conflict between the Intel PT changes in
"KVM: x86: Fix Intel PT Host/Guest mode when host tracing" and the
changes in this patch set, so I have put the patch sets together,
along with outstanding fix "perf/x86/intel/pt: Fix buffer full but
size is 0 case"
Cover letter for KVM changes (patches 2 to 4):
There is a long-standing problem whereby running Intel PT on host and guest
in Host/Guest mode causes VM-Entry failure.
The motivation for this patch set is to provide a fix for stable kernels
prior to the advent of the "Mediated Passthrough vPMU" patch set:
https://lore.kernel.org/kvm/20240801045907.4010984-1-mizhang@google.com/
which would render a large part of the fix unnecessary but likely not be
suitable for backport to stable due to its size and complexity.
Ideally, this patch set would be applied before "Mediated Passthrough vPMU".
Note that the fix does not conflict with "Mediated Passthrough vPMU", it
is just that "Mediated Passthrough vPMU" will make the code to stop and
restart Intel PT unnecessary.
Note for V11:
Moving aux_paused into a union within struct hw_perf_event caused
a regression because aux_paused was being written unconditionally
even though it is valid only for AUX (e.g. Intel PT) PMUs.
That is fixed in V11.
Hardware traces, such as instruction traces, can produce a vast amount of
trace data, so being able to reduce tracing to more specific circumstances
can be useful.
The ability to pause or resume tracing when another event happens can do
that.
These patches add such a facility and show how it would work for Intel
Processor Trace.
Maintainers of other AUX area tracing implementations are requested to
consider if this is something they might employ and then whether or not
the ABI would work for them. Note, thank you to James Clark (ARM) for
evaluating the API for Coresight. Suzuki K Poulose (ARM) also responded
positively to the RFC.
Changes to perf tools are now (since V4) fleshed out.
Please note, Intel® Architecture Instruction Set Extensions and Future
Features Programming Reference March 2024 319433-052, currently:
https://cdrdv2.intel.com/v1/dl/getContent/671368
introduces hardware pause / resume for Intel PT in a feature named
Intel PT Trigger Tracing.
For that, more fields in perf_event_attr will be necessary. The main
differences are:
- it can be applied not just to overflows, but optionally to
every event
- a packet is emitted into the trace, optionally with IP
information
- no PMI
- works with PMC and DR (breakpoint) events only
Here are the proposed additions to perf_event_attr, please comment:
diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index 0c557f0a17b3..05dcc43f11bb 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -369,6 +369,22 @@ enum perf_event_read_format {
PERF_FORMAT_MAX = 1U << 5, /* non-ABI */
};
+enum {
+ PERF_AUX_ACTION_START_PAUSED = 1U << 0,
+ PERF_AUX_ACTION_PAUSE = 1U << 1,
+ PERF_AUX_ACTION_RESUME = 1U << 2,
+ PERF_AUX_ACTION_EMIT = 1U << 3,
+ PERF_AUX_ACTION_NR = 0x1f << 4,
+ PERF_AUX_ACTION_NO_IP = 1U << 9,
+ PERF_AUX_ACTION_PAUSE_ON_EVT = 1U << 10,
+ PERF_AUX_ACTION_RESUME_ON_EVT = 1U << 11,
+ PERF_AUX_ACTION_EMIT_ON_EVT = 1U << 12,
+ PERF_AUX_ACTION_NR_ON_EVT = 0x1f << 13,
+ PERF_AUX_ACTION_NO_IP_ON_EVT = 1U << 18,
+ PERF_AUX_ACTION_MASK = ~PERF_AUX_ACTION_START_PAUSED,
+ PERF_AUX_PAUSE_RESUME_MASK = PERF_AUX_ACTION_PAUSE | PERF_AUX_ACTION_RESUME,
+};
+
#define PERF_ATTR_SIZE_VER0 64 /* sizeof first published struct */
#define PERF_ATTR_SIZE_VER1 72 /* add: config2 */
#define PERF_ATTR_SIZE_VER2 80 /* add: branch_sample_type */
@@ -515,10 +531,19 @@ struct perf_event_attr {
union {
__u32 aux_action;
struct {
- __u32 aux_start_paused : 1, /* start AUX area tracing paused */
- aux_pause : 1, /* on overflow, pause AUX area tracing */
- aux_resume : 1, /* on overflow, resume AUX area tracing */
- __reserved_3 : 29;
+ __u32 aux_start_paused : 1, /* start AUX area tracing paused */
+ aux_pause : 1, /* on overflow, pause AUX area tracing */
+ aux_resume : 1, /* on overflow, resume AUX area tracing */
+ aux_emit : 1, /* generate AUX records instead of events */
+ aux_nr : 5, /* AUX area tracing reference number */
+ aux_no_ip : 1, /* suppress IP in AUX records */
+ /* Following apply to event occurrence not overflows */
+ aux_pause_on_evt : 1, /* on event, pause AUX area tracing */
+ aux_resume_on_evt : 1, /* on event, resume AUX area tracing */
+ aux_emit_on_evt : 1, /* generate AUX records instead of events */
+ aux_nr_on_evt : 5, /* AUX area tracing reference number */
+ aux_no_ip_on_evt : 1, /* suppress IP in AUX records */
+ __reserved_3 : 13;
};
};
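As a rough illustration of how the multi-bit reference number fields in the
proposal above could be packed into and read back from the 32-bit aux_action
word, here is a small sketch. The DEMO_* macros only mirror the bit positions
of the proposed enum and are not part of any merged ABI, and the helper
functions are hypothetical. Masks are used rather than bitfields because
bitfield layout is implementation-defined.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions copied from the proposed enum; names are hypothetical. */
#define DEMO_AUX_ACTION_START_PAUSED	(1U << 0)
#define DEMO_AUX_ACTION_PAUSE		(1U << 1)
#define DEMO_AUX_ACTION_RESUME		(1U << 2)
#define DEMO_AUX_ACTION_NR_SHIFT	4
#define DEMO_AUX_ACTION_NR_MASK		(0x1fU << DEMO_AUX_ACTION_NR_SHIFT)
#define DEMO_AUX_ACTION_NR_ON_EVT_SHIFT	13
#define DEMO_AUX_ACTION_NR_ON_EVT_MASK	(0x1fU << DEMO_AUX_ACTION_NR_ON_EVT_SHIFT)

/* Store a 5-bit reference number in bits 4..8, leaving other bits alone. */
uint32_t demo_aux_action_set_nr(uint32_t action, unsigned int nr)
{
	assert(nr <= 0x1f);
	return (action & ~DEMO_AUX_ACTION_NR_MASK) |
	       ((uint32_t)nr << DEMO_AUX_ACTION_NR_SHIFT);
}

/* Read the 5-bit reference number back from bits 4..8. */
unsigned int demo_aux_action_get_nr(uint32_t action)
{
	return (action & DEMO_AUX_ACTION_NR_MASK) >> DEMO_AUX_ACTION_NR_SHIFT;
}

/* Same as above for the "on event" variant in bits 13..17. */
uint32_t demo_aux_action_set_nr_on_evt(uint32_t action, unsigned int nr)
{
	assert(nr <= 0x1f);
	return (action & ~DEMO_AUX_ACTION_NR_ON_EVT_MASK) |
	       ((uint32_t)nr << DEMO_AUX_ACTION_NR_ON_EVT_SHIFT);
}
```

For example, combining DEMO_AUX_ACTION_PAUSE with reference number 3 gives
0x32: bit 1 for pause plus 3 shifted left by 4.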
Changes in V13:
    perf/core: Add aux_pause, aux_resume, aux_start_paused
        Do aux_resume at the end of __perf_event_overflow() so as to trace
        less of perf itself
    perf tools: Add missing_features for aux_start_paused, aux_pause, aux_resume
        Add error message also in EOPNOTSUPP case (Leo)
Changes in V12:
        Add previously sent patch "perf/x86/intel/pt: Fix buffer full
        but size is 0 case"
        Add previously sent patch set "KVM: x86: Fix Intel PT Host/Guest
        mode when host tracing"
        Rebase on current tip plus patch set "KVM: x86: Fix Intel PT Host/Guest
        mode when host tracing"
Changes in V11:
    perf/core: Add aux_pause, aux_resume, aux_start_paused
        Make assignment to event->hw.aux_paused conditional on
        (pmu->capabilities & PERF_PMU_CAP_AUX_PAUSE).
    perf/x86/intel: Do not enable large PEBS for events with aux actions or aux sampling
        Remove definition of has_aux_action() because it has
        already been added as an inline function.
    perf/x86/intel/pt: Fix sampling synchronization
    perf tools: Enable evsel__is_aux_event() to work for ARM/ARM64
    perf tools: Enable evsel__is_aux_event() to work for S390_CPUMSF
        Dropped because they have already been applied
Changes in V10:
    perf/core: Add aux_pause, aux_resume, aux_start_paused
        Move aux_paused into a union within struct hw_perf_event.
        Additional comment wrt PERF_EF_PAUSE/PERF_EF_RESUME.
        Factor out has_aux_action() as an inline function.
        Use scoped_guard for irqsave.
        Move calls of perf_event_aux_pause() from __perf_event_output()
        to __perf_event_overflow().
Changes in V9:
    perf/x86/intel/pt: Fix sampling synchronization
        New patch
    perf/core: Add aux_pause, aux_resume, aux_start_paused
        Move aux_paused to struct hw_perf_event
    perf/x86/intel/pt: Add support for pause / resume
        Add more comments and barriers for resume_allowed and
        pause_allowed
        Always use WRITE_ONCE with resume_allowed
Changes in V8:
    perf tools: Parse aux-action
        Fix clang warning:
            util/auxtrace.c:821:7: error: missing field 'aux_action' initializer [-Werror,-Wmissing-field-initializers]
              821 |                 {NULL},
                  |                  ^
Changes in V7:
        Add Andi's Reviewed-by for patches 2-12
        Re-base
Changes in V6:
    perf/core: Add aux_pause, aux_resume, aux_start_paused
        Removed READ/WRITE_ONCE from __perf_event_aux_pause()
        Expanded comment about guarding against NMI
Changes in V5:
    perf/core: Add aux_pause, aux_resume, aux_start_paused
        Added James' Ack
    perf/x86/intel: Do not enable large PEBS for events with aux actions or aux sampling
        New patch
    perf tools
        Added Ian's Ack
Changes in V4:
    perf/core: Add aux_pause, aux_resume, aux_start_paused
        Rename aux_output_cfg -> aux_action
        Reorder aux_action bits from:
            aux_pause, aux_resume, aux_start_paused
        to:
            aux_start_paused, aux_pause, aux_resume
        Fix aux_action bits __u64 -> __u32
    coresight: Have a stab at support for pause / resume
        Dropped
    perf tools
        All new patches
Changes in RFC V3:
    coresight: Have a stab at support for pause / resume
        'mode' -> 'flags' so it at least compiles
Changes in RFC V2:
        Use ->stop() / ->start() instead of ->pause_resume()
        Move aux_start_paused bit into aux_output_cfg
        Tighten up when Intel PT pause / resume is allowed
        Add an example of how it might work for CoreSight
Adrian Hunter (14):
perf/x86/intel/pt: Fix buffer full but size is 0 case
KVM: x86: Fix Intel PT IA32_RTIT_CTL MSR validation
KVM: x86: Fix Intel PT Host/Guest mode when host tracing also
KVM: selftests: Add guest Intel PT test
perf/core: Add aux_pause, aux_resume, aux_start_paused
perf/x86/intel/pt: Add support for pause / resume
perf/x86/intel: Do not enable large PEBS for events with aux actions or aux sampling
perf tools: Add aux_start_paused, aux_pause and aux_resume
perf tools: Add aux-action config term
perf tools: Parse aux-action
perf tools: Add missing_features for aux_start_paused, aux_pause, aux_resume
perf intel-pt: Improve man page format
perf intel-pt: Add documentation for pause / resume
perf intel-pt: Add a test for pause / resume
arch/x86/events/intel/core.c | 4 +-
arch/x86/events/intel/pt.c | 209 +++++++-
arch/x86/events/intel/pt.h | 16 +
arch/x86/include/asm/intel_pt.h | 4 +
arch/x86/kvm/vmx/vmx.c | 26 +-
arch/x86/kvm/vmx/vmx.h | 1 -
include/linux/perf_event.h | 28 +
include/uapi/linux/perf_event.h | 11 +-
kernel/events/core.c | 75 ++-
kernel/events/internal.h | 1 +
tools/include/uapi/linux/perf_event.h | 11 +-
tools/perf/Documentation/perf-intel-pt.txt | 596 +++++++++++++--------
tools/perf/Documentation/perf-record.txt | 4 +
tools/perf/builtin-record.c | 4 +-
tools/perf/tests/shell/test_intel_pt.sh | 28 +
tools/perf/util/auxtrace.c | 67 ++-
tools/perf/util/auxtrace.h | 6 +-
tools/perf/util/evsel.c | 15 +
tools/perf/util/evsel.h | 1 +
tools/perf/util/evsel_config.h | 1 +
tools/perf/util/parse-events.c | 10 +
tools/perf/util/parse-events.h | 1 +
tools/perf/util/parse-events.l | 1 +
tools/perf/util/perf_event_attr_fprintf.c | 3 +
tools/perf/util/pmu.c | 1 +
tools/testing/selftests/kvm/Makefile | 1 +
.../selftests/kvm/include/x86_64/processor.h | 1 +
tools/testing/selftests/kvm/x86_64/intel_pt.c | 381 +++++++++++++
28 files changed, 1243 insertions(+), 264 deletions(-)
create mode 100644 tools/testing/selftests/kvm/x86_64/intel_pt.c
Regards
Adrian
Recently we committed a fix to allow processes to receive notifications for
non-zero exits via the process connector module. Commit is a4c9a56e6a2c.
However, when a thread calls pthread_exit(&exit_status), the kernel is not
aware of the exit status passed to pthread_exit(). That status is only
delivered by the exiting thread to the parent, and only if the parent is
waiting in pthread_join(). Hence, for a thread exiting abnormally, the kernel
cannot send notifications to any listening processes.
The exception to this is when the thread is sent a signal which it has not
handled, and dies along with its process as a result, e.g. SIGSEGV or
SIGKILL. In this case, the kernel is aware of the non-zero exit and sends a
notification for it.
For our use case, we cannot have the parent wait in pthread_join(). One of
the main reasons is that we do not want to track normal pthread_exit()
calls, of which there could be a very large number. We only want to be
notified of abnormal exits. Hence, threads are created with
pthread_attr_t set to PTHREAD_CREATE_DETACHED.
To fix this problem, we add a new type PROC_CN_MCAST_NOTIFY to the proc
connector API, which allows a thread to send its exit status to the kernel,
either when it needs to call pthread_exit() with a non-zero value to indicate
some error, or from a signal handler before pthread_exit().
We also need to filter packets with non-zero exit notifications further
based on instances, which can be identified by task names. Hence, we add a
comm field to the packet's struct proc_event, in which task->comm is
stored.
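The detached-thread situation described above can be sketched in plain
pthreads, independent of the connector changes. The names worker(),
worker_done and run_detached_worker() are made up for this illustration. The
thread exits with a non-zero status, but because it is detached nothing can
call pthread_join() to collect that status, and the kernel never sees it.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

static atomic_int worker_done;

static void *worker(void *arg)
{
	(void)arg;
	atomic_store(&worker_done, 1);
	/*
	 * Non-zero exit status. Only a thread calling pthread_join() could
	 * read it, and a detached thread cannot be joined.
	 */
	pthread_exit((void *)1);
}

/* Returns 0 once the detached worker has run, or an errno-style value. */
int run_detached_worker(void)
{
	struct timespec ts = { .tv_sec = 0, .tv_nsec = 1000000 };
	pthread_attr_t attr;
	pthread_t tid;
	int ret;

	atomic_store(&worker_done, 0);

	ret = pthread_attr_init(&attr);
	if (ret)
		return ret;
	ret = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
	if (!ret)
		ret = pthread_create(&tid, &attr, worker, NULL);
	pthread_attr_destroy(&attr);
	if (ret)
		return ret;

	/* No pthread_join() is possible; poll a flag set by the worker. */
	while (!atomic_load(&worker_done))
		nanosleep(&ts, NULL);

	return 0;
}
```

The parent learns that the worker ran only through the flag it set itself;
the value passed to pthread_exit() is lost, which is the gap the new
notification type is meant to close.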
v4->v5 changes:
- Handled comment by Stanislav Fomichev to fix a print format error.
- Made thread.c completely automated by starting proc_filter program
from within threads.c.
- Changed name CONFIG_CN_HASH_KUNIT_TEST to CN_HASH_KUNIT_TEST in
Kconfig.debug and changed display text.
v3->v4 changes:
- Reduce size of exit.log by removing unnecessary text.
v2->v3 changes:
- Handled comment by Liam Howlett to set hdev to NULL and add comment on
it.
- Handled comment by Liam Howlett to combine functions for deleting+get
and deleting into one in cn_hash.c
- Handled comment by Liam Howlett to remove extern in the functions
defined in cn_hash_test.h
- Some nits by Liam Howlett fixed.
- Handled comment by Liam Howlett to make threads test automated.
proc_filter.c creates exit.log, which is read by thread.c and checks
the values reported.
- Added "comm" field to struct proc_event, to copy the task's name to
the packet to allow further filtering by packets.
v1->v2 changes:
- Handled comment by Peter Zijlstra to remove locking for PF_EXIT_NOTIFY
task->flags.
- Added error handling in thread.c
v->v1 changes:
- Handled comment by Simon Horman to remove unused err in cn_proc.c
- Handled comment by Simon Horman to make adata and key_display static
in cn_hash_test.c
Anjali Kulkarni (3):
connector/cn_proc: Add hash table for threads
connector/cn_proc: Kunit tests for threads hash table
connector/cn_proc: Selftest for threads
drivers/connector/Makefile | 2 +-
drivers/connector/cn_hash.c | 221 +++++++++++++++++
drivers/connector/cn_proc.c | 62 ++++-
drivers/connector/connector.c | 75 +++++-
include/linux/connector.h | 35 +++
include/linux/sched.h | 2 +-
include/uapi/linux/cn_proc.h | 5 +-
lib/Kconfig.debug | 17 ++
lib/Makefile | 1 +
lib/cn_hash_test.c | 167 +++++++++++++
lib/cn_hash_test.h | 10 +
tools/testing/selftests/connector/Makefile | 23 +-
.../testing/selftests/connector/proc_filter.c | 34 ++-
tools/testing/selftests/connector/thread.c | 232 ++++++++++++++++++
.../selftests/connector/thread_filter.c | 96 ++++++++
15 files changed, 967 insertions(+), 15 deletions(-)
create mode 100644 drivers/connector/cn_hash.c
create mode 100644 lib/cn_hash_test.c
create mode 100644 lib/cn_hash_test.h
create mode 100644 tools/testing/selftests/connector/thread.c
create mode 100644 tools/testing/selftests/connector/thread_filter.c
--
2.46.0
From: Eduard Zingerman <eddyz87(a)gmail.com>
[ Upstream commit a41b3828ec056a631ad22413d4560017fed5c3bd ]
This test was added because of a bug in verifier.c:sync_linked_regs():
upon range propagation it destroyed subreg_def marks for registers.
The test is written in a way to return an upper half of a register
that is affected by range propagation and must have its subreg_def
preserved. This gives a return value of 0 and leads to undefined
return value if subreg_def mark is not preserved.
Signed-off-by: Eduard Zingerman <eddyz87(a)gmail.com>
Signed-off-by: Andrii Nakryiko <andrii(a)kernel.org>
Signed-off-by: Daniel Borkmann <daniel(a)iogearbox.net>
Acked-by: Daniel Borkmann <daniel(a)iogearbox.net>
Link: https://lore.kernel.org/bpf/20240924210844.1758441-2-eddyz87@gmail.com
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
.../selftests/bpf/progs/verifier_scalar_ids.c | 67 +++++++++++++++++++
1 file changed, 67 insertions(+)
diff --git a/tools/testing/selftests/bpf/progs/verifier_scalar_ids.c b/tools/testing/selftests/bpf/progs/verifier_scalar_ids.c
index 13b29a7faa71a..d24d3a36ec144 100644
--- a/tools/testing/selftests/bpf/progs/verifier_scalar_ids.c
+++ b/tools/testing/selftests/bpf/progs/verifier_scalar_ids.c
@@ -656,4 +656,71 @@ __naked void two_old_ids_one_cur_id(void)
: __clobber_all);
}
+SEC("socket")
+/* Note the flag, see verifier.c:opt_subreg_zext_lo32_rnd_hi32() */
+__flag(BPF_F_TEST_RND_HI32)
+__success
+/* This test was added because of a bug in verifier.c:sync_linked_regs(),
+ * upon range propagation it destroyed subreg_def marks for registers.
+ * The subreg_def mark is used to decide whether zero extension instructions
+ * are needed when register is read. When BPF_F_TEST_RND_HI32 is set it
+ * also causes generation of statements to randomize upper halves of
+ * read registers.
+ *
+ * The test is written in a way to return an upper half of a register
+ * that is affected by range propagation and must have it's subreg_def
+ * preserved. This gives a return value of 0 and leads to undefined
+ * return value if subreg_def mark is not preserved.
+ */
+__retval(0)
+/* Check that verifier believes r1/r0 are zero at exit */
+__log_level(2)
+__msg("4: (77) r1 >>= 32 ; R1_w=0")
+__msg("5: (bf) r0 = r1 ; R0_w=0 R1_w=0")
+__msg("6: (95) exit")
+__msg("from 3 to 4")
+__msg("4: (77) r1 >>= 32 ; R1_w=0")
+__msg("5: (bf) r0 = r1 ; R0_w=0 R1_w=0")
+__msg("6: (95) exit")
+/* Verify that statements to randomize upper half of r1 had not been
+ * generated.
+ */
+__xlated("call unknown")
+__xlated("r0 &= 2147483647")
+__xlated("w1 = w0")
+/* This is how disasm.c prints BPF_ZEXT_REG at the moment, x86 and arm
+ * are the only CI archs that do not need zero extension for subregs.
+ */
+#if !defined(__TARGET_ARCH_x86) && !defined(__TARGET_ARCH_arm64)
+__xlated("w1 = w1")
+#endif
+__xlated("if w0 < 0xa goto pc+0")
+__xlated("r1 >>= 32")
+__xlated("r0 = r1")
+__xlated("exit")
+__naked void linked_regs_and_subreg_def(void)
+{
+ asm volatile (
+ "call %[bpf_ktime_get_ns];"
+ /* make sure r0 is in 32-bit range, otherwise w1 = w0 won't
+ * assign same IDs to registers.
+ */
+ "r0 &= 0x7fffffff;"
+ /* link w1 and w0 via ID */
+ "w1 = w0;"
+ /* 'if' statement propagates range info from w0 to w1,
+ * but should not affect w1->subreg_def property.
+ */
+ "if w0 < 10 goto +0;"
+ /* r1 is read here, on archs that require subreg zero
+ * extension this would cause zext patch generation.
+ */
+ "r1 >>= 32;"
+ "r0 = r1;"
+ "exit;"
+ :
+ : __imm(bpf_ktime_get_ns)
+ : __clobber_all);
+}
+
char _license[] SEC("license") = "GPL";
--
2.43.0
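The register arithmetic exercised by the test above can be modelled in
ordinary C. This is a simplified sketch, not verifier code:
emulate_linked_regs() and its parameters are hypothetical, and hi32_garbage
stands in for whatever the upper half of r1 would contain if it were
randomized because the subreg_def mark had been lost.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Models the test program's data flow with 64-bit integers.
 * zext_preserved == true models the correct case where the 32-bit move
 * leaves the upper half of r1 zeroed.
 */
uint64_t emulate_linked_regs(uint64_t ktime, bool zext_preserved,
			     uint32_t hi32_garbage)
{
	uint64_t r0 = ktime & 0x7fffffff;	/* r0 &= 0x7fffffff */
	uint64_t r1 = (uint32_t)r0;		/* w1 = w0, upper half zero */

	/* "if w0 < 10 goto +0" changes no values, only verifier state. */

	if (!zext_preserved)
		r1 |= (uint64_t)hi32_garbage << 32;	/* randomized hi32 */

	r1 >>= 32;				/* r1 >>= 32 */
	r0 = r1;				/* r0 = r1 */
	return r0;
}
```

With the upper half preserved as zero the function returns 0 for any input,
matching __retval(0); with a corrupted upper half it returns the garbage
value instead.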
Userland library functions such as allocators and threading implementations
often require regions of memory to act as 'guard pages' - mappings which,
when accessed, result in a fatal signal being sent to the accessing
process.
The current means by which these are implemented is via a PROT_NONE mmap()
mapping, which provides the required semantics but incurs the overhead of
a VMA for each such region.
With a great many processes and threads, this can rapidly add up and incur
a significant memory penalty. It also has the added problem of preventing
merges that might otherwise be permitted.
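The VMA cost of the existing approach can be observed from userspace. The
following is a sketch only: count_vmas() and guard_page_vma_cost() are
hypothetical helpers, and the guard is created with mprotect(PROT_NONE) on
the middle page of an anonymous mapping, which has the same per-region VMA
cost as a separate PROT_NONE mmap(). It counts the lines in /proc/self/maps
before and after the guard page is installed.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Count lines (one per VMA) in /proc/self/maps without using malloc(). */
static int count_vmas(void)
{
	char buf[4096];
	int count = 0;
	ssize_t n;
	int fd;

	fd = open("/proc/self/maps", O_RDONLY);
	if (fd < 0)
		return -1;
	while ((n = read(fd, buf, sizeof(buf))) > 0) {
		for (ssize_t i = 0; i < n; i++) {
			if (buf[i] == '\n')
				count++;
		}
	}
	close(fd);
	return count;
}

/*
 * Returns how many extra VMAs a PROT_NONE guard page in the middle of a
 * three-page mapping costs, or -1 on error.
 */
int guard_page_vma_cost(void)
{
	long page = sysconf(_SC_PAGESIZE);
	int before, after;
	char *map;

	map = mmap(NULL, 3 * (size_t)page, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (map == MAP_FAILED)
		return -1;

	before = count_vmas();
	if (mprotect(map + page, (size_t)page, PROT_NONE)) {
		munmap(map, 3 * (size_t)page);
		return -1;
	}
	after = count_vmas();

	munmap(map, 3 * (size_t)page);
	if (before < 0 || after < 0)
		return -1;
	return after - before;
}
```

On a typical Linux system the mapping should split into three VMAs, so the
result would normally be 2 extra VMAs for a single guard page.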
This series takes a different approach, an idea suggested by Vlastimil
Babka (and before him David Hildenbrand and Jann Horn; perhaps others too,
as the provenance becomes a little tricky to ascertain beyond that, so please
forgive any omissions!): rather than locating the guard pages at the VMA
layer, it places them in the page tables mapping the required ranges.
Early testing of the prototype version of this code suggests a 5 times
speed up in memory mapping invocations (in conjunction with use of
process_madvise()) and a 13% reduction in VMAs on an entirely idle Android
system and unoptimised code.
We expect with optimisation and a loaded system with a larger number of
guard pages this could significantly increase, but in any case these
numbers are encouraging.
This way, rather than having separate VMAs specifying which parts of a
range are guard pages, instead we have a VMA spanning the entire range of
memory a user is permitted to access and including ranges which are to be
'guarded'.
After mapping this, a user can specify which parts of the range should
result in a fatal signal when accessed.
By restricting the ability to specify guard pages to memory mapped by
existing VMAs, we can rely on the mappings being torn down when the
mappings are ultimately unmapped and everything works simply as if the
memory were not faulted in, from the point of view of the containing VMAs.
This mechanism in effect poisons memory ranges similar to hardware memory
poisoning, only it is an entirely software-controlled form of poisoning.
Any poisoned region of memory can also be 'unpoisoned', that is, have its
poison markers removed.
The mechanism is implemented via madvise() behaviour - MADV_GUARD_POISON
which simply poisons ranges - and MADV_GUARD_UNPOISON - which clears this
poisoning.
Poisoning can be performed across multiple VMAs and any existing mappings
will be cleared, that is zapped, before installing the poisoned page table
mappings.
There is no concept of 'nested' poisoning; multiple attempts to poison a
range will, after the first poisoning, have no effect.
Importantly, unpoisoning of poisoned ranges has no effect on non-poisoned
memory, so a user can safely unpoison a range of memory and clear only
poison page table mappings leaving the rest intact.
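The intended usage flow described above might look roughly like the sketch
below. Note the assumptions: the numeric values given to MADV_GUARD_POISON
and MADV_GUARD_UNPOISON here are placeholders chosen for illustration and
must be taken from the series' UAPI header in real code, and
try_guard_poison() is a hypothetical helper. On a kernel without the series
applied the madvise() call is expected to fail, which the sketch treats as
"not available".

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <sys/mman.h>
#include <unistd.h>

/* ASSUMPTION: placeholder values, use the series' UAPI header instead. */
#ifndef MADV_GUARD_POISON
#define MADV_GUARD_POISON 102
#endif
#ifndef MADV_GUARD_UNPOISON
#define MADV_GUARD_UNPOISON 103
#endif

/*
 * Returns 1 if a page could be poisoned and unpoisoned again, 0 if the
 * running kernel rejected the request, -1 if the mapping itself failed.
 */
int try_guard_poison(void)
{
	long page = sysconf(_SC_PAGESIZE);
	char *map;
	int ret = 0;

	/* One VMA spanning the whole range, guarded parts included. */
	map = mmap(NULL, 4 * (size_t)page, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (map == MAP_FAILED)
		return -1;

	map[0] = 'x';	/* unguarded pages remain usable */

	/* Mark the second page as a guard page within the same VMA. */
	if (madvise(map + page, (size_t)page, MADV_GUARD_POISON) == 0) {
		/* Touching map[page] here would raise a fatal signal. */

		/* Clear the marker; non-poisoned pages are unaffected. */
		if (madvise(map + page, (size_t)page,
			    MADV_GUARD_UNPOISON) == 0)
			ret = 1;
	}

	munmap(map, 4 * (size_t)page);
	return ret;
}
```

Unlike the PROT_NONE approach, no additional VMA is needed for the guarded
page; only page table entries in the existing mapping change.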
The actual mechanism by which the page table entries are specified makes
use of existing logic - PTE markers, which are used for the userfaultfd
UFFDIO_POISON mechanism.
Unfortunately PTE_MARKER_POISONED is not suited for the guard page
mechanism as it results in VM_FAULT_HWPOISON semantics in the fault
handler, so we add our own specific PTE_MARKER_GUARD and adapt existing
logic to handle it.
We also extend the generic page walk mechanism to allow for installation of
PTEs (carefully restricted to memory management logic only to prevent
unwanted abuse).
We ensure that zapping performed by, for instance, MADV_DONTNEED, does not
remove guard poison markers, nor does forking (except when VM_WIPEONFORK is
specified for a VMA which implies a total removal of memory
characteristics).
It's important to note that the guard page implementation is emphatically
NOT a security feature, so a user can remove the poisoning if they wish. We
simply implement it in such a way as to provide the least surprising
behaviour.
An extensive set of self-tests is provided, which ensures behaviour is as
expected and additionally documents the expected behaviour of poisoned
ranges.
Suggested-by: Vlastimil Babka <vbabka(a)suse.cz>
Suggested-by: Jann Horn <jannh(a)google.com>
Suggested-by: David Hildenbrand <david(a)redhat.com>
v2
* The macros in kselftest_harness.h seem to be broken - __EXPECT() is
  terminated by '} while (0); OPTIONAL_HANDLER(_assert)' meaning it is not
  safe in single line if / else or for / while blocks, however working
  around this results in checkpatch producing invalid warnings, as reported
  by Shuah.
* Fixing these macros is out of scope for this series, so compromise and
instead rewrite test blocks so as to use multiple lines by separating out
a decl in most cases. This has the side effect of, for the most part,
making things more readable.
* Heavily document the use of the volatile keyword - we can't avoid
checkpatch complaining about this, so we explain it, as reported by
Shuah.
* Updated commit message to highlight that we skip tests we lack
permissions for, as reported by Shuah.
* Replaced a perror() with ksft_exit_fail_perror(), as reported by Shuah.
* Added user friendly messages to cases where tests are skipped due to lack
of permissions, as reported by Shuah.
* Update the tool header to include the new MADV_GUARD_POISON/UNPOISON
defines and directly include asm-generic/mman.h to get the
platform-neutral versions to ensure we import them.
* Finally fixed Vlastimil's email address in Suggested-by tags from suze to
suse, as reported by Vlastimil.
* Added linux-api to cc list, as reported by Vlastimil.
v1
* Un-RFC'd as appears no major objections to approach but rather debate on
implementation.
* Fixed issue with arches which need mmu_context.h and tlbflush.h
  header imports in pagewalker logic to be able to use
  update_mmu_cache() as reported by the kernel test bot.
* Added comments in page walker logic to clarify who can use
ops->install_pte and why as well as adding a check_ops_valid() helper
function, as suggested by Christoph.
* Pass false in full parameter in pte_clear_not_present_full() as suggested
by Jann.
* Stopped erroneously requiring a write lock for the poison operation as
suggested by Jann and Suren.
* Moved anon_vma_prepare() to the start of madvise_guard_poison() to be
consistent with how this is used elsewhere in the kernel as suggested by
Jann.
* Avoid returning -EAGAIN if we are raced on page faults, just keep looping
and duck out if a fatal signal is pending or a conditional reschedule is
needed, as suggested by Jann.
* Avoid needlessly splitting huge PUDs and PMDs by specifying
ACTION_CONTINUE, as suggested by Jann.
https://lore.kernel.org/all/cover.1729196871.git.lorenzo.stoakes@oracle.com/
RFC
https://lore.kernel.org/all/cover.1727440966.git.lorenzo.stoakes@oracle.com/
Lorenzo Stoakes (5):
mm: pagewalk: add the ability to install PTEs
mm: add PTE_MARKER_GUARD PTE marker
mm: madvise: implement lightweight guard page mechanism
tools: testing: update tools UAPI header for mman-common.h
selftests/mm: add self tests for guard page feature
arch/alpha/include/uapi/asm/mman.h | 3 +
arch/mips/include/uapi/asm/mman.h | 3 +
arch/parisc/include/uapi/asm/mman.h | 3 +
arch/xtensa/include/uapi/asm/mman.h | 3 +
include/linux/mm_inline.h | 2 +-
include/linux/pagewalk.h | 18 +-
include/linux/swapops.h | 26 +-
include/uapi/asm-generic/mman-common.h | 3 +
mm/hugetlb.c | 3 +
mm/internal.h | 6 +
mm/madvise.c | 168 +++
mm/memory.c | 18 +-
mm/mprotect.c | 3 +-
mm/mseal.c | 1 +
mm/pagewalk.c | 200 ++-
tools/include/uapi/asm-generic/mman-common.h | 3 +
tools/testing/selftests/mm/.gitignore | 1 +
tools/testing/selftests/mm/Makefile | 1 +
tools/testing/selftests/mm/guard-pages.c | 1228 ++++++++++++++++++
19 files changed, 1627 insertions(+), 66 deletions(-)
create mode 100644 tools/testing/selftests/mm/guard-pages.c
--
2.47.0
Currently if we encounter an error between fork() and exec() of a child
process we log the error to stderr. This means that the errors don't get
annotated with the child information which makes diagnostics harder and
means that if we miss the exit signal from the child we can deadlock
waiting for output from the child. Improve robustness and output quality
by logging to stdout instead.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
tools/testing/selftests/arm64/fp/fp-stress.c | 15 +++++++--------
1 file changed, 7 insertions(+), 8 deletions(-)
diff --git a/tools/testing/selftests/arm64/fp/fp-stress.c b/tools/testing/selftests/arm64/fp/fp-stress.c
index faac24bdefeb9436e2daf20b7250d0ae25ca23a7..80f22789504d661efc52a90d4b0893fbebec42f8 100644
--- a/tools/testing/selftests/arm64/fp/fp-stress.c
+++ b/tools/testing/selftests/arm64/fp/fp-stress.c
@@ -79,7 +79,7 @@ static void child_start(struct child_data *child, const char *program)
*/
ret = dup2(pipefd[1], 1);
if (ret == -1) {
- fprintf(stderr, "dup2() %d\n", errno);
+ printf("dup2() %d\n", errno);
exit(EXIT_FAILURE);
}
@@ -89,7 +89,7 @@ static void child_start(struct child_data *child, const char *program)
*/
ret = dup2(startup_pipe[0], 3);
if (ret == -1) {
- fprintf(stderr, "dup2() %d\n", errno);
+ printf("dup2() %d\n", errno);
exit(EXIT_FAILURE);
}
@@ -107,16 +107,15 @@ static void child_start(struct child_data *child, const char *program)
*/
ret = read(3, &i, sizeof(i));
if (ret < 0)
- fprintf(stderr, "read(startp pipe) failed: %s (%d)\n",
- strerror(errno), errno);
+ printf("read(startp pipe) failed: %s (%d)\n",
+ strerror(errno), errno);
if (ret > 0)
- fprintf(stderr, "%d bytes of data on startup pipe\n",
- ret);
+ printf("%d bytes of data on startup pipe\n", ret);
close(3);
ret = execl(program, program, NULL);
- fprintf(stderr, "execl(%s) failed: %d (%s)\n",
- program, errno, strerror(errno));
+ printf("execl(%s) failed: %d (%s)\n",
+ program, errno, strerror(errno));
exit(EXIT_FAILURE);
} else {
---
base-commit: 8e929cb546ee42c9a61d24fae60605e9e3192354
change-id: 20241017-arm64-fp-stress-exec-fail-d074ec82cf43
Best regards,
--
Mark Brown <broonie(a)kernel.org>
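The reasoning in the fp-stress commit message above (only output that travels through the child's stdout pipe can be annotated by the parent) can be illustrated with a small shell sketch. This is an analogy, not the fp-stress code: the child() and annotate() functions, the "child-0" label and the messages are assumptions made up for the illustration.

```sh
# Hypothetical child: one message on stdout, one on stderr.
child() {
	echo "dup2() failed"		# stdout: goes through the pipe
	echo "execl() failed" >&2	# stderr: bypasses the pipe
}

# Hypothetical parent-side annotation: prefix each line read from the pipe.
annotate() {
	sed "s/^/# $1: /"
}

# Only the stdout line comes out annotated; the stderr line appears raw.
child | annotate child-0
```

Moving the error messages to stdout, as the patch does in C, corresponds to dropping the `>&2` in the sketch so that every line passes through the annotating reader.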
This patch series migrates test cases out of test_sock.c to
prog_tests-style tests. It moves all BPF_CGROUP_INET4_POST_BIND and
BPF_CGROUP_INET6_POST_BIND test cases into a new prog_test,
sock_post_bind.c, while reimplementing all LOAD_REJECT test cases as
verifier tests in progs/verifier_sock.c. Finally, it moves remaining
BPF_CGROUP_INET_SOCK_CREATE test coverage into prog_tests/sock_create.c
before retiring test_sock.c completely.
Changes
=======
v1->v2:
- Remove superfluous verbose bool from the top of sock_post_bind.c.
- Use ASSERT_OK_FD instead of ASSERT_GE to test cgroup_fd validity.
- Run sock_post_bind tests in their own namespace, "sock_post_bind".
Jordan Rife (4):
selftests/bpf: Migrate *_POST_BIND test cases to prog_tests
selftests/bpf: Migrate LOAD_REJECT test cases to prog_tests
selftests/bpf: Migrate BPF_CGROUP_INET_SOCK_CREATE test cases to
prog_tests
selftests/bpf: Retire test_sock.c
tools/testing/selftests/bpf/.gitignore | 1 -
tools/testing/selftests/bpf/Makefile | 3 +-
.../selftests/bpf/prog_tests/sock_create.c | 35 ++-
.../sock_post_bind.c} | 256 +++++-------------
.../selftests/bpf/progs/verifier_sock.c | 60 ++++
5 files changed, 150 insertions(+), 205 deletions(-)
rename tools/testing/selftests/bpf/{test_sock.c => prog_tests/sock_post_bind.c} (64%)
--
2.47.0.105.g07ac214952-goog
Hi Zheng,
Cc-ed kunit folks, as we usually do for DAMON kunit test changes.
On Tue, 22 Oct 2024 16:39:27 +0800 Zheng Yejian <zhengyejian(a)huaweicloud.com> wrote:
> As discussed in [1], damon_va_evenly_split_region() is called to
> size-evenly split a region into 'nr_pieces' small regions,
> when nr_pieces == 1, no actual split is required. Check that case
> for better code readability and add a simple kunit testcase.
>
> [1] https://lore.kernel.org/all/20241021163316.12443-1-sj@kernel.org/
>
> Signed-off-by: Zheng Yejian <zhengyejian(a)huaweicloud.com>
Reviewed-by: SeongJae Park <sj(a)kernel.org>
Thanks,
SJ
[...]
Hi Zheng,
We Cc kunit folks for any DAMON kunit test changes, so I Cc-ed them.
On Tue, 22 Oct 2024 16:39:26 +0800 Zheng Yejian <zhengyejian(a)huaweicloud.com> wrote:
> According to the logic of damon_va_evenly_split_region(), currently
> following split case would not meet the expectation:
>
> Suppose DAMON_MIN_REGION=0x1000,
> Case: Split [0x0, 0x3000) into 2 pieces, then the result would be
> actually 3 regions:
> [0x0, 0x1000), [0x1000, 0x2000), [0x2000, 0x3000)
> but NOT the expected 2 regions:
> [0x0, 0x1000), [0x1000, 0x3000) !!!
>
> The root cause is that when calculating size of each split piece in
> damon_va_evenly_split_region():
>
> `sz_piece = ALIGN_DOWN(sz_orig / nr_pieces, DAMON_MIN_REGION);`
>
> both the division and the ALIGN_DOWN may lose precision, so repeatedly
> splitting off pieces of size 'sz_piece' between the original 'start' and
> 'end' can produce more pieces than expected!
>
> To fix it, count for each piece split and make sure no more than
> 'nr_pieces'. In addition, add above case into damon_test_split_evenly().
>
> After this patch, damon-operations test passed:
Just for clarification: the damon-operations test doesn't fail without this
patch. This patch introduces two changes: a new kunit test and a bug fix.
Without the bug fix, the new kunit test fails.
I usually prefer separating test changes from fixes (introduce the fix first, and
then the test for it, to avoid unnecessary test failures). But given the
small size and simplicity of the kunit change in this patch, I think
introducing it together with the fix is ok.
>
> # ./tools/testing/kunit/kunit.py run damon-operations
> [...]
> ============== damon-operations (6 subtests) ===============
> [PASSED] damon_test_three_regions_in_vmas
> [PASSED] damon_test_apply_three_regions1
> [PASSED] damon_test_apply_three_regions2
> [PASSED] damon_test_apply_three_regions3
> [PASSED] damon_test_apply_three_regions4
> [PASSED] damon_test_split_evenly
> ================ [PASSED] damon-operations =================
>
> Fixes: 3f49584b262c ("mm/damon: implement primitives for the virtual memory address spaces")
> Signed-off-by: Zheng Yejian <zhengyejian(a)huaweicloud.com>
Reviewed-by: SeongJae Park <sj(a)kernel.org>
Thanks,
SJ
[...]
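The rounding problem described in the quoted commit message can be worked through with a small shell model. This is a simplified sketch, not the kernel code: split_regions() and its "unbounded"/"bounded" modes are names assumed for the illustration. The first mode keeps carving off pieces of size sz_piece while another one still fits, and the second stops after nr_pieces, as the fix is described.

```sh
# split_regions MODE START END NR_PIECES MIN_REGION
# Prints one "[start, end)" line per resulting region.
split_regions() {
	mode=$1 start=$2 end=$3 nr=$4 min=$5
	# sz_piece = ALIGN_DOWN(sz_orig / nr_pieces, MIN_REGION)
	sz_piece=$(( (end - start) / nr / min * min ))
	[ "$sz_piece" -gt 0 ] || return 1
	pos=$start
	i=1
	while :; do
		next=$(( pos + sz_piece ))
		if [ "$mode" = bounded ]; then
			# Count pieces and stop once nr_pieces is reached.
			[ "$i" -lt "$nr" ] || break
		else
			# Keep going while one more full piece fits.
			[ $(( next + sz_piece )) -le "$end" ] || break
		fi
		printf '[0x%x, 0x%x)\n' "$pos" "$next"
		pos=$next
		i=$(( i + 1 ))
	done
	# The last region absorbs whatever the rounding left over.
	printf '[0x%x, 0x%x)\n' "$pos" "$end"
}

echo "unbounded:"
split_regions unbounded 0 $((0x3000)) 2 $((0x1000))
echo "bounded:"
split_regions bounded 0 $((0x3000)) 2 $((0x1000))
```

For the case from the report, 0x3000 / 2 = 0x1800 is aligned down to 0x1000, so the unbounded walk yields three regions while the bounded one yields [0x0, 0x1000) and [0x1000, 0x3000).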
Thanks for all the reviews.
V5:
Replace /sys/kernel/livepatch also in other/already existing tests.
Improve commit message of 3rd patch.
V4:
Use variable for /sys/kernel/debug.
Be consistent with "" around variables.
Fix path in commit message to /sys/kernel/debug/kprobes/enabled.
V3:
Save and restore kprobe state also when test fails, by integrating it
into setup_config() and cleanup().
Rename SYSFS variables in a more logical way.
Sort test modules in alphabetical order.
Rename module description.
V2:
Save and restore kprobe state.
Michael Vetter (3):
selftests: livepatch: rename KLP_SYSFS_DIR to SYSFS_KLP_DIR
selftests: livepatch: save and restore kprobe state
selftests: livepatch: test livepatching a kprobed function
tools/testing/selftests/livepatch/Makefile | 3 +-
.../testing/selftests/livepatch/functions.sh | 29 +++++----
.../selftests/livepatch/test-callbacks.sh | 24 +++----
.../selftests/livepatch/test-ftrace.sh | 2 +-
.../selftests/livepatch/test-kprobe.sh | 62 +++++++++++++++++++
.../selftests/livepatch/test-livepatch.sh | 12 ++--
.../testing/selftests/livepatch/test-state.sh | 8 +--
.../selftests/livepatch/test-syscall.sh | 2 +-
.../testing/selftests/livepatch/test-sysfs.sh | 8 +--
.../selftests/livepatch/test_modules/Makefile | 3 +-
.../livepatch/test_modules/test_klp_kprobe.c | 38 ++++++++++++
11 files changed, 150 insertions(+), 41 deletions(-)
create mode 100755 tools/testing/selftests/livepatch/test-kprobe.sh
create mode 100644 tools/testing/selftests/livepatch/test_modules/test_klp_kprobe.c
--
2.47.0
For logging to be useful, something has to set RET and retmsg by calling
ret_set_ksft_status(). There is a suite of functions to that end in
forwarding/lib: check_err, check_fail et al. Move them to net/lib.sh so
that every net test can use them.
Existing lib.sh users might be using these same names for their functions.
However lib.sh is always sourced near the top of the file (checked), and
whatever new definitions will simply override the ones provided by lib.sh.
Signed-off-by: Petr Machata <petrm(a)nvidia.com>
Reviewed-by: Amit Cohen <amcohen(a)nvidia.com>
---
CC: Shuah Khan <shuah(a)kernel.org>
CC: Benjamin Poirier <bpoirier(a)nvidia.com>
CC: Hangbin Liu <liuhangbin(a)gmail.com>
CC: linux-kselftest(a)vger.kernel.org
CC: Jiri Pirko <jiri(a)resnulli.us>
---
tools/testing/selftests/net/forwarding/lib.sh | 73 -------------------
tools/testing/selftests/net/lib.sh | 73 +++++++++++++++++++
2 files changed, 73 insertions(+), 73 deletions(-)
diff --git a/tools/testing/selftests/net/forwarding/lib.sh b/tools/testing/selftests/net/forwarding/lib.sh
index d28dbf27c1f0..8625e3c99f55 100644
--- a/tools/testing/selftests/net/forwarding/lib.sh
+++ b/tools/testing/selftests/net/forwarding/lib.sh
@@ -445,79 +445,6 @@ done
##############################################################################
# Helpers
-# Whether FAILs should be interpreted as XFAILs. Internal.
-FAIL_TO_XFAIL=
-
-check_err()
-{
- local err=$1
- local msg=$2
-
- if ((err)); then
- if [[ $FAIL_TO_XFAIL = yes ]]; then
- ret_set_ksft_status $ksft_xfail "$msg"
- else
- ret_set_ksft_status $ksft_fail "$msg"
- fi
- fi
-}
-
-check_fail()
-{
- local err=$1
- local msg=$2
-
- check_err $((!err)) "$msg"
-}
-
-check_err_fail()
-{
- local should_fail=$1; shift
- local err=$1; shift
- local what=$1; shift
-
- if ((should_fail)); then
- check_fail $err "$what succeeded, but should have failed"
- else
- check_err $err "$what failed"
- fi
-}
-
-xfail()
-{
- FAIL_TO_XFAIL=yes "$@"
-}
-
-xfail_on_slow()
-{
- if [[ $KSFT_MACHINE_SLOW = yes ]]; then
- FAIL_TO_XFAIL=yes "$@"
- else
- "$@"
- fi
-}
-
-omit_on_slow()
-{
- if [[ $KSFT_MACHINE_SLOW != yes ]]; then
- "$@"
- fi
-}
-
-xfail_on_veth()
-{
- local dev=$1; shift
- local kind
-
- kind=$(ip -j -d link show dev $dev |
- jq -r '.[].linkinfo.info_kind')
- if [[ $kind = veth ]]; then
- FAIL_TO_XFAIL=yes "$@"
- else
- "$@"
- fi
-}
-
not()
{
"$@"
diff --git a/tools/testing/selftests/net/lib.sh b/tools/testing/selftests/net/lib.sh
index 4f52b8e48a3a..6bcf5d13879d 100644
--- a/tools/testing/selftests/net/lib.sh
+++ b/tools/testing/selftests/net/lib.sh
@@ -361,3 +361,76 @@ tests_run()
$current_test
done
}
+
+# Whether FAILs should be interpreted as XFAILs. Internal.
+FAIL_TO_XFAIL=
+
+check_err()
+{
+ local err=$1
+ local msg=$2
+
+ if ((err)); then
+ if [[ $FAIL_TO_XFAIL = yes ]]; then
+ ret_set_ksft_status $ksft_xfail "$msg"
+ else
+ ret_set_ksft_status $ksft_fail "$msg"
+ fi
+ fi
+}
+
+check_fail()
+{
+ local err=$1
+ local msg=$2
+
+ check_err $((!err)) "$msg"
+}
+
+check_err_fail()
+{
+ local should_fail=$1; shift
+ local err=$1; shift
+ local what=$1; shift
+
+ if ((should_fail)); then
+ check_fail $err "$what succeeded, but should have failed"
+ else
+ check_err $err "$what failed"
+ fi
+}
+
+xfail()
+{
+ FAIL_TO_XFAIL=yes "$@"
+}
+
+xfail_on_slow()
+{
+ if [[ $KSFT_MACHINE_SLOW = yes ]]; then
+ FAIL_TO_XFAIL=yes "$@"
+ else
+ "$@"
+ fi
+}
+
+omit_on_slow()
+{
+ if [[ $KSFT_MACHINE_SLOW != yes ]]; then
+ "$@"
+ fi
+}
+
+xfail_on_veth()
+{
+ local dev=$1; shift
+ local kind
+
+ kind=$(ip -j -d link show dev $dev |
+ jq -r '.[].linkinfo.info_kind')
+ if [[ $kind = veth ]]; then
+ FAIL_TO_XFAIL=yes "$@"
+ else
+ "$@"
+ fi
+}
--
2.45.0
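As a rough sketch of how the moved helpers are meant to be used, the snippet below records a test status through check_err() and check_fail(). The ret_set_ksft_status() body here is a hypothetical, simplified stand-in that keeps only the first failure (the real helper lives in net/lib.sh and is not reproduced), and the check functions are rewritten in plain sh form rather than copied from the patch.

```sh
RET=0
retmsg=

# Hypothetical stand-in: remember only the first non-zero status and message.
ret_set_ksft_status() {
	if [ "$RET" -eq 0 ]; then
		RET=$1
		retmsg=$2
	fi
}

# Record a failure when the given status is non-zero.
check_err() {
	if [ "$1" -ne 0 ]; then
		ret_set_ksft_status 1 "$2"
	fi
}

# Inverse of check_err: a zero status (success) is the error.
check_fail() {
	check_err $(( ! $1 )) "$2"
}

check_err 0 "command should succeed"		# status 0: nothing recorded
check_fail 1 "command should fail"		# status 1 was expected: nothing recorded
check_err 1 "unexpected failure"		# recorded
check_fail 0 "succeeded, but should have failed"	# ignored: RET already set
echo "RET=$RET retmsg=$retmsg"
```

In a real test the first argument would normally be `$?` from the command under test rather than a literal.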
It would be good to use the same mechanism for scheduling and dispatching
general net tests as the many forwarding tests already use. To that end,
move tests_run() to net/lib.sh so that every net test can use it.
Existing lib.sh users might be using the name themselves. However lib.sh is
always sourced near the top of the file (checked), and whatever new
definition will simply override the one provided by lib.sh.
Signed-off-by: Petr Machata <petrm(a)nvidia.com>
Reviewed-by: Amit Cohen <amcohen(a)nvidia.com>
---
CC: Shuah Khan <shuah(a)kernel.org>
CC: Benjamin Poirier <bpoirier(a)nvidia.com>
CC: Hangbin Liu <liuhangbin(a)gmail.com>
CC: linux-kselftest(a)vger.kernel.org
CC: Jiri Pirko <jiri(a)resnulli.us>
---
tools/testing/selftests/net/forwarding/lib.sh | 10 ----------
tools/testing/selftests/net/lib.sh | 10 ++++++++++
2 files changed, 10 insertions(+), 10 deletions(-)
diff --git a/tools/testing/selftests/net/forwarding/lib.sh b/tools/testing/selftests/net/forwarding/lib.sh
index 41dd14c42c48..d28dbf27c1f0 100644
--- a/tools/testing/selftests/net/forwarding/lib.sh
+++ b/tools/testing/selftests/net/forwarding/lib.sh
@@ -1285,16 +1285,6 @@ matchall_sink_create()
action drop
}
-tests_run()
-{
- local current_test
-
- for current_test in ${TESTS:-$ALL_TESTS}; do
- in_defer_scope \
- $current_test
- done
-}
-
cleanup()
{
pre_cleanup
diff --git a/tools/testing/selftests/net/lib.sh b/tools/testing/selftests/net/lib.sh
index 691318b1ec55..4f52b8e48a3a 100644
--- a/tools/testing/selftests/net/lib.sh
+++ b/tools/testing/selftests/net/lib.sh
@@ -351,3 +351,13 @@ log_info()
echo "INFO: $msg"
}
+
+tests_run()
+{
+ local current_test
+
+ for current_test in ${TESTS:-$ALL_TESTS}; do
+ in_defer_scope \
+ $current_test
+ done
+}
--
2.45.0
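The dispatch done by tests_run() can be tried out in isolation with the sketch below. It is a simplified copy: the in_defer_scope wrapper from the patch is left out, and the two test functions and the ALL_TESTS contents are made-up placeholders.

```sh
ALL_TESTS="ping_ipv4 ping_ipv6"
TESTS=		# empty or unset: fall back to ALL_TESTS

# Placeholder test cases.
ping_ipv4() { echo "ran ping_ipv4"; }
ping_ipv6() { echo "ran ping_ipv6"; }

# Simplified tests_run(): call each selected test function in order.
tests_run() {
	for current_test in ${TESTS:-$ALL_TESTS}; do
		$current_test
	done
}

tests_run		# runs both tests
TESTS="ping_ipv6"
tests_run		# runs only the selected test
```

Because the expansion uses `${TESTS:-$ALL_TESTS}`, setting TESTS narrows the run to a subset without editing the script.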
This series is a follow-up to Joey's Permission Overlay Extension (POE)
series [1] that recently landed on mainline. The goal is to improve the
way we handle the register that governs which pkeys/POIndex are
accessible (POR_EL0) during signal delivery. As things stand, we may
unexpectedly fail to write the signal frame on the stack because POR_EL0
is not reset before the uaccess operations. See patch 3 for more details
and the main changes this series brings.
A similar series landed recently for x86/MPK [2]; the present series
aims at aligning arm64 with x86. Worth noting: once the signal frame is
written, POR_EL0 is still set to POR_EL0_INIT, granting access to pkey 0
only. This means that a program that sets up an alternate signal stack
with a non-zero pkey will need some assembly trampoline to set POR_EL0
before invoking the real signal handler, as discussed here [3].
The x86 series also added kselftests to ensure that no spurious SIGSEGV
occurs during signal delivery regardless of which pkey is accessible at
the point where the signal is delivered. This series adapts those
kselftests to allow running them on arm64 (patch 4-5).
Finally patch 2 is a clean-up following feedback on Joey's series [4].
I have tested this series on arm64 and x86_64 (booting and running the
protection_keys and pkey_sighandler_tests mm kselftests).
- Kevin
[1] https://lore.kernel.org/linux-arm-kernel/20240822151113.1479789-1-joey.goul…
[2] https://lore.kernel.org/lkml/20240802061318.2140081-1-aruna.ramakrishna@ora…
[3] https://lore.kernel.org/lkml/CABi2SkWxNkP2O7ipkP67WKz0-LV33e5brReevTTtba6oK…
[4] https://lore.kernel.org/linux-arm-kernel/20241015114116.GA19334@willie-the-…
Cc: akpm(a)linux-foundation.org
Cc: anshuman.khandual(a)arm.com
Cc: aruna.ramakrishna(a)oracle.com
Cc: broonie(a)kernel.org
Cc: catalin.marinas(a)arm.com
Cc: dave.hansen(a)linux.intel.com
Cc: dave.martin(a)arm.com
Cc: jeffxu(a)chromium.org
Cc: joey.gouly(a)arm.com
Cc: shuah(a)kernel.org
Cc: will(a)kernel.org
Cc: linux-kselftest(a)vger.kernel.org
Cc: x86(a)kernel.org
Kevin Brodsky (5):
arm64: signal: Remove unused macro
arm64: signal: Remove unnecessary check when saving POE state
arm64: signal: Improve POR_EL0 handling to avoid uaccess failures
selftests/mm: Use generic pkey register manipulation
selftests/mm: Enable pkey_sighandler_tests on arm64
arch/arm64/kernel/signal.c | 92 +++++++++++++---
tools/testing/selftests/mm/Makefile | 8 +-
tools/testing/selftests/mm/pkey-arm64.h | 1 +
tools/testing/selftests/mm/pkey-x86.h | 2 +
.../selftests/mm/pkey_sighandler_tests.c | 101 +++++++++++++-----
5 files changed, 159 insertions(+), 45 deletions(-)
--
2.43.0
Commit 9a400068a158 ("KVM: selftests: x86: Avoid using SSE/AVX
instructions") unconditionally added -march=x86-64-v2 to the CFLAGS used
to build the KVM selftests which does not work on non-x86 architectures:
cc1: error: unknown value ‘x86-64-v2’ for ‘-march’
Fix this by making the addition of this x86 specific command line flag
conditional on building for x86.
Fixes: 9a400068a158 ("KVM: selftests: x86: Avoid using SSE/AVX instructions")
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
tools/testing/selftests/kvm/Makefile | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index e6b7e01d57080b304b21120f0d47bda260ba6c43..156fbfae940feac649f933dc6e048a2e2926542a 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -244,11 +244,13 @@ CFLAGS += -Wall -Wstrict-prototypes -Wuninitialized -O2 -g -std=gnu99 \
-fno-stack-protector -fno-PIE -I$(LINUX_TOOL_INCLUDE) \
-I$(LINUX_TOOL_ARCH_INCLUDE) -I$(LINUX_HDR_PATH) -Iinclude \
-I$(<D) -Iinclude/$(ARCH_DIR) -I ../rseq -I.. $(EXTRA_CFLAGS) \
- -march=x86-64-v2 \
$(KHDR_INCLUDES)
ifeq ($(ARCH),s390)
CFLAGS += -march=z10
endif
+ifeq ($(ARCH),x86)
+ CFLAGS += -march=x86-64-v2
+endif
ifeq ($(ARCH),arm64)
tools_dir := $(top_srcdir)/tools
arm64_tools_dir := $(tools_dir)/arch/arm64/tools/
---
base-commit: d129377639907fce7e0a27990e590e4661d3ee02
change-id: 20241021-kvm-build-break-495abedc51e0
Best regards,
--
Mark Brown <broonie(a)kernel.org>
Recently, a defer helper was added to Python selftests. The idea is to keep
cleanup commands close to their dirtying counterparts, thereby making it
more transparent what is cleaning up what, making it harder to miss a
cleanup, and making the whole cleanup business exception safe. All these
benefits are applicable to bash as well; exception safety can be
interpreted in terms of safety vs. a SIGINT.
This patchset therefore introduces a framework of several helpers that
serve to schedule cleanups in bash selftests.
- Patch #1 has more details about the primitives being introduced.
Patch #2 adds a fallback cleanup() function to lib.sh, because ideally
selftests wouldn't need to introduce a dedicated cleanup function at all.
- Patch #3 adds a parameter to stop_traffic(), which makes it possible to
start other background processes after the traffic is started without
confusing the cleanup.
- Patches #4 to #10 convert a number of selftests.
The goal was to convert all tests that use start_traffic / stop_traffic
to the defer framework. Leftover traffic generators are a particularly
painful sort of a missed cleanup. Normal unfinished cleanups can usually
be cleaned up simply by rerunning the test and interrupting it early to
let the cleanups run again / in full. This does not work with
stop_traffic, because it is only issued at the end of the test case that
starts the traffic. At the same time, leftover traffic generators
influence follow-up test runs, and are hard to notice.
The tests were however converted whole-sale, not just their traffic bits.
Thus they form a proof of concept of the defer framework.
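To make the idea of keeping a cleanup next to its dirtying command concrete, here is a minimal sketch of a deferral helper. It is not the implementation in lib/sh/defer.sh: the DEFERRED variable, the single-list design and the run_deferred() name are assumptions made for this illustration, and the scopes and priority track described in this series are not modelled.

```sh
DEFERRED=

# defer CMD: schedule CMD to run later. New entries are prepended so
# that cleanups run in reverse order of registration.
defer() {
	DEFERRED="$*
$DEFERRED"
}

# run_deferred: run and forget everything scheduled so far.
run_deferred() {
	cmds=$DEFERRED
	DEFERRED=
	while IFS= read -r cmd; do
		if [ -n "$cmd" ]; then
			eval "$cmd"
		fi
	done <<EOF
$cmds
EOF
}

echo "setup: add qdisc"
defer 'echo "cleanup: remove qdisc"'
echo "setup: start traffic"
defer 'echo "cleanup: stop traffic"'

run_deferred	# stops the traffic first, then removes the qdisc
```

One way to get the interrupt safety mentioned above would be to call a function like run_deferred from a shell trap, so that scheduled cleanups still run when the test is cut short.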
v2:
- Patch #1:
- In __defer__schedule(), use ndefers in place of
${__DEFER__NJOBS[$ndefers_key]}
- Patch #4:
- Defer stop_traffic including the sleep. The sleep is actually
necessary and v1 was wrong in that it had the sleep prior to the
stop_traffic invocation.
v1 (from the RFC):
- Patch #1:
- Added the priority defer track
- Dropped defer_scoped_fn, added in_defer_scope
- Extracted to a separate independent module
- Patch #2:
- Moved this bit to a separate patch
- Patch #3:
- New patch
- Patch #4 (RED):
- Squashed the individual RED-related patches into one
- Converted the SW datapath RED selftest as well
- Patch #5 (TBF):
- Fully converted the selftest, not just stop_traffic
- Patches #6, #7, #8, #9, #10:
- New patch
Petr Machata (10):
selftests: net: lib: Introduce deferred commands
selftests: forwarding: Add a fallback cleanup()
selftests: forwarding: lib: Allow passing PID to stop_traffic()
selftests: RED: Use defer for test cleanup
selftests: TBF: Use defer for test cleanup
selftests: ETS: Use defer for test cleanup
selftests: mlxsw: qos_mc_aware: Use defer for test cleanup
selftests: mlxsw: qos_ets_strict: Use defer for test cleanup
selftests: mlxsw: qos_max_descriptors: Use defer for test cleanup
selftests: mlxsw: devlink_trap_police: Use defer for test cleanup
.../drivers/net/mlxsw/devlink_trap_policer.sh | 85 ++++----
.../drivers/net/mlxsw/qos_ets_strict.sh | 167 ++++++++--------
.../drivers/net/mlxsw/qos_max_descriptors.sh | 118 ++++-------
.../drivers/net/mlxsw/qos_mc_aware.sh | 146 +++++++-------
.../selftests/drivers/net/mlxsw/sch_ets.sh | 26 ++-
.../drivers/net/mlxsw/sch_red_core.sh | 185 +++++++++---------
.../drivers/net/mlxsw/sch_red_ets.sh | 24 +--
.../drivers/net/mlxsw/sch_red_root.sh | 18 +-
tools/testing/selftests/net/forwarding/lib.sh | 13 +-
.../selftests/net/forwarding/sch_ets.sh | 7 +-
.../selftests/net/forwarding/sch_ets_core.sh | 81 +++-----
.../selftests/net/forwarding/sch_ets_tests.sh | 14 +-
.../selftests/net/forwarding/sch_red.sh | 103 ++++------
.../selftests/net/forwarding/sch_tbf_core.sh | 91 +++------
.../net/forwarding/sch_tbf_etsprio.sh | 7 +-
.../selftests/net/forwarding/sch_tbf_root.sh | 3 +-
tools/testing/selftests/net/lib.sh | 3 +
tools/testing/selftests/net/lib/Makefile | 2 +-
tools/testing/selftests/net/lib/sh/defer.sh | 115 +++++++++++
19 files changed, 595 insertions(+), 613 deletions(-)
create mode 100644 tools/testing/selftests/net/lib/sh/defer.sh
--
2.45.0
Currently fp-stress does not report a top level test result if it runs to
completion; it always exits with a return code of 0. Use the ksft_finished()
helper to ensure that the exit code for the top level program reports a
failure if any of the individual tests has failed.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
tools/testing/selftests/arm64/fp/fp-stress.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/tools/testing/selftests/arm64/fp/fp-stress.c b/tools/testing/selftests/arm64/fp/fp-stress.c
index faac24bdefeb9436e2daf20b7250d0ae25ca23a7..e62c9dbad5010234d70b477cf8c52ba0b312910e 100644
--- a/tools/testing/selftests/arm64/fp/fp-stress.c
+++ b/tools/testing/selftests/arm64/fp/fp-stress.c
@@ -651,7 +651,5 @@ int main(int argc, char **argv)
drain_output(true);
- ksft_print_cnts();
-
- return 0;
+ ksft_finished();
}
---
base-commit: 8e929cb546ee42c9a61d24fae60605e9e3192354
change-id: 20241017-arm64-fp-stress-exit-code-90fe21dc4bc3
Best regards,
--
Mark Brown <broonie(a)kernel.org>
The upcoming new Idle HLT Intercept feature allows for the HLT
instruction execution by a vCPU to be intercepted by the hypervisor
only if there are no pending V_INTR and V_NMI events for the vCPU.
When the vCPU is expected to service the pending V_INTR and V_NMI
events, the Idle HLT intercept won’t trigger. The feature allows the
hypervisor to determine if the vCPU is actually idle and reduces
wasteful VMEXITs.
Presence of the Idle HLT Intercept feature is indicated via CPUID
function Fn8000_000A_EDX[30].
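As a rough illustration of what testing a single feature bit such as EDX[30] means, the sketch below checks bit 30 of a register value. The bit_is_set() helper and the sample value are assumptions made up for the example; the sketch does not execute CPUID and is not taken from the series.

```sh
# bit_is_set VALUE BIT: succeed if bit number BIT of VALUE is 1.
# Hypothetical helper for illustration only.
bit_is_set() {
	[ $(( ($1 >> $2) & 1 )) -eq 1 ]
}

edx=0x40000000	# made-up sample value with only bit 30 set

if bit_is_set "$edx" 30; then
	echo "bit 30 set"
else
	echo "bit 30 clear"
fi
```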
Documentation for the Idle HLT intercept feature is available at [1].
[1]: AMD64 Architecture Programmer's Manual Pub. 24593, April 2024,
Vol 2, 15.9 Instruction Intercepts (Table 15-7: IDLE_HLT).
https://bugzilla.kernel.org/attachment.cgi?id=306250
Testing Done:
- Added a selftest to test the Idle HLT intercept functionality.
- Compile and functionality testing for the Idle HLT intercept selftest
are only done for x86_64.
- Tested SEV and SEV-ES guest for the Idle HLT intercept functionality.
v2 -> v3
- Incorporated Andrew's suggestion to structure vcpu_stat_types in
a way that each architecture can share the generic types and also
provide its own.
v1 -> v2
- Made changes in svm_idle_hlt_test based on the review comments from Sean.
- Added an enum based approach to get binary stats in vcpu_get_stat() which
doesn't use string to get stat data based on the comments from Sean.
- Added safe_halt() and cli() helpers based on the comments from Sean.
Manali Shukla (5):
x86/cpufeatures: Add CPUID feature bit for Idle HLT intercept
KVM: SVM: Add Idle HLT intercept support
KVM: selftests: Add safe_halt() and cli() helpers to common code
KVM: selftests: Add an interface to read the data of named vcpu stat
KVM: selftests: KVM: SVM: Add Idle HLT intercept test
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/svm.h | 1 +
arch/x86/include/uapi/asm/svm.h | 2 +
arch/x86/kvm/svm/svm.c | 11 ++-
tools/testing/selftests/kvm/Makefile | 1 +
.../testing/selftests/kvm/include/kvm_util.h | 44 +++++++++
.../kvm/include/x86_64/kvm_util_arch.h | 40 +++++++++
.../selftests/kvm/include/x86_64/processor.h | 18 ++++
tools/testing/selftests/kvm/lib/kvm_util.c | 32 +++++++
.../selftests/kvm/x86_64/svm_idle_hlt_test.c | 89 +++++++++++++++++++
10 files changed, 236 insertions(+), 3 deletions(-)
create mode 100644 tools/testing/selftests/kvm/x86_64/svm_idle_hlt_test.c
base-commit: d91a9cc16417b8247213a0144a1f0fd61dc855dd
--
2.34.1
This series introduces a new vIOMMU infrastructure and related ioctls.
IOMMUFD has been using the HWPT infrastructure for all cases, including a
nested IO page table support. Yet, there're limitations for an HWPT-based
structure to support some advanced HW-accelerated features, such as CMDQV
on NVIDIA Grace, and HW-accelerated vIOMMU on AMD. Even for a multi-IOMMU
environment, it is not straightforward for nested HWPTs to share the same
parent HWPT (stage-2 IO pagetable), with the HWPT infrastructure alone: a
parent HWPT typically holds one stage-2 IO pagetable and tags it with only
one ID in the cache entries. When sharing one large stage-2 IO pagetable
across physical IOMMU instances, that one ID may not always be available
across all the IOMMU instances. In other words, it's ideal for SW to have
a different container for the stage-2 IO pagetable so it can hold another
ID that's available.
For this "different container", add vIOMMU, an additional layer to hold
extra virtualization information:
_______________________________________________________________________
| iommufd (with vIOMMU) |
| |
| [5] |
| _____________ |
| | | |
| [1] | vIOMMU | [4] [2] |
| ________________ | | _____________ ________ |
| | | | [3] | | | | | |
| | IOAS |<---|(HWPT_PAGING)|<---| HWPT_NESTED |<--| DEVICE | |
| |________________| |_____________| |_____________| |________| |
| | | | | |
|_________|____________________|__________________|_______________|_____|
| | | |
| ______v_____ ______v_____ ___v__
| PFN storage | (paging) | | (nested) | |struct|
|------------>|iommu_domain|<----|iommu_domain|<----|device|
|____________| |____________| |______|
The vIOMMU object should be seen as a slice of a physical IOMMU instance
that is passed to or shared with a VM. That can be some HW/SW resources:
- Security namespace for guest owned ID, e.g. guest-controlled cache tags
- Access to a sharable nesting parent pagetable across physical IOMMUs
- Virtualization of various platforms IDs, e.g. RIDs and others
- Delivery of paravirtualized invalidation
- Direct assigned invalidation queues
- Direct assigned interrupts
- Non-affiliated event reporting
On a multi-IOMMU system, one vIOMMU object must be instantiated per physical
IOMMU that is passed to (via devices) a guest VM, while each remains able to
hold the shareable parent HWPT. Each vIOMMU then just needs to allocate its
own individual ID to tag its own cache:
----------------------------
---------------- | | paging_hwpt0 |
| hwpt_nested0 |--->| viommu0 ------------------
---------------- | | IDx |
----------------------------
----------------------------
---------------- | | paging_hwpt0 |
| hwpt_nested1 |--->| viommu1 ------------------
---------------- | | IDy |
----------------------------
As an initial part-1, add IOMMUFD_CMD_VIOMMU_ALLOC ioctl for an allocation
only. Later series will add more data structures and their ioctls.
As for the implementation of the series, add an IOMMU_VIOMMU_TYPE_DEFAULT
type for a core-allocated-core-managed vIOMMU object, allowing drivers to
simply hook a default viommu ops for viommu-based invalidation alone. And
add support for driver-specific type of vIOMMU allocation, and implement
that in the ARM SMMUv3 driver for a real world use case.
More vIOMMU-based structs and ioctls will be introduced in the follow-up
series to support vDEVICE, vIRQ (vEVENT) and VQUEUE objects. We
repurposed the vIOMMU object from an earlier RFC, linked here for reference:
https://lore.kernel.org/all/cover.1712978212.git.nicolinc@nvidia.com/
This series is on Github:
https://github.com/nicolinc/iommufd/commits/iommufd_viommu_p1-v3
(pairing QEMU branch for testing will be provided with the part2 series)
Changelog
v3
* Rebased on top of Jason's nesting v3 series
https://lore.kernel.org/all/0-v3-e2e16cd7467f+2a6a1-smmuv3_nesting_jgg@nvid…
* Split the series into smaller parts
* Added Jason's Reviewed-by
* Added back viommu->iommu_dev
* Added support for driver-allocated vIOMMU vs. core-allocated
* Dropped arm_smmu_cache_invalidate_user
* Added an iommufd_test_wait_for_users() in selftest
* Reworked test code to make viommu an individual FIXTURE
* Added missing TEST_LENGTH case for the new ioctl command
v2
https://lore.kernel.org/all/cover.1724776335.git.nicolinc@nvidia.com/
* Limited vdev_id to one per idev
* Added a rw_sem to protect the vdev_id list
* Reworked driver-level APIs with proper lockings
* Added a new viommu_api file for IOMMUFD_DRIVER config
* Dropped useless iommu_dev pointer from the viommu structure
* Added missing index numbers to new types in the uAPI header
* Dropped IOMMU_VIOMMU_INVALIDATE uAPI; Instead, reuse the HWPT one
* Reworked mock_viommu_cache_invalidate() using the new iommu helper
* Reordered details of set/unset_vdev_id handlers for proper lockings
v1
https://lore.kernel.org/all/cover.1723061377.git.nicolinc@nvidia.com/
Thanks!
Nicolin
Nicolin Chen (11):
iommufd: Move struct iommufd_object to public iommufd header
iommufd: Rename _iommufd_object_alloc to iommufd_object_alloc_elm
iommufd: Introduce IOMMUFD_OBJ_VIOMMU and its related struct
iommufd/viommu: Add IOMMU_VIOMMU_ALLOC ioctl
iommu: Pass in a viommu pointer to domain_alloc_user op
iommufd: Allow pt_id to carry viommu_id for IOMMU_HWPT_ALLOC
iommufd/selftest: Add refcount to mock_iommu_device
iommufd/selftest: Add IOMMU_VIOMMU_TYPE_SELFTEST
iommufd/selftest: Add IOMMU_VIOMMU_ALLOC test coverage
Documentation: userspace-api: iommufd: Update vIOMMU
iommu/arm-smmu-v3: Add IOMMU_VIOMMU_TYPE_ARM_SMMUV3 support
drivers/iommu/iommufd/Makefile | 5 +-
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 18 ++++
drivers/iommu/iommufd/iommufd_private.h | 23 ++---
drivers/iommu/iommufd/iommufd_test.h | 2 +
include/linux/iommu.h | 15 +++
include/linux/iommufd.h | 52 +++++++++++
include/uapi/linux/iommufd.h | 54 +++++++++--
tools/testing/selftests/iommu/iommufd_utils.h | 28 ++++++
drivers/iommu/amd/iommu.c | 1 +
.../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c | 24 +++++
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 2 +
drivers/iommu/intel/iommu.c | 1 +
drivers/iommu/iommufd/hw_pagetable.c | 27 +++++-
drivers/iommu/iommufd/main.c | 38 ++------
drivers/iommu/iommufd/selftest.c | 79 ++++++++++++++--
drivers/iommu/iommufd/viommu.c | 91 +++++++++++++++++++
drivers/iommu/iommufd/viommu_api.c | 57 ++++++++++++
tools/testing/selftests/iommu/iommufd.c | 84 +++++++++++++++++
Documentation/userspace-api/iommufd.rst | 66 +++++++++++++-
19 files changed, 602 insertions(+), 65 deletions(-)
create mode 100644 drivers/iommu/iommufd/viommu.c
create mode 100644 drivers/iommu/iommufd/viommu_api.c
--
2.43.0
This patch series migrates test cases out of test_sock.c to
prog_tests-style tests. It moves all BPF_CGROUP_INET4_POST_BIND and
BPF_CGROUP_INET6_POST_BIND test cases into a new prog_test,
sock_post_bind.c, while reimplementing all LOAD_REJECT test cases as
verifier tests in progs/verifier_sock.c. Finally, it moves remaining
BPF_CGROUP_INET_SOCK_CREATE test coverage into prog_tests/sock_create.c
before retiring test_sock.c completely.
Jordan Rife (4):
selftests/bpf: Migrate *_POST_BIND test cases to prog_tests
selftests/bpf: Migrate LOAD_REJECT test cases to prog_tests
selftests/bpf: Migrate BPF_CGROUP_INET_SOCK_CREATE test cases to
prog_tests
selftests/bpf: Retire test_sock.c
tools/testing/selftests/bpf/.gitignore | 1 -
tools/testing/selftests/bpf/Makefile | 3 +-
.../selftests/bpf/prog_tests/sock_create.c | 35 ++-
.../sock_post_bind.c} | 251 ++++--------------
.../selftests/bpf/progs/verifier_sock.c | 60 +++++
5 files changed, 142 insertions(+), 208 deletions(-)
rename tools/testing/selftests/bpf/{test_sock.c => prog_tests/sock_post_bind.c} (64%)
--
2.47.0.rc1.288.g06298d1525-goog
Hello,
this series aims to bring test_tcp_check_syncookie.sh scope into
test_progs to make sure that the corresponding tests are also run
automatically in CI. This script tests for bpf_tcp_{gen,check}_syncookie
and bpf_skc_lookup_tcp, in different contexts (ipv4, v6 or dual, and
with tc and xdp programs).
Some other tests like btf_skc_cls_ingress have some overlapping tests with
test_tcp_check_syncookie.sh, so this series moves the missing bits from
test_tcp_check_syncookie.sh into btf_skc_cls_ingress, which is already
integrated into test_progs.
- the first three commits bring some minor improvements to
btf_skc_cls_ingress without changing its testing scope
- fourth and fifth commits bring test_tcp_check_syncookie.sh features
into btf_skc_cls_ingress
- last commit removes test_tcp_check_syncookie.sh
The one point I am unsure about for this integration is whether the
tests need to run with different program types:
test_tcp_check_syncookie.sh runs tests with both tc and xdp programs, but
btf_skc_cls_ingress currently tests those helpers only with a tc
program. Would it make sense to also make sure that btf_skc_cls_ingress
is tested with all the program types supported by those helpers?
The series has been tested both in CI and in a local x86_64 qemu
environment:
# ./test_progs -a btf_skc_cls_ingress
#38/1 btf_skc_cls_ingress/conn_ipv4:OK
#38/2 btf_skc_cls_ingress/conn_ipv6:OK
#38/3 btf_skc_cls_ingress/conn_dual:OK
#38/4 btf_skc_cls_ingress/syncookie_ipv4:OK
#38/5 btf_skc_cls_ingress/syncookie_ipv6:OK
#38/6 btf_skc_cls_ingress/syncookie_dual:OK
#38 btf_skc_cls_ingress:OK
Summary: 1/6 PASSED, 0 SKIPPED, 0 FAILED
Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore(a)bootlin.com>
---
Changes in v2:
- fix initial test author mail in Cc
- Fix default cases in switches: indent, action
- remove unneeded initializer
- remove duplicate interface bring-up
- remove unnecessary check and return in bpf program
- Link to v1: https://lore.kernel.org/r/20241016-syncookie-v1-0-3b7a0de12153@bootlin.com
---
Alexis Lothoré (eBPF Foundation) (6):
selftests/bpf: factorize conn and syncookies tests in a single runner
selftests/bpf: add missing ns cleanups in btf_skc_cls_ingress
selftests/bpf: get rid of global vars in btf_skc_cls_ingress
selftests/bpf: add ipv4 and dual ipv4/ipv6 support in btf_skc_cls_ingress
selftests/bpf: test MSS value returned with bpf_tcp_gen_syncookie
selftests/bpf: remove test_tcp_check_syncookie
tools/testing/selftests/bpf/.gitignore | 1 -
tools/testing/selftests/bpf/Makefile | 9 +-
.../selftests/bpf/prog_tests/btf_skc_cls_ingress.c | 264 +++++++++++++--------
.../selftests/bpf/progs/test_btf_skc_cls_ingress.c | 82 ++++---
.../bpf/progs/test_tcp_check_syncookie_kern.c | 167 -------------
.../selftests/bpf/test_tcp_check_syncookie.sh | 85 -------
.../selftests/bpf/test_tcp_check_syncookie_user.c | 213 -----------------
7 files changed, 217 insertions(+), 604 deletions(-)
---
base-commit: 030207b7fce8bad6827615cfc2c6592916e2c336
change-id: 20241015-syncookie-ea7686264586
Best regards,
--
Alexis Lothoré, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com
Hi,
The asm-generic/unistd.h file has the wrong __NR_userfaultfd syscall number, which
doesn't even depend on the architecture. This caused the failure of a selftest
which was fixed recently [1].
grep -rnIF "#define __NR_userfaultfd"
tools/include/uapi/asm-generic/unistd.h:681:#define __NR_userfaultfd 282
arch/x86/include/generated/uapi/asm/unistd_32.h:374:#define __NR_userfaultfd 374
arch/x86/include/generated/uapi/asm/unistd_64.h:327:#define __NR_userfaultfd 323
arch/x86/include/generated/uapi/asm/unistd_x32.h:282:#define __NR_userfaultfd (__X32_SYSCALL_BIT + 323)
arch/arm/include/generated/uapi/asm/unistd-eabi.h:347:#define __NR_userfaultfd (__NR_SYSCALL_BASE + 388)
arch/arm/include/generated/uapi/asm/unistd-oabi.h:359:#define __NR_userfaultfd (__NR_SYSCALL_BASE + 388)
include/uapi/asm-generic/unistd.h:681:#define __NR_userfaultfd 282
The number is dependent on the architecture. The above data shows that it
differs between architectures:
x86 374
x86_64 323
ARM 388 (relative to __NR_SYSCALL_BASE)
It seems include/uapi/asm-generic/unistd.h has the wrong value, 282, in it. Maybe I'm
missing some context. Please have a look at it.
The __NR_userfaultfd was added to include/uapi/asm-generic/unistd.h in
09f7298100ea ("Subject: [PATCH] userfaultfd: register uapi generic syscall (aarch64)").
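For reference, a minimal C sketch of how userspace can check which number its own headers assign. It assumes a Linux libc whose <sys/syscall.h> defines SYS_userfaultfd; the helper names userfaultfd_nr() and print_userfaultfd_nr() are hypothetical and not taken from the selftest. As far as I can tell, architectures that take their numbers from the generic table would report 282 here, while x86_64 reports 323 as in the grep output above.

```c
#include <stdio.h>
#include <sys/syscall.h>

/* Hypothetical helper: the syscall number the installed headers assign. */
long userfaultfd_nr(void)
{
#ifdef SYS_userfaultfd
	return SYS_userfaultfd;
#else
	return -1;	/* headers do not define userfaultfd */
#endif
}

/* Print the number so it can be compared against the grep output. */
void print_userfaultfd_nr(void)
{
	printf("__NR_userfaultfd = %ld\n", userfaultfd_nr());
}
```

Calling print_userfaultfd_nr() from a test's main() avoids hardcoding a per-architecture constant in the test itself.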
[1] https://lore.kernel.org/all/20240912103151.1520254-1-usama.anjum@collabora.…
--
BR,
/Muhammad Usama Anjum
From: Jeff Xu <jeffxu(a)chromium.org>
This series increases the test coverage of mseal_test by:
Adding checks for vma_size, prot, and error code to existing tests.
Adding more testcases for madvise, munmap, mmap and mremap to cover
sealing in different scenarios.
The increased test coverage should help to prevent future regressions.
It doesn't change the semantics of any existing mm API, i.e. it will pass on
linux main and the 6.10 branch.
Note: in order to pass this test in mm-unstable, mm-unstable must have
Liam's fix on mmap [1]
[1] https://lore.kernel.org/linux-kselftest/vyllxuh5xbqmaoyl2mselebij5ox7cseekj…
History:
V3:
- no functional change, incorporate feedback from Pedro Falcato
V2:
- https://lore.kernel.org/linux-kselftest/20240829214352.963001-1-jeffxu@chro…
- remove the mmap fix (Liam R. Howlett will fix it separately)
- Add cover letter (Lorenzo Stoakes)
- split the testcase for ease of review (Mark Brown)
V1:
- https://lore.kernel.org/linux-kselftest/20240828225522.684774-1-jeffxu@chro…
Jeff Xu (5):
selftests/mseal_test: Check vma_size, prot, error code.
selftests/mseal: add sealed madvise type
selftests/mseal: munmap across multiple vma ranges.
selftests/mseal: add more tests for mmap
selftests/mseal: add more tests for mremap
tools/testing/selftests/mm/mseal_test.c | 830 ++++++++++++++++++++++--
1 file changed, 763 insertions(+), 67 deletions(-)
--
2.46.0.469.g59c65b2a67-goog
From: Jason Xing <kernelxing(a)tencent.com>
When I compiled the tools/testing/selftests/bpf, the following error
pops out:
uprobe_multi.c: In function ‘trigger_uprobe’:
uprobe_multi.c:109:26: error: ‘MADV_PAGEOUT’ undeclared (first use in this function); did you mean ‘MADV_RANDOM’?
madvise(addr, page_sz, MADV_PAGEOUT);
^~~~~~~~~~~~
MADV_RANDOM
Including the <linux/mman.h> header file solves this compilation error.
Signed-off-by: Jason Xing <kernelxing(a)tencent.com>
---
v2
Link: https://lore.kernel.org/all/20241020031422.46894-1-kerneljasonxing@gmail.co…
1. handle it in a proper way
---
tools/testing/selftests/bpf/uprobe_multi.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/bpf/uprobe_multi.c b/tools/testing/selftests/bpf/uprobe_multi.c
index c7828b13e5ff..40231f02b95d 100644
--- a/tools/testing/selftests/bpf/uprobe_multi.c
+++ b/tools/testing/selftests/bpf/uprobe_multi.c
@@ -4,6 +4,7 @@
#include <string.h>
#include <stdbool.h>
#include <stdint.h>
+#include <linux/mman.h>
#include <sys/mman.h>
#include <unistd.h>
#include <sdt.h>
--
2.37.3
From: Jason Xing <kernelxing(a)tencent.com>
When I compiled the tools/testing/selftests/bpf, the following error
pops out:
uprobe_multi.c: In function ‘trigger_uprobe’:
uprobe_multi.c:109:26: error: ‘MADV_PAGEOUT’ undeclared (first use in this function); did you mean ‘MADV_RANDOM’?
madvise(addr, page_sz, MADV_PAGEOUT);
^~~~~~~~~~~~
MADV_RANDOM
We can see MADV_PAGEOUT existing in mman-common.h on x86 arch, so
including this header file solves this compilation error.
Signed-off-by: Jason Xing <kernelxing(a)tencent.com>
---
tools/testing/selftests/bpf/uprobe_multi.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/bpf/uprobe_multi.c b/tools/testing/selftests/bpf/uprobe_multi.c
index c7828b13e5ff..b0e11ffe0e1c 100644
--- a/tools/testing/selftests/bpf/uprobe_multi.c
+++ b/tools/testing/selftests/bpf/uprobe_multi.c
@@ -5,6 +5,7 @@
#include <stdbool.h>
#include <stdint.h>
#include <sys/mman.h>
+#include <mman-common.h>
#include <unistd.h>
#include <sdt.h>
--
2.37.3
The PSCI v1.3 spec (https://developer.arm.com/documentation/den0022)
adds support for a SYSTEM_OFF2 function enabling a HIBERNATE_OFF state
which is analogous to ACPI S4. This will allow hosting environments to
determine that a guest is hibernated rather than just powered off, and
ensure that they preserve the virtual environment appropriately to
allow the guest to resume safely (or bump the hardware_signature in the
FACS to trigger a clean reboot instead).
This updates KVM to support advertising PSCI v1.3, and unconditionally
enables the SYSTEM_OFF2 support when PSCI v1.3 is enabled.
For the guest side, add a new SYS_OFF_MODE_POWER_OFF handler with higher
priority than the EFI one, but which *only* triggers when there's a
hibernation in progress. There are other ways to do this (see the commit
message for more details) but this seemed like the simplest.
Version 2 of the patch series splits out the psci.h definitions into a
separate commit (a dependency for both the guest and KVM side), and adds
definitions for the other new functions added in v1.3. It also moves the
pKVM psci-relay support to a separate commit; although in arch/arm64/kvm
that's actually about the *guest* side of SYSTEM_OFF2 (i.e. using it
from the host kernel, relayed through nVHE).
Version 3 dropped the KVM_CAP which allowed userspace to explicitly opt
in to the new feature like with SYSTEM_SUSPEND, and makes it depend only
on PSCI v1.3 being exposed to the guest.
Version 4 is no longer RFC, as the PSCI v1.3 spec is finally published.
Minor fixes from the last round of review, and an added KVM self test.
Version 5 drops some of the changes which didn't make it to the final
v1.3 spec, and cleans up a couple of places which still referred to it
as 'alpha' or 'beta'. It also temporarily drops the guest-side patch to
invoke SYSTEM_OFF2 for hibernation, pending confirmation that the final
PSCI v1.3 spec just has a typo where it changed to saying that 0x1
should be passed to mean HIBERNATE_OFF, even though it's advertised as
bit 0. That can be sent under separate cover, and perhaps should have
been anyway. The change in question doesn't matter for any of the KVM
patches, because we just treat SYSTEM_OFF2 like the existing
SYSTEM_RESET2, setting a flag to indicate that it was a SYSTEM_OFF2
call, but not actually caring about the argument; that's for userspace
to worry about.
David Woodhouse (5):
firmware/psci: Add definitions for PSCI v1.3 specification
KVM: arm64: Add PSCI v1.3 SYSTEM_OFF2 function for hibernation
KVM: arm64: Add support for PSCI v1.2 and v1.3
KVM: selftests: Add test for PSCI SYSTEM_OFF2
KVM: arm64: nvhe: Pass through PSCI v1.3 SYSTEM_OFF2 call
Documentation/virt/kvm/api.rst | 11 +++++
arch/arm64/include/uapi/asm/kvm.h | 6 +++
arch/arm64/kvm/hyp/nvhe/psci-relay.c | 2 +
arch/arm64/kvm/hypercalls.c | 2 +
arch/arm64/kvm/psci.c | 43 ++++++++++++++++-
include/kvm/arm_psci.h | 4 +-
include/uapi/linux/psci.h | 5 ++
tools/testing/selftests/kvm/aarch64/psci_test.c | 61 +++++++++++++++++++++++++
8 files changed, 132 insertions(+), 2 deletions(-)
Hello,
this series aims to bring test_tcp_check_syncookie.sh scope into
test_progs to make sure that the corresponding tests are also run
automatically in CI. This script tests for bpf_tcp_{gen,check}_syncookie
and bpf_skc_lookup_tcp, in different contexts (ipv4, v6 or dual, and
with tc and xdp programs).
Some other tests like btf_skc_cls_ingress have some overlapping tests with
test_tcp_check_syncookie.sh, so this series moves the missing bits from
test_tcp_check_syncookie.sh into btf_skc_cls_ingress, which is already
integrated into test_progs.
- the first three commits bring some minor improvements to
btf_skc_cls_ingress without changing its testing scope
- fourth and fifth commits bring test_tcp_check_syncookie.sh features
into btf_skc_cls_ingress
- last commit removes test_tcp_check_syncookie.sh
The one point I am unsure about for this integration is whether the
tests need to run with different program types:
test_tcp_check_syncookie.sh runs tests with both tc and xdp programs, but
btf_skc_cls_ingress currently tests those helpers only with a tc
program. Would it make sense to also make sure that btf_skc_cls_ingress
is tested with all the program types supported by those helpers?
The series has been tested both in CI and in a local x86_64 qemu
environment:
# ./test_progs -a btf_skc_cls_ingress
#38/1 btf_skc_cls_ingress/conn_ipv4:OK
#38/2 btf_skc_cls_ingress/conn_ipv6:OK
#38/3 btf_skc_cls_ingress/conn_dual:OK
#38/4 btf_skc_cls_ingress/syncookie_ipv4:OK
#38/5 btf_skc_cls_ingress/syncookie_ipv6:OK
#38/6 btf_skc_cls_ingress/syncookie_dual:OK
#38 btf_skc_cls_ingress:OK
Summary: 1/6 PASSED, 0 SKIPPED, 0 FAILED
Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore(a)bootlin.com>
---
Alexis Lothoré (eBPF Foundation) (6):
selftests/bpf: factorize conn and syncookies tests in a single runner
selftests/bpf: add missing ns cleanups in btf_skc_cls_ingress
selftests/bpf: get rid of global vars in btf_skc_cls_ingress
selftests/bpf: add ipv4 and dual ipv4/ipv6 support in btf_skc_cls_ingress
selftests/bpf: test MSS value returned with bpf_tcp_gen_syncookie
selftests/bpf: remove test_tcp_check_syncookie
tools/testing/selftests/bpf/.gitignore | 1 -
tools/testing/selftests/bpf/Makefile | 9 +-
.../selftests/bpf/prog_tests/btf_skc_cls_ingress.c | 265 +++++++++++++--------
.../selftests/bpf/progs/test_btf_skc_cls_ingress.c | 83 +++++--
.../bpf/progs/test_tcp_check_syncookie_kern.c | 167 -------------
.../selftests/bpf/test_tcp_check_syncookie.sh | 85 -------
.../selftests/bpf/test_tcp_check_syncookie_user.c | 213 -----------------
7 files changed, 222 insertions(+), 601 deletions(-)
---
base-commit: 030207b7fce8bad6827615cfc2c6592916e2c336
change-id: 20241015-syncookie-ea7686264586
Best regards,
--
Alexis Lothoré, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com
Hi Linus,
Please pull the following kselftest fixes update for Linux 6.12-rc4.
-- fixes test makefile to install tests directory without which
the test fails with errors.
diff is attached.
thanks,
-- Shuah
----------------------------------------------------------------
The following changes since commit 4ee5ca9a29384fcf3f18232fdf8474166dea8dca:
ftrace/selftest: Test combination of function_graph tracer and function profiler
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest tags/linux_kselftest-fixes-6.12-rc4
for you to fetch changes up to fe05c40ca9c18cfdb003f639a30fc78a7ab49519:
selftest: hid: add the missing tests directory (2024-10-16 15:55:14 -0600)
----------------------------------------------------------------
linux_kselftest-fixes-6.12-rc4
kselftest fixes for Linux 6.12-rc4
-- fixes test makefile to install tests directory without which
the test fails with errors.
----------------------------------------------------------------
Yun Lu (1):
selftest: hid: add the missing tests directory
tools/testing/selftests/hid/Makefile | 1 +
1 file changed, 1 insertion(+)
----------------------------------------------------------------
Userland library functions such as allocators and threading implementations
often require regions of memory to act as 'guard pages' - mappings which,
when accessed, result in a fatal signal being sent to the accessing
process.
The current means by which these are implemented is via a PROT_NONE mmap()
mapping, which provides the required semantics but incurs the overhead of
a VMA for each such region.
With a great many processes and threads, this can rapidly add up and incur
a significant memory penalty. It also has the added problem of preventing
merges that might otherwise be permitted.
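To illustrate the existing approach described above (not the new mechanism from this series), here is a minimal sketch of a PROT_NONE guard page placed after a usable region. The helper name map_with_guard() is hypothetical. Changing the protection of the last page gives it different attributes from the rest of the mapping, which is why it ends up in a VMA of its own.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map `usable_pages` read/write pages followed by
 * one PROT_NONE guard page. Returns the base address, or NULL on error.
 */
void *map_with_guard(size_t usable_pages)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	size_t len = (usable_pages + 1) * page;
	char *base;

	base = mmap(NULL, len, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (base == MAP_FAILED)
		return NULL;

	/* Any access to the final page now raises a fatal signal. */
	if (mprotect(base + usable_pages * page, page, PROT_NONE) != 0) {
		munmap(base, len);
		return NULL;
	}
	return base;
}
```

Each such guard costs one extra VMA, which is the overhead the lightweight mechanism is meant to avoid.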
This series takes a different approach - an idea suggested by Vlastimil
Babka (and before him David Hildenbrand and Jann Horn - perhaps more - the
provenance becomes a little tricky to ascertain after this - please forgive
any omissions!) - rather than locating the guard pages at the VMA layer,
instead placing them in page tables mapping the required ranges.
Early testing of the prototype version of this code suggests a 5 times
speed up in memory mapping invocations (in conjunction with use of
process_madvise()) and a 13% reduction in VMAs on an entirely idle android
system and unoptimised code.
We expect with optimisation and a loaded system with a larger number of
guard pages this could significantly increase, but in any case these
numbers are encouraging.
This way, rather than having separate VMAs specifying which parts of a
range are guard pages, instead we have a VMA spanning the entire range of
memory a user is permitted to access and including ranges which are to be
'guarded'.
After mapping this, a user can specify which parts of the range should
result in a fatal signal when accessed.
By restricting the ability to specify guard pages to memory mapped by
existing VMAs, we can rely on the mappings being torn down when the
mappings are ultimately unmapped and everything works simply as if the
memory were not faulted in, from the point of view of the containing VMAs.
This mechanism in effect poisons memory ranges similar to hardware memory
poisoning, only it is an entirely software-controlled form of poisoning.
Any poisoned region of memory can also be 'unpoisoned', that is, have
its poison markers removed.
The mechanism is implemented via madvise() behaviour - MADV_GUARD_POISON
which simply poisons ranges - and MADV_GUARD_UNPOISON - which clears this
poisoning.
Poisoning can be performed across multiple VMAs and any existing mappings
will be cleared, that is zapped, before installing the poisoned page table
mappings.
There is no concept of 'nested' poisoning, multiple attempts to poison a
range will, after the first poisoning, have no effect.
Importantly, unpoisoning of poisoned ranges has no effect on non-poisoned
memory, so a user can safely unpoison a range of memory and clear only
poison page table mappings leaving the rest intact.
The actual mechanism by which the page table entries are specified makes
use of existing logic - PTE markers, which are used for the userfaultfd
UFFDIO_POISON mechanism.
Unfortunately PTE_MARKER_POISONED is not suited for the guard page
mechanism as it results in VM_FAULT_HWPOISON semantics in the fault
handler, so we add our own specific PTE_MARKER_GUARD and adapt existing
logic to handle it.
We also extend the generic page walk mechanism to allow for installation of
PTEs (carefully restricted to memory management logic only to prevent
unwanted abuse).
We ensure that zapping performed by, for instance, MADV_DONTNEED, does not
remove guard poison markers, nor does forking (except when VM_WIPEONFORK is
specified for a VMA which implies a total removal of memory
characteristics).
It's important to note that the guard page implementation is emphatically
NOT a security feature, so a user can remove the poisoning if they wish. We
simply implement it in such a way as to provide the least surprising
behaviour.
An extensive set of self-tests is provided which ensures behaviour is as
expected and additionally self-documents the expected behaviour of poisoned
ranges.
Suggested-by: Vlastimil Babka <vbabka(a)suse.cz>
Suggested-by: Jann Horn <jannh(a)google.com>
Suggested-by: David Hildenbrand <david(a)redhat.com>
v1
* Un-RFC'd as appears no major objections to approach but rather debate on
implementation.
* Fixed issue with arches which need mmu_context.h and
tlbflush.h header imports in pagewalker logic to be able to use
update_mmu_cache() as reported by the kernel test bot.
* Added comments in page walker logic to clarify who can use
ops->install_pte and why as well as adding a check_ops_valid() helper
function, as suggested by Christoph.
* Pass false in full parameter in pte_clear_not_present_full() as suggested
by Jann.
* Stopped erroneously requiring a write lock for the poison operation as
suggested by Jann and Suren.
* Moved anon_vma_prepare() to the start of madvise_guard_poison() to be
consistent with how this is used elsewhere in the kernel as suggested by
Jann.
* Avoid returning -EAGAIN if we are raced on page faults, just keep looping
and duck out if a fatal signal is pending or a conditional reschedule is
needed, as suggested by Jann.
* Avoid needlessly splitting huge PUDs and PMDs by specifying
ACTION_CONTINUE, as suggested by Jann.
RFC
https://lore.kernel.org/all/cover.1727440966.git.lorenzo.stoakes@oracle.com/
Lorenzo Stoakes (4):
mm: pagewalk: add the ability to install PTEs
mm: add PTE_MARKER_GUARD PTE marker
mm: madvise: implement lightweight guard page mechanism
selftests/mm: add self tests for guard page feature
arch/alpha/include/uapi/asm/mman.h | 3 +
arch/mips/include/uapi/asm/mman.h | 3 +
arch/parisc/include/uapi/asm/mman.h | 3 +
arch/xtensa/include/uapi/asm/mman.h | 3 +
include/linux/mm_inline.h | 2 +-
include/linux/pagewalk.h | 18 +-
include/linux/swapops.h | 26 +-
include/uapi/asm-generic/mman-common.h | 3 +
mm/hugetlb.c | 3 +
mm/internal.h | 6 +
mm/madvise.c | 168 ++++
mm/memory.c | 18 +-
mm/mprotect.c | 3 +-
mm/mseal.c | 1 +
mm/pagewalk.c | 200 ++--
tools/testing/selftests/mm/.gitignore | 1 +
tools/testing/selftests/mm/Makefile | 1 +
tools/testing/selftests/mm/guard-pages.c | 1168 ++++++++++++++++++++++
18 files changed, 1564 insertions(+), 66 deletions(-)
create mode 100644 tools/testing/selftests/mm/guard-pages.c
--
2.46.2
On Android arm, pthread_create followed by a fork caused a deadlock in
the case where the fork required work to be completed by the created
thread.
The previous patches incorrectly assumed that the parent would
always initialize the pthread_barrier for the child thread. This
reverts the change and replaces the fix for wp-fork-with-event with the
original use of atomic_bool.
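The atomic_bool pattern can be sketched as follows. This is only an illustration of the synchronization idea, not the selftest code: the worker here does nothing but raise the flag, and the names ready_for_fork, worker() and fork_after_thread_ready() are chosen for this example. The parent does not fork until the created thread has signalled that it is running.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static atomic_bool ready_for_fork;

/* Stand-in for the thread whose work the fork depends on. */
static void *worker(void *arg)
{
	(void)arg;
	atomic_store(&ready_for_fork, true);
	return NULL;
}

/* Returns the child's exit status (0 on success), or -1 on error. */
int fork_after_thread_ready(void)
{
	pthread_t thread;
	int status = 0;
	pid_t pid;

	atomic_store(&ready_for_fork, false);
	if (pthread_create(&thread, NULL, worker, NULL) != 0)
		return -1;

	/* Wait until the thread is actually running before forking. */
	while (!atomic_load(&ready_for_fork))
		sched_yield();

	pid = fork();
	if (pid == 0)
		_exit(0);	/* child: nothing to do in this sketch */

	pthread_join(thread, NULL);
	if (pid < 0 || waitpid(pid, &status, 0) < 0)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because the flag is set by the thread itself, the parent makes no assumption about who initializes the synchronization object, which is the assumption the reverted patches got wrong.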
Edward Liaw (3):
Revert "selftests/mm: fix deadlock for fork after pthread_create on
ARM"
Revert "selftests/mm: replace atomic_bool with pthread_barrier_t"
selftests/mm: fix deadlock for fork after pthread_create with
atomic_bool
tools/testing/selftests/mm/uffd-common.c | 5 ++--
tools/testing/selftests/mm/uffd-common.h | 3 ++-
tools/testing/selftests/mm/uffd-unit-tests.c | 24 ++++++++------------
3 files changed, 14 insertions(+), 18 deletions(-)
--
2.47.0.105.g07ac214952-goog
Changes since V2:
- V2: https://lore.kernel.org/all/cover.1726164080.git.reinette.chatre@intel.com/
- Add fix to protect against buffer overflow when parsing text from sysfs files.
- Add cleanup patch to address use of magic constants as pointed out by
Ilpo.
- Add Reviewed-by tags where received, except for "selftests/resctrl: Use cache
size to determine "fill_buf" buffer size" that changed too much since
receiving the Reviewed-by tag.
- Please see individual patches for detailed changes.
Changes since V1:
- V1: https://lore.kernel.org/cover.1724970211.git.reinette.chatre@intel.com/
- V2 contains the same general solutions to stated problem as V1 but these
are now preceded by more fixes (patches 1 to 5) and improved robustness
(patches 6 to 9) to existing tests before the series gets back
to solving the original problem with more confidence in patches 10 to 13.
- The possibility of making "memflush = false" for CMT test was discussed
during V1. Modifying this setting does not have a significant impact on the
observed results that are already well within acceptable range and this
version thus keeps original default. If performance was a goal it may
be possible to do further experimentation where "memflush = false" could
eliminate the need for the sleep(1) within the test wrapper, but
improving the performance is not a goal of this work.
- (New) Support what seems to be unintended ability for user space to provide
parameters to "fill_buf" by making the parsing robust and only support
changing parameters that are supported to be changed. Drop support for
"write" operation since it has never been measured.
- (New) Improve wraparound handling. (Ilpo)
- (New) A couple of new fixes addressing issues discovered during development.
- (Change from V1) To support fill_buf parameters provided by user space as
well as test specific fill_buf parameters struct fill_buf_param is no longer
just a member of struct resctrl_val_param, instead there could be at most
two instances of struct fill_buf_param, the immutable parameters provided
by user space and the parameters used by individual tests. (Ilpo)
- Please see individual patches for detailed changes.
V1 cover:
The resctrl selftests for Memory Bandwidth Allocation (MBA) and Memory
Bandwidth Monitoring (MBM) are failing on some (for example [1]) Emerald
Rapids systems. The test failures result from the following two
properties of these systems:
1) Emerald Rapids systems can have up to 320MB L3 cache. The resctrl
MBA and MBM selftests measure memory traffic for which a hardcoded
250MB buffer has been sufficient so far. On platforms with L3 cache
larger than the buffer, the buffer fits in the L3 cache and thus
no/very little memory traffic is generated during the "memory
bandwidth" tests.
2) Some platform features, for example RAS features or memory
performance features that generate memory traffic may drive accesses
that are counted differently by performance counters and MBM
respectively, for instance generating "overhead" traffic which is not
counted against any specific RMID. Until now these counting
differences have always been "in the noise". On Emerald Rapids
systems the maximum MBA throttling (10% memory bandwidth)
throttles memory bandwidth to where memory accesses by these other
platform features push the memory bandwidth difference between
memory controller performance counters and resctrl (MBM) beyond the
tests' hardcoded tolerance.
Make the tests more robust against platform variations:
1) Let the buffer used by memory bandwidth tests be guided by the size
of the L3 cache.
2) Larger buffers require longer initialization time before the buffer can
be used for measurement. Rework the tests to ensure that buffer
initialization is complete before measurements start.
3) Do not compare performance counters and MBM measurements at low
bandwidth. The value of "low" is hardcoded to 750MiB based on
measurements on Emerald Rapids, Sapphire Rapids, and Ice Lake
systems. This limit is not applicable to AMD systems since it
only applies to the MBA and MBM tests that are isolated to Intel.
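Points 1) and 3) above can be sketched roughly as below. This is an illustration under stated assumptions, not the code from the series: the helper names fill_buf_size() and should_compare() are hypothetical, and the "twice the L3 size" policy is an assumption made here only so that the buffer cannot fit in the cache. The 250MB default and the 750MiB threshold are the figures quoted in this cover letter.

```c
#include <stdbool.h>
#include <stddef.h>

#define MB		(1024UL * 1024UL)
#define DEFAULT_SPAN	(250 * MB)	/* previously hardcoded buffer size */
#define LOW_BW_MIB	750UL		/* "low" bandwidth threshold */

/*
 * Hypothetical policy: never go below the old default, and make the
 * buffer at least twice the L3 size (the factor 2 is an assumption).
 */
size_t fill_buf_size(size_t l3_bytes)
{
	size_t wanted = 2 * l3_bytes;

	return wanted > DEFAULT_SPAN ? wanted : DEFAULT_SPAN;
}

/* Only compare iMC counters with resctrl when both are above "low". */
bool should_compare(unsigned long imc_mib, unsigned long resctrl_mib)
{
	return imc_mib >= LOW_BW_MIB && resctrl_mib >= LOW_BW_MIB;
}
```

With a 320MB L3 cache this sketch would pick a 640MB buffer, while systems with small caches keep the 250MB default.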
[1]
https://ark.intel.com/content/www/us/en/ark/products/237261/intel-xeon-plat…
Reinette Chatre (15):
selftests/resctrl: Make functions only used in same file static
selftests/resctrl: Print accurate buffer size as part of MBM results
selftests/resctrl: Fix memory overflow due to unhandled wraparound
selftests/resctrl: Protect against array overrun during iMC config
parsing
selftests/resctrl: Protect against array overflow when reading strings
selftests/resctrl: Make wraparound handling obvious
selftests/resctrl: Remove "once" parameter required to be false
selftests/resctrl: Only support measured read operation
selftests/resctrl: Remove unused measurement code
selftests/resctrl: Make benchmark parameter passing robust
selftests/resctrl: Ensure measurements skip initialization of default
benchmark
selftests/resctrl: Use cache size to determine "fill_buf" buffer size
selftests/resctrl: Do not compare performance counters and resctrl at
low bandwidth
selftests/resctrl: Keep results from first test run
selftests/resctrl: Replace magic constants used as array size
tools/testing/selftests/resctrl/cmt_test.c | 37 +-
tools/testing/selftests/resctrl/fill_buf.c | 45 +-
tools/testing/selftests/resctrl/mba_test.c | 54 ++-
tools/testing/selftests/resctrl/mbm_test.c | 37 +-
tools/testing/selftests/resctrl/resctrl.h | 79 +++-
.../testing/selftests/resctrl/resctrl_tests.c | 95 +++-
tools/testing/selftests/resctrl/resctrl_val.c | 447 +++++-------------
tools/testing/selftests/resctrl/resctrlfs.c | 19 +-
8 files changed, 354 insertions(+), 459 deletions(-)
--
2.46.2
On Android arm, pthread_create followed by a fork caused a deadlock in
the case where the fork required work to be completed by the created
thread.
Updated the synchronization primitive to use pthread_barrier instead of
atomic_bool.
Applied the same fix to the wp-fork-with-event test.
Edward Liaw (2):
selftests/mm: replace atomic_bool with pthread_barrier_t
selftests/mm: fix deadlock for fork after pthread_create on ARM
tools/testing/selftests/mm/uffd-common.c | 5 +++--
tools/testing/selftests/mm/uffd-common.h | 3 +--
tools/testing/selftests/mm/uffd-unit-tests.c | 21 ++++++++++++++------
3 files changed, 19 insertions(+), 10 deletions(-)
--
2.46.1.824.gd892dcdcdd-goog
If the KUnit test being run generates a WARN for some reason, kunit.py ignores it
and declares the test PASSED. This is very much not desirable, as tests that
are hitting WARNs are probably actually failing.
Take the simple approach to addressing this by setting panic_on_warn when
running the kernel. The kernel crashes, and kunit.py shows the WARN and reports
that the test failed.
Signed-off-by: Jason Gunthorpe <jgg(a)nvidia.com>
---
tools/testing/kunit/kunit_kernel.py | 2 ++
1 file changed, 2 insertions(+)
I saw there was an earlier series working to make tests that deliberately made
WARNs not do that, so this would be consistent with that idea: tests should not
make WARNs, and WARNs should not be ignored.
diff --git a/tools/testing/kunit/kunit_kernel.py b/tools/testing/kunit/kunit_kernel.py
index 61931c4926fd66..7a4228568dd73c 100644
--- a/tools/testing/kunit/kunit_kernel.py
+++ b/tools/testing/kunit/kunit_kernel.py
@@ -342,6 +342,8 @@ class LinuxSourceTree:
if filter_action:
args.append('kunit.filter_action=' + filter_action)
args.append('kunit.enable=1')
+ args.append('panic_on_warn=1')
+ args.append('panic=-1')
process = self._ops.start(args, build_dir)
assert process.stdout is not None # tell mypy it's set
base-commit: 2872987b1d009df556c0061ecdeede6a5f9bf42c
--
2.46.2
1. In order to make rtctest more explicit and robust, we propose using the
RTC_PARAM_GET ioctl interface to check the RTC alarm feature state before
running alarm-related tests.
2. rtctest requires read permission on /dev/rtc0. rtctest will
be skipped if /dev/rtc0 is not readable.
Joseph Jang (2):
selftest: rtc: Add to check rtc alarm status for alarm related test
selftest: rtc: Check if could access /dev/rtc0 before testing
tools/testing/selftests/rtc/Makefile | 2 +-
tools/testing/selftests/rtc/rtctest.c | 71 ++++++++++++++++++++++++++-
2 files changed, 71 insertions(+), 2 deletions(-)
--
2.34.1