Hi Reinette, Fenghua,
This series introduces a new mount option enabling an alternate mode for
MBM to work around an issue on present AMD implementations and any other
resctrl implementation where there are more RMIDs (or equivalent) than
hardware counters.
The L3 External Bandwidth Monitoring feature of the AMD PQoS
extension[1] only guarantees that RMIDs currently assigned to a
processor will be tracked by hardware. The counters of any other RMIDs
which are no longer being tracked will be reset to zero. The MBM event
counters return "Unavailable" to indicate when this has happened.
An interval for effectively measuring memory bandwidth typically needs
to be multiple seconds long. In Google's workloads, it is not feasible
to bound the number of jobs with different RMIDs which will run in a
cache domain over any period of time. Consequently, on a
fully-committed system where all RMIDs are allocated, few groups'
counters return non-zero values.
To demonstrate the underlying issue, the first patch provides a test
case in tools/testing/selftests/resctrl/test_rmids.sh.
On an AMD EPYC 7B12 64-Core Processor with the default behavior:
# ./test_rmids.sh
Created 255 monitoring groups.
g1: mbm_total_bytes: Unavailable -> Unavailable (FAIL)
g2: mbm_total_bytes: Unavailable -> Unavailable (FAIL)
g3: mbm_total_bytes: Unavailable -> Unavailable (FAIL)
[..]
g238: mbm_total_bytes: Unavailable -> Unavailable (FAIL)
g239: mbm_total_bytes: Unavailable -> Unavailable (FAIL)
g240: mbm_total_bytes: Unavailable -> Unavailable (FAIL)
g241: mbm_total_bytes: Unavailable -> 660497472
g242: mbm_total_bytes: Unavailable -> 660793344
g243: mbm_total_bytes: Unavailable -> 660477312
g244: mbm_total_bytes: Unavailable -> 660495360
g245: mbm_total_bytes: Unavailable -> 660775360
g246: mbm_total_bytes: Unavailable -> 660645504
g247: mbm_total_bytes: Unavailable -> 660696128
g248: mbm_total_bytes: Unavailable -> 660605248
g249: mbm_total_bytes: Unavailable -> 660681280
g250: mbm_total_bytes: Unavailable -> 660834240
g251: mbm_total_bytes: Unavailable -> 660440064
g252: mbm_total_bytes: Unavailable -> 660501504
g253: mbm_total_bytes: Unavailable -> 660590720
g254: mbm_total_bytes: Unavailable -> 660548352
g255: mbm_total_bytes: Unavailable -> 660607296
255 groups, 0 returned counts in first pass, 15 in second
successfully measured bandwidth from 15/255 groups
To compare, here is the output from an Intel(R) Xeon(R) Platinum 8173M
CPU:
# ./test_rmids.sh
Created 223 monitoring groups.
g1: mbm_total_bytes: 0 -> 606126080
g2: mbm_total_bytes: 0 -> 613236736
g3: mbm_total_bytes: 0 -> 610254848
[..]
g221: mbm_total_bytes: 0 -> 584679424
g222: mbm_total_bytes: 0 -> 588808192
g223: mbm_total_bytes: 0 -> 587317248
223 groups, 223 returned counts in first pass, 223 in second
successfully measured bandwidth from 223/223 groups
To make better use of the hardware in such a use case, this patchset
introduces a "soft" RMID implementation, where each CPU is permanently
assigned a "hard" RMID. On context switches which change the current
soft RMID, the difference between each CPU's current event counts and
most recent counts is added to the totals for the current or outgoing
soft RMID.
This technique does not work for cache occupancy counters, so this patch
series disables cache occupancy events when soft RMIDs are enabled.
This series adds the "mbm_soft_rmid" mount option to allow users to
opt-in to the functionaltiy when they deem it helpful.
When the same system from the earlier AMD example enables the
mbm_soft_rmid mount option:
# ./test_rmids.sh
Created 255 monitoring groups.
g1: mbm_total_bytes: 0 -> 686560576
g2: mbm_total_bytes: 0 -> 668204416
[..]
g252: mbm_total_bytes: 0 -> 672651200
g253: mbm_total_bytes: 0 -> 666956800
g254: mbm_total_bytes: 0 -> 665917056
g255: mbm_total_bytes: 0 -> 671049600
255 groups, 255 returned counts in first pass, 255 in second
successfully measured bandwidth from 255/255 groups
(patches are based on tip/master)
[1] https://www.amd.com/system/files/TechDocs/56375_1.03_PUB.pdf
Peter Newman (8):
selftests/resctrl: Verify all RMIDs count together
x86/resctrl: Add resctrl_mbm_flush_cpu() to collect CPUs' MBM events
x86/resctrl: Flush MBM event counts on soft RMID change
x86/resctrl: Call mon_event_count() directly for soft RMIDs
x86/resctrl: Create soft RMID version of __mon_event_count()
x86/resctrl: Assign HW RMIDs to CPUs for soft RMID
x86/resctrl: Use mbm_update() to push soft RMID counts
x86/resctrl: Add mount option to enable soft RMID
Stephane Eranian (1):
x86/resctrl: Hold a spinlock in __rmid_read() on AMD
arch/x86/include/asm/resctrl.h | 29 +++-
arch/x86/kernel/cpu/resctrl/core.c | 80 ++++++++-
arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 9 +-
arch/x86/kernel/cpu/resctrl/internal.h | 19 ++-
arch/x86/kernel/cpu/resctrl/monitor.c | 158 +++++++++++++++++-
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 52 ++++++
tools/testing/selftests/resctrl/test_rmids.sh | 93 +++++++++++
7 files changed, 425 insertions(+), 15 deletions(-)
create mode 100755 tools/testing/selftests/resctrl/test_rmids.sh
base-commit: dd806e2f030e57dd5bac973372aa252b6c175b73
--
2.40.0.634.g4ca3ef3211-goog
In kunit_debugfs_create_suite() give up and skip creating the debugfs
file if any of the alloc_string_stream() calls return an error or NULL.
Only put a value in the log pointer of kunit_suite and kunit_test if it
is a valid pointer to a log.
This prevents the potential invalid dereference reported by smatch:
lib/kunit/debugfs.c:115 kunit_debugfs_create_suite() error: 'suite->log'
dereferencing possible ERR_PTR()
lib/kunit/debugfs.c:119 kunit_debugfs_create_suite() error: 'test_case->log'
dereferencing possible ERR_PTR()
Signed-off-by: Richard Fitzgerald <rf(a)opensource.cirrus.com>
Reported-by: Dan Carpenter <dan.carpenter(a)linaro.org>
Fixes: 05e2006ce493 ("kunit: Use string_stream for test log")
---
lib/kunit/debugfs.c | 30 +++++++++++++++++++++++++-----
1 file changed, 25 insertions(+), 5 deletions(-)
diff --git a/lib/kunit/debugfs.c b/lib/kunit/debugfs.c
index 270d185737e6..9d167adfa746 100644
--- a/lib/kunit/debugfs.c
+++ b/lib/kunit/debugfs.c
@@ -109,14 +109,28 @@ static const struct file_operations debugfs_results_fops = {
void kunit_debugfs_create_suite(struct kunit_suite *suite)
{
struct kunit_case *test_case;
+ struct string_stream *stream;
- /* Allocate logs before creating debugfs representation. */
- suite->log = alloc_string_stream(GFP_KERNEL);
- string_stream_set_append_newlines(suite->log, true);
+ /*
+ * Allocate logs before creating debugfs representation.
+ * The suite->log and test_case->log pointer are expected to be NULL
+ * if there isn't a log, so only set it if the log stream was created
+ * successfully.
+ */
+ stream = alloc_string_stream(GFP_KERNEL);
+ if (IS_ERR_OR_NULL(stream))
+ return;
+
+ string_stream_set_append_newlines(stream, true);
+ suite->log = stream;
kunit_suite_for_each_test_case(suite, test_case) {
- test_case->log = alloc_string_stream(GFP_KERNEL);
- string_stream_set_append_newlines(test_case->log, true);
+ stream = alloc_string_stream(GFP_KERNEL);
+ if (IS_ERR_OR_NULL(stream))
+ goto err;
+
+ string_stream_set_append_newlines(stream, true);
+ test_case->log = stream;
}
suite->debugfs = debugfs_create_dir(suite->name, debugfs_rootdir);
@@ -124,6 +138,12 @@ void kunit_debugfs_create_suite(struct kunit_suite *suite)
debugfs_create_file(KUNIT_DEBUGFS_RESULTS, S_IFREG | 0444,
suite->debugfs,
suite, &debugfs_results_fops);
+ return;
+
+err:
+ string_stream_destroy(suite->log);
+ kunit_suite_for_each_test_case(suite, test_case)
+ string_stream_destroy(test_case->log);
}
void kunit_debugfs_destroy_suite(struct kunit_suite *suite)
--
2.30.2
Check the stream pointer passed to string_stream_destroy() for
IS_ERR_OR_NULL() instead of only NULL.
Whatever alloc_string_stream() returns should be safe to pass
to string_stream_destroy(), and that will be an ERR_PTR.
It's obviously good practise and generally helpful to also check
for NULL pointers so that client cleanup code can call
string_stream_destroy() unconditionally - which could include
pointers that have never been set to anything and so are NULL.
Signed-off-by: Richard Fitzgerald <rf(a)opensource.cirrus.com>
---
lib/kunit/string-stream.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/lib/kunit/string-stream.c b/lib/kunit/string-stream.c
index a6f3616c2048..54f4fdcbfac8 100644
--- a/lib/kunit/string-stream.c
+++ b/lib/kunit/string-stream.c
@@ -173,7 +173,7 @@ void string_stream_destroy(struct string_stream *stream)
{
KUNIT_STATIC_STUB_REDIRECT(string_stream_destroy, stream);
- if (!stream)
+ if (IS_ERR_OR_NULL(stream))
return;
string_stream_clear(stream);
--
2.30.2
Change namespace creation for root and non-root
user differently in create_and_enter_ns() function
Test result with root user:
$sudo make TARGETS="capabilities" kselftest
...
TAP version 13
1..1
timeout set to 45
selftests: capabilities: test_execve
TAP version 13
1..12
[RUN] +++ Tests with uid == 0 +++
[NOTE] Using global UIDs for tests
[RUN] Root => ep
...
ok 12 Passed
Totals: pass:12 fail:0 xfail:0 xpass:0 skip:0 error:0
==================================================
TAP version 13
1..9
[RUN] +++ Tests with uid != 0 +++
[NOTE] Using global UIDs for tests
[RUN] Non-root => no caps
...
ok 9 Passed
Totals: pass:9 fail:0 xfail:0 xpass:0 skip:0 error:0
Test result without root or normal user:
$make TARGETS="capabilities" kselftest
...
timeout set to 45
selftests: capabilities: test_execve
TAP version 13
1..12
[RUN] +++ Tests with uid == 0 +++
[NOTE] Using a user namespace for tests
[RUN] Root => ep
validate_cap:: Capabilities after execve were correct
ok 1 Passed
Check cap_ambient manipulation rules
ok 2 PR_CAP_AMBIENT_RAISE failed on non-inheritable cap
ok 3 PR_CAP_AMBIENT_RAISE failed on non-permitted cap
ok 4 PR_CAP_AMBIENT_RAISE worked
ok 5 Basic manipulation appears to work
[RUN] Root +i => eip
validate_cap:: Capabilities after execve were correct
ok 6 Passed
[RUN] UID 0 +ia => eipa
validate_cap:: Capabilities after execve were correct
ok 7 Passed
ok 8 # SKIP SUID/SGID tests (needs privilege)
Planned tests != run tests (12 != 8)
Totals: pass:7 fail:0 xfail:0 xpass:0 skip:1 error:0
==================================================
TAP version 13
1..9
[RUN] +++ Tests with uid != 0 +++
[NOTE] Using a user namespace for tests
[RUN] Non-root => no caps
validate_cap:: Capabilities after execve were correct
ok 1 Passed
Check cap_ambient manipulation rules
ok 2 PR_CAP_AMBIENT_RAISE failed on non-inheritable cap
ok 3 PR_CAP_AMBIENT_RAISE failed on non-permitted cap
ok 4 PR_CAP_AMBIENT_RAISE worked
ok 5 Basic manipulation appears to work
[RUN] Non-root +i => i
validate_cap:: Capabilities after execve were correct
ok 6 Passed
[RUN] UID 1 +ia => eipa
validate_cap:: Capabilities after execve were correct
ok 7 Passed
ok 8 # SKIP SUID/SGID tests (needs privilege)
Planned tests != run tests (9 != 8)
Totals: pass:7 fail:0 xfail:0 xpass:0 skip:1 error:0
Signed-off-by: Swarup Laxman Kotiaklapudi <swarupkotikalapudi(a)gmail.com>
---
tools/testing/selftests/capabilities/test_execve.c | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)
diff --git a/tools/testing/selftests/capabilities/test_execve.c b/tools/testing/selftests/capabilities/test_execve.c
index df0ef02b4036..8236150d377e 100644
--- a/tools/testing/selftests/capabilities/test_execve.c
+++ b/tools/testing/selftests/capabilities/test_execve.c
@@ -96,11 +96,7 @@ static bool create_and_enter_ns(uid_t inner_uid)
outer_uid = getuid();
outer_gid = getgid();
- /*
- * TODO: If we're already root, we could skip creating the userns.
- */
-
- if (unshare(CLONE_NEWNS) == 0) {
+ if (outer_uid == 0 && unshare(CLONE_NEWNS) == 0) {
ksft_print_msg("[NOTE]\tUsing global UIDs for tests\n");
if (prctl(PR_SET_KEEPCAPS, 1, 0, 0, 0) != 0)
ksft_exit_fail_msg("PR_SET_KEEPCAPS - %s\n",
--
2.34.1
From: David Woodhouse <dwmw(a)amazon.co.uk>
Using -MD without -MP causes build failures when a header file is deleted
or moved. With -MP, the compiler will emit phony targets for the header
files it lists as dependencies, and the Makefiles won't refuse to attempt
to rebuild a C unit which no longer includes the deleted header.
Signed-off-by: David Woodhouse <dwmw(a)amazon.co.uk>
---
tools/testing/selftests/kvm/Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index a3bb36fb3cfc..20ea549da570 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -211,7 +211,7 @@ else
LINUX_TOOL_ARCH_INCLUDE = $(top_srcdir)/tools/arch/$(ARCH)/include
endif
CFLAGS += -Wall -Wstrict-prototypes -Wuninitialized -O2 -g -std=gnu99 \
- -Wno-gnu-variable-sized-type-not-at-end -MD\
+ -Wno-gnu-variable-sized-type-not-at-end -MD -MP \
-fno-builtin-memcmp -fno-builtin-memcpy -fno-builtin-memset \
-fno-builtin-strnlen \
-fno-stack-protector -fno-PIE -I$(LINUX_TOOL_INCLUDE) \
--
2.41.0
We have started printing more and more intentional stack traces. Whether
it's testing KASAN is able to detect use after frees or it's part of a
kunit test.
These stack traces can be problematic. They suddenly show up as a new
failure. Now the test team has to contact the developers. A bunch of
people have to investigate the bug. We finally decide that it's
intentional so now the test team has to update their filter scripts to
mark it as intentional. These filters are ad-hoc because there is no
standard format for warnings.
A better way would be to mark it as intentional from the start.
Here, I have marked the beginning and the end of the trace. It's more
tricky for things like lkdtm_FORTIFY_MEM_MEMBER() where the flow doesn't
reach the end of the function. I guess I would print a different
warning for stack traces that can't have a
"Intentional warning finished\n" message at the end.
I haven't actually tested this patch... Daniel, do you have a
list of intentional stack traces we could annotate?
Signed-off-by: Dan Carpenter <dan.carpenter(a)linaro.org>
---
drivers/gpu/drm/tests/drm_rect_test.c | 2 ++
include/kunit/test.h | 3 +++
2 files changed, 5 insertions(+)
diff --git a/drivers/gpu/drm/tests/drm_rect_test.c b/drivers/gpu/drm/tests/drm_rect_test.c
index 76332cd2ead8..367738254493 100644
--- a/drivers/gpu/drm/tests/drm_rect_test.c
+++ b/drivers/gpu/drm/tests/drm_rect_test.c
@@ -409,8 +409,10 @@ static void drm_test_rect_calc_hscale(struct kunit *test)
const struct drm_rect_scale_case *params = test->param_value;
int scaling_factor;
+ START_INTENTIONAL_WARNING();
scaling_factor = drm_rect_calc_hscale(¶ms->src, ¶ms->dst,
params->min_range, params->max_range);
+ END_INTENTIONAL_WARNING();
KUNIT_EXPECT_EQ(test, scaling_factor, params->expected_scaling_factor);
}
diff --git a/include/kunit/test.h b/include/kunit/test.h
index 20ed9f9275c9..1f01d4c81055 100644
--- a/include/kunit/test.h
+++ b/include/kunit/test.h
@@ -337,6 +337,9 @@ void __kunit_test_suites_exit(struct kunit_suite **suites, int num_suites);
void kunit_exec_run_tests(struct kunit_suite_set *suite_set, bool builtin);
void kunit_exec_list_tests(struct kunit_suite_set *suite_set, bool include_attr);
+#define START_INTENTIONAL_WARNING() pr_info("Triggering a stack trace\n")
+#define END_INTENTIONAL_WARNING() pr_info("Intentional warning finished\n")
+
#if IS_BUILTIN(CONFIG_KUNIT)
int kunit_run_all_tests(void);
#else
--
2.42.0
In public cloud scenario, if kdump service works abnormally,
users cannot get vmcore. Without vmcore, user has no idea why the
kernel crashed. Meanwhile, there is no additional information
to find the reason why the kdump service is abnormal.
One way is to obtain console messages through VNC. The drawback
is that VNC is real-time, if user missed the timing to get the VNC
output, the crash needs to be retriggered.
Another way is to enable the console frontend of pstore and record the
console messages to the pstore backend. On the one hand, the console
logs only contain kernel printk logs and does not cover
user-mode print logs. Although we can redirect user-mode logs to the
pmsg frontend provided by pstore, user-mode information related to
booting and kdump service vary from systemd, kdump.sh, and so on which
makes redirection troublesome. So we added a tty frontend and save all
logs of tty driver to the pstore backend.
Another problem is that currently pstore only supports a single backend.
For debugging kdump problems, we hope to save the console logs and tty
logs to the ramoops backend of pstore, as it will not be lost after
rebooting. If the user has enabled another backend, the ramoops backend
will not be registered. To this end, we add the multi-backend function
to support simultaneous registration of multiple backends.
Based on the above changes, we can enable pstore in the crashdump kernel
and save the console logs and tty logs to the ramoops backend of pstore.
After rebooting, we can view the relevant logs by mounting the pstore
file system.
Furthermore, we also modified kexec-tools referring to crash-utils for
reading memory, so that pstore ramoops information can be read without
enabling pstore in first kernel. As we set the address and size of ramoops,
as well as the sizes of console and tty, we can infer the physical address
of console logs and tty logs in memory. Referring to the read method of
crash-utils, the console logs and tty logs are read from the memory,
user can get pstore debug information without affecting the first kernel
at all.
kexec-tools modification can be seen at
https://github.com/shuyuanmen/kexec-tools/blob/main/Add-pstore-segment.patch
Yuanhe Shu (5):
pstore: add tty frontend
pstore: add multi-backends support
pstore: add subdirs for multi-backends
pstore: remove the module parameter "backend"
tools/pstore: update pstore selftests
drivers/tty/n_tty.c | 1 +
fs/pstore/Kconfig | 23 ++
fs/pstore/Makefile | 2 +
fs/pstore/blk.c | 10 +
fs/pstore/ftrace.c | 22 +-
fs/pstore/inode.c | 86 ++++++-
fs/pstore/internal.h | 16 +-
fs/pstore/platform.c | 238 ++++++++++++--------
fs/pstore/pmsg.c | 23 +-
fs/pstore/ram.c | 40 +++-
fs/pstore/tty.c | 56 +++++
fs/pstore/zone.c | 42 +++-
include/linux/pstore.h | 33 +++
include/linux/pstore_blk.h | 3 +
include/linux/pstore_ram.h | 1 +
include/linux/pstore_zone.h | 2 +
include/linux/tty.h | 14 ++
tools/testing/selftests/pstore/common_tests | 4 -
18 files changed, 500 insertions(+), 116 deletions(-)
create mode 100644 fs/pstore/tty.c
--
2.39.3