[PATCH 0/5] selftests/resctrl: Fixes to failing tests

List overview All Threads
Download

newer

older

[PATCH 1/2] kunit: tool: add...

[PATCH net-next v3 0/6] bridge:...

Ilpo Järvinen

11 Sep 2023 11 Sep '23

11:19 a.m.

Fix three issues with resctrl selftests.

The signal handling fix became necessary after the mount/umount fixes.

The other two came up when I ran resctrl selftests across the server fleet in our lab to validate the upcoming CAT test rewrite (the rewrite is not part of this series).

These are developed and should apply cleanly at least on top the benchmark cleanup series (might apply cleanly also w/o the benchmark series, I didn't test).

Ilpo Järvinen (5): selftests/resctrl: Extend signal handler coverage to unmount on receiving signal selftests/resctrl: Remove duplicate feature check from CMT test selftests/resctrl: Refactor feature check to use resource and feature name selftests/resctrl: Fix feature checks selftests/resctrl: Reduce failures due to outliers in MBA/MBM tests

tools/testing/selftests/resctrl/cat_test.c | 8 --- tools/testing/selftests/resctrl/cmt_test.c | 3 - tools/testing/selftests/resctrl/mba_test.c | 2 +- tools/testing/selftests/resctrl/mbm_test.c | 2 +- tools/testing/selftests/resctrl/resctrl.h | 6 +- .../testing/selftests/resctrl/resctrl_tests.c | 37 ++++++++-- tools/testing/selftests/resctrl/resctrl_val.c | 22 +++--- tools/testing/selftests/resctrl/resctrlfs.c | 69 ++++++++----------- 8 files changed, 73 insertions(+), 76 deletions(-)

-- 2.30.2

Show replies by date

Ilpo Järvinen

11 Sep 11 Sep

11:19 a.m.

New subject: [PATCH 1/5] selftests/resctrl: Extend signal handler coverage to unmount on receiving signal

Unmounting resctrl FS has been moved into the per test functions in resctrl_tests.c by commit caddc0fbe495 ("selftests/resctrl: Move resctrl FS mount/umount to higher level"). In case a signal (SIGINT, SIGTERM, or SIGHUP) is received, the running selftest is aborted by ctrlc_handler() which then unmounts resctrl fs before exiting. The current section between signal_handler_register() and signal_handler_unregister(), however, does not cover the entire duration when resctrl FS is mounted.

Move signal_handler_register() and signal_handler_unregister() call into the test functions in resctrl_tests.c to properly unmount resctrl fs. Adjust child process kill() call in ctrlc_handler() to only be invoked if the child was already forked.

Fixes: caddc0fbe495 ("selftests/resctrl: Move resctrl FS mount/umount to higher level") Signed-off-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com Cc: stable@vger.kernel.org --- tools/testing/selftests/resctrl/cat_test.c | 8 ------- .../testing/selftests/resctrl/resctrl_tests.c | 24 +++++++++++++++++++ tools/testing/selftests/resctrl/resctrl_val.c | 22 ++++++++--------- 3 files changed, 34 insertions(+), 20 deletions(-)

diff --git a/tools/testing/selftests/resctrl/cat_test.c b/tools/testing/selftests/resctrl/cat_test.c index 97b87285ab2a..224ba8544d8a 100644 --- a/tools/testing/selftests/resctrl/cat_test.c +++ b/tools/testing/selftests/resctrl/cat_test.c @@ -167,12 +167,6 @@ int cat_perf_miss_val(int cpu_no, int n, char *cache_type) strcpy(param.filename, RESULT_FILE_NAME1); param.num_of_runs = 0; param.cpu_no = sibling_cpu_no; - } else { - ret = signal_handler_register(); - if (ret) { - kill(bm_pid, SIGKILL); - goto out; - } }

remove(param.filename); @@ -209,10 +203,8 @@ int cat_perf_miss_val(int cpu_no, int n, char *cache_type) } close(pipefd[0]); kill(bm_pid, SIGKILL); - signal_handler_unregister(); }

-out: cat_test_cleanup();

return ret; diff --git a/tools/testing/selftests/resctrl/resctrl_tests.c b/tools/testing/selftests/resctrl/resctrl_tests.c index 823672a20a43..3d66fbdc2df3 100644 --- a/tools/testing/selftests/resctrl/resctrl_tests.c +++ b/tools/testing/selftests/resctrl/resctrl_tests.c @@ -73,8 +73,13 @@ static void run_mbm_test(const char * const *benchmark_cmd, int cpu_no)

ksft_print_msg("Starting MBM BW change ...\n");

+ res = signal_handler_register(); + if (res) + return; + res = mount_resctrlfs(); if (res) { + signal_handler_unregister(); ksft_exit_fail_msg("Failed to mount resctrl FS\n"); return; } @@ -91,6 +96,7 @@ static void run_mbm_test(const char * const *benchmark_cmd, int cpu_no)

umount: umount_resctrlfs(); + signal_handler_unregister(); }

static void run_mba_test(const char * const *benchmark_cmd, int cpu_no) @@ -99,8 +105,13 @@ static void run_mba_test(const char * const *benchmark_cmd, int cpu_no)

ksft_print_msg("Starting MBA Schemata change ...\n");

+ res = signal_handler_register(); + if (res) + return; + res = mount_resctrlfs(); if (res) { + signal_handler_unregister(); ksft_exit_fail_msg("Failed to mount resctrl FS\n"); return; } @@ -115,6 +126,7 @@ static void run_mba_test(const char * const *benchmark_cmd, int cpu_no)

umount: umount_resctrlfs(); + signal_handler_unregister(); }

static void run_cmt_test(const char * const *benchmark_cmd, int cpu_no) @@ -123,8 +135,13 @@ static void run_cmt_test(const char * const *benchmark_cmd, int cpu_no)

ksft_print_msg("Starting CMT test ...\n");

+ res = signal_handler_register(); + if (res) + return; + res = mount_resctrlfs(); if (res) { + signal_handler_unregister(); ksft_exit_fail_msg("Failed to mount resctrl FS\n"); return; } @@ -141,6 +158,7 @@ static void run_cmt_test(const char * const *benchmark_cmd, int cpu_no)

umount: umount_resctrlfs(); + signal_handler_unregister(); }

static void run_cat_test(int cpu_no, int no_of_bits) @@ -149,8 +167,13 @@ static void run_cat_test(int cpu_no, int no_of_bits)

ksft_print_msg("Starting CAT test ...\n");

+ res = signal_handler_register(); + if (res) + return; + res = mount_resctrlfs(); if (res) { + signal_handler_unregister(); ksft_exit_fail_msg("Failed to mount resctrl FS\n"); return; } @@ -165,6 +188,7 @@ static void run_cat_test(int cpu_no, int no_of_bits)

umount: umount_resctrlfs(); + signal_handler_unregister(); }

int main(int argc, char **argv) diff --git a/tools/testing/selftests/resctrl/resctrl_val.c b/tools/testing/selftests/resctrl/resctrl_val.c index 51963a6f2186..a9fe61133119 100644 --- a/tools/testing/selftests/resctrl/resctrl_val.c +++ b/tools/testing/selftests/resctrl/resctrl_val.c @@ -468,7 +468,9 @@ pid_t bm_pid, ppid;

void ctrlc_handler(int signum, siginfo_t *info, void *ptr) { - kill(bm_pid, SIGKILL); + /* Only kill child after bm_pid is set after fork() */ + if (bm_pid) + kill(bm_pid, SIGKILL); umount_resctrlfs(); tests_cleanup(); ksft_print_msg("Ending\n\n"); @@ -485,6 +487,8 @@ int signal_handler_register(void) struct sigaction sigact; int ret = 0;

+ bm_pid = 0; + sigact.sa_sigaction = ctrlc_handler; sigemptyset(&sigact.sa_mask); sigact.sa_flags = SA_SIGINFO; @@ -706,10 +710,6 @@ int resctrl_val(const char * const *benchmark_cmd, struct resctrl_val_param *par

ksft_print_msg("Benchmark PID: %d\n", bm_pid);

- ret = signal_handler_register(); - if (ret) - goto out; - /* * The cast removes constness but nothing mutates benchmark_cmd within * the context of this process. At the receiving process, it becomes @@ -721,19 +721,19 @@ int resctrl_val(const char * const *benchmark_cmd, struct resctrl_val_param *par /* Taskset benchmark to specified cpu */ ret = taskset_benchmark(bm_pid, param->cpu_no); if (ret) - goto unregister; + goto out;

/* Write benchmark to specified control&monitoring grp in resctrl FS */ ret = write_bm_pid_to_resctrl(bm_pid, param->ctrlgrp, param->mongrp, resctrl_val); if (ret) - goto unregister; + goto out;

if (!strncmp(resctrl_val, MBM_STR, sizeof(MBM_STR)) || !strncmp(resctrl_val, MBA_STR, sizeof(MBA_STR))) { ret = initialize_mem_bw_imc(); if (ret) - goto unregister; + goto out;

initialize_mem_bw_resctrl(param->ctrlgrp, param->mongrp, param->cpu_no, resctrl_val); @@ -748,7 +748,7 @@ int resctrl_val(const char * const *benchmark_cmd, struct resctrl_val_param *par sizeof(pipe_message)) { perror("# failed reading message from child process"); close(pipefd[0]); - goto unregister; + goto out; } } close(pipefd[0]); @@ -757,7 +757,7 @@ int resctrl_val(const char * const *benchmark_cmd, struct resctrl_val_param *par if (sigqueue(bm_pid, SIGUSR1, value) == -1) { perror("# sigqueue SIGUSR1 to child"); ret = errno; - goto unregister; + goto out; }

/* Give benchmark enough time to fully run */ @@ -786,8 +786,6 @@ int resctrl_val(const char * const *benchmark_cmd, struct resctrl_val_param *par } }

-unregister: - signal_handler_unregister(); out: kill(bm_pid, SIGKILL);

-- 2.30.2

Reinette Chatre

12 Sep 12 Sep

10:06 p.m.

New subject: [PATCH 1/5] selftests/resctrl: Extend signal handler coverage to unmount on receiving signal

Hi Ilpo,

On 9/11/2023 4:19 AM, Ilpo Järvinen wrote:

...

Unmounting resctrl FS has been moved into the per test functions in resctrl_tests.c by commit caddc0fbe495 ("selftests/resctrl: Move resctrl FS mount/umount to higher level"). In case a signal (SIGINT, SIGTERM, or SIGHUP) is received, the running selftest is aborted by ctrlc_handler() which then unmounts resctrl fs before exiting. The current section between signal_handler_register() and signal_handler_unregister(), however, does not cover the entire duration when resctrl FS is mounted.

Move signal_handler_register() and signal_handler_unregister() call into the test functions in resctrl_tests.c to properly unmount resctrl fs. Adjust child process kill() call in ctrlc_handler() to only be invoked if the child was already forked.

Thank you for catching this.

...

Fixes: caddc0fbe495 ("selftests/resctrl: Move resctrl FS mount/umount to higher level") Signed-off-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com Cc: stable@vger.kernel.org

tools/testing/selftests/resctrl/cat_test.c | 8 ------- .../testing/selftests/resctrl/resctrl_tests.c | 24 +++++++++++++++++++ tools/testing/selftests/resctrl/resctrl_val.c | 22 ++++++++--------- 3 files changed, 34 insertions(+), 20 deletions(-)

diff --git a/tools/testing/selftests/resctrl/cat_test.c b/tools/testing/selftests/resctrl/cat_test.c index 97b87285ab2a..224ba8544d8a 100644 --- a/tools/testing/selftests/resctrl/cat_test.c +++ b/tools/testing/selftests/resctrl/cat_test.c @@ -167,12 +167,6 @@ int cat_perf_miss_val(int cpu_no, int n, char *cache_type) strcpy(param.filename, RESULT_FILE_NAME1); param.num_of_runs = 0; param.cpu_no = sibling_cpu_no;
} else {
ret = signal_handler_register();
if (ret) {
	kill(bm_pid, SIGKILL);
	goto out;
}
}
remove(param.filename); @@ -209,10 +203,8 @@ int cat_perf_miss_val(int cpu_no, int n, char *cache_type) } close(pipefd[0]); kill(bm_pid, SIGKILL);
signal_handler_unregister();
}
-out: cat_test_cleanup(); return ret; diff --git a/tools/testing/selftests/resctrl/resctrl_tests.c b/tools/testing/selftests/resctrl/resctrl_tests.c index 823672a20a43..3d66fbdc2df3 100644 --- a/tools/testing/selftests/resctrl/resctrl_tests.c +++ b/tools/testing/selftests/resctrl/resctrl_tests.c @@ -73,8 +73,13 @@ static void run_mbm_test(const char * const *benchmark_cmd, int cpu_no) ksft_print_msg("Starting MBM BW change ...\n");
res = signal_handler_register();

if (res)
return;
res = mount_resctrlfs(); if (res) {
signal_handler_unregister();
ksft_exit_fail_msg("Failed to mount resctrl FS\n"); return; }
@@ -91,6 +96,7 @@ static void run_mbm_test(const char * const *benchmark_cmd, int cpu_no) umount: umount_resctrlfs();

signal_handler_unregister();

} static void run_mba_test(const char * const *benchmark_cmd, int cpu_no) @@ -99,8 +105,13 @@ static void run_mba_test(const char * const *benchmark_cmd, int cpu_no) ksft_print_msg("Starting MBA Schemata change ...\n");
res = signal_handler_register();

if (res)
return;
res = mount_resctrlfs(); if (res) {
signal_handler_unregister();
ksft_exit_fail_msg("Failed to mount resctrl FS\n"); return; }
@@ -115,6 +126,7 @@ static void run_mba_test(const char * const *benchmark_cmd, int cpu_no) umount: umount_resctrlfs();

signal_handler_unregister();

}

This adds more duplicated code for every test. Have you considered a single test setup function that can be used to mount resctrl FS and setup the signal handler paired with a single test teardown function?

Reinette

Ilpo Järvinen

13 Sep 13 Sep

10:01 a.m.

New subject: [PATCH 1/5] selftests/resctrl: Extend signal handler coverage to unmount on receiving signal

On Tue, 12 Sep 2023, Reinette Chatre wrote:

...

On 9/11/2023 4:19 AM, Ilpo Järvinen wrote:

...
Unmounting resctrl FS has been moved into the per test functions in resctrl_tests.c by commit caddc0fbe495 ("selftests/resctrl: Move resctrl FS mount/umount to higher level"). In case a signal (SIGINT, SIGTERM, or SIGHUP) is received, the running selftest is aborted by ctrlc_handler() which then unmounts resctrl fs before exiting. The current section between signal_handler_register() and signal_handler_unregister(), however, does not cover the entire duration when resctrl FS is mounted.

Move signal_handler_register() and signal_handler_unregister() call into the test functions in resctrl_tests.c to properly unmount resctrl fs. Adjust child process kill() call in ctrlc_handler() to only be invoked if the child was already forked.

Thank you for catching this.

...
Fixes: caddc0fbe495 ("selftests/resctrl: Move resctrl FS mount/umount to higher level") Signed-off-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com Cc: stable@vger.kernel.org

tools/testing/selftests/resctrl/cat_test.c | 8 ------- .../testing/selftests/resctrl/resctrl_tests.c | 24 +++++++++++++++++++ tools/testing/selftests/resctrl/resctrl_val.c | 22 ++++++++--------- 3 files changed, 34 insertions(+), 20 deletions(-)

diff --git a/tools/testing/selftests/resctrl/cat_test.c b/tools/testing/selftests/resctrl/cat_test.c index 97b87285ab2a..224ba8544d8a 100644 --- a/tools/testing/selftests/resctrl/cat_test.c +++ b/tools/testing/selftests/resctrl/cat_test.c @@ -167,12 +167,6 @@ int cat_perf_miss_val(int cpu_no, int n, char *cache_type) strcpy(param.filename, RESULT_FILE_NAME1); param.num_of_runs = 0; param.cpu_no = sibling_cpu_no;
} else {
ret = signal_handler_register();
if (ret) {
	kill(bm_pid, SIGKILL);
	goto out;
}
}
remove(param.filename); @@ -209,10 +203,8 @@ int cat_perf_miss_val(int cpu_no, int n, char *cache_type) } close(pipefd[0]); kill(bm_pid, SIGKILL);
signal_handler_unregister();
}
-out: cat_test_cleanup(); return ret; diff --git a/tools/testing/selftests/resctrl/resctrl_tests.c b/tools/testing/selftests/resctrl/resctrl_tests.c index 823672a20a43..3d66fbdc2df3 100644 --- a/tools/testing/selftests/resctrl/resctrl_tests.c +++ b/tools/testing/selftests/resctrl/resctrl_tests.c @@ -73,8 +73,13 @@ static void run_mbm_test(const char * const *benchmark_cmd, int cpu_no) ksft_print_msg("Starting MBM BW change ...\n");
res = signal_handler_register();

if (res)
return;
res = mount_resctrlfs(); if (res) {
signal_handler_unregister();
ksft_exit_fail_msg("Failed to mount resctrl FS\n"); return; }
@@ -91,6 +96,7 @@ static void run_mbm_test(const char * const *benchmark_cmd, int cpu_no) umount: umount_resctrlfs();

signal_handler_unregister();

} static void run_mba_test(const char * const *benchmark_cmd, int cpu_no) @@ -99,8 +105,13 @@ static void run_mba_test(const char * const *benchmark_cmd, int cpu_no) ksft_print_msg("Starting MBA Schemata change ...\n");
res = signal_handler_register();

if (res)
return;
res = mount_resctrlfs(); if (res) {
signal_handler_unregister();
ksft_exit_fail_msg("Failed to mount resctrl FS\n"); return; }
@@ -115,6 +126,7 @@ static void run_mba_test(const char * const *benchmark_cmd, int cpu_no) umount: umount_resctrlfs();

signal_handler_unregister();

}
This adds more duplicated code for every test. Have you considered a single test setup function that can be used to mount resctrl FS and setup the signal handler paired with a single test teardown function?

Yes. Consolidating all these is among my not-yet submitted patches. I just had to do a backport-friendly Fixes patch first for this.

-- i.

Reinette Chatre

8:58 p.m.

New subject: [PATCH 1/5] selftests/resctrl: Extend signal handler coverage to unmount on receiving signal

Hi Ilpo,

On 9/13/2023 3:01 AM, Ilpo Järvinen wrote:

...

On Tue, 12 Sep 2023, Reinette Chatre wrote:

...
On 9/11/2023 4:19 AM, Ilpo Järvinen wrote:

...
Unmounting resctrl FS has been moved into the per test functions in resctrl_tests.c by commit caddc0fbe495 ("selftests/resctrl: Move resctrl FS mount/umount to higher level"). In case a signal (SIGINT, SIGTERM, or SIGHUP) is received, the running selftest is aborted by ctrlc_handler() which then unmounts resctrl fs before exiting. The current section between signal_handler_register() and signal_handler_unregister(), however, does not cover the entire duration when resctrl FS is mounted.

Move signal_handler_register() and signal_handler_unregister() call into the test functions in resctrl_tests.c to properly unmount resctrl fs. Adjust child process kill() call in ctrlc_handler() to only be invoked if the child was already forked.

Thank you for catching this.

...
Fixes: caddc0fbe495 ("selftests/resctrl: Move resctrl FS mount/umount to higher level") Signed-off-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com Cc: stable@vger.kernel.org

tools/testing/selftests/resctrl/cat_test.c | 8 ------- .../testing/selftests/resctrl/resctrl_tests.c | 24 +++++++++++++++++++ tools/testing/selftests/resctrl/resctrl_val.c | 22 ++++++++--------- 3 files changed, 34 insertions(+), 20 deletions(-)

diff --git a/tools/testing/selftests/resctrl/cat_test.c b/tools/testing/selftests/resctrl/cat_test.c index 97b87285ab2a..224ba8544d8a 100644 --- a/tools/testing/selftests/resctrl/cat_test.c +++ b/tools/testing/selftests/resctrl/cat_test.c @@ -167,12 +167,6 @@ int cat_perf_miss_val(int cpu_no, int n, char *cache_type) strcpy(param.filename, RESULT_FILE_NAME1); param.num_of_runs = 0; param.cpu_no = sibling_cpu_no;
} else {
ret = signal_handler_register();
if (ret) {
	kill(bm_pid, SIGKILL);
	goto out;
}
}
remove(param.filename); @@ -209,10 +203,8 @@ int cat_perf_miss_val(int cpu_no, int n, char *cache_type) } close(pipefd[0]); kill(bm_pid, SIGKILL);
signal_handler_unregister();
}
-out: cat_test_cleanup(); return ret; diff --git a/tools/testing/selftests/resctrl/resctrl_tests.c b/tools/testing/selftests/resctrl/resctrl_tests.c index 823672a20a43..3d66fbdc2df3 100644 --- a/tools/testing/selftests/resctrl/resctrl_tests.c +++ b/tools/testing/selftests/resctrl/resctrl_tests.c @@ -73,8 +73,13 @@ static void run_mbm_test(const char * const *benchmark_cmd, int cpu_no) ksft_print_msg("Starting MBM BW change ...\n");
res = signal_handler_register();

if (res)
return;
res = mount_resctrlfs(); if (res) {
signal_handler_unregister();
ksft_exit_fail_msg("Failed to mount resctrl FS\n"); return; }
@@ -91,6 +96,7 @@ static void run_mbm_test(const char * const *benchmark_cmd, int cpu_no) umount: umount_resctrlfs();

signal_handler_unregister();

} static void run_mba_test(const char * const *benchmark_cmd, int cpu_no) @@ -99,8 +105,13 @@ static void run_mba_test(const char * const *benchmark_cmd, int cpu_no) ksft_print_msg("Starting MBA Schemata change ...\n");
res = signal_handler_register();

if (res)
return;
res = mount_resctrlfs(); if (res) {
signal_handler_unregister();
ksft_exit_fail_msg("Failed to mount resctrl FS\n"); return; }
@@ -115,6 +126,7 @@ static void run_mba_test(const char * const *benchmark_cmd, int cpu_no) umount: umount_resctrlfs();

signal_handler_unregister();

}
This adds more duplicated code for every test. Have you considered a single test setup function that can be used to mount resctrl FS and setup the signal handler paired with a single test teardown function?
Yes. Consolidating all these is among my not-yet submitted patches. I just had to do a backport-friendly Fixes patch first for this.

Could you please help me understand how the duplicate calls are more backport friendly?

Reinette

Ilpo Järvinen

14 Sep 14 Sep

10:16 a.m.

New subject: [PATCH 1/5] selftests/resctrl: Extend signal handler coverage to unmount on receiving signal

On Wed, 13 Sep 2023, Reinette Chatre wrote:

...

On 9/13/2023 3:01 AM, Ilpo Järvinen wrote:

...
On Tue, 12 Sep 2023, Reinette Chatre wrote:

...
On 9/11/2023 4:19 AM, Ilpo Järvinen wrote:

...
Unmounting resctrl FS has been moved into the per test functions in resctrl_tests.c by commit caddc0fbe495 ("selftests/resctrl: Move resctrl FS mount/umount to higher level"). In case a signal (SIGINT, SIGTERM, or SIGHUP) is received, the running selftest is aborted by ctrlc_handler() which then unmounts resctrl fs before exiting. The current section between signal_handler_register() and signal_handler_unregister(), however, does not cover the entire duration when resctrl FS is mounted.

Move signal_handler_register() and signal_handler_unregister() call into the test functions in resctrl_tests.c to properly unmount resctrl fs. Adjust child process kill() call in ctrlc_handler() to only be invoked if the child was already forked.

Thank you for catching this.

...
Fixes: caddc0fbe495 ("selftests/resctrl: Move resctrl FS mount/umount to higher level") Signed-off-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com Cc: stable@vger.kernel.org

tools/testing/selftests/resctrl/cat_test.c | 8 ------- .../testing/selftests/resctrl/resctrl_tests.c | 24 +++++++++++++++++++ tools/testing/selftests/resctrl/resctrl_val.c | 22 ++++++++--------- 3 files changed, 34 insertions(+), 20 deletions(-)

diff --git a/tools/testing/selftests/resctrl/cat_test.c b/tools/testing/selftests/resctrl/cat_test.c index 97b87285ab2a..224ba8544d8a 100644 --- a/tools/testing/selftests/resctrl/cat_test.c +++ b/tools/testing/selftests/resctrl/cat_test.c @@ -167,12 +167,6 @@ int cat_perf_miss_val(int cpu_no, int n, char *cache_type) strcpy(param.filename, RESULT_FILE_NAME1); param.num_of_runs = 0; param.cpu_no = sibling_cpu_no;
} else {
ret = signal_handler_register();
if (ret) {
	kill(bm_pid, SIGKILL);
	goto out;
}
}
remove(param.filename); @@ -209,10 +203,8 @@ int cat_perf_miss_val(int cpu_no, int n, char *cache_type) } close(pipefd[0]); kill(bm_pid, SIGKILL);
signal_handler_unregister();
}
-out: cat_test_cleanup(); return ret; diff --git a/tools/testing/selftests/resctrl/resctrl_tests.c b/tools/testing/selftests/resctrl/resctrl_tests.c index 823672a20a43..3d66fbdc2df3 100644 --- a/tools/testing/selftests/resctrl/resctrl_tests.c +++ b/tools/testing/selftests/resctrl/resctrl_tests.c @@ -73,8 +73,13 @@ static void run_mbm_test(const char * const *benchmark_cmd, int cpu_no) ksft_print_msg("Starting MBM BW change ...\n");
res = signal_handler_register();

if (res)
return;
res = mount_resctrlfs(); if (res) {
signal_handler_unregister();
ksft_exit_fail_msg("Failed to mount resctrl FS\n"); return; }
@@ -91,6 +96,7 @@ static void run_mbm_test(const char * const *benchmark_cmd, int cpu_no) umount: umount_resctrlfs();

signal_handler_unregister();

} static void run_mba_test(const char * const *benchmark_cmd, int cpu_no) @@ -99,8 +105,13 @@ static void run_mba_test(const char * const *benchmark_cmd, int cpu_no) ksft_print_msg("Starting MBA Schemata change ...\n");
res = signal_handler_register();

if (res)
return;
res = mount_resctrlfs(); if (res) {
signal_handler_unregister();
ksft_exit_fail_msg("Failed to mount resctrl FS\n"); return; }
@@ -115,6 +126,7 @@ static void run_mba_test(const char * const *benchmark_cmd, int cpu_no) umount: umount_resctrlfs();

signal_handler_unregister();

}
This adds more duplicated code for every test. Have you considered a single test setup function that can be used to mount resctrl FS and setup the signal handler paired with a single test teardown function?
Yes. Consolidating all these is among my not-yet submitted patches. I just had to do a backport-friendly Fixes patch first for this.
Could you please help me understand how the duplicate calls are more backport friendly?

Hi,

It's simply because the refactoring that has to be done to be able to introduce the generalized test framework is much more invasive and far reaching than this patch. Essentially, all the call signatures of the test functions need to match and the feature checks need to be done in new per test functions too. This is the diffstat of those changes alone:

tools/testing/selftests/resctrl/cat_test.c | 21 +++-- tools/testing/selftests/resctrl/cmt_test.c | 26 +++-- tools/testing/selftests/resctrl/mba_test.c | 20 +++- tools/testing/selftests/resctrl/mbm_test.c | 20 +++- tools/testing/selftests/resctrl/resctrl.h | 43 ++++++++- tools/testing/selftests/resctrl/resctrl_tests.c | 220 +++++++++++++++---------------------------- tools/testing/selftests/resctrl/resctrlfs.c | 5 +

(tools/testing/selftests/resctrl/resctrl_tests.c --- part would be slightly less if I'd reorder this patch but that only 24 lines off as per diffstat of this patch).

But that's not all.... To be able to push the generalized test framework to stable, you need to also count in the benchmark cmd changes which worked towards making the call signatures identical. So here's the diffstat for that series for quick reference:

tools/testing/selftests/resctrl/cache.c | 5 +- tools/testing/selftests/resctrl/cat_test.c | 13 +-- tools/testing/selftests/resctrl/cmt_test.c | 34 ++++-- tools/testing/selftests/resctrl/mba_test.c | 4 +- tools/testing/selftests/resctrl/mbm_test.c | 7 +- tools/testing/selftests/resctrl/resctrl.h | 16 +-- .../testing/selftests/resctrl/resctrl_tests.c | 100 ++++++++---------- tools/testing/selftests/resctrl/resctrl_val.c | 10 +-

That's ~500 lines changed vs ~50 so it's a magnitude worse and much less localized.

And rest assured, I did not like introducing the duplicated calls any more than you do (I did not write the generalized test framework for nothing, after all) but the way taken in this patch seemed the most reasonable option under these circumstances.

-- i.

Reinette Chatre

3:04 p.m.

New subject: [PATCH 1/5] selftests/resctrl: Extend signal handler coverage to unmount on receiving signal

Hi Ilpo,

On 9/14/2023 3:16 AM, Ilpo Järvinen wrote:

...

On Wed, 13 Sep 2023, Reinette Chatre wrote:

...
On 9/13/2023 3:01 AM, Ilpo Järvinen wrote:

...
On Tue, 12 Sep 2023, Reinette Chatre wrote:

...
On 9/11/2023 4:19 AM, Ilpo Järvinen wrote:

...
Unmounting resctrl FS has been moved into the per test functions in resctrl_tests.c by commit caddc0fbe495 ("selftests/resctrl: Move resctrl FS mount/umount to higher level"). In case a signal (SIGINT, SIGTERM, or SIGHUP) is received, the running selftest is aborted by ctrlc_handler() which then unmounts resctrl fs before exiting. The current section between signal_handler_register() and signal_handler_unregister(), however, does not cover the entire duration when resctrl FS is mounted.

Move signal_handler_register() and signal_handler_unregister() call into the test functions in resctrl_tests.c to properly unmount resctrl fs. Adjust child process kill() call in ctrlc_handler() to only be invoked if the child was already forked.

Thank you for catching this.

...
Fixes: caddc0fbe495 ("selftests/resctrl: Move resctrl FS mount/umount to higher level") Signed-off-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com Cc: stable@vger.kernel.org

tools/testing/selftests/resctrl/cat_test.c | 8 ------- .../testing/selftests/resctrl/resctrl_tests.c | 24 +++++++++++++++++++ tools/testing/selftests/resctrl/resctrl_val.c | 22 ++++++++--------- 3 files changed, 34 insertions(+), 20 deletions(-)

diff --git a/tools/testing/selftests/resctrl/cat_test.c b/tools/testing/selftests/resctrl/cat_test.c index 97b87285ab2a..224ba8544d8a 100644 --- a/tools/testing/selftests/resctrl/cat_test.c +++ b/tools/testing/selftests/resctrl/cat_test.c @@ -167,12 +167,6 @@ int cat_perf_miss_val(int cpu_no, int n, char *cache_type) strcpy(param.filename, RESULT_FILE_NAME1); param.num_of_runs = 0; param.cpu_no = sibling_cpu_no;
} else {
ret = signal_handler_register();
if (ret) {
	kill(bm_pid, SIGKILL);
	goto out;
}
}
remove(param.filename); @@ -209,10 +203,8 @@ int cat_perf_miss_val(int cpu_no, int n, char *cache_type) } close(pipefd[0]); kill(bm_pid, SIGKILL);
signal_handler_unregister();
}
-out: cat_test_cleanup(); return ret; diff --git a/tools/testing/selftests/resctrl/resctrl_tests.c b/tools/testing/selftests/resctrl/resctrl_tests.c index 823672a20a43..3d66fbdc2df3 100644 --- a/tools/testing/selftests/resctrl/resctrl_tests.c +++ b/tools/testing/selftests/resctrl/resctrl_tests.c @@ -73,8 +73,13 @@ static void run_mbm_test(const char * const *benchmark_cmd, int cpu_no) ksft_print_msg("Starting MBM BW change ...\n");
res = signal_handler_register();

if (res)
return;
res = mount_resctrlfs(); if (res) {
signal_handler_unregister();
ksft_exit_fail_msg("Failed to mount resctrl FS\n"); return; }
@@ -91,6 +96,7 @@ static void run_mbm_test(const char * const *benchmark_cmd, int cpu_no) umount: umount_resctrlfs();

signal_handler_unregister();

} static void run_mba_test(const char * const *benchmark_cmd, int cpu_no) @@ -99,8 +105,13 @@ static void run_mba_test(const char * const *benchmark_cmd, int cpu_no) ksft_print_msg("Starting MBA Schemata change ...\n");
res = signal_handler_register();

if (res)
return;
res = mount_resctrlfs(); if (res) {
signal_handler_unregister();
ksft_exit_fail_msg("Failed to mount resctrl FS\n"); return; }
@@ -115,6 +126,7 @@ static void run_mba_test(const char * const *benchmark_cmd, int cpu_no) umount: umount_resctrlfs();

signal_handler_unregister();

}
This adds more duplicated code for every test. Have you considered a single test setup function that can be used to mount resctrl FS and setup the signal handler paired with a single test teardown function?
Yes. Consolidating all these is among my not-yet submitted patches. I just had to do a backport-friendly Fixes patch first for this.
Could you please help me understand how the duplicate calls are more backport friendly?
Hi,

It's simply because the refactoring that has to be done to be able to introduce the generalized test framework is much more invasive and far reaching than this patch. Essentially, all the call signatures of the test functions need to match and the feature checks need to be done in new per test functions too. This is the diffstat of those changes alone:

tools/testing/selftests/resctrl/cat_test.c | 21 +++-- tools/testing/selftests/resctrl/cmt_test.c | 26 +++-- tools/testing/selftests/resctrl/mba_test.c | 20 +++- tools/testing/selftests/resctrl/mbm_test.c | 20 +++- tools/testing/selftests/resctrl/resctrl.h | 43 ++++++++- tools/testing/selftests/resctrl/resctrl_tests.c | 220 +++++++++++++++---------------------------- tools/testing/selftests/resctrl/resctrlfs.c | 5 +

(tools/testing/selftests/resctrl/resctrl_tests.c --- part would be slightly less if I'd reorder this patch but that only 24 lines off as per diffstat of this patch).

But that's not all.... To be able to push the generalized test framework to stable, you need to also count in the benchmark cmd changes which worked towards making the call signatures identical. So here's the diffstat for that series for quick reference:

tools/testing/selftests/resctrl/cache.c | 5 +- tools/testing/selftests/resctrl/cat_test.c | 13 +-- tools/testing/selftests/resctrl/cmt_test.c | 34 ++++-- tools/testing/selftests/resctrl/mba_test.c | 4 +- tools/testing/selftests/resctrl/mbm_test.c | 7 +- tools/testing/selftests/resctrl/resctrl.h | 16 +-- .../testing/selftests/resctrl/resctrl_tests.c | 100 ++++++++---------- tools/testing/selftests/resctrl/resctrl_val.c | 10 +-

That's ~500 lines changed vs ~50 so it's a magnitude worse and much less localized.

And rest assured, I did not like introducing the duplicated calls any more than you do (I did not write the generalized test framework for nothing, after all) but the way taken in this patch seemed the most reasonable option under these circumstances.

hmmm ... I did not expect that a total refactoring would be needed.

I was thinking about a change from this:

testX(...) { res = signal_handler_register(); /* error handling */ res = mount_resctrlfs(); /* error handling */ /* test */

unmount_resctrlfs(); signal_handler_register();

}

to this:

int test_setup(...) { res = signal_handler_register(); /* error handling */ res = mount_resctrlfs(); /* error handling */ }

void test_cleanup(...) { unmount_resctrlfs(); signal_handler_register(); }

testX(...) {

res = test_setup(..); /* error handling */

/* test */

test_cleanup(); }

I expect this to also support the bigger refactoring.

Reinette

Ilpo Järvinen

5:05 p.m.

New subject: [PATCH 1/5] selftests/resctrl: Extend signal handler coverage to unmount on receiving signal

On Thu, 14 Sep 2023, Reinette Chatre wrote:

...

On 9/14/2023 3:16 AM, Ilpo Järvinen wrote:

...
On Wed, 13 Sep 2023, Reinette Chatre wrote:

...
On 9/13/2023 3:01 AM, Ilpo Järvinen wrote:

...
On Tue, 12 Sep 2023, Reinette Chatre wrote:

...
On 9/11/2023 4:19 AM, Ilpo Järvinen wrote:

...
Unmounting resctrl FS has been moved into the per test functions in resctrl_tests.c by commit caddc0fbe495 ("selftests/resctrl: Move resctrl FS mount/umount to higher level"). In case a signal (SIGINT, SIGTERM, or SIGHUP) is received, the running selftest is aborted by ctrlc_handler() which then unmounts resctrl fs before exiting. The current section between signal_handler_register() and signal_handler_unregister(), however, does not cover the entire duration when resctrl FS is mounted.

Move signal_handler_register() and signal_handler_unregister() call into the test functions in resctrl_tests.c to properly unmount resctrl fs. Adjust child process kill() call in ctrlc_handler() to only be invoked if the child was already forked.

Thank you for catching this.

...
Fixes: caddc0fbe495 ("selftests/resctrl: Move resctrl FS mount/umount to higher level") Signed-off-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com Cc: stable@vger.kernel.org

tools/testing/selftests/resctrl/cat_test.c | 8 ------- .../testing/selftests/resctrl/resctrl_tests.c | 24 +++++++++++++++++++ tools/testing/selftests/resctrl/resctrl_val.c | 22 ++++++++--------- 3 files changed, 34 insertions(+), 20 deletions(-)

diff --git a/tools/testing/selftests/resctrl/cat_test.c b/tools/testing/selftests/resctrl/cat_test.c index 97b87285ab2a..224ba8544d8a 100644 --- a/tools/testing/selftests/resctrl/cat_test.c +++ b/tools/testing/selftests/resctrl/cat_test.c @@ -167,12 +167,6 @@ int cat_perf_miss_val(int cpu_no, int n, char *cache_type) strcpy(param.filename, RESULT_FILE_NAME1); param.num_of_runs = 0; param.cpu_no = sibling_cpu_no;
} else {
ret = signal_handler_register();
if (ret) {
	kill(bm_pid, SIGKILL);
	goto out;
}
}
remove(param.filename); @@ -209,10 +203,8 @@ int cat_perf_miss_val(int cpu_no, int n, char *cache_type) } close(pipefd[0]); kill(bm_pid, SIGKILL);
signal_handler_unregister();
}
-out: cat_test_cleanup(); return ret; diff --git a/tools/testing/selftests/resctrl/resctrl_tests.c b/tools/testing/selftests/resctrl/resctrl_tests.c index 823672a20a43..3d66fbdc2df3 100644 --- a/tools/testing/selftests/resctrl/resctrl_tests.c +++ b/tools/testing/selftests/resctrl/resctrl_tests.c @@ -73,8 +73,13 @@ static void run_mbm_test(const char * const *benchmark_cmd, int cpu_no) ksft_print_msg("Starting MBM BW change ...\n");
res = signal_handler_register();

if (res)
return;
res = mount_resctrlfs(); if (res) {
signal_handler_unregister();
ksft_exit_fail_msg("Failed to mount resctrl FS\n"); return; }
@@ -91,6 +96,7 @@ static void run_mbm_test(const char * const *benchmark_cmd, int cpu_no) umount: umount_resctrlfs();

signal_handler_unregister();

} static void run_mba_test(const char * const *benchmark_cmd, int cpu_no) @@ -99,8 +105,13 @@ static void run_mba_test(const char * const *benchmark_cmd, int cpu_no) ksft_print_msg("Starting MBA Schemata change ...\n");
res = signal_handler_register();

if (res)
return;
res = mount_resctrlfs(); if (res) {
signal_handler_unregister();
ksft_exit_fail_msg("Failed to mount resctrl FS\n"); return; }
@@ -115,6 +126,7 @@ static void run_mba_test(const char * const *benchmark_cmd, int cpu_no) umount: umount_resctrlfs();

signal_handler_unregister();

}
This adds more duplicated code for every test. Have you considered a single test setup function that can be used to mount resctrl FS and setup the signal handler paired with a single test teardown function?
Yes. Consolidating all these is among my not-yet submitted patches. I just had to do a backport-friendly Fixes patch first for this.
Could you please help me understand how the duplicate calls are more backport friendly?
Hi,

It's simply because the refactoring that has to be done to be able to introduce the generalized test framework is much more invasive and far reaching than this patch. Essentially, all the call signatures of the test functions need to match and the feature checks need to be done in new per test functions too. This is the diffstat of those changes alone:

tools/testing/selftests/resctrl/cat_test.c | 21 +++-- tools/testing/selftests/resctrl/cmt_test.c | 26 +++-- tools/testing/selftests/resctrl/mba_test.c | 20 +++- tools/testing/selftests/resctrl/mbm_test.c | 20 +++- tools/testing/selftests/resctrl/resctrl.h | 43 ++++++++- tools/testing/selftests/resctrl/resctrl_tests.c | 220 +++++++++++++++---------------------------- tools/testing/selftests/resctrl/resctrlfs.c | 5 +

(tools/testing/selftests/resctrl/resctrl_tests.c --- part would be slightly less if I'd reorder this patch but that only 24 lines off as per diffstat of this patch).

But that's not all.... To be able to push the generalized test framework to stable, you need to also count in the benchmark cmd changes which worked towards making the call signatures identical. So here's the diffstat for that series for quick reference:

tools/testing/selftests/resctrl/cache.c | 5 +- tools/testing/selftests/resctrl/cat_test.c | 13 +-- tools/testing/selftests/resctrl/cmt_test.c | 34 ++++-- tools/testing/selftests/resctrl/mba_test.c | 4 +- tools/testing/selftests/resctrl/mbm_test.c | 7 +- tools/testing/selftests/resctrl/resctrl.h | 16 +-- .../testing/selftests/resctrl/resctrl_tests.c | 100 ++++++++---------- tools/testing/selftests/resctrl/resctrl_val.c | 10 +-

That's ~500 lines changed vs ~50 so it's a magnitude worse and much less localized.

And rest assured, I did not like introducing the duplicated calls any more than you do (I did not write the generalized test framework for nothing, after all) but the way taken in this patch seemed the most reasonable option under these circumstances.
hmmm ... I did not expect that a total refactoring would be needed.

I was thinking about a change from this:

testX(...) { res = signal_handler_register(); /* error handling */ res = mount_resctrlfs(); /* error handling */ /* test */
unmount_resctrlfs();
signal_handler_register();
}

to this:

int test_setup(...) { res = signal_handler_register(); /* error handling */ res = mount_resctrlfs(); /* error handling */ }

void test_cleanup(...) { unmount_resctrlfs(); signal_handler_register(); }

testX(...) {
res = test_setup(..);
/* error handling */

/* test */

test_cleanup();
}

I expect this to also support the bigger refactoring.

Okay, I'll do so then.

However, having already written the generic run_single_test() function that is part of the generic test framework, I definitely don't feel those helpers would be that helpful for it. It more feels like they'd make the flow less obvious by adding those two extra calls there but that's of course matter of taste.

-- i.

Reinette Chatre

5:29 p.m.

New subject: [PATCH 1/5] selftests/resctrl: Extend signal handler coverage to unmount on receiving signal

Hi Ilpo,

On 9/14/2023 10:05 AM, Ilpo Järvinen wrote:

...

On Thu, 14 Sep 2023, Reinette Chatre wrote:

...
On 9/14/2023 3:16 AM, Ilpo Järvinen wrote:

...
On Wed, 13 Sep 2023, Reinette Chatre wrote:

...
On 9/13/2023 3:01 AM, Ilpo Järvinen wrote:

...
On Tue, 12 Sep 2023, Reinette Chatre wrote:

...
On 9/11/2023 4:19 AM, Ilpo Järvinen wrote: > Unmounting resctrl FS has been moved into the per test functions in > resctrl_tests.c by commit caddc0fbe495 ("selftests/resctrl: Move > resctrl FS mount/umount to higher level"). In case a signal (SIGINT, > SIGTERM, or SIGHUP) is received, the running selftest is aborted by > ctrlc_handler() which then unmounts resctrl fs before exiting. The > current section between signal_handler_register() and > signal_handler_unregister(), however, does not cover the entire > duration when resctrl FS is mounted. > > Move signal_handler_register() and signal_handler_unregister() call > into the test functions in resctrl_tests.c to properly unmount resctrl > fs. Adjust child process kill() call in ctrlc_handler() to only be > invoked if the child was already forked.

Thank you for catching this.

> > Fixes: caddc0fbe495 ("selftests/resctrl: Move resctrl FS mount/umount to higher level") > Signed-off-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com > Cc: stable@vger.kernel.org > --- > tools/testing/selftests/resctrl/cat_test.c | 8 ------- > .../testing/selftests/resctrl/resctrl_tests.c | 24 +++++++++++++++++++ > tools/testing/selftests/resctrl/resctrl_val.c | 22 ++++++++--------- > 3 files changed, 34 insertions(+), 20 deletions(-) > > diff --git a/tools/testing/selftests/resctrl/cat_test.c b/tools/testing/selftests/resctrl/cat_test.c > index 97b87285ab2a..224ba8544d8a 100644 > --- a/tools/testing/selftests/resctrl/cat_test.c > +++ b/tools/testing/selftests/resctrl/cat_test.c > @@ -167,12 +167,6 @@ int cat_perf_miss_val(int cpu_no, int n, char *cache_type) > strcpy(param.filename, RESULT_FILE_NAME1); > param.num_of_runs = 0; > param.cpu_no = sibling_cpu_no; > - } else { > - ret = signal_handler_register(); > - if (ret) { > - kill(bm_pid, SIGKILL); > - goto out; > - } > } > > remove(param.filename); > @@ -209,10 +203,8 @@ int cat_perf_miss_val(int cpu_no, int n, char *cache_type) > } > close(pipefd[0]); > kill(bm_pid, SIGKILL); > - signal_handler_unregister(); > } > > -out: > cat_test_cleanup(); > > return ret; > diff --git a/tools/testing/selftests/resctrl/resctrl_tests.c b/tools/testing/selftests/resctrl/resctrl_tests.c > index 823672a20a43..3d66fbdc2df3 100644 > --- a/tools/testing/selftests/resctrl/resctrl_tests.c > +++ b/tools/testing/selftests/resctrl/resctrl_tests.c > @@ -73,8 +73,13 @@ static void run_mbm_test(const char * const *benchmark_cmd, int cpu_no) > > ksft_print_msg("Starting MBM BW change ...\n"); > > + res = signal_handler_register(); > + if (res) > + return; > + > res = mount_resctrlfs(); > if (res) { > + signal_handler_unregister(); > ksft_exit_fail_msg("Failed to mount resctrl FS\n"); > return; > } > @@ -91,6 +96,7 @@ static void run_mbm_test(const char * const *benchmark_cmd, int cpu_no) > > umount: > umount_resctrlfs(); > + signal_handler_unregister(); > } > > static void run_mba_test(const char * const *benchmark_cmd, int cpu_no) > @@ -99,8 +105,13 @@ static void run_mba_test(const char * const *benchmark_cmd, int cpu_no) > > ksft_print_msg("Starting MBA Schemata change ...\n"); > > + res = signal_handler_register(); > + if (res) > + return; > + > res = mount_resctrlfs(); > if (res) { > + signal_handler_unregister(); > ksft_exit_fail_msg("Failed to mount resctrl FS\n"); > return; > } > @@ -115,6 +126,7 @@ static void run_mba_test(const char * const *benchmark_cmd, int cpu_no) > > umount: > umount_resctrlfs(); > + signal_handler_unregister(); > } >

This adds more duplicated code for every test. Have you considered a single test setup function that can be used to mount resctrl FS and setup the signal handler paired with a single test teardown function?

Yes. Consolidating all these is among my not-yet submitted patches. I just had to do a backport-friendly Fixes patch first for this.

Could you please help me understand how the duplicate calls are more backport friendly?

Hi,

It's simply because the refactoring that has to be done to be able to introduce the generalized test framework is much more invasive and far reaching than this patch. Essentially, all the call signatures of the test functions need to match and the feature checks need to be done in new per test functions too. This is the diffstat of those changes alone:

tools/testing/selftests/resctrl/cat_test.c | 21 +++-- tools/testing/selftests/resctrl/cmt_test.c | 26 +++-- tools/testing/selftests/resctrl/mba_test.c | 20 +++- tools/testing/selftests/resctrl/mbm_test.c | 20 +++- tools/testing/selftests/resctrl/resctrl.h | 43 ++++++++- tools/testing/selftests/resctrl/resctrl_tests.c | 220 +++++++++++++++---------------------------- tools/testing/selftests/resctrl/resctrlfs.c | 5 +

(tools/testing/selftests/resctrl/resctrl_tests.c --- part would be slightly less if I'd reorder this patch but that only 24 lines off as per diffstat of this patch).

But that's not all.... To be able to push the generalized test framework to stable, you need to also count in the benchmark cmd changes which worked towards making the call signatures identical. So here's the diffstat for that series for quick reference:

tools/testing/selftests/resctrl/cache.c | 5 +- tools/testing/selftests/resctrl/cat_test.c | 13 +-- tools/testing/selftests/resctrl/cmt_test.c | 34 ++++-- tools/testing/selftests/resctrl/mba_test.c | 4 +- tools/testing/selftests/resctrl/mbm_test.c | 7 +- tools/testing/selftests/resctrl/resctrl.h | 16 +-- .../testing/selftests/resctrl/resctrl_tests.c | 100 ++++++++---------- tools/testing/selftests/resctrl/resctrl_val.c | 10 +-

That's ~500 lines changed vs ~50 so it's a magnitude worse and much less localized.

And rest assured, I did not like introducing the duplicated calls any more than you do (I did not write the generalized test framework for nothing, after all) but the way taken in this patch seemed the most reasonable option under these circumstances.

hmmm ... I did not expect that a total refactoring would be needed.

I was thinking about a change from this:

testX(...) { res = signal_handler_register(); /* error handling */ res = mount_resctrlfs(); /* error handling */ /* test */
unmount_resctrlfs();
signal_handler_register();
}

to this:

int test_setup(...) { res = signal_handler_register(); /* error handling */ res = mount_resctrlfs(); /* error handling */ }

void test_cleanup(...) { unmount_resctrlfs(); signal_handler_register(); }

testX(...) {
res = test_setup(..);
/* error handling */

/* test */

test_cleanup();
}

I expect this to also support the bigger refactoring.
Okay, I'll do so then.

However, having already written the generic run_single_test() function that is part of the generic test framework, I definitely don't feel those helpers would be that helpful for it. It more feels like they'd make the flow less obvious by adding those two extra calls there but that's of course matter of taste.

Sounds like there is some room for improvement here, perhaps open coding the test_setup() and test_cleanup() helpers within run_single_test(). This is purely speculation on my part as I have not seen the code.

Reinette

Ilpo Järvinen

11 Sep 11 Sep

11:19 a.m.

New subject: [PATCH 2/5] selftests/resctrl: Remove duplicate feature check from CMT test

The test runner run_cmt_test() in resctrl_tests.c checks for CMT feature and does not run cmt_resctrl_val() if CMT is not supported. Then cmt_resctrl_val() also check is CMT is supported.

Remove the duplicated feature check for CMT from cmt_resctrl_val().

Signed-off-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com Cc: stable@vger.kernel.org --- tools/testing/selftests/resctrl/cmt_test.c | 3 --- 1 file changed, 3 deletions(-)

diff --git a/tools/testing/selftests/resctrl/cmt_test.c b/tools/testing/selftests/resctrl/cmt_test.c index cf2f5e92dea6..50bdbce9fba9 100644 --- a/tools/testing/selftests/resctrl/cmt_test.c +++ b/tools/testing/selftests/resctrl/cmt_test.c @@ -80,9 +80,6 @@ int cmt_resctrl_val(int cpu_no, int n, const char * const *benchmark_cmd) size_t span; int ret, i;

- if (!validate_resctrl_feature_request(CMT_STR)) - return -1; - ret = get_cbm_mask("L3", cbm_mask); if (ret) return ret;

-- 2.30.2

Reinette Chatre

12 Sep 12 Sep

10:06 p.m.

New subject: [PATCH 2/5] selftests/resctrl: Remove duplicate feature check from CMT test

Hi Ilpo,

On 9/11/2023 4:19 AM, Ilpo Järvinen wrote:

...

The test runner run_cmt_test() in resctrl_tests.c checks for CMT feature and does not run cmt_resctrl_val() if CMT is not supported. Then cmt_resctrl_val() also check is CMT is supported.

Remove the duplicated feature check for CMT from cmt_resctrl_val().

Signed-off-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com Cc: stable@vger.kernel.org

This does not look like stable material to me.

Reinette

Ilpo Järvinen

13 Sep 13 Sep

11:11 a.m.

New subject: [PATCH 2/5] selftests/resctrl: Remove duplicate feature check from CMT test

On Tue, 12 Sep 2023, Reinette Chatre wrote:

...

On 9/11/2023 4:19 AM, Ilpo Järvinen wrote:

...
The test runner run_cmt_test() in resctrl_tests.c checks for CMT feature and does not run cmt_resctrl_val() if CMT is not supported. Then cmt_resctrl_val() also check is CMT is supported.

Remove the duplicated feature check for CMT from cmt_resctrl_val().

Signed-off-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com Cc: stable@vger.kernel.org

This does not look like stable material to me.

I know but when constructing this series I had 2 options:

Either convert also this when changing validate_resctrl_feature_request() or remove this call entirely.

Given it's duplicate of the other CMT check, I chose to just remove it (which I'd do anyway). As patch 4/5 requires 3/5 which in turn requires this, this has to go stable if 4/5 goes too.

-- i.

Reinette Chatre

8:58 p.m.

New subject: [PATCH 2/5] selftests/resctrl: Remove duplicate feature check from CMT test

Hi Ilpo,

On 9/13/2023 4:11 AM, Ilpo Järvinen wrote:

...

On Tue, 12 Sep 2023, Reinette Chatre wrote:

...
On 9/11/2023 4:19 AM, Ilpo Järvinen wrote:

...
The test runner run_cmt_test() in resctrl_tests.c checks for CMT feature and does not run cmt_resctrl_val() if CMT is not supported. Then cmt_resctrl_val() also check is CMT is supported.

Remove the duplicated feature check for CMT from cmt_resctrl_val().

Signed-off-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com Cc: stable@vger.kernel.org

This does not look like stable material to me.

I know but when constructing this series I had 2 options:

Either convert also this when changing validate_resctrl_feature_request() or remove this call entirely.

Given it's duplicate of the other CMT check, I chose to just remove it (which I'd do anyway). As patch 4/5 requires 3/5 which in turn requires this, this has to go stable if 4/5 goes too.

Understood. This makes it a dependency of an actual fix, which is addressed in 4/5's sign-off area. This notation is new to me but it is not clear to me that the dependency should also be tagged as stable material (without a fixes tag). Since it is not an actual fix by itself yet is sent to @stable I think it may cause confusion. Is just listing it as a dependency of the actual fix not sufficient (as you already do in 4/5)? Perhaps as compromise this patch can also get a note to the stable team. Something like:

Cc: stable@vger.kernel.org # dependency of "selftests/resctrl: Fix feature checks"

I am not sure though - I would like to avoid confusion and not burden the stable team. If this is a flow you have used before successfully I'd defer to your experience.

Reinette

Ilpo Järvinen

14 Sep 14 Sep

9:58 a.m.

New subject: [PATCH 2/5] selftests/resctrl: Remove duplicate feature check from CMT test

On Wed, 13 Sep 2023, Reinette Chatre wrote:

...

On 9/13/2023 4:11 AM, Ilpo Järvinen wrote:

...
On Tue, 12 Sep 2023, Reinette Chatre wrote:

...
On 9/11/2023 4:19 AM, Ilpo Järvinen wrote:

...
The test runner run_cmt_test() in resctrl_tests.c checks for CMT feature and does not run cmt_resctrl_val() if CMT is not supported. Then cmt_resctrl_val() also check is CMT is supported.

Remove the duplicated feature check for CMT from cmt_resctrl_val().

Signed-off-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com Cc: stable@vger.kernel.org

This does not look like stable material to me.

I know but when constructing this series I had 2 options:

Either convert also this when changing validate_resctrl_feature_request() or remove this call entirely.

Given it's duplicate of the other CMT check, I chose to just remove it (which I'd do anyway). As patch 4/5 requires 3/5 which in turn requires this, this has to go stable if 4/5 goes too.

Understood. This makes it a dependency of an actual fix, which is addressed in 4/5's sign-off area. This notation is new to me but it is not clear to me that the dependency should also be tagged as stable material (without a fixes tag). Since it is not an actual fix by itself yet is sent to @stable I think it may cause confusion. Is just listing it as a dependency of the actual fix not sufficient (as you already do in 4/5)? Perhaps as compromise this patch can also get a note to the stable team. Something like:

Cc: stable@vger.kernel.org # dependency of "selftests/resctrl: Fix feature checks"

I am not sure though - I would like to avoid confusion and not burden the stable team. If this is a flow you have used before successfully I'd defer to your experience.

I came across that dependency format when Greg KH replied to somebody how to deal with the cases where there isn't yet a commit id (the cases mentioned in Documentation/process/stable-kernel-rules.rst assumes there is already a commit id). Unfortunately it's long time ago so I cannot easily find the link.

Documentation/process/stable-kernel-rules.rst doesn't state that the stable address should be only used for the patches with Fixes. In general, I believe this doesn't matter much because whether something is Cc'ed or not to stable@vger.kernel.org doesn't seems to impact the decision if a patch goes into stable or not (even if even some maintainers seem to pretend leaving it out makes a difference so I tend to play along and smile myself how incorrect that assumption is :-)).

-- i.

Reinette Chatre

3:04 p.m.

New subject: [PATCH 2/5] selftests/resctrl: Remove duplicate feature check from CMT test

Hi Ilpo,

On 9/14/2023 2:58 AM, Ilpo Järvinen wrote:

...

On Wed, 13 Sep 2023, Reinette Chatre wrote:

...
On 9/13/2023 4:11 AM, Ilpo Järvinen wrote:

...
On Tue, 12 Sep 2023, Reinette Chatre wrote:

...
On 9/11/2023 4:19 AM, Ilpo Järvinen wrote:

...
The test runner run_cmt_test() in resctrl_tests.c checks for CMT feature and does not run cmt_resctrl_val() if CMT is not supported. Then cmt_resctrl_val() also check is CMT is supported.

Remove the duplicated feature check for CMT from cmt_resctrl_val().

Signed-off-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com Cc: stable@vger.kernel.org

This does not look like stable material to me.

I know but when constructing this series I had 2 options:

Either convert also this when changing validate_resctrl_feature_request() or remove this call entirely.

Given it's duplicate of the other CMT check, I chose to just remove it (which I'd do anyway). As patch 4/5 requires 3/5 which in turn requires this, this has to go stable if 4/5 goes too.

Understood. This makes it a dependency of an actual fix, which is addressed in 4/5's sign-off area. This notation is new to me but it is not clear to me that the dependency should also be tagged as stable material (without a fixes tag). Since it is not an actual fix by itself yet is sent to @stable I think it may cause confusion. Is just listing it as a dependency of the actual fix not sufficient (as you already do in 4/5)? Perhaps as compromise this patch can also get a note to the stable team. Something like:

Cc: stable@vger.kernel.org # dependency of "selftests/resctrl: Fix feature checks"

I am not sure though - I would like to avoid confusion and not burden the stable team. If this is a flow you have used before successfully I'd defer to your experience.

I came across that dependency format when Greg KH replied to somebody how to deal with the cases where there isn't yet a commit id (the cases mentioned in Documentation/process/stable-kernel-rules.rst assumes there is already a commit id). Unfortunately it's long time ago so I cannot easily find the link.

I see, thank you. I was not aware of this custom.

Reinette

Ilpo Järvinen

11 Sep 11 Sep

11:19 a.m.

New subject: [PATCH 3/5] selftests/resctrl: Refactor feature check to use resource and feature name

Feature check in validate_resctrl_feature_request() takes in the test name string and maps that to what to check per test.

Pass resource and feature names to validate_resctrl_feature_request() directly rather than deriving them from the test name inside the function which makes the feature check easier to extend for new test cases.

Signed-off-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com Cc: stable@vger.kernel.org --- tools/testing/selftests/resctrl/resctrl.h | 6 +- .../testing/selftests/resctrl/resctrl_tests.c | 10 +-- tools/testing/selftests/resctrl/resctrlfs.c | 69 ++++++++----------- 3 files changed, 34 insertions(+), 51 deletions(-)

diff --git a/tools/testing/selftests/resctrl/resctrl.h b/tools/testing/selftests/resctrl/resctrl.h index dd07463cdf48..89ced4152933 100644 --- a/tools/testing/selftests/resctrl/resctrl.h +++ b/tools/testing/selftests/resctrl/resctrl.h @@ -28,10 +28,6 @@ #define RESCTRL_PATH "/sys/fs/resctrl" #define PHYS_ID_PATH "/sys/devices/system/cpu/cpu" #define INFO_PATH "/sys/fs/resctrl/info" -#define L3_PATH "/sys/fs/resctrl/info/L3" -#define MB_PATH "/sys/fs/resctrl/info/MB" -#define L3_MON_PATH "/sys/fs/resctrl/info/L3_MON" -#define L3_MON_FEATURES_PATH "/sys/fs/resctrl/info/L3_MON/mon_features"

#define ARCH_INTEL 1 #define ARCH_AMD 2 @@ -88,7 +84,7 @@ int get_resource_id(int cpu_no, int *resource_id); int mount_resctrlfs(void); int umount_resctrlfs(void); int validate_bw_report_request(char *bw_report); -bool validate_resctrl_feature_request(const char *resctrl_val); +bool validate_resctrl_feature_request(const char *resource, const char *feature); char *fgrep(FILE *inf, const char *str); int taskset_benchmark(pid_t bm_pid, int cpu_no); void run_benchmark(int signum, siginfo_t *info, void *ucontext); diff --git a/tools/testing/selftests/resctrl/resctrl_tests.c b/tools/testing/selftests/resctrl/resctrl_tests.c index 3d66fbdc2df3..3052394ca884 100644 --- a/tools/testing/selftests/resctrl/resctrl_tests.c +++ b/tools/testing/selftests/resctrl/resctrl_tests.c @@ -84,7 +84,9 @@ static void run_mbm_test(const char * const *benchmark_cmd, int cpu_no) return; }

- if (!validate_resctrl_feature_request(MBM_STR) || (get_vendor() != ARCH_INTEL)) { + if (!validate_resctrl_feature_request("L3_MON", "mbm_total_bytes") || + !validate_resctrl_feature_request("L3_MON", "mbm_local_bytes") || + (get_vendor() != ARCH_INTEL)) { ksft_test_result_skip("Hardware does not support MBM or MBM is disabled\n"); goto umount; } @@ -116,7 +118,7 @@ static void run_mba_test(const char * const *benchmark_cmd, int cpu_no) return; }

- if (!validate_resctrl_feature_request(MBA_STR) || (get_vendor() != ARCH_INTEL)) { + if (!validate_resctrl_feature_request("MB", NULL) || (get_vendor() != ARCH_INTEL)) { ksft_test_result_skip("Hardware does not support MBA or MBA is disabled\n"); goto umount; } @@ -146,7 +148,7 @@ static void run_cmt_test(const char * const *benchmark_cmd, int cpu_no) return; }

- if (!validate_resctrl_feature_request(CMT_STR)) { + if (!validate_resctrl_feature_request("L3_MON", "llc_occupancy")) { ksft_test_result_skip("Hardware does not support CMT or CMT is disabled\n"); goto umount; } @@ -178,7 +180,7 @@ static void run_cat_test(int cpu_no, int no_of_bits) return; }

- if (!validate_resctrl_feature_request(CAT_STR)) { + if (!validate_resctrl_feature_request("L3", NULL)) { ksft_test_result_skip("Hardware does not support CAT or CAT is disabled\n"); goto umount; } diff --git a/tools/testing/selftests/resctrl/resctrlfs.c b/tools/testing/selftests/resctrl/resctrlfs.c index bd36ee206602..bd547a10791c 100644 --- a/tools/testing/selftests/resctrl/resctrlfs.c +++ b/tools/testing/selftests/resctrl/resctrlfs.c @@ -10,6 +10,8 @@ */ #include "resctrl.h"

+#include <limits.h> + static int find_resctrl_mount(char *buffer) { FILE *mounts; @@ -604,63 +606,46 @@ char *fgrep(FILE *inf, const char *str)

/* * validate_resctrl_feature_request - Check if requested feature is valid. - * @resctrl_val: Requested feature + * @resource: Required resource (e.g., MB, L3, L2, L3_MON, etc.) + * @feature: Feature to be checked under resource (can be NULL). This path + * is relative to the resource path. * - * Return: True if the feature is supported, else false. False is also - * returned if resctrl FS is not mounted. + * Return: True if the resource/feature is supported, else false. False is + * also returned if resctrl FS is not mounted. */ -bool validate_resctrl_feature_request(const char *resctrl_val) +bool validate_resctrl_feature_request(const char *resource, const char *feature) { struct stat statbuf; - bool found = false; + char res_path[PATH_MAX]; char *res; FILE *inf; int ret;

- if (!resctrl_val) + if (!resource) return false;

ret = find_resctrl_mount(NULL); if (ret) return false;

- if (!strncmp(resctrl_val, CAT_STR, sizeof(CAT_STR))) { - if (!stat(L3_PATH, &statbuf)) - return true; - } else if (!strncmp(resctrl_val, MBA_STR, sizeof(MBA_STR))) { - if (!stat(MB_PATH, &statbuf)) - return true; - } else if (!strncmp(resctrl_val, MBM_STR, sizeof(MBM_STR)) || - !strncmp(resctrl_val, CMT_STR, sizeof(CMT_STR))) { - if (!stat(L3_MON_PATH, &statbuf)) { - inf = fopen(L3_MON_FEATURES_PATH, "r"); - if (!inf) - return false; - - if (!strncmp(resctrl_val, CMT_STR, sizeof(CMT_STR))) { - res = fgrep(inf, "llc_occupancy"); - if (res) { - found = true; - free(res); - } - } - - if (!strncmp(resctrl_val, MBM_STR, sizeof(MBM_STR))) { - res = fgrep(inf, "mbm_total_bytes"); - if (res) { - free(res); - res = fgrep(inf, "mbm_local_bytes"); - if (res) { - found = true; - free(res); - } - } - } - fclose(inf); - } - } + snprintf(res_path, sizeof(res_path), "%s/%s", INFO_PATH, resource); + + if (stat(res_path, &statbuf)) + return false; + + if (!feature) + return true; + + snprintf(res_path, sizeof(res_path), "%s/%s/mon_features", INFO_PATH, resource); + inf = fopen(res_path, "r"); + if (!inf) + return false; + + res = fgrep(inf, feature); + free(res); + fclose(inf);

- return found; + return res; }

int filter_dmesg(void)

-- 2.30.2

Reinette Chatre

12 Sep 12 Sep

10:09 p.m.

New subject: [PATCH 3/5] selftests/resctrl: Refactor feature check to use resource and feature name

Hi Ilpo,

On 9/11/2023 4:19 AM, Ilpo Järvinen wrote:

...

Feature check in validate_resctrl_feature_request() takes in the test name string and maps that to what to check per test.

Pass resource and feature names to validate_resctrl_feature_request() directly rather than deriving them from the test name inside the function which makes the feature check easier to extend for new test cases.

Signed-off-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com Cc: stable@vger.kernel.org

This does not seem to be stable material.

...

tools/testing/selftests/resctrl/resctrl.h | 6 +- .../testing/selftests/resctrl/resctrl_tests.c | 10 +-- tools/testing/selftests/resctrl/resctrlfs.c | 69 ++++++++----------- 3 files changed, 34 insertions(+), 51 deletions(-)

diff --git a/tools/testing/selftests/resctrl/resctrl.h b/tools/testing/selftests/resctrl/resctrl.h index dd07463cdf48..89ced4152933 100644 --- a/tools/testing/selftests/resctrl/resctrl.h +++ b/tools/testing/selftests/resctrl/resctrl.h @@ -28,10 +28,6 @@ #define RESCTRL_PATH "/sys/fs/resctrl" #define PHYS_ID_PATH "/sys/devices/system/cpu/cpu" #define INFO_PATH "/sys/fs/resctrl/info" -#define L3_PATH "/sys/fs/resctrl/info/L3" -#define MB_PATH "/sys/fs/resctrl/info/MB" -#define L3_MON_PATH "/sys/fs/resctrl/info/L3_MON" -#define L3_MON_FEATURES_PATH "/sys/fs/resctrl/info/L3_MON/mon_features" #define ARCH_INTEL 1 #define ARCH_AMD 2 @@ -88,7 +84,7 @@ int get_resource_id(int cpu_no, int *resource_id); int mount_resctrlfs(void); int umount_resctrlfs(void); int validate_bw_report_request(char *bw_report); -bool validate_resctrl_feature_request(const char *resctrl_val); +bool validate_resctrl_feature_request(const char *resource, const char *feature); char *fgrep(FILE *inf, const char *str); int taskset_benchmark(pid_t bm_pid, int cpu_no); void run_benchmark(int signum, siginfo_t *info, void *ucontext); diff --git a/tools/testing/selftests/resctrl/resctrl_tests.c b/tools/testing/selftests/resctrl/resctrl_tests.c index 3d66fbdc2df3..3052394ca884 100644 --- a/tools/testing/selftests/resctrl/resctrl_tests.c +++ b/tools/testing/selftests/resctrl/resctrl_tests.c @@ -84,7 +84,9 @@ static void run_mbm_test(const char * const *benchmark_cmd, int cpu_no) return; }

if (!validate_resctrl_feature_request(MBM_STR) || (get_vendor() != ARCH_INTEL)) {
if (!validate_resctrl_feature_request("L3_MON", "mbm_total_bytes") ||
   !validate_resctrl_feature_request("L3_MON", "mbm_local_bytes") ||
   (get_vendor() != ARCH_INTEL)) {
ksft_test_result_skip("Hardware does not support MBM or MBM is disabled\n"); goto umount; }
@@ -116,7 +118,7 @@ static void run_mba_test(const char * const *benchmark_cmd, int cpu_no) return; }

if (!validate_resctrl_feature_request(MBA_STR) || (get_vendor() != ARCH_INTEL)) {

if (!validate_resctrl_feature_request("MB", NULL) || (get_vendor() != ARCH_INTEL)) { ksft_test_result_skip("Hardware does not support MBA or MBA is disabled\n"); goto umount; }

@@ -146,7 +148,7 @@ static void run_cmt_test(const char * const *benchmark_cmd, int cpu_no) return; }

if (!validate_resctrl_feature_request(CMT_STR)) {

if (!validate_resctrl_feature_request("L3_MON", "llc_occupancy")) { ksft_test_result_skip("Hardware does not support CMT or CMT is disabled\n"); goto umount; }

@@ -178,7 +180,7 @@ static void run_cat_test(int cpu_no, int no_of_bits) return; }

if (!validate_resctrl_feature_request(CAT_STR)) {

if (!validate_resctrl_feature_request("L3", NULL)) { ksft_test_result_skip("Hardware does not support CAT or CAT is disabled\n"); goto umount; }

diff --git a/tools/testing/selftests/resctrl/resctrlfs.c b/tools/testing/selftests/resctrl/resctrlfs.c index bd36ee206602..bd547a10791c 100644 --- a/tools/testing/selftests/resctrl/resctrlfs.c +++ b/tools/testing/selftests/resctrl/resctrlfs.c @@ -10,6 +10,8 @@ */ #include "resctrl.h" +#include <limits.h>

Could you please include <limits.h> before the local resctrl.h?

...

static int find_resctrl_mount(char *buffer) { FILE *mounts; @@ -604,63 +606,46 @@ char *fgrep(FILE *inf, const char *str) /*

validate_resctrl_feature_request - Check if requested feature is valid.

@resctrl_val: Requested feature
@resource: Required resource (e.g., MB, L3, L2, L3_MON, etc.)

@feature: Feature to be checked under resource (can be NULL). This path
is relative to the resource path.

I do not think "this path" is accurate. @feature is not a path but an entry within the mon_features file.

Also please note that mon_features only exists for L3_MON, none of the other listed resources have an associated mon_features file in resctrl. This function is created to be generic has specific requirements on what valid (never checked) parameters should be. This may be ok with the usage but it should not pretend to be generic.

...

Return: True if the feature is supported, else false. False is also
    returned if resctrl FS is not mounted.
Return: True if the resource/feature is supported, else false. False is
    also returned if resctrl FS is not mounted.
*/
-bool validate_resctrl_feature_request(const char *resctrl_val) +bool validate_resctrl_feature_request(const char *resource, const char *feature) { struct stat statbuf;

bool found = false;

char res_path[PATH_MAX];

Please maintain reverse fir order.

...

char *res; FILE *inf; int ret;

if (!resctrl_val)

if (!resource) return false;

ret = find_resctrl_mount(NULL); if (ret) return false;
if (!strncmp(resctrl_val, CAT_STR, sizeof(CAT_STR))) {
if (!stat(L3_PATH, &statbuf))
	return true;
} else if (!strncmp(resctrl_val, MBA_STR, sizeof(MBA_STR))) {
if (!stat(MB_PATH, &statbuf))
	return true;
} else if (!strncmp(resctrl_val, MBM_STR, sizeof(MBM_STR)) ||
   !strncmp(resctrl_val, CMT_STR, sizeof(CMT_STR))) {
if (!stat(L3_MON_PATH, &statbuf)) {
	inf = fopen(L3_MON_FEATURES_PATH, "r");
	if (!inf)
		return false;
	if (!strncmp(resctrl_val, CMT_STR, sizeof(CMT_STR))) {
		res = fgrep(inf, "llc_occupancy");
		if (res) {
			found = true;
			free(res);
		}
	}
	if (!strncmp(resctrl_val, MBM_STR, sizeof(MBM_STR))) {
		res = fgrep(inf, "mbm_total_bytes");
		if (res) {
			free(res);
			res = fgrep(inf, "mbm_local_bytes");
			if (res) {
				found = true;
				free(res);
			}
		}
	}
	fclose(inf);
}
}
snprintf(res_path, sizeof(res_path), "%s/%s", INFO_PATH, resource);

if (stat(res_path, &statbuf))
return false;
if (!feature)
return true;
snprintf(res_path, sizeof(res_path), "%s/%s/mon_features", INFO_PATH, resource);

inf = fopen(res_path, "r");

if (!inf)
return false;
res = fgrep(inf, feature);

free(res);

fclose(inf);
return found;

return res;

This is unexpected. Function should return bool but instead returns a char * that has been freed?

...

} int filter_dmesg(void)

Reinette

Ilpo Järvinen

13 Sep 13 Sep

11:02 a.m.

New subject: [PATCH 3/5] selftests/resctrl: Refactor feature check to use resource and feature name

On Tue, 12 Sep 2023, Reinette Chatre wrote:

...

On 9/11/2023 4:19 AM, Ilpo Järvinen wrote:

...
Feature check in validate_resctrl_feature_request() takes in the test name string and maps that to what to check per test.

Pass resource and feature names to validate_resctrl_feature_request() directly rather than deriving them from the test name inside the function which makes the feature check easier to extend for new test cases.

Signed-off-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com Cc: stable@vger.kernel.org

This does not seem to be stable material.

Alone it isn't, but both 2/5 and this 3/5 are prerequisites for 4/5 as shown by the tags there.

...

...

tools/testing/selftests/resctrl/resctrl.h | 6 +- .../testing/selftests/resctrl/resctrl_tests.c | 10 +-- tools/testing/selftests/resctrl/resctrlfs.c | 69 ++++++++----------- 3 files changed, 34 insertions(+), 51 deletions(-)

diff --git a/tools/testing/selftests/resctrl/resctrl.h b/tools/testing/selftests/resctrl/resctrl.h index dd07463cdf48..89ced4152933 100644 --- a/tools/testing/selftests/resctrl/resctrl.h +++ b/tools/testing/selftests/resctrl/resctrl.h

...

...
diff --git a/tools/testing/selftests/resctrl/resctrlfs.c b/tools/testing/selftests/resctrl/resctrlfs.c index bd36ee206602..bd547a10791c 100644 --- a/tools/testing/selftests/resctrl/resctrlfs.c +++ b/tools/testing/selftests/resctrl/resctrlfs.c @@ -10,6 +10,8 @@ */ #include "resctrl.h" +#include <limits.h>

Could you please include <limits.h> before the local resctrl.h?

Believe me I tried that first but it did not work. So this intentionally in the current order as resctrl.h defines _GNU_SOURCE which is among things that tends to alter many things. If I reorder them, the build gives me these issues:

resctrlfs.c: In function ‘taskset_benchmark’: resctrlfs.c:284:2: warning: implicit declaration of function ‘CPU_ZERO’; did you mean ‘FP_ZERO’? [-Wimplicit-function-declaration] 284 | CPU_ZERO(&my_set); | ^~~~~~~~ | FP_ZERO resctrlfs.c:285:2: warning: implicit declaration of function ‘CPU_SET’ [-Wimplicit-function-declaration] 285 | CPU_SET(cpu_no, &my_set); | ^~~~~~~ resctrlfs.c:287:6: warning: implicit declaration of function ‘sched_setaffinity’ [-Wimplicit-function-declaration] 287 | if (sched_setaffinity(bm_pid, sizeof(cpu_set_t), &my_set)) { | ^~~~~~~~~~~~~~~~~

It might be useful to move _GNU_SOURCE define into Makefile though to avoid these kind of issues (but that's not material for this patch).

...

...
static int find_resctrl_mount(char *buffer) { FILE *mounts; @@ -604,63 +606,46 @@ char *fgrep(FILE *inf, const char *str) /*

validate_resctrl_feature_request - Check if requested feature is valid.

@resctrl_val: Requested feature
@resource: Required resource (e.g., MB, L3, L2, L3_MON, etc.)

@feature: Feature to be checked under resource (can be NULL). This path
is relative to the resource path.
I do not think "this path" is accurate. @feature is not a path but an entry within the mon_features file.

Yes, agreed.

...

Also please note that mon_features only exists for L3_MON, none of the other listed resources have an associated mon_features file in resctrl. This function is created to be generic has specific requirements on what valid (never checked) parameters should be. This may be ok with the usage but it should not pretend to be generic.

So are you recommending I split this function into two where the new one would do the mon_features check?

...

...
char *res; FILE *inf; int ret;

if (!resctrl_val)

if (!resource) return false;

ret = find_resctrl_mount(NULL); if (ret) return false;
if (!strncmp(resctrl_val, CAT_STR, sizeof(CAT_STR))) {
if (!stat(L3_PATH, &statbuf))
	return true;
} else if (!strncmp(resctrl_val, MBA_STR, sizeof(MBA_STR))) {
if (!stat(MB_PATH, &statbuf))
	return true;
} else if (!strncmp(resctrl_val, MBM_STR, sizeof(MBM_STR)) ||
   !strncmp(resctrl_val, CMT_STR, sizeof(CMT_STR))) {
if (!stat(L3_MON_PATH, &statbuf)) {
	inf = fopen(L3_MON_FEATURES_PATH, "r");
	if (!inf)
		return false;
	if (!strncmp(resctrl_val, CMT_STR, sizeof(CMT_STR))) {
		res = fgrep(inf, "llc_occupancy");
		if (res) {
			found = true;
			free(res);
		}
	}
	if (!strncmp(resctrl_val, MBM_STR, sizeof(MBM_STR))) {
		res = fgrep(inf, "mbm_total_bytes");
		if (res) {
			free(res);
			res = fgrep(inf, "mbm_local_bytes");
			if (res) {
				found = true;
				free(res);
			}
		}
	}
	fclose(inf);
}
}
snprintf(res_path, sizeof(res_path), "%s/%s", INFO_PATH, resource);

if (stat(res_path, &statbuf))
return false;
if (!feature)
return true;
snprintf(res_path, sizeof(res_path), "%s/%s/mon_features", INFO_PATH, resource);

inf = fopen(res_path, "r");

if (!inf)
return false;
res = fgrep(inf, feature);

free(res);

fclose(inf);
return found;

return res;
This is unexpected. Function should return bool but instead returns a char * that has been freed?

Okay, I understand it looks confusing when relying on implicit conversion to boolean (despite being functionally correct).

I can do this instead to make it more obvious the intention is to convert the result to boolean and not return a pointer: return !!res;

-- i.

Reinette Chatre

8:59 p.m.

New subject: [PATCH 3/5] selftests/resctrl: Refactor feature check to use resource and feature name

Hi Ilpo,

On 9/13/2023 4:02 AM, Ilpo Järvinen wrote:

...

On Tue, 12 Sep 2023, Reinette Chatre wrote:

...
On 9/11/2023 4:19 AM, Ilpo Järvinen wrote:

...
Feature check in validate_resctrl_feature_request() takes in the test name string and maps that to what to check per test.

Pass resource and feature names to validate_resctrl_feature_request() directly rather than deriving them from the test name inside the function which makes the feature check easier to extend for new test cases.

Signed-off-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com Cc: stable@vger.kernel.org

This does not seem to be stable material.

Alone it isn't, but both 2/5 and this 3/5 are prerequisites for 4/5 as shown by the tags there.

...
...

tools/testing/selftests/resctrl/resctrl.h | 6 +- .../testing/selftests/resctrl/resctrl_tests.c | 10 +-- tools/testing/selftests/resctrl/resctrlfs.c | 69 ++++++++----------- 3 files changed, 34 insertions(+), 51 deletions(-)

diff --git a/tools/testing/selftests/resctrl/resctrl.h b/tools/testing/selftests/resctrl/resctrl.h index dd07463cdf48..89ced4152933 100644 --- a/tools/testing/selftests/resctrl/resctrl.h +++ b/tools/testing/selftests/resctrl/resctrl.h

...
...
diff --git a/tools/testing/selftests/resctrl/resctrlfs.c b/tools/testing/selftests/resctrl/resctrlfs.c index bd36ee206602..bd547a10791c 100644 --- a/tools/testing/selftests/resctrl/resctrlfs.c +++ b/tools/testing/selftests/resctrl/resctrlfs.c @@ -10,6 +10,8 @@ */ #include "resctrl.h" +#include <limits.h>

Could you please include <limits.h> before the local resctrl.h?

Believe me I tried that first but it did not work. So this intentionally in the current order as resctrl.h defines _GNU_SOURCE which is among things that tends to alter many things. If I reorder them, the build gives me these issues:

resctrlfs.c: In function ‘taskset_benchmark’: resctrlfs.c:284:2: warning: implicit declaration of function ‘CPU_ZERO’; did you mean ‘FP_ZERO’? [-Wimplicit-function-declaration] 284 | CPU_ZERO(&my_set); | ^~~~~~~~ | FP_ZERO resctrlfs.c:285:2: warning: implicit declaration of function ‘CPU_SET’ [-Wimplicit-function-declaration] 285 | CPU_SET(cpu_no, &my_set); | ^~~~~~~ resctrlfs.c:287:6: warning: implicit declaration of function ‘sched_setaffinity’ [-Wimplicit-function-declaration] 287 | if (sched_setaffinity(bm_pid, sizeof(cpu_set_t), &my_set)) { | ^~~~~~~~~~~~~~~~~

It might be useful to move _GNU_SOURCE define into Makefile though to avoid these kind of issues (but that's not material for this patch).

How about a #define _GNU_SOURCE in this file as an intermediate step? I did see your patch making this change but cannot see how it is coordinated with fixing the include order in this file.

...

...
...
static int find_resctrl_mount(char *buffer) { FILE *mounts; @@ -604,63 +606,46 @@ char *fgrep(FILE *inf, const char *str) /*

validate_resctrl_feature_request - Check if requested feature is valid.

@resctrl_val: Requested feature
@resource: Required resource (e.g., MB, L3, L2, L3_MON, etc.)

@feature: Feature to be checked under resource (can be NULL). This path
is relative to the resource path.
I do not think "this path" is accurate. @feature is not a path but an entry within the mon_features file.
Yes, agreed.

...
Also please note that mon_features only exists for L3_MON, none of the other listed resources have an associated mon_features file in resctrl. This function is created to be generic has specific requirements on what valid (never checked) parameters should be. This may be ok with the usage but it should not pretend to be generic.

So are you recommending I split this function into two where the new one would do the mon_features check?

No need to split the function. That seems overkill considering its captive usage. I think a snippet making its usage clear will be helpful. Something like:

@feature: <description>. Can only be set for L3_MON. Must be NULL for all other resources.

Please feel free to improve.

...

...
...
char *res; FILE *inf; int ret;

if (!resctrl_val)

if (!resource) return false;

ret = find_resctrl_mount(NULL); if (ret) return false;
if (!strncmp(resctrl_val, CAT_STR, sizeof(CAT_STR))) {
if (!stat(L3_PATH, &statbuf))
	return true;
} else if (!strncmp(resctrl_val, MBA_STR, sizeof(MBA_STR))) {
if (!stat(MB_PATH, &statbuf))
	return true;
} else if (!strncmp(resctrl_val, MBM_STR, sizeof(MBM_STR)) ||
   !strncmp(resctrl_val, CMT_STR, sizeof(CMT_STR))) {
if (!stat(L3_MON_PATH, &statbuf)) {
	inf = fopen(L3_MON_FEATURES_PATH, "r");
	if (!inf)
		return false;
	if (!strncmp(resctrl_val, CMT_STR, sizeof(CMT_STR))) {
		res = fgrep(inf, "llc_occupancy");
		if (res) {
			found = true;
			free(res);
		}
	}
	if (!strncmp(resctrl_val, MBM_STR, sizeof(MBM_STR))) {
		res = fgrep(inf, "mbm_total_bytes");
		if (res) {
			free(res);
			res = fgrep(inf, "mbm_local_bytes");
			if (res) {
				found = true;
				free(res);
			}
		}
	}
	fclose(inf);
}
}
snprintf(res_path, sizeof(res_path), "%s/%s", INFO_PATH, resource);

if (stat(res_path, &statbuf))
return false;
if (!feature)
return true;
snprintf(res_path, sizeof(res_path), "%s/%s/mon_features", INFO_PATH, resource);

inf = fopen(res_path, "r");

if (!inf)
return false;
res = fgrep(inf, feature);

free(res);

fclose(inf);
return found;

return res;
This is unexpected. Function should return bool but instead returns a char * that has been freed?
Okay, I understand it looks confusing when relying on implicit conversion to boolean (despite being functionally correct).

I can do this instead to make it more obvious the intention is to convert the result to boolean and not return a pointer: return !!res;

Looks good to me.

Reinette

Ilpo Järvinen

14 Sep 14 Sep

11:06 a.m.

New subject: [PATCH 3/5] selftests/resctrl: Refactor feature check to use resource and feature name

On Wed, 13 Sep 2023, Reinette Chatre wrote:

...

On 9/13/2023 4:02 AM, Ilpo Järvinen wrote:

...
On Tue, 12 Sep 2023, Reinette Chatre wrote:

...
On 9/11/2023 4:19 AM, Ilpo Järvinen wrote:

...
Feature check in validate_resctrl_feature_request() takes in the test name string and maps that to what to check per test.

Pass resource and feature names to validate_resctrl_feature_request() directly rather than deriving them from the test name inside the function which makes the feature check easier to extend for new test cases.

Signed-off-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com Cc: stable@vger.kernel.org

This does not seem to be stable material.

Alone it isn't, but both 2/5 and this 3/5 are prerequisites for 4/5 as shown by the tags there.

...
...

tools/testing/selftests/resctrl/resctrl.h | 6 +- .../testing/selftests/resctrl/resctrl_tests.c | 10 +-- tools/testing/selftests/resctrl/resctrlfs.c | 69 ++++++++----------- 3 files changed, 34 insertions(+), 51 deletions(-)

diff --git a/tools/testing/selftests/resctrl/resctrl.h b/tools/testing/selftests/resctrl/resctrl.h index dd07463cdf48..89ced4152933 100644 --- a/tools/testing/selftests/resctrl/resctrl.h +++ b/tools/testing/selftests/resctrl/resctrl.h

...
...
diff --git a/tools/testing/selftests/resctrl/resctrlfs.c b/tools/testing/selftests/resctrl/resctrlfs.c index bd36ee206602..bd547a10791c 100644 --- a/tools/testing/selftests/resctrl/resctrlfs.c +++ b/tools/testing/selftests/resctrl/resctrlfs.c @@ -10,6 +10,8 @@ */ #include "resctrl.h" +#include <limits.h>

Could you please include <limits.h> before the local resctrl.h?

Believe me I tried that first but it did not work. So this intentionally in the current order as resctrl.h defines _GNU_SOURCE which is among things that tends to alter many things. If I reorder them, the build gives me these issues:

resctrlfs.c: In function ‘taskset_benchmark’: resctrlfs.c:284:2: warning: implicit declaration of function ‘CPU_ZERO’; did you mean ‘FP_ZERO’? [-Wimplicit-function-declaration] 284 | CPU_ZERO(&my_set); | ^~~~~~~~ | FP_ZERO resctrlfs.c:285:2: warning: implicit declaration of function ‘CPU_SET’ [-Wimplicit-function-declaration] 285 | CPU_SET(cpu_no, &my_set); | ^~~~~~~ resctrlfs.c:287:6: warning: implicit declaration of function ‘sched_setaffinity’ [-Wimplicit-function-declaration] 287 | if (sched_setaffinity(bm_pid, sizeof(cpu_set_t), &my_set)) { | ^~~~~~~~~~~~~~~~~

It might be useful to move _GNU_SOURCE define into Makefile though to avoid these kind of issues (but that's not material for this patch).

How about a #define _GNU_SOURCE in this file as an intermediate step? I did see your patch making this change but cannot see how it is coordinated with fixing the include order in this file.

I'll just make that change part of this series and use also it as dependency. Making an intermediate step just for stable that is going to immediately removed in mainline would just causing the code to diverge unnecessarily, IMO.

There's also a small risk for some other bug that does not cause compile to fail due to differences because of a late define for _GNU_SOURCE. I don't find it very likely but seems possible due to differences in some constant values (not that the resctrl selftest code is very good at using those defined constants in the first place, there are plenty of literals still to cleanup).

...

...
...
...
static int find_resctrl_mount(char *buffer) { FILE *mounts; @@ -604,63 +606,46 @@ char *fgrep(FILE *inf, const char *str) /*

validate_resctrl_feature_request - Check if requested feature is valid.

@resctrl_val: Requested feature
@resource: Required resource (e.g., MB, L3, L2, L3_MON, etc.)

@feature: Feature to be checked under resource (can be NULL). This path
is relative to the resource path.
I do not think "this path" is accurate. @feature is not a path but an entry within the mon_features file.
Yes, agreed.

...
Also please note that mon_features only exists for L3_MON, none of the other listed resources have an associated mon_features file in resctrl. This function is created to be generic has specific requirements on what valid (never checked) parameters should be. This may be ok with the usage but it should not pretend to be generic.

So are you recommending I split this function into two where the new one would do the mon_features check?
No need to split the function. That seems overkill considering its captive usage. I think a snippet making its usage clear will be helpful. Something like:

@feature: <description>. Can only be set for L3_MON. Must be NULL for all other resources.

Please feel free to improve.

Thanks, I'll do that.

-- i.

Ilpo Järvinen

11 Sep 11 Sep

11:19 a.m.

New subject: [PATCH 4/5] selftests/resctrl: Fix feature checks

The MBA and CMT tests expect support of other features to be able to run.

When platform only supports MBA but not MBM, MBA test will fail with: Failed to open total bw file: No such file or directory

When platform only supports CMT but not CAT, CMT test will fail with: Failed to open bit mask file '/sys/fs/resctrl/info/L3/cbm_mask': No such file or directory

Extend feature checks to cover these two conditions.

Fixes: ee0415681eb6 ("selftests/resctrl: Use resctrl/info for feature detection") Cc: stable@vger.kernel.org # selftests/resctrl: Refactor feature check to use resource and feature name Cc: stable@vger.kernel.org # selftests/resctrl: Remove duplicate feature check from CMT test Signed-off-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com --- tools/testing/selftests/resctrl/resctrl_tests.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/resctrl/resctrl_tests.c b/tools/testing/selftests/resctrl/resctrl_tests.c index 3052394ca884..4e9b392d28dc 100644 --- a/tools/testing/selftests/resctrl/resctrl_tests.c +++ b/tools/testing/selftests/resctrl/resctrl_tests.c @@ -118,7 +118,9 @@ static void run_mba_test(const char * const *benchmark_cmd, int cpu_no) return; }

- if (!validate_resctrl_feature_request("MB", NULL) || (get_vendor() != ARCH_INTEL)) { + if (!validate_resctrl_feature_request("MB", NULL) || + !validate_resctrl_feature_request("L3_MON", "mbm_local_bytes") || + (get_vendor() != ARCH_INTEL)) { ksft_test_result_skip("Hardware does not support MBA or MBA is disabled\n"); goto umount; } @@ -148,7 +150,8 @@ static void run_cmt_test(const char * const *benchmark_cmd, int cpu_no) return; }

- if (!validate_resctrl_feature_request("L3_MON", "llc_occupancy")) { + if (!validate_resctrl_feature_request("L3_MON", "llc_occupancy") || + !validate_resctrl_feature_request("L3", NULL)) { ksft_test_result_skip("Hardware does not support CMT or CMT is disabled\n"); goto umount; }

-- 2.30.2

Ilpo Järvinen

11:19 a.m.

New subject: [PATCH 5/5] selftests/resctrl: Reduce failures due to outliers in MBA/MBM tests

5% difference upper bound for success is a bit on the low side for the MBA and MBM tests. Some platforms produce outliers that are slightly above that, typically 6-7%.

Relaxing the MBA/MBM success bound to 8% removes most of the failures due those frequent outliers.

Fixes: 06bd03a57f8c ("selftests/resctrl: Fix MBA/MBM results reporting format") Cc: stable@vger.kernel.org Signed-off-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com --- tools/testing/selftests/resctrl/mba_test.c | 2 +- tools/testing/selftests/resctrl/mbm_test.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/resctrl/mba_test.c b/tools/testing/selftests/resctrl/mba_test.c index cf8284dadcb2..d3bf4368341e 100644 --- a/tools/testing/selftests/resctrl/mba_test.c +++ b/tools/testing/selftests/resctrl/mba_test.c @@ -12,7 +12,7 @@

#define RESULT_FILE_NAME "result_mba" #define NUM_OF_RUNS 5 -#define MAX_DIFF_PERCENT 5 +#define MAX_DIFF_PERCENT 8 #define ALLOCATION_MAX 100 #define ALLOCATION_MIN 10 #define ALLOCATION_STEP 10 diff --git a/tools/testing/selftests/resctrl/mbm_test.c b/tools/testing/selftests/resctrl/mbm_test.c index 1ae131a2e246..d3c0d30c676a 100644 --- a/tools/testing/selftests/resctrl/mbm_test.c +++ b/tools/testing/selftests/resctrl/mbm_test.c @@ -11,7 +11,7 @@ #include "resctrl.h"

#define RESULT_FILE_NAME "result_mbm" -#define MAX_DIFF_PERCENT 5 +#define MAX_DIFF_PERCENT 8 #define NUM_OF_RUNS 5

static int

-- 2.30.2

Reinette Chatre

12 Sep 12 Sep

10:10 p.m.

New subject: [PATCH 5/5] selftests/resctrl: Reduce failures due to outliers in MBA/MBM tests

Hi Ilpo,

On 9/11/2023 4:19 AM, Ilpo Järvinen wrote:

...

5% difference upper bound for success is a bit on the low side for the

"a bit on the low side" is very vague.

...

MBA and MBM tests. Some platforms produce outliers that are slightly above that, typically 6-7%.

Relaxing the MBA/MBM success bound to 8% removes most of the failures due those frequent outliers.

This description needs more context on what issue is being solved here. What does the % difference represent? How was new percentage determined?

Did you investigate why there are differences between platforms? From what I understand these tests measure memory bandwidth using perf and resctrl and then compare the difference. Are there interesting things about the platforms on which the difference is higher than 5%? Could those be systems with multiple sockets (and thus multiple PMUs that need to be setup, reset, and read)? Can the reading of the counters be improved instead of relaxing the success criteria? A quick comparison between get_mem_bw_imc() and get_mem_bw_resctrl() makes me think that a difference is not surprising ... note how the PMU counters are started and reset (potentially on multiple sockets) at every iteration while the resctrl counters keep rolling and new values are just subtracted from previous.

Reinette

Ilpo Järvinen

13 Sep 13 Sep

11:43 a.m.

New subject: [PATCH 5/5] selftests/resctrl: Reduce failures due to outliers in MBA/MBM tests

On Tue, 12 Sep 2023, Reinette Chatre wrote:

...

On 9/11/2023 4:19 AM, Ilpo Järvinen wrote:

...
5% difference upper bound for success is a bit on the low side for the

"a bit on the low side" is very vague.

The commit that introduced that 5% bound plainly admitted it's "randomly chosen value". At least that wasn't vague, I guess. :-)

So what I'm trying to do here is to have "randomly chosen value" replaced with a value that seems to work well enough based on measurements on a large set of platforms.

Personally, I don't care much about this, I can just ignore the failures due to outliers (and also reports about failing MBA/MBM test if somebody ever sends one to me), but if I'd be one running automated tests it would be annoying to have a problem like this unaddressed.

...

...
MBA and MBM tests. Some platforms produce outliers that are slightly above that, typically 6-7%.

Relaxing the MBA/MBM success bound to 8% removes most of the failures due those frequent outliers.

This description needs more context on what issue is being solved here. What does the % difference represent? How was new percentage determined?

Did you investigate why there are differences between platforms? From what I understand these tests measure memory bandwidth using perf and resctrl and then compare the difference. Are there interesting things about the platforms on which the difference is higher than 5%?

Not really I think. The number just isn't that stable to always remain below 5% (even if it usually does).

Only systematic thing I've come across is that if I play with the read pattern for defeating the hw prefetcher (you've seen a patch earlier and it will be among the series I'll send after this one), it has an impact which looks more systematic across all MBM/MBA tests. But it's not what I'm trying now address with this patch.

...

Could those be systems with multiple sockets (and thus multiple PMUs that need to be setup, reset, and read)? Can the reading of the counters be improved instead of relaxing the success criteria? A quick comparison between get_mem_bw_imc() and get_mem_bw_resctrl() makes me think that a difference is not surprising ... note how the PMU counters are started and reset (potentially on multiple sockets) at every iteration while the resctrl counters keep rolling and new values are just subtracted from previous.

Perhaps, I can try to look into it (add to my todo list so I won't forget). But in the meantime, this new value is picked using a criteria that looks better than "randomly chosen value". If I ever manage to address the outliers, the bound could be lowered again.

I'll update the changelog to explain things better.

-- i.

Reinette Chatre

9 p.m.

New subject: [PATCH 5/5] selftests/resctrl: Reduce failures due to outliers in MBA/MBM tests

Hi Ilpo,

On 9/13/2023 4:43 AM, Ilpo Järvinen wrote:

...

On Tue, 12 Sep 2023, Reinette Chatre wrote:

...
On 9/11/2023 4:19 AM, Ilpo Järvinen wrote:

...
5% difference upper bound for success is a bit on the low side for the

"a bit on the low side" is very vague.

The commit that introduced that 5% bound plainly admitted it's "randomly chosen value". At least that wasn't vague, I guess. :-)

So what I'm trying to do here is to have "randomly chosen value" replaced with a value that seems to work well enough based on measurements on a large set of platforms.

Already a better motivation for this change. Your cover letter also hints at this but this description does not mention that this is not just another number pulled from the air but indeed one that is based on significant testing on a large variety of systems. This description can surely mention all the work you did that ended with proposing this new number, no?

...

Personally, I don't care much about this, I can just ignore the failures due to outliers (and also reports about failing MBA/MBM test if somebody ever sends one to me), but if I'd be one running automated tests it would be annoying to have a problem like this unaddressed.

In no way was I suggesting that this should not be addressed.

...

...
...
MBA and MBM tests. Some platforms produce outliers that are slightly above that, typically 6-7%.

Relaxing the MBA/MBM success bound to 8% removes most of the failures due those frequent outliers.

This description needs more context on what issue is being solved here. What does the % difference represent? How was new percentage determined?

Did you investigate why there are differences between platforms? From what I understand these tests measure memory bandwidth using perf and resctrl and then compare the difference. Are there interesting things about the platforms on which the difference is higher than 5%?

Not really I think. The number just isn't that stable to always remain below 5% (even if it usually does).

Only systematic thing I've come across is that if I play with the read pattern for defeating the hw prefetcher (you've seen a patch earlier and it will be among the series I'll send after this one), it has an impact which looks more systematic across all MBM/MBA tests. But it's not what I'm trying now address with this patch.

...
Could those be systems with multiple sockets (and thus multiple PMUs that need to be setup, reset, and read)? Can the reading of the counters be improved instead of relaxing the success criteria? A quick comparison between get_mem_bw_imc() and get_mem_bw_resctrl() makes me think that a difference is not surprising ... note how the PMU counters are started and reset (potentially on multiple sockets) at every iteration while the resctrl counters keep rolling and new values are just subtracted from previous.

Perhaps, I can try to look into it (add to my todo list so I won't forget). But in the meantime, this new value is picked using a criteria that looks better than "randomly chosen value". If I ever manage to address the outliers, the bound could be lowered again.

I'll update the changelog to explain things better.

ok, thank you.

Reinette

833

days inactive

836

days old

linux-kselftest-mirror@lists.linaro.org

24 comments

participants

tags (0)

participants (2)

Ilpo Järvinen
Reinette Chatre