Changes since V3:
- V3: https://lore.kernel.org/all/cover.1729218182.git.reinette.chatre@intel.com/
- Rebased on HEAD 2a027d6bb660 of kselftest/next.
- Fix empty string parsing issues pointed out by Ilpo.
- Add Reviewed-by tags.
- Please see individual patches for detailed changes.

Changes since V2:
- V2: https://lore.kernel.org/all/cover.1726164080.git.reinette.chatre@intel.com/
- Add fix to protect against buffer overflow when parsing text from sysfs files.
- Add cleanup patch to address use of magic constants as pointed out by Ilpo.
- Add Reviewed-by tags where received, except for "selftests/resctrl: Use cache size to determine "fill_buf" buffer size" that changed too much since receiving the Reviewed-by tag.
- Please see individual patches for detailed changes.

Changes since V1:
- V1: https://lore.kernel.org/cover.1724970211.git.reinette.chatre@intel.com/
- V2 contains the same general solutions to stated problem as V1 but these are now preceded by more fixes (patches 1 to 5) and improved robustness (patches 6 to 9) to existing tests before the series gets back to solving the original problem with more confidence in patches 10 to 13.
- The possibility of making "memflush = false" for CMT test was discussed during V1. Modifying this setting does not have a significant impact on the observed results that are already well within acceptable range and this version thus keeps original default. If performance was a goal it may be possible to do further experimentation where "memflush = false" could eliminate the need for the sleep(1) within the test wrapper, but improving the performance is not a goal of this work.
- (New) Support what seems to be unintended ability for user space to provide parameters to "fill_buf" by making the parsing robust and only support changing parameters that are supported to be changed. Drop support for "write" operation since it has never been measured.
- (New) Improve wraparound handling. (Ilpo)
- (New) A couple of new fixes addressing issues discovered during development.
- (Change from V1) To support fill_buf parameters provided by user space as well as test specific fill_buf parameters struct fill_buf_param is no longer just a member of struct resctrl_val_param, instead there could be at most two instances of struct fill_buf_param, the immutable parameters provided by user space and the parameters used by individual tests. (Ilpo)
- Please see individual patches for detailed changes.
V1 cover:
The resctrl selftests for Memory Bandwidth Allocation (MBA) and Memory Bandwidth Monitoring (MBM) are failing on some (for example [1]) Emerald Rapids systems. The test failures result from the following two properties of these systems:
1) Emerald Rapids systems can have up to 320MB L3 cache. The resctrl MBA and MBM selftests measure memory traffic for which a hardcoded 250MB buffer has been sufficient so far. On platforms with L3 cache larger than the buffer, the buffer fits in the L3 cache and thus no/very little memory traffic is generated during the "memory bandwidth" tests.
2) Some platform features, for example RAS features or memory performance features that generate memory traffic may drive accesses that are counted differently by performance counters and MBM respectively, for instance generating "overhead" traffic which is not counted against any specific RMID. Until now these counting differences have always been "in the noise". On Emerald Rapids systems the maximum MBA throttling (10% memory bandwidth) throttles memory bandwidth to where memory accesses by these other platform features push the memory bandwidth difference between memory controller performance counters and resctrl (MBM) beyond the tests' hardcoded tolerance.

Make the tests more robust against platform variations:
1) Let the buffer used by memory bandwidth tests be guided by the size of the L3 cache.
2) Larger buffers require longer initialization time before the buffer can be used for measurement. Rework the tests to ensure that buffer initialization is complete before measurements start.
3) Do not compare performance counters and MBM measurements at low bandwidth. The value of "low" is hardcoded to 750MiB based on measurements on Emerald Rapids, Sapphire Rapids, and Ice Lake systems. This limit is not applicable to AMD systems since it only applies to the MBA and MBM tests that are isolated to Intel.
[1] https://ark.intel.com/content/www/us/en/ark/products/237261/intel-xeon-plati...
Reinette Chatre (15):
  selftests/resctrl: Make functions only used in same file static
  selftests/resctrl: Print accurate buffer size as part of MBM results
  selftests/resctrl: Fix memory overflow due to unhandled wraparound
  selftests/resctrl: Protect against array overrun during iMC config parsing
  selftests/resctrl: Protect against array overflow when reading strings
  selftests/resctrl: Make wraparound handling obvious
  selftests/resctrl: Remove "once" parameter required to be false
  selftests/resctrl: Only support measured read operation
  selftests/resctrl: Remove unused measurement code
  selftests/resctrl: Make benchmark parameter passing robust
  selftests/resctrl: Ensure measurements skip initialization of default benchmark
  selftests/resctrl: Use cache size to determine "fill_buf" buffer size
  selftests/resctrl: Do not compare performance counters and resctrl at low bandwidth
  selftests/resctrl: Keep results from first test run
  selftests/resctrl: Replace magic constants used as array size

 tools/testing/selftests/resctrl/cmt_test.c    |  37 +-
 tools/testing/selftests/resctrl/fill_buf.c    |  45 +-
 tools/testing/selftests/resctrl/mba_test.c    |  54 ++-
 tools/testing/selftests/resctrl/mbm_test.c    |  37 +-
 tools/testing/selftests/resctrl/resctrl.h     |  79 +++-
 .../testing/selftests/resctrl/resctrl_tests.c |  95 +++-
 tools/testing/selftests/resctrl/resctrl_val.c | 447 +++++-------------
 tools/testing/selftests/resctrl/resctrlfs.c   |  19 +-
 8 files changed, 354 insertions(+), 459 deletions(-)
base-commit: 2a027d6bb66002c8e50e974676f932b33c5fce10
Fix the following sparse warnings:
tools/testing/selftests/resctrl/resctrl_val.c:47:6: warning: symbol 'membw_initialize_perf_event_attr' was not declared. Should it be static?
tools/testing/selftests/resctrl/resctrl_val.c:64:6: warning: symbol 'membw_ioctl_perf_event_ioc_reset_enable' was not declared. Should it be static?
tools/testing/selftests/resctrl/resctrl_val.c:70:6: warning: symbol 'membw_ioctl_perf_event_ioc_disable' was not declared. Should it be static?
tools/testing/selftests/resctrl/resctrl_val.c:81:6: warning: symbol 'get_event_and_umask' was not declared. Should it be static?

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
---
Changes since v1:
- Add Ilpo's Reviewed-by tag.
- Let subject describe the change, not the tool that found it. (checkpatch.pl)
---
 tools/testing/selftests/resctrl/resctrl_val.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/resctrl/resctrl_val.c b/tools/testing/selftests/resctrl/resctrl_val.c
index 8c275f6b4dd7..70e8e31f5d1a 100644
--- a/tools/testing/selftests/resctrl/resctrl_val.c
+++ b/tools/testing/selftests/resctrl/resctrl_val.c
@@ -44,7 +44,7 @@ static int imcs;
 static struct imc_counter_config imc_counters_config[MAX_IMCS][2];
 static const struct resctrl_test *current_test;
 
-void membw_initialize_perf_event_attr(int i, int j)
+static void membw_initialize_perf_event_attr(int i, int j)
 {
 	memset(&imc_counters_config[i][j].pe, 0,
 	       sizeof(struct perf_event_attr));
@@ -61,13 +61,13 @@ void membw_initialize_perf_event_attr(int i, int j)
 		PERF_FORMAT_TOTAL_TIME_ENABLED | PERF_FORMAT_TOTAL_TIME_RUNNING;
 }
 
-void membw_ioctl_perf_event_ioc_reset_enable(int i, int j)
+static void membw_ioctl_perf_event_ioc_reset_enable(int i, int j)
 {
 	ioctl(imc_counters_config[i][j].fd, PERF_EVENT_IOC_RESET, 0);
 	ioctl(imc_counters_config[i][j].fd, PERF_EVENT_IOC_ENABLE, 0);
 }
 
-void membw_ioctl_perf_event_ioc_disable(int i, int j)
+static void membw_ioctl_perf_event_ioc_disable(int i, int j)
 {
 	ioctl(imc_counters_config[i][j].fd, PERF_EVENT_IOC_DISABLE, 0);
 }
@@ -78,7 +78,7 @@ void membw_ioctl_perf_event_ioc_disable(int i, int j)
  * @count:		iMC number
  * @op:		Operation (read/write)
  */
-void get_event_and_umask(char *cas_count_cfg, int count, bool op)
+static void get_event_and_umask(char *cas_count_cfg, int count, bool op)
 {
 	char *token[MAX_TOKENS];
 	int i = 0;
By default the MBM test uses the "fill_buf" benchmark to keep reading from a buffer with size DEFAULT_SPAN while measuring memory bandwidth. User space can provide an alternate benchmark or amend the size of the buffer "fill_buf" should use.
Analysis of the MBM measurements does not require that a buffer be used and thus does not require knowing the size of the buffer if it was used during testing. Even so, the buffer size is printed as informational as part of the MBM test results. What is printed as buffer size is hardcoded as DEFAULT_SPAN, even if the test relied on another benchmark (that may or may not use a buffer) or if user space amended the buffer size.
Ensure that accurate buffer size is printed when using "fill_buf" benchmark and omit the buffer size information if another benchmark is used.
Fixes: ecdbb911f22d ("selftests/resctrl: Add MBM test")
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
Backporting is not recommended. Backporting this fix will be a challenge with all the refactoring done since then. This issue does not impact default tests and there is no sign that folks run these tests with anything but the defaults. This issue is also minor since it does not impact actual test runs or results, just the information printed during a test run.

Changes since V3:
- Ensure string parsing handles case when user provides "". (Ilpo)
- Fix error returned. (Ilpo)

Changes since V2:
- Make user input checks more robust. (Ilpo)

Changes since V1:
- New patch.
---
 tools/testing/selftests/resctrl/mbm_test.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/resctrl/mbm_test.c b/tools/testing/selftests/resctrl/mbm_test.c
index 6b5a3b52d861..cf08ba5e314e 100644
--- a/tools/testing/selftests/resctrl/mbm_test.c
+++ b/tools/testing/selftests/resctrl/mbm_test.c
@@ -40,7 +40,8 @@ show_bw_info(unsigned long *bw_imc, unsigned long *bw_resc, size_t span)
 	ksft_print_msg("%s Check MBM diff within %d%%\n",
 		       ret ? "Fail:" : "Pass:", MAX_DIFF_PERCENT);
 	ksft_print_msg("avg_diff_per: %d%%\n", avg_diff_per);
-	ksft_print_msg("Span (MB): %zu\n", span / MB);
+	if (span)
+		ksft_print_msg("Span (MB): %zu\n", span / MB);
 	ksft_print_msg("avg_bw_imc: %lu\n", avg_bw_imc);
 	ksft_print_msg("avg_bw_resc: %lu\n", avg_bw_resc);
 
@@ -138,15 +139,26 @@ static int mbm_run_test(const struct resctrl_test *test, const struct user_param
 		.setup		= mbm_setup,
 		.measure	= mbm_measure,
 	};
+	char *endptr = NULL;
+	size_t span = 0;
 	int ret;
 
 	remove(RESULT_FILE_NAME);
 
+	if (uparams->benchmark_cmd[0] && strcmp(uparams->benchmark_cmd[0], "fill_buf") == 0) {
+		if (uparams->benchmark_cmd[1] && *uparams->benchmark_cmd[1] != '\0') {
+			errno = 0;
+			span = strtoul(uparams->benchmark_cmd[1], &endptr, 10);
+			if (errno || *endptr != '\0')
+				return -EINVAL;
+		}
+	}
+
 	ret = resctrl_val(test, uparams, uparams->benchmark_cmd, &param);
 	if (ret)
 		return ret;
 
-	ret = check_results(DEFAULT_SPAN);
+	ret = check_results(span);
 	if (ret && (get_vendor() == ARCH_INTEL))
 		ksft_print_msg("Intel MBM may be inaccurate when Sub-NUMA Clustering is enabled. Check BIOS configuration.\n");
 
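The span validation in the hunk above can be tried in isolation. What follows is a minimal sketch, not code from the patch: parse_span() is a hypothetical helper name, and it mirrors the patch's strtoul()/errno/endptr checks while treating a missing or empty string as "no span provided" (the span stays 0).

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not part of the patch): parse a decimal buffer
 * size from a user-supplied string. Returns 0 on success and -EINVAL on
 * malformed input. A NULL or empty string leaves *span at 0.
 */
int parse_span(const char *str, size_t *span)
{
	char *endptr = NULL;
	unsigned long val;

	*span = 0;
	if (!str || *str == '\0')
		return 0;

	errno = 0;
	val = strtoul(str, &endptr, 10);
	/* errno catches overflow (ERANGE); *endptr catches trailing junk. */
	if (errno || *endptr != '\0')
		return -EINVAL;

	*span = val;
	return 0;
}
```

Note that strtoul() skips leading whitespace and accepts a sign, so an input such as "-1" is not rejected by these checks alone.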
On Thu, 24 Oct 2024, Reinette Chatre wrote:
By default the MBM test uses the "fill_buf" benchmark to keep reading from a buffer with size DEFAULT_SPAN while measuring memory bandwidth. User space can provide an alternate benchmark or amend the size of the buffer "fill_buf" should use.
Analysis of the MBM measurements does not require that a buffer be used and thus does not require knowing the size of the buffer if it was used during testing. Even so, the buffer size is printed as informational as part of the MBM test results. What is printed as buffer size is hardcoded as DEFAULT_SPAN, even if the test relied on another benchmark (that may or may not use a buffer) or if user space amended the buffer size.
Ensure that accurate buffer size is printed when using "fill_buf" benchmark and omit the buffer size information if another benchmark is used.
Fixes: ecdbb911f22d ("selftests/resctrl: Add MBM test") Signed-off-by: Reinette Chatre reinette.chatre@intel.com
Reviewed-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com
-- i.
alloc_buffer() allocates and initializes (with random data) a buffer of requested size. The initialization starts from the beginning of the allocated buffer and incrementally assigns sizeof(uint64_t) random data to each cache line. The initialization uses the size of the buffer to control the initialization flow, decrementing the amount of buffer needing to be initialized after each iteration.
The size of the buffer is stored in an unsigned (size_t) variable s64 and the test "s64 > 0" is used to decide if initialization is complete. The problem is that decrementing the buffer size may wrap around if the buffer size is not divisible by "CL_SIZE / sizeof(uint64_t)" resulting in the "s64 > 0" test being true and memory beyond the buffer "initialized".
Use a signed value for the buffer size to support all buffer sizes.
Fixes: a2561b12fe39 ("selftests/resctrl: Add built in benchmark")
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
---
Changes since V2:
- Add Ilpo's Reviewed-by tag.

Changes since V1:
- New patch.
---
 tools/testing/selftests/resctrl/fill_buf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/resctrl/fill_buf.c b/tools/testing/selftests/resctrl/fill_buf.c
index ae120f1735c0..34e5df721430 100644
--- a/tools/testing/selftests/resctrl/fill_buf.c
+++ b/tools/testing/selftests/resctrl/fill_buf.c
@@ -127,7 +127,7 @@ unsigned char *alloc_buffer(size_t buf_size, int memflush)
 {
 	void *buf = NULL;
 	uint64_t *p64;
-	size_t s64;
+	ssize_t s64;
 	int ret;
 
 	ret = posix_memalign(&buf, PAGE_SIZE, buf_size);
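The wraparound described in this patch can be illustrated outside the selftest. The sketch below is an assumption-laden illustration rather than the code from fill_buf.c: CL_SIZE is assumed to be 64, and count_init_steps() and unsigned_decrement() are hypothetical names. The first function uses a signed counter so that the "remaining > 0" test ends the loop for any buffer size; the second shows what the same subtraction yields on an unsigned counter.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

#define CL_SIZE 64 /* assumed cache line size in bytes */

/*
 * Count how many cache lines an initialization loop of the kind
 * described above would touch for a buffer of buf_size bytes. The
 * counter is signed, so it may go negative and the loop still ends.
 */
size_t count_init_steps(size_t buf_size)
{
	const ssize_t step = CL_SIZE / sizeof(uint64_t);
	ssize_t remaining = buf_size / sizeof(uint64_t);
	size_t steps = 0;

	while (remaining > 0) {
		steps++;
		remaining -= step; /* may drop below zero on the last pass */
	}

	return steps;
}

/* The same decrement on an unsigned counter wraps to a huge value. */
size_t unsigned_decrement(size_t remaining, size_t step)
{
	return remaining - step;
}
```

With a 72 byte buffer the element count is 9, which is not a multiple of the step of 8: the signed loop stops after two iterations, whereas an unsigned counter holding 1 would wrap on the next subtraction and the "> 0" test would stay true.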
The MBM and MBA tests need to discover the event and umask with which to configure the performance event used to measure read memory bandwidth. This is done by parsing the /sys/bus/event_source/devices/uncore_imc_<imc instance>/events/cas_count_read file for each iMC instance that contains the formatted output: "event=<event>,umask=<umask>"
Parsing of cas_count_read contents is done by initializing an array of MAX_TOKENS elements with tokens (delimited by "=,") from this file. Remove the unnecessary append of a delimiter to the string needing to be parsed. Per the strtok() man page: "delimiter bytes at the start or end of the string are ignored". This has no impact on the token placement within the array.
After initialization, the actual event and umask is determined by parsing the tokens directly following the "event" and "umask" tokens respectively.
Iterating through the array up to index "i < MAX_TOKENS" but then accessing index "i + 1" risks array overrun during the final iteration. Avoid array overrun by ensuring that the index used within for loop will always be valid.
Fixes: 1d3f08687d76 ("selftests/resctrl: Read memory bandwidth from perf IMC counter and from resctrl file system")
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
---
Changes since V2:
- Rephrase changelog. (Ilpo)
- Add Ilpo's Reviewed-by tag.

Changes since V1:
- New patch.
---
 tools/testing/selftests/resctrl/resctrl_val.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/tools/testing/selftests/resctrl/resctrl_val.c b/tools/testing/selftests/resctrl/resctrl_val.c
index 70e8e31f5d1a..e88d5ca30517 100644
--- a/tools/testing/selftests/resctrl/resctrl_val.c
+++ b/tools/testing/selftests/resctrl/resctrl_val.c
@@ -83,13 +83,12 @@ static void get_event_and_umask(char *cas_count_cfg, int count, bool op)
 	char *token[MAX_TOKENS];
 	int i = 0;
 
-	strcat(cas_count_cfg, ",");
 	token[0] = strtok(cas_count_cfg, "=,");
 
 	for (i = 1; i < MAX_TOKENS; i++)
 		token[i] = strtok(NULL, "=,");
 
-	for (i = 0; i < MAX_TOKENS; i++) {
+	for (i = 0; i < MAX_TOKENS - 1; i++) {
 		if (!token[i])
 			break;
 		if (strcmp(token[i], "event") == 0) {
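The bounded token walk can be sketched as a standalone function. This is an illustration under stated assumptions, not the selftest's get_event_and_umask(): parse_event_umask() is a hypothetical name, MAX_TOKENS is assumed to be 5, and the values are assumed to be hexadecimal. It tracks how many tokens were actually found and only looks at token[i + 1] when that index is known to be valid, which implies the "i < MAX_TOKENS - 1" bound used by the patch.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_TOKENS 5 /* assumed array size */

/*
 * Extract event and umask from a string of the form
 * "event=<event>,umask=<umask>". The input is modified by strtok().
 * Returns 0 when both values were found, -1 otherwise.
 */
int parse_event_umask(char *cfg, unsigned long *event, unsigned long *umask)
{
	char *token[MAX_TOKENS];
	char *tok;
	int found = 0;
	int n = 0;
	int i;

	/* Collect at most MAX_TOKENS tokens delimited by '=' or ','. */
	for (tok = strtok(cfg, "=,"); tok && n < MAX_TOKENS;
	     tok = strtok(NULL, "=,"))
		token[n++] = tok;

	/* i + 1 < n <= MAX_TOKENS keeps token[i + 1] in bounds. */
	for (i = 0; i + 1 < n; i++) {
		if (strcmp(token[i], "event") == 0) {
			*event = strtoul(token[i + 1], NULL, 16);
			found |= 1;
		} else if (strcmp(token[i], "umask") == 0) {
			*umask = strtoul(token[i + 1], NULL, 16);
			found |= 2;
		}
	}

	return found == 3 ? 0 : -1;
}
```

A key that appears as the very last token has no value following it and is therefore ignored rather than read past the end of the array.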
resctrl selftests discover system properties via a variety of sysfs files. The MBM and MBA tests need to discover the event and umask with which to configure the performance event used to measure read memory bandwidth. This is done by parsing the contents of
/sys/bus/event_source/devices/uncore_imc_<imc instance>/events/cas_count_read
Similarly, the resctrl selftests discover the cache size via
/sys/bus/cpu/devices/cpu<id>/cache/index<index>/size.
Take care to do bounds checking when using fscanf() to read the contents of files into a string buffer because by default fscanf() assumes arbitrarily long strings. If the file contains more bytes than the array can accommodate then an overflow will occur.
Provide a maximum field width to the conversion specifier to protect against array overflow. The maximum is one less than the array size because string input stores a terminating null byte that is not covered by the maximum field width.
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
---
This makes the code robust against any changes in information read from sysfs. The existing sysfs content fit well into the arrays, thus this is not considered a bugfix.

Changes since V3:
- Add Ilpo's Reviewed-by tag.

Changes since V2:
- New patch
---
 tools/testing/selftests/resctrl/resctrl_val.c | 4 ++--
 tools/testing/selftests/resctrl/resctrlfs.c   | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/resctrl/resctrl_val.c b/tools/testing/selftests/resctrl/resctrl_val.c
index e88d5ca30517..c9dd70ce3ea8 100644
--- a/tools/testing/selftests/resctrl/resctrl_val.c
+++ b/tools/testing/selftests/resctrl/resctrl_val.c
@@ -159,7 +159,7 @@ static int read_from_imc_dir(char *imc_dir, int count)
 
 		return -1;
 	}
-	if (fscanf(fp, "%s", cas_count_cfg) <= 0) {
+	if (fscanf(fp, "%1023s", cas_count_cfg) <= 0) {
 		ksft_perror("Could not get iMC cas count read");
 		fclose(fp);
 
@@ -177,7 +177,7 @@ static int read_from_imc_dir(char *imc_dir, int count)
 
 		return -1;
 	}
-	if (fscanf(fp, "%s", cas_count_cfg) <= 0) {
+	if (fscanf(fp, "%1023s", cas_count_cfg) <= 0) {
 		ksft_perror("Could not get iMC cas count write");
 		fclose(fp);
 
diff --git a/tools/testing/selftests/resctrl/resctrlfs.c b/tools/testing/selftests/resctrl/resctrlfs.c
index 250c320349a7..a53cd1cb6e0c 100644
--- a/tools/testing/selftests/resctrl/resctrlfs.c
+++ b/tools/testing/selftests/resctrl/resctrlfs.c
@@ -182,7 +182,7 @@ int get_cache_size(int cpu_no, const char *cache_type, unsigned long *cache_size
 
 		return -1;
 	}
-	if (fscanf(fp, "%s", cache_str) <= 0) {
+	if (fscanf(fp, "%63s", cache_str) <= 0) {
 		ksft_perror("Could not get cache_size");
 		fclose(fp);
 
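The effect of the maximum field width can be shown with a small helper. This is a sketch only: read_word() and CACHE_STR_SIZE are hypothetical names, with the array size of 64 inferred from the "%63s" specifier in the hunk above.

```c
#include <stdio.h>

#define CACHE_STR_SIZE 64 /* assumed array size matching "%63s" */

/*
 * Read one whitespace-delimited word from fp into buf, which must hold
 * at least CACHE_STR_SIZE bytes. Returns 0 on success, -1 if nothing
 * could be read.
 */
int read_word(FILE *fp, char *buf)
{
	/* 63 == CACHE_STR_SIZE - 1: leave room for the terminating null byte. */
	if (fscanf(fp, "%63s", buf) <= 0)
		return -1;

	return 0;
}
```

If the file holds a longer word, only the first 63 bytes are stored and the remainder stays unread in the stream, so the array is never written past its end.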
Within mba_setup() the programmed bandwidth delay value starts at the maximum (100, or rather ALLOCATION_MAX) and progresses towards ALLOCATION_MIN by decrementing with ALLOCATION_STEP.
The programmed bandwidth delay should never be negative, so representing it with an unsigned int is most appropriate. This may introduce confusion because of the "allocation > ALLOCATION_MAX" check used to check wraparound of the subtraction.
Modify the mba_setup() flow to start at the minimum, ALLOCATION_MIN, and incrementally, with ALLOCATION_STEP steps, adjust the bandwidth delay value. This avoids wraparound while making the purpose of "allocation > ALLOCATION_MAX" clear and eliminates the need for the "allocation < ALLOCATION_MIN" check.
Reported-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Closes: https://lore.kernel.org/lkml/1903ac13-5c9c-ef8d-78e0-417ac34a971b@linux.inte...
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
---
Changes since V1:
- New patch
- Add Ilpo's Reviewed-by tag.
---
 tools/testing/selftests/resctrl/mba_test.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/tools/testing/selftests/resctrl/mba_test.c b/tools/testing/selftests/resctrl/mba_test.c
index ab8496a4925b..da40a8ed4413 100644
--- a/tools/testing/selftests/resctrl/mba_test.c
+++ b/tools/testing/selftests/resctrl/mba_test.c
@@ -39,7 +39,8 @@ static int mba_setup(const struct resctrl_test *test,
 		     const struct user_params *uparams,
 		     struct resctrl_val_param *p)
 {
-	static int runs_per_allocation, allocation = 100;
+	static unsigned int allocation = ALLOCATION_MIN;
+	static int runs_per_allocation;
 	char allocation_str[64];
 	int ret;
 
@@ -50,7 +51,7 @@ static int mba_setup(const struct resctrl_test *test,
 	if (runs_per_allocation++ != 0)
 		return 0;
 
-	if (allocation < ALLOCATION_MIN || allocation > ALLOCATION_MAX)
+	if (allocation > ALLOCATION_MAX)
 		return END_OF_TESTS;
 
 	sprintf(allocation_str, "%d", allocation);
@@ -59,7 +60,7 @@ static int mba_setup(const struct resctrl_test *test,
 	if (ret < 0)
 		return ret;
 
-	allocation -= ALLOCATION_STEP;
+	allocation += ALLOCATION_STEP;
 
 	return 0;
 }
@@ -72,8 +73,9 @@ static int mba_measure(const struct user_params *uparams,
 
 static bool show_mba_info(unsigned long *bw_imc, unsigned long *bw_resc)
 {
-	int allocation, runs;
+	unsigned int allocation;
 	bool ret = false;
+	int runs;
 
 	ksft_print_msg("Results are displayed in (MB)\n");
 	/* Memory bandwidth from 100% down to 10% */
@@ -103,7 +105,7 @@ static bool show_mba_info(unsigned long *bw_imc, unsigned long *bw_resc)
 			       avg_diff_per > MAX_DIFF_PERCENT ?
 			       "Fail:" : "Pass:",
 			       MAX_DIFF_PERCENT,
-			       ALLOCATION_MAX - ALLOCATION_STEP * allocation);
+			       ALLOCATION_MIN + ALLOCATION_STEP * allocation);
 
 		ksft_print_msg("avg_diff_per: %d%%\n", avg_diff_per);
 		ksft_print_msg("avg_bw_imc: %lu\n", avg_bw_imc);
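The reworked iteration order can be sketched on its own. The constants below are assumptions chosen to match the 10% to 100% range mentioned in this series (the step of 10 in particular is not shown in the hunks above), and list_allocations() is a hypothetical helper. Counting upwards with an unsigned value means the only termination check needed is "allocation > ALLOCATION_MAX", with no subtraction that could wrap.

```c
#define ALLOCATION_MIN  10  /* assumed */
#define ALLOCATION_MAX  100 /* assumed */
#define ALLOCATION_STEP 10  /* assumed */

/*
 * Store each allocation value that would be programmed, in order, into
 * out[] (at most max_entries values). Returns the number stored.
 */
unsigned int list_allocations(unsigned int *out, unsigned int max_entries)
{
	unsigned int allocation = ALLOCATION_MIN;
	unsigned int n = 0;

	while (allocation <= ALLOCATION_MAX && n < max_entries) {
		out[n++] = allocation;
		allocation += ALLOCATION_STEP; /* only ever grows */
	}

	return n;
}
```

Under these assumed constants the sequence is 10, 20, ..., 100, which is also why the results loop computes the schemata value as ALLOCATION_MIN + ALLOCATION_STEP * allocation.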
The CMT, MBM, and MBA tests rely on a benchmark that runs while the test makes changes to needed configuration (for example memory bandwidth allocation) and takes needed measurements. By default the "fill_buf" benchmark is used and by default (via its "once = false" setting) "fill_buf" is configured to run until terminated after the test completes.
An unintended consequence of enabling the user to override the benchmark is that the user can also change the parameters passed to the "fill_buf" benchmark. This enables the user to set "fill_buf" to only cycle through the buffer once (by setting "once = true"), which breaks the CMT, MBA, and MBM tests that expect workload/interference to be reflected by their measurements.
Prevent user space from changing the "once" parameter and ensure that it is always false for the CMT, MBA, and MBM tests.
Suggested-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
---
Changes since V3:
- Add Ilpo's Reviewed-by tag.

Changes since V2:
- Remove unnecessary assignment to benchmark_cmd[5]. (Ilpo)

Changes since V1:
- New patch
---
 tools/testing/selftests/resctrl/fill_buf.c      |  7 ++++---
 tools/testing/selftests/resctrl/resctrl.h       |  2 +-
 tools/testing/selftests/resctrl/resctrl_tests.c |  9 +++++++--
 tools/testing/selftests/resctrl/resctrl_val.c   | 11 +----------
 4 files changed, 13 insertions(+), 16 deletions(-)
diff --git a/tools/testing/selftests/resctrl/fill_buf.c b/tools/testing/selftests/resctrl/fill_buf.c
index 34e5df721430..854f0108d8e6 100644
--- a/tools/testing/selftests/resctrl/fill_buf.c
+++ b/tools/testing/selftests/resctrl/fill_buf.c
@@ -151,7 +151,7 @@ unsigned char *alloc_buffer(size_t buf_size, int memflush)
 	return buf;
 }
 
-int run_fill_buf(size_t buf_size, int memflush, int op, bool once)
+int run_fill_buf(size_t buf_size, int memflush, int op)
 {
 	unsigned char *buf;
 
@@ -160,9 +160,10 @@ int run_fill_buf(size_t buf_size, int memflush, int op, bool once)
 		return -1;
 
 	if (op == 0)
-		fill_cache_read(buf, buf_size, once);
+		fill_cache_read(buf, buf_size, false);
 	else
-		fill_cache_write(buf, buf_size, once);
+		fill_cache_write(buf, buf_size, false);
+
 	free(buf);
 
 	return 0;
diff --git a/tools/testing/selftests/resctrl/resctrl.h b/tools/testing/selftests/resctrl/resctrl.h
index 2dda56084588..51f5f4b25e06 100644
--- a/tools/testing/selftests/resctrl/resctrl.h
+++ b/tools/testing/selftests/resctrl/resctrl.h
@@ -142,7 +142,7 @@ int perf_event_open(struct perf_event_attr *hw_event, pid_t pid, int cpu,
 unsigned char *alloc_buffer(size_t buf_size, int memflush);
 void mem_flush(unsigned char *buf, size_t buf_size);
 void fill_cache_read(unsigned char *buf, size_t buf_size, bool once);
-int run_fill_buf(size_t buf_size, int memflush, int op, bool once);
+int run_fill_buf(size_t buf_size, int memflush, int op);
 int initialize_mem_bw_imc(void);
 int measure_mem_bw(const struct user_params *uparams,
 		   struct resctrl_val_param *param, pid_t bm_pid,
diff --git a/tools/testing/selftests/resctrl/resctrl_tests.c b/tools/testing/selftests/resctrl/resctrl_tests.c
index ecbb7605a981..e7878077883f 100644
--- a/tools/testing/selftests/resctrl/resctrl_tests.c
+++ b/tools/testing/selftests/resctrl/resctrl_tests.c
@@ -266,8 +266,13 @@ int main(int argc, char **argv)
 		uparams.benchmark_cmd[1] = span_str;
 		uparams.benchmark_cmd[2] = "1";
 		uparams.benchmark_cmd[3] = "0";
-		uparams.benchmark_cmd[4] = "false";
-		uparams.benchmark_cmd[5] = NULL;
+		/*
+		 * Fourth parameter was previously used to indicate
+		 * how long "fill_buf" should run for, with "false"
+		 * ("fill_buf" will keep running until terminated)
+		 * the only option that works.
+		 */
+		uparams.benchmark_cmd[4] = NULL;
 	}
 
 	ksft_set_plan(tests);
diff --git a/tools/testing/selftests/resctrl/resctrl_val.c b/tools/testing/selftests/resctrl/resctrl_val.c
index c9dd70ce3ea8..b0f3c594c4da 100644
--- a/tools/testing/selftests/resctrl/resctrl_val.c
+++ b/tools/testing/selftests/resctrl/resctrl_val.c
@@ -625,7 +625,6 @@ static void run_benchmark(int signum, siginfo_t *info, void *ucontext)
 	int operation, ret, memflush;
 	char **benchmark_cmd;
 	size_t span;
-	bool once;
 	FILE *fp;
 
 	benchmark_cmd = info->si_ptr;
@@ -645,16 +644,8 @@ static void run_benchmark(int signum, siginfo_t *info, void *ucontext)
 		span = strtoul(benchmark_cmd[1], NULL, 10);
 		memflush = atoi(benchmark_cmd[2]);
 		operation = atoi(benchmark_cmd[3]);
-		if (!strcmp(benchmark_cmd[4], "true")) {
-			once = true;
-		} else if (!strcmp(benchmark_cmd[4], "false")) {
-			once = false;
-		} else {
-			ksft_print_msg("Invalid once parameter\n");
-			parent_exit(ppid);
-		}
 
-		if (run_fill_buf(span, memflush, operation, once))
+		if (run_fill_buf(span, memflush, operation))
 			fprintf(stderr, "Error in running fill buffer\n");
 	} else {
 		/* Execute specified benchmark */
The CMT, MBM, and MBA tests rely on a benchmark to generate memory traffic. By default this is the "fill_buf" benchmark that can be replaced via the "-b" command line argument.
The original intent of the "-b" command line parameter was to replace the default "fill_buf" benchmark, but the implementation also exposes an alternative use case where the "fill_buf" parameters themselves can be modified. One of the parameters to "fill_buf" is the "operation" that can be either "read" or "write" and indicates whether "fill_buf" should use "read" or "write" operations on the allocated buffer.
While replacing "fill_buf" default parameters is technically possible, replacing the default "read" parameter with "write" is not supported because the MBA and MBM tests only measure "read" operations. The "read" operation is also most appropriate for the CMT test that aims to use the benchmark to allocate into the cache.
Avoid any potential inconsistencies between test and measurement by removing code for unsupported "write" operations to the buffer. Ignore any attempt from user space to enable this unsupported test configuration, instead always use read operations.
Keep the initialization of the, now unused, "fill_buf" parameters to reserve these parameter positions since it has been exposed as an API. Future parameter additions cannot use these parameter positions.
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
---
Changes since V3:
- Add Ilpo's Reviewed-by tag.

Changes since V2:
- Update changelog to justify keeping the assignment to benchmark_cmd[4]. (Ilpo)

Changes since V1:
- New patch.
---
 tools/testing/selftests/resctrl/fill_buf.c    | 28 ++-----------------
 tools/testing/selftests/resctrl/resctrl.h     |  2 +-
 .../testing/selftests/resctrl/resctrl_tests.c |  5 +++-
 tools/testing/selftests/resctrl/resctrl_val.c |  5 ++--
 4 files changed, 9 insertions(+), 31 deletions(-)
diff --git a/tools/testing/selftests/resctrl/fill_buf.c b/tools/testing/selftests/resctrl/fill_buf.c
index 854f0108d8e6..e4f1cea317f1 100644
--- a/tools/testing/selftests/resctrl/fill_buf.c
+++ b/tools/testing/selftests/resctrl/fill_buf.c
@@ -88,18 +88,6 @@ static int fill_one_span_read(unsigned char *buf, size_t buf_size)
 	return sum;
 }
 
-static void fill_one_span_write(unsigned char *buf, size_t buf_size)
-{
-	unsigned char *end_ptr = buf + buf_size;
-	unsigned char *p;
-
-	p = buf;
-	while (p < end_ptr) {
-		*p = '1';
-		p += (CL_SIZE / 2);
-	}
-}
-
 void fill_cache_read(unsigned char *buf, size_t buf_size, bool once)
 {
 	int ret = 0;
@@ -114,15 +102,6 @@ void fill_cache_read(unsigned char *buf, size_t buf_size, bool once)
 	*value_sink = ret;
 }
 
-static void fill_cache_write(unsigned char *buf, size_t buf_size, bool once)
-{
-	while (1) {
-		fill_one_span_write(buf, buf_size);
-		if (once)
-			break;
-	}
-}
-
 unsigned char *alloc_buffer(size_t buf_size, int memflush)
 {
 	void *buf = NULL;
@@ -151,7 +130,7 @@ unsigned char *alloc_buffer(size_t buf_size, int memflush)
 	return buf;
 }
 
-int run_fill_buf(size_t buf_size, int memflush, int op)
+int run_fill_buf(size_t buf_size, int memflush)
 {
 	unsigned char *buf;
 
@@ -159,10 +138,7 @@ int run_fill_buf(size_t buf_size, int memflush, int op)
 	if (!buf)
 		return -1;
 
-	if (op == 0)
-		fill_cache_read(buf, buf_size, false);
-	else
-		fill_cache_write(buf, buf_size, false);
+	fill_cache_read(buf, buf_size, false);
 
 	free(buf);
 
diff --git a/tools/testing/selftests/resctrl/resctrl.h b/tools/testing/selftests/resctrl/resctrl.h
index 51f5f4b25e06..ba1ce1b35699 100644
--- a/tools/testing/selftests/resctrl/resctrl.h
+++ b/tools/testing/selftests/resctrl/resctrl.h
@@ -142,7 +142,7 @@ int perf_event_open(struct perf_event_attr *hw_event, pid_t pid, int cpu,
 unsigned char *alloc_buffer(size_t buf_size, int memflush);
 void mem_flush(unsigned char *buf, size_t buf_size);
 void fill_cache_read(unsigned char *buf, size_t buf_size, bool once);
-int run_fill_buf(size_t buf_size, int memflush, int op);
+int run_fill_buf(size_t buf_size, int memflush);
 int initialize_mem_bw_imc(void);
 int measure_mem_bw(const struct user_params *uparams,
 		   struct resctrl_val_param *param, pid_t bm_pid,
diff --git a/tools/testing/selftests/resctrl/resctrl_tests.c b/tools/testing/selftests/resctrl/resctrl_tests.c
index e7878077883f..0f91c475b255 100644
--- a/tools/testing/selftests/resctrl/resctrl_tests.c
+++ b/tools/testing/selftests/resctrl/resctrl_tests.c
@@ -265,13 +265,16 @@ int main(int argc, char **argv)
 			ksft_exit_fail_msg("Out of memory!\n");
 		uparams.benchmark_cmd[1] = span_str;
 		uparams.benchmark_cmd[2] = "1";
-		uparams.benchmark_cmd[3] = "0";
 		/*
+		 * Third parameter was previously used for "operation"
+		 * (read/write) of which only (now default) "read"/"0"
+		 * works.
 		 * Fourth parameter was previously used to indicate
 		 * how long "fill_buf" should run for, with "false"
 		 * ("fill_buf" will keep running until terminated)
 		 * the only option that works.
 		 */
+		uparams.benchmark_cmd[3] = NULL;
 		uparams.benchmark_cmd[4] = NULL;
 	}
 
diff --git a/tools/testing/selftests/resctrl/resctrl_val.c b/tools/testing/selftests/resctrl/resctrl_val.c
index b0f3c594c4da..113ca18d67c1 100644
--- a/tools/testing/selftests/resctrl/resctrl_val.c
+++ b/tools/testing/selftests/resctrl/resctrl_val.c
@@ -622,8 +622,8 @@ int measure_mem_bw(const struct user_params *uparams,
  */
 static void run_benchmark(int signum, siginfo_t *info, void *ucontext)
 {
-	int operation, ret, memflush;
 	char **benchmark_cmd;
+	int ret, memflush;
 	size_t span;
 	FILE *fp;
 
@@ -643,9 +643,8 @@ static void run_benchmark(int signum, siginfo_t *info, void *ucontext) /* Execute default fill_buf benchmark */ span = strtoul(benchmark_cmd[1], NULL, 10); memflush = atoi(benchmark_cmd[2]); - operation = atoi(benchmark_cmd[3]);
- if (run_fill_buf(span, memflush, operation)) + if (run_fill_buf(span, memflush)) fprintf(stderr, "Error in running fill buffer\n"); } else { /* Execute specified benchmark */
The MBM and MBA resctrl selftests run a benchmark during which they take measurements of read memory bandwidth via perf. Code exists to support measurements of write memory bandwidth but there exists no path with which this code can execute.
While code exists for write memory bandwidth measurement there has not yet been a use case for it. Remove this unused code. Rename relevant functions to include "read" so that it is clear that they relate only to memory bandwidth reads. While renaming the functions, also add consistency by changing the "membw" instances to the more prevalent "mem_bw".
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
---
Changes since V2:
- Add Ilpo's Reviewed-by tag.

Changes since V1:
- New patch.
---
 tools/testing/selftests/resctrl/mba_test.c | 4 +-
 tools/testing/selftests/resctrl/mbm_test.c | 4 +-
 tools/testing/selftests/resctrl/resctrl.h | 8 +-
 tools/testing/selftests/resctrl/resctrl_val.c | 234 ++++++------------
 tools/testing/selftests/resctrl/resctrlfs.c | 17 --
 5 files changed, 85 insertions(+), 182 deletions(-)
diff --git a/tools/testing/selftests/resctrl/mba_test.c b/tools/testing/selftests/resctrl/mba_test.c index da40a8ed4413..be0ead73e55d 100644 --- a/tools/testing/selftests/resctrl/mba_test.c +++ b/tools/testing/selftests/resctrl/mba_test.c @@ -21,7 +21,7 @@ static int mba_init(const struct resctrl_val_param *param, int domain_id) { int ret;
- ret = initialize_mem_bw_imc(); + ret = initialize_read_mem_bw_imc(); if (ret) return ret;
@@ -68,7 +68,7 @@ static int mba_setup(const struct resctrl_test *test, static int mba_measure(const struct user_params *uparams, struct resctrl_val_param *param, pid_t bm_pid) { - return measure_mem_bw(uparams, param, bm_pid, "reads"); + return measure_read_mem_bw(uparams, param, bm_pid); }
static bool show_mba_info(unsigned long *bw_imc, unsigned long *bw_resc) diff --git a/tools/testing/selftests/resctrl/mbm_test.c b/tools/testing/selftests/resctrl/mbm_test.c index cf08ba5e314e..defa94293915 100644 --- a/tools/testing/selftests/resctrl/mbm_test.c +++ b/tools/testing/selftests/resctrl/mbm_test.c @@ -91,7 +91,7 @@ static int mbm_init(const struct resctrl_val_param *param, int domain_id) { int ret;
- ret = initialize_mem_bw_imc(); + ret = initialize_read_mem_bw_imc(); if (ret) return ret;
@@ -122,7 +122,7 @@ static int mbm_setup(const struct resctrl_test *test, static int mbm_measure(const struct user_params *uparams, struct resctrl_val_param *param, pid_t bm_pid) { - return measure_mem_bw(uparams, param, bm_pid, "reads"); + return measure_read_mem_bw(uparams, param, bm_pid); }
static void mbm_test_cleanup(void) diff --git a/tools/testing/selftests/resctrl/resctrl.h b/tools/testing/selftests/resctrl/resctrl.h index ba1ce1b35699..82801245e4c1 100644 --- a/tools/testing/selftests/resctrl/resctrl.h +++ b/tools/testing/selftests/resctrl/resctrl.h @@ -126,7 +126,6 @@ int filter_dmesg(void); int get_domain_id(const char *resource, int cpu_no, int *domain_id); int mount_resctrlfs(void); int umount_resctrlfs(void); -const char *get_bw_report_type(const char *bw_report); bool resctrl_resource_exists(const char *resource); bool resctrl_mon_feature_exists(const char *resource, const char *feature); bool resource_info_file_exists(const char *resource, const char *file); @@ -143,10 +142,9 @@ unsigned char *alloc_buffer(size_t buf_size, int memflush); void mem_flush(unsigned char *buf, size_t buf_size); void fill_cache_read(unsigned char *buf, size_t buf_size, bool once); int run_fill_buf(size_t buf_size, int memflush); -int initialize_mem_bw_imc(void); -int measure_mem_bw(const struct user_params *uparams, - struct resctrl_val_param *param, pid_t bm_pid, - const char *bw_report); +int initialize_read_mem_bw_imc(void); +int measure_read_mem_bw(const struct user_params *uparams, + struct resctrl_val_param *param, pid_t bm_pid); void initialize_mem_bw_resctrl(const struct resctrl_val_param *param, int domain_id); int resctrl_val(const struct resctrl_test *test, diff --git a/tools/testing/selftests/resctrl/resctrl_val.c b/tools/testing/selftests/resctrl/resctrl_val.c index 113ca18d67c1..c4ebf70a46ef 100644 --- a/tools/testing/selftests/resctrl/resctrl_val.c +++ b/tools/testing/selftests/resctrl/resctrl_val.c @@ -12,13 +12,10 @@
#define UNCORE_IMC "uncore_imc" #define READ_FILE_NAME "events/cas_count_read" -#define WRITE_FILE_NAME "events/cas_count_write" #define DYN_PMU_PATH "/sys/bus/event_source/devices" #define SCALE 0.00006103515625 #define MAX_IMCS 20 #define MAX_TOKENS 5 -#define READ 0 -#define WRITE 1
#define CON_MBM_LOCAL_BYTES_PATH \ "%s/%s/mon_data/mon_L3_%02d/mbm_local_bytes" @@ -41,44 +38,43 @@ struct imc_counter_config {
static char mbm_total_path[1024]; static int imcs; -static struct imc_counter_config imc_counters_config[MAX_IMCS][2]; +static struct imc_counter_config imc_counters_config[MAX_IMCS]; static const struct resctrl_test *current_test;
-static void membw_initialize_perf_event_attr(int i, int j) +static void read_mem_bw_initialize_perf_event_attr(int i) { - memset(&imc_counters_config[i][j].pe, 0, + memset(&imc_counters_config[i].pe, 0, sizeof(struct perf_event_attr)); - imc_counters_config[i][j].pe.type = imc_counters_config[i][j].type; - imc_counters_config[i][j].pe.size = sizeof(struct perf_event_attr); - imc_counters_config[i][j].pe.disabled = 1; - imc_counters_config[i][j].pe.inherit = 1; - imc_counters_config[i][j].pe.exclude_guest = 0; - imc_counters_config[i][j].pe.config = - imc_counters_config[i][j].umask << 8 | - imc_counters_config[i][j].event; - imc_counters_config[i][j].pe.sample_type = PERF_SAMPLE_IDENTIFIER; - imc_counters_config[i][j].pe.read_format = + imc_counters_config[i].pe.type = imc_counters_config[i].type; + imc_counters_config[i].pe.size = sizeof(struct perf_event_attr); + imc_counters_config[i].pe.disabled = 1; + imc_counters_config[i].pe.inherit = 1; + imc_counters_config[i].pe.exclude_guest = 0; + imc_counters_config[i].pe.config = + imc_counters_config[i].umask << 8 | + imc_counters_config[i].event; + imc_counters_config[i].pe.sample_type = PERF_SAMPLE_IDENTIFIER; + imc_counters_config[i].pe.read_format = PERF_FORMAT_TOTAL_TIME_ENABLED | PERF_FORMAT_TOTAL_TIME_RUNNING; }
-static void membw_ioctl_perf_event_ioc_reset_enable(int i, int j) +static void read_mem_bw_ioctl_perf_event_ioc_reset_enable(int i) { - ioctl(imc_counters_config[i][j].fd, PERF_EVENT_IOC_RESET, 0); - ioctl(imc_counters_config[i][j].fd, PERF_EVENT_IOC_ENABLE, 0); + ioctl(imc_counters_config[i].fd, PERF_EVENT_IOC_RESET, 0); + ioctl(imc_counters_config[i].fd, PERF_EVENT_IOC_ENABLE, 0); }
-static void membw_ioctl_perf_event_ioc_disable(int i, int j) +static void read_mem_bw_ioctl_perf_event_ioc_disable(int i) { - ioctl(imc_counters_config[i][j].fd, PERF_EVENT_IOC_DISABLE, 0); + ioctl(imc_counters_config[i].fd, PERF_EVENT_IOC_DISABLE, 0); }
/* - * get_event_and_umask: Parse config into event and umask + * get_read_event_and_umask: Parse config into event and umask * @cas_count_cfg: Config * @count: iMC number - * @op: Operation (read/write) */ -static void get_event_and_umask(char *cas_count_cfg, int count, bool op) +static void get_read_event_and_umask(char *cas_count_cfg, int count) { char *token[MAX_TOKENS]; int i = 0; @@ -91,34 +87,22 @@ static void get_event_and_umask(char *cas_count_cfg, int count, bool op) for (i = 0; i < MAX_TOKENS - 1; i++) { if (!token[i]) break; - if (strcmp(token[i], "event") == 0) { - if (op == READ) - imc_counters_config[count][READ].event = - strtol(token[i + 1], NULL, 16); - else - imc_counters_config[count][WRITE].event = - strtol(token[i + 1], NULL, 16); - } - if (strcmp(token[i], "umask") == 0) { - if (op == READ) - imc_counters_config[count][READ].umask = - strtol(token[i + 1], NULL, 16); - else - imc_counters_config[count][WRITE].umask = - strtol(token[i + 1], NULL, 16); - } + if (strcmp(token[i], "event") == 0) + imc_counters_config[count].event = strtol(token[i + 1], NULL, 16); + if (strcmp(token[i], "umask") == 0) + imc_counters_config[count].umask = strtol(token[i + 1], NULL, 16); } }
-static int open_perf_event(int i, int cpu_no, int j) +static int open_perf_read_event(int i, int cpu_no) { - imc_counters_config[i][j].fd = - perf_event_open(&imc_counters_config[i][j].pe, -1, cpu_no, -1, + imc_counters_config[i].fd = + perf_event_open(&imc_counters_config[i].pe, -1, cpu_no, -1, PERF_FLAG_FD_CLOEXEC);
- if (imc_counters_config[i][j].fd == -1) { + if (imc_counters_config[i].fd == -1) { fprintf(stderr, "Error opening leader %llx\n", - imc_counters_config[i][j].pe.config); + imc_counters_config[i].pe.config);
return -1; } @@ -126,7 +110,7 @@ static int open_perf_event(int i, int cpu_no, int j) return 0; }
-/* Get type and config (read and write) of an iMC counter */ +/* Get type and config of an iMC counter's read event. */ static int read_from_imc_dir(char *imc_dir, int count) { char cas_count_cfg[1024], imc_counter_cfg[1024], imc_counter_type[1024]; @@ -140,7 +124,7 @@ static int read_from_imc_dir(char *imc_dir, int count)
return -1; } - if (fscanf(fp, "%u", &imc_counters_config[count][READ].type) <= 0) { + if (fscanf(fp, "%u", &imc_counters_config[count].type) <= 0) { ksft_perror("Could not get iMC type"); fclose(fp);
@@ -148,9 +132,6 @@ static int read_from_imc_dir(char *imc_dir, int count) } fclose(fp);
- imc_counters_config[count][WRITE].type = - imc_counters_config[count][READ].type; - /* Get read config */ sprintf(imc_counter_cfg, "%s%s", imc_dir, READ_FILE_NAME); fp = fopen(imc_counter_cfg, "r"); @@ -167,34 +148,19 @@ static int read_from_imc_dir(char *imc_dir, int count) } fclose(fp);
- get_event_and_umask(cas_count_cfg, count, READ); - - /* Get write config */ - sprintf(imc_counter_cfg, "%s%s", imc_dir, WRITE_FILE_NAME); - fp = fopen(imc_counter_cfg, "r"); - if (!fp) { - ksft_perror("Failed to open iMC config file"); - - return -1; - } - if (fscanf(fp, "%1023s", cas_count_cfg) <= 0) { - ksft_perror("Could not get iMC cas count write"); - fclose(fp); - - return -1; - } - fclose(fp); - - get_event_and_umask(cas_count_cfg, count, WRITE); + get_read_event_and_umask(cas_count_cfg, count);
return 0; }
/* * A system can have 'n' number of iMC (Integrated Memory Controller) - * counters, get that 'n'. For each iMC counter get it's type and config. - * Also, each counter has two configs, one for read and the other for write. - * A config again has two parts, event and umask. + * counters, get that 'n'. Discover the properties of the available + * counters in support of needed performance measurement via perf. + * For each iMC counter get it's type and config. Also obtain each + * counter's event and umask for the memory read events that will be + * measured. + * * Enumerate all these details into an array of structures. * * Return: >= 0 on success. < 0 on failure. @@ -255,55 +221,46 @@ static int num_of_imcs(void) return count; }
-int initialize_mem_bw_imc(void) +int initialize_read_mem_bw_imc(void) { - int imc, j; + int imc;
imcs = num_of_imcs(); if (imcs <= 0) return imcs;
/* Initialize perf_event_attr structures for all iMC's */ - for (imc = 0; imc < imcs; imc++) { - for (j = 0; j < 2; j++) - membw_initialize_perf_event_attr(imc, j); - } + for (imc = 0; imc < imcs; imc++) + read_mem_bw_initialize_perf_event_attr(imc);
return 0; }
-static void perf_close_imc_mem_bw(void) +static void perf_close_imc_read_mem_bw(void) { int mc;
for (mc = 0; mc < imcs; mc++) { - if (imc_counters_config[mc][READ].fd != -1) - close(imc_counters_config[mc][READ].fd); - if (imc_counters_config[mc][WRITE].fd != -1) - close(imc_counters_config[mc][WRITE].fd); + if (imc_counters_config[mc].fd != -1) + close(imc_counters_config[mc].fd); } }
/* - * perf_open_imc_mem_bw - Open perf fds for IMCs + * perf_open_imc_read_mem_bw - Open perf fds for IMCs * @cpu_no: CPU number that the benchmark PID is bound to * * Return: = 0 on success. < 0 on failure. */ -static int perf_open_imc_mem_bw(int cpu_no) +static int perf_open_imc_read_mem_bw(int cpu_no) { int imc, ret;
- for (imc = 0; imc < imcs; imc++) { - imc_counters_config[imc][READ].fd = -1; - imc_counters_config[imc][WRITE].fd = -1; - } + for (imc = 0; imc < imcs; imc++) + imc_counters_config[imc].fd = -1;
for (imc = 0; imc < imcs; imc++) { - ret = open_perf_event(imc, cpu_no, READ); - if (ret) - goto close_fds; - ret = open_perf_event(imc, cpu_no, WRITE); + ret = open_perf_read_event(imc, cpu_no); if (ret) goto close_fds; } @@ -311,60 +268,52 @@ static int perf_open_imc_mem_bw(int cpu_no) return 0;
close_fds: - perf_close_imc_mem_bw(); + perf_close_imc_read_mem_bw(); return -1; }
/* - * do_mem_bw_test - Perform memory bandwidth test + * do_imc_read_mem_bw_test - Perform memory bandwidth test * * Runs memory bandwidth test over one second period. Also, handles starting * and stopping of the IMC perf counters around the test. */ -static void do_imc_mem_bw_test(void) +static void do_imc_read_mem_bw_test(void) { int imc;
- for (imc = 0; imc < imcs; imc++) { - membw_ioctl_perf_event_ioc_reset_enable(imc, READ); - membw_ioctl_perf_event_ioc_reset_enable(imc, WRITE); - } + for (imc = 0; imc < imcs; imc++) + read_mem_bw_ioctl_perf_event_ioc_reset_enable(imc);
sleep(1);
- /* Stop counters after a second to get results (both read and write) */ - for (imc = 0; imc < imcs; imc++) { - membw_ioctl_perf_event_ioc_disable(imc, READ); - membw_ioctl_perf_event_ioc_disable(imc, WRITE); - } + /* Stop counters after a second to get results. */ + for (imc = 0; imc < imcs; imc++) + read_mem_bw_ioctl_perf_event_ioc_disable(imc); }
/* - * get_mem_bw_imc - Memory bandwidth as reported by iMC counters - * @bw_report: Bandwidth report type (reads, writes) + * get_read_mem_bw_imc - Memory read bandwidth as reported by iMC counters * - * Memory bandwidth utilized by a process on a socket can be calculated - * using iMC counters. Perf events are used to read these counters. + * Memory read bandwidth utilized by a process on a socket can be calculated + * using iMC counters' read events. Perf events are used to read these + * counters. * * Return: = 0 on success. < 0 on failure. */ -static int get_mem_bw_imc(const char *bw_report, float *bw_imc) +static int get_read_mem_bw_imc(float *bw_imc) { - float reads, writes, of_mul_read, of_mul_write; + float reads = 0, of_mul_read = 1; int imc;
- /* Start all iMC counters to log values (both read and write) */ - reads = 0, writes = 0, of_mul_read = 1, of_mul_write = 1; - /* - * Get results which are stored in struct type imc_counter_config + * Log read event values from all iMC counters into + * struct imc_counter_config. * Take overflow into consideration before calculating total bandwidth. */ for (imc = 0; imc < imcs; imc++) { struct imc_counter_config *r = - &imc_counters_config[imc][READ]; - struct imc_counter_config *w = - &imc_counters_config[imc][WRITE]; + &imc_counters_config[imc];
if (read(r->fd, &r->return_value, sizeof(struct membw_read_format)) == -1) { @@ -372,12 +321,6 @@ static int get_mem_bw_imc(const char *bw_report, float *bw_imc) return -1; }
- if (read(w->fd, &w->return_value, - sizeof(struct membw_read_format)) == -1) { - ksft_perror("Couldn't get write bandwidth through iMC"); - return -1; - } - __u64 r_time_enabled = r->return_value.time_enabled; __u64 r_time_running = r->return_value.time_running;
@@ -385,27 +328,10 @@ static int get_mem_bw_imc(const char *bw_report, float *bw_imc) of_mul_read = (float)r_time_enabled / (float)r_time_running;
- __u64 w_time_enabled = w->return_value.time_enabled; - __u64 w_time_running = w->return_value.time_running; - - if (w_time_enabled != w_time_running) - of_mul_write = (float)w_time_enabled / - (float)w_time_running; reads += r->return_value.value * of_mul_read * SCALE; - writes += w->return_value.value * of_mul_write * SCALE; - } - - if (strcmp(bw_report, "reads") == 0) { - *bw_imc = reads; - return 0; - } - - if (strcmp(bw_report, "writes") == 0) { - *bw_imc = writes; - return 0; }
- *bw_imc = reads + writes; + *bw_imc = reads; return 0; }
@@ -551,35 +477,31 @@ static int print_results_bw(char *filename, pid_t bm_pid, float bw_imc, }
/* - * measure_mem_bw - Measures memory bandwidth numbers while benchmark runs + * measure_read_mem_bw - Measures read memory bandwidth numbers while benchmark runs * @uparams: User supplied parameters * @param: Parameters passed to resctrl_val() * @bm_pid: PID that runs the benchmark - * @bw_report: Bandwidth report type (reads, writes) * * Measure memory bandwidth from resctrl and from another source which is * perf imc value or could be something else if perf imc event is not * available. Compare the two values to validate resctrl value. It takes * 1 sec to measure the data. + * resctrl does not distinguish between read and write operations so + * its data includes all memory operations. */ -int measure_mem_bw(const struct user_params *uparams, - struct resctrl_val_param *param, pid_t bm_pid, - const char *bw_report) +int measure_read_mem_bw(const struct user_params *uparams, + struct resctrl_val_param *param, pid_t bm_pid) { unsigned long bw_resc, bw_resc_start, bw_resc_end; FILE *mem_bw_fp; float bw_imc; int ret;
- bw_report = get_bw_report_type(bw_report); - if (!bw_report) - return -1; - mem_bw_fp = open_mem_bw_resctrl(mbm_total_path); if (!mem_bw_fp) return -1;
- ret = perf_open_imc_mem_bw(uparams->cpu); + ret = perf_open_imc_read_mem_bw(uparams->cpu); if (ret < 0) goto close_fp;
@@ -589,17 +511,17 @@ int measure_mem_bw(const struct user_params *uparams,
rewind(mem_bw_fp);
- do_imc_mem_bw_test(); + do_imc_read_mem_bw_test();
ret = get_mem_bw_resctrl(mem_bw_fp, &bw_resc_end); if (ret < 0) goto close_imc;
- ret = get_mem_bw_imc(bw_report, &bw_imc); + ret = get_read_mem_bw_imc(&bw_imc); if (ret < 0) goto close_imc;
- perf_close_imc_mem_bw(); + perf_close_imc_read_mem_bw(); fclose(mem_bw_fp);
bw_resc = (bw_resc_end - bw_resc_start) / MB; @@ -607,7 +529,7 @@ int measure_mem_bw(const struct user_params *uparams, return print_results_bw(param->filename, bm_pid, bw_imc, bw_resc);
close_imc: - perf_close_imc_mem_bw(); + perf_close_imc_read_mem_bw(); close_fp: fclose(mem_bw_fp); return ret; diff --git a/tools/testing/selftests/resctrl/resctrlfs.c b/tools/testing/selftests/resctrl/resctrlfs.c index a53cd1cb6e0c..d38d6dd90be4 100644 --- a/tools/testing/selftests/resctrl/resctrlfs.c +++ b/tools/testing/selftests/resctrl/resctrlfs.c @@ -831,23 +831,6 @@ int filter_dmesg(void) return 0; }
-const char *get_bw_report_type(const char *bw_report) -{ - if (strcmp(bw_report, "reads") == 0) - return bw_report; - if (strcmp(bw_report, "writes") == 0) - return bw_report; - if (strcmp(bw_report, "nt-writes") == 0) { - return "writes"; - } - if (strcmp(bw_report, "total") == 0) - return bw_report; - - fprintf(stderr, "Requested iMC bandwidth report type unavailable\n"); - - return NULL; -} - int perf_event_open(struct perf_event_attr *hw_event, pid_t pid, int cpu, int group_fd, unsigned long flags) {
The benchmark used during the CMT, MBM, and MBA tests can be provided by the user via the "-b" parameter. If it is not provided, the default "fill_buf" benchmark is used. The user is additionally able to override any of the "fill_buf" default parameters when running the tests with "-b fill_buf <fill_buf parameters>".
The "fill_buf" parameters are managed as an array of strings. Using an array of strings is complex because it requires transformations to/from strings at every producer and consumer. This is made worse for the individual tests where the default benchmark parameter values may not be appropriate and additional data wrangling is required. For example, the CMT test duplicates the entire array of strings in order to replace one of the parameters.
More issues appear when combining the usage of an array of strings with the use case of the user overriding default parameters by specifying "-b fill_buf <parameters>". This use case is fragile and can trigger a SIGSEGV because NULL pointers may exist in the array of strings. For example, the command below specifies that "fill_buf" should be used but leaves all of its parameters NULL:
$ sudo resctrl_tests -t mbm -b fill_buf
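The NULL pointer hazard described above comes from handing an absent argument straight to a string conversion. A minimal sketch of a guarded conversion follows. It is an illustration under stated assumptions, not the code from this patch: parse_size_arg() is a hypothetical helper, and treating both a NULL and an empty string as "use the default" is modeled on the checks the patch adds.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper, not part of the selftests: parse an optional
 * decimal argument.
 * @arg:  string from the command line, may be NULL or empty
 * @dflt: value to use when @arg is NULL or empty
 * @out:  where the result is stored
 *
 * Return: 0 on success, -1 if @arg is present but not a valid number.
 */
static int parse_size_arg(const char *arg, size_t dflt, size_t *out)
{
	char *endptr = NULL;
	unsigned long val;

	assert(out != NULL);

	/* Never dereference or convert a missing argument. */
	if (!arg || *arg == '\0') {
		*out = dflt;
		return 0;
	}

	errno = 0;
	val = strtoul(arg, &endptr, 10);
	/* Reject range errors and trailing garbage such as "12abc". */
	if (errno || *endptr != '\0')
		return -1;

	*out = val;
	return 0;
}
```

With a check of this shape every consumer sees either a validated number or a well defined default, so an invocation that omits all "fill_buf" parameters no longer reaches a conversion with a NULL pointer.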
Replace the "array of strings" parameters used for "fill_buf" with new struct fill_buf_param that contains the "fill_buf" parameters that can be used directly without transformations to/from strings. Two instances of struct fill_buf_param may exist at any point in time:
* If the user provides new parameters to "fill_buf", the user parameter structure (struct user_params) will point to a fully initialized and immutable struct fill_buf_param containing the user provided parameters.
* If "fill_buf" is the benchmark that should be used by a test, then the test parameter structure (struct resctrl_val_param) will point to a fully initialized struct fill_buf_param. The latter may contain (a) the user provided parameters verbatim, (b) user provided parameters adjusted to be appropriate for the test, or (c) the default parameters for "fill_buf" that are appropriate for the test if the user provided neither "fill_buf" parameters nor an alternate benchmark.
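The three cases (a), (b) and (c) can be summarized in a small self-contained sketch. It is only an illustration: select_fill_buf(), its test_buf_size argument, and SKETCH_DEFAULT_SPAN are hypothetical names, the default value assumes that MB is 1024 * 1024, and the actual tests open-code this choice in their own run functions rather than sharing a helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: mirrors DEFAULT_SPAN (250 * MB) with MB = 1024 * 1024. */
#define SKETCH_DEFAULT_SPAN ((size_t)250 * 1024 * 1024)

struct fill_buf_param {
	size_t buf_size;
	bool memflush;
};

/*
 * Hypothetical helper: pick the "fill_buf" parameters a test will use.
 * @user:             user provided parameters, NULL if none
 * @custom_benchmark: true if the user supplied another benchmark
 * @test_buf_size:    non-zero if the test dictates the buffer size
 * @out:              test owned storage for the chosen parameters
 *
 * Return: @out if "fill_buf" should run, NULL if the user's own
 * benchmark runs instead.
 */
static struct fill_buf_param *
select_fill_buf(const struct fill_buf_param *user, bool custom_benchmark,
		size_t test_buf_size, struct fill_buf_param *out)
{
	assert(out != NULL);

	if (user) {
		/* (a) verbatim copy, or (b) adjusted for the test. */
		*out = *user;
		if (test_buf_size)
			out->buf_size = test_buf_size;
		return out;
	}

	if (custom_benchmark)
		return NULL;

	/* (c) defaults appropriate for the test. */
	out->buf_size = test_buf_size ? test_buf_size : SKETCH_DEFAULT_SPAN;
	out->memflush = true;
	return out;
}
```

Copying into test owned storage keeps the user provided structure immutable, which is why two instances may exist at the same time.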
The existing behavior of the CMT test is to use a test defined value for the buffer size even if the user provides another value via the command line. This behavior is maintained since the test requires that the buffer size matches the size of the cache allocated, and the amount of cache allocated can instead be changed by the user with the "-n" command line parameter.
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
Changes since V3:
- Handle empty string input. (Ilpo)

Changes since V2:
- Use empty initializers. (Ilpo)
- Let memflush be bool instead of int. (Ilpo)
- Make user input checks more robust. (Ilpo)
- Assign values as part of local variable definition. (Ilpo)

Changes since V1:
- Maintain original behavior where user can override "fill_buf" parameters via command line ... but only those that can actually be changed. (Ilpo)
- Fix parsing issues associated with original behavior to ensure any parameter is valid before any attempt to use it.
- Move patch earlier in series to highlight that this fixes existing issues.
- Make struct fill_buf_param dynamic to support user provided parameters as well as test specific parameters.
- Rewrite changelog.
---
 tools/testing/selftests/resctrl/cmt_test.c | 32 ++----
 tools/testing/selftests/resctrl/fill_buf.c | 4 +-
 tools/testing/selftests/resctrl/mba_test.c | 13 ++-
 tools/testing/selftests/resctrl/mbm_test.c | 22 ++--
 tools/testing/selftests/resctrl/resctrl.h | 59 +++++++---
 .../testing/selftests/resctrl/resctrl_tests.c | 103 ++++++++++++++----
 tools/testing/selftests/resctrl/resctrl_val.c | 41 ++++---
 7 files changed, 178 insertions(+), 96 deletions(-)
diff --git a/tools/testing/selftests/resctrl/cmt_test.c b/tools/testing/selftests/resctrl/cmt_test.c index 0c045080d808..4c3cf2c25a38 100644 --- a/tools/testing/selftests/resctrl/cmt_test.c +++ b/tools/testing/selftests/resctrl/cmt_test.c @@ -116,15 +116,13 @@ static void cmt_test_cleanup(void)
static int cmt_run_test(const struct resctrl_test *test, const struct user_params *uparams) { - const char * const *cmd = uparams->benchmark_cmd; - const char *new_cmd[BENCHMARK_ARGS]; + struct fill_buf_param fill_buf = {}; unsigned long cache_total_size = 0; int n = uparams->bits ? : 5; unsigned long long_mask; - char *span_str = NULL; int count_of_bits; size_t span; - int ret, i; + int ret;
ret = get_full_cbm("L3", &long_mask); if (ret) @@ -155,32 +153,26 @@ static int cmt_run_test(const struct resctrl_test *test, const struct user_param
span = cache_portion_size(cache_total_size, param.mask, long_mask);
- if (strcmp(cmd[0], "fill_buf") == 0) { - /* Duplicate the command to be able to replace span in it */ - for (i = 0; uparams->benchmark_cmd[i]; i++) - new_cmd[i] = uparams->benchmark_cmd[i]; - new_cmd[i] = NULL; - - ret = asprintf(&span_str, "%zu", span); - if (ret < 0) - return -1; - new_cmd[1] = span_str; - cmd = new_cmd; + if (uparams->fill_buf) { + fill_buf.buf_size = span; + fill_buf.memflush = uparams->fill_buf->memflush; + param.fill_buf = &fill_buf; + } else if (!uparams->benchmark_cmd[0]) { + fill_buf.buf_size = span; + fill_buf.memflush = true; + param.fill_buf = &fill_buf; }
remove(RESULT_FILE_NAME);
- ret = resctrl_val(test, uparams, cmd, &param); + ret = resctrl_val(test, uparams, &param); if (ret) - goto out; + return ret;
ret = check_results(&param, span, n); if (ret && (get_vendor() == ARCH_INTEL)) ksft_print_msg("Intel CMT may be inaccurate when Sub-NUMA Clustering is enabled. Check BIOS configuration.\n");
-out: - free(span_str); - return ret; }
diff --git a/tools/testing/selftests/resctrl/fill_buf.c b/tools/testing/selftests/resctrl/fill_buf.c index e4f1cea317f1..39545f9369e8 100644 --- a/tools/testing/selftests/resctrl/fill_buf.c +++ b/tools/testing/selftests/resctrl/fill_buf.c @@ -102,7 +102,7 @@ void fill_cache_read(unsigned char *buf, size_t buf_size, bool once) *value_sink = ret; }
-unsigned char *alloc_buffer(size_t buf_size, int memflush) +unsigned char *alloc_buffer(size_t buf_size, bool memflush) { void *buf = NULL; uint64_t *p64; @@ -130,7 +130,7 @@ unsigned char *alloc_buffer(size_t buf_size, int memflush) return buf; }
-int run_fill_buf(size_t buf_size, int memflush) +int run_fill_buf(size_t buf_size, bool memflush) { unsigned char *buf;
diff --git a/tools/testing/selftests/resctrl/mba_test.c b/tools/testing/selftests/resctrl/mba_test.c index be0ead73e55d..74d95c460bd0 100644 --- a/tools/testing/selftests/resctrl/mba_test.c +++ b/tools/testing/selftests/resctrl/mba_test.c @@ -172,11 +172,22 @@ static int mba_run_test(const struct resctrl_test *test, const struct user_param .setup = mba_setup, .measure = mba_measure, }; + struct fill_buf_param fill_buf = {}; int ret;
remove(RESULT_FILE_NAME);
- ret = resctrl_val(test, uparams, uparams->benchmark_cmd, &param); + if (uparams->fill_buf) { + fill_buf.buf_size = uparams->fill_buf->buf_size; + fill_buf.memflush = uparams->fill_buf->memflush; + param.fill_buf = &fill_buf; + } else if (!uparams->benchmark_cmd[0]) { + fill_buf.buf_size = DEFAULT_SPAN; + fill_buf.memflush = true; + param.fill_buf = &fill_buf; + } + + ret = resctrl_val(test, uparams, &param); if (ret) return ret;
diff --git a/tools/testing/selftests/resctrl/mbm_test.c b/tools/testing/selftests/resctrl/mbm_test.c index defa94293915..72261413c868 100644 --- a/tools/testing/selftests/resctrl/mbm_test.c +++ b/tools/testing/selftests/resctrl/mbm_test.c @@ -139,26 +139,26 @@ static int mbm_run_test(const struct resctrl_test *test, const struct user_param .setup = mbm_setup, .measure = mbm_measure, }; - char *endptr = NULL; - size_t span = 0; + struct fill_buf_param fill_buf = {}; int ret;
remove(RESULT_FILE_NAME);
- if (uparams->benchmark_cmd[0] && strcmp(uparams->benchmark_cmd[0], "fill_buf") == 0) { - if (uparams->benchmark_cmd[1] && *uparams->benchmark_cmd[1] != '\0') { - errno = 0; - span = strtoul(uparams->benchmark_cmd[1], &endptr, 10); - if (errno || *endptr != '\0') - return -EINVAL; - } + if (uparams->fill_buf) { + fill_buf.buf_size = uparams->fill_buf->buf_size; + fill_buf.memflush = uparams->fill_buf->memflush; + param.fill_buf = &fill_buf; + } else if (!uparams->benchmark_cmd[0]) { + fill_buf.buf_size = DEFAULT_SPAN; + fill_buf.memflush = true; + param.fill_buf = &fill_buf; }
- ret = resctrl_val(test, uparams, uparams->benchmark_cmd, &param); + ret = resctrl_val(test, uparams, &param); if (ret) return ret;
- ret = check_results(span); + ret = check_results(param.fill_buf ? param.fill_buf->buf_size : 0); if (ret && (get_vendor() == ARCH_INTEL)) ksft_print_msg("Intel MBM may be inaccurate when Sub-NUMA Clustering is enabled. Check BIOS configuration.\n");
diff --git a/tools/testing/selftests/resctrl/resctrl.h b/tools/testing/selftests/resctrl/resctrl.h index 82801245e4c1..c9336f9c2cae 100644 --- a/tools/testing/selftests/resctrl/resctrl.h +++ b/tools/testing/selftests/resctrl/resctrl.h @@ -43,16 +43,36 @@
#define DEFAULT_SPAN (250 * MB)
+/* + * fill_buf_param: "fill_buf" benchmark parameters + * @buf_size: Size (in bytes) of buffer used in benchmark. + * "fill_buf" allocates and initializes buffer of + * @buf_size. User can change value via command line. + * @memflush: If false the buffer will not be flushed after + * allocation and initialization, otherwise the + * buffer will be flushed. User can change value via + * command line (via integers with 0 interpreted as + * false and anything else as true). + */ +struct fill_buf_param { + size_t buf_size; + bool memflush; +}; + /* * user_params: User supplied parameters * @cpu: CPU number to which the benchmark will be bound to * @bits: Number of bits used for cache allocation size * @benchmark_cmd: Benchmark command to run during (some of the) tests + * @fill_buf: Pointer to user provided parameters for "fill_buf", + * NULL if user did not provide parameters and test + * specific defaults should be used. */ struct user_params { int cpu; int bits; const char *benchmark_cmd[BENCHMARK_ARGS]; + const struct fill_buf_param *fill_buf; };
/* @@ -87,21 +107,29 @@ struct resctrl_test { * @init: Callback function to initialize test environment * @setup: Callback function to setup per test run environment * @measure: Callback that performs the measurement (a single test) + * @fill_buf: Parameters for default "fill_buf" benchmark. + * Initialized with user provided parameters, possibly + * adapted to be relevant to the test. If user does + * not provide parameters for "fill_buf" nor a + * replacement benchmark then initialized with defaults + * appropriate for test. NULL if user provided + * benchmark. */ struct resctrl_val_param { - const char *ctrlgrp; - const char *mongrp; - char filename[64]; - unsigned long mask; - int num_of_runs; - int (*init)(const struct resctrl_val_param *param, - int domain_id); - int (*setup)(const struct resctrl_test *test, - const struct user_params *uparams, - struct resctrl_val_param *param); - int (*measure)(const struct user_params *uparams, - struct resctrl_val_param *param, - pid_t bm_pid); + const char *ctrlgrp; + const char *mongrp; + char filename[64]; + unsigned long mask; + int num_of_runs; + int (*init)(const struct resctrl_val_param *param, + int domain_id); + int (*setup)(const struct resctrl_test *test, + const struct user_params *uparams, + struct resctrl_val_param *param); + int (*measure)(const struct user_params *uparams, + struct resctrl_val_param *param, + pid_t bm_pid); + struct fill_buf_param *fill_buf; };
struct perf_event_read { @@ -138,10 +166,10 @@ int write_schemata(const char *ctrlgrp, char *schemata, int cpu_no, int write_bm_pid_to_resctrl(pid_t bm_pid, const char *ctrlgrp, const char *mongrp); int perf_event_open(struct perf_event_attr *hw_event, pid_t pid, int cpu, int group_fd, unsigned long flags); -unsigned char *alloc_buffer(size_t buf_size, int memflush); +unsigned char *alloc_buffer(size_t buf_size, bool memflush); void mem_flush(unsigned char *buf, size_t buf_size); void fill_cache_read(unsigned char *buf, size_t buf_size, bool once); -int run_fill_buf(size_t buf_size, int memflush); +int run_fill_buf(size_t buf_size, bool memflush); int initialize_read_mem_bw_imc(void); int measure_read_mem_bw(const struct user_params *uparams, struct resctrl_val_param *param, pid_t bm_pid); @@ -149,7 +177,6 @@ void initialize_mem_bw_resctrl(const struct resctrl_val_param *param, int domain_id); int resctrl_val(const struct resctrl_test *test, const struct user_params *uparams, - const char * const *benchmark_cmd, struct resctrl_val_param *param); unsigned long create_bit_mask(unsigned int start, unsigned int len); unsigned int count_contiguous_bits(unsigned long val, unsigned int *start); diff --git a/tools/testing/selftests/resctrl/resctrl_tests.c b/tools/testing/selftests/resctrl/resctrl_tests.c index 0f91c475b255..24daf76b4039 100644 --- a/tools/testing/selftests/resctrl/resctrl_tests.c +++ b/tools/testing/selftests/resctrl/resctrl_tests.c @@ -148,6 +148,78 @@ static void run_single_test(const struct resctrl_test *test, const struct user_p test_cleanup(test); }
+/*
+ * Allocate and initialize a struct fill_buf_param with user provided
+ * (via "-b fill_buf <fill_buf parameters>") parameters.
+ *
+ * Use defaults (that may not be appropriate for all tests) for any
+ * fill_buf parameters omitted by the user.
+ *
+ * Historically it may have been possible for user space to provide
+ * additional parameters, "operation" ("read" vs "write") in
+ * benchmark_cmd[3] and "once" (run "once" or until terminated) in
+ * benchmark_cmd[4]. Changing these parameters have never been
+ * supported with the default of "read" operation and running until
+ * terminated built into the tests. Any unsupported values for
+ * (original) "fill_buf" parameters are treated as failure.
+ *
+ * Return: On failure, forcibly exits the test on any parsing failure,
+ *         returns NULL if no parsing needed (user did not actually provide
+ *         "-b fill_buf").
+ *         On success, returns pointer to newly allocated and fully
+ *         initialized struct fill_buf_param that caller must free.
+ */
+static struct fill_buf_param *alloc_fill_buf_param(struct user_params *uparams)
+{
+	struct fill_buf_param *fill_param = NULL;
+	char *endptr = NULL;
+
+	if (!uparams->benchmark_cmd[0] || strcmp(uparams->benchmark_cmd[0], "fill_buf"))
+		return NULL;
+
+	fill_param = malloc(sizeof(*fill_param));
+	if (!fill_param)
+		ksft_exit_skip("Unable to allocate memory for fill_buf parameters.\n");
+
+	if (uparams->benchmark_cmd[1] && *uparams->benchmark_cmd[1] != '\0') {
+		errno = 0;
+		fill_param->buf_size = strtoul(uparams->benchmark_cmd[1], &endptr, 10);
+		if (errno || *endptr != '\0') {
+			free(fill_param);
+			ksft_exit_skip("Unable to parse benchmark buffer size.\n");
+		}
+	} else {
+		fill_param->buf_size = DEFAULT_SPAN;
+	}
+
+	if (uparams->benchmark_cmd[2] && *uparams->benchmark_cmd[2] != '\0') {
+		errno = 0;
+		fill_param->memflush = strtol(uparams->benchmark_cmd[2], &endptr, 10) != 0;
+		if (errno || *endptr != '\0') {
+			free(fill_param);
+			ksft_exit_skip("Unable to parse benchmark memflush parameter.\n");
+		}
+	} else {
+		fill_param->memflush = true;
+	}
+
+	if (uparams->benchmark_cmd[3] && *uparams->benchmark_cmd[3] != '\0') {
+		if (strcmp(uparams->benchmark_cmd[3], "0")) {
+			free(fill_param);
+			ksft_exit_skip("Only read operations supported.\n");
+		}
+	}
+
+	if (uparams->benchmark_cmd[4] && *uparams->benchmark_cmd[4] != '\0') {
+		if (strcmp(uparams->benchmark_cmd[4], "false")) {
+			free(fill_param);
+			ksft_exit_skip("fill_buf is required to run until termination.\n");
+		}
+	}
+
+	return fill_param;
+}
+
 static void init_user_params(struct user_params *uparams)
 {
 	memset(uparams, 0, sizeof(*uparams));
@@ -158,11 +230,11 @@ static void init_user_params(struct user_params *uparams)
int main(int argc, char **argv) { + struct fill_buf_param *fill_param = NULL; int tests = ARRAY_SIZE(resctrl_tests); bool test_param_seen = false; struct user_params uparams; - char *span_str = NULL; - int ret, c, i; + int c, i;
init_user_params(&uparams);
@@ -239,6 +311,10 @@ int main(int argc, char **argv) } last_arg:
+ fill_param = alloc_fill_buf_param(&uparams); + if (fill_param) + uparams.fill_buf = fill_param; + ksft_print_header();
/* @@ -257,32 +333,11 @@ int main(int argc, char **argv)
filter_dmesg();
- if (!uparams.benchmark_cmd[0]) { - /* If no benchmark is given by "-b" argument, use fill_buf. */ - uparams.benchmark_cmd[0] = "fill_buf"; - ret = asprintf(&span_str, "%u", DEFAULT_SPAN); - if (ret < 0) - ksft_exit_fail_msg("Out of memory!\n"); - uparams.benchmark_cmd[1] = span_str; - uparams.benchmark_cmd[2] = "1"; - /* - * Third parameter was previously used for "operation" - * (read/write) of which only (now default) "read"/"0" - * works. - * Fourth parameter was previously used to indicate - * how long "fill_buf" should run for, with "false" - * ("fill_buf" will keep running until terminated) - * the only option that works. - */ - uparams.benchmark_cmd[3] = NULL; - uparams.benchmark_cmd[4] = NULL; - } - ksft_set_plan(tests);
for (i = 0; i < ARRAY_SIZE(resctrl_tests); i++) run_single_test(resctrl_tests[i], &uparams);
- free(span_str); + free(fill_param); ksft_finished(); } diff --git a/tools/testing/selftests/resctrl/resctrl_val.c b/tools/testing/selftests/resctrl/resctrl_val.c index c4ebf70a46ef..00b3808d3bca 100644 --- a/tools/testing/selftests/resctrl/resctrl_val.c +++ b/tools/testing/selftests/resctrl/resctrl_val.c @@ -535,6 +535,11 @@ int measure_read_mem_bw(const struct user_params *uparams, return ret; }
+struct benchmark_info { + const struct user_params *uparams; + struct resctrl_val_param *param; +}; + /* * run_benchmark - Run a specified benchmark or fill_buf (default benchmark) * in specified signal. Direct benchmark stdio to /dev/null. @@ -544,12 +549,11 @@ int measure_read_mem_bw(const struct user_params *uparams, */ static void run_benchmark(int signum, siginfo_t *info, void *ucontext) { - char **benchmark_cmd; - int ret, memflush; - size_t span; + struct benchmark_info *benchmark_info = info->si_ptr; + const struct user_params *uparams = benchmark_info->uparams; + struct resctrl_val_param *param = benchmark_info->param; FILE *fp; - - benchmark_cmd = info->si_ptr; + int ret;
/* * Direct stdio of child to /dev/null, so that only parent writes to @@ -561,16 +565,13 @@ static void run_benchmark(int signum, siginfo_t *info, void *ucontext) parent_exit(ppid); }
- if (strcmp(benchmark_cmd[0], "fill_buf") == 0) { - /* Execute default fill_buf benchmark */ - span = strtoul(benchmark_cmd[1], NULL, 10); - memflush = atoi(benchmark_cmd[2]); - - if (run_fill_buf(span, memflush)) + if (param->fill_buf) { + if (run_fill_buf(param->fill_buf->buf_size, + param->fill_buf->memflush)) fprintf(stderr, "Error in running fill buffer\n"); - } else { + } else if (uparams->benchmark_cmd[0]) { /* Execute specified benchmark */ - ret = execvp(benchmark_cmd[0], benchmark_cmd); + ret = execvp(uparams->benchmark_cmd[0], (char **)uparams->benchmark_cmd); if (ret) ksft_perror("execvp"); } @@ -585,16 +586,15 @@ static void run_benchmark(int signum, siginfo_t *info, void *ucontext) * the benchmark * @test: test information structure * @uparams: user supplied parameters - * @benchmark_cmd: benchmark command and its arguments * @param: parameters passed to resctrl_val() * * Return: 0 when the test was run, < 0 on error. */ int resctrl_val(const struct resctrl_test *test, const struct user_params *uparams, - const char * const *benchmark_cmd, struct resctrl_val_param *param) { + struct benchmark_info benchmark_info; struct sigaction sigact; int ret = 0, pipefd[2]; char pipe_message = 0; @@ -610,6 +610,9 @@ int resctrl_val(const struct resctrl_test *test, return ret; }
+ benchmark_info.uparams = uparams; + benchmark_info.param = param; + /* * If benchmark wasn't successfully started by child, then child should * kill parent, so save parent's pid @@ -671,13 +674,7 @@ int resctrl_val(const struct resctrl_test *test,
ksft_print_msg("Benchmark PID: %d\n", (int)bm_pid);
- /* - * The cast removes constness but nothing mutates benchmark_cmd within - * the context of this process. At the receiving process, it becomes - * argv, which is mutable, on exec() but that's after fork() so it - * doesn't matter for the process running the tests. - */ - value.sival_ptr = (void *)benchmark_cmd; + value.sival_ptr = (void *)&benchmark_info;
/* Taskset benchmark to specified cpu */ ret = taskset_benchmark(bm_pid, uparams->cpu, NULL);
On Thu, 24 Oct 2024, Reinette Chatre wrote:
The benchmark used during the CMT, MBM, and MBA tests can be provided by the user via the "-b" parameter. If it is not provided, the default "fill_buf" benchmark is used. The user is additionally able to override any of the "fill_buf" default parameters when running the tests with "-b fill_buf <fill_buf parameters>".
The "fill_buf" parameters are managed as an array of strings. Using an array of strings is complex because it requires transformations to/from strings at every producer and consumer. This is made worse for the individual tests where the default benchmark parameter values may not be appropriate and additional data wrangling is required. For example, the CMT test duplicates the entire array of strings in order to replace one of the parameters.
More issues appear when combining the usage of an array of strings with the use case of the user overriding default parameters by specifying "-b fill_buf <parameters>". This use case is fragile: NULL pointers can exist in the array of strings and can trigger a SIGSEGV. For example, running the command below specifies that "fill_buf" should be used but leaves all of its parameters NULL:

  $ sudo resctrl_tests -t mbm -b fill_buf
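As an illustration of the hazard, the sketch below shows one way an optional string argument can be parsed without ever handing a NULL pointer to strtoul(). It is only a minimal sketch: parse_size_arg() and its fallback parameter are hypothetical names used for this example and are not part of the selftests.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not selftest code): parse an optional decimal size.
 * A NULL or empty @arg selects @fallback instead of being passed to
 * strtoul(), which must not be called with a NULL string.
 *
 * Returns true with *@out set on success, false if @arg is not a number.
 */
static bool parse_size_arg(const char *arg, size_t fallback, size_t *out)
{
	char *endptr = NULL;
	unsigned long val;

	/* Omitted parameter: use the default rather than dereference NULL. */
	if (!arg || *arg == '\0') {
		*out = fallback;
		return true;
	}

	errno = 0;
	val = strtoul(arg, &endptr, 10);
	/* Reject range errors and trailing garbage such as "12abc". */
	if (errno || *endptr != '\0')
		return false;

	*out = val;
	return true;
}
```

With a guard like this, the command above would select the fallback for every omitted parameter instead of crashing.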
Replace the "array of strings" parameters used for "fill_buf" with a new struct fill_buf_param that contains the "fill_buf" parameters, which can be used directly without transformations to/from strings. Two instances of struct fill_buf_param may exist at any point in time:
- If the user provides new parameters to "fill_buf", the user parameter structure (struct user_params) will point to a fully initialized and immutable struct fill_buf_param containing the user provided parameters.
- If "fill_buf" is the benchmark that should be used by a test, then the test parameter structure (struct resctrl_val_param) will point to a fully initialized struct fill_buf_param. The latter may contain (a) the user provided parameters verbatim, (b) user provided parameters adjusted to be appropriate for the test, or (c) the default parameters for "fill_buf" that are appropriate for the test if the user provided neither "fill_buf" parameters nor an alternate benchmark.
The existing behavior of the CMT test is to use a test defined value for the buffer size even if the user provides another value via the command line. This behavior is maintained since the test requires that the buffer size matches the size of the cache allocated, and the amount of cache allocated can instead be changed by the user with the "-n" command line parameter.
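The relationship between the two instances described above can be sketched as follows. This is a simplified illustration and not the code from the patch: pick_fill_buf() and its arguments are hypothetical, while struct fill_buf_param mirrors the definition added to resctrl.h.

```c
#include <stdbool.h>
#include <stddef.h>

/* Mirrors the struct added to resctrl.h by this patch. */
struct fill_buf_param {
	size_t buf_size;
	bool memflush;
};

/*
 * Hypothetical helper: decide which "fill_buf" parameters a test uses.
 * @user:           immutable user provided parameters, NULL if none
 * @benchmark_cmd0: first word of a user provided benchmark, NULL if none
 * @default_size:   buffer size the test considers appropriate
 * @storage:        test owned instance that receives the result
 *
 * Returns @storage when "fill_buf" should run, NULL when the user
 * supplied a replacement benchmark.
 */
static struct fill_buf_param *
pick_fill_buf(const struct fill_buf_param *user, const char *benchmark_cmd0,
	      size_t default_size, struct fill_buf_param *storage)
{
	if (user) {
		/* (a) copy verbatim; a test may adjust the copy afterwards (b). */
		*storage = *user;
		return storage;
	}

	if (!benchmark_cmd0) {
		/* (c) nothing from the user: test specific defaults. */
		storage->buf_size = default_size;
		storage->memflush = true;
		return storage;
	}

	/* User provided another benchmark: "fill_buf" is not used. */
	return NULL;
}
```

A test such as CMT would overwrite buf_size in its own copy after this step, which leaves the user provided instance untouched.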
Signed-off-by: Reinette Chatre reinette.chatre@intel.com
Thanks for the update.
Reviewed-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com
The CMT, MBA, and MBM tests rely on the resctrl_val() wrapper to start and run a benchmark while providing test specific flows via callbacks to do test specific configuration and measurements.
At a high level, the resctrl_val() flow is:
a) Start by fork()ing a child process that installs a signal handler for SIGUSR1 that, on receipt of SIGUSR1, will start running a benchmark.
b) Assign the child process created in (a) to the resctrl control and monitoring group that dictates the memory and cache allocations with which the process can run and will contain all resctrl monitoring data of that process.
c) Once parent and child are considered "ready" (determined via a message over a pipe) the parent signals the child (via SIGUSR1) to start the benchmark, waits one second for the benchmark to run, and then starts collecting monitoring data for the tests, potentially also changing allocation configuration depending on the various test callbacks.
A problem with the above flow is the "black box" view of the benchmark that is combined with an arbitrarily chosen "wait one second" before measurements start. No matter what the benchmark does, it is given one second to initialize before measurements start.
The default benchmark "fill_buf" consists of two parts: first it prepares a buffer (allocate, initialize, then flush), then it reads from the buffer (in unpredictable ways) until terminated. Depending on the system and the size of the buffer, the first "prepare" part may not be complete by the time the one second delay expires. Test measurements may thus start before the work needing to be measured runs.
Split the default benchmark into its "prepare" and "runtime" parts and simplify the resctrl_val() wrapper while doing so. This same split cannot be done for the user provided benchmark (without a user interface change), so the current behavior is maintained for user provided benchmark.
Assign the test itself to the control and monitoring group and run the "prepare" part of the benchmark in this context, ensuring it runs with required cache and memory bandwidth allocations. With the benchmark preparation complete, only the "runtime" part of the benchmark (or the entire user provided benchmark) needs to be fork()ed.
Keep the "wait one second" delay before measurements start. For the default "fill_buf" benchmark this time now covers only the "runtime" portion that needs to be measured. For the user provided benchmark this delay maintains current behavior.
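The prepare/runtime split can be pictured with the standalone sketch below. It is an assumption-laden illustration rather than the resctrl_val() implementation: run_prepared_reader() is a hypothetical function, the buffer is prepared with a plain memset() instead of alloc_buffer(), and a short nanosleep() stands in for the measurement window.

```c
#define _POSIX_C_SOURCE 200809L

#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical sketch: prepare a buffer in the parent, then fork a child
 * that only reads from it until it is killed.
 *
 * Returns 0 if the child ran and was terminated by SIGKILL, -1 otherwise.
 */
static int run_prepared_reader(size_t buf_size, long measure_msec)
{
	struct timespec window = {
		.tv_sec = measure_msec / 1000,
		.tv_nsec = (measure_msec % 1000) * 1000000L,
	};
	unsigned char *buf;
	int status;
	pid_t pid;

	/* "prepare" phase: done by the parent before any measurement. */
	buf = malloc(buf_size);
	if (!buf)
		return -1;
	memset(buf, 1, buf_size);

	pid = fork();
	if (pid == -1) {
		free(buf);
		return -1;
	}

	if (pid == 0) {
		/* "runtime" phase: read-only, so pages stay shared with the parent. */
		volatile unsigned char sink = 0;
		size_t i;

		for (;;)
			for (i = 0; i < buf_size; i += 64)
				sink += buf[i];
	}

	/* Parent: measurements would be taken during this window. */
	nanosleep(&window, NULL);

	kill(pid, SIGKILL);
	free(buf);
	if (waitpid(pid, &status, 0) != pid)
		return -1;

	return (WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL) ? 0 : -1;
}
```

Because the child only reads, the buffer inherited across fork() does not need to be copied, so the time spent in the window covers the reads alone.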
Signed-off-by: Reinette Chatre reinette.chatre@intel.com
Reviewed-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com
---
Changes since V2:
- Add Ilpo's Reviewed-by tag.

Changes since V1:
- Keep the fflush(stdout) before fork() to avoid duplicate messages. (Ilpo)
- Re-order in series so that the new behavior is introduced after issues with existing behavior are addressed.
---
 tools/testing/selftests/resctrl/fill_buf.c | 15 --
 tools/testing/selftests/resctrl/resctrl.h | 1 -
 tools/testing/selftests/resctrl/resctrl_val.c | 195 +++++------------
 3 files changed, 50 insertions(+), 161 deletions(-)
diff --git a/tools/testing/selftests/resctrl/fill_buf.c b/tools/testing/selftests/resctrl/fill_buf.c index 39545f9369e8..380cc35f10c6 100644 --- a/tools/testing/selftests/resctrl/fill_buf.c +++ b/tools/testing/selftests/resctrl/fill_buf.c @@ -129,18 +129,3 @@ unsigned char *alloc_buffer(size_t buf_size, bool memflush)
return buf; } - -int run_fill_buf(size_t buf_size, bool memflush) -{ - unsigned char *buf; - - buf = alloc_buffer(buf_size, memflush); - if (!buf) - return -1; - - fill_cache_read(buf, buf_size, false); - - free(buf); - - return 0; -} diff --git a/tools/testing/selftests/resctrl/resctrl.h b/tools/testing/selftests/resctrl/resctrl.h index c9336f9c2cae..032cd9ebd761 100644 --- a/tools/testing/selftests/resctrl/resctrl.h +++ b/tools/testing/selftests/resctrl/resctrl.h @@ -169,7 +169,6 @@ int perf_event_open(struct perf_event_attr *hw_event, pid_t pid, int cpu, unsigned char *alloc_buffer(size_t buf_size, bool memflush); void mem_flush(unsigned char *buf, size_t buf_size); void fill_cache_read(unsigned char *buf, size_t buf_size, bool once); -int run_fill_buf(size_t buf_size, bool memflush); int initialize_read_mem_bw_imc(void); int measure_read_mem_bw(const struct user_params *uparams, struct resctrl_val_param *param, pid_t bm_pid); diff --git a/tools/testing/selftests/resctrl/resctrl_val.c b/tools/testing/selftests/resctrl/resctrl_val.c index 00b3808d3bca..7c08e936572d 100644 --- a/tools/testing/selftests/resctrl/resctrl_val.c +++ b/tools/testing/selftests/resctrl/resctrl_val.c @@ -373,7 +373,7 @@ static int get_mem_bw_resctrl(FILE *fp, unsigned long *mbm_total) return 0; }
-static pid_t bm_pid, ppid; +static pid_t bm_pid;
void ctrlc_handler(int signum, siginfo_t *info, void *ptr) { @@ -431,13 +431,6 @@ void signal_handler_unregister(void) } }
-static void parent_exit(pid_t ppid) -{ - kill(ppid, SIGKILL); - umount_resctrlfs(); - exit(EXIT_FAILURE); -} - /* * print_results_bw: the memory bandwidth results are stored in a file * @filename: file that stores the results @@ -535,52 +528,6 @@ int measure_read_mem_bw(const struct user_params *uparams, return ret; }
-struct benchmark_info { - const struct user_params *uparams; - struct resctrl_val_param *param; -}; - -/* - * run_benchmark - Run a specified benchmark or fill_buf (default benchmark) - * in specified signal. Direct benchmark stdio to /dev/null. - * @signum: signal number - * @info: signal info - * @ucontext: user context in signal handling - */ -static void run_benchmark(int signum, siginfo_t *info, void *ucontext) -{ - struct benchmark_info *benchmark_info = info->si_ptr; - const struct user_params *uparams = benchmark_info->uparams; - struct resctrl_val_param *param = benchmark_info->param; - FILE *fp; - int ret; - - /* - * Direct stdio of child to /dev/null, so that only parent writes to - * stdio (console) - */ - fp = freopen("/dev/null", "w", stdout); - if (!fp) { - ksft_perror("Unable to direct benchmark status to /dev/null"); - parent_exit(ppid); - } - - if (param->fill_buf) { - if (run_fill_buf(param->fill_buf->buf_size, - param->fill_buf->memflush)) - fprintf(stderr, "Error in running fill buffer\n"); - } else if (uparams->benchmark_cmd[0]) { - /* Execute specified benchmark */ - ret = execvp(uparams->benchmark_cmd[0], (char **)uparams->benchmark_cmd); - if (ret) - ksft_perror("execvp"); - } - - fclose(stdout); - ksft_print_msg("Unable to run specified benchmark\n"); - parent_exit(ppid); -} - /* * resctrl_val: execute benchmark and measure memory bandwidth on * the benchmark @@ -594,12 +541,11 @@ int resctrl_val(const struct resctrl_test *test, const struct user_params *uparams, struct resctrl_val_param *param) { - struct benchmark_info benchmark_info; - struct sigaction sigact; - int ret = 0, pipefd[2]; - char pipe_message = 0; - union sigval value; + unsigned char *buf = NULL; + cpu_set_t old_affinity; int domain_id; + int ret = 0; + pid_t ppid;
if (strcmp(param->filename, "") == 0) sprintf(param->filename, "stdio"); @@ -610,108 +556,65 @@ int resctrl_val(const struct resctrl_test *test, return ret; }
- benchmark_info.uparams = uparams; - benchmark_info.param = param; - - /* - * If benchmark wasn't successfully started by child, then child should - * kill parent, so save parent's pid - */ ppid = getpid();
- if (pipe(pipefd)) { - ksft_perror("Unable to create pipe"); + /* Taskset test to specified CPU. */ + ret = taskset_benchmark(ppid, uparams->cpu, &old_affinity); + if (ret) + return ret; + + /* Write test to specified control & monitoring group in resctrl FS. */ + ret = write_bm_pid_to_resctrl(ppid, param->ctrlgrp, param->mongrp); + if (ret) + goto reset_affinity;
- return -1; + if (param->init) { + ret = param->init(param, domain_id); + if (ret) + goto reset_affinity; }
/* - * Fork to start benchmark, save child's pid so that it can be killed - * when needed + * If not running user provided benchmark, run the default + * "fill_buf". First phase of "fill_buf" is to prepare the + * buffer that the benchmark will operate on. No measurements + * are needed during this phase and prepared memory will be + * passed to next part of benchmark via copy-on-write thus + * no impact on the benchmark that relies on reading from + * memory only. */ + if (param->fill_buf) { + buf = alloc_buffer(param->fill_buf->buf_size, + param->fill_buf->memflush); + if (!buf) { + ret = -ENOMEM; + goto reset_affinity; + } + } + fflush(stdout); bm_pid = fork(); if (bm_pid == -1) { + ret = -errno; ksft_perror("Unable to fork"); - - return -1; + goto free_buf; }
+ /* + * What needs to be measured runs in separate process until + * terminated. + */ if (bm_pid == 0) { - /* - * Mask all signals except SIGUSR1, parent uses SIGUSR1 to - * start benchmark - */ - sigfillset(&sigact.sa_mask); - sigdelset(&sigact.sa_mask, SIGUSR1); - - sigact.sa_sigaction = run_benchmark; - sigact.sa_flags = SA_SIGINFO; - - /* Register for "SIGUSR1" signal from parent */ - if (sigaction(SIGUSR1, &sigact, NULL)) { - ksft_perror("Can't register child for signal"); - parent_exit(ppid); - } - - /* Tell parent that child is ready */ - close(pipefd[0]); - pipe_message = 1; - if (write(pipefd[1], &pipe_message, sizeof(pipe_message)) < - sizeof(pipe_message)) { - ksft_perror("Failed signaling parent process"); - close(pipefd[1]); - return -1; - } - close(pipefd[1]); - - /* Suspend child until delivery of "SIGUSR1" from parent */ - sigsuspend(&sigact.sa_mask); - - ksft_perror("Child is done"); - parent_exit(ppid); + if (param->fill_buf) + fill_cache_read(buf, param->fill_buf->buf_size, false); + else if (uparams->benchmark_cmd[0]) + execvp(uparams->benchmark_cmd[0], (char **)uparams->benchmark_cmd); + exit(EXIT_SUCCESS); }
ksft_print_msg("Benchmark PID: %d\n", (int)bm_pid);
- value.sival_ptr = (void *)&benchmark_info; - - /* Taskset benchmark to specified cpu */ - ret = taskset_benchmark(bm_pid, uparams->cpu, NULL); - if (ret) - goto out; - - /* Write benchmark to specified control&monitoring grp in resctrl FS */ - ret = write_bm_pid_to_resctrl(bm_pid, param->ctrlgrp, param->mongrp); - if (ret) - goto out; - - if (param->init) { - ret = param->init(param, domain_id); - if (ret) - goto out; - } - - /* Parent waits for child to be ready. */ - close(pipefd[1]); - while (pipe_message != 1) { - if (read(pipefd[0], &pipe_message, sizeof(pipe_message)) < - sizeof(pipe_message)) { - ksft_perror("Failed reading message from child process"); - close(pipefd[0]); - goto out; - } - } - close(pipefd[0]); - - /* Signal child to start benchmark */ - if (sigqueue(bm_pid, SIGUSR1, value) == -1) { - ksft_perror("sigqueue SIGUSR1 to child"); - ret = -1; - goto out; - } - - /* Give benchmark enough time to fully run */ + /* Give benchmark enough time to fully run. */ sleep(1);
/* Test runs until the callback setup() tells the test to stop. */ @@ -729,8 +632,10 @@ int resctrl_val(const struct resctrl_test *test, break; }
-out: kill(bm_pid, SIGKILL); - +free_buf: + free(buf); +reset_affinity: + taskset_restore(ppid, &old_affinity); return ret; }
By default the MBM and MBA tests use the "fill_buf" benchmark to read from a buffer with the goal of measuring the memory bandwidth generated by this buffer access.
Care should be taken when sizing the buffer used by the "fill_buf" benchmark. If the buffer is small enough to fit in the cache then it cannot be expected that the benchmark will generate much memory bandwidth. For example, on a system with 320MB L3 cache the existing hardcoded default of 250MB is insufficient.
Use the measured cache size to determine a buffer size that can be expected to trigger memory access. Keep the existing default, now renamed to MINIMUM_SPAN, as the minimum since it has been appropriate for testing so far.
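The sizing rule amounts to taking the larger of twice the cache size and MINIMUM_SPAN. The sketch below works through it under stated assumptions: fill_buf_size_for_cache() is a hypothetical stand-in for the sizing step of get_fill_buf_size() without the sysfs lookup, and MB is assumed to be 1024 * 1024.

```c
#include <stddef.h>

/* Assumed to match the selftest definitions. */
#define MB		(1024 * 1024)
#define MINIMUM_SPAN	(250 * MB)

/*
 * Hypothetical stand-in for the sizing step: use double the cache size so
 * the buffer cannot fit in the cache, but never less than MINIMUM_SPAN.
 */
static size_t fill_buf_size_for_cache(unsigned long cache_total_size)
{
	return cache_total_size * 2 > MINIMUM_SPAN ?
	       cache_total_size * 2 : MINIMUM_SPAN;
}
```

For the 320MB L3 example this gives a 640MB buffer, while a system with a 32MB L3 keeps the 250MB minimum.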
Signed-off-by: Reinette Chatre reinette.chatre@intel.com
Reviewed-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com
---
Changes since V3:
- Add Ilpo's Reviewed-by tag.

Changes since V2:
- Move duplicate code into helper. (Ilpo)
- Rename DEFAULT_SPAN to MINIMUM_SPAN to reflect its new purpose. (Ilpo)
- Do _not_ add Ilpo's Reviewed-by tag ... the patch changed too much.

Changes since V1:
- Ensure buffer is at least double L3 cache size. (Ilpo)
- Support user override of default buffer size. (Ilpo)
---
 tools/testing/selftests/resctrl/fill_buf.c | 13 +++++++++++++
 tools/testing/selftests/resctrl/mba_test.c | 7 ++++++-
 tools/testing/selftests/resctrl/mbm_test.c | 7 ++++++-
 tools/testing/selftests/resctrl/resctrl.h | 3 ++-
 tools/testing/selftests/resctrl/resctrl_tests.c | 2 +-
 5 files changed, 28 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/resctrl/fill_buf.c b/tools/testing/selftests/resctrl/fill_buf.c index 380cc35f10c6..19a01a52dc1a 100644 --- a/tools/testing/selftests/resctrl/fill_buf.c +++ b/tools/testing/selftests/resctrl/fill_buf.c @@ -129,3 +129,16 @@ unsigned char *alloc_buffer(size_t buf_size, bool memflush)
return buf; } + +ssize_t get_fill_buf_size(int cpu_no, const char *cache_type) +{ + unsigned long cache_total_size = 0; + int ret; + + ret = get_cache_size(cpu_no, cache_type, &cache_total_size); + if (ret) + return ret; + + return cache_total_size * 2 > MINIMUM_SPAN ? + cache_total_size * 2 : MINIMUM_SPAN; +} diff --git a/tools/testing/selftests/resctrl/mba_test.c b/tools/testing/selftests/resctrl/mba_test.c index 74d95c460bd0..bf37f3555660 100644 --- a/tools/testing/selftests/resctrl/mba_test.c +++ b/tools/testing/selftests/resctrl/mba_test.c @@ -182,7 +182,12 @@ static int mba_run_test(const struct resctrl_test *test, const struct user_param fill_buf.memflush = uparams->fill_buf->memflush; param.fill_buf = &fill_buf; } else if (!uparams->benchmark_cmd[0]) { - fill_buf.buf_size = DEFAULT_SPAN; + ssize_t buf_size; + + buf_size = get_fill_buf_size(uparams->cpu, "L3"); + if (buf_size < 0) + return buf_size; + fill_buf.buf_size = buf_size; fill_buf.memflush = true; param.fill_buf = &fill_buf; } diff --git a/tools/testing/selftests/resctrl/mbm_test.c b/tools/testing/selftests/resctrl/mbm_test.c index 72261413c868..4224f8ce3538 100644 --- a/tools/testing/selftests/resctrl/mbm_test.c +++ b/tools/testing/selftests/resctrl/mbm_test.c @@ -149,7 +149,12 @@ static int mbm_run_test(const struct resctrl_test *test, const struct user_param fill_buf.memflush = uparams->fill_buf->memflush; param.fill_buf = &fill_buf; } else if (!uparams->benchmark_cmd[0]) { - fill_buf.buf_size = DEFAULT_SPAN; + ssize_t buf_size; + + buf_size = get_fill_buf_size(uparams->cpu, "L3"); + if (buf_size < 0) + return buf_size; + fill_buf.buf_size = buf_size; fill_buf.memflush = true; param.fill_buf = &fill_buf; } diff --git a/tools/testing/selftests/resctrl/resctrl.h b/tools/testing/selftests/resctrl/resctrl.h index 032cd9ebd761..a553fe975938 100644 --- a/tools/testing/selftests/resctrl/resctrl.h +++ b/tools/testing/selftests/resctrl/resctrl.h @@ -41,7 +41,7 @@
#define BENCHMARK_ARGS 64
-#define DEFAULT_SPAN (250 * MB) +#define MINIMUM_SPAN (250 * MB)
/* * fill_buf_param: "fill_buf" benchmark parameters @@ -169,6 +169,7 @@ int perf_event_open(struct perf_event_attr *hw_event, pid_t pid, int cpu, unsigned char *alloc_buffer(size_t buf_size, bool memflush); void mem_flush(unsigned char *buf, size_t buf_size); void fill_cache_read(unsigned char *buf, size_t buf_size, bool once); +ssize_t get_fill_buf_size(int cpu_no, const char *cache_type); int initialize_read_mem_bw_imc(void); int measure_read_mem_bw(const struct user_params *uparams, struct resctrl_val_param *param, pid_t bm_pid); diff --git a/tools/testing/selftests/resctrl/resctrl_tests.c b/tools/testing/selftests/resctrl/resctrl_tests.c index 24daf76b4039..3335af815b21 100644 --- a/tools/testing/selftests/resctrl/resctrl_tests.c +++ b/tools/testing/selftests/resctrl/resctrl_tests.c @@ -189,7 +189,7 @@ static struct fill_buf_param *alloc_fill_buf_param(struct user_params *uparams) ksft_exit_skip("Unable to parse benchmark buffer size.\n"); } } else { - fill_param->buf_size = DEFAULT_SPAN; + fill_param->buf_size = MINIMUM_SPAN; }
if (uparams->benchmark_cmd[2] && *uparams->benchmark_cmd[2] != '\0') {
The MBA test incrementally throttles memory bandwidth, each time followed by a comparison between the memory bandwidth observed by the performance counters and resctrl respectively.
While a comparison between performance counters and resctrl is generally appropriate, they do not have an identical view of memory bandwidth. For example RAS features or memory performance features that generate memory traffic may drive accesses that are counted differently by performance counters and MBM respectively, for instance generating "overhead" traffic which is not counted against any specific RMID. As a ratio, this different view of memory bandwidth becomes more apparent at low memory bandwidths.
It is not practical to enable/disable the various features that may generate memory bandwidth to give performance counters and resctrl an identical view. Instead, do not compare performance counters and resctrl view of memory bandwidth when the memory bandwidth is low.
Bandwidth throttling behaves differently across platforms so it is not appropriate to drop measurement data simply based on the throttling level. Instead, use a threshold of 750MiB that has been observed to support adequate comparison between performance counters and resctrl.
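The effect of the threshold can be sketched as below. This is an illustrative helper only: bw_diff_percent() is a hypothetical name, and returning -1 stands in for the "continue" that the patch uses to drop a throttling level from the comparison.

```c
#include <stdlib.h>

/* Memory bandwidth (in MiB) below which the comparison is skipped. */
#define THROTTLE_THRESHOLD 750

/*
 * Hypothetical helper: percentage difference between the bandwidth seen by
 * the performance counters (@avg_bw_imc) and by resctrl (@avg_bw_resc).
 *
 * Returns -1 when either average is below the threshold, meaning the
 * results for this throttling level should be dropped.
 */
static int bw_diff_percent(unsigned long avg_bw_imc, unsigned long avg_bw_resc)
{
	float avg_diff;

	if (avg_bw_imc < THROTTLE_THRESHOLD || avg_bw_resc < THROTTLE_THRESHOLD)
		return -1;

	avg_diff = (float)labs((long)avg_bw_resc - (long)avg_bw_imc) / avg_bw_imc;

	return (int)(avg_diff * 100);
}
```

The check depends on the measured bandwidth, not on the throttling level, so it applies equally on platforms that throttle differently.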
Signed-off-by: Reinette Chatre reinette.chatre@intel.com
Reviewed-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com
---
Changes since V2:
- Add Ilpo's Reviewed-by tag.

Changes since V1:
- Fix code alignment and spacing.
- Modify flow to use "continue" instead of "break" now that earlier changes decrease throttling.
- Expand comment of define to elaborate causes of discrepancy between performance counters and MBM.
---
 tools/testing/selftests/resctrl/mba_test.c | 7 +++++++
 tools/testing/selftests/resctrl/resctrl.h | 10 ++++++++++
 2 files changed, 17 insertions(+)
diff --git a/tools/testing/selftests/resctrl/mba_test.c b/tools/testing/selftests/resctrl/mba_test.c index bf37f3555660..5b4f0aa7a3a4 100644 --- a/tools/testing/selftests/resctrl/mba_test.c +++ b/tools/testing/selftests/resctrl/mba_test.c @@ -98,6 +98,13 @@ static bool show_mba_info(unsigned long *bw_imc, unsigned long *bw_resc)
avg_bw_imc = sum_bw_imc / (NUM_OF_RUNS - 1); avg_bw_resc = sum_bw_resc / (NUM_OF_RUNS - 1); + if (avg_bw_imc < THROTTLE_THRESHOLD || avg_bw_resc < THROTTLE_THRESHOLD) { + ksft_print_msg("Bandwidth below threshold (%d MiB). Dropping results from MBA schemata %u.\n", + THROTTLE_THRESHOLD, + ALLOCATION_MIN + ALLOCATION_STEP * allocation); + continue; + } + avg_diff = (float)labs(avg_bw_resc - avg_bw_imc) / avg_bw_imc; avg_diff_per = (int)(avg_diff * 100);
diff --git a/tools/testing/selftests/resctrl/resctrl.h b/tools/testing/selftests/resctrl/resctrl.h index a553fe975938..dab1953fc7a0 100644 --- a/tools/testing/selftests/resctrl/resctrl.h +++ b/tools/testing/selftests/resctrl/resctrl.h @@ -43,6 +43,16 @@
#define MINIMUM_SPAN (250 * MB)
+/* + * Memory bandwidth (in MiB) below which the bandwidth comparisons + * between iMC and resctrl are considered unreliable. For example RAS + * features or memory performance features that generate memory traffic + * may drive accesses that are counted differently by performance counters + * and MBM respectively, for instance generating "overhead" traffic which + * is not counted against any specific RMID. + */ +#define THROTTLE_THRESHOLD 750 + /* * fill_buf_param: "fill_buf" benchmark parameters * @buf_size: Size (in bytes) of buffer used in benchmark.
The resctrl selftests drop the results from every first test run to avoid (per comment) "inaccurate due to monitoring setup transition phase" data. Previously, inaccurate data resulted both from workloads needing some time to "settle" and from the measurements themselves, which had to account for earlier measurements in order to measure across the needed time frame.
Commit da50de0a92f3 ("selftests/resctrl: Calculate resctrl FS derived mem bw over sleep(1) only") ensured that measurements accurately measure just the time frame of interest. The default "fill_buf" benchmark has since had its buffer prepare phase separated from its run phase, reducing the need for the tests themselves to accommodate the benchmark's "settle" time.
With these enhancements there are no remaining portions needing to "settle" and the first test run can contribute to measurements.
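The resulting averaging can be sketched as follows. This is a simplified illustration: avg_bw_for_allocation() is a hypothetical helper, and NUM_OF_RUNS is assumed to be 5 for the purpose of the example.

```c
/* Assumed number of runs per allocation level. */
#define NUM_OF_RUNS 5

/*
 * Hypothetical helper: average the bandwidth of all runs that belong to
 * one allocation level, including the first run that used to be dropped.
 */
static unsigned long avg_bw_for_allocation(const unsigned long *bw,
					   int allocation)
{
	unsigned long sum = 0;
	int runs;

	/* Start at the first run of this allocation, not the second. */
	for (runs = NUM_OF_RUNS * allocation;
	     runs < NUM_OF_RUNS * allocation + NUM_OF_RUNS; runs++)
		sum += bw[runs];

	/* Every run contributes, so divide by NUM_OF_RUNS. */
	return sum / NUM_OF_RUNS;
}
```

Compared with the earlier code, both the loop start and the divisor change together, so the divisor always equals the number of runs summed.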
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
---
Changes since V2:
- Separate patch addresses Ilpo's observation about magic number in adjacent code.
- Add Ilpo's Reviewed-by tag.
Changes since V1:
- Remove comment about results from the first run needing to be removed.
- Fix existing incorrect spacing while changing line.
---
 tools/testing/selftests/resctrl/cmt_test.c |  5 ++---
 tools/testing/selftests/resctrl/mba_test.c | 10 +++-------
 tools/testing/selftests/resctrl/mbm_test.c | 10 +++-------
 3 files changed, 8 insertions(+), 17 deletions(-)
diff --git a/tools/testing/selftests/resctrl/cmt_test.c b/tools/testing/selftests/resctrl/cmt_test.c
index 4c3cf2c25a38..3bbf3042fb06 100644
--- a/tools/testing/selftests/resctrl/cmt_test.c
+++ b/tools/testing/selftests/resctrl/cmt_test.c
@@ -99,14 +99,13 @@ static int check_results(struct resctrl_val_param *param, size_t span, int no_of
 		}

 		/* Field 3 is llc occ resc value */
-		if (runs > 0)
-			sum_llc_occu_resc += strtoul(token_array[3], NULL, 0);
+		sum_llc_occu_resc += strtoul(token_array[3], NULL, 0);
 		runs++;
 	}
 	fclose(fp);

 	return show_results_info(sum_llc_occu_resc, no_of_bits, span,
-				 MAX_DIFF, MAX_DIFF_PERCENT, runs - 1, true);
+				 MAX_DIFF, MAX_DIFF_PERCENT, runs, true);
 }

 static void cmt_test_cleanup(void)
diff --git a/tools/testing/selftests/resctrl/mba_test.c b/tools/testing/selftests/resctrl/mba_test.c
index 5b4f0aa7a3a4..4e6645b172e3 100644
--- a/tools/testing/selftests/resctrl/mba_test.c
+++ b/tools/testing/selftests/resctrl/mba_test.c
@@ -86,18 +86,14 @@ static bool show_mba_info(unsigned long *bw_imc, unsigned long *bw_resc)
 		int avg_diff_per;
 		float avg_diff;

-		/*
-		 * The first run is discarded due to inaccurate value from
-		 * phase transition.
-		 */
-		for (runs = NUM_OF_RUNS * allocation + 1;
+		for (runs = NUM_OF_RUNS * allocation;
 		     runs < NUM_OF_RUNS * allocation + NUM_OF_RUNS ; runs++) {
 			sum_bw_imc += bw_imc[runs];
 			sum_bw_resc += bw_resc[runs];
 		}

-		avg_bw_imc = sum_bw_imc / (NUM_OF_RUNS - 1);
-		avg_bw_resc = sum_bw_resc / (NUM_OF_RUNS - 1);
+		avg_bw_imc = sum_bw_imc / NUM_OF_RUNS;
+		avg_bw_resc = sum_bw_resc / NUM_OF_RUNS;
 		if (avg_bw_imc < THROTTLE_THRESHOLD || avg_bw_resc < THROTTLE_THRESHOLD) {
 			ksft_print_msg("Bandwidth below threshold (%d MiB). Dropping results from MBA schemata %u.\n",
 				       THROTTLE_THRESHOLD,
diff --git a/tools/testing/selftests/resctrl/mbm_test.c b/tools/testing/selftests/resctrl/mbm_test.c
index 4224f8ce3538..315b2ef3b3bc 100644
--- a/tools/testing/selftests/resctrl/mbm_test.c
+++ b/tools/testing/selftests/resctrl/mbm_test.c
@@ -22,17 +22,13 @@ show_bw_info(unsigned long *bw_imc, unsigned long *bw_resc, size_t span)
 	int runs, ret, avg_diff_per;
 	float avg_diff = 0;

-	/*
-	 * Discard the first value which is inaccurate due to monitoring setup
-	 * transition phase.
-	 */
-	for (runs = 1; runs < NUM_OF_RUNS ; runs++) {
+	for (runs = 0; runs < NUM_OF_RUNS; runs++) {
 		sum_bw_imc += bw_imc[runs];
 		sum_bw_resc += bw_resc[runs];
 	}

-	avg_bw_imc = sum_bw_imc / 4;
-	avg_bw_resc = sum_bw_resc / 4;
+	avg_bw_imc = sum_bw_imc / NUM_OF_RUNS;
+	avg_bw_resc = sum_bw_resc / NUM_OF_RUNS;
 	avg_diff = (float)labs(avg_bw_resc - avg_bw_imc) / avg_bw_imc;
 	avg_diff_per = (int)(avg_diff * 100);
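As a rough standalone sketch of the averaging after this change, every run, including the first, contributes to the sums and the divisor is NUM_OF_RUNS. The value 5 for NUM_OF_RUNS is inferred from the "/ 4" that the patch replaces. The function name avg_diff_percent() and the -1 return for a zero iMC average are assumptions of this example, not part of the selftests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NUM_OF_RUNS	5	/* inferred from the replaced "/ 4" */

/*
 * Hypothetical helper: percentage difference between the averaged iMC and
 * resctrl bandwidth over all NUM_OF_RUNS runs, first run included.
 * Returns -1 when the iMC average is zero (guard added for this sketch).
 */
static int avg_diff_percent(const unsigned long *bw_imc,
			    const unsigned long *bw_resc)
{
	unsigned long sum_bw_imc = 0, sum_bw_resc = 0;
	long avg_bw_imc, avg_bw_resc;
	float avg_diff;
	int runs;

	assert(bw_imc != NULL && bw_resc != NULL);

	for (runs = 0; runs < NUM_OF_RUNS; runs++) {
		sum_bw_imc += bw_imc[runs];
		sum_bw_resc += bw_resc[runs];
	}

	avg_bw_imc = (long)(sum_bw_imc / NUM_OF_RUNS);
	avg_bw_resc = (long)(sum_bw_resc / NUM_OF_RUNS);
	if (avg_bw_imc == 0)
		return -1;

	avg_diff = (float)labs(avg_bw_resc - avg_bw_imc) / avg_bw_imc;
	return (int)(avg_diff * 100);
}
```

Because the loop starts at index 0, an outlier in the first run now shifts the average, which is only acceptable because the earlier patches in the series removed the sources of first-run inaccuracy.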
The Memory Bandwidth Allocation (MBA) test iterates through all possible MBA allocations, from 10% (ALLOCATION_MIN) to 100% (ALLOCATION_MAX) with increments of 10% (ALLOCATION_STEP) at each iteration. During each iteration the test measures the actual memory bandwidth NUM_OF_RUNS times to determine the impact of MBA on actual memory bandwidth.
After the MBA test completes, all the memory bandwidth measurements are parsed into two arrays: one for resctrl Memory Bandwidth Monitoring (MBM) measurements and one for the Integrated Memory Controller (iMC) measurements. Each array has a hardcoded size of 1024 that is large enough to hold the current test data, but this hardcoded value makes the implementation difficult to understand. It will not be clear that the array size needs to be reconsidered if any of the test parameters are changed.
Replace the magic constant used as array size with the test parameters the array size depends on.
Reported-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Closes: https://lore.kernel.org/all/45af2a8c-517d-8f0d-137d-ad0f3f6a3c68@linux.intel...
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
---
Changes since V3:
- Add Ilpo's Reviewed-by tag.
Changes since V2:
- New patch.
---
 tools/testing/selftests/resctrl/mba_test.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/resctrl/mba_test.c b/tools/testing/selftests/resctrl/mba_test.c
index 4e6645b172e3..536d9089d2f6 100644
--- a/tools/testing/selftests/resctrl/mba_test.c
+++ b/tools/testing/selftests/resctrl/mba_test.c
@@ -127,8 +127,9 @@ static bool show_mba_info(unsigned long *bw_imc, unsigned long *bw_resc)

 static int check_results(void)
 {
+	unsigned long bw_resc[NUM_OF_RUNS * ALLOCATION_MAX / ALLOCATION_STEP];
+	unsigned long bw_imc[NUM_OF_RUNS * ALLOCATION_MAX / ALLOCATION_STEP];
 	char *token_array[8], output[] = RESULT_FILE_NAME, temp[512];
-	unsigned long bw_imc[1024], bw_resc[1024];
 	int runs;
 	FILE *fp;
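To make the sizing arithmetic concrete, here is a small sketch under the assumption that NUM_OF_RUNS is 5 and that the allocation parameters are 10/100/10 as described in the commit message. The macro MBA_NUM_RESULTS and the helper mba_result_index() are hypothetical names introduced only for this example.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_OF_RUNS	5	/* assumed value */
#define ALLOCATION_MAX	100
#define ALLOCATION_MIN	10
#define ALLOCATION_STEP	10

/* Number of measurements stored per array: runs per step times steps. */
#define MBA_NUM_RESULTS	(NUM_OF_RUNS * ALLOCATION_MAX / ALLOCATION_STEP)

/* With the assumed parameters this is 50, far below the old 1024. */
_Static_assert(MBA_NUM_RESULTS == 50, "unexpected MBA result count");

/*
 * Hypothetical helper: flat array index of a given run within a given
 * zero-based allocation step, matching the loop bounds in show_mba_info().
 */
static size_t mba_result_index(unsigned int allocation, unsigned int run)
{
	assert(allocation < ALLOCATION_MAX / ALLOCATION_STEP);
	assert(run < NUM_OF_RUNS);
	return (size_t)NUM_OF_RUNS * allocation + run;
}
```

Deriving the size from the parameters means that changing NUM_OF_RUNS or ALLOCATION_STEP resizes the arrays automatically, which is the point the commit message makes about the hardcoded 1024.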
On 10/24/24 15:18, Reinette Chatre wrote:
Changes since V3:
- V3: https://lore.kernel.org/all/cover.1729218182.git.reinette.chatre@intel.com/
- Rebased on HEAD 2a027d6bb660 of kselftest/next.
- Fix empty string parsing issues pointed out by Ilpo.
- Add Reviewed-by tags.
- Please see individual patches for detailed changes.
Changes since V2:
- V2: https://lore.kernel.org/all/cover.1726164080.git.reinette.chatre@intel.com/
- Add fix to protect against buffer overflow when parsing text from sysfs files.
- Add cleanup patch to address use of magic constants as pointed out by Ilpo.
- Add Reviewed-by tags where received, except for "selftests/resctrl: Use cache size to determine "fill_buf" buffer size" that changed too much since receiving the Reviewed-by tag.
- Please see individual patches for detailed changes.
Changes since V1:
- V1: https://lore.kernel.org/cover.1724970211.git.reinette.chatre@intel.com/
- V2 contains the same general solutions to stated problem as V1 but these are now preceded by more fixes (patches 1 to 5) and improved robustness (patches 6 to 9) to existing tests before the series gets back to solving the original problem with more confidence in patches 10 to 13.
- The possibility of making "memflush = false" for CMT test was discussed during V1. Modifying this setting does not have a significant impact on the observed results that are already well within acceptable range and this version thus keeps original default. If performance was a goal it may be possible to do further experimentation where "memflush = false" could eliminate the need for the sleep(1) within the test wrapper, but improving the performance is not a goal of this work.
- (New) Support what seems to be unintended ability for user space to provide parameters to "fill_buf" by making the parsing robust and only support changing parameters that are supported to be changed. Drop support for "write" operation since it has never been measured.
- (New) Improve wraparound handling. (Ilpo)
- (New) A couple of new fixes addressing issues discovered during development.
- (Change from V1) To support fill_buf parameters provided by user space as well as test specific fill_buf parameters struct fill_buf_param is no longer just a member of struct resctrl_val_param, instead there could be at most two instances of struct fill_buf_param, the immutable parameters provided by user space and the parameters used by individual tests. (Ilpo)
- Please see individual patches for detailed changes.
V1 cover:
The resctrl selftests for Memory Bandwidth Allocation (MBA) and Memory Bandwidth Monitoring (MBM) are failing on some (for example [1]) Emerald Rapids systems. The test failures result from the following two properties of these systems:
- Emerald Rapids systems can have up to 320MB L3 cache. The resctrl MBA and MBM selftests measure memory traffic for which a hardcoded 250MB buffer has been sufficient so far. On platforms with L3 cache larger than the buffer, the buffer fits in the L3 cache and thus no/very little memory traffic is generated during the "memory bandwidth" tests.
- Some platform features, for example RAS features or memory performance features that generate memory traffic may drive accesses that are counted differently by performance counters and MBM respectively, for instance generating "overhead" traffic which is not counted against any specific RMID. Until now these counting differences have always been "in the noise". On Emerald Rapids systems the maximum MBA throttling (10% memory bandwidth) throttles memory bandwidth to where memory accesses by these other platform features push the memory bandwidth difference between memory controller performance counters and resctrl (MBM) beyond the tests' hardcoded tolerance.
Make the tests more robust against platform variations:
- Let the buffer used by memory bandwidth tests be guided by the size of the L3 cache.
- Larger buffers require longer initialization time before the buffer can be used for measurement. Rework the tests to ensure that buffer initialization is complete before measurements start.
- Do not compare performance counters and MBM measurements at low bandwidth. The value of "low" is hardcoded to 750MiB based on measurements on Emerald Rapids, Sapphire Rapids, and Ice Lake systems. This limit is not applicable to AMD systems since it only applies to the MBA and MBM tests that are isolated to Intel.
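The first item above can be pictured with the following sketch. The cover letter only states that the buffer size is guided by the L3 cache size, so the factor of two used here is an assumption of this example, as are the helper name pick_buf_size() and the definition of MB. MINIMUM_SPAN is the 250MB value that appears in resctrl.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MB		(1024 * 1024)	/* assumed definition */
#define MINIMUM_SPAN	(250 * MB)

/*
 * Hypothetical helper: choose a "fill_buf" buffer size that cannot fit in
 * an L3 cache of l3_bytes. The factor of two is an assumption made for
 * this sketch; the result is never smaller than MINIMUM_SPAN.
 */
static size_t pick_buf_size(size_t l3_bytes)
{
	size_t want;

	assert(l3_bytes <= SIZE_MAX / 2);
	want = l3_bytes * 2;

	return want > MINIMUM_SPAN ? want : MINIMUM_SPAN;
}
```

Under these assumptions a 320MB L3 cache yields a 640MB buffer, while a system with a small L3 cache keeps the previous 250MB buffer.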
[1] https://ark.intel.com/content/www/us/en/ark/products/237261/intel-xeon-plati...
Reinette Chatre (15):
  selftests/resctrl: Make functions only used in same file static
  selftests/resctrl: Print accurate buffer size as part of MBM results
  selftests/resctrl: Fix memory overflow due to unhandled wraparound
  selftests/resctrl: Protect against array overrun during iMC config parsing
  selftests/resctrl: Protect against array overflow when reading strings
  selftests/resctrl: Make wraparound handling obvious
  selftests/resctrl: Remove "once" parameter required to be false
  selftests/resctrl: Only support measured read operation
  selftests/resctrl: Remove unused measurement code
  selftests/resctrl: Make benchmark parameter passing robust
  selftests/resctrl: Ensure measurements skip initialization of default benchmark
  selftests/resctrl: Use cache size to determine "fill_buf" buffer size
  selftests/resctrl: Do not compare performance counters and resctrl at low bandwidth
  selftests/resctrl: Keep results from first test run
  selftests/resctrl: Replace magic constants used as array size
 tools/testing/selftests/resctrl/cmt_test.c    |  37 +-
 tools/testing/selftests/resctrl/fill_buf.c    |  45 +-
 tools/testing/selftests/resctrl/mba_test.c    |  54 ++-
 tools/testing/selftests/resctrl/mbm_test.c    |  37 +-
 tools/testing/selftests/resctrl/resctrl.h     |  79 +++-
 .../testing/selftests/resctrl/resctrl_tests.c |  95 +++-
 tools/testing/selftests/resctrl/resctrl_val.c | 447 +++++------------------
 tools/testing/selftests/resctrl/resctrlfs.c   |  19 +-
 8 files changed, 354 insertions(+), 459 deletions(-)
base-commit: 2a027d6bb66002c8e50e974676f932b33c5fce10
Is this patch series ready to be applied?
thanks, -- Shuah
Hi Shuah,
On 10/24/24 3:36 PM, Shuah Khan wrote:
> Is this patch series ready to be applied?
I believe it is close ... I would like to give Ilpo some time to peek at patches 2 and 10 to confirm if I got their fixes right this time. The rest of the series is ready.
Thank you
Reinette
On Thu, 24 Oct 2024, Reinette Chatre wrote:
> Hi Shuah,
> On 10/24/24 3:36 PM, Shuah Khan wrote:
>> Is this patch series ready to be applied?
> I believe it is close ... I would like to give Ilpo some time to peek at patches 2 and 10 to confirm if I got their fixes right this time. The rest of the series is ready.
Hi,
I took a look at those two patches now and they seemed fine to me so this series should be ready to go now.
On 10/25/24 6:54 AM, Ilpo Järvinen wrote:
> On Thu, 24 Oct 2024, Reinette Chatre wrote:
>> Hi Shuah,
>> On 10/24/24 3:36 PM, Shuah Khan wrote:
>>> Is this patch series ready to be applied?
>> I believe it is close ... I would like to give Ilpo some time to peek at patches 2 and 10 to confirm if I got their fixes right this time. The rest of the series is ready.
> Hi,
> I took a look at those two patches now and they seemed fine to me so this series should be ready to go now.
Thank you very much Ilpo.
Reinette
On 11/4/24 15:16, Reinette Chatre wrote:
> Hi Shuah,
> On 10/24/24 3:36 PM, Shuah Khan wrote:
>> On 10/24/24 15:18, Reinette Chatre wrote:
>> Is this patch series ready to be applied?
> It is now ready after receiving anticipated tags. Could you please consider it for inclusion?
yes. I will apply the series for the next release.
thanks, -- Shuah
On 11/4/24 2:28 PM, Shuah Khan wrote:
> On 11/4/24 15:16, Reinette Chatre wrote:
>> Hi Shuah,
>> On 10/24/24 3:36 PM, Shuah Khan wrote:
>>> On 10/24/24 15:18, Reinette Chatre wrote:
>>> Is this patch series ready to be applied?
>> It is now ready after receiving anticipated tags. Could you please consider it for inclusion?
> yes. I will apply the series for the next release.
Thank you very much Shuah.
Reinette
On 11/4/24 16:14, Reinette Chatre wrote:
> On 11/4/24 2:28 PM, Shuah Khan wrote:
>> On 11/4/24 15:16, Reinette Chatre wrote:
>>> Hi Shuah,
>>> On 10/24/24 3:36 PM, Shuah Khan wrote:
>>>> On 10/24/24 15:18, Reinette Chatre wrote:
>>>> Is this patch series ready to be applied?
>>> It is now ready after receiving anticipated tags. Could you please consider it for inclusion?
>> yes. I will apply the series for the next release.
> Thank you very much Shuah.
> Reinette
Applied to linux-kselftest next for Linux 6.13-rc1.
Tested on my system and worked fine.
thanks, -- Shuah
On 11/4/24 4:07 PM, Shuah Khan wrote:
> On 11/4/24 16:14, Reinette Chatre wrote:
>> On 11/4/24 2:28 PM, Shuah Khan wrote:
>>> On 11/4/24 15:16, Reinette Chatre wrote:
>>>> Hi Shuah,
>>>> On 10/24/24 3:36 PM, Shuah Khan wrote:
>>>>> On 10/24/24 15:18, Reinette Chatre wrote:
>>>>> Is this patch series ready to be applied?
>>>> It is now ready after receiving anticipated tags. Could you please consider it for inclusion?
>>> yes. I will apply the series for the next release.
>> Thank you very much Shuah.
>> Reinette
> Applied to linux-kselftest next for Linux 6.13-rc1.
> Tested on my system and worked fine.
Thank you very much Shuah. I received automated emails from patchwork-bot+linux-kselftest@kernel.org indicating that versions 1, 2, and 3 were merged (no automated email about V4), but when looking at the kselftest next branch it is clear that the latest version, V4, was indeed merged and all looks good.
Thank you.
Reinette