[PATCH V1 00/13] Miscellaneous fixes for resctrl selftests

Sai Praneeth Prakhya

7 Mar 2020

3:40 a.m.

This patch set has several miscellaneous fixes to the resctrl selftest tool. Some fixes are minor in nature while others are major.

The minor fixes are:
1. Typos, comment format
2. Fix MBA feature detection
3. Fix a bug while selecting sibling cpu
4. Remove unnecessary use of variable arguments
5. Change MBM/MBA results reporting format from absolute values to percentage

The major fixes change the CAT and CQM test cases. The CAT test wasn't really testing CAT because it didn't use the cache allocated to it, so change the test case to cover the noisy neighbor use case. CAT guarantees a user-specified amount of cache for a process or a group of processes, hence test this use case. The updated test case checks whether a critical process is impacted by a noisy neighbor. If it is impacted, the test fails.

The present CQM test assumes that all the allocated memory (size less than LLC size) for a process will fit into cache and that there won't be any overlap. While this is mostly true, it cannot *always* be true because of how a cache works, i.e. two addresses could index into the same cache line. Hence, change the CQM test so that it now uses CAT: allocate a specific amount of cache using CAT and check whether CQM reports more than what CAT has allocated.

Fenghua Yu (1):
  selftests/resctrl: Fix missing options "-n" and "-p"

Reinette Chatre (4):
  selftests/resctrl: Fix feature detection
  selftests/resctrl: Fix typo
  selftests/resctrl: Fix typo in help text
  selftests/resctrl: Ensure sibling CPU is not same as original CPU

Sai Praneeth Prakhya (8):
  selftests/resctrl: Fix MBA/MBM results reporting format
  selftests/resctrl: Don't use variable argument list for setup function
  selftests/resctrl: Fix typos
  selftests/resctrl: Modularize fill_buf for new CAT test case
  selftests/resctrl: Change Cache Allocation Technology (CAT) test
  selftests/resctrl: Change Cache Quality Monitoring (CQM) test
  selftests/resctrl: Dynamically select buffer size for CAT test
  selftests/resctrl: Cleanup fill_buff after changing CAT test

 tools/testing/selftests/resctrl/cache.c         | 179 ++++++++-----
 tools/testing/selftests/resctrl/cat_test.c      | 322 +++++++++++++-----------
 tools/testing/selftests/resctrl/cqm_test.c      | 210 +++++++++-------
 tools/testing/selftests/resctrl/fill_buf.c      | 113 ++++++---
 tools/testing/selftests/resctrl/mba_test.c      |  32 ++-
 tools/testing/selftests/resctrl/mbm_test.c      |  33 ++-
 tools/testing/selftests/resctrl/resctrl.h       |  19 +-
 tools/testing/selftests/resctrl/resctrl_tests.c |  26 +-
 tools/testing/selftests/resctrl/resctrl_val.c   |  22 +-
 tools/testing/selftests/resctrl/resctrlfs.c     |  52 +++-
 10 files changed, 592 insertions(+), 416 deletions(-)

-- 2.7.4

Sai Praneeth Prakhya

7 Mar

3:40 a.m.

New subject: [PATCH V1 01/13] selftests/resctrl: Fix feature detection

From: Reinette Chatre reinette.chatre@intel.com

The intention of the resctrl selftests is to only run the tests associated with the feature(s) supported by the platform. Through parsing of the feature flags found in /proc/cpuinfo it is possible to learn which features are supported by the platform.

There are currently two issues with the platform feature detection that together result in tests always being run, whether the platform supports a feature or not. First, the parsing of the feature flags loads the line containing the flags in a buffer that is too small (256 bytes) to always contain all flags. The consequence is that the flags of the features being tested for may not be present in the buffer. Second, the actual test for presence of a feature has an error in the logic, negating the test for a particular feature flag instead of testing for the presence of a particular feature flag.

These two issues combined result in all tests being run on all platforms, whether the feature is supported or not.

Fix these issues by (1) increasing the buffer size being used to parse the feature flags, and (2) changing the logic to test for presence of the feature being tested for.

Signed-off-by: Reinette Chatre reinette.chatre@intel.com
Signed-off-by: Sai Praneeth Prakhya sai.praneeth.prakhya@intel.com
---
 tools/testing/selftests/resctrl/resctrlfs.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/resctrl/resctrlfs.c b/tools/testing/selftests/resctrl/resctrlfs.c
index 19c0ec4045a4..226dd7fdcfb1 100644
--- a/tools/testing/selftests/resctrl/resctrlfs.c
+++ b/tools/testing/selftests/resctrl/resctrlfs.c
@@ -596,11 +596,11 @@ bool check_resctrlfs_support(void)
 
 char *fgrep(FILE *inf, const char *str)
 {
-	char line[256];
 	int slen = strlen(str);
+	char line[2048];
 
 	while (!feof(inf)) {
-		if (!fgets(line, 256, inf))
+		if (!fgets(line, 2048, inf))
 			break;
 		if (strncmp(line, str, slen))
 			continue;
@@ -631,7 +631,7 @@ bool validate_resctrl_feature_request(char *resctrl_val)
 	if (res) {
 		char *s = strchr(res, ':');
 
-		found = s && !strstr(s, resctrl_val);
+		found = s && strstr(s, resctrl_val);
 		free(res);
 	}
 	fclose(inf);

-- 2.7.4
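
The two bugs described in the commit message can be looked at in isolation. What follows is only a minimal sketch, not the selftest code: flags_line_has() and line_was_truncated() are hypothetical helpers that operate on a line that has already been read, and they are named here purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the selftests): report whether a
 * "flags : ..." style line mentions the given feature. Like the patched
 * code this is a plain substring search, so a short name also matches a
 * longer flag that merely contains it.
 */
static bool flags_line_has(const char *line, const char *feature)
{
	const char *s;

	assert(line && feature);

	/* Only the text after the ':' separator is of interest. */
	s = strchr(line, ':');

	/* The pre-patch logic was "s && !strstr(...)", i.e. inverted. */
	return s && strstr(s, feature) != NULL;
}

/*
 * Hypothetical helper: true if fgets() filled a buffer of 'bufsize'
 * bytes without reaching the end of the line, which means that any
 * flags further along the line were cut off.
 */
static bool line_was_truncated(const char *buf, size_t bufsize)
{
	size_t len = strlen(buf);

	return len > 0 && len == bufsize - 1 && buf[len - 1] != '\n';
}
```

A caller could use line_was_truncated() after each fgets() to notice that even the enlarged buffer was too small, instead of silently missing a flag.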

Reinette Chatre

9 Mar

9:44 p.m.

New subject: [PATCH V1 01/13] selftests/resctrl: Fix feature detection

Hi Sai,

On 3/6/2020 7:40 PM, Sai Praneeth Prakhya wrote:

...

Please note that this is only a partial fix. The current feature detection relies on the feature flags found in /proc/cpuinfo. Quirks and kernel boot parameters are not taken into account. This fix only addresses the parsing of feature flags. If a feature has been disabled via kernel boot parameter or quirk then the resctrl tests would still attempt to run the test for it.

Reinette

Prakhya, Sai Praneeth

10:22 p.m.

New subject: [PATCH V1 01/13] selftests/resctrl: Fix feature detection

Hi Reinette,

...

-----Original Message-----
From: Reinette Chatre reinette.chatre@intel.com
Sent: Monday, March 9, 2020 2:45 PM
To: Prakhya, Sai Praneeth sai.praneeth.prakhya@intel.com; shuah@kernel.org; linux-kselftest@vger.kernel.org
Cc: tglx@linutronix.de; mingo@redhat.com; bp@alien8.de; Luck, Tony tony.luck@intel.com; babu.moger@amd.com; james.morse@arm.com; Shankar, Ravi V ravi.v.shankar@intel.com; Yu, Fenghua fenghua.yu@intel.com; x86@kernel.org; linux-kernel@vger.kernel.org
Subject: Re: [PATCH V1 01/13] selftests/resctrl: Fix feature detection

> Please note that this is only a partial fix. The current feature detection relies on the feature flags found in /proc/cpuinfo. Quirks and kernel boot parameters are not taken into account. This fix only addresses the parsing of feature flags. If a feature has been disabled via kernel boot parameter or quirk then the resctrl tests would still attempt to run the test for it.

That's a good point and makes sense to me. I think we could fix it in two ways:
1. grep for strings in dmesg, but that will still leave ambiguity in deciding between mbm and cqm because the kernel prints "resctrl: L3 monitoring detected" for both features
2. Check in the "info" directory
   a. For cat_l3, we could search for info/L3
   b. For mba, we could search for info/MB
   c. For cqm and mbm, we could search for the specified string in info/L3_MON/mon_features

I think option 2 might be better because it can handle all cases, please let me know what you think.

Regards, Sai
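
Option 2 could look roughly like the sketch below. It is only an illustration under stated assumptions: file_lists() and info_dir_supports() are hypothetical helpers, the root directory is passed in rather than hard-coded, and the mon_features entry names ("llc_occupancy", "mbm_total_bytes", "mbm_local_bytes") are assumed, not taken from this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: true if some line of 'path' is exactly 'name'. */
static bool file_lists(const char *path, const char *name)
{
	char line[256];
	bool found = false;
	FILE *fp;

	assert(path && name);

	fp = fopen(path, "r");
	if (!fp)
		return false;

	while (fgets(line, sizeof(line), fp)) {
		line[strcspn(line, "\n")] = '\0';	/* strip the newline */
		if (strcmp(line, name) == 0) {
			found = true;
			break;
		}
	}
	fclose(fp);

	return found;
}

/*
 * Hypothetical helper: decide whether the test named 'test' ("cat",
 * "mba", "cqm" or "mbm") is supported, looking only at the "info"
 * directory below 'root' (for example the resctrl mount point).
 */
static bool info_dir_supports(const char *root, const char *test)
{
	char path[512];

	assert(root && test);

	if (strcmp(test, "cat") == 0) {
		snprintf(path, sizeof(path), "%s/info/L3", root);
		return access(path, F_OK) == 0;
	}
	if (strcmp(test, "mba") == 0) {
		snprintf(path, sizeof(path), "%s/info/MB", root);
		return access(path, F_OK) == 0;
	}

	snprintf(path, sizeof(path), "%s/info/L3_MON/mon_features", root);
	if (strcmp(test, "cqm") == 0)
		return file_lists(path, "llc_occupancy");	/* assumed name */
	if (strcmp(test, "mbm") == 0)
		return file_lists(path, "mbm_total_bytes") ||	/* assumed names */
		       file_lists(path, "mbm_local_bytes");

	return false;
}
```

Because the check only reads what the mounted file system exposes, it would also reflect features turned off by a quirk or a boot parameter, which is the gap pointed out in the review.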

Reinette Chatre

10:33 p.m.

New subject: [PATCH V1 01/13] selftests/resctrl: Fix feature detection

Hi Sai,

On 3/9/2020 3:22 PM, Prakhya, Sai Praneeth wrote:

> That's a good point and makes sense to me. I think we could fix it in two ways
>
> 1. grep for strings in dmesg but that will still leave ambiguity in deciding b/w mbm and cqm because kernel prints "resctrl: L3 monitoring detected" for both the features
> 2. Check in "info" directory
>    a. For cat_l3, we could search for info/L3
>    b. For mba, we could search for info/MB
>    c. For cqm and mbm, we could search for specified string in info/L3_MON/mon_features
>
> I think option 2 might be better because it can handle all cases, please let me know what you think.

I agree. For the reasons you mention and also that (1) may not be possible if the loglevel prevents those lines from being printed.

Reinette

Prakhya, Sai Praneeth

10:51 p.m.

New subject: [PATCH V1 01/13] selftests/resctrl: Fix feature detection

Hi Reinette,

...

-----Original Message-----
From: Reinette Chatre reinette.chatre@intel.com
Sent: Monday, March 9, 2020 3:34 PM

> I agree. For the reasons you mention and also that (1) may not be possible if the loglevel prevents those lines from being printed.

Makes sense. I will work on the fix.

Regards, Sai

Reinette Chatre

11 Mar

6:06 p.m.

New subject: [PATCH V1 01/13] selftests/resctrl: Fix feature detection

Hi Sai,

On 3/9/2020 3:51 PM, Prakhya, Sai Praneeth wrote:

> Makes sense. I will work on the fix.

One more note about this ... from what I can tell the test for a feature currently fails if the platform does not support the feature. Would it be possible to just skip the test in this case instead?

Reinette

Sai Praneeth Prakhya

6:22 p.m.

New subject: [PATCH V1 01/13] selftests/resctrl: Fix feature detection

Hi Reinette,

On Wed, 2020-03-11 at 11:06 -0700, Reinette Chatre wrote:

> One more note about this ... from what I can tell the test for a feature currently fails if the platform does not support the feature. Would it be possible to just skip the test in this case instead?

That's because the output of the test should be just "ok" or "not ok".

I can change it to something like "# Skip <test_name> because platform doesn't support the feature", but not really sure if it complies with TAP 13 protocol.

Regards, Sai

Reinette Chatre

6:45 p.m.

New subject: [PATCH V1 01/13] selftests/resctrl: Fix feature detection

Hi Sai,

On 3/11/2020 11:22 AM, Sai Praneeth Prakhya wrote:

> That's because the output of the test should be just "ok" or "not ok".

The output could be something like:

ok MBA # SKIP MBA is not supported

> I can change it to something like "# Skip <test_name> because platform doesn't support the feature", but not really sure if it complies with TAP 13 protocol.

Please consider the "skip" directive at https://testanything.org/tap-version-13-specification.html

Reinette
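
The suggested output can be produced with a small formatting helper. This is a sketch only, assuming a hypothetical tap_line() function and tap_status enum that do not exist in the selftests; it follows the shape of the example line quoted in this thread, where a skipped test is reported as "ok" followed by a "# SKIP" directive and a reason.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical result states for one test. */
enum tap_status { TAP_PASS, TAP_FAIL, TAP_SKIP };

/*
 * Hypothetical helper: format one result line into 'buf'. A skipped
 * test is reported as "ok" with a "# SKIP" directive, so a TAP consumer
 * does not count it as a failure. Returns what snprintf() returns.
 */
static int tap_line(char *buf, size_t len, enum tap_status st,
		    const char *name, const char *reason)
{
	assert(buf && name);

	if (st == TAP_SKIP) {
		if (reason && strlen(reason) > 0)
			return snprintf(buf, len, "ok %s # SKIP %s",
					name, reason);
		return snprintf(buf, len, "ok %s # SKIP", name);
	}

	return snprintf(buf, len, "%sok %s",
			st == TAP_FAIL ? "not " : "", name);
}
```

Calling tap_line(buf, sizeof(buf), TAP_SKIP, "MBA", "MBA is not supported") yields the line proposed above.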

Sai Praneeth Prakhya

6:54 p.m.

New subject: [PATCH V1 01/13] selftests/resctrl: Fix feature detection

Hi Reinette,

On Wed, 2020-03-11 at 11:45 -0700, Reinette Chatre wrote:

> The output could be something like:
>
> ok MBA # SKIP MBA is not supported

Makes sense.. I will fix it.

> Please consider the "skip" directive at https://testanything.org/tap-version-13-specification.html

Sure! thanks for the link :)

Regards, Sai

Sai Praneeth Prakhya

7 Mar

3:40 a.m.

New subject: [PATCH V1 02/13] selftests/resctrl: Fix typo

From: Reinette Chatre reinette.chatre@intel.com

The format "%sok" is used to print results of a test. If the test passes, the empty string is printed and if the test fails "not " is printed. This results in output of "ok" when test passes and "not ok" when test fails.

Fix one instance where "not" (without a space) is printed on test failure resulting in output of "notok" on test failure.

Signed-off-by: Reinette Chatre reinette.chatre@intel.com
Signed-off-by: Sai Praneeth Prakhya sai.praneeth.prakhya@intel.com
---
 tools/testing/selftests/resctrl/cqm_test.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/resctrl/cqm_test.c b/tools/testing/selftests/resctrl/cqm_test.c
index c8756152bd61..fb4797cfda09 100644
--- a/tools/testing/selftests/resctrl/cqm_test.c
+++ b/tools/testing/selftests/resctrl/cqm_test.c
@@ -58,7 +58,7 @@ static void show_cache_info(unsigned long sum_llc_occu_resc, int no_of_bits,
 	else
 		res = false;
 
-	printf("%sok CQM: diff within %d, %d%%\n", res ? "" : "not",
+	printf("%sok CQM: diff within %d, %d%%\n", res ? "" : "not ",
 	       MAX_DIFF, (int)MAX_DIFF_PERCENT);
 
 	printf("# diff: %ld\n", avg_diff);

-- 2.7.4
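
The effect of the missing space is easy to reproduce outside the test. The sketch below assumes a hypothetical format_result() helper that takes the failure prefix as a parameter so that both spellings can be compared; it is not code from cqm_test.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper: build "<prefix>ok <desc>" the same way the
 * "%sok" format does. 'fail_prefix' is only used when 'pass' is false.
 */
static int format_result(char *buf, size_t len, const char *fail_prefix,
			 bool pass, const char *desc)
{
	assert(buf && fail_prefix && desc);

	return snprintf(buf, len, "%sok %s", pass ? "" : fail_prefix, desc);
}
```

With the prefix "not" a failing test is reported as "notok CQM", which a TAP parser would not recognise as a result line, while "not " gives the expected "not ok CQM".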

Sai Praneeth Prakhya

3:40 a.m.

New subject: [PATCH V1 03/13] selftests/resctrl: Fix typo in help text

From: Reinette Chatre reinette.chatre@intel.com

Add a missing newline to the help text and adjust the following line so that it lines up with the previous one for improved readability.

Signed-off-by: Reinette Chatre reinette.chatre@intel.com
Signed-off-by: Sai Praneeth Prakhya sai.praneeth.prakhya@intel.com
---
 tools/testing/selftests/resctrl/resctrl_tests.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/resctrl/resctrl_tests.c b/tools/testing/selftests/resctrl/resctrl_tests.c
index 425cc85ac883..f076d285d7c3 100644
--- a/tools/testing/selftests/resctrl/resctrl_tests.c
+++ b/tools/testing/selftests/resctrl/resctrl_tests.c
@@ -37,8 +37,8 @@ void detect_amd(void)
 static void cmd_help(void)
 {
 	printf("usage: resctrl_tests [-h] [-b \"benchmark_cmd [options]\"] [-t test list] [-n no_of_bits]\n");
-	printf("\t-b benchmark_cmd [options]: run specified benchmark for MBM, MBA and CQM");
-	printf("\t default benchmark is builtin fill_buf\n");
+	printf("\t-b benchmark_cmd [options]: run specified benchmark for MBM, MBA and CQM\n");
+	printf("\t default benchmark is builtin fill_buf\n");
 	printf("\t-t test list: run tests specified in the test list, ");
 	printf("e.g. -t mbm, mba, cqm, cat\n");
 	printf("\t-n no_of_bits: run cache tests using specified no of bits in cache bit mask\n");

-- 2.7.4

Sai Praneeth Prakhya

3:40 a.m.

New subject: [PATCH V1 04/13] selftests/resctrl: Ensure sibling CPU is not same as original CPU

From: Reinette Chatre reinette.chatre@intel.com

The resctrl tests can accept a CPU on which the tests are run and use default of CPU #1 if it is not provided. In the CAT test a "sibling CPU" is determined that is from the same package where another thread will be run.

The current algorithm with which a "sibling CPU" is determined does not take the provided/default CPU into account and when that CPU is the first CPU in a package then the "sibling CPU" will be selected to be the same CPU since it starts by picking the first CPU from core_siblings_list.

Fix the "sibling CPU" selection by taking the provided/default CPU into account and ensuring a sibling that is a different CPU is selected.

diff --git a/tools/testing/selftests/resctrl/resctrlfs.c b/tools/testing/selftests/resctrl/resctrlfs.c
index 226dd7fdcfb1..465faaad3239 100644
--- a/tools/testing/selftests/resctrl/resctrlfs.c
+++ b/tools/testing/selftests/resctrl/resctrlfs.c
@@ -268,7 +268,7 @@ int get_core_sibling(int cpu_no)
 	while (token) {
 		sibling_cpu_no = atoi(token);
 		/* Skipping core 0 as we don't want to run test on core 0 */
-		if (sibling_cpu_no != 0)
+		if (sibling_cpu_no != 0 && sibling_cpu_no != cpu_no)
 			break;
 		token = strtok(NULL, "-,");
 	}

-- 2.7.4
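
The selection rule from the hunk above can be tried on sample strings. The sketch assumes a hypothetical pick_sibling() helper that receives the contents of core_siblings_list as a string. Returning -1 when nothing suitable is found is a choice made for this example, not something the patch shows.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper mirroring the patched loop: walk the tokens of a
 * core_siblings_list style string (e.g. "0-3,8-11") and return the
 * first CPU number that is neither 0 nor 'cpu_no', or -1 if there is
 * none. As in the patch, splitting on "-," only yields the endpoints
 * of each range, not the CPUs in between.
 */
static int pick_sibling(const char *list, int cpu_no)
{
	char copy[256];
	char *token;

	assert(list);

	/* strtok() modifies its input, so work on a private copy. */
	snprintf(copy, sizeof(copy), "%s", list);

	for (token = strtok(copy, "-,"); token; token = strtok(NULL, "-,")) {
		int sibling = atoi(token);

		if (sibling != 0 && sibling != cpu_no)
			return sibling;
	}

	return -1;
}
```

For the list "1-4" and CPU 1, the pre-patch condition would have stopped at the first token and returned CPU 1 itself, whereas this version moves on to 4.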

Sai Praneeth Prakhya

3:40 a.m.

New subject: [PATCH V1 05/13] selftests/resctrl: Fix missing options "-n" and "-p"

From: Fenghua Yu fenghua.yu@intel.com

Add the missing options "-n" and "-p" to the getopt() option string so that they take effect. The other code related to these options is already in place and needs no change.

Signed-off-by: Fenghua Yu fenghua.yu@intel.com
Signed-off-by: Sai Praneeth Prakhya sai.praneeth.prakhya@intel.com
---
 tools/testing/selftests/resctrl/resctrl_tests.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/resctrl/resctrl_tests.c b/tools/testing/selftests/resctrl/resctrl_tests.c
index f076d285d7c3..84a436e0775c 100644
--- a/tools/testing/selftests/resctrl/resctrl_tests.c
+++ b/tools/testing/selftests/resctrl/resctrl_tests.c
@@ -73,7 +73,7 @@ int main(int argc, char **argv)
 		}
 	}
 
-	while ((c = getopt(argc_new, argv, "ht:b:")) != -1) {
+	while ((c = getopt(argc_new, argv, "ht:b:n:p:")) != -1) {
 		char *token;
 
 		switch (c) {

-- 2.7.4

Sai Praneeth Prakhya

3:40 a.m.

New subject: [PATCH V1 06/13] selftests/resctrl: Fix MBA/MBM results reporting format

Currently the MBM/MBA tests use absolute values to check results. However, iMC values and MBM resctrl values may vary between platforms, and for MBA in particular the values vary as the schemata changes. Hence, use a percentage instead of absolute values to check test results.

Co-developed-by: Fenghua Yu fenghua.yu@intel.com
Signed-off-by: Fenghua Yu fenghua.yu@intel.com
Signed-off-by: Sai Praneeth Prakhya sai.praneeth.prakhya@intel.com
---
 tools/testing/selftests/resctrl/mba_test.c | 24 +++++++++++++-----------
 tools/testing/selftests/resctrl/mbm_test.c | 21 +++++++++++----------
 2 files changed, 24 insertions(+), 21 deletions(-)

diff --git a/tools/testing/selftests/resctrl/mba_test.c b/tools/testing/selftests/resctrl/mba_test.c
index 7bf8eaa6204b..165e5123e040 100644
--- a/tools/testing/selftests/resctrl/mba_test.c
+++ b/tools/testing/selftests/resctrl/mba_test.c
@@ -12,7 +12,7 @@
 
 #define RESULT_FILE_NAME "result_mba"
 #define NUM_OF_RUNS 5
-#define MAX_DIFF 300
+#define MAX_DIFF_PERCENT 5
 #define ALLOCATION_MAX 100
 #define ALLOCATION_MIN 10
 #define ALLOCATION_STEP 10
@@ -62,31 +62,33 @@ static void show_mba_info(unsigned long *bw_imc, unsigned long *bw_resc)
 	     allocation++) {
 		unsigned long avg_bw_imc, avg_bw_resc;
 		unsigned long sum_bw_imc = 0, sum_bw_resc = 0;
-		unsigned long avg_diff;
+		float avg_diff;
+		int avg_diff_per;
 
 		/*
 		 * The first run is discarded due to inaccurate value from
 		 * phase transition.
 		 */
 		for (runs = NUM_OF_RUNS * allocation + 1;
-		     runs < NUM_OF_RUNS * allocation + NUM_OF_RUNS ; runs++) {
+		     runs < NUM_OF_RUNS * allocation + NUM_OF_RUNS; runs++) {
 			sum_bw_imc += bw_imc[runs];
 			sum_bw_resc += bw_resc[runs];
 		}
 
 		avg_bw_imc = sum_bw_imc / (NUM_OF_RUNS - 1);
 		avg_bw_resc = sum_bw_resc / (NUM_OF_RUNS - 1);
-		avg_diff = labs((long)(avg_bw_resc - avg_bw_imc));
+		avg_diff = (float)labs(avg_bw_resc - avg_bw_imc) / avg_bw_imc;
+		avg_diff_per = (int)(avg_diff * 100);
 
-		printf("%sok MBA schemata percentage %u smaller than %d %%\n",
-		       avg_diff > MAX_DIFF ? "not " : "",
-		       ALLOCATION_MAX - ALLOCATION_STEP * allocation,
-		       MAX_DIFF);
-		tests_run++;
-		printf("# avg_diff: %lu\n", avg_diff);
+		printf("%sok MBA: diff within %d%% for schemata %u\n",
+		       avg_diff_per > MAX_DIFF_PERCENT ? "not " : "",
+		       MAX_DIFF_PERCENT,
+		       ALLOCATION_MAX - ALLOCATION_STEP * allocation);
 		printf("# avg_bw_imc: %lu\n", avg_bw_imc);
 		printf("# avg_bw_resc: %lu\n", avg_bw_resc);
-		if (avg_diff > MAX_DIFF)
+		printf("# avg_diff_per: %d%%\n", avg_diff_per);
+		tests_run++;
+		if (avg_diff_per > MAX_DIFF_PERCENT)
 			failed = true;
 	}
 
diff --git a/tools/testing/selftests/resctrl/mbm_test.c b/tools/testing/selftests/resctrl/mbm_test.c
index 4700f7453f81..530ec5bec0b9 100644
--- a/tools/testing/selftests/resctrl/mbm_test.c
+++ b/tools/testing/selftests/resctrl/mbm_test.c
@@ -11,7 +11,7 @@
 #include "resctrl.h"
 
 #define RESULT_FILE_NAME "result_mbm"
-#define MAX_DIFF 300
+#define MAX_DIFF_PERCENT 5
 #define NUM_OF_RUNS 5
 
 static void
@@ -19,29 +19,30 @@ show_bw_info(unsigned long *bw_imc, unsigned long *bw_resc, int span)
 {
 	unsigned long avg_bw_imc = 0, avg_bw_resc = 0;
 	unsigned long sum_bw_imc = 0, sum_bw_resc = 0;
-	long avg_diff = 0;
-	int runs;
+	float avg_diff = 0;
+	int runs, avg_diff_per;
 
 	/*
 	 * Discard the first value which is inaccurate due to monitoring setup
 	 * transition phase.
 	 */
-	for (runs = 1; runs < NUM_OF_RUNS ; runs++) {
+	for (runs = 1; runs < NUM_OF_RUNS; runs++) {
 		sum_bw_imc += bw_imc[runs];
 		sum_bw_resc += bw_resc[runs];
 	}
 
-	avg_bw_imc = sum_bw_imc / 4;
-	avg_bw_resc = sum_bw_resc / 4;
-	avg_diff = avg_bw_resc - avg_bw_imc;
+	avg_bw_imc = sum_bw_imc / (NUM_OF_RUNS - 1);
+	avg_bw_resc = sum_bw_resc / (NUM_OF_RUNS - 1);
+	avg_diff = (float)labs(avg_bw_resc - avg_bw_imc) / avg_bw_imc;
+	avg_diff_per = (int)(avg_diff * 100);
 
 	printf("%sok MBM: diff within %d%%\n",
-	       labs(avg_diff) > MAX_DIFF ? "not " : "", MAX_DIFF);
-	tests_run++;
-	printf("# avg_diff: %lu\n", labs(avg_diff));
+	       avg_diff_per > MAX_DIFF_PERCENT ? "not " : "", MAX_DIFF_PERCENT);
 	printf("# Span (MB): %d\n", span);
 	printf("# avg_bw_imc: %lu\n", avg_bw_imc);
 	printf("# avg_bw_resc: %lu\n", avg_bw_resc);
+	printf("# avg_diff_per: %d%%\n", avg_diff_per);
+	tests_run++;
 }
 
 static int check_results(int span)

-- 2.7.4
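
The difference between an absolute and a relative threshold can be checked with a few numbers. The sketch below is not the patch's implementation: diff_percent() and within_limit() are hypothetical helpers, and they use integer arithmetic with an ordered subtraction instead of the float and labs() combination in the diff.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: difference between the two bandwidth readings
 * as a whole percentage of the iMC reading, rounded down.
 */
static int diff_percent(unsigned long bw_imc, unsigned long bw_resc)
{
	unsigned long diff;

	assert(bw_imc != 0);	/* the reference value must be non-zero */

	/* Order the subtraction so the unsigned result cannot wrap. */
	diff = bw_resc > bw_imc ? bw_resc - bw_imc : bw_imc - bw_resc;

	return (int)(diff * 100 / bw_imc);
}

/* Hypothetical helper: true if the readings agree within 'max_percent'. */
static bool within_limit(unsigned long bw_imc, unsigned long bw_resc,
			 int max_percent)
{
	return diff_percent(bw_imc, bw_resc) <= max_percent;
}
```

Readings of 10000 and 10400 differ by 400, which exceeds the old absolute limit of 300, yet they are only 4% apart and pass a 5% limit. Readings of 1000 and 940 are within 300 of each other but 6% apart, so they fail.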

Sai Praneeth Prakhya

3:40 a.m.

New subject: [PATCH V1 07/13] selftests/resctrl: Don't use variable argument list for setup function

struct resctrl_val_param has a setup() function that accepts a variable argument list. At present, however, all the test cases pass only one argument, and it is always a struct resctrl_val_param *. So, instead of a variable argument list, pass struct resctrl_val_param * directly as the parameter.
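
The shape of the change can be sketched with a stand-in structure. Everything named below (struct val_param, MAX_RUNS, demo_setup(), run_until_stopped()) is hypothetical and only mimics the pattern; it is not the selftest code.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_RUNS 5	/* illustrative limit for this sketch */

/* Stand-in for struct resctrl_val_param with a typed callback. */
struct val_param {
	int num_of_runs;
	int (*setup)(struct val_param *p);	/* no variable arguments */
};

/*
 * Example callback: returns 0 while more runs are wanted and -1 once
 * the limit is reached. No va_start()/va_arg()/va_end() is needed
 * because the parameter arrives with its real type.
 */
static int demo_setup(struct val_param *p)
{
	assert(p != NULL);

	if (p->num_of_runs >= MAX_RUNS)
		return -1;

	p->num_of_runs++;
	return 0;
}

/* Example caller: invoke the callback until it asks to stop. */
static int run_until_stopped(struct val_param *p)
{
	int iterations = 0;

	while (p->setup(p) == 0)
		iterations++;

	return iterations;
}
```

With a typed pointer the compiler can check every call site, whereas a mismatch in a variadic call would only show up at run time.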

Signed-off-by: Sai Praneeth Prakhya sai.praneeth.prakhya@intel.com --- tools/testing/selftests/resctrl/cache.c | 2 +- tools/testing/selftests/resctrl/cat_test.c | 8 +------- tools/testing/selftests/resctrl/cqm_test.c | 9 +-------- tools/testing/selftests/resctrl/mba_test.c | 8 +------- tools/testing/selftests/resctrl/mbm_test.c | 8 +------- tools/testing/selftests/resctrl/resctrl.h | 2 +- tools/testing/selftests/resctrl/resctrl_val.c | 4 ++-- 7 files changed, 8 insertions(+), 33 deletions(-)

diff --git a/tools/testing/selftests/resctrl/cache.c b/tools/testing/selftests/resctrl/cache.c
index 38dbf4962e33..1cbcd7fbe216 100644
--- a/tools/testing/selftests/resctrl/cache.c
+++ b/tools/testing/selftests/resctrl/cache.c
@@ -243,7 +243,7 @@ int cat_val(struct resctrl_val_param *param)
 	/* Test runs until the callback setup() tells the test to stop. */
 	while (1) {
 		if (strcmp(resctrl_val, "cat") == 0) {
-			ret = param->setup(1, param);
+			ret = param->setup(param);
 			if (ret) {
 				ret = 0;
 				break;
diff --git a/tools/testing/selftests/resctrl/cat_test.c b/tools/testing/selftests/resctrl/cat_test.c
index 5da43767b973..046c7f285e72 100644
--- a/tools/testing/selftests/resctrl/cat_test.c
+++ b/tools/testing/selftests/resctrl/cat_test.c
@@ -27,17 +27,11 @@ unsigned long cache_size;
  * con_mon grp, mon_grp in resctrl FS.
  * Run 5 times in order to get average values.
  */
-static int cat_setup(int num, ...)
+static int cat_setup(struct resctrl_val_param *p)
 {
-	struct resctrl_val_param *p;
 	char schemata[64];
-	va_list param;
 	int ret = 0;
 
-	va_start(param, num);
-	p = va_arg(param, struct resctrl_val_param *);
-	va_end(param);
-
 	/* Run NUM_OF_RUNS times */
 	if (p->num_of_runs >= NUM_OF_RUNS)
 		return -1;
diff --git a/tools/testing/selftests/resctrl/cqm_test.c b/tools/testing/selftests/resctrl/cqm_test.c
index fb4797cfda09..f27b0363e518 100644
--- a/tools/testing/selftests/resctrl/cqm_test.c
+++ b/tools/testing/selftests/resctrl/cqm_test.c
@@ -21,15 +21,8 @@ char cbm_mask[256];
 unsigned long long_mask;
 unsigned long cache_size;
 
-static int cqm_setup(int num, ...)
+static int cqm_setup(struct resctrl_val_param *p)
 {
-	struct resctrl_val_param *p;
-	va_list param;
-
-	va_start(param, num);
-	p = va_arg(param, struct resctrl_val_param *);
-	va_end(param);
-
 	/* Run NUM_OF_RUNS times */
 	if (p->num_of_runs >= NUM_OF_RUNS)
 		return -1;
diff --git a/tools/testing/selftests/resctrl/mba_test.c b/tools/testing/selftests/resctrl/mba_test.c
index 165e5123e040..0f85e5be390d 100644
--- a/tools/testing/selftests/resctrl/mba_test.c
+++ b/tools/testing/selftests/resctrl/mba_test.c
@@ -22,16 +22,10 @@
  * con_mon grp, mon_grp in resctrl FS.
  * For each allocation, run 5 times in order to get average values.
  */
-static int mba_setup(int num, ...)
+static int mba_setup(struct resctrl_val_param *p)
 {
 	static int runs_per_allocation, allocation = 100;
-	struct resctrl_val_param *p;
 	char allocation_str[64];
-	va_list param;
-
-	va_start(param, num);
-	p = va_arg(param, struct resctrl_val_param *);
-	va_end(param);
 
 	if (runs_per_allocation >= NUM_OF_RUNS)
 		runs_per_allocation = 0;
diff --git a/tools/testing/selftests/resctrl/mbm_test.c b/tools/testing/selftests/resctrl/mbm_test.c
index 530ec5bec0b9..9e847641516a 100644
--- a/tools/testing/selftests/resctrl/mbm_test.c
+++ b/tools/testing/selftests/resctrl/mbm_test.c
@@ -84,21 +84,15 @@ static int check_results(int span)
 	return 0;
 }
 
-static int mbm_setup(int num, ...)
+static int mbm_setup(struct resctrl_val_param *p)
 {
-	struct resctrl_val_param *p;
 	static int num_of_runs;
-	va_list param;
 	int ret = 0;
 
 	/* Run NUM_OF_RUNS times */
 	if (num_of_runs++ >= NUM_OF_RUNS)
 		return -1;
 
-	va_start(param, num);
-	p = va_arg(param, struct resctrl_val_param *);
-	va_end(param);
-
 	/* Set up shemata with 100% allocation on the first run. */
 	if (num_of_runs == 0)
 		ret = write_schemata(p->ctrlgrp, "100", p->cpu_no,
diff --git a/tools/testing/selftests/resctrl/resctrl.h b/tools/testing/selftests/resctrl/resctrl.h
index 39bf59c6b9c5..e320e79bc4d4 100644
--- a/tools/testing/selftests/resctrl/resctrl.h
+++ b/tools/testing/selftests/resctrl/resctrl.h
@@ -59,7 +59,7 @@ struct resctrl_val_param {
 	char *bw_report;
 	unsigned long mask;
 	int num_of_runs;
-	int (*setup)(int num, ...);
+	int (*setup)(struct resctrl_val_param *param);
 };
 
 pid_t bm_pid, ppid;
diff --git a/tools/testing/selftests/resctrl/resctrl_val.c b/tools/testing/selftests/resctrl/resctrl_val.c
index 520fea3606d1..271cb5c976f5 100644
--- a/tools/testing/selftests/resctrl/resctrl_val.c
+++ b/tools/testing/selftests/resctrl/resctrl_val.c
@@ -712,7 +712,7 @@ int resctrl_val(char **benchmark_cmd, struct resctrl_val_param *param)
 	while (1) {
 		if ((strcmp(resctrl_val, "mbm") == 0) ||
 		    (strcmp(resctrl_val, "mba") == 0)) {
-			ret = param->setup(1, param);
+			ret = param->setup(param);
 			if (ret) {
 				ret = 0;
 				break;
@@ -722,7 +722,7 @@ int resctrl_val(char **benchmark_cmd, struct resctrl_val_param *param)
 			if (ret)
 				break;
 		} else if (strcmp(resctrl_val, "cqm") == 0) {
-			ret = param->setup(1, param);
+			ret = param->setup(param);
 			if (ret) {
 				ret = 0;
 				break;

-- 2.7.4

Sai Praneeth Prakhya

3:40 a.m.

New subject: [PATCH V1 08/13] selftests/resctrl: Fix typos

No functional changes intended

1. Fix the misspelling of "schemata" in a comment in mbm_test.c.
2. Fix the incorrect commenting style in cache.c and fill_buf.c.
3. Remove an extra space in the initialization of struct resctrl_val_param in mbm_test.c.

Signed-off-by: Sai Praneeth Prakhya sai.praneeth.prakhya@intel.com
---
 tools/testing/selftests/resctrl/cache.c    | 10 +++-------
 tools/testing/selftests/resctrl/fill_buf.c |  3 ++-
 tools/testing/selftests/resctrl/mbm_test.c |  4 ++--
 3 files changed, 7 insertions(+), 10 deletions(-)

diff --git a/tools/testing/selftests/resctrl/cache.c b/tools/testing/selftests/resctrl/cache.c
index 1cbcd7fbe216..be60d7d3f066 100644
--- a/tools/testing/selftests/resctrl/cache.c
+++ b/tools/testing/selftests/resctrl/cache.c
@@ -179,9 +179,7 @@ int measure_cache_vals(struct resctrl_val_param *param, int bm_pid)
 	unsigned long llc_perf_miss = 0, llc_occu_resc = 0, llc_value = 0;
 	int ret;
 
-	/*
-	 * Measure cache miss from perf.
-	 */
+	/* Measure cache miss from perf */
 	if (!strcmp(param->resctrl_val, "cat")) {
 		ret = get_llc_perf(&llc_perf_miss);
 		if (ret < 0)
@@ -189,9 +187,7 @@ int measure_cache_vals(struct resctrl_val_param *param, int bm_pid)
 		llc_value = llc_perf_miss;
 	}
 
-	/*
-	 * Measure llc occupancy from resctrl.
-	 */
+	/* Measure llc occupancy from resctrl */
 	if (!strcmp(param->resctrl_val, "cqm")) {
 		ret = get_llc_occu_resctrl(&llc_occu_resc);
 		if (ret < 0)
@@ -228,7 +224,7 @@ int cat_val(struct resctrl_val_param *param)
 	if (ret)
 		return ret;
 
-	/* Write benchmark to specified con_mon grp, mon_grp in resctrl FS*/
+	/* Write benchmark to specified con_mon grp, mon_grp in resctrl FS */
 	ret = write_bm_pid_to_resctrl(bm_pid, param->ctrlgrp, param->mongrp,
 				      resctrl_val);
 	if (ret)
diff --git a/tools/testing/selftests/resctrl/fill_buf.c b/tools/testing/selftests/resctrl/fill_buf.c
index 84d2a8b9657a..9ede7b63f059 100644
--- a/tools/testing/selftests/resctrl/fill_buf.c
+++ b/tools/testing/selftests/resctrl/fill_buf.c
@@ -54,7 +54,8 @@ static void mem_flush(void *p, size_t s)
 	char *cp = (char *)p;
 	size_t i = 0;
 
-	s = s / CL_SIZE; /* mem size in cache llines */
+	/* mem size in cache lines */
+	s = s / CL_SIZE;
 
 	for (i = 0; i < s; i++)
 		cl_flush(&cp[i * CL_SIZE]);
diff --git a/tools/testing/selftests/resctrl/mbm_test.c b/tools/testing/selftests/resctrl/mbm_test.c
index 9e847641516a..b64906f1b34f 100644
--- a/tools/testing/selftests/resctrl/mbm_test.c
+++ b/tools/testing/selftests/resctrl/mbm_test.c
@@ -93,7 +93,7 @@ static int mbm_setup(struct resctrl_val_param *p)
 	if (num_of_runs++ >= NUM_OF_RUNS)
 		return -1;
 
-	/* Set up shemata with 100% allocation on the first run. */
+	/* Set up schemata with 100% allocation on the first run */
 	if (num_of_runs == 0)
 		ret = write_schemata(p->ctrlgrp, "100", p->cpu_no,
 				     p->resctrl_val);
@@ -116,7 +116,7 @@ int mbm_bw_change(int span, int cpu_no, char *bw_report, char **benchmark_cmd)
 		.cpu_no = cpu_no,
 		.mum_resctrlfs = 1,
 		.filename = RESULT_FILE_NAME,
-		.bw_report = bw_report,
+		.bw_report = bw_report,
 		.setup = mbm_setup
 	};
 	int ret;

-- 2.7.4

Sai Praneeth Prakhya

3:40 a.m.

New subject: [PATCH V1 09/13] selftests/resctrl: Modularize fill_buf for new CAT test case

Currently fill_buf (in-built benchmark) runs as a separate process and it runs indefinitely looping around given buffer either reading it or writing to it. But, some future test cases might want to start and stop looping around the buffer as they see fit. So, modularize fill_buf to support this use case.

Signed-off-by: Sai Praneeth Prakhya sai.praneeth.prakhya@intel.com
---
 tools/testing/selftests/resctrl/fill_buf.c | 66 ++++++++++++++++++++----------
 1 file changed, 44 insertions(+), 22 deletions(-)

diff --git a/tools/testing/selftests/resctrl/fill_buf.c b/tools/testing/selftests/resctrl/fill_buf.c
index 9ede7b63f059..204ae8870a32 100644
--- a/tools/testing/selftests/resctrl/fill_buf.c
+++ b/tools/testing/selftests/resctrl/fill_buf.c
@@ -23,7 +23,7 @@
 #define PAGE_SIZE (4 * 1024)
 #define MB (1024 * 1024)
 
-static unsigned char *startptr;
+static unsigned char *startptr, *endptr;
 
 static void sb(void)
 {
@@ -82,13 +82,13 @@ static void *malloc_and_init_memory(size_t s)
 	return p;
 }
 
-static int fill_one_span_read(unsigned char *start_ptr, unsigned char *end_ptr)
+static int fill_one_span_read(void)
 {
 	unsigned char sum, *p;
 
 	sum = 0;
-	p = start_ptr;
-	while (p < end_ptr) {
+	p = startptr;
+	while (p < endptr) {
 		sum += *p;
 		p += (CL_SIZE / 2);
 	}
@@ -96,26 +96,24 @@ static int fill_one_span_read(unsigned char *start_ptr, unsigned char *end_ptr)
 	return sum;
 }
 
-static
-void fill_one_span_write(unsigned char *start_ptr, unsigned char *end_ptr)
+static void fill_one_span_write(void)
 {
 	unsigned char *p;
 
-	p = start_ptr;
-	while (p < end_ptr) {
+	p = startptr;
+	while (p < endptr) {
 		*p = '1';
 		p += (CL_SIZE / 2);
 	}
 }
 
-static int fill_cache_read(unsigned char *start_ptr, unsigned char *end_ptr,
-			   char *resctrl_val)
+static int fill_cache_read(char *resctrl_val)
 {
 	int ret = 0;
 	FILE *fp;
 
 	while (1) {
-		ret = fill_one_span_read(start_ptr, end_ptr);
+		ret = fill_one_span_read();
 		if (!strcmp(resctrl_val, "cat"))
 			break;
 	}
@@ -130,11 +128,10 @@ static int fill_cache_read(unsigned char *start_ptr, unsigned char *end_ptr,
 	return 0;
 }
 
-static int fill_cache_write(unsigned char *start_ptr, unsigned char *end_ptr,
-			    char *resctrl_val)
+static int fill_cache_write(char *resctrl_val)
 {
 	while (1) {
-		fill_one_span_write(start_ptr, end_ptr);
+		fill_one_span_write();
 		if (!strcmp(resctrl_val, "cat"))
 			break;
 	}
@@ -142,24 +139,25 @@ static int fill_cache_write(unsigned char *start_ptr, unsigned char *end_ptr,
 	return 0;
 }
 
-static int
-fill_cache(unsigned long long buf_size, int malloc_and_init, int memflush,
-	   int op, char *resctrl_val)
+static
+int init_buffer(unsigned long long buf_size, int malloc_and_init, int memflush)
 {
 	unsigned char *start_ptr, *end_ptr;
 	unsigned long long i;
-	int ret;
 
 	if (malloc_and_init)
 		start_ptr = malloc_and_init_memory(buf_size);
 	else
 		start_ptr = malloc(buf_size);
 
-	if (!start_ptr)
+	if (!start_ptr) {
+		printf("Failed to allocate memory to buffer\n");
 		return -1;
+	}
 
-	startptr = start_ptr;
 	end_ptr = start_ptr + buf_size;
+	endptr = end_ptr;
+	startptr = start_ptr;
 
 	/*
 	 * It's better to touch the memory once to avoid any compiler
@@ -176,16 +174,40 @@ fill_cache(unsigned long long buf_size, int malloc_and_init, int memflush,
 	if (memflush)
 		mem_flush(start_ptr, buf_size);
 
+	return 0;
+}
+
+static int use_buffer_forever(int op, char *resctrl_val)
+{
+	int ret;
+
 	if (op == 0)
-		ret = fill_cache_read(start_ptr, end_ptr, resctrl_val);
+		ret = fill_cache_read(resctrl_val);
 	else
-		ret = fill_cache_write(start_ptr, end_ptr, resctrl_val);
+		ret = fill_cache_write(resctrl_val);
 
 	if (ret) {
 		printf("\n Errror in fill cache read/write...\n");
 		return -1;
 	}
 
+	return 0;
+}
+
+static int
+fill_cache(unsigned long long buf_size, int malloc_and_init, int memflush,
+	   int op, char *resctrl_val)
+{
+	int ret;
+
+	ret = init_buffer(buf_size, malloc_and_init, memflush);
+	if (ret)
+		return ret;
+
+	ret = use_buffer_forever(op, resctrl_val);
+	if (ret)
+		return ret;
+
 	free(startptr);
 
 	return 0;

-- 2.7.4

Reinette Chatre

10 Mar

9:59 p.m.

New subject: [PATCH V1 09/13] selftests/resctrl: Modularize fill_buf for new CAT test case

Hi Sai,

On 3/6/2020 7:40 PM, Sai Praneeth Prakhya wrote:

...

Currently fill_buf (in-built benchmark) runs as a separate process and it runs indefinitely looping around given buffer either reading it or writing to it. But, some future test cases might want to start and stop looping around the buffer as they see fit. So, modularize fill_buf to support this use case.

Signed-off-by: Sai Praneeth Prakhya sai.praneeth.prakhya@intel.com

 tools/testing/selftests/resctrl/fill_buf.c | 66 ++++++++++++++++++++----------
 1 file changed, 44 insertions(+), 22 deletions(-)

diff --git a/tools/testing/selftests/resctrl/fill_buf.c b/tools/testing/selftests/resctrl/fill_buf.c
index 9ede7b63f059..204ae8870a32 100644
--- a/tools/testing/selftests/resctrl/fill_buf.c
+++ b/tools/testing/selftests/resctrl/fill_buf.c
@@ -23,7 +23,7 @@
 #define PAGE_SIZE (4 * 1024)
 #define MB (1024 * 1024)
 
-static unsigned char *startptr;
+static unsigned char *startptr, *endptr;
 
 static void sb(void)
 {
@@ -82,13 +82,13 @@ static void *malloc_and_init_memory(size_t s)
 	return p;
 }
 
-static int fill_one_span_read(unsigned char *start_ptr, unsigned char *end_ptr)
+static int fill_one_span_read(void)
 {
 	unsigned char sum, *p;
 
 	sum = 0;
-	p = start_ptr;
-	while (p < end_ptr) {
+	p = startptr;
+	while (p < endptr) {
 		sum += *p;
 		p += (CL_SIZE / 2);
 	}
@@ -96,26 +96,24 @@ static int fill_one_span_read(unsigned char *start_ptr, unsigned char *end_ptr)
 	return sum;
 }
 
-static
-void fill_one_span_write(unsigned char *start_ptr, unsigned char *end_ptr)
+static void fill_one_span_write(void)
 {
 	unsigned char *p;
 
-	p = start_ptr;
-	while (p < end_ptr) {
+	p = startptr;
+	while (p < endptr) {
 		*p = '1';
 		p += (CL_SIZE / 2);
 	}
 }
 
-static int fill_cache_read(unsigned char *start_ptr, unsigned char *end_ptr,
-			   char *resctrl_val)
+static int fill_cache_read(char *resctrl_val)
 {
 	int ret = 0;
 	FILE *fp;
 
 	while (1) {
-		ret = fill_one_span_read(start_ptr, end_ptr);
+		ret = fill_one_span_read();
 		if (!strcmp(resctrl_val, "cat"))
 			break;
 	}
@@ -130,11 +128,10 @@ static int fill_cache_read(unsigned char *start_ptr, unsigned char *end_ptr,
 	return 0;
 }
 
-static int fill_cache_write(unsigned char *start_ptr, unsigned char *end_ptr,
-			    char *resctrl_val)
+static int fill_cache_write(char *resctrl_val)
 {
 	while (1) {
-		fill_one_span_write(start_ptr, end_ptr);
+		fill_one_span_write();
 		if (!strcmp(resctrl_val, "cat"))
 			break;
 	}
@@ -142,24 +139,25 @@ static int fill_cache_write(unsigned char *start_ptr, unsigned char *end_ptr,
 	return 0;
 }
 
-static int
-fill_cache(unsigned long long buf_size, int malloc_and_init, int memflush,
-	   int op, char *resctrl_val)
+static
+int init_buffer(unsigned long long buf_size, int malloc_and_init, int memflush)
 {
 	unsigned char *start_ptr, *end_ptr;
 	unsigned long long i;
-	int ret;
 
 	if (malloc_and_init)
 		start_ptr = malloc_and_init_memory(buf_size);
 	else
 		start_ptr = malloc(buf_size);
 
-	if (!start_ptr)
+	if (!start_ptr) {
+		printf("Failed to allocate memory to buffer\n");
 		return -1;
+	}
 
-	startptr = start_ptr;
 	end_ptr = start_ptr + buf_size;
+	endptr = end_ptr;
+	startptr = start_ptr;
 
 	/*
 	 * It's better to touch the memory once to avoid any compiler
@@ -176,16 +174,40 @@ fill_cache(unsigned long long buf_size, int malloc_and_init, int memflush,
 	if (memflush)
 		mem_flush(start_ptr, buf_size);
 
+	return 0;
+}
+
+static int use_buffer_forever(int op, char *resctrl_val)
+{
+	int ret;
+
 	if (op == 0)
-		ret = fill_cache_read(start_ptr, end_ptr, resctrl_val);
+		ret = fill_cache_read(resctrl_val);
 	else
-		ret = fill_cache_write(start_ptr, end_ptr, resctrl_val);
+		ret = fill_cache_write(resctrl_val);
 
 	if (ret) {
 		printf("\n Errror in fill cache read/write...\n");
 		return -1;
 	}
 
+	return 0;
+}
+
+static int
+fill_cache(unsigned long long buf_size, int malloc_and_init, int memflush,
+	   int op, char *resctrl_val)
+{
+	int ret;
+
+	ret = init_buffer(buf_size, malloc_and_init, memflush);
+	if (ret)
+		return ret;
+
+	ret = use_buffer_forever(op, resctrl_val);
+	if (ret)
+		return ret;

Should buffer be freed on this error path?

I think the asymmetrical nature of the memory allocation and release creates traps like this.

It may be less error prone to have the pointer returned by init_buffer and then acted on and released within fill_cache(), passed to "use_buffer_forever()" as a parameter. The buffer size is known here, there is no need to keep an "end pointer" around.

...

free(startptr);

return 0;

Reinette

Sai Praneeth Prakhya

11 Mar

1:04 a.m.

New subject: [PATCH V1 09/13] selftests/resctrl: Modularize fill_buf for new CAT test case

Hi Reinette,

On Tue, 2020-03-10 at 14:59 -0700, Reinette Chatre wrote:

...

Hi Sai,

On 3/6/2020 7:40 PM, Sai Praneeth Prakhya wrote:

...
Currently fill_buf (in-built benchmark) runs as a separate process and it runs indefinitely looping around given buffer either reading it or writing to it. But, some future test cases might want to start and stop looping around the buffer as they see fit. So, modularize fill_buf to support this use case.

Signed-off-by: Sai Praneeth Prakhya sai.praneeth.prakhya@intel.com

tools/testing/selftests/resctrl/fill_buf.c | 66 ++++++++++++++++++++-----

1 file changed, 44 insertions(+), 22 deletions(-)

diff --git a/tools/testing/selftests/resctrl/fill_buf.c b/tools/testing/selftests/resctrl/fill_buf.c
index 9ede7b63f059..204ae8870a32 100644
--- a/tools/testing/selftests/resctrl/fill_buf.c
+++ b/tools/testing/selftests/resctrl/fill_buf.c
@@ -23,7 +23,7 @@
 #define PAGE_SIZE (4 * 1024)
 #define MB (1024 * 1024)
 
-static unsigned char *startptr;
+static unsigned char *startptr, *endptr;

[Snipped.. assuming code over here might not be needed for discussion]

...

...
+static int use_buffer_forever(int op, char *resctrl_val)
+{
+	int ret;
+
 	if (op == 0)
-		ret = fill_cache_read(start_ptr, end_ptr, resctrl_val);
+		ret = fill_cache_read(resctrl_val);
 	else
-		ret = fill_cache_write(start_ptr, end_ptr, resctrl_val);
+		ret = fill_cache_write(resctrl_val);
 
 	if (ret) {
 		printf("\n Errror in fill cache read/write...\n");
 		return -1;
 	}
 
+	return 0;
+}
+
+static int
+fill_cache(unsigned long long buf_size, int malloc_and_init, int memflush,
+	   int op, char *resctrl_val)
+{
+	int ret;
+
+	ret = init_buffer(buf_size, malloc_and_init, memflush);
+	if (ret)
+		return ret;
+
+	ret = use_buffer_forever(op, resctrl_val);
+	if (ret)
+		return ret;

Should buffer be freed on this error path?

Yes, that's right.. my bad. Will fix it. But the right fix is, use_buffer_forever() should not return at all. It's meant to loop around the buffer _forever_.

...

I think the asymmetrical nature of the memory allocation and release creates traps like this.

It may be less error prone to have the pointer returned by init_buffer and then acted on and released within fill_cache(), passed to "use_buffer_forever()" as a parameter. The buffer size is known here, there is no need to keep an "end pointer" around.

The main reason for having "startptr" as a global variable is to free memory when fill_buf is killed. fill_buf runs as a separate process (for test cases like MBM, MBA and CQM) and when user issues Ctrl_c or when the test kills benchmark_pid (i.e. fill_buf), the buffer is freed (please see ctrl_handler()).

So, I thought, as "startptr" is anyways global, why pass it around as an argument? While making this change I thought it's natural to make "endptr" global as well because the function didn't really look good to just take endptr as an argument without startptr.

I do agree that asymmetrical nature of the memory allocation and release might create traps, I will try to overcome this for CAT test case (other test cases will not need it).

Regards, Sai

Reinette Chatre

3:44 p.m.

New subject: [PATCH V1 09/13] selftests/resctrl: Modularize fill_buf for new CAT test case

Hi Sai,

On 3/10/2020 6:04 PM, Sai Praneeth Prakhya wrote:

...

On Tue, 2020-03-10 at 14:59 -0700, Reinette Chatre wrote:

...
On 3/6/2020 7:40 PM, Sai Praneeth Prakhya wrote:

...
Currently fill_buf (in-built benchmark) runs as a separate process and it runs indefinitely looping around given buffer either reading it or writing to it. But, some future test cases might want to start and stop looping around the buffer as they see fit. So, modularize fill_buf to support this use case.

Signed-off-by: Sai Praneeth Prakhya sai.praneeth.prakhya@intel.com

tools/testing/selftests/resctrl/fill_buf.c | 66 ++++++++++++++++++++-----

1 file changed, 44 insertions(+), 22 deletions(-)

diff --git a/tools/testing/selftests/resctrl/fill_buf.c b/tools/testing/selftests/resctrl/fill_buf.c
index 9ede7b63f059..204ae8870a32 100644
--- a/tools/testing/selftests/resctrl/fill_buf.c
+++ b/tools/testing/selftests/resctrl/fill_buf.c
@@ -23,7 +23,7 @@
 #define PAGE_SIZE (4 * 1024)
 #define MB (1024 * 1024)
 
-static unsigned char *startptr;
+static unsigned char *startptr, *endptr;

[Snipped.. assuming code over here might not be needed for discussion]

...
...
+static int use_buffer_forever(int op, char *resctrl_val)
+{
+	int ret;
+
 	if (op == 0)
-		ret = fill_cache_read(start_ptr, end_ptr, resctrl_val);
+		ret = fill_cache_read(resctrl_val);
 	else
-		ret = fill_cache_write(start_ptr, end_ptr, resctrl_val);
+		ret = fill_cache_write(resctrl_val);
 
 	if (ret) {
 		printf("\n Errror in fill cache read/write...\n");
 		return -1;
 	}
 
+	return 0;
+}
+
+static int
+fill_cache(unsigned long long buf_size, int malloc_and_init, int memflush,
+	   int op, char *resctrl_val)
+{
+	int ret;
+
+	ret = init_buffer(buf_size, malloc_and_init, memflush);
+	if (ret)
+		return ret;
+
+	ret = use_buffer_forever(op, resctrl_val);
+	if (ret)
+		return ret;

Should buffer be freed on this error path?

Yes, that's right.. my bad. Will fix it. But the right fix is, use_buffer_forever() should not return at all. It's meant to loop around the buffer _forever_.

...
I think the asymmetrical nature of the memory allocation and release creates traps like this.

It may be less error prone to have the pointer returned by init_buffer and then acted on and released within fill_cache(), passed to "use_buffer_forever()" as a parameter. The buffer size is known here, there is no need to keep an "end pointer" around.

The main reason for having "startptr" as a global variable is to free memory when fill_buf is killed. fill_buf runs as a separate process (for test cases like MBM, MBA and CQM) and when user issues Ctrl_c or when the test kills benchmark_pid (i.e. fill_buf), the buffer is freed (please see ctrl_handler()).

I see. Got it, thanks.

...

So, I thought, as "startptr" is anyways global, why pass it around as an argument? While making this change I thought it's natural to make "endptr" global as well because the function didn't really look good to just take endptr as an argument without startptr.

Maintaining the end pointer is unusual. The start of the buffer and the size are known properties that the end of the buffer can be computed from. Not a problem, it just seems inconsistent that some of the buffer functions operate on the start pointer and size while others operate on the start pointer and end pointer.

...

I do agree that asymmetrical nature of the memory allocation and release might create traps, I will try to overcome this for CAT test case (other test cases will not need it).

Thank you very much

Reinette

Sai Praneeth Prakhya

5:45 p.m.

New subject: [PATCH V1 09/13] selftests/resctrl: Modularize fill_buf for new CAT test case

Hi Reinette,

On Wed, 2020-03-11 at 08:44 -0700, Reinette Chatre wrote:

...

Hi Sai,

On 3/10/2020 6:04 PM, Sai Praneeth Prakhya wrote:

...
On Tue, 2020-03-10 at 14:59 -0700, Reinette Chatre wrote:

...
On 3/6/2020 7:40 PM, Sai Praneeth Prakhya wrote:

...
Currently fill_buf (in-built benchmark) runs as a separate process a

[SNIP]

...

...
...
Should buffer be freed on this error path?

Yes, that's right.. my bad. Will fix it. But the right fix is, use_buffer_forever() should not return at all. It's meant to loop around the buffer _forever_.

...
I think the asymmetrical nature of the memory allocation and release creates traps like this.

It may be less error prone to have the pointer returned by init_buffer and then acted on and released within fill_cache(), passed to "use_buffer_forever()" as a parameter. The buffer size is known here, there is no need to keep an "end pointer" around.

The main reason for having "startptr" as a global variable is to free memory when fill_buf is killed. fill_buf runs as a separate process (for test cases like MBM, MBA and CQM) and when user issues Ctrl_c or when the test kills benchmark_pid (i.e. fill_buf), the buffer is freed (please see ctrl_handler()).

I see. Got it, thanks.

...
So, I thought, as "startptr" is anyways global, why pass it around as an argument? While making this change I thought it's natural to make "endptr" global as well because the function didn't really look good to just take endptr as an argument without startptr.

Maintaining the end pointer is unusual. The start of the buffer and the size are known properties that the end of the buffer can be computed from. Not a problem, it just seems inconsistent that some of the buffer functions operate on the start pointer and size while others operate on the start pointer and end pointer.

Ok.. makes sense. I will try to make it consistent by using endptr all the time. One advantage of using endptr is that we could just compute endptr once and use it when needed by passing it as variable (will try to not make it global variable).

Regards, Sai

Reinette Chatre

6:10 p.m.

New subject: [PATCH V1 09/13] selftests/resctrl: Modularize fill_buf for new CAT test case

Hi Sai,

On 3/11/2020 10:45 AM, Sai Praneeth Prakhya wrote:

...

On Wed, 2020-03-11 at 08:44 -0700, Reinette Chatre wrote:

...
On 3/10/2020 6:04 PM, Sai Praneeth Prakhya wrote:

...
On Tue, 2020-03-10 at 14:59 -0700, Reinette Chatre wrote:

...
On 3/6/2020 7:40 PM, Sai Praneeth Prakhya wrote:

...
Currently fill_buf (in-built benchmark) runs as a separate process a

[SNIP]

...
...
...
Should buffer be freed on this error path?

Yes, that's right.. my bad. Will fix it. But the right fix is, use_buffer_forever() should not return at all. It's meant to loop around the buffer _forever_.

...
I think the asymmetrical nature of the memory allocation and release creates traps like this.

It may be less error prone to have the pointer returned by init_buffer and then acted on and released within fill_cache(), passed to "use_buffer_forever()" as a parameter. The buffer size is known here, there is no need to keep an "end pointer" around.

The main reason for having "startptr" as a global variable is to free memory when fill_buf is killed. fill_buf runs as a separate process (for test cases like MBM, MBA and CQM) and when user issues Ctrl_c or when the test kills benchmark_pid (i.e. fill_buf), the buffer is freed (please see ctrl_handler()).

I see. Got it, thanks.

...
So, I thought, as "startptr" is anyways global, why pass it around as an argument? While making this change I thought it's natural to make "endptr" global as well because the function didn't really look good to just take endptr as an argument without startptr.

Maintaining the end pointer is unusual. The start of the buffer and the size are known properties that the end of the buffer can be computed from. Not a problem, it just seems inconsistent that some of the buffer functions operate on the start pointer and size while others operate on the start pointer and end pointer.

Ok.. makes sense. I will try to make it consistent by using endptr all the time. One advantage of using endptr is that we could just compute endptr once and use it when needed by passing it as variable (will try to not make it global variable).

This may add unnecessary complexity because from what I can tell some of those calls require buffer size and this would then require needing to recompute the buffer size based on the start and end pointers. Do you really need the end pointer? Can you not just use the start pointer and buffer size?

Reinette

Sai Praneeth Prakhya

6:14 p.m.

New subject: [PATCH V1 09/13] selftests/resctrl: Modularize fill_buf for new CAT test case

Hi Reinette,

On Wed, 2020-03-11 at 11:10 -0700, Reinette Chatre wrote:

...

Hi Sai,

On 3/11/2020 10:45 AM, Sai Praneeth Prakhya wrote:

...
On Wed, 2020-03-11 at 08:44 -0700, Reinette Chatre wrote:

...
On 3/10/2020 6:04 PM, Sai Praneeth Prakhya wrote:

...
On Tue, 2020-03-10 at 14:59 -0700, Reinette Chatre wrote:

...
On 3/6/2020 7:40 PM, Sai Praneeth Prakhya wrote:

...
Currently fill_buf (in-built benchmark) runs as a separate process a

[SNIP]

...

...
...
Maintaining the end pointer is unusual. The start of the buffer and the size are known properties that the end of the buffer can be computed from. Not a problem, it just seems inconsistent that some of the buffer functions operate on the start pointer and size while others operate on the start pointer and end pointer.

Ok.. makes sense. I will try to make it consistent by using endptr all the time. One advantage of using endptr is that we could just compute endptr once and use it when needed by passing it as variable (will try to not make it global variable).

This may add unnecessary complexity because from what I can tell some of those calls require buffer size and this would then require needing to recompute the buffer size based on the start and end pointers. Do you really need the end pointer? Can you not just use the start pointer and buffer size?

Ok.. makes sense. Will use buffer size.

Regards, Sai
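
The direction agreed in this exchange (operate on the start pointer plus the buffer size, and keep allocation and release together) could look roughly like the sketch below. This is an illustrative sketch only, not code from a later revision of the patch: init_buffer() here returns the pointer as suggested in the review, read_one_span(), write_one_span() and fill_cache_once() are hypothetical names, CL_SIZE is assumed to be 64, and the buffer is walked once instead of forever so that the release path is reachable.

```c
#include <stdlib.h>
#include <string.h>

#define CL_SIZE 64 /* assumed cache line size in bytes */

/* Allocate and touch the buffer; the caller owns the returned pointer. */
unsigned char *init_buffer(size_t buf_size)
{
	unsigned char *buf = malloc(buf_size);

	if (!buf)
		return NULL;

	memset(buf, 0, buf_size);
	return buf;
}

/* Read one byte per half cache line, using start pointer and size only. */
unsigned char read_one_span(const unsigned char *buf, size_t buf_size)
{
	unsigned char sum = 0;
	size_t off;

	for (off = 0; off < buf_size; off += CL_SIZE / 2)
		sum += buf[off];

	return sum;
}

/* Write one byte per half cache line. */
void write_one_span(unsigned char *buf, size_t buf_size)
{
	size_t off;

	for (off = 0; off < buf_size; off += CL_SIZE / 2)
		buf[off] = '1';
}

/* op: 0 = read, 1 = write; any other value is treated as an error. */
int fill_cache_once(size_t buf_size, int op)
{
	unsigned char *buf = init_buffer(buf_size);
	int ret = 0;

	if (!buf)
		return -1;

	if (op == 0)
		(void)read_one_span(buf, buf_size);
	else if (op == 1)
		write_one_span(buf, buf_size);
	else
		ret = -1;

	/* Single release point, reached on success and on error alike. */
	free(buf);
	return ret;
}
```

Because the function that allocates the buffer is also the one that frees it, the error path raised in the review cannot skip the release, and no end pointer needs to be stored or recomputed.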

Sai Praneeth Prakhya

7 Mar

3:40 a.m.

New subject: [PATCH V1 10/13] selftests/resctrl: Change Cache Allocation Technology (CAT) test

The present CAT test case spawns two processes that run in two different control groups with exclusive schemata, and both processes read a buffer from memory only once. Before reading the buffer, the perf miss count is cleared, and the perf miss count is then calculated for the read. Since the processes read through the buffer only once, and initially the whole buffer is in memory, the perf miss count will always be the same regardless of the cache size allocated by CAT to these processes. So, the test isn't testing CAT. Fix this issue by changing the CAT test case.

The updated CAT test runs a "critical" process with exclusive schemata that reads a buffer (the same size as the allocated cache) multiple times, thereby utilizing the allocated cache, and calculates the perf miss rate for every read of the buffer. The average of this perf miss rate is saved. This value indicates the critical process's self-induced misses. Next, the "critical" process runs beside a "noisy" neighbor that reads a buffer 10 times the size of the LLC, with the two processes in different control groups with exclusive schemata. The average perf miss rate for the "critical" process is calculated again and compared with the earlier value. If the difference between these two values is greater than 5%, the "noisy" neighbor does have an impact on the "critical" process, which means CAT is not working as expected, and hence the test fails.
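
The pass/fail rule described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the patch's implementation: miss rates are assumed to be expressed in percent, the 5% threshold is assumed to be an absolute difference in percentage points, and MAX_DIFF_PERCENT_CAT, average() and cat_noisy_neighbor_failed() are hypothetical names.

```c
#include <stddef.h>

/* Threshold from the commit message; the macro name is hypothetical. */
#define MAX_DIFF_PERCENT_CAT 5.0f

/* Arithmetic mean of n samples; returns 0 for an empty set. */
float average(const float *vals, size_t n)
{
	float sum = 0.0f;
	size_t i;

	for (i = 0; i < n; i++)
		sum += vals[i];

	return n ? sum / (float)n : 0.0f;
}

/*
 * Compare the critical process's average miss rate when running alone
 * with its average miss rate next to the noisy neighbor.
 * Returns 1 (test fails) when the two averages differ by more than the
 * threshold, 0 (test passes) otherwise.
 */
int cat_noisy_neighbor_failed(const float *alone, const float *with_noisy,
			      size_t n)
{
	float diff = average(with_noisy, n) - average(alone, n);

	if (diff < 0.0f)
		diff = -diff;

	return diff > MAX_DIFF_PERCENT_CAT;
}
```

For example, per-run miss rates of 10, 12 and 11 when alone versus 20, 22 and 21 beside the neighbor give averages of 11 and 21, a 10-point difference, so this sketch would report a failure.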

Reported-by: Reinette Chatre reinette.chatre@intel.com
Suggested-by: Tony Luck tony.luck@intel.com
Co-developed-by: Fenghua Yu fenghua.yu@intel.com
Signed-off-by: Fenghua Yu fenghua.yu@intel.com
Signed-off-by: Sai Praneeth Prakhya sai.praneeth.prakhya@intel.com
---
 tools/testing/selftests/resctrl/cache.c         | 167 ++++++++-----
 tools/testing/selftests/resctrl/cat_test.c      | 312 ++++++++++++++----------
 tools/testing/selftests/resctrl/fill_buf.c      |  33 ++-
 tools/testing/selftests/resctrl/resctrl.h       |   9 +-
 tools/testing/selftests/resctrl/resctrl_tests.c |   2 +-
 tools/testing/selftests/resctrl/resctrlfs.c     |  34 ++-
 6 files changed, 352 insertions(+), 205 deletions(-)

diff --git a/tools/testing/selftests/resctrl/cache.c b/tools/testing/selftests/resctrl/cache.c
index be60d7d3f066..e30cdd7b851c 100644
--- a/tools/testing/selftests/resctrl/cache.c
+++ b/tools/testing/selftests/resctrl/cache.c
@@ -10,9 +10,9 @@ struct read_format {
 	} values[2];
 };

-static struct perf_event_attr pea_llc_miss;
+static struct perf_event_attr pea_llc_miss, pea_llc_access;
 static struct read_format rf_cqm;
-static int fd_lm;
+static int fd_lm, fd_la;
 char llc_occup_path[1024];

 static void initialize_perf_event_attr(void)
@@ -27,15 +27,30 @@ static void initialize_perf_event_attr(void)
 	pea_llc_miss.inherit = 1;
 	pea_llc_miss.exclude_guest = 1;
 	pea_llc_miss.disabled = 1;
+
+	pea_llc_access.type = PERF_TYPE_HARDWARE;
+	pea_llc_access.size = sizeof(struct perf_event_attr);
+	pea_llc_access.read_format = PERF_FORMAT_GROUP;
+	pea_llc_access.exclude_kernel = 1;
+	pea_llc_access.exclude_hv = 1;
+	pea_llc_access.exclude_idle = 1;
+	pea_llc_access.exclude_callchain_kernel = 1;
+	pea_llc_access.inherit = 1;
+	pea_llc_access.exclude_guest = 1;
+	pea_llc_access.disabled = 1;
+
 }

 static void ioctl_perf_event_ioc_reset_enable(void)
 {
 	ioctl(fd_lm, PERF_EVENT_IOC_RESET, 0);
 	ioctl(fd_lm, PERF_EVENT_IOC_ENABLE, 0);
+
+	ioctl(fd_la, PERF_EVENT_IOC_RESET, 0);
+	ioctl(fd_la, PERF_EVENT_IOC_ENABLE, 0);
 }

-static int perf_event_open_llc_miss(pid_t pid, int cpu_no)
+static int perf_event_open_llc_miss_rate(pid_t pid, int cpu_no)
 {
 	fd_lm = perf_event_open(&pea_llc_miss, pid, cpu_no, -1,
 				PERF_FLAG_FD_CLOEXEC);
@@ -45,29 +60,40 @@ static int perf_event_open_llc_miss(pid_t pid, int cpu_no)
 		return -1;
 	}

+	fd_la = perf_event_open(&pea_llc_access, pid, cpu_no, fd_lm,
+				PERF_FLAG_FD_CLOEXEC);
+	if (fd_la == -1) {
+		perror("Error opening member");
+		ctrlc_handler(0, NULL, NULL);
+		return -1;
+	}
+
 	return 0;
 }

-static int initialize_llc_perf(void)
+static void initialize_llc_perf(void)
 {
 	memset(&pea_llc_miss, 0, sizeof(struct perf_event_attr));
+	memset(&pea_llc_access, 0, sizeof(struct perf_event_attr));
 	memset(&rf_cqm, 0, sizeof(struct read_format));

-	/* Initialize perf_event_attr structures for HW_CACHE_MISSES */
+	/*
+	 * Initialize perf_event_attr structures for HW_CACHE_MISSES and
+	 * HW_CACHE_REFERENCES
+	 */
 	initialize_perf_event_attr();

 	pea_llc_miss.config = PERF_COUNT_HW_CACHE_MISSES;
+	pea_llc_access.config = PERF_COUNT_HW_CACHE_REFERENCES;

-	rf_cqm.nr = 1;
-
-	return 0;
+	rf_cqm.nr = 2;
 }

 static int reset_enable_llc_perf(pid_t pid, int cpu_no)
 {
 	int ret = 0;

-	ret = perf_event_open_llc_miss(pid, cpu_no);
+	ret = perf_event_open_llc_miss_rate(pid, cpu_no);
 	if (ret < 0)
 		return ret;

@@ -78,21 +104,21 @@ static int reset_enable_llc_perf(pid_t pid, int cpu_no)
 }

 /*
- * get_llc_perf: llc cache miss through perf events
- * @cpu_no: CPU number that the benchmark PID is binded to
+ * get_llc_perf_miss_rate: llc cache miss rate through perf events
+ * @cpu_no: CPU number that the benchmark PID is binded to
  *
- * Perf events like HW_CACHE_MISSES could be used to validate number of
- * cache lines allocated.
+ * Perf events like HW_CACHE_MISSES and HW_CACHE_REFERENCES could be used to
+ * approximate LLc occupancy under controlled environment
  *
  * Return: =0 on success. <0 on failure.
  */
-static int get_llc_perf(unsigned long *llc_perf_miss)
+static int get_llc_perf_miss_rate(float *llc_perf_miss_rate)
 {
-	__u64 total_misses;
+	__u64 total_misses, total_references;

 	/* Stop counters after one span to get miss rate */
-	ioctl(fd_lm, PERF_EVENT_IOC_DISABLE, 0);
+	ioctl(fd_la, PERF_EVENT_IOC_DISABLE, 0);

 	if (read(fd_lm, &rf_cqm, sizeof(struct read_format)) == -1) {
 		perror("Could not get llc misses through perf");
@@ -100,11 +126,19 @@ static int get_llc_perf(unsigned long *llc_perf_miss)
 		return -1;
 	}

+	if (read(fd_la, &rf_cqm, sizeof(struct read_format)) == -1) {
+		perror("Could not get llc accesses through perf");
+
+		return -1;
+	}
+
 	total_misses = rf_cqm.values[0].value;
+	total_references = rf_cqm.values[1].value;

 	close(fd_lm);
+	close(fd_la);

-	*llc_perf_miss = total_misses;
+	*llc_perf_miss_rate = ((float)total_misses / total_references) * 100;

 	return 0;
 }
@@ -176,15 +210,16 @@ static int print_results_cache(char *filename, int bm_pid,

 int measure_cache_vals(struct resctrl_val_param *param, int bm_pid)
 {
-	unsigned long llc_perf_miss = 0, llc_occu_resc = 0, llc_value = 0;
+	unsigned long llc_occu_resc = 0, llc_value = 0;
+	float llc_perf_miss_rate = 0;
 	int ret;

 	/* Measure cache miss from perf */
 	if (!strcmp(param->resctrl_val, "cat")) {
-		ret = get_llc_perf(&llc_perf_miss);
+		ret = get_llc_perf_miss_rate(&llc_perf_miss_rate);
 		if (ret < 0)
 			return ret;
-		llc_value = llc_perf_miss;
+		llc_value = llc_perf_miss_rate;
 	}

 	/* Measure llc occupancy from resctrl */
@@ -202,66 +237,72 @@ int measure_cache_vals(struct resctrl_val_param *param, int bm_pid)
 }

 /*
- * cache_val: execute benchmark and measure LLC occupancy resctrl
- * and perf cache miss for the benchmark
- * @param: parameters passed to cache_val()
+ * setup_critical_process: Bind given pid to given cpu and write the pid
+ * in requested resctrl FS location, set schemata,
+ * initialize perf LLC counters and also initialize
+ * fill buffer benchmark.
+ * @pid: pid of the process
+ * @param: Parameters passed to cache_val()
  *
- * Return: 0 on success. non-zero on failure.
+ * Return: 0 on success. non-zero on failure.
  */
-int cat_val(struct resctrl_val_param *param)
+int setup_critical_process(pid_t pid, struct resctrl_val_param *param)
 {
-	int malloc_and_init_memory = 1, memflush = 1, operation = 0, ret = 0;
+	int ret = 0;
 	char *resctrl_val = param->resctrl_val;
-	pid_t bm_pid;
+	char schemata[64];

-	if (strcmp(param->filename, "") == 0)
-		sprintf(param->filename, "stdio");
+	/* Taskset parent (critical process) to a specified cpu */
+	ret = taskset_benchmark(pid, param->cpu_no);
+	if (ret)
+		return ret;

-	bm_pid = getpid();
+	/* Write parent to specified con_mon grp, mon_grp in resctrl FS */
+	ret = write_bm_pid_to_resctrl(pid, param->ctrlgrp, param->mongrp,
+				      resctrl_val);
+	if (ret)
+		return ret;

-	/* Taskset benchmark to specified cpu */
-	ret = taskset_benchmark(bm_pid, param->cpu_no);
+	sprintf(schemata, "%lx", param->mask);
+	ret = write_schemata(param->ctrlgrp, schemata, param->cpu_no, "cat");
 	if (ret)
 		return ret;

-	/* Write benchmark to specified con_mon grp, mon_grp in resctrl FS */
-	ret = write_bm_pid_to_resctrl(bm_pid, param->ctrlgrp, param->mongrp,
-				      resctrl_val);
+	initialize_llc_perf();
+
+	ret = init_buffer(param->span, 1, 1);
 	if (ret)
 		return ret;

-	if ((strcmp(resctrl_val, "cat") == 0)) {
-		ret = initialize_llc_perf();
-		if (ret)
-			return ret;
-	}
+	return 0;
+}
+
+int run_critical_process(pid_t pid, struct resctrl_val_param *param)
+{
+	int ret = 0;

-	/* Test runs until the callback setup() tells the test to stop. */
+	/* Test runs until the callback setup() tells the test to stop */
 	while (1) {
-		if (strcmp(resctrl_val, "cat") == 0) {
-			ret = param->setup(param);
-			if (ret) {
-				ret = 0;
-				break;
-			}
-			ret = reset_enable_llc_perf(bm_pid, param->cpu_no);
-			if (ret)
-				break;
-
-			if (run_fill_buf(param->span, malloc_and_init_memory,
-					 memflush, operation, resctrl_val)) {
-				fprintf(stderr, "Error-running fill buffer\n");
-				ret = -1;
-				break;
-			}
-
-			sleep(1);
-			ret = measure_cache_vals(param, bm_pid);
-			if (ret)
-				break;
-		} else {
+		ret = param->setup(param);
+		if (ret) {
+			ret = 0;
+			break;
+		}
+
+		ret = reset_enable_llc_perf(pid, param->cpu_no);
+		if (ret)
+			break;
+
+		/* Read buffer once */
+		if (use_buffer_once(0)) {
+			fprintf(stderr, "Error-running fill buffer\n");
+			ret = -1;
 			break;
 		}
+
+		ret = measure_cache_vals(param, pid);
+		if (ret)
+			break;
 	}

 	return ret;
diff --git a/tools/testing/selftests/resctrl/cat_test.c b/tools/testing/selftests/resctrl/cat_test.c
index 046c7f285e72..f7a67f005fe5 100644
--- a/tools/testing/selftests/resctrl/cat_test.c
+++ b/tools/testing/selftests/resctrl/cat_test.c
@@ -11,70 +11,65 @@
 #include "resctrl.h"
 #include <unistd.h>

-#define RESULT_FILE_NAME1 "result_cat1"
-#define RESULT_FILE_NAME2 "result_cat2"
-#define NUM_OF_RUNS 5
-#define MAX_DIFF_PERCENT 4
-#define MAX_DIFF 1000000
+#define RESULT_FILE_NAME "result_cat"
+#define NUM_OF_RUNS 10
+#define MAX_DIFF_PERCENT 5

-int count_of_bits;
 char cbm_mask[256];
-unsigned long long_mask;
-unsigned long cache_size;

-/*
- * Change schemata. Write schemata to specified
- * con_mon grp, mon_grp in resctrl FS.
- * Run 5 times in order to get average values.
- */
+static unsigned long avg_llc_perf_miss_rate_single_thread;
+static unsigned long p1_mask, p2_mask;
+static unsigned long cache_size;
+static pid_t noisy_pid;
+static int count_of_bits;
+
+/* Run 5 times in order to get average values */
 static int cat_setup(struct resctrl_val_param *p)
 {
-	char schemata[64];
-	int ret = 0;
-
 	/* Run NUM_OF_RUNS times */
 	if (p->num_of_runs >= NUM_OF_RUNS)
 		return -1;

-	if (p->num_of_runs == 0) {
-		sprintf(schemata, "%lx", p->mask);
-		ret = write_schemata(p->ctrlgrp, schemata, p->cpu_no,
-				     p->resctrl_val);
-	}
 	p->num_of_runs++;
-
-	return ret;
+	return 0;
 }

-static void show_cache_info(unsigned long sum_llc_perf_miss, int no_of_bits,
-			    unsigned long span)
+static void show_cache_info(unsigned long sum_llc_perf_miss_rate,
+			    int no_of_bits, unsigned long span)
 {
-	unsigned long allocated_cache_lines = span / 64;
-	unsigned long avg_llc_perf_miss = 0;
-	float diff_percent;
+	unsigned long avg_llc_perf_miss_rate = 0, diff_percent = 0;
+
+	avg_llc_perf_miss_rate = sum_llc_perf_miss_rate / (NUM_OF_RUNS - 1);
+	if (!noisy_pid) {
+		avg_llc_perf_miss_rate_single_thread = avg_llc_perf_miss_rate;
+		return;
+	}

-	avg_llc_perf_miss = sum_llc_perf_miss / (NUM_OF_RUNS - 1);
-	diff_percent = ((float)allocated_cache_lines - avg_llc_perf_miss) /
-				allocated_cache_lines * 100;
+	diff_percent = labs(avg_llc_perf_miss_rate -
+			    avg_llc_perf_miss_rate_single_thread);

-	printf("%sok CAT: cache miss rate within %d%%\n",
-	       !is_amd && abs((int)diff_percent) > MAX_DIFF_PERCENT ?
+	printf("%sok CAT: cache miss rate difference within %d%%\n",
+	       !is_amd && diff_percent > MAX_DIFF_PERCENT ?
 	       "not " : "", MAX_DIFF_PERCENT);
-	tests_run++;
-	printf("# Percent diff=%d\n", abs((int)diff_percent));
 	printf("# Number of bits: %d\n", no_of_bits);
-	printf("# Avg_llc_perf_miss: %lu\n", avg_llc_perf_miss);
-	printf("# Allocated cache lines: %lu\n", allocated_cache_lines);
+	printf("# Buffer size: %lu\n", span);
+	printf("# Avg_llc_perf_miss_rate without noisy process: %lu%%\n",
+	       avg_llc_perf_miss_rate_single_thread);
+	printf("# Avg_llc_perf_miss_rate with noisy process: %lu%%\n",
+	       avg_llc_perf_miss_rate);
+	printf("# Percent diff: %lu\n", diff_percent);
+	tests_run++;
 }

 static int check_results(struct resctrl_val_param *param)
 {
 	char *token_array[8], temp[512];
-	unsigned long sum_llc_perf_miss = 0;
+	unsigned long sum_llc_perf_miss_rate = 0;
 	int runs = 0, no_of_bits = 0;
 	FILE *fp;

-	printf("# Checking for pass/fail\n");
+	if (noisy_pid)
+		printf("# Checking for pass/fail\n");
 	fp = fopen(param->filename, "r");
 	if (!fp) {
 		perror("# Cannot open file");
@@ -90,37 +85,107 @@ static int check_results(struct resctrl_val_param *param)
 			token_array[fields++] = token;
 			token = strtok(NULL, ":\t");
 		}
+
 		/*
 		 * Discard the first value which is inaccurate due to monitoring
 		 * setup transition phase.
 		 */
-		if (runs > 0)
-			sum_llc_perf_miss += strtoul(token_array[3], NULL, 0);
 		runs++;
+		if (runs == 1)
+			continue;
+
+		sum_llc_perf_miss_rate += strtoul(token_array[3], NULL, 0);
 	}

 	fclose(fp);
 	no_of_bits = count_bits(param->mask);
-
-	show_cache_info(sum_llc_perf_miss, no_of_bits, param->span);
+	show_cache_info(sum_llc_perf_miss_rate, no_of_bits, param->span);

 	return 0;
 }

 void cat_test_cleanup(void)
 {
-	remove(RESULT_FILE_NAME1);
-	remove(RESULT_FILE_NAME2);
+	remove(RESULT_FILE_NAME);
+}
+
+static int prepare_masks_for_two_processes(int no_of_bits, char *cache_type)
+{
+	int ret, i;
+	unsigned long long_mask, shareable_mask;
+
+	/* Get default cbm mask for L3/L2 cache */
+	ret = get_cbm_mask(cache_type);
+	if (ret)
+		return ret;
+
+	/* Get max number of bits from default cbm mask */
+	long_mask = strtoul(cbm_mask, NULL, 16);
+	count_of_bits = count_bits(long_mask);
+
+	/*
+	 * Max limit is count_of_bits - 1 because we need exclusive masks for
+	 * the two processes. So, the last saved bit will be used by the other
+	 * process.
+	 */
+	if (no_of_bits < 1 || no_of_bits > count_of_bits - 1) {
+		printf("Invalid input value for no_of_bits 'n'\n");
+		printf("Please Enter value in range 1 to %d\n",
+		       count_of_bits - 1);
+		return -1;
+	}
+
+	ret = get_shareable_mask(cache_type, &shareable_mask);
+	if (ret)
+		return ret;
+
+	/* Prepare cbm mask without any shareable bits */
+	for (i = 0; i < no_of_bits; i++) {
+		p1_mask <<= 1;
+		p1_mask |= 1;
+	}
+	p1_mask = ~shareable_mask & p1_mask;
+	p2_mask = ~p1_mask & long_mask;
+
+	return 0;
 }

-int cat_perf_miss_val(int cpu_no, int n, char *cache_type)
+static int start_noisy_process(pid_t pid, int sibling_cpu_no)
 {
-	unsigned long l_mask, l_mask_1;
-	int ret, pipefd[2], sibling_cpu_no;
-	char pipe_message;
-	pid_t bm_pid;
+	int ret;
+	unsigned long buf_size = cache_size * 10;

-	cache_size = 0;
+	/* Taskset noisy process to specified cpu */
+	ret = taskset_benchmark(pid, sibling_cpu_no);
+	if (ret)
+		return ret;
+
+	/* Write noisy process to root con_mon grp in resctrl FS */
+	ret = write_bm_pid_to_resctrl(pid, "", "", "cat");
+	if (ret)
+		return ret;
+
+	/*
+	 * Passing 'cat' will not loop around buffer forever, hence don't pass
+	 * test name
+	 */
+	ret = run_fill_buf(buf_size, 1, 1, 0, "");
+	if (ret)
+		return ret;
+
+	return 0;
+}
+
+int cat_perf_miss_val(int cpu_no, int no_of_bits, char *cache_type)
+{
+	int ret, sibling_cpu_no;
+	unsigned long buf_size;
+	pid_t critical_pid;
+	char schemata[64];
+
+	noisy_pid = 0;
+	critical_pid = getpid();
+	printf("# critical_pid: %d\n", critical_pid);

 	ret = remount_resctrlfs(true);
 	if (ret)
@@ -129,77 +194,43 @@ int cat_perf_miss_val(int cpu_no, int n, char *cache_type)
 	if (!validate_resctrl_feature_request("cat"))
 		return -1;

-	/* Get default cbm mask for L3/L2 cache */
-	ret = get_cbm_mask(cache_type);
+	ret = prepare_masks_for_two_processes(no_of_bits, cache_type);
 	if (ret)
 		return ret;

-	long_mask = strtoul(cbm_mask, NULL, 16);
+	/*
+	 * Change root con_mon grp schemata to be exclusive of critical process
+	 * schemata to avoid any interference
+	 */
+	sprintf(schemata, "%lx", p2_mask);
+	ret = write_schemata("", schemata, cpu_no, "cat");
+	if (ret)
+		return ret;

 	/* Get L3/L2 cache size */
 	ret = get_cache_size(cpu_no, cache_type, &cache_size);
 	if (ret)
 		return ret;
-	printf("cache size :%lu\n", cache_size);
-
-	/* Get max number of bits from default-cabm mask */
-	count_of_bits = count_bits(long_mask);
-
-	if (n < 1 || n > count_of_bits - 1) {
-		printf("Invalid input value for no_of_bits n!\n");
-		printf("Please Enter value in range 1 to %d\n",
-		       count_of_bits - 1);
-		return -1;
-	}
-
-	/* Get core id from same socket for running another thread */
-	sibling_cpu_no = get_core_sibling(cpu_no);
-	if (sibling_cpu_no < 0)
-		return -1;
+	printf("# cache size: %lu\n", cache_size);

+	buf_size = cache_size * ((float)(no_of_bits) / count_of_bits);
 	struct resctrl_val_param param = {
 		.resctrl_val = "cat",
 		.cpu_no = cpu_no,
 		.mum_resctrlfs = 0,
 		.setup = cat_setup,
+		.ctrlgrp = "c1",
+		.filename = RESULT_FILE_NAME,
+		.mask = p1_mask,
+		.num_of_runs = 0,
+		.span = buf_size
 	};

-	l_mask = long_mask >> n;
-	l_mask_1 = ~l_mask & long_mask;
-
-	/* Set param values for parent thread which will be allocated bitmask
-	 * with (max_bits - n) bits
-	 */
-	param.span = cache_size * (count_of_bits - n) / count_of_bits;
-	strcpy(param.ctrlgrp, "c2");
-	strcpy(param.mongrp, "m2");
-	strcpy(param.filename, RESULT_FILE_NAME2);
-	param.mask = l_mask;
-	param.num_of_runs = 0;
-
-	if (pipe(pipefd)) {
-		perror("# Unable to create pipe");
-		return errno;
-	}
-
-	bm_pid = fork();
-
-	/* Set param values for child thread which will be allocated bitmask
-	 * with n bits
-	 */
-	if (bm_pid == 0) {
-		param.mask = l_mask_1;
-		strcpy(param.ctrlgrp, "c1");
-		strcpy(param.mongrp, "m1");
-		param.span = cache_size * n / count_of_bits;
-		strcpy(param.filename, RESULT_FILE_NAME1);
-		param.num_of_runs = 0;
-		param.cpu_no = sibling_cpu_no;
-	}
-
-	remove(param.filename);
+	ret = setup_critical_process(critical_pid, &param);
+	if (ret)
+		return ret;

-	ret = cat_val(&param);
+	ret = run_critical_process(critical_pid, &param);
 	if (ret)
 		return ret;

@@ -207,38 +238,51 @@ int cat_perf_miss_val(int cpu_no, int n, char *cache_type)
 	if (ret)
 		return ret;

-	if (bm_pid == 0) {
-		/* Tell parent that child is ready */
-		close(pipefd[0]);
-		pipe_message = 1;
-		if (write(pipefd[1], &pipe_message, sizeof(pipe_message)) <
-		    sizeof(pipe_message)) {
-			close(pipefd[1]);
-			perror("# failed signaling parent process");
-			return errno;
-		}
+	printf("# ran critical process without noisy process\n");

-		close(pipefd[1]);
-		while (1)
-			;
+	/*
+	 * Results from first run of critical process are already calculated
+	 * and stored in 'avg_llc_perf_miss_single_thread'. Hence, delete the
+	 * file, so that it could be reused for second run.
+	 */
+	cat_test_cleanup();
+
+	/* Get core id from same socket for running noisy process */
+	sibling_cpu_no = get_core_sibling(cpu_no);
+	if (sibling_cpu_no < 0)
+		return -1;
+
+	noisy_pid = fork();
+	if (noisy_pid == 0) {
+		/*
+		 * Child is the noisy_process which runs in root con_mon grp by
+		 * default and hence no need to write pid to resctrl FS.
+		 * Schemata for root con_mon grp is also set above.
+		 */
+		printf("# noisy_pid: %d\n", getpid());
+		ret = start_noisy_process(getpid(), sibling_cpu_no);
+		exit(EXIT_SUCCESS);
+	} else if (noisy_pid == -1) {
+		return -1;
 	} else {
-		/* Parent waits for child to be ready. */
-		close(pipefd[1]);
-		pipe_message = 0;
-		while (pipe_message != 1) {
-			if (read(pipefd[0], &pipe_message,
-				 sizeof(pipe_message)) < sizeof(pipe_message)) {
-				perror("# failed reading from child process");
-				break;
-			}
-		}
-		close(pipefd[0]);
-		kill(bm_pid, SIGKILL);
-	}
+		/*
+		 * Parent runs again. Sleep for a second here so that noisy
+		 * process gets to run before critical process
+		 */
+		sleep(1);
+		param.num_of_runs = 0;
+		ret = run_critical_process(critical_pid, &param);
+		if (ret)
+			return ret;

-	cat_test_cleanup();
-	if (bm_pid)
-		umount_resctrlfs();
+		ret = check_results(&param);
+		if (ret)
+			return ret;
+
+		ret = kill(noisy_pid, SIGKILL);
+		if (ret)
+			printf("Failed to kill noisy_pid\n");
+	}

 	return 0;
 }
diff --git a/tools/testing/selftests/resctrl/fill_buf.c b/tools/testing/selftests/resctrl/fill_buf.c
index 204ae8870a32..0500dab90b2e 100644
--- a/tools/testing/selftests/resctrl/fill_buf.c
+++ b/tools/testing/selftests/resctrl/fill_buf.c
@@ -139,7 +139,6 @@ static int fill_cache_write(char *resctrl_val)
 	return 0;
 }

-static
 int init_buffer(unsigned long long buf_size, int malloc_and_init, int memflush)
 {
 	unsigned char *start_ptr, *end_ptr;
@@ -177,7 +176,33 @@ int init_buffer(unsigned long long buf_size, int malloc_and_init, int memflush)
 	return 0;
 }

-static int use_buffer_forever(int op, char *resctrl_val)
+int use_buffer_once(int op)
+{
+	FILE *fp;
+	int ret = 0;
+
+	if (op == 0) {
+		ret = fill_one_span_read();
+
+		/* Consume result so that reading memory is not optimized */
+		fp = fopen("/dev/null", "w");
+		if (!fp)
+			perror("Unable to write to /dev/null");
+		fprintf(fp, "Sum: %d ", ret);
+		fclose(fp);
+		ret = 0;
+	} else {
+		fill_one_span_write();
+	}
+
+	if (ret) {
+		printf("\n Error in fill cache read/write...\n");
+		return -1;
+	}
+	return 0;
+}
+
+int use_buffer_forever(int op, char *resctrl_val)
 {
 	int ret;

@@ -187,7 +212,7 @@ static int use_buffer_forever(int op, char *resctrl_val)
 		ret = fill_cache_write(resctrl_val);

 	if (ret) {
-		printf("\n Errror in fill cache read/write...\n");
+		printf("\n Error in fill cache read/write...\n");
 		return -1;
 	}

@@ -228,7 +253,7 @@ int run_fill_buf(unsigned long span, int malloc_and_init_memory,
 	ret = fill_cache(cache_size, malloc_and_init_memory, memflush, op,
 			 resctrl_val);
 	if (ret) {
-		printf("\n Errror in fill cache\n");
+		printf("\n Error in fill cache\n");
 		return -1;
 	}

diff --git a/tools/testing/selftests/resctrl/resctrl.h b/tools/testing/selftests/resctrl/resctrl.h
index e320e79bc4d4..79148cbbd7a4 100644
--- a/tools/testing/selftests/resctrl/resctrl.h
+++ b/tools/testing/selftests/resctrl/resctrl.h
@@ -27,7 +27,7 @@
 #define MB (1024 * 1024)
 #define RESCTRL_PATH "/sys/fs/resctrl"
 #define PHYS_ID_PATH "/sys/devices/system/cpu/cpu"
-#define CBM_MASK_PATH "/sys/fs/resctrl/info"
+#define INFO_PATH "/sys/fs/resctrl/info"

 #define PARENT_EXIT(err_msg) \
 	do { \
@@ -84,6 +84,9 @@ int write_bm_pid_to_resctrl(pid_t bm_pid, char *ctrlgrp, char *mongrp,
 			    char *resctrl_val);
 int perf_event_open(struct perf_event_attr *hw_event, pid_t pid, int cpu,
 		    int group_fd, unsigned long flags);
+int init_buffer(unsigned long long buf_size, int malloc_and_init, int memflush);
+int use_buffer_once(int op);
+int use_buffer_forever(int op, char *resctrl_val);
 int run_fill_buf(unsigned long span, int malloc_and_init_memory,
 		 int memflush, int op, char *resctrl_va);
 int resctrl_val(char **benchmark_cmd, struct resctrl_val_param *param);
@@ -93,9 +96,11 @@ void mbm_test_cleanup(void);
 int mba_schemata_change(int cpu_no, char *bw_report, char **benchmark_cmd);
 void mba_test_cleanup(void);
 int get_cbm_mask(char *cache_type);
+int get_shareable_mask(char *cache_type, unsigned long *shareable_mask);
 int get_cache_size(int cpu_no, char *cache_type, unsigned long *cache_size);
 void ctrlc_handler(int signum, siginfo_t *info, void *ptr);
-int cat_val(struct resctrl_val_param *param);
+int setup_critical_process(pid_t pid, struct resctrl_val_param *param);
+int run_critical_process(pid_t pid, struct resctrl_val_param *param);
 void cat_test_cleanup(void);
 int cat_perf_miss_val(int cpu_no, int no_of_bits, char *cache_type);
 int cqm_resctrl_val(int cpu_no, int n, char **benchmark_cmd);
diff --git a/tools/testing/selftests/resctrl/resctrl_tests.c b/tools/testing/selftests/resctrl/resctrl_tests.c
index 84a436e0775c..60db128312a6 100644
--- a/tools/testing/selftests/resctrl/resctrl_tests.c
+++ b/tools/testing/selftests/resctrl/resctrl_tests.c
@@ -192,8 +192,8 @@ int main(int argc, char **argv)
 		printf("# Starting CAT test ...\n");
 		res = cat_perf_miss_val(cpu_no, no_of_bits, "L3");
 		printf("%sok CAT: test\n", res ? "not " : "");
-		tests_run++;
 		cat_test_cleanup();
+		tests_run++;
 	}

 	printf("1..%d\n", tests_run);
diff --git a/tools/testing/selftests/resctrl/resctrlfs.c b/tools/testing/selftests/resctrl/resctrlfs.c
index 465faaad3239..52452bb0178a 100644
--- a/tools/testing/selftests/resctrl/resctrlfs.c
+++ b/tools/testing/selftests/resctrl/resctrlfs.c
@@ -215,7 +215,7 @@ int get_cbm_mask(char *cache_type)
 	char cbm_mask_path[1024];
 	FILE *fp;

-	sprintf(cbm_mask_path, "%s/%s/cbm_mask", CBM_MASK_PATH, cache_type);
+	sprintf(cbm_mask_path, "%s/%s/cbm_mask", INFO_PATH, cache_type);

 	fp = fopen(cbm_mask_path, "r");
 	if (!fp) {
@@ -235,6 +235,38 @@ int get_cbm_mask(char *cache_type)
 }

 /*
+ * get_shareable_mask - Get shareable mask from shareable_bits for given cache
+ * @cache_type: Cache level L2/L3
+ * @shareable_mask: Mask is returned as unsigned long value
+ *
+ * Return: = 0 on success, < 0 on failure.
+ */
+int get_shareable_mask(char *cache_type, unsigned long *shareable_mask)
+{
+	char shareable_bits_file[1024];
+	FILE *fp;
+
+	sprintf(shareable_bits_file, "%s/%s/shareable_bits", INFO_PATH,
+		cache_type);
+
+	fp = fopen(shareable_bits_file, "r");
+	if (!fp) {
+		perror("Failed to open shareable_bits file");
+
+		return -1;
+	}
+	if (fscanf(fp, "%lx", shareable_mask) <= 0) {
+		perror("Could not get shareable bits");
+		fclose(fp);
+
+		return -1;
+	}
+	fclose(fp);
+
+	return 0;
+}
+
+/*
  * get_core_sibling - Get sibling core id from the same socket for given CPU
  * @cpu_no: CPU number
  *

-- 2.7.4

Reinette Chatre

10 Mar

10:14 p.m.

New subject: [PATCH V1 10/13] selftests/resctrl: Change Cache Allocation Technology (CAT) test

Hi Sai,

Not just specific to this patch, but I think the prevalent use of global variables that are initialized/used or allocated/released from a variety of places within the code is creating traps. I seem to have stumbled on a few during this review so far, but they are hard to keep track of and I am not confident that I caught them all. Having the code be symmetrical (allocate and free from the same area, or initialize and use from the same area) does help to avoid such complexity.

This patch and the patch that follows are both quite large, and it is difficult to keep track of all the collected changes. There seems to be an opportunity to separate them into logical changes. Some of my comments may be just because I could not keep track of all that is changed at the same time.

On 3/6/2020 7:40 PM, Sai Praneeth Prakhya wrote:

...

The present CAT test case spawns two processes that run in two different control groups with exclusive schemata, and both processes read a buffer from memory only once. Before reading the buffer, the perf miss count is cleared, and the perf miss count is then calculated for the read. Since the processes read through the buffer only once and the whole buffer is initially in memory, the perf miss count will always be the same regardless of the cache size allocated by CAT to these processes. So, the test isn't testing CAT. Fix this issue by changing the CAT test case.

The updated CAT test runs a "critical" process with exclusive schemata that reads a buffer (same as the size of allocated cache) multiple times, thereby utilizing the allocated cache, and calculates perf miss rate for

Transitioning the description from "perf miss count" to "perf miss rate" is subtle. It would be valuable to elaborate what is meant with "perf miss rate".

...

every read of the buffer. The average of this perf miss rate is saved. This value indicates the critical process's self-induced misses. Now, the "critical" process runs beside a "noisy" neighbor that is reading a buffer that is 10 times the size of LLC, and both processes are in different control groups with exclusive schemata. The average perf miss rate for the "critical" process is calculated again and compared with the earlier value. If the difference between these two values is greater than 5%, it means that the "noisy" neighbor does have an impact on the "critical" process, which means CAT is not working as expected, and hence the test fails.
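The comparison described above can be sketched as follows. The threshold of 5 and the practice of discarding the first measurement come from the patch (MAX_DIFF_PERCENT and check_results()); the helper names `avg_skip_first` and `cat_isolation_broken` are hypothetical and only illustrate the logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DIFF_PERCENT 5	/* threshold used by the patch */

/*
 * Hypothetical helper: average of the per-run miss rates, ignoring the
 * first run, which the patch treats as inaccurate.
 */
static unsigned long avg_skip_first(const unsigned long *rates, size_t n)
{
	unsigned long sum = 0;
	size_t i;

	assert(rates != NULL && n > 1);

	for (i = 1; i < n; i++)
		sum += rates[i];

	return sum / (n - 1);
}

/*
 * Hypothetical helper: true when the average miss rate of the critical
 * process changed by more than MAX_DIFF_PERCENT once the noisy neighbor
 * was running, i.e. when the test should fail.
 */
static bool cat_isolation_broken(unsigned long avg_alone,
				 unsigned long avg_with_noise)
{
	unsigned long diff;

	/* Absolute difference without relying on signed wrap-around */
	diff = avg_alone > avg_with_noise ? avg_alone - avg_with_noise
					  : avg_with_noise - avg_alone;

	return diff > MAX_DIFF_PERCENT;
}
```

Computing the absolute difference by comparing first avoids passing unsigned values through labs(), which is how the patch does it.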

Reported-by: Reinette Chatre <reinette.chatre@intel.com>
Suggested-by: Tony Luck <tony.luck@intel.com>
Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>

 tools/testing/selftests/resctrl/cache.c         | 167 ++++++++-----
 tools/testing/selftests/resctrl/cat_test.c      | 312 ++++++++++++++----------
 tools/testing/selftests/resctrl/fill_buf.c      |  33 ++-
 tools/testing/selftests/resctrl/resctrl.h       |   9 +-
 tools/testing/selftests/resctrl/resctrl_tests.c |   2 +-
 tools/testing/selftests/resctrl/resctrlfs.c     |  34 ++-
 6 files changed, 352 insertions(+), 205 deletions(-)

diff --git a/tools/testing/selftests/resctrl/cache.c b/tools/testing/selftests/resctrl/cache.c index be60d7d3f066..e30cdd7b851c 100644 --- a/tools/testing/selftests/resctrl/cache.c +++ b/tools/testing/selftests/resctrl/cache.c @@ -10,9 +10,9 @@ struct read_format { } values[2]; }; -static struct perf_event_attr pea_llc_miss; +static struct perf_event_attr pea_llc_miss, pea_llc_access; static struct read_format rf_cqm; -static int fd_lm; +static int fd_lm, fd_la; char llc_occup_path[1024]; static void initialize_perf_event_attr(void) @@ -27,15 +27,30 @@ static void initialize_perf_event_attr(void) pea_llc_miss.inherit = 1; pea_llc_miss.exclude_guest = 1; pea_llc_miss.disabled = 1;

pea_llc_access.type = PERF_TYPE_HARDWARE;

pea_llc_access.size = sizeof(struct perf_event_attr);

pea_llc_access.read_format = PERF_FORMAT_GROUP;

pea_llc_access.exclude_kernel = 1;

pea_llc_access.exclude_hv = 1;

pea_llc_access.exclude_idle = 1;

pea_llc_access.exclude_callchain_kernel = 1;

pea_llc_access.inherit = 1;

pea_llc_access.exclude_guest = 1;

pea_llc_access.disabled = 1;

This initialization appears to duplicate the initialization done above. Perhaps this function could be a wrapper that calls an initialization function with pointer to perf_event_attr that initializes structure the same?
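One possible shape for such a helper is sketched below. The name `init_llc_event_attr` and its signature are assumptions made for illustration, and the sketch assumes the fields set for pea_llc_miss are the same ones the patch sets for pea_llc_access. It also folds in the memset() and the config assignment that the patch performs in initialize_llc_perf().

```c
#include <assert.h>
#include <string.h>
#include <linux/perf_event.h>

/*
 * Hypothetical helper (name and signature are assumptions): apply the
 * settings shared by both LLC events to whichever attribute is passed in,
 * then select the event through 'config'.
 */
static void init_llc_event_attr(struct perf_event_attr *pea, __u64 config)
{
	assert(pea != NULL);

	memset(pea, 0, sizeof(*pea));
	pea->type = PERF_TYPE_HARDWARE;
	pea->size = sizeof(*pea);
	pea->read_format = PERF_FORMAT_GROUP;
	pea->exclude_kernel = 1;
	pea->exclude_hv = 1;
	pea->exclude_idle = 1;
	pea->exclude_callchain_kernel = 1;
	pea->inherit = 1;
	pea->exclude_guest = 1;
	pea->disabled = 1;
	pea->config = config;
}
```

The two attributes could then be set up with init_llc_event_attr(&pea_llc_miss, PERF_COUNT_HW_CACHE_MISSES) and init_llc_event_attr(&pea_llc_access, PERF_COUNT_HW_CACHE_REFERENCES).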

...

} static void ioctl_perf_event_ioc_reset_enable(void) { ioctl(fd_lm, PERF_EVENT_IOC_RESET, 0); ioctl(fd_lm, PERF_EVENT_IOC_ENABLE, 0);

ioctl(fd_la, PERF_EVENT_IOC_RESET, 0);

ioctl(fd_la, PERF_EVENT_IOC_ENABLE, 0);

}

Here is more duplication.

...

-static int perf_event_open_llc_miss(pid_t pid, int cpu_no) +static int perf_event_open_llc_miss_rate(pid_t pid, int cpu_no) { fd_lm = perf_event_open(&pea_llc_miss, pid, cpu_no, -1, PERF_FLAG_FD_CLOEXEC); @@ -45,29 +60,40 @@ static int perf_event_open_llc_miss(pid_t pid, int cpu_no) return -1; }
fd_la = perf_event_open(&pea_llc_access, pid, cpu_no, fd_lm,
		PERF_FLAG_FD_CLOEXEC);
if (fd_la == -1) {
perror("Error opening member");
ctrlc_handler(0, NULL, NULL);
return -1;

Should fd_lm not be closed on this error path?
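A sketch of an error path that releases the group leader is shown below. It is an illustration under stated assumptions and not the patch's code: `perf_open` is a local stand-in for the selftest's perf_event_open() wrapper, `open_llc_group` is a hypothetical name, and the descriptors are returned through pointers instead of the globals the patch uses.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <linux/perf_event.h>

/* glibc provides no perf_event_open() wrapper, so issue the syscall */
static int perf_open(struct perf_event_attr *attr, pid_t pid, int cpu,
		     int group_fd)
{
	return (int)syscall(SYS_perf_event_open, attr, pid, cpu, group_fd,
			    PERF_FLAG_FD_CLOEXEC);
}

/*
 * Hypothetical variant of perf_event_open_llc_miss_rate(): on failure no
 * descriptor stays open and both outputs are -1.
 */
static int open_llc_group(struct perf_event_attr *miss,
			  struct perf_event_attr *access,
			  pid_t pid, int cpu, int *fd_lm, int *fd_la)
{
	assert(miss && access && fd_lm && fd_la);

	*fd_lm = -1;
	*fd_la = -1;

	*fd_lm = perf_open(miss, pid, cpu, -1);
	if (*fd_lm == -1)
		return -1;

	*fd_la = perf_open(access, pid, cpu, *fd_lm);
	if (*fd_la == -1) {
		/* Member failed: release the group leader before returning */
		close(*fd_lm);
		*fd_lm = -1;
		return -1;
	}

	return 0;
}
```

Whether the call succeeds depends on the system (PMU availability, perf permissions); the point of the sketch is only that every failing return leaves no descriptor open.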

...

}

return 0;

} -static int initialize_llc_perf(void) +static void initialize_llc_perf(void) { memset(&pea_llc_miss, 0, sizeof(struct perf_event_attr));

memset(&pea_llc_access, 0, sizeof(struct perf_event_attr)); memset(&rf_cqm, 0, sizeof(struct read_format));

/* Initialize perf_event_attr structures for HW_CACHE_MISSES */
/*
* Initialize perf_event_attr structures for HW_CACHE_MISSES and
* HW_CACHE_REFERENCES
*/
initialize_perf_event_attr();
pea_llc_miss.config = PERF_COUNT_HW_CACHE_MISSES;

pea_llc_access.config = PERF_COUNT_HW_CACHE_REFERENCES;

rf_cqm.nr = 1;

return 0;

rf_cqm.nr = 2;

} static int reset_enable_llc_perf(pid_t pid, int cpu_no) { int ret = 0;

ret = perf_event_open_llc_miss(pid, cpu_no);

ret = perf_event_open_llc_miss_rate(pid, cpu_no); if (ret < 0) return ret;

@@ -78,21 +104,21 @@ static int reset_enable_llc_perf(pid_t pid, int cpu_no) } /*

get_llc_perf: llc cache miss through perf events

@cpu_no: CPU number that the benchmark PID is binded to

get_llc_perf_miss_rate: llc cache miss rate through perf events

Could "llc" be "LLC" to be consistent with below?

...

@cpu_no: CPU number that the benchmark PID is binded to

Perf events like HW_CACHE_MISSES could be used to validate number of

cache lines allocated.

Perf events like HW_CACHE_MISSES and HW_CACHE_REFERENCES could be used to

approximate LLc occupancy under controlled environment

s/LLc/LLC/

...

Return: =0 on success. <0 on failure.

*/ -static int get_llc_perf(unsigned long *llc_perf_miss) +static int get_llc_perf_miss_rate(float *llc_perf_miss_rate) {

__u64 total_misses;

__u64 total_misses, total_references;

/* Stop counters after one span to get miss rate */

ioctl(fd_lm, PERF_EVENT_IOC_DISABLE, 0);

ioctl(fd_la, PERF_EVENT_IOC_DISABLE, 0);

if (read(fd_lm, &rf_cqm, sizeof(struct read_format)) == -1) { perror("Could not get llc misses through perf"); @@ -100,11 +126,19 @@ static int get_llc_perf(unsigned long *llc_perf_miss) return -1; }
if (read(fd_la, &rf_cqm, sizeof(struct read_format)) == -1) {
perror("Could not get llc accesses through perf");
return -1;

It looks like the cleanup (closing of file descriptors) is omitted on this and the earlier error path.
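One common way to avoid this kind of omission is a single exit label that closes both descriptors on every path. The sketch below illustrates the pattern only: `read_counts_and_close` is a hypothetical name, and the payload is simplified to one 64-bit value per descriptor, whereas the patch reads a struct read_format.

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Hypothetical helper: read one counter value from each descriptor and
 * close both descriptors whether or not the reads succeed.
 */
static int read_counts_and_close(int fd_lm, int fd_la,
				 uint64_t *misses, uint64_t *references)
{
	int ret = -1;

	assert(misses != NULL && references != NULL);

	if (read(fd_lm, misses, sizeof(*misses)) != (ssize_t)sizeof(*misses))
		goto out;

	if (read(fd_la, references, sizeof(*references)) !=
	    (ssize_t)sizeof(*references))
		goto out;

	ret = 0;
out:
	/* Reached on success and on every failure, so nothing leaks */
	close(fd_la);
	close(fd_lm);

	return ret;
}
```

Because the function takes plain descriptors, the pattern can be exercised with pipes, without needing access to perf counters.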

...

}

total_misses = rf_cqm.values[0].value;

total_references = rf_cqm.values[1].value;

close(fd_lm);

close(fd_la);

*llc_perf_miss = total_misses;

*llc_perf_miss_rate = ((float)total_misses / total_references) * 100;

return 0; } @@ -176,15 +210,16 @@ static int print_results_cache(char *filename, int bm_pid, int measure_cache_vals(struct resctrl_val_param *param, int bm_pid) {

unsigned long llc_perf_miss = 0, llc_occu_resc = 0, llc_value = 0;

unsigned long llc_occu_resc = 0, llc_value = 0;

float llc_perf_miss_rate = 0; int ret;

/* Measure cache miss from perf */ if (!strcmp(param->resctrl_val, "cat")) {
ret = get_llc_perf(&llc_perf_miss);
ret = get_llc_perf_miss_rate(&llc_perf_miss_rate);
if (ret < 0) return ret;
llc_value = llc_perf_miss;
llc_value = llc_perf_miss_rate;

What is the benefit of llc_perf_miss_rate being of type float?
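One observation behind this question: in the patch the float is assigned straight to the unsigned long llc_value, so the fractional part is dropped immediately. The sketch below shows that conversion next to an integer-only computation that yields the same stored value. Both helper names are hypothetical, and the integer variant is an assumption about what could replace the float, not something the patch or the review proposes.

```c
#include <assert.h>

/*
 * Mirrors the assignment "llc_value = llc_perf_miss_rate" from the patch:
 * converting a float to unsigned long discards the fraction.
 */
static unsigned long store_rate(float miss_rate_percent)
{
	assert(miss_rate_percent >= 0.0f);

	return (unsigned long)miss_rate_percent;
}

/*
 * Integer-only alternative (an assumption, not from the patch). The
 * multiplication can overflow for extremely large miss counts.
 */
static unsigned long miss_rate_percent_int(unsigned long long misses,
					   unsigned long long references)
{
	assert(references != 0);

	return (unsigned long)(misses * 100 / references);
}
```

For 49 misses out of 1000 references both paths store 4, so the float only matters if the fractional part is consumed before the assignment.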

...

} /* Measure llc occupancy from resctrl */ @@ -202,66 +237,72 @@ int measure_cache_vals(struct resctrl_val_param *param, int bm_pid) } /*

cache_val: execute benchmark and measure LLC occupancy resctrl

and perf cache miss for the benchmark

@param: parameters passed to cache_val()
setup_critical_process: Bind given pid to given cpu and write the pid
		in requested resctrl FS location, set schemata,
		initialize perf LLC counters and also initialize
		fill buffer benchmark.
@pid: pid of the process

@param: Parameters passed to cache_val()
Return: 0 on success. non-zero on failure.

Return: 0 on success. non-zero on failure.

*/

-int cat_val(struct resctrl_val_param *param) +int setup_critical_process(pid_t pid, struct resctrl_val_param *param) {

int malloc_and_init_memory = 1, memflush = 1, operation = 0, ret = 0;

int ret = 0; char *resctrl_val = param->resctrl_val;

pid_t bm_pid;

char schemata[64];
if (strcmp(param->filename, "") == 0)
sprintf(param->filename, "stdio");
/* Taskset parent (critical process) to a specified cpu */

ret = taskset_benchmark(pid, param->cpu_no);

if (ret)
return ret;
bm_pid = getpid();
/* Write parent to specified con_mon grp, mon_grp in resctrl FS */

ret = write_bm_pid_to_resctrl(pid, param->ctrlgrp, param->mongrp,
		      resctrl_val);
if (ret)
return ret;
/* Taskset benchmark to specified cpu */

ret = taskset_benchmark(bm_pid, param->cpu_no);

sprintf(schemata, "%lx", param->mask);

ret = write_schemata(param->ctrlgrp, schemata, param->cpu_no, "cat"); if (ret) return ret;
/* Write benchmark to specified con_mon grp, mon_grp in resctrl FS */

ret = write_bm_pid_to_resctrl(bm_pid, param->ctrlgrp, param->mongrp,
		      resctrl_val);
initialize_llc_perf();

ret = init_buffer(param->span, 1, 1); if (ret) return ret;
if ((strcmp(resctrl_val, "cat") == 0)) {
ret = initialize_llc_perf();
if (ret)
	return ret;
}
return 0;

+}

+int run_critical_process(pid_t pid, struct resctrl_val_param *param) +{

int ret = 0;

/* Test runs until the callback setup() tells the test to stop. */

/* Test runs until the callback setup() tells the test to stop */ while (1) {
if (strcmp(resctrl_val, "cat") == 0) {
	ret = param->setup(param);
	if (ret) {
		ret = 0;
		break;
	}
	ret = reset_enable_llc_perf(bm_pid, param->cpu_no);
	if (ret)
		break;
	if (run_fill_buf(param->span, malloc_and_init_memory,
			 memflush, operation, resctrl_val)) {
		fprintf(stderr, "Error-running fill buffer\n");
		ret = -1;
		break;
	}
	sleep(1);
	ret = measure_cache_vals(param, bm_pid);
	if (ret)
		break;
} else {
ret = param->setup(param);
if (ret) {
	ret = 0;
	break;
}
ret = reset_enable_llc_perf(pid, param->cpu_no);

This is in a while(1) loop and it seems reset_enable_llc_perf() opens the file descriptors, then resets and enables the counters. Would it not be more efficient to open the file descriptors outside of this while() loop and just reset/enable the counters within?

...

if (ret)
	break;
/* Read buffer once */
if (use_buffer_once(0)) {
	fprintf(stderr, "Error-running fill buffer\n");
	ret = -1;
break;
}

If I understand correctly, reset_enable_llc_perf() will open the perf file descriptors and start the measurement, and measure_cache_vals() will read from the file descriptors to obtain the measurements. It seems that if use_buffer_once() fails, the perf file descriptors need to be closed?

...

ret = measure_cache_vals(param, pid);
if (ret)
	break;
}
return ret; diff --git a/tools/testing/selftests/resctrl/cat_test.c b/tools/testing/selftests/resctrl/cat_test.c index 046c7f285e72..f7a67f005fe5 100644 --- a/tools/testing/selftests/resctrl/cat_test.c +++ b/tools/testing/selftests/resctrl/cat_test.c @@ -11,70 +11,65 @@ #include "resctrl.h" #include <unistd.h> -#define RESULT_FILE_NAME1 "result_cat1" -#define RESULT_FILE_NAME2 "result_cat2" -#define NUM_OF_RUNS 5 -#define MAX_DIFF_PERCENT 4 -#define MAX_DIFF 1000000 +#define RESULT_FILE_NAME "result_cat" +#define NUM_OF_RUNS 10 +#define MAX_DIFF_PERCENT 5 -int count_of_bits; char cbm_mask[256]; -unsigned long long_mask; -unsigned long cache_size; -/*

Change schemata. Write schemata to specified

con_mon grp, mon_grp in resctrl FS.

Run 5 times in order to get average values.

*/

+static unsigned long avg_llc_perf_miss_rate_single_thread; +static unsigned long p1_mask, p2_mask;

If these _have_ to be global variables, could they have more descriptive names?

...

+static unsigned long cache_size; +static pid_t noisy_pid; +static int count_of_bits;

+/* Run 5 times in order to get average values */

Seems like NUM_OF_RUNS above was changed to 10 so the above is no longer accurate. Perhaps just say "Run NUM_OF_RUNS times" ?

...

static int cat_setup(struct resctrl_val_param *p) {

char schemata[64];

int ret = 0;

/* Run NUM_OF_RUNS times */ if (p->num_of_runs >= NUM_OF_RUNS) return -1;
if (p->num_of_runs == 0) {
sprintf(schemata, "%lx", p->mask);
ret = write_schemata(p->ctrlgrp, schemata, p->cpu_no,
		     p->resctrl_val);
} p->num_of_runs++;

return ret;
return 0;

}

All of this complication does not seem to be necessary. This cat_setup() does not actually do any setup ... it seems to only exist to be able to break out of an infinite loop. Why not just eliminate this function and just run the loop within run_critical_process() NUM_OF_RUNS times?
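For illustration, a minimal sketch of the bounded loop suggested here. The names `run_once_fn`, `run_fixed_iterations()` and `count_call()` are hypothetical and not part of the patch; the callback stands in for one iteration of "reset counters, read the buffer, record the measurement".

```c
#include <assert.h>
#include <stddef.h>

#define NUM_OF_RUNS 10

/* Hypothetical stand-in for one measurement iteration. */
typedef int (*run_once_fn)(void *ctx);

/*
 * Run the measurement exactly NUM_OF_RUNS times and stop at the first
 * failure. No setup() callback is needed just to terminate the loop.
 */
static int run_fixed_iterations(run_once_fn run_once, void *ctx)
{
	int i, ret;

	assert(run_once != NULL);

	for (i = 0; i < NUM_OF_RUNS; i++) {
		ret = run_once(ctx);
		if (ret)
			return ret;
	}

	return 0;
}

/* Example callback used only to demonstrate the loop: counts calls. */
static int count_call(void *ctx)
{
	(*(int *)ctx)++;
	return 0;
}
```

With this shape the loop bound is visible at the call site, and an error from an iteration is returned to the caller instead of being overwritten with 0.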

...

-static void show_cache_info(unsigned long sum_llc_perf_miss, int no_of_bits,
	    unsigned long span)
+static void show_cache_info(unsigned long sum_llc_perf_miss_rate,
	    int no_of_bits, unsigned long span)
{

unsigned long allocated_cache_lines = span / 64;

unsigned long avg_llc_perf_miss = 0;

float diff_percent;
unsigned long avg_llc_perf_miss_rate = 0, diff_percent = 0;

avg_llc_perf_miss_rate = sum_llc_perf_miss_rate / (NUM_OF_RUNS - 1);

if (!noisy_pid) {
avg_llc_perf_miss_rate_single_thread = avg_llc_perf_miss_rate;
return;
}
avg_llc_perf_miss = sum_llc_perf_miss / (NUM_OF_RUNS - 1);

diff_percent = ((float)allocated_cache_lines - avg_llc_perf_miss) /
		allocated_cache_lines * 100;
diff_percent = labs(avg_llc_perf_miss_rate -
       avg_llc_perf_miss_rate_single_thread);
printf("%sok CAT: cache miss rate within %d%%\n",
      !is_amd && abs((int)diff_percent) > MAX_DIFF_PERCENT ?
printf("%sok CAT: cache miss rate difference within %d%%\n",
      !is_amd && diff_percent > MAX_DIFF_PERCENT ?
    "not " : "", MAX_DIFF_PERCENT);
tests_run++;

printf("# Percent diff=%d\n", abs((int)diff_percent)); printf("# Number of bits: %d\n", no_of_bits);

printf("# Avg_llc_perf_miss: %lu\n", avg_llc_perf_miss);

printf("# Allocated cache lines: %lu\n", allocated_cache_lines);
printf("# Buffer size: %lu\n", span);

printf("# Avg_llc_perf_miss_rate without noisy process: %lu%%\n",
      avg_llc_perf_miss_rate_single_thread);
printf("# Avg_llc_perf_miss_rate with noisy process: %lu%%\n",
      avg_llc_perf_miss_rate);
printf("# Percent diff: %lu\n", diff_percent);

tests_run++;
} static int check_results(struct resctrl_val_param *param) { char *token_array[8], temp[512];

unsigned long sum_llc_perf_miss = 0;

unsigned long sum_llc_perf_miss_rate = 0; int runs = 0, no_of_bits = 0; FILE *fp;

printf("# Checking for pass/fail\n");
if (noisy_pid)
printf("# Checking for pass/fail\n");
fp = fopen(param->filename, "r"); if (!fp) { perror("# Cannot open file");
@@ -90,37 +85,107 @@ static int check_results(struct resctrl_val_param *param) token_array[fields++] = token; token = strtok(NULL, ":\t"); }

/*

Discard the first value which is inaccurate due to monitoring

setup transition phase.

*/
if (runs > 0)
	sum_llc_perf_miss += strtoul(token_array[3], NULL, 0);
runs++;
if (runs == 1)
	continue;
sum_llc_perf_miss_rate += strtoul(token_array[3], NULL, 0);
}
fclose(fp); no_of_bits = count_bits(param->mask);

show_cache_info(sum_llc_perf_miss, no_of_bits, param->span);

show_cache_info(sum_llc_perf_miss_rate, no_of_bits, param->span);

return 0; } void cat_test_cleanup(void) {

remove(RESULT_FILE_NAME1);

remove(RESULT_FILE_NAME2);

remove(RESULT_FILE_NAME);

+}

+static int prepare_masks_for_two_processes(int no_of_bits, char *cache_type)

It would be valuable to include comments that describe the goal of these masks. Some questions that may be asked when seeing this function ... Why are two masks needed? What are the differences between them? How are they intended to be used?
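As an illustration of what such a comment could describe, here is a minimal sketch of the mask arithmetic in the hunk below. The struct and function names are hypothetical; the logic mirrors the patch (the low `no_of_bits` bits minus any shareable bits go to the critical process, and every remaining bit of the default mask goes to the root group where the noisy process runs).

```c
#include <assert.h>

/* Hypothetical container for the two exclusive capacity bitmasks. */
struct cat_masks {
	unsigned long critical;	/* bits reserved for the critical process */
	unsigned long noisy;	/* remaining bits, used by the root group */
};

static struct cat_masks split_cbm(unsigned long full_mask,
				  unsigned long shareable_mask,
				  int no_of_bits)
{
	struct cat_masks m;
	unsigned long low_bits;

	assert(no_of_bits >= 1 &&
	       no_of_bits < (int)(sizeof(unsigned long) * 8));

	/* Lowest no_of_bits bits set, e.g. 4 -> 0xf */
	low_bits = (1UL << no_of_bits) - 1;

	/* Never hand out bits that other agents may also use */
	m.critical = low_bits & ~shareable_mask;

	/* Everything else in the default mask, so the two never overlap */
	m.noisy = full_mask & ~m.critical;

	return m;
}
```

For example, with an 11-bit default mask of 0x7ff, shareable bits 0x600 and `no_of_bits` of 4, the critical mask is 0xf and the noisy mask is 0x7f0. The sketch does not check that the critical mask is non-empty after shareable bits are removed; a real implementation would probably want to.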

...

+{
int ret, i;

unsigned long long_mask, shareable_mask;

/* Get default cbm mask for L3/L2 cache */

ret = get_cbm_mask(cache_type);

if (ret)
return ret;
/* Get max number of bits from default cbm mask */

long_mask = strtoul(cbm_mask, NULL, 16);

count_of_bits = count_bits(long_mask);

/*
* Max limit is count_of_bits - 1 because we need exclusive masks for
* the two processes. So, the last saved bit will be used by the other
* process.
*/
if (no_of_bits < 1 || no_of_bits > count_of_bits - 1) {
printf("Invalid input value for no_of_bits 'n'\n");
printf("Please Enter value in range 1 to %d\n",
       count_of_bits - 1);
return -1;
}

ret = get_shareable_mask(cache_type, &shareable_mask);

if (ret)
return ret;
/* Prepare cbm mask without any shareable bits */

for (i = 0; i < no_of_bits; i++) {
p1_mask <<= 1;
p1_mask |= 1;
}

p1_mask = ~shareable_mask & p1_mask;

p2_mask = ~p1_mask & long_mask;

return 0;
} -int cat_perf_miss_val(int cpu_no, int n, char *cache_type) +static int start_noisy_process(pid_t pid, int sibling_cpu_no) {

unsigned long l_mask, l_mask_1;

int ret, pipefd[2], sibling_cpu_no;

char pipe_message;

pid_t bm_pid;

int ret;

unsigned long buf_size = cache_size * 10;

cache_size = 0;
/* Taskset noisy process to specified cpu */

ret = taskset_benchmark(pid, sibling_cpu_no);

if (ret)
return ret;
/* Write noisy process to root con_mon grp in resctrl FS */

ret = write_bm_pid_to_resctrl(pid, "", "", "cat");

if (ret)
return ret;
/*
* Passing 'cat' will not loop around buffer forever, hence don't pass
* test name
*/
ret = run_fill_buf(buf_size, 1, 1, 0, "");

if (ret)
return ret;
return 0;
+}

+int cat_perf_miss_val(int cpu_no, int no_of_bits, char *cache_type) +{

int ret, sibling_cpu_no;

unsigned long buf_size;

pid_t critical_pid;

char schemata[64];

noisy_pid = 0;

critical_pid = getpid();

printf("# critical_pid: %d\n", critical_pid);

ret = remount_resctrlfs(true); if (ret) @@ -129,77 +194,43 @@ int cat_perf_miss_val(int cpu_no, int n, char *cache_type) if (!validate_resctrl_feature_request("cat")) return -1;

/* Get default cbm mask for L3/L2 cache */

ret = get_cbm_mask(cache_type);

ret = prepare_masks_for_two_processes(no_of_bits, cache_type); if (ret) return ret;

Global variables p1_mask and p2_mask are initialized from above and only used in this function. Would it not be simpler to just initialize and use them as local variables here?

...

long_mask = strtoul(cbm_mask, NULL, 16);
/*
* Change root con_mon grp schemata to be exclusive of critical process
* schemata to avoid any interference
*/
sprintf(schemata, "%lx", p2_mask);

ret = write_schemata("", schemata, cpu_no, "cat");

if (ret)
return ret;
/* Get L3/L2 cache size */ ret = get_cache_size(cpu_no, cache_type, &cache_size); if (ret) return ret;
printf("cache size :%lu\n", cache_size);

/* Get max number of bits from default-cabm mask */

count_of_bits = count_bits(long_mask);

if (n < 1 || n > count_of_bits - 1) {
printf("Invalid input value for no_of_bits n!\n");
printf("Please Enter value in range 1 to %d\n",
       count_of_bits - 1);
return -1;
}

/* Get core id from same socket for running another thread */

sibling_cpu_no = get_core_sibling(cpu_no);

if (sibling_cpu_no < 0)
return -1;
printf("# cache size: %lu\n", cache_size);

buf_size = cache_size * ((float)(no_of_bits) / count_of_bits);

Are all the parentheses and the float cast necessary? The number of bits with which the cache can be partitioned should divide the cache evenly, no? How about: buf_size = cache_size / count_of_bits * no_of_bits
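A small worked example of the integer form, assuming (as the comment does) that the cache size is a multiple of the number of mask bits. The helper name `cat_buf_size()` is hypothetical.

```c
#include <assert.h>

/*
 * Size of the cache portion covered by no_of_bits out of count_of_bits.
 * Dividing first keeps the intermediate value small and is exact when
 * cache_size is a multiple of count_of_bits.
 */
static unsigned long cat_buf_size(unsigned long cache_size,
				  int count_of_bits, int no_of_bits)
{
	assert(count_of_bits > 0);
	assert(no_of_bits >= 0 && no_of_bits <= count_of_bits);

	return cache_size / count_of_bits * no_of_bits;
}
```

With a 27.5 MB cache (28835840 bytes) and an 11-bit mask, each bit covers 2621440 bytes, so 4 bits give a 10485760-byte (10 MB) buffer without any floating-point rounding.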

...

struct resctrl_val_param param = { .resctrl_val = "cat", .cpu_no = cpu_no, .mum_resctrlfs = 0, .setup = cat_setup,
.ctrlgrp	= "c1",
.filename	= RESULT_FILE_NAME,
.mask		= p1_mask,
.num_of_runs	= 0,
.span		= buf_size
};
l_mask = long_mask >> n;

l_mask_1 = ~l_mask & long_mask;

/* Set param values for parent thread which will be allocated bitmask
* with (max_bits - n) bits
*/
param.span = cache_size * (count_of_bits - n) / count_of_bits;

strcpy(param.ctrlgrp, "c2");

strcpy(param.mongrp, "m2");

strcpy(param.filename, RESULT_FILE_NAME2);

param.mask = l_mask;

param.num_of_runs = 0;

if (pipe(pipefd)) {
perror("# Unable to create pipe");
return errno;
}

bm_pid = fork();

/* Set param values for child thread which will be allocated bitmask
* with n bits
*/
if (bm_pid == 0) {
param.mask = l_mask_1;
strcpy(param.ctrlgrp, "c1");
strcpy(param.mongrp, "m1");
param.span = cache_size * n / count_of_bits;
strcpy(param.filename, RESULT_FILE_NAME1);
param.num_of_runs = 0;
param.cpu_no = sibling_cpu_no;
}

remove(param.filename);
ret = setup_critical_process(critical_pid, &param);

if (ret)
return ret;
ret = cat_val(&param);

ret = run_critical_process(critical_pid, &param); if (ret) return ret;

@@ -207,38 +238,51 @@ int cat_perf_miss_val(int cpu_no, int n, char *cache_type) if (ret) return ret;
if (bm_pid == 0) {
/* Tell parent that child is ready */
close(pipefd[0]);
pipe_message = 1;
if (write(pipefd[1], &pipe_message, sizeof(pipe_message)) <
    sizeof(pipe_message)) {
	close(pipefd[1]);
	perror("# failed signaling parent process");
	return errno;
}
printf("# ran critical process without noisy process\n");
close(pipefd[1]);
while (1)
	;
/*
* Results from first run of critical process are already calculated
* and stored in 'avg_llc_perf_miss_single_thread'. Hence, delete the
* file, so that it could be reused for second run.
*/
cat_test_cleanup();

/* Get core id from same socket for running noisy process */

sibling_cpu_no = get_core_sibling(cpu_no);

if (sibling_cpu_no < 0)
return -1;
noisy_pid = fork();

if (noisy_pid == 0) {

This is confusing. Before the above "noisy_pid == 0" meant that we are dealing with the critical process. Now this changes, "noisy_pid == 0" means that we are dealing with the noisy process ... and below "noisy_pid != 0" means we are dealing with the critical process?

...

/*
 * Child is the noisy_process which runs in root con_mon grp by
 * default and hence no need to write pid to resctrl FS.
 * Schemata for root con_mon grp is also set above.
 */
printf("# noisy_pid: %d\n", getpid());
ret = start_noisy_process(getpid(), sibling_cpu_no);
exit(EXIT_SUCCESS);
} else if (noisy_pid == -1) {
return -1;
} else {
/* Parent waits for child to be ready. */
close(pipefd[1]);
pipe_message = 0;
while (pipe_message != 1) {
	if (read(pipefd[0], &pipe_message,
		 sizeof(pipe_message)) < sizeof(pipe_message)) {
		perror("# failed reading from child process");
		break;
	}
}
close(pipefd[0]);
kill(bm_pid, SIGKILL);
}
/*
 * Parent runs again. Sleep for a second here so that noisy
 * process gets to run before critical process
 */
sleep(1);
param.num_of_runs = 0;
ret = run_critical_process(critical_pid, &param);
if (ret)
	return ret;
cat_test_cleanup();

if (bm_pid)
umount_resctrlfs();
ret = check_results(&param);
if (ret)
	return ret;
ret = kill(noisy_pid, SIGKILL);
if (ret)
	printf("Failed to kill noisy_pid\n");
}
return 0; } diff --git a/tools/testing/selftests/resctrl/fill_buf.c b/tools/testing/selftests/resctrl/fill_buf.c index 204ae8870a32..0500dab90b2e 100644 --- a/tools/testing/selftests/resctrl/fill_buf.c +++ b/tools/testing/selftests/resctrl/fill_buf.c @@ -139,7 +139,6 @@ static int fill_cache_write(char *resctrl_val) return 0; } -static int init_buffer(unsigned long long buf_size, int malloc_and_init, int memflush) { unsigned char *start_ptr, *end_ptr; @@ -177,7 +176,33 @@ int init_buffer(unsigned long long buf_size, int malloc_and_init, int memflush) return 0; } -static int use_buffer_forever(int op, char *resctrl_val) +int use_buffer_once(int op) +{
FILE *fp;

int ret = 0;

if (op == 0) {
ret = fill_one_span_read();
/* Consume result so that reading memory is not optimized */
fp = fopen("/dev/null", "w");
if (!fp)
	perror("Unable to write to /dev/null");
fprintf(fp, "Sum: %d ", ret);
fclose(fp);
ret = 0;
} else {
fill_one_span_write();
}

if (ret) {
printf("\n Error in fill cache read/write...\n");
return -1;
}

return 0;
+}

+int use_buffer_forever(int op, char *resctrl_val) { int ret; @@ -187,7 +212,7 @@ static int use_buffer_forever(int op, char *resctrl_val) ret = fill_cache_write(resctrl_val); if (ret) {
printf("\n Errror in fill cache read/write...\n");
printf("\n Error in fill cache read/write...\n");

Please remove this hunk. Two reasons: (1) only one logical change per patch, (2) a fix for this was already submitted upstream.

...

return -1;
} @@ -228,7 +253,7 @@ int run_fill_buf(unsigned long span, int malloc_and_init_memory, ret = fill_cache(cache_size, malloc_and_init_memory, memflush, op, resctrl_val); if (ret) {
printf("\n Errror in fill cache\n");
printf("\n Error in fill cache\n");

Same comment as above.

...

return -1;
} diff --git a/tools/testing/selftests/resctrl/resctrl.h b/tools/testing/selftests/resctrl/resctrl.h index e320e79bc4d4..79148cbbd7a4 100644 --- a/tools/testing/selftests/resctrl/resctrl.h +++ b/tools/testing/selftests/resctrl/resctrl.h @@ -27,7 +27,7 @@ #define MB (1024 * 1024) #define RESCTRL_PATH "/sys/fs/resctrl" #define PHYS_ID_PATH "/sys/devices/system/cpu/cpu" -#define CBM_MASK_PATH "/sys/fs/resctrl/info" +#define INFO_PATH "/sys/fs/resctrl/info" #define PARENT_EXIT(err_msg) \ do { \ @@ -84,6 +84,9 @@ int write_bm_pid_to_resctrl(pid_t bm_pid, char *ctrlgrp, char *mongrp, char *resctrl_val); int perf_event_open(struct perf_event_attr *hw_event, pid_t pid, int cpu, int group_fd, unsigned long flags); +int init_buffer(unsigned long long buf_size, int malloc_and_init, int memflush); +int use_buffer_once(int op); +int use_buffer_forever(int op, char *resctrl_val); int run_fill_buf(unsigned long span, int malloc_and_init_memory, int memflush, int op, char *resctrl_va); int resctrl_val(char **benchmark_cmd, struct resctrl_val_param *param); @@ -93,9 +96,11 @@ void mbm_test_cleanup(void); int mba_schemata_change(int cpu_no, char *bw_report, char **benchmark_cmd); void mba_test_cleanup(void); int get_cbm_mask(char *cache_type); +int get_shareable_mask(char *cache_type, unsigned long *shareable_mask); int get_cache_size(int cpu_no, char *cache_type, unsigned long *cache_size); void ctrlc_handler(int signum, siginfo_t *info, void *ptr); -int cat_val(struct resctrl_val_param *param); +int setup_critical_process(pid_t pid, struct resctrl_val_param *param); +int run_critical_process(pid_t pid, struct resctrl_val_param *param); void cat_test_cleanup(void); int cat_perf_miss_val(int cpu_no, int no_of_bits, char *cache_type); int cqm_resctrl_val(int cpu_no, int n, char **benchmark_cmd); diff --git a/tools/testing/selftests/resctrl/resctrl_tests.c b/tools/testing/selftests/resctrl/resctrl_tests.c index 84a436e0775c..60db128312a6 100644 --- 
a/tools/testing/selftests/resctrl/resctrl_tests.c +++ b/tools/testing/selftests/resctrl/resctrl_tests.c @@ -192,8 +192,8 @@ int main(int argc, char **argv) printf("# Starting CAT test ...\n"); res = cat_perf_miss_val(cpu_no, no_of_bits, "L3"); printf("%sok CAT: test\n", res ? "not " : "");
tests_run++;
cat_test_cleanup();
tests_run++;
}

What is the benefit of this change? If you want to do cleanup like this then it would be great to separate it into a different patch to keep logical changes together and make this patch easier to review.

...

printf("1..%d\n", tests_run); diff --git a/tools/testing/selftests/resctrl/resctrlfs.c b/tools/testing/selftests/resctrl/resctrlfs.c index 465faaad3239..52452bb0178a 100644 --- a/tools/testing/selftests/resctrl/resctrlfs.c +++ b/tools/testing/selftests/resctrl/resctrlfs.c @@ -215,7 +215,7 @@ int get_cbm_mask(char *cache_type) char cbm_mask_path[1024]; FILE *fp;

sprintf(cbm_mask_path, "%s/%s/cbm_mask", CBM_MASK_PATH, cache_type);

sprintf(cbm_mask_path, "%s/%s/cbm_mask", INFO_PATH, cache_type);

This could also be a separate patch.

...

fp = fopen(cbm_mask_path, "r"); if (!fp) { @@ -235,6 +235,38 @@ int get_cbm_mask(char *cache_type) } /*

get_shareable_mask - Get shareable mask from shareable_bits for given cache

@cache_type: Cache level L2/L3

@shareable_mask: Mask is returned as unsigned long value

Return: = 0 on success, < 0 on failure.

*/

+int get_shareable_mask(char *cache_type, unsigned long *shareable_mask) +{
char shareable_bits_file[1024];

FILE *fp;

sprintf(shareable_bits_file, "%s/%s/shareable_bits", INFO_PATH,
cache_type);
fp = fopen(shareable_bits_file, "r");

if (!fp) {
perror("Failed to open shareable_bits file");
return -1;
}

if (fscanf(fp, "%lx", shareable_mask) <= 0) {
perror("Could not get shareable bits");
fclose(fp);
return -1;
}

fclose(fp);

return 0;
+}

+/*

get_core_sibling - Get sibling core id from the same socket for given CPU

@cpu_no: CPU number

Apart from the code comments I do remain interested in how this test performs on different systems under different load to ensure that the hardware prefetcher does not interfere with the results. If you do have assumptions/requirements in this area ("This has to run on an idle system") then it should be added to at least the README.

As a sidenote when I looked at the README it seems to not take these cache tests into account ... it reads "Currently it supports Memory Bandwidth Monitoring test and Memory Bandwidth Allocation test on Intel RDT hardware. More tests will be added in the future."

Reinette

Sai Praneeth Prakhya

11 Mar

1:59 a.m.

New subject: [PATCH V1 10/13] selftests/resctrl: Change Cache Allocation Technology (CAT) test

Hi Reinette,

On Tue, 2020-03-10 at 15:14 -0700, Reinette Chatre wrote:

...

Hi Sai,

Not just specific to this patch, but I think the prevalent use of global variables that are initialized/used or allocated/released from a variety of places within the code is creating traps. I seem to have stumbled on a few during this review so far, but they are hard to keep track of and I am not confident that I caught them all. Having the code be symmetrical (allocate and free from the same area, or initialize and use from the same area) does help to avoid such complexity.

Sure, makes sense. I will try to wrap them up in some meaningful structures that can be passed between functions and will see if everything still works as expected. If not, I will comment on why a particular variable needs to be global.

...

This patch and the patch that follows are both quite large, and it is difficult to keep track of all the collected changes. There seems to be an opportunity to separate them into logical changes. Some of my comments may be just because I could not keep track of all that is changed at the same time.

Ok, makes sense. The main reason this patch and the next patch are large is that they do two things:
1. Remove the previous CAT/CQM test cases
2. Add new CAT/CQM test cases

Since the new test cases are not just logical extensions of, or bug fixes to, the previous test cases, the patch might not be readable. I am thinking of splitting it at least like this:
1. A patch to remove the CAT test case
2. A patch to remove the CQM test case
3. Patches that just add CAT and CQM (without other changes)

Please let me know if you think otherwise.

...

On 3/6/2020 7:40 PM, Sai Praneeth Prakhya wrote:

...
The present CAT test case spawns two processes that run in two different control groups with exclusive schemata, and both processes read a buffer from memory only once. Before reading the buffer, the perf miss count is cleared, and the perf miss count is then calculated for the read. Since the processes read through the buffer only once and initially the whole buffer is in memory, the perf miss count will always be the same regardless of the cache size allocated by CAT to these processes. So, the test isn't testing CAT. Fix this issue by changing the CAT test case.

The updated CAT test runs a "critical" process with exclusive schemata that reads a buffer (the same size as the allocated cache) multiple times, thereby utilizing the allocated cache, and calculates perf miss rate for

Transitioning the description from "perf miss count" to "perf miss rate" is subtle. It would be valuable to elaborate on what is meant by "perf miss rate".

...
every read of the buffer. The average of this perf miss rate is saved. This value indicates the critical process's self-induced misses. Now, the "critical" process runs beside a "noisy" neighbor that is reading a buffer 10 times the size of the LLC, and the two processes are in different control groups with exclusive schemata. The average perf miss rate for the "critical" process is calculated again and compared with the earlier value. If the difference between these two values is greater than 5%, it means that the "noisy" neighbor does have an impact on the "critical" process, which means CAT is not working as expected, and hence the test fails.
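To make "perf miss rate" concrete: in the patch it is computed as LLC misses divided by LLC references, times 100, and the pass/fail check is the absolute difference between the two averaged rates compared against MAX_DIFF_PERCENT. A minimal sketch of that arithmetic follows; the function names are hypothetical, and integer math is used here as an assumption (the patch itself computes the rate as a float).

```c
#include <assert.h>
#include <limits.h>

#define MAX_DIFF_PERCENT 5

/* Miss rate as a whole percentage: misses per 100 cache references. */
static unsigned long miss_rate_percent(unsigned long long misses,
				       unsigned long long references)
{
	/* Keep misses * 100 from overflowing */
	assert(misses <= ULLONG_MAX / 100);

	if (references == 0)
		return 0;

	return (unsigned long)(misses * 100 / references);
}

/*
 * Returns 1 if the critical process's miss rate moved by more than
 * MAX_DIFF_PERCENT percentage points once the noisy neighbor started.
 */
static int noisy_neighbor_detected(unsigned long rate_alone,
				   unsigned long rate_with_noise)
{
	unsigned long diff = rate_alone > rate_with_noise ?
			     rate_alone - rate_with_noise :
			     rate_with_noise - rate_alone;

	return diff > MAX_DIFF_PERCENT;
}
```

For instance, 1200 misses out of 40000 references is a 3% miss rate. If the rate rises to 11% (4400 of 40000) with the noisy neighbor running, the difference of 8 percentage points exceeds the 5% threshold and the test would fail.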

Reported-by: Reinette Chatre reinette.chatre@intel.com Suggested-by: Tony Luck tony.luck@intel.com Co-developed-by: Fenghua Yu fenghua.yu@intel.com Signed-off-by: Fenghua Yu fenghua.yu@intel.com Signed-off-by: Sai Praneeth Prakhya sai.praneeth.prakhya@intel.com

tools/testing/selftests/resctrl/cache.c | 167 ++++++++----- tools/testing/selftests/resctrl/cat_test.c | 312 ++++++++++++++-----

tools/testing/selftests/resctrl/fill_buf.c | 33 ++- tools/testing/selftests/resctrl/resctrl.h | 9 +- tools/testing/selftests/resctrl/resctrl_tests.c | 2 +- tools/testing/selftests/resctrl/resctrlfs.c | 34 ++- 6 files changed, 352 insertions(+), 205 deletions(-)

diff --git a/tools/testing/selftests/resctrl/cache.c b/tools/testing/selftests/resctrl/cache.c index be60d7d3f066..e30cdd7b851c 100644 --- a/tools/testing/selftests/resctrl/cache.c +++ b/tools/testing/selftests/resctrl/cache.c @@ -10,9 +10,9 @@ struct read_format { } values[2]; }; -static struct perf_event_attr pea_llc_miss; +static struct perf_event_attr pea_llc_miss, pea_llc_access; static struct read_format rf_cqm; -static int fd_lm; +static int fd_lm, fd_la; char llc_occup_path[1024]; static void initialize_perf_event_attr(void) @@ -27,15 +27,30 @@ static void initialize_perf_event_attr(void) pea_llc_miss.inherit = 1; pea_llc_miss.exclude_guest = 1; pea_llc_miss.disabled = 1;

pea_llc_access.type = PERF_TYPE_HARDWARE;

pea_llc_access.size = sizeof(struct perf_event_attr);

pea_llc_access.read_format = PERF_FORMAT_GROUP;

pea_llc_access.exclude_kernel = 1;

pea_llc_access.exclude_hv = 1;

pea_llc_access.exclude_idle = 1;

pea_llc_access.exclude_callchain_kernel = 1;

pea_llc_access.inherit = 1;

pea_llc_access.exclude_guest = 1;

pea_llc_access.disabled = 1;

This initialization appears to duplicate the initialization done above. Perhaps this function could be a wrapper that calls an initialization function with pointer to perf_event_attr that initializes structure the same?

I did think about a wrapper, but since pea_llc_access and pea_llc_miss are global variables, I thought passing them as arguments might not look good (why would we want to pass a global variable?). I will try and see if I can make these local variables.
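For illustration, a minimal sketch of the helper being discussed, assuming a hypothetical name `init_llc_event_attr()`. It sets the same fields as the hunk above and works whether the structures are global or local, since it only sees a pointer.

```c
#include <assert.h>
#include <string.h>
#include <linux/perf_event.h>

/*
 * Initialize one perf_event_attr the way the patch does for both the
 * miss and the access counters. Only the event config differs.
 */
static void init_llc_event_attr(struct perf_event_attr *attr, __u64 config)
{
	assert(attr != NULL);

	memset(attr, 0, sizeof(*attr));
	attr->type = PERF_TYPE_HARDWARE;
	attr->size = sizeof(*attr);
	attr->config = config;
	attr->read_format = PERF_FORMAT_GROUP;
	attr->exclude_kernel = 1;
	attr->exclude_hv = 1;
	attr->exclude_idle = 1;
	attr->exclude_callchain_kernel = 1;
	attr->inherit = 1;
	attr->exclude_guest = 1;
	attr->disabled = 1;
}
```

The caller would then invoke it once with PERF_COUNT_HW_CACHE_MISSES and once with PERF_COUNT_HW_CACHE_REFERENCES, which removes the duplicated assignments.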

...

...
} static void ioctl_perf_event_ioc_reset_enable(void) { ioctl(fd_lm, PERF_EVENT_IOC_RESET, 0); ioctl(fd_lm, PERF_EVENT_IOC_ENABLE, 0);

ioctl(fd_la, PERF_EVENT_IOC_RESET, 0);

ioctl(fd_la, PERF_EVENT_IOC_ENABLE, 0);

}

Here is more duplication.

Ok.. will fix it.

...

...
-static int perf_event_open_llc_miss(pid_t pid, int cpu_no) +static int perf_event_open_llc_miss_rate(pid_t pid, int cpu_no) { fd_lm = perf_event_open(&pea_llc_miss, pid, cpu_no, -1, PERF_FLAG_FD_CLOEXEC); @@ -45,29 +60,40 @@ static int perf_event_open_llc_miss(pid_t pid, int cpu_no) return -1; }
fd_la = perf_event_open(&pea_llc_access, pid, cpu_no, fd_lm,
		PERF_FLAG_FD_CLOEXEC);
if (fd_la == -1) {
perror("Error opening member");
ctrlc_handler(0, NULL, NULL);
return -1;
Should fd_lm not be closed on this error path?

That's right. will fix it.
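A minimal sketch of the error path in question, under the assumption that the two counters are opened through a hypothetical opener callback (standing in for perf_event_open() so the example runs without perf privileges). The demo opener and its names are illustrative only.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical opener: group_fd is -1 when opening the group leader. */
typedef int (*open_counter_fn)(int group_fd);

/*
 * Open a leader and a member counter. If the member cannot be opened,
 * close the leader before returning so no descriptor is leaked.
 */
static int open_counter_pair(open_counter_fn open_counter,
			     int *leader_fd, int *member_fd)
{
	int leader, member;

	assert(open_counter && leader_fd && member_fd);

	leader = open_counter(-1);
	if (leader == -1)
		return -1;

	member = open_counter(leader);
	if (member == -1) {
		close(leader);
		return -1;
	}

	*leader_fd = leader;
	*member_fd = member;
	return 0;
}

/* Demo only: remembers the leader fd so a caller can inspect it. */
static int demo_leader_fd = -1;

/* Demo opener: the leader opens fine, the member always fails. */
static int demo_open_member_fails(int group_fd)
{
	if (group_fd != -1)
		return -1;

	demo_leader_fd = open("/dev/null", O_RDONLY);
	return demo_leader_fd;
}
```

The output parameters are written only on success, so a caller never sees a descriptor that has already been closed.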

...

...
}

return 0;

} -static int initialize_llc_perf(void) +static void initialize_llc_perf(void) { memset(&pea_llc_miss, 0, sizeof(struct perf_event_attr));

memset(&pea_llc_access, 0, sizeof(struct perf_event_attr)); memset(&rf_cqm, 0, sizeof(struct read_format));

/* Initialize perf_event_attr structures for HW_CACHE_MISSES */
/*
* Initialize perf_event_attr structures for HW_CACHE_MISSES and
* HW_CACHE_REFERENCES
*/
initialize_perf_event_attr();
pea_llc_miss.config = PERF_COUNT_HW_CACHE_MISSES;

pea_llc_access.config = PERF_COUNT_HW_CACHE_REFERENCES;

rf_cqm.nr = 1;

return 0;

rf_cqm.nr = 2;

} static int reset_enable_llc_perf(pid_t pid, int cpu_no) { int ret = 0;

ret = perf_event_open_llc_miss(pid, cpu_no);

ret = perf_event_open_llc_miss_rate(pid, cpu_no); if (ret < 0) return ret;

@@ -78,21 +104,21 @@ static int reset_enable_llc_perf(pid_t pid, int cpu_no) } /*

get_llc_perf: llc cache miss through perf events

@cpu_no: CPU number that the benchmark PID is binded to

get_llc_perf_miss_rate: llc cache miss rate through perf events
Could "llc" be "LLC" to be consistent with below?

Sure! will fix it.

...

...

@cpu_no: CPU number that the benchmark PID is binded to

Perf events like HW_CACHE_MISSES could be used to validate number of

cache lines allocated.

Perf events like HW_CACHE_MISSES and HW_CACHE_REFERENCES could be used to

approximate LLc occupancy under controlled environment

s/LLc/LLC/

Sure! my bad.

...

...
Return: =0 on success. <0 on failure.

*/ -static int get_llc_perf(unsigned long *llc_perf_miss) +static int get_llc_perf_miss_rate(float *llc_perf_miss_rate) {

__u64 total_misses;

__u64 total_misses, total_references;

/* Stop counters after one span to get miss rate */

ioctl(fd_lm, PERF_EVENT_IOC_DISABLE, 0);

ioctl(fd_la, PERF_EVENT_IOC_DISABLE, 0);

if (read(fd_lm, &rf_cqm, sizeof(struct read_format)) == -1) { perror("Could not get llc misses through perf"); @@ -100,11 +126,19 @@ static int get_llc_perf(unsigned long *llc_perf_miss) return -1; }
if (read(fd_la, &rf_cqm, sizeof(struct read_format)) == -1) {
perror("Could not get llc accesses through perf");
return -1;
It looks like the cleanup (closing of file descriptors) is omitted on this and the earlier error path.

True! Missed it, my bad! Will fix it.

...

...
}

total_misses = rf_cqm.values[0].value;

total_references = rf_cqm.values[1].value;

close(fd_lm);

close(fd_la);

*llc_perf_miss = total_misses;

*llc_perf_miss_rate = ((float)total_misses / total_references) * 100;

return 0; } @@ -176,15 +210,16 @@ static int print_results_cache(char *filename, int bm_pid, int measure_cache_vals(struct resctrl_val_param *param, int bm_pid) {

unsigned long llc_perf_miss = 0, llc_occu_resc = 0, llc_value = 0;

unsigned long llc_occu_resc = 0, llc_value = 0;

float llc_perf_miss_rate = 0; int ret;

/* Measure cache miss from perf */ if (!strcmp(param->resctrl_val, "cat")) {
ret = get_llc_perf(&llc_perf_miss);
ret = get_llc_perf_miss_rate(&llc_perf_miss_rate);
if (ret < 0) return ret;
llc_value = llc_perf_miss;
llc_value = llc_perf_miss_rate;
What is the benefit of llc_perf_miss_rate being of type float?

Good catch. There isn't really a benefit (now that I think about it). I think I made it float while working on the CQM test case with perf.

...

...
} /* Measure llc occupancy from resctrl */ @@ -202,66 +237,72 @@ int measure_cache_vals(struct resctrl_val_param *param, int bm_pid) } /*

cache_val: execute benchmark and measure LLC occupancy resctrl

and perf cache miss for the benchmark

@param: parameters passed to cache_val()
setup_critical_process: Bind given pid to given cpu and write the pid
		in requested resctrl FS location, set schemata,
		initialize perf LLC counters and also initialize
		fill buffer benchmark.
@pid: pid of the process

@param: Parameters passed to cache_val()
Return: 0 on success. non-zero on failure.

Return: 0 on success. non-zero on failure.

*/

-int cat_val(struct resctrl_val_param *param) +int setup_critical_process(pid_t pid, struct resctrl_val_param *param) {

int malloc_and_init_memory = 1, memflush = 1, operation = 0, ret = 0;

int ret = 0; char *resctrl_val = param->resctrl_val;

pid_t bm_pid;

char schemata[64];
if (strcmp(param->filename, "") == 0)
sprintf(param->filename, "stdio");
/* Taskset parent (critical process) to a specified cpu */

ret = taskset_benchmark(pid, param->cpu_no);

if (ret)
return ret;
bm_pid = getpid();
/* Write parent to specified con_mon grp, mon_grp in resctrl FS */

ret = write_bm_pid_to_resctrl(pid, param->ctrlgrp, param->mongrp,
		      resctrl_val);
if (ret)
return ret;
/* Taskset benchmark to specified cpu */

ret = taskset_benchmark(bm_pid, param->cpu_no);

sprintf(schemata, "%lx", param->mask);

ret = write_schemata(param->ctrlgrp, schemata, param->cpu_no, "cat"); if (ret) return ret;
/* Write benchmark to specified con_mon grp, mon_grp in resctrl FS */

ret = write_bm_pid_to_resctrl(bm_pid, param->ctrlgrp, param->mongrp,
		      resctrl_val);
initialize_llc_perf();

ret = init_buffer(param->span, 1, 1); if (ret) return ret;
if ((strcmp(resctrl_val, "cat") == 0)) {
ret = initialize_llc_perf();
if (ret)
	return ret;
}
return 0;

+}

+int run_critical_process(pid_t pid, struct resctrl_val_param *param) +{

int ret = 0;

/* Test runs until the callback setup() tells the test to stop. */

/* Test runs until the callback setup() tells the test to stop */ while (1) {
if (strcmp(resctrl_val, "cat") == 0) {
	ret = param->setup(param);
	if (ret) {
		ret = 0;
		break;
	}
	ret = reset_enable_llc_perf(bm_pid, param->cpu_no);
	if (ret)
		break;
	if (run_fill_buf(param->span, malloc_and_init_memory,
			 memflush, operation, resctrl_val)) {
		fprintf(stderr, "Error-running fill buffer\n");
		ret = -1;
		break;
	}
	sleep(1);
	ret = measure_cache_vals(param, bm_pid);
	if (ret)
		break;
} else {
ret = param->setup(param);
if (ret) {
	ret = 0;
	break;
}
ret = reset_enable_llc_perf(pid, param->cpu_no);
This is in a while(1) loop and it seems reset_enable_llc_perf() opens the file descriptors, then resets and enables the counters. Would it not be more efficient to open the file descriptors outside of this while() loop and just reset/enable the counters within?

I did try this (i.e. open the perf counters before the while loop and just reset them in the loop before every run) but I wasn't able to get readings from the perf counters. Hence, I started opening the perf counters in the loop. But please note that I didn't debug why that was the case, and I tried it really long ago (when I initially worked on this code), so things might have changed now. I will try again and see if it works.
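
For reference, a minimal sketch of the open-once approach suggested above. This is not code from the patch: open_llc_miss_counter(), measure_runs() and touch_buffer() are hypothetical names, the counter is a single ungrouped PERF_COUNT_HW_CACHE_MISSES event (the selftest's own attribute setup differs), and it is not known whether this avoids the missing-readings problem described in the reply.

```c
#include <assert.h>
#include <linux/perf_event.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

static unsigned char buf[1 << 16];
static volatile unsigned long sink;

/* Hypothetical workload: read one byte per 64-byte cache line. */
static void touch_buffer(void)
{
	unsigned long sum = 0;
	size_t i;

	for (i = 0; i < sizeof(buf); i += 64)
		sum += buf[i];
	sink = sum;
}

/* Hypothetical helper: open the counter once and leave it disabled. */
static int open_llc_miss_counter(pid_t pid, int cpu)
{
	struct perf_event_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.type = PERF_TYPE_HARDWARE;
	attr.size = sizeof(attr);
	attr.config = PERF_COUNT_HW_CACHE_MISSES;
	attr.disabled = 1;
	attr.exclude_kernel = 1;
	attr.exclude_hv = 1;

	return (int)syscall(SYS_perf_event_open, &attr, pid, cpu, -1, 0UL);
}

/* Reset and enable per run; the descriptor stays open across runs. */
static int measure_runs(int fd, int runs, void (*workload)(void),
			uint64_t *misses)
{
	int i;

	assert(workload != NULL && misses != NULL);
	for (i = 0; i < runs; i++) {
		if (ioctl(fd, PERF_EVENT_IOC_RESET, 0) ||
		    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0))
			return -1;
		workload();
		if (ioctl(fd, PERF_EVENT_IOC_DISABLE, 0))
			return -1;
		if (read(fd, &misses[i], sizeof(misses[i])) !=
		    (ssize_t)sizeof(misses[i]))
			return -1;
	}
	return 0;
}
```

A caller would open the descriptor once, call measure_runs() with the desired number of runs, and close() the descriptor afterwards. perf_event_open can fail (for example with insufficient privileges), in which case the open helper returns -1.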

...

...
if (ret)
	break;
/* Read buffer once */
if (use_buffer_once(0)) {
	fprintf(stderr, "Error-running fill buffer\n");
	ret = -1;
break;
}
If I understand correctly, reset_enable_llc_perf() will open the perf file descriptors and start the measurement, and measure_cache_vals() will read from the file descriptors to obtain the measurements. It seems that if use_buffer_once() fails, the perf file descriptors need to be closed?

Yes, that's right. Will fix it.
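
One common way to guarantee the close on every exit path is a single cleanup label. The sketch below is an illustration only: run_one_measurement(), step_ok() and step_fail() are hypothetical names, and a descriptor on /dev/null stands in for the perf file descriptors.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical stand-ins for a step such as use_buffer_once(). */
static int step_ok(void)
{
	return 0;
}

static int step_fail(void)
{
	return 1;
}

/*
 * Open a descriptor, run one step, and close the descriptor on both
 * the success and the failure path. used_fd (optional) reports which
 * descriptor was used so that a caller can verify it was closed.
 */
static int run_one_measurement(int (*step)(void), int *used_fd)
{
	int ret = 0;
	int fd;

	assert(step != NULL);
	fd = open("/dev/null", O_RDONLY);	/* stand-in for a perf fd */
	if (fd < 0)
		return -1;
	if (used_fd)
		*used_fd = fd;

	if (step()) {
		ret = -1;
		goto out;
	}

	/* The counters would be read here in the real test. */
out:
	close(fd);
	return ret;
}
```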

...

...
ret = measure_cache_vals(param, pid);
if (ret)
	break;
}
return ret; diff --git a/tools/testing/selftests/resctrl/cat_test.c b/tools/testing/selftests/resctrl/cat_test.c index 046c7f285e72..f7a67f005fe5 100644 --- a/tools/testing/selftests/resctrl/cat_test.c +++ b/tools/testing/selftests/resctrl/cat_test.c @@ -11,70 +11,65 @@ #include "resctrl.h" #include <unistd.h> -#define RESULT_FILE_NAME1 "result_cat1" -#define RESULT_FILE_NAME2 "result_cat2" -#define NUM_OF_RUNS 5 -#define MAX_DIFF_PERCENT 4 -#define MAX_DIFF 1000000 +#define RESULT_FILE_NAME "result_cat" +#define NUM_OF_RUNS 10 +#define MAX_DIFF_PERCENT 5 -int count_of_bits; char cbm_mask[256]; -unsigned long long_mask; -unsigned long cache_size; -/*

Change schemata. Write schemata to specified

con_mon grp, mon_grp in resctrl FS.

Run 5 times in order to get average values.

*/

+static unsigned long avg_llc_perf_miss_rate_single_thread; +static unsigned long p1_mask, p2_mask;
If these _have_ to be global variables, could they have more descriptive names?

Sure! Will fix them.

...

...
+static unsigned long cache_size; +static pid_t noisy_pid; +static int count_of_bits;

+/* Run 5 times in order to get average values */

Seems like NUM_OF_RUNS above was changed to 10 so the above is no longer accurate. Perhaps just say "Run NUM_OF_RUNS times" ?

Sure!

...

...
static int cat_setup(struct resctrl_val_param *p) {

char schemata[64];

int ret = 0;

/* Run NUM_OF_RUNS times */ if (p->num_of_runs >= NUM_OF_RUNS) return -1;
if (p->num_of_runs == 0) {
sprintf(schemata, "%lx", p->mask);
ret = write_schemata(p->ctrlgrp, schemata, p->cpu_no,
		     p->resctrl_val);
} p->num_of_runs++;

return ret;
return 0;

}
All of this complication does not seem to be necessary. This cat_setup() does not actually do any setup ... it seems to only exist to be able to break out of an infinite loop. Why not just eliminate this function and just run the loop within run_critical_process() NUM_OF_RUNS times?

Makes sense. Will change it.
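
A rough sketch of what the bounded loop could look like once the setup() callback is gone. The names run_fixed_times(), run_fn and count_run() are hypothetical; the callback stands in for one reset/fill/measure iteration of the real test.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_OF_RUNS 10

/* Hypothetical stand-in for one reset/fill/measure iteration. */
typedef int (*run_fn)(int run, void *ctx);

/* Run exactly NUM_OF_RUNS iterations, stopping early on error. */
static int run_fixed_times(run_fn one_run, void *ctx)
{
	int run, ret;

	assert(one_run != NULL);
	for (run = 0; run < NUM_OF_RUNS; run++) {
		ret = one_run(run, ctx);
		if (ret)
			return ret;
	}
	return 0;
}

/* Example iteration: count how often it was called. */
static int count_run(int run, void *ctx)
{
	(void)run;
	(*(int *)ctx)++;
	return 0;
}
```

With this shape the loop bound is visible at the call site, and no callback has to return an error just to end the test.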

...

...
-static void show_cache_info(unsigned long sum_llc_perf_miss, int no_of_bits,
	    unsigned long span)
+static void show_cache_info(unsigned long sum_llc_perf_miss_rate,
	    int no_of_bits, unsigned long span)
{

unsigned long allocated_cache_lines = span / 64;

unsigned long avg_llc_perf_miss = 0;

float diff_percent;
unsigned long avg_llc_perf_miss_rate = 0, diff_percent = 0;

avg_llc_perf_miss_rate = sum_llc_perf_miss_rate / (NUM_OF_RUNS - 1);

if (!noisy_pid) {
avg_llc_perf_miss_rate_single_thread = avg_llc_perf_miss_rate;
return;
}
avg_llc_perf_miss = sum_llc_perf_miss / (NUM_OF_RUNS - 1);

diff_percent = ((float)allocated_cache_lines - avg_llc_perf_miss) /
		allocated_cache_lines * 100;
diff_percent = labs(avg_llc_perf_miss_rate -
       avg_llc_perf_miss_rate_single_thread);
printf("%sok CAT: cache miss rate within %d%%\n",
      !is_amd && abs((int)diff_percent) > MAX_DIFF_PERCENT ?
printf("%sok CAT: cache miss rate difference within %d%%\n",
      !is_amd && diff_percent > MAX_DIFF_PERCENT ?
    "not " : "", MAX_DIFF_PERCENT);
tests_run++;

printf("# Percent diff=%d\n", abs((int)diff_percent)); printf("# Number of bits: %d\n", no_of_bits);

printf("# Avg_llc_perf_miss: %lu\n", avg_llc_perf_miss);

printf("# Allocated cache lines: %lu\n", allocated_cache_lines);
printf("# Buffer size: %lu\n", span);

printf("# Avg_llc_perf_miss_rate without noisy process: %lu%%\n",
      avg_llc_perf_miss_rate_single_thread);
printf("# Avg_llc_perf_miss_rate with noisy process: %lu%%\n",
      avg_llc_perf_miss_rate);
printf("# Percent diff: %lu\n", diff_percent);

tests_run++;
} static int check_results(struct resctrl_val_param *param) { char *token_array[8], temp[512];

unsigned long sum_llc_perf_miss = 0;

unsigned long sum_llc_perf_miss_rate = 0; int runs = 0, no_of_bits = 0; FILE *fp;

printf("# Checking for pass/fail\n");
if (noisy_pid)
printf("# Checking for pass/fail\n");
fp = fopen(param->filename, "r"); if (!fp) { perror("# Cannot open file");
@@ -90,37 +85,107 @@ static int check_results(struct resctrl_val_param *param) token_array[fields++] = token; token = strtok(NULL, ":\t"); }

/*

Discard the first value which is inaccurate due to

monitoring * setup transition phase. */
if (runs > 0)
	sum_llc_perf_miss += strtoul(token_array[3], NULL, 0);
runs++;
if (runs == 1)
	continue;
sum_llc_perf_miss_rate += strtoul(token_array[3], NULL, 0);
}
fclose(fp); no_of_bits = count_bits(param->mask);

show_cache_info(sum_llc_perf_miss, no_of_bits, param->span);

show_cache_info(sum_llc_perf_miss_rate, no_of_bits, param->span);

return 0; } void cat_test_cleanup(void) {

remove(RESULT_FILE_NAME1);

remove(RESULT_FILE_NAME2);

remove(RESULT_FILE_NAME);

+}

+static int prepare_masks_for_two_processes(int no_of_bits, char *cache_type)
It would be valuable to include comments that describe the goal of these masks. Some questions that may be asked when seeing this function ... Why are two masks needed? What are the differences between them? How are they intended to be used?

Ok.. makes sense. Will add comments. Just to keep things clear here (for others), the two masks are exclusive masks. The masks are exclusive because no other process should cause interference for the critical process.
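
To make the relationship between the two masks concrete, here is a small sketch of the same computation with the masks returned through pointers instead of globals. prepare_exclusive_masks() is a hypothetical name and the example values (an 11-bit CBM of 0x7ff) are assumptions, not data from a real system.

```c
#include <assert.h>
#include <stddef.h>

static int count_bits(unsigned long n)
{
	int count = 0;

	while (n) {
		count += n & 1;
		n >>= 1;
	}
	return count;
}

/*
 * critical_mask: the low no_of_bits bits of the CBM, minus any bits
 *                the hardware reports as shareable.
 * noisy_mask:    every remaining bit of the full CBM, so the two
 *                masks never overlap.
 */
static int prepare_exclusive_masks(unsigned long full_mask,
				   unsigned long shareable_mask,
				   int no_of_bits,
				   unsigned long *critical_mask,
				   unsigned long *noisy_mask)
{
	unsigned long mask;

	assert(critical_mask != NULL && noisy_mask != NULL);

	/* At least one bit must be left over for the other process. */
	if (no_of_bits < 1 || no_of_bits > count_bits(full_mask) - 1)
		return -1;

	mask = (1UL << no_of_bits) - 1;
	mask &= ~shareable_mask;

	*critical_mask = mask;
	*noisy_mask = ~mask & full_mask;
	return 0;
}
```

Note that if every requested bit is shareable, the critical mask ends up empty; a real implementation would probably want to reject that case.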

...

...
+{
int ret, i;

unsigned long long_mask, shareable_mask;

/* Get default cbm mask for L3/L2 cache */

ret = get_cbm_mask(cache_type);

if (ret)
return ret;
/* Get max number of bits from default cbm mask */

long_mask = strtoul(cbm_mask, NULL, 16);

count_of_bits = count_bits(long_mask);

/*
* Max limit is count_of_bits - 1 because we need exclusive masks for
* the two processes. So, the last saved bit will be used by the other
* process.
*/
if (no_of_bits < 1 || no_of_bits > count_of_bits - 1) {
printf("Invalid input value for no_of_bits 'n'\n");
printf("Please Enter value in range 1 to %d\n",
       count_of_bits - 1);
return -1;
}

ret = get_shareable_mask(cache_type, &shareable_mask);

if (ret)
return ret;
/* Prepare cbm mask without any shareable bits */

for (i = 0; i < no_of_bits; i++) {
p1_mask <<= 1;
p1_mask |= 1;
}

p1_mask = ~shareable_mask & p1_mask;

p2_mask = ~p1_mask & long_mask;

return 0;
} -int cat_perf_miss_val(int cpu_no, int n, char *cache_type) +static int start_noisy_process(pid_t pid, int sibling_cpu_no) {

unsigned long l_mask, l_mask_1;

int ret, pipefd[2], sibling_cpu_no;

char pipe_message;

pid_t bm_pid;

int ret;

unsigned long buf_size = cache_size * 10;

cache_size = 0;
/* Taskset noisy process to specified cpu */

ret = taskset_benchmark(pid, sibling_cpu_no);

if (ret)
return ret;
/* Write noisy process to root con_mon grp in resctrl FS */

ret = write_bm_pid_to_resctrl(pid, "", "", "cat");

if (ret)
return ret;
/*
* Passing 'cat' will not loop around buffer forever, hence don't pass
* test name
*/
ret = run_fill_buf(buf_size, 1, 1, 0, "");

if (ret)
return ret;
return 0;
+}

+int cat_perf_miss_val(int cpu_no, int no_of_bits, char *cache_type) +{

int ret, sibling_cpu_no;

unsigned long buf_size;

pid_t critical_pid;

char schemata[64];

noisy_pid = 0;

critical_pid = getpid();

printf("# critical_pid: %d\n", critical_pid);

ret = remount_resctrlfs(true); if (ret) @@ -129,77 +194,43 @@ int cat_perf_miss_val(int cpu_no, int n, char *cache_type) if (!validate_resctrl_feature_request("cat")) return -1;

/* Get default cbm mask for L3/L2 cache */

ret = get_cbm_mask(cache_type);

ret = prepare_masks_for_two_processes(no_of_bits, cache_type); if (ret) return ret;
Global variables p1_mask and p2_mask are initialized from above and only used in this function. Would it not be simpler to just initialize and use them as local variables here?

Makes sense. Will change them.

...

...
long_mask = strtoul(cbm_mask, NULL, 16);
/*
* Change root con_mon grp schemata to be exclusive of critical process
* schemata to avoid any interference
*/
sprintf(schemata, "%lx", p2_mask);

ret = write_schemata("", schemata, cpu_no, "cat");

if (ret)
return ret;
/* Get L3/L2 cache size */ ret = get_cache_size(cpu_no, cache_type, &cache_size); if (ret) return ret;
printf("cache size :%lu\n", cache_size);

/* Get max number of bits from default-cabm mask */

count_of_bits = count_bits(long_mask);

if (n < 1 || n > count_of_bits - 1) {
printf("Invalid input value for no_of_bits n!\n");
printf("Please Enter value in range 1 to %d\n",
       count_of_bits - 1);
return -1;
}

/* Get core id from same socket for running another thread */

sibling_cpu_no = get_core_sibling(cpu_no);

if (sibling_cpu_no < 0)
return -1;
printf("# cache size: %lu\n", cache_size);

buf_size = cache_size * ((float)(no_of_bits) / count_of_bits);
Are all the parentheses and the float necessary? The number of bits with which the cache can be partitioned should divide the cache evenly, no? How about: buf_size = cache_size / count_of_bits * no_of_bits

Makes sense. Will fix it.
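
A quick worked example of the integer form, assuming (as stated above) that the cache size divides evenly by the number of CBM bits. cat_buf_size() is a hypothetical helper and the 11 MiB / 11-bit figures are made up for illustration.

```c
#include <assert.h>

/* Buffer size for no_of_bits out of count_of_bits cache portions. */
static unsigned long cat_buf_size(unsigned long cache_size,
				  int count_of_bits, int no_of_bits)
{
	assert(count_of_bits > 0);

	/* Divide first: each CBM bit covers cache_size / count_of_bits. */
	return cache_size / count_of_bits * no_of_bits;
}
```

For an 11 MiB cache with an 11-bit mask, each bit covers 1 MiB, so 4 bits give a 4 MiB buffer. Dividing before multiplying also keeps the intermediate value small.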

...

...
struct resctrl_val_param param = { .resctrl_val = "cat", .cpu_no = cpu_no, .mum_resctrlfs = 0, .setup = cat_setup,
.ctrlgrp	= "c1",
.filename	= RESULT_FILE_NAME,
.mask		= p1_mask,
.num_of_runs	= 0,
.span		= buf_size
};
l_mask = long_mask >> n;

l_mask_1 = ~l_mask & long_mask;

/* Set param values for parent thread which will be allocated bitmask
* with (max_bits - n) bits
*/
param.span = cache_size * (count_of_bits - n) / count_of_bits;

strcpy(param.ctrlgrp, "c2");

strcpy(param.mongrp, "m2");

strcpy(param.filename, RESULT_FILE_NAME2);

param.mask = l_mask;

param.num_of_runs = 0;

if (pipe(pipefd)) {
perror("# Unable to create pipe");
return errno;
}

bm_pid = fork();

/* Set param values for child thread which will be allocated bitmask
* with n bits
*/
if (bm_pid == 0) {
param.mask = l_mask_1;
strcpy(param.ctrlgrp, "c1");
strcpy(param.mongrp, "m1");
param.span = cache_size * n / count_of_bits;
strcpy(param.filename, RESULT_FILE_NAME1);
param.num_of_runs = 0;
param.cpu_no = sibling_cpu_no;
}

remove(param.filename);
ret = setup_critical_process(critical_pid, &param);

if (ret)
return ret;
ret = cat_val(&param);

ret = run_critical_process(critical_pid, &param); if (ret) return ret;

@@ -207,38 +238,51 @@ int cat_perf_miss_val(int cpu_no, int n, char *cache_type) if (ret) return ret;
if (bm_pid == 0) {
/* Tell parent that child is ready */
close(pipefd[0]);
pipe_message = 1;
if (write(pipefd[1], &pipe_message, sizeof(pipe_message)) <
    sizeof(pipe_message)) {
	close(pipefd[1]);
	perror("# failed signaling parent process");
	return errno;
}
printf("# ran critical process without noisy process\n");
close(pipefd[1]);
while (1)
	;
/*
* Results from first run of critical process are already calculated
* and stored in 'avg_llc_perf_miss_single_thread'. Hence, delete the
* file, so that it could be reused for second run.
*/
cat_test_cleanup();

/* Get core id from same socket for running noisy process */

sibling_cpu_no = get_core_sibling(cpu_no);

if (sibling_cpu_no < 0)
return -1;
noisy_pid = fork();

if (noisy_pid == 0) {
This is confusing. Before the above "noisy_pid == 0" meant that we are dealing with the critical process. Now this changes, "noisy_pid == 0" means that we are dealing with the noisy process ... and below "noisy_pid != 0" means we are dealing with the critical process?

Sorry for the confusion! It's because I have used the same variable "noisy_pid" for two different purposes: 1. To check whether the "noisy" process has been started (i.e. before fork()) 2. To store the pid of the "noisy" process (after fork())

I will use different variables so that the code is clearer.

...

...
/*
 * Child is the noisy_process which runs in root con_mon grp by
 * default and hence no need to write pid to resctrl FS.
 * Schemata for root con_mon grp is also set above.
 */
printf("# noisy_pid: %d\n", getpid());
ret = start_noisy_process(getpid(), sibling_cpu_no);
exit(EXIT_SUCCESS);
} else if (noisy_pid == -1) {
return -1;
} else {
/* Parent waits for child to be ready. */
close(pipefd[1]);
pipe_message = 0;
while (pipe_message != 1) {
	if (read(pipefd[0], &pipe_message,
		 sizeof(pipe_message)) < sizeof(pipe_message)) {
		perror("# failed reading from child process");
		break;
	}
}
close(pipefd[0]);
kill(bm_pid, SIGKILL);
}
/*
 * Parent runs again. Sleep for a second here so that noisy
 * process gets to run before critical process
 */
sleep(1);
param.num_of_runs = 0;
ret = run_critical_process(critical_pid, &param);
if (ret)
	return ret;
cat_test_cleanup();

if (bm_pid)
umount_resctrlfs();
ret = check_results(&param);
if (ret)
	return ret;
ret = kill(noisy_pid, SIGKILL);
if (ret)
	printf("Failed to kill noisy_pid\n");
}
return 0; } diff --git a/tools/testing/selftests/resctrl/fill_buf.c b/tools/testing/selftests/resctrl/fill_buf.c index 204ae8870a32..0500dab90b2e 100644 --- a/tools/testing/selftests/resctrl/fill_buf.c +++ b/tools/testing/selftests/resctrl/fill_buf.c @@ -139,7 +139,6 @@ static int fill_cache_write(char *resctrl_val) return 0; } -static int init_buffer(unsigned long long buf_size, int malloc_and_init, int memflush) { unsigned char *start_ptr, *end_ptr; @@ -177,7 +176,33 @@ int init_buffer(unsigned long long buf_size, int malloc_and_init, int memflush) return 0; } -static int use_buffer_forever(int op, char *resctrl_val) +int use_buffer_once(int op) +{
FILE *fp;

int ret = 0;

if (op == 0) {
ret = fill_one_span_read();
/* Consume result so that reading memory is not optimized */
fp = fopen("/dev/null", "w");
if (!fp)
	perror("Unable to write to /dev/null");
fprintf(fp, "Sum: %d ", ret);
fclose(fp);
ret = 0;
} else {
fill_one_span_write();
}

if (ret) {
printf("\n Error in fill cache read/write...\n");
return -1;
}

return 0;
+}

+int use_buffer_forever(int op, char *resctrl_val) { int ret; @@ -187,7 +212,7 @@ static int use_buffer_forever(int op, char *resctrl_val) ret = fill_cache_write(resctrl_val); if (ret) {
printf("\n Errror in fill cache read/write...\n");
printf("\n Error in fill cache read/write...\n");
Please remove this hunk. Two reasons: (1) only one logical change per patch, (2) a fix for this was already submitted upstream.

Sure! makes sense. I wasn't aware that there is already a fix for this.

...

...
return -1;
} @@ -228,7 +253,7 @@ int run_fill_buf(unsigned long span, int malloc_and_init_memory, ret = fill_cache(cache_size, malloc_and_init_memory, memflush, op, resctrl_val); if (ret) {
printf("\n Errror in fill cache\n");
printf("\n Error in fill cache\n");
Same comment as above.

...

...
return -1;
} diff --git a/tools/testing/selftests/resctrl/resctrl.h b/tools/testing/selftests/resctrl/resctrl.h index e320e79bc4d4..79148cbbd7a4 100644 --- a/tools/testing/selftests/resctrl/resctrl.h +++ b/tools/testing/selftests/resctrl/resctrl.h @@ -27,7 +27,7 @@ #define MB (1024 * 1024) #define RESCTRL_PATH "/sys/fs/resctrl" #define PHYS_ID_PATH "/sys/devices/system/cpu/cpu" -#define CBM_MASK_PATH "/sys/fs/resctrl/info" +#define INFO_PATH "/sys/fs/resctrl/info" #define PARENT_EXIT(err_msg) \ do { \ @@ -84,6 +84,9 @@ int write_bm_pid_to_resctrl(pid_t bm_pid, char *ctrlgrp, char *mongrp, char *resctrl_val); int perf_event_open(struct perf_event_attr *hw_event, pid_t pid, int cpu, int group_fd, unsigned long flags); +int init_buffer(unsigned long long buf_size, int malloc_and_init, int memflush); +int use_buffer_once(int op); +int use_buffer_forever(int op, char *resctrl_val); int run_fill_buf(unsigned long span, int malloc_and_init_memory, int memflush, int op, char *resctrl_va); int resctrl_val(char **benchmark_cmd, struct resctrl_val_param *param); @@ -93,9 +96,11 @@ void mbm_test_cleanup(void); int mba_schemata_change(int cpu_no, char *bw_report, char **benchmark_cmd); void mba_test_cleanup(void); int get_cbm_mask(char *cache_type); +int get_shareable_mask(char *cache_type, unsigned long *shareable_mask); int get_cache_size(int cpu_no, char *cache_type, unsigned long *cache_size); void ctrlc_handler(int signum, siginfo_t *info, void *ptr); -int cat_val(struct resctrl_val_param *param); +int setup_critical_process(pid_t pid, struct resctrl_val_param *param); +int run_critical_process(pid_t pid, struct resctrl_val_param *param); void cat_test_cleanup(void); int cat_perf_miss_val(int cpu_no, int no_of_bits, char *cache_type); int cqm_resctrl_val(int cpu_no, int n, char **benchmark_cmd); diff --git a/tools/testing/selftests/resctrl/resctrl_tests.c b/tools/testing/selftests/resctrl/resctrl_tests.c index 84a436e0775c..60db128312a6 100644 --- 
a/tools/testing/selftests/resctrl/resctrl_tests.c +++ b/tools/testing/selftests/resctrl/resctrl_tests.c @@ -192,8 +192,8 @@ int main(int argc, char **argv) printf("# Starting CAT test ...\n"); res = cat_perf_miss_val(cpu_no, no_of_bits, "L3"); printf("%sok CAT: test\n", res ? "not " : "");
tests_run++;
cat_test_cleanup();
tests_run++;
}
What is the benefit of this change?

Just wanted to keep the pattern same for all the test cases i.e. "tests_run" increments last.

...

If you want to do cleanup like this then it would be great to separate it into a different patch to keep logical changes together and make this patch easier to review.

Ok.. makes sense.

...

...
printf("1..%d\n", tests_run); diff --git a/tools/testing/selftests/resctrl/resctrlfs.c b/tools/testing/selftests/resctrl/resctrlfs.c index 465faaad3239..52452bb0178a 100644 --- a/tools/testing/selftests/resctrl/resctrlfs.c +++ b/tools/testing/selftests/resctrl/resctrlfs.c @@ -215,7 +215,7 @@ int get_cbm_mask(char *cache_type) char cbm_mask_path[1024]; FILE *fp;

sprintf(cbm_mask_path, "%s/%s/cbm_mask", CBM_MASK_PATH, cache_type);

sprintf(cbm_mask_path, "%s/%s/cbm_mask", INFO_PATH, cache_type);

This could also be a separate patch.

Ok.

...

...
fp = fopen(cbm_mask_path, "r"); if (!fp) { @@ -235,6 +235,38 @@ int get_cbm_mask(char *cache_type) } /*

get_shareable_mask - Get shareable mask from shareable_bits for given

cache

@cache_type: Cache level L2/L3

@shareable_mask: Mask is returned as unsigned long value

Return: = 0 on success, < 0 on failure.

*/

+int get_shareable_mask(char *cache_type, unsigned long *shareable_mask) +{
char shareable_bits_file[1024];

FILE *fp;

sprintf(shareable_bits_file, "%s/%s/shareable_bits", INFO_PATH,
cache_type);
fp = fopen(shareable_bits_file, "r");

if (!fp) {
perror("Failed to open shareable_bits file");
return -1;
}

if (fscanf(fp, "%lx", shareable_mask) <= 0) {
perror("Could not get shareable bits");
fclose(fp);
return -1;
}

fclose(fp);

return 0;
+}

+/*

get_core_sibling - Get sibling core id from the same socket for given

CPU

@cpu_no: CPU number
Apart from the code comments I do remain interested in how this test performs on different systems under different load to ensure that the hardware prefetcher does not interfere with the results. If you do have assumptions/requirements in this area ("This has to run on an idle system") then it should be added to at least the README.

Sure! I will add the assumption to README and will also get data with/without H/W prefetchers.

...

As a sidenote when I looked at the README it seems to not take these cache tests into account ... it reads "Currently it supports Memory Bandwidth Monitoring test and Memory Bandwidth Allocation test on Intel RDT hardware. More tests will be added in the future."

Ok. Will update the README file.

Regards, Sai

Reinette Chatre

5:03 p.m.

New subject: [PATCH V1 10/13] selftests/resctrl: Change Cache Allocation Technology (CAT) test

Hi Sai,

On 3/10/2020 6:59 PM, Sai Praneeth Prakhya wrote:

...

On Tue, 2020-03-10 at 15:14 -0700, Reinette Chatre wrote:

...
Hi Sai,

Not just specific to this patch but I think the prevalent use of global variables that are initialized/used or allocated/released from a variety of places within the code is creating traps. I seemed to have stumbled on a few during this review so far but it is hard to keep track of and I am not confident that I caught them all. Having the code be symmetrical (allocate and free from same area or initialize and use from same area) does help to avoid such complexity.

Sure! makes sense. I will try to wrap them up in some meaningful structures to pass around functions and will see if everything still works as expected. If not, I will comment why a particular variable needs to be global.

...
This patch and the patch that follows are both quite large and difficult to keep track of all the collected changes. There seems to be opportunity for separating it into logical changes. Some of my comments may be just because I could not keep track of all that is changed at the same time.

Ok.. makes sense. The main reason this patch and the next patch are large is that they do two things:

Remove previous CAT/CQM test case

Add new CAT/CQM test cases

Since the new test cases are not just logical extensions or bug fixes to the previous test cases, the patch might not be readable. I am thinking of splitting this at least like this:

A patch to remove CAT test case

A patch to remove CQM test case

Patches that just add CAT and CQM (without other changes)

Please let me know if you think otherwise

I think this patch can be split up into logical changes without breaking the tests along the way. In my original review I identified two changes that can be split out. Other things that can be split out:

- have CAT test take shareable bits into account
- enable measurement of cache references (addition of this new perf event attribute, hooks to get measurements, etc.)
- transition CAT test to use "perf rate" measurement instead of "perf count"
- etc.

...

...
On 3/6/2020 7:40 PM, Sai Praneeth Prakhya wrote:

[SNIP]

...

...
...
-static struct perf_event_attr pea_llc_miss; +static struct perf_event_attr pea_llc_miss, pea_llc_access; static struct read_format rf_cqm; -static int fd_lm; +static int fd_lm, fd_la; char llc_occup_path[1024]; static void initialize_perf_event_attr(void) @@ -27,15 +27,30 @@ static void initialize_perf_event_attr(void) pea_llc_miss.inherit = 1; pea_llc_miss.exclude_guest = 1; pea_llc_miss.disabled = 1;

pea_llc_access.type = PERF_TYPE_HARDWARE;

pea_llc_access.size = sizeof(struct perf_event_attr);

pea_llc_access.read_format = PERF_FORMAT_GROUP;

pea_llc_access.exclude_kernel = 1;

pea_llc_access.exclude_hv = 1;

pea_llc_access.exclude_idle = 1;

pea_llc_access.exclude_callchain_kernel = 1;

pea_llc_access.inherit = 1;

pea_llc_access.exclude_guest = 1;

pea_llc_access.disabled = 1;

This initialization appears to duplicate the initialization done above. Perhaps this function could be a wrapper that calls an initialization function with pointer to perf_event_attr that initializes structure the same?

I did think about a wrapper but since pea_llc_access and pea_llc_miss are global variables, I thought passing them as variables might not look good (why do we want to pass a global variable?). I will try and see if I can make these local variables.

My goal was to avoid the duplicated code to initialize them identically. It is not clear to me why you think that would not look good. Perhaps I have not thought it through correctly ...

Reinette

Sai Praneeth Prakhya

7:14 p.m.

New subject: [PATCH V1 10/13] selftests/resctrl: Change Cache Allocation Technology (CAT) test

Hi Reinette,

On Wed, 2020-03-11 at 10:03 -0700, Reinette Chatre wrote:

...

Hi Sai,

On 3/10/2020 6:59 PM, Sai Praneeth Prakhya wrote:

...
On Tue, 2020-03-10 at 15:14 -0700, Reinette Chatre wrote:

...
Hi Sai,

Not just specific to this patch but I think the prevalent use of global variables that are initialized/used or allocated/released from a variety of places within the code is creating traps. I seemed to have stumbled on a few during this review so far but it is hard to keep track of and I am not confident that I caught them all. Having the code be symmetrical (allocate and free from same area or initialize and use from same area) does help to avoid such complexity.

Sure! makes sense. I will try to wrap them up in some meaningful structures to pass around functions and will see if everything still works as expected. If not, I will comment why a particular variable needs to be global.

...
This patch and the patch that follows are both quite large and difficult to keep track of all the collected changes. There seems to be opportunity for separating it into logical changes. Some of my comments may be just because I could not keep track of all that is changed at the same time.

Ok.. makes sense. The main reason this patch and the next patch are large is that they do two things:

Remove previous CAT/CQM test case

Add new CAT/CQM test cases

Since the new test cases are not just logical extensions or bug fixes to the previous test cases, the patch might not be readable. I am thinking of splitting this at least like this:

A patch to remove CAT test case

A patch to remove CQM test case

Patches that just add CAT and CQM (without other changes)

Please let me know if you think otherwise

I think this patch can be split up into logical changes without breaking the tests along the way. In my original review I identified two changes that can be split out. Other things that can be split out:

have CAT test take shareable bits into account

enable measurement of cache references (addition of this new perf event attribute, hooks to get measurements, etc.)

transition CAT test to use "perf rate" measurement instead of "perf count"

etc.

I think we could split the patch like this but I am unable to see the benefit of doing so.. (Sorry! if I misunderstood what you meant).

Since the CAT and CQM test cases are buggy (CAT is not testing CAT at all) and we are not attempting to fix them by incremental changes but are completely changing the test plan itself (i.e. the way the test works), why not just remove the older test cases and add new tests? I thought this might be easier for review i.e. to see the new test case all at once. Don't you think so?

...

...
...
On 3/6/2020 7:40 PM, Sai Praneeth Prakhya wrote:

[SNIP]

...
...
...
-static struct perf_event_attr pea_llc_miss; +static struct perf_event_attr pea_llc_miss, pea_llc_access; static struct read_format rf_cqm; -static int fd_lm; +static int fd_lm, fd_la; char llc_occup_path[1024]; static void initialize_perf_event_attr(void) @@ -27,15 +27,30 @@ static void initialize_perf_event_attr(void) pea_llc_miss.inherit = 1; pea_llc_miss.exclude_guest = 1; pea_llc_miss.disabled = 1;

pea_llc_access.type = PERF_TYPE_HARDWARE;

pea_llc_access.size = sizeof(struct perf_event_attr);

pea_llc_access.read_format = PERF_FORMAT_GROUP;

pea_llc_access.exclude_kernel = 1;

pea_llc_access.exclude_hv = 1;

pea_llc_access.exclude_idle = 1;

pea_llc_access.exclude_callchain_kernel = 1;

pea_llc_access.inherit = 1;

pea_llc_access.exclude_guest = 1;

pea_llc_access.disabled = 1;

This initialization appears to duplicate the initialization done above. Perhaps this function could be a wrapper that calls an initialization function with pointer to perf_event_attr that initializes structure the same?

I did think about a wrapper but since pea_llc_access and pea_llc_miss are global variables, I thought passing them as variables might not look good (why do we want to pass a global variable?). I will try and see if I can make these local variables.

My goal was to avoid the duplicated code to initialize them identically.

I agree that duplication should always be avoided.

...

It is not clear to me why you think that would not look good.

I didn't mean that avoiding duplicate code doesn't look good.. what I meant was passing global variables around will not look good.

...

Perhaps I have not thought it through correctly ...

No.. I think the right thing to do here is to not use global variables and hence avoid duplicate code.
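
As an illustration of where this discussion lands, here is a sketch of a shared initializer that takes a pointer, so both attributes get identical settings without duplicated assignments and without needing to be global. init_llc_attr() is a hypothetical name, and the two config values (PERF_COUNT_HW_CACHE_MISSES and PERF_COUNT_HW_CACHE_REFERENCES) are assumptions, since the quoted hunk does not show them.

```c
#include <assert.h>
#include <linux/perf_event.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Common settings, taken from the assignments quoted in this thread. */
static void init_llc_attr(struct perf_event_attr *attr, uint64_t config)
{
	assert(attr != NULL);

	memset(attr, 0, sizeof(*attr));
	attr->type = PERF_TYPE_HARDWARE;
	attr->size = sizeof(*attr);
	attr->read_format = PERF_FORMAT_GROUP;
	attr->exclude_kernel = 1;
	attr->exclude_hv = 1;
	attr->exclude_idle = 1;
	attr->exclude_callchain_kernel = 1;
	attr->inherit = 1;
	attr->exclude_guest = 1;
	attr->disabled = 1;
	attr->config = config;	/* the only per-event difference */
}

/* Wrapper: callers own the structures and pass them in. */
static void initialize_perf_event_attr(struct perf_event_attr *llc_miss,
				       struct perf_event_attr *llc_access)
{
	/* Assumed event selections; not shown in the quoted diff. */
	init_llc_attr(llc_miss, PERF_COUNT_HW_CACHE_MISSES);
	init_llc_attr(llc_access, PERF_COUNT_HW_CACHE_REFERENCES);
}
```

The caller can declare both structures on its own stack and pass their addresses, which keeps initialization and use in the same place.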

Regards, Sai

Reinette Chatre

8:22 p.m.

New subject: [PATCH V1 10/13] selftests/resctrl: Change Cache Allocation Technology (CAT) test

Hi Sai,

On 3/11/2020 12:14 PM, Sai Praneeth Prakhya wrote:

...

On Wed, 2020-03-11 at 10:03 -0700, Reinette Chatre wrote:

...
On 3/10/2020 6:59 PM, Sai Praneeth Prakhya wrote:

...
On Tue, 2020-03-10 at 15:14 -0700, Reinette Chatre wrote:

...
Hi Sai,

Not just specific to this patch but I think the prevalent use of global variables that are initialized/used or allocated/released from a variety of places within the code is creating traps. I seemed to have stumbled on a few during this review so far but it is hard to keep track of and I am not confident that I caught them all. Having the code be symmetrical (allocate and free from same area or initialize and use from same area) does help to avoid such complexity.

Sure! makes sense. I will try to wrap them up in some meaningful structures to pass around functions and will see if everything still works as expected. If not, I will comment why a particular variable needs to be global.

...
This patch and the patch that follows are both quite large and difficult to keep track of all the collected changes. There seems to be opportunity for separating it into logical changes. Some of my comments may be just because I could not keep track of all that is changed at the same time.

Ok.. makes sense. The main reason this patch and the next patch are large is that they do two things:

Remove previous CAT/CQM test case

Add new CAT/CQM test cases

Since the new test cases are not just logical extensions or bug fixes to the previous test cases, the patch might not be readable. I am thinking of splitting this at least like this:

A patch to remove CAT test case

A patch to remove CQM test case

Patches that just add CAT and CQM (without other changes)

Please let me know if you think otherwise

I think this patch can be split up into logical changes without breaking the tests along the way. In my original review I identified two changes that can be split out. Other things that can be split out:

have CAT test take shareable bits into account

enable measurement of cache references (addition of this new perf event attribute, hooks to get measurements, etc.)

transition CAT test to use "perf rate" measurement instead of "perf count"

etc.

I think we could split the patch like this but I am unable to see the benefit of doing so.. (Sorry! if I misunderstood what you meant).

Separating patches into logical changes facilitates review. Please consider this huge patch from the reviewer's perspective: it consists of many different changes and is hard to review. If instead this patch were split into logical changes, it would be easier to understand what it is trying to do/change.

This is not a request that I invent but part of the established kernel development process. Please see Documentation/process/submitting-patches.rst (section is titled "Separate your changes").

...

As CAT and CQM test cases are buggy (CAT is not testing CAT at all) and we are not attempting to fix them by incremental changes but completely changing the test plan itself (i.e. the way the test works), so why not just remove older test cases and add new test? I thought this might be more easier for review i.e. to see the new test case all at once. Don't you think so?

...

From what I understand the new test continues to use many parts of the original test. Completely removing the original test would thus mean adding back a lot of code that was just removed. Incremental changes do seem appropriate to me. The logical changes I listed above actually have nothing to do with "the way the test works". When those building blocks are in place the test can be changed in one patch and it would be much more obvious how the new test is different from the original.

...

...
...
...
On 3/6/2020 7:40 PM, Sai Praneeth Prakhya wrote:

[SNIP]

...
...
...
-static struct perf_event_attr pea_llc_miss;
+static struct perf_event_attr pea_llc_miss, pea_llc_access;
 static struct read_format rf_cqm;
-static int fd_lm;
+static int fd_lm, fd_la;
 char llc_occup_path[1024];

 static void initialize_perf_event_attr(void)
@@ -27,15 +27,30 @@ static void initialize_perf_event_attr(void)
 	pea_llc_miss.inherit = 1;
 	pea_llc_miss.exclude_guest = 1;
 	pea_llc_miss.disabled = 1;

+	pea_llc_access.type = PERF_TYPE_HARDWARE;
+	pea_llc_access.size = sizeof(struct perf_event_attr);
+	pea_llc_access.read_format = PERF_FORMAT_GROUP;
+	pea_llc_access.exclude_kernel = 1;
+	pea_llc_access.exclude_hv = 1;
+	pea_llc_access.exclude_idle = 1;
+	pea_llc_access.exclude_callchain_kernel = 1;
+	pea_llc_access.inherit = 1;
+	pea_llc_access.exclude_guest = 1;
+	pea_llc_access.disabled = 1;

This initialization appears to duplicate the initialization done above. Perhaps this function could be a wrapper that calls an initialization function with a pointer to a perf_event_attr, so that both structures are initialized the same way?

I did think about a wrapper, but since pea_llc_access and pea_llc_miss are global variables, I thought passing them as arguments might not look good (why would we want to pass a global variable?). I will try and see if I can make these local variables.

My goal was to avoid the duplicated code to initialize them identically.

I agree that duplication should always be avoided.

...
It is not clear to me why you think that would not look good.

I didn't mean that avoiding duplicate code doesn't look good.. what I meant was that passing global variables around will not look good.

...
Perhaps I have not thought it through correctly ...

No.. I think the right thing to do here is to not use global variables and hence avoid the duplicate code.
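
For reference, the wrapper discussed above could look roughly like the sketch below. This is only an illustration, not code from the posted patch: the helper name init_llc_perf_event_attr() is hypothetical, and zeroing the structure with memset() first is an assumption. The field values are the ones the patch sets for both pea_llc_miss and pea_llc_access.

```c
#include <assert.h>
#include <string.h>
#include <linux/perf_event.h>

/*
 * Hypothetical helper (not part of the posted patch): apply the common
 * settings that the patch repeats for pea_llc_miss and pea_llc_access.
 * The caller is still expected to set the event-specific config field.
 */
static void init_llc_perf_event_attr(struct perf_event_attr *pea)
{
	assert(pea != NULL);

	/* Assumption: start from a zeroed structure. */
	memset(pea, 0, sizeof(*pea));

	pea->type = PERF_TYPE_HARDWARE;
	pea->size = sizeof(struct perf_event_attr);
	pea->read_format = PERF_FORMAT_GROUP;
	pea->exclude_kernel = 1;
	pea->exclude_hv = 1;
	pea->exclude_idle = 1;
	pea->exclude_callchain_kernel = 1;
	pea->inherit = 1;
	pea->exclude_guest = 1;
	pea->disabled = 1;
}
```

With such a helper, initialize_perf_event_attr() would reduce to one call per attribute, and the attributes could be locals or members of a structure instead of file-scope globals.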

ok.

Thank you

Reinette

Sai Praneeth Prakhya

8:55 p.m.

New subject: [PATCH V1 10/13] selftests/resctrl: Change Cache Allocation Technology (CAT) test

Hi Reinette,

On Wed, 2020-03-11 at 13:22 -0700, Reinette Chatre wrote:

...

Hi Sai,

On 3/11/2020 12:14 PM, Sai Praneeth Prakhya wrote:

...
On Wed, 2020-03-11 at 10:03 -0700, Reinette Chatre wrote:

...
On 3/10/2020 6:59 PM, Sai Praneeth Prakhya wrote:

...
On Tue, 2020-03-10 at 15:14 -0700, Reinette Chatre wrote:

...
Hi Sai,

[SNIP]

...

...
...
...
Please let me know if you think otherwise

I think this patch can be split up into logical changes without breaking the tests along the way. In my original review I identified two changes that can be split out. Other things that can be split out:

have CAT test take shareable bits into account

enable measurement of cache references (addition of this new perf event attribute, hooks to get measurements, etc.)

transition CAT test to use "perf rate" measurement instead of "perf count"

etc.

I think we could split the patch like this but I am unable to see the benefit of doing so.. (Sorry! if I misunderstood what you meant).

Separating patches into logical changes facilitates review. Please consider this huge patch from the reviewer's perspective: it consists of many different changes and is hard to review. If instead this patch were split into logical changes, it would be easier to understand what it is trying to do/change.

Ok.. makes sense.

...

This is not a request that I invented; it is part of the established kernel development process. Please see Documentation/process/submitting-patches.rst (section is titled "Separate your changes").

Sure! will take a look at it.

...

...
As the CAT and CQM test cases are buggy (CAT is not testing CAT at all) and we are not attempting to fix them by incremental changes but are completely changing the test plan itself (i.e. the way the test works), why not just remove the older test cases and add new tests? I thought this might be easier to review, i.e. to see the new test case all at once. Don't you think so?

From what I understand the new test continues to use many parts of the original test. Completely removing the original test would thus mean adding back a lot of code that was just removed. Incremental changes do seem appropriate to me. The logical changes I listed above actually have nothing to do with "the way the test works". When those building blocks are in place the test can be changed in one patch and it would be much more obvious how the new test is different from the original.

Ok.. makes sense. Will split the patch as you suggested.

Regards, Sai

Sai Praneeth Prakhya

7 Mar

3:40 a.m.

New subject: [PATCH V1 11/13] selftests/resctrl: Change Cache Quality Monitoring (CQM) test

The present CQM test runs fill_buff continuously with some user specified buffer size and reads cqm_llc_occupancy every 1 second and tests if resctrl reported value is in 15% range of buffer that fill_buff is working on. If the difference is greater than 15% the test fails. This test assumes that the buffer fill_buff is working on will be identity mapped into cache from memory i.e. there won't be any overlap. But that might not always be true because of the way cache indexing works (two physical addresses could get indexed into the same cache line). If this happens, cqm_llc_occupancy will be less than buffer size and we cannot guarantee the percentage by which this might be less. Another issue with the test case is that, although it has 15% of guard band, the cache occupied by code (or other parts) of the process may not be within this range. While we are actively looking into approximating llc_occupancy through perf, fix this test case with the help of CAT.
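
To illustrate the indexing concern, here is a minimal sketch assuming a simple cache in which the set index is the line address modulo the number of sets. The function names and the geometry used in the usage note are assumptions for illustration only; real last-level caches often hash addresses across slices, so this is a simplification and not a model of any particular CPU.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified model (assumption): set index = (address / line size) modulo
 * the number of sets. line_size and num_sets are supplied by the caller.
 */
static uint64_t cache_set_index(uint64_t paddr, uint64_t line_size,
				uint64_t num_sets)
{
	assert(line_size > 0 && num_sets > 0);

	return (paddr / line_size) % num_sets;
}

/* Two addresses compete for the same set when their indexes match. */
static int same_cache_set(uint64_t a, uint64_t b, uint64_t line_size,
			  uint64_t num_sets)
{
	return cache_set_index(a, line_size, num_sets) ==
	       cache_set_index(b, line_size, num_sets);
}
```

With 64-byte lines and 2048 sets, for example, addresses 0 and 64 * 2048 land in the same set, so a buffer smaller than the cache can still evict parts of itself once the associativity of a set is exceeded.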

The new CQM test runs fill_buff continuously with a buffer size that is much greater than cache size and uses CAT to change schemata (from 1 bit to max_bits available without shareable bits). For every change in schemata, it then averages cqm_llc_occupancy and checks if it is less than allocated cache size (with 5% guard band). If the average cqm_llc_occupancy is less than allocated cache size, the test passes. Please note that there is no lower bound on the expected cqm_llc_occupancy because presently that cannot be determined.

Note: The new test case assumes that
1. The system supports CAT
2. CAT is working as expected on the system
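
The pass criterion described above can be sketched in isolation as follows. This is an illustrative approximation, not the code from the patch: the helper names are hypothetical, and integer arithmetic is used where the patch uses float. As in the patch, the first sample of each allocation is discarded and only an upper bound is checked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DIFF_PERCENT 5

/* Average the samples of one allocation, skipping the first run. */
static unsigned long avg_skip_first(const unsigned long *samples,
				    size_t n_runs)
{
	unsigned long sum = 0;
	size_t i;

	assert(samples != NULL && n_runs > 1);

	for (i = 1; i < n_runs; i++)
		sum += samples[i];

	return sum / (n_runs - 1);
}

/* Bytes of cache covered by n_bits out of total_bits of the bitmask. */
static unsigned long allocated_bytes(unsigned long cache_size,
				     unsigned int n_bits,
				     unsigned int total_bits)
{
	assert(total_bits > 0 && n_bits <= total_bits);

	return (unsigned long)((unsigned long long)cache_size * n_bits /
			       total_bits);
}

/* Pass when the average occupancy is at most alloc plus the guard band. */
static bool occupancy_within_limit(unsigned long avg, unsigned long alloc)
{
	return avg <= alloc + alloc * MAX_DIFF_PERCENT / 100;
}
```

Because there is no lower bound, an average occupancy of zero would also pass in this sketch, which matches the limitation stated in the description.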

Reported-by: Reinette Chatre reinette.chatre@intel.com
Suggested-by: Tony Luck tony.luck@intel.com
Co-developed-by: Fenghua Yu fenghua.yu@intel.com
Signed-off-by: Fenghua Yu fenghua.yu@intel.com
Signed-off-by: Sai Praneeth Prakhya sai.praneeth.prakhya@intel.com
---
 tools/testing/selftests/resctrl/cache.c         |   4 +
 tools/testing/selftests/resctrl/cqm_test.c      | 203 +++++++++++++++---------
 tools/testing/selftests/resctrl/resctrl.h       |   3 +-
 tools/testing/selftests/resctrl/resctrl_tests.c |   6 +-
 tools/testing/selftests/resctrl/resctrl_val.c   |  22 +--
 tools/testing/selftests/resctrl/resctrlfs.c     |   6 +-
 6 files changed, 143 insertions(+), 101 deletions(-)

diff --git a/tools/testing/selftests/resctrl/cache.c b/tools/testing/selftests/resctrl/cache.c
index e30cdd7b851c..ca794ad6fcfc 100644
--- a/tools/testing/selftests/resctrl/cache.c
+++ b/tools/testing/selftests/resctrl/cache.c
@@ -224,11 +224,15 @@ int measure_cache_vals(struct resctrl_val_param *param, int bm_pid)

 	/* Measure llc occupancy from resctrl */
 	if (!strcmp(param->resctrl_val, "cqm")) {
+		/* Sleep for a second so that benchmark gets to run */
+		sleep(1);
+
 		ret = get_llc_occu_resctrl(&llc_occu_resc);
 		if (ret < 0)
 			return ret;
 		llc_value = llc_occu_resc;
 	}
+
 	ret = print_results_cache(param->filename, bm_pid, llc_value);
 	if (ret)
 		return ret;
diff --git a/tools/testing/selftests/resctrl/cqm_test.c b/tools/testing/selftests/resctrl/cqm_test.c
index f27b0363e518..3406c04ff110 100644
--- a/tools/testing/selftests/resctrl/cqm_test.c
+++ b/tools/testing/selftests/resctrl/cqm_test.c
@@ -13,73 +13,74 @@

 #define RESULT_FILE_NAME	"result_cqm"
 #define NUM_OF_RUNS		5
-#define MAX_DIFF		2000000
-#define MAX_DIFF_PERCENT	15
+#define MAX_DIFF_PERCENT	5

-int count_of_bits;
 char cbm_mask[256];
-unsigned long long_mask;
-unsigned long cache_size;

-static int cqm_setup(struct resctrl_val_param *p)
-{
-	/* Run NUM_OF_RUNS times */
-	if (p->num_of_runs >= NUM_OF_RUNS)
-		return -1;
-
-	p->num_of_runs++;
-
-	return 0;
-}
+static int count_of_bits, count_of_shareable_bits;
+static unsigned long cache_size, shareable_mask;

-static void show_cache_info(unsigned long sum_llc_occu_resc, int no_of_bits,
-			    unsigned long span)
+static void show_cache_info(unsigned long *llc_occu_resc)
 {
-	unsigned long avg_llc_occu_resc = 0;
-	float diff_percent;
-	long avg_diff = 0;
-	bool res;
+	int bits, runs;
+	bool failed = false;

-	avg_llc_occu_resc = sum_llc_occu_resc / (NUM_OF_RUNS - 1);
-	avg_diff = (long)abs(span - avg_llc_occu_resc);
-
-	diff_percent = (((float)span - avg_llc_occu_resc) / span) * 100;
+	printf("# Results are displayed in (Bytes)\n");

-	if ((abs((int)diff_percent) <= MAX_DIFF_PERCENT) ||
-	    (abs(avg_diff) <= MAX_DIFF))
-		res = true;
-	else
-		res = false;
+	for (bits = 0; bits < count_of_bits - count_of_shareable_bits; bits++) {
+		unsigned long avg_llc_occu_resc, sum_llc_occu_resc = 0;
+		unsigned long alloc_llc, mask = 0;
+		int llc_occu_diff, i;
+
+		/*
+		 * The first run is discarded due to inaccurate value from
+		 * phase transition.
+		 */
+		for (runs = NUM_OF_RUNS * bits + 1;
+		     runs < NUM_OF_RUNS * bits + NUM_OF_RUNS; runs++)
+			sum_llc_occu_resc += llc_occu_resc[runs];
+
+		avg_llc_occu_resc = sum_llc_occu_resc / (NUM_OF_RUNS - 1);
+		alloc_llc = cache_size * ((float)(bits + 1) / count_of_bits);
+		llc_occu_diff = avg_llc_occu_resc - alloc_llc;
+
+		for (i = 0; i < bits + 1; i++) {
+			mask <<= 1;
+			mask |= 1;
+		}

-	printf("%sok CQM: diff within %d, %d%%\n", res ? "" : "not ",
-	       MAX_DIFF, (int)MAX_DIFF_PERCENT);
+		if (llc_occu_diff > 0 &&
+		    llc_occu_diff > alloc_llc * ((float)MAX_DIFF_PERCENT / 100))
+			failed = true;

-	printf("# diff: %ld\n", avg_diff);
-	printf("# percent diff=%d\n", abs((int)diff_percent));
-	printf("# Results are displayed in (Bytes)\n");
-	printf("# Number of bits: %d\n", no_of_bits);
-	printf("# Avg_llc_occu_resc: %lu\n", avg_llc_occu_resc);
-	printf("# llc_occu_exp (span): %lu\n", span);
+		printf("%sok CQM: diff within %d%% for mask %lx\n",
+		       failed ? "not " : "", MAX_DIFF_PERCENT, mask);
+		printf("# alloc_llc_cache_size: %lu\n", alloc_llc);
+		printf("# avg_llc_occu_resc: %lu\n", avg_llc_occu_resc);
+		tests_run++;
+	}

+	printf("%sok schemata change for CQM%s\n", failed ? "not " : "",
+	       failed ? " # at least one test failed" : "");
 	tests_run++;
 }

-static int check_results(struct resctrl_val_param *param, int no_of_bits)
+static int check_results(void)
 {
-	char *token_array[8], temp[512];
-	unsigned long sum_llc_occu_resc = 0;
-	int runs = 0;
+	char *token_array[8], output[] = RESULT_FILE_NAME, temp[512];
+	unsigned long llc_occu_resc[count_of_bits * NUM_OF_RUNS];
+	int runs;
 	FILE *fp;

-	printf("# checking for pass/fail\n");
-	fp = fopen(param->filename, "r");
+	fp = fopen(output, "r");
 	if (!fp) {
-		perror("# Error in opening file\n");
+		perror(output);

 		return errno;
 	}

-	while (fgets(temp, 1024, fp)) {
+	runs = 0;
+	while (fgets(temp, sizeof(temp), fp)) {
 		char *token = strtok(temp, ":\t");
 		int fields = 0;

@@ -88,13 +89,14 @@ static int check_results(struct resctrl_val_param *param, int no_of_bits)
 			token = strtok(NULL, ":\t");
 		}

-		/* Field 3 is llc occ resc value */
-		if (runs > 0)
-			sum_llc_occu_resc += strtoul(token_array[3], NULL, 0);
+		/* Field 3 is resctrl LLC occupancy value */
+		llc_occu_resc[runs] = strtoul(token_array[3], NULL, 0);
 		runs++;
 	}
+
 	fclose(fp);
-	show_cache_info(sum_llc_occu_resc, no_of_bits, param->span);
+
+	show_cache_info(llc_occu_resc);

 	return 0;
 }
@@ -104,62 +106,107 @@ void cqm_test_cleanup(void)
 	remove(RESULT_FILE_NAME);
 }

-int cqm_resctrl_val(int cpu_no, int n, char **benchmark_cmd)
+/*
+ * Change schemata from 1 to count_of_bits - 1. Write schemata to specified
+ * con_mon grp, mon_grp in resctrl FS. For each allocation, run "NUM_OF_RUNS"
+ * times to get average values.
+ */
+static int cqm_setup(struct resctrl_val_param *p)
 {
-	int ret, mum_resctrlfs;
+	static int runs_per_allocation = 0, num_of_bits = 1;
+	unsigned long mask = 0;
+	char schemata[64];
+	int i, ret;

-	cache_size = 0;
-	mum_resctrlfs = 1;
+	if (runs_per_allocation >= NUM_OF_RUNS)
+		runs_per_allocation = 0;

-	ret = remount_resctrlfs(mum_resctrlfs);
-	if (ret)
-		return ret;
+	/* Only set up schemata once every NUM_OF_RUNS of allocations */
+	if (runs_per_allocation++ != 0)
+		return 0;

-	if (!validate_resctrl_feature_request("cqm"))
+	if (num_of_bits > count_of_bits - count_of_shareable_bits)
 		return -1;

-	ret = get_cbm_mask("L3");
-	if (ret)
-		return ret;
-
-	long_mask = strtoul(cbm_mask, NULL, 16);
+	/* Prepare cbm mask without any shareable bits */
+	for (i = 0; i < num_of_bits; i++) {
+		mask <<= 1;
+		mask |= 1;
+	}
+	mask = ~shareable_mask & mask;

-	ret = get_cache_size(cpu_no, "L3", &cache_size);
+	sprintf(schemata, "%lx", mask);
+	ret = write_schemata(p->ctrlgrp, schemata, p->cpu_no, "cat");
 	if (ret)
 		return ret;
-	printf("cache size :%lu\n", cache_size);

-	count_of_bits = count_bits(long_mask);
+	p->mask = mask;
+	num_of_bits++;

-	if (n < 1 || n > count_of_bits) {
-		printf("Invalid input value for numbr_of_bits n!\n");
-		printf("Please Enter value in range 1 to %d\n", count_of_bits);
-		return -1;
-	}
+	return 0;
+}

+int cqm_schemata_change(int cpu_no, int span, char *cache_type,
+			char **benchmark_cmd)
+{
 	struct resctrl_val_param param = {
 		.resctrl_val	= "cqm",
 		.ctrlgrp	= "c1",
 		.mongrp		= "m1",
 		.cpu_no		= cpu_no,
+		.span		= span,
 		.mum_resctrlfs	= 0,
 		.filename	= RESULT_FILE_NAME,
-		.mask		= ~(long_mask << n) & long_mask,
-		.span		= cache_size * n / count_of_bits,
 		.num_of_runs	= 0,
-		.setup		= cqm_setup,
+		.setup		= cqm_setup
 	};
+	int ret;
+	char schemata[64];
+	unsigned long long_mask;

-	if (strcmp(benchmark_cmd[0], "fill_buf") == 0)
-		sprintf(benchmark_cmd[1], "%lu", param.span);
+	ret = remount_resctrlfs(1);
+	if (ret)
+		return ret;

-	remove(RESULT_FILE_NAME);
+	/* Check for both 'cat' and 'cqm' because CQM is validated using CAT */
+	if (!validate_resctrl_feature_request("cqm"))
+		return -1;
+
+	if (!validate_resctrl_feature_request("cat"))
+		return -1;
+
+	ret = get_cache_size(cpu_no, cache_type, &cache_size);
+	if (ret)
+		return ret;
+	printf("# cache size: %lu\n", cache_size);
+
+	ret = get_cbm_mask(cache_type);
+	if (ret)
+		return ret;
+
+	long_mask = strtoul(cbm_mask, NULL, 16);
+	count_of_bits = count_bits(long_mask);
+
+	/*
+	 * Change root con_mon grp schemata to minimum (i.e. '1' bit) so that
+	 * test could use all other bits
+	 */
+	sprintf(schemata, "%x", 1);
+	ret = write_schemata("", schemata, cpu_no, "cqm");
+	if (ret)
+		return ret;
+
+	ret = get_shareable_mask(cache_type, &shareable_mask);
+	if (ret)
+		return ret;
+
+	count_of_shareable_bits = count_bits(shareable_mask);

 	ret = resctrl_val(benchmark_cmd, &param);
 	if (ret)
 		return ret;

-	ret = check_results(&param, n);
+	ret = check_results();
 	if (ret)
 		return ret;

diff --git a/tools/testing/selftests/resctrl/resctrl.h b/tools/testing/selftests/resctrl/resctrl.h
index 79148cbbd7a4..cb67ad689475 100644
--- a/tools/testing/selftests/resctrl/resctrl.h
+++ b/tools/testing/selftests/resctrl/resctrl.h
@@ -103,7 +103,8 @@ int setup_critical_process(pid_t pid, struct resctrl_val_param *param);
 int run_critical_process(pid_t pid, struct resctrl_val_param *param);
 void cat_test_cleanup(void);
 int cat_perf_miss_val(int cpu_no, int no_of_bits, char *cache_type);
-int cqm_resctrl_val(int cpu_no, int n, char **benchmark_cmd);
+int cqm_schemata_change(int cpu_no, int span, char *cache_type,
+			char **benchmark_cmd);
 unsigned int count_bits(unsigned long n);
 void cqm_test_cleanup(void);
 int get_core_sibling(int cpu_no);
diff --git a/tools/testing/selftests/resctrl/resctrl_tests.c b/tools/testing/selftests/resctrl/resctrl_tests.c
index 60db128312a6..3c408c636b6d 100644
--- a/tools/testing/selftests/resctrl/resctrl_tests.c
+++ b/tools/testing/selftests/resctrl/resctrl_tests.c
@@ -180,9 +180,11 @@ int main(int argc, char **argv)

 	if (cqm_test) {
 		printf("# Starting CQM test ...\n");
-		if (!has_ben)
+		if (!has_ben) {
+			sprintf(benchmark_cmd[1], "%d", span);
 			sprintf(benchmark_cmd[5], "%s", "cqm");
-		res = cqm_resctrl_val(cpu_no, no_of_bits, benchmark_cmd);
+		}
+		res = cqm_schemata_change(cpu_no, span, "L3", benchmark_cmd);
 		printf("%sok CQM: test\n", res ? "not " : "");
 		cqm_test_cleanup();
 		tests_run++;
diff --git a/tools/testing/selftests/resctrl/resctrl_val.c b/tools/testing/selftests/resctrl/resctrl_val.c
index 271cb5c976f5..c59fad6cb9b0 100644
--- a/tools/testing/selftests/resctrl/resctrl_val.c
+++ b/tools/testing/selftests/resctrl/resctrl_val.c
@@ -705,29 +705,21 @@ int resctrl_val(char **benchmark_cmd, struct resctrl_val_param *param)
 		goto out;
 	}

-	/* Give benchmark enough time to fully run */
-	sleep(1);
-
 	/* Test runs until the callback setup() tells the test to stop. */
 	while (1) {
+		ret = param->setup(param);
+		if (ret) {
+			ret = 0;
+			break;
+		}
+
+		/* Measure vals sleeps for a second */
 		if ((strcmp(resctrl_val, "mbm") == 0) ||
 		    (strcmp(resctrl_val, "mba") == 0)) {
-			ret = param->setup(param);
-			if (ret) {
-				ret = 0;
-				break;
-			}
-
 			ret = measure_vals(param, &bw_resc_start);
 			if (ret)
 				break;
 		} else if (strcmp(resctrl_val, "cqm") == 0) {
-			ret = param->setup(param);
-			if (ret) {
-				ret = 0;
-				break;
-			}
-			sleep(1);
 			ret = measure_cache_vals(param, bm_pid);
 			if (ret)
 				break;
diff --git a/tools/testing/selftests/resctrl/resctrlfs.c b/tools/testing/selftests/resctrl/resctrlfs.c
index 52452bb0178a..bd81a13ff9df 100644
--- a/tools/testing/selftests/resctrl/resctrlfs.c
+++ b/tools/testing/selftests/resctrl/resctrlfs.c
@@ -365,11 +365,7 @@ void run_benchmark(int signum, siginfo_t *info, void *ucontext)
 	memflush = atoi(benchmark_cmd[3]);
 	operation = atoi(benchmark_cmd[4]);
 	sprintf(resctrl_val, "%s", benchmark_cmd[5]);
-
-	if (strcmp(resctrl_val, "cqm") != 0)
-		buffer_span = span * MB;
-	else
-		buffer_span = span;
+	buffer_span = span * MB;

 	if (run_fill_buf(buffer_span, malloc_and_init_memory, memflush,
 			 operation, resctrl_val))

--
2.7.4

Reinette Chatre

10 Mar

10:18 p.m.

New subject: [PATCH V1 11/13] selftests/resctrl: Change Cache Quality Monitoring (CQM) test

Hi Sai,

On 3/6/2020 7:40 PM, Sai Praneeth Prakhya wrote:

...

The present CQM test runs fill_buff continuously with some user specified

Should this not be referred to as Cache Monitoring Technology (CMT) instead? There is significant usage of "cqm" throughout this series of tests, and it creates confusion because it is not the accurate acronym for the feature, unlike cat, mbm, and mba. From what I can tell AMD does not refer to it as CQM either.

I understand the resctrl code uses cqm internally, which may be for historical reasons, but that usage seems to be limited to the code itself and does not leak to the user as is done here.

...

buffer size and reads cqm_llc_occupancy every 1 second and tests if resctrl reported value is in 15% range of buffer that fill_buff is working on. If the difference is greater than 15% the test fails. This test assumes that the buffer fill_buff is working on will be identity mapped into cache from memory i.e. there won't be any overlap. But that might not always be true because of the way cache indexing works (two physical addresses could get indexed into the same cache line). If this happens, cqm_llc_occupancy will be less than buffer size and we cannot guarantee the percentage by which this might be less. Another issue with the test case is that, although it has 15% of guard band, the cache occupied by code (or other parts) of the process may not be within this range. While we are actively looking into approximating llc_occupancy through perf, fix this test case with the help of CAT.

The new CQM test runs fill_buff continuously with a buffer size that is much greater than cache size and uses CAT to change schemata (from 1 bit to max_bits available without shareable bits). For every change in schemata, it then averages cqm_llc_occupancy and checks if it is less than allocated cache size (with 5% guard band). If the average cqm_llc_occupancy is less than allocated cache size, the test passes. Please note that there is no lower bound on the expected cqm_llc_occupancy because presently that cannot be determined.

Note: The new test case assumes that

The system supports CAT

CAT is working as expected on the system

Reported-by: Reinette Chatre reinette.chatre@intel.com
Suggested-by: Tony Luck tony.luck@intel.com
Co-developed-by: Fenghua Yu fenghua.yu@intel.com
Signed-off-by: Fenghua Yu fenghua.yu@intel.com
Signed-off-by: Sai Praneeth Prakhya sai.praneeth.prakhya@intel.com

 tools/testing/selftests/resctrl/cache.c         |   4 +
 tools/testing/selftests/resctrl/cqm_test.c      | 203 +++++++++++++++---------
 tools/testing/selftests/resctrl/resctrl.h       |   3 +-
 tools/testing/selftests/resctrl/resctrl_tests.c |   6 +-
 tools/testing/selftests/resctrl/resctrl_val.c   |  22 +--
 tools/testing/selftests/resctrl/resctrlfs.c     |   6 +-
 6 files changed, 143 insertions(+), 101 deletions(-)

diff --git a/tools/testing/selftests/resctrl/cache.c b/tools/testing/selftests/resctrl/cache.c index e30cdd7b851c..ca794ad6fcfc 100644 --- a/tools/testing/selftests/resctrl/cache.c +++ b/tools/testing/selftests/resctrl/cache.c @@ -224,11 +224,15 @@ int measure_cache_vals(struct resctrl_val_param *param, int bm_pid) /* Measure llc occupancy from resctrl */ if (!strcmp(param->resctrl_val, "cqm")) {
/* Sleep for a second so that benchmark gets to run */
sleep(1);
ret = get_llc_occu_resctrl(&llc_occu_resc); if (ret < 0) return ret; llc_value = llc_occu_resc; }

Extra newline here

...

ret = print_results_cache(param->filename, bm_pid, llc_value); if (ret) return ret; diff --git a/tools/testing/selftests/resctrl/cqm_test.c b/tools/testing/selftests/resctrl/cqm_test.c index f27b0363e518..3406c04ff110 100644 --- a/tools/testing/selftests/resctrl/cqm_test.c +++ b/tools/testing/selftests/resctrl/cqm_test.c @@ -13,73 +13,74 @@ #define RESULT_FILE_NAME "result_cqm" #define NUM_OF_RUNS 5 -#define MAX_DIFF 2000000 -#define MAX_DIFF_PERCENT 15 +#define MAX_DIFF_PERCENT 5 -int count_of_bits; char cbm_mask[256]; -unsigned long long_mask; -unsigned long cache_size; -static int cqm_setup(struct resctrl_val_param *p) -{
/* Run NUM_OF_RUNS times */

if (p->num_of_runs >= NUM_OF_RUNS)
return -1;
p->num_of_runs++;

return 0;
-} +static int count_of_bits, count_of_shareable_bits; +static unsigned long cache_size, shareable_mask; -static void show_cache_info(unsigned long sum_llc_occu_resc, int no_of_bits,
	    unsigned long span)
+static void show_cache_info(unsigned long *llc_occu_resc) {

unsigned long avg_llc_occu_resc = 0;

float diff_percent;

long avg_diff = 0;

bool res;

int bits, runs;

bool failed = false;

avg_llc_occu_resc = sum_llc_occu_resc / (NUM_OF_RUNS - 1);

avg_diff = (long)abs(span - avg_llc_occu_resc);

diff_percent = (((float)span - avg_llc_occu_resc) / span) * 100;

printf("# Results are displayed in (Bytes)\n");

Why are the parentheses used in "(Bytes)"? Printing the numbers in parentheses does not add value ... but it does not seem to be done either. Perhaps just "Results are displayed in bytes"?

...

if ((abs((int)diff_percent) <= MAX_DIFF_PERCENT) ||
   (abs(avg_diff) <= MAX_DIFF))
res = true;
else
res = false;
for (bits = 0; bits < count_of_bits - count_of_shareable_bits; bits++) {
unsigned long avg_llc_occu_resc, sum_llc_occu_resc = 0;
unsigned long alloc_llc, mask = 0;
int llc_occu_diff, i;
/*
 * The first run is discarded due to inaccurate value from
 * phase transition.
 */
for (runs = NUM_OF_RUNS * bits + 1;
     runs < NUM_OF_RUNS * bits + NUM_OF_RUNS; runs++)
	sum_llc_occu_resc += llc_occu_resc[runs];
avg_llc_occu_resc = sum_llc_occu_resc / (NUM_OF_RUNS - 1);
alloc_llc = cache_size * ((float)(bits + 1) / count_of_bits);
llc_occu_diff = avg_llc_occu_resc - alloc_llc;
for (i = 0; i < bits + 1; i++) {
	mask <<= 1;
	mask |= 1;
}
printf("%sok CQM: diff within %d, %d%%\n", res ? "" : "not ",
      MAX_DIFF, (int)MAX_DIFF_PERCENT);
if (llc_occu_diff > 0 &&
    llc_occu_diff > alloc_llc * ((float)MAX_DIFF_PERCENT / 100))
	failed = true;
printf("# diff: %ld\n", avg_diff);

printf("# percent diff=%d\n", abs((int)diff_percent));

printf("# Results are displayed in (Bytes)\n");

printf("# Number of bits: %d\n", no_of_bits);

printf("# Avg_llc_occu_resc: %lu\n", avg_llc_occu_resc);

printf("# llc_occu_exp (span): %lu\n", span);
printf("%sok CQM: diff within %d%% for mask %lx\n",
       failed ? "not " : "", MAX_DIFF_PERCENT, mask);
printf("# alloc_llc_cache_size: %lu\n", alloc_llc);
printf("# avg_llc_occu_resc: %lu\n", avg_llc_occu_resc);
tests_run++;
}
printf("%sok schemata change for CQM%s\n", failed ? "not " : "",
      failed ? " # at least one test failed" : "");
tests_run++;
} -static int check_results(struct resctrl_val_param *param, int no_of_bits) +static int check_results(void) {

char *token_array[8], temp[512];

unsigned long sum_llc_occu_resc = 0;

int runs = 0;

char *token_array[8], output[] = RESULT_FILE_NAME, temp[512];

unsigned long llc_occu_resc[count_of_bits * NUM_OF_RUNS];

int runs; FILE *fp;

printf("# checking for pass/fail\n");

fp = fopen(param->filename, "r");

fp = fopen(output, "r"); if (!fp) {
perror("# Error in opening file\n");
perror(output);
return errno; }

while (fgets(temp, 1024, fp)) {

runs = 0;

while (fgets(temp, sizeof(temp), fp)) { char *token = strtok(temp, ":\t"); int fields = 0;

@@ -88,13 +89,14 @@ static int check_results(struct resctrl_val_param *param, int no_of_bits) token = strtok(NULL, ":\t"); }
/* Field 3 is llc occ resc value */
if (runs > 0)
	sum_llc_occu_resc += strtoul(token_array[3], NULL, 0);
/* Field 3 is resctrl LLC occupancy value */
llc_occu_resc[runs] = strtoul(token_array[3], NULL, 0);
runs++; }
fclose(fp);
show_cache_info(sum_llc_occu_resc, no_of_bits, param->span);

show_cache_info(llc_occu_resc);

return 0; } @@ -104,62 +106,107 @@ void cqm_test_cleanup(void) remove(RESULT_FILE_NAME); } -int cqm_resctrl_val(int cpu_no, int n, char **benchmark_cmd) +/*

Change schemata from 1 to count_of_bits - 1. Write schemata to specified

con_mon grp, mon_grp in resctrl FS. For each allocation, run "NUM_OF_RUNS"

times to get average values.

*/

+static int cqm_setup(struct resctrl_val_param *p) {

int ret, mum_resctrlfs;

static int runs_per_allocation = 0, num_of_bits = 1;

unsigned long mask = 0;

char schemata[64];

int i, ret;

cache_size = 0;

mum_resctrlfs = 1;
if (runs_per_allocation >= NUM_OF_RUNS)
runs_per_allocation = 0;
ret = remount_resctrlfs(mum_resctrlfs);

if (ret)
return ret;
/* Only set up schemata once every NUM_OF_RUNS of allocations */

if (runs_per_allocation++ != 0)
return 0;
if (!validate_resctrl_feature_request("cqm"))

if (num_of_bits > count_of_bits - count_of_shareable_bits) return -1;
ret = get_cbm_mask("L3");

if (ret)
return ret;
long_mask = strtoul(cbm_mask, NULL, 16);
/* Prepare cbm mask without any shareable bits */

for (i = 0; i < num_of_bits; i++) {
mask <<= 1;
mask |= 1;
}

mask = ~shareable_mask & mask;

If I understand correctly this function assumes that the shareable bits will also be the high order bits of the schemata. I do not believe that this is part of a spec. It also does not seem as though the code follows what the comment at the top of the function states. The comment states "Change schemata from 1 to count_of_bits - 1" while the code seems to change schemata from 1 to count_of_bits - count_of_shareable_bits ...
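
The concern can be made concrete with a small sketch. It mirrors the mask computation in cqm_setup() (n low-order bits, then clearing the shareable bits), but the function names are hypothetical and the shareable masks used in the usage note (0xc00 and 0x3) are made-up values, not taken from any real system.

```c
#include <assert.h>
#include <limits.h>

/* Build a mask with the n low-order bits set, as the loop in the patch does. */
static unsigned long low_bits_mask(unsigned int n)
{
	assert(n < sizeof(unsigned long) * CHAR_BIT);

	return (1UL << n) - 1;
}

/* Mirror of "mask = ~shareable_mask & mask" from the patch. */
static unsigned long cqm_test_mask(unsigned int n,
				   unsigned long shareable_mask)
{
	return low_bits_mask(n) & ~shareable_mask;
}
```

With shareable bits in the high-order positions (0xc00), cqm_test_mask(1, 0xc00) gives 0x1 as intended. With shareable bits in the low-order positions (0x3), cqm_test_mask(1, 0x3) and cqm_test_mask(2, 0x3) both give 0, an empty mask, and cqm_test_mask(3, 0x3) gives 0x4, which covers one bit instead of three.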

...

ret = get_cache_size(cpu_no, "L3", &cache_size);

sprintf(schemata, "%lx", mask);

ret = write_schemata(p->ctrlgrp, schemata, p->cpu_no, "cat"); if (ret) return ret;

printf("cache size :%lu\n", cache_size);

count_of_bits = count_bits(long_mask);

p->mask = mask;

num_of_bits++;
if (n < 1 || n > count_of_bits) {
printf("Invalid input value for numbr_of_bits n!\n");
printf("Please Enter value in range 1 to %d\n", count_of_bits);
return -1;
}
return 0;

+} +int cqm_schemata_change(int cpu_no, int span, char *cache_type,
	char **benchmark_cmd)
+{ struct resctrl_val_param param = { .resctrl_val = "cqm", .ctrlgrp = "c1", .mongrp = "m1", .cpu_no = cpu_no,
.span		= span,

This function received the new function parameter "span" to be used here ... I am having trouble finding where this member is used within this test. Could you please help me navigate to this?

...

.mum_resctrlfs	= 0,
.filename	= RESULT_FILE_NAME,
.mask		= ~(long_mask << n) & long_mask,
.span		= cache_size * n / count_of_bits,
.num_of_runs = 0,
.setup		= cqm_setup,
.setup		= cqm_setup
};
int ret;

char schemata[64];

unsigned long long_mask;
if (strcmp(benchmark_cmd[0], "fill_buf") == 0)
sprintf(benchmark_cmd[1], "%lu", param.span);
ret = remount_resctrlfs(1);

if (ret)
return ret;

Here resctrl is remounted, followed by some changes to the root group's schemata. That is followed by a call to resctrl_val(), which attempts to remount resctrl again and will undo all the configuration done in between.

...

remove(RESULT_FILE_NAME);
/* Check for both 'cat' and 'cqm' because CQM is validated using CAT */

if (!validate_resctrl_feature_request("cqm"))
return -1;
if (!validate_resctrl_feature_request("cat"))
return -1;
ret = get_cache_size(cpu_no, cache_type, &cache_size);

if (ret)
return ret;
printf("# cache size: %lu\n", cache_size);

ret = get_cbm_mask(cache_type);

if (ret)
return ret;
long_mask = strtoul(cbm_mask, NULL, 16);

count_of_bits = count_bits(long_mask);

/*
* Change root con_mon grp schemata to minimum (i.e. '1' bit) so that
* test could use all other bits
*/
sprintf(schemata, "%x", 1);

ret = write_schemata("", schemata, cpu_no, "cqm");

if (ret)
return ret;

... here the schemata is written to resctrl

...

ret = get_shareable_mask(cache_type, &shareable_mask);

if (ret)
return ret;
count_of_shareable_bits = count_bits(shareable_mask);
ret = resctrl_val(benchmark_cmd, &param); if (ret) return ret;

here is the call to resctrl_val() that attempts to remount resctrl.

...

ret = check_results(&param, n);

ret = check_results(); if (ret) return ret;

diff --git a/tools/testing/selftests/resctrl/resctrl.h b/tools/testing/selftests/resctrl/resctrl.h index 79148cbbd7a4..cb67ad689475 100644 --- a/tools/testing/selftests/resctrl/resctrl.h +++ b/tools/testing/selftests/resctrl/resctrl.h @@ -103,7 +103,8 @@ int setup_critical_process(pid_t pid, struct resctrl_val_param *param); int run_critical_process(pid_t pid, struct resctrl_val_param *param); void cat_test_cleanup(void); int cat_perf_miss_val(int cpu_no, int no_of_bits, char *cache_type); -int cqm_resctrl_val(int cpu_no, int n, char **benchmark_cmd); +int cqm_schemata_change(int cpu_no, int span, char *cache_type,
	char **benchmark_cmd);
unsigned int count_bits(unsigned long n); void cqm_test_cleanup(void); int get_core_sibling(int cpu_no); diff --git a/tools/testing/selftests/resctrl/resctrl_tests.c b/tools/testing/selftests/resctrl/resctrl_tests.c index 60db128312a6..3c408c636b6d 100644 --- a/tools/testing/selftests/resctrl/resctrl_tests.c +++ b/tools/testing/selftests/resctrl/resctrl_tests.c @@ -180,9 +180,11 @@ int main(int argc, char **argv) if (cqm_test) { printf("# Starting CQM test ...\n");
if (!has_ben)
if (!has_ben) {
	sprintf(benchmark_cmd[1], "%d", span);
sprintf(benchmark_cmd[5], "%s", "cqm");
res = cqm_resctrl_val(cpu_no, no_of_bits, benchmark_cmd);
}
res = cqm_schemata_change(cpu_no, span, "L3", benchmark_cmd);
printf("%sok CQM: test\n", res ? "not " : ""); cqm_test_cleanup(); tests_run++;
diff --git a/tools/testing/selftests/resctrl/resctrl_val.c b/tools/testing/selftests/resctrl/resctrl_val.c index 271cb5c976f5..c59fad6cb9b0 100644 --- a/tools/testing/selftests/resctrl/resctrl_val.c +++ b/tools/testing/selftests/resctrl/resctrl_val.c @@ -705,29 +705,21 @@ int resctrl_val(char **benchmark_cmd, struct resctrl_val_param *param) goto out; }

/* Give benchmark enough time to fully run */

sleep(1);

/* Test runs until the callback setup() tells the test to stop. */ while (1) {
ret = param->setup(param);
if (ret) {
	ret = 0;
	break;
}
/* Measure vals sleeps for a second */
if ((strcmp(resctrl_val, "mbm") == 0) || (strcmp(resctrl_val, "mba") == 0)) {
	ret = param->setup(param);
	if (ret) {
		ret = 0;
		break;
	}
ret = measure_vals(param, &bw_resc_start);
if (ret)
	break;
} else if (strcmp(resctrl_val, "cqm") == 0) {
	ret = param->setup(param);
	if (ret) {
		ret = 0;
		break;
	}
	sleep(1);
ret = measure_cache_vals(param, bm_pid);
if (ret)
	break;

This change affects not just the cache monitoring test. Could this change be extracted into its own patch to be clear what is done here and how it impacts the other tests?

...

diff --git a/tools/testing/selftests/resctrl/resctrlfs.c b/tools/testing/selftests/resctrl/resctrlfs.c index 52452bb0178a..bd81a13ff9df 100644 --- a/tools/testing/selftests/resctrl/resctrlfs.c +++ b/tools/testing/selftests/resctrl/resctrlfs.c @@ -365,11 +365,7 @@ void run_benchmark(int signum, siginfo_t *info, void *ucontext) memflush = atoi(benchmark_cmd[3]); operation = atoi(benchmark_cmd[4]); sprintf(resctrl_val, "%s", benchmark_cmd[5]);
if (strcmp(resctrl_val, "cqm") != 0)
	buffer_span = span * MB;
else
	buffer_span = span;
buffer_span = span * MB;

This change seems to change the buffer_span used by the other tests. It is not obvious why this change is made to other tests while this commit intends to focus on the cache monitoring test. Perhaps this can be split into a separate patch to make this clear?

...

if (run_fill_buf(buffer_span, malloc_and_init_memory, memflush, operation, resctrl_val))

Reinette

Sai Praneeth Prakhya

11 Mar

2:46 a.m.

New subject: [PATCH V1 11/13] selftests/resctrl: Change Cache Quality Monitoring (CQM) test

Hi Reinette,

On Tue, 2020-03-10 at 15:18 -0700, Reinette Chatre wrote:

...

Hi Sai,

On 3/6/2020 7:40 PM, Sai Praneeth Prakhya wrote:

...
The present CQM test runs fill_buff continuously with some user specified

Should this not be referred to as Cache Monitoring Technology (CMT) instead? "cqm" is used extensively throughout this series of tests, which creates confusion because, unlike cat, mbm, and mba, it is not the accurate acronym for the feature. From what I can tell AMD does not refer to it as CQM either.

I understand the resctrl code uses cqm internally, that may be for historical reasons, but that usage seems to be limited to the code itself and not leaking to the user as done here.

OK, makes sense. CQM is just how I was introduced to the feature, and it stuck with me. I will change it to CMT.

...

...
buffer size and reads cqm_llc_occupancy every second and tests if the resctrl reported value is within 15% of the size of the buffer that fill_buff is working on. If the difference is greater than 15%, the test fails. This test assumes that the buffer fill_buff is working on will be identity mapped into cache from memory, i.e. there won't be any overlap. But that might not always be true because of the way cache indexing works (two physical addresses could get indexed into the same cache line). If this happens, cqm_llc_occupancy will be less than the buffer size and we cannot guarantee the percentage by which it might be less. Another issue with the test case is that, although it has a 15% guard band, the cache occupied by code (or other parts) of the process may not be within this range. While we are actively looking into approximating llc_occupancy through perf, fix this test case with the help of CAT.

The new CQM test runs fill_buff continuously with a buffer size that is much greater than cache size and uses CAT to change schemata (from 1 bit to max_bits available without shareable bits). For every change in schemata, it then averages cqm_llc_occupancy and checks if it is less than allocated cache size (with 5% guard band). If the average cqm_llc_occupancy is less than allocated cache size, the test passes. Please note that there is no lower bound on the expected cqm_llc_occupancy because presently that cannot be determined.
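For illustration, the pass/fail rule described in this paragraph can be sketched roughly as follows. This is a minimal sketch and not the code from the patch: the helper names avg_occupancy(), allocated_bytes() and occupancy_ok() are hypothetical, and integer arithmetic is used instead of the float arithmetic in the patch. Only an upper bound is checked, matching the statement that no lower bound can presently be determined.

```c
#include <stdbool.h>

#define NUM_OF_RUNS		5
#define MAX_DIFF_PERCENT	5

/* Average one allocation step's samples, discarding the first sample. */
static unsigned long avg_occupancy(const unsigned long *samples)
{
	unsigned long sum = 0;
	int run;

	for (run = 1; run < NUM_OF_RUNS; run++)
		sum += samples[run];

	return sum / (NUM_OF_RUNS - 1);
}

/* Bytes of cache covered by 'bits' bits of a 'count_of_bits' wide bitmask. */
static unsigned long allocated_bytes(unsigned long cache_size, int bits,
				     int count_of_bits)
{
	return (unsigned long)((unsigned long long)cache_size * bits /
			       count_of_bits);
}

/* Upper bound only: fail when occupancy exceeds the allocation by > 5%. */
static bool occupancy_ok(unsigned long avg, unsigned long alloc)
{
	return avg <= alloc + alloc * MAX_DIFF_PERCENT / 100;
}
```

With the figures from the test log quoted later in this thread (alloc_llc_cache_size 2883584, avg_llc_occu_resc 2973696), the average exceeds the allocation by about 3.1%, which is inside the 5% guard band.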

Note: The new test case assumes that
1. The system supports CAT
2. CAT is working as expected on the system

Reported-by: Reinette Chatre <reinette.chatre@intel.com>
Suggested-by: Tony Luck <tony.luck@intel.com>
Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>

tools/testing/selftests/resctrl/cache.c         |   4 +
tools/testing/selftests/resctrl/cqm_test.c      | 203 +++++++++++++++--
tools/testing/selftests/resctrl/resctrl.h       |   3 +-
tools/testing/selftests/resctrl/resctrl_tests.c |   6 +-
tools/testing/selftests/resctrl/resctrl_val.c   |  22 +--
tools/testing/selftests/resctrl/resctrlfs.c     |   6 +-
6 files changed, 143 insertions(+), 101 deletions(-)

diff --git a/tools/testing/selftests/resctrl/cache.c b/tools/testing/selftests/resctrl/cache.c index e30cdd7b851c..ca794ad6fcfc 100644 --- a/tools/testing/selftests/resctrl/cache.c +++ b/tools/testing/selftests/resctrl/cache.c @@ -224,11 +224,15 @@ int measure_cache_vals(struct resctrl_val_param *param, int bm_pid) /* Measure llc occupancy from resctrl */ if (!strcmp(param->resctrl_val, "cqm")) {
/* Sleep for a second so that benchmark gets to run */
sleep(1);
ret = get_llc_occu_resctrl(&llc_occu_resc); if (ret < 0) return ret; llc_value = llc_occu_resc; }
Extra newline here

Ok. Will remove it.

...

...
ret = print_results_cache(param->filename, bm_pid, llc_value); if (ret) return ret; diff --git a/tools/testing/selftests/resctrl/cqm_test.c b/tools/testing/selftests/resctrl/cqm_test.c index f27b0363e518..3406c04ff110 100644 --- a/tools/testing/selftests/resctrl/cqm_test.c +++ b/tools/testing/selftests/resctrl/cqm_test.c @@ -13,73 +13,74 @@ #define RESULT_FILE_NAME "result_cqm" #define NUM_OF_RUNS 5 -#define MAX_DIFF 2000000 -#define MAX_DIFF_PERCENT 15 +#define MAX_DIFF_PERCENT 5 -int count_of_bits; char cbm_mask[256]; -unsigned long long_mask; -unsigned long cache_size; -static int cqm_setup(struct resctrl_val_param *p) -{
/* Run NUM_OF_RUNS times */

if (p->num_of_runs >= NUM_OF_RUNS)
return -1;
p->num_of_runs++;

return 0;
-} +static int count_of_bits, count_of_shareable_bits; +static unsigned long cache_size, shareable_mask; -static void show_cache_info(unsigned long sum_llc_occu_resc, int no_of_bits,
	    unsigned long span)
+static void show_cache_info(unsigned long *llc_occu_resc) {

unsigned long avg_llc_occu_resc = 0;

float diff_percent;

long avg_diff = 0;

bool res;

int bits, runs;

bool failed = false;

avg_llc_occu_resc = sum_llc_occu_resc / (NUM_OF_RUNS - 1);

avg_diff = (long)abs(span - avg_llc_occu_resc);

diff_percent = (((float)span - avg_llc_occu_resc) / span) * 100;

printf("# Results are displayed in (Bytes)\n");
Why are the parentheses used in "(Bytes)"? Printing the numbers in parentheses does not add value ... but it does not seem to be done either. Perhaps just "Results are displayed in bytes"?

Good catch. Will fix it.

...

...
if ((abs((int)diff_percent) <= MAX_DIFF_PERCENT) ||
   (abs(avg_diff) <= MAX_DIFF))
res = true;
else
res = false;
for (bits = 0; bits < count_of_bits - count_of_shareable_bits; bits++)

{
unsigned long avg_llc_occu_resc, sum_llc_occu_resc = 0;
unsigned long alloc_llc, mask = 0;
int llc_occu_diff, i;
/*
 * The first run is discarded due to inaccurate value from
 * phase transition.
 */
for (runs = NUM_OF_RUNS * bits + 1;
     runs < NUM_OF_RUNS * bits + NUM_OF_RUNS; runs++)
	sum_llc_occu_resc += llc_occu_resc[runs];
avg_llc_occu_resc = sum_llc_occu_resc / (NUM_OF_RUNS - 1);
alloc_llc = cache_size * ((float)(bits + 1) / count_of_bits);
llc_occu_diff = avg_llc_occu_resc - alloc_llc;
for (i = 0; i < bits + 1; i++) {
	mask <<= 1;
	mask |= 1;
}
printf("%sok CQM: diff within %d, %d%%\n", res ? "" : "not ",
      MAX_DIFF, (int)MAX_DIFF_PERCENT);
if (llc_occu_diff > 0 &&
    llc_occu_diff > alloc_llc * ((float)MAX_DIFF_PERCENT /
100))
	failed = true;
printf("# diff: %ld\n", avg_diff);

printf("# percent diff=%d\n", abs((int)diff_percent));

printf("# Results are displayed in (Bytes)\n");

printf("# Number of bits: %d\n", no_of_bits);

printf("# Avg_llc_occu_resc: %lu\n", avg_llc_occu_resc);

printf("# llc_occu_exp (span): %lu\n", span);
printf("%sok CQM: diff within %d%% for mask %lx\n",
       failed ? "not " : "", MAX_DIFF_PERCENT, mask);
printf("# alloc_llc_cache_size: %lu\n", alloc_llc);
printf("# avg_llc_occu_resc: %lu\n", avg_llc_occu_resc);
tests_run++;
}
printf("%sok schemata change for CQM%s\n", failed ? "not " : "",
      failed ? " # at least one test failed" : "");
tests_run++;
} -static int check_results(struct resctrl_val_param *param, int no_of_bits) +static int check_results(void) {

char *token_array[8], temp[512];

unsigned long sum_llc_occu_resc = 0;

int runs = 0;

char *token_array[8], output[] = RESULT_FILE_NAME, temp[512];

unsigned long llc_occu_resc[count_of_bits * NUM_OF_RUNS];

int runs; FILE *fp;

printf("# checking for pass/fail\n");

fp = fopen(param->filename, "r");

fp = fopen(output, "r"); if (!fp) {
perror("# Error in opening file\n");
perror(output);
return errno; }

while (fgets(temp, 1024, fp)) {

runs = 0;

while (fgets(temp, sizeof(temp), fp)) { char *token = strtok(temp, ":\t"); int fields = 0;

@@ -88,13 +89,14 @@ static int check_results(struct resctrl_val_param *param, int no_of_bits) token = strtok(NULL, ":\t"); }
/* Field 3 is llc occ resc value */
if (runs > 0)
	sum_llc_occu_resc += strtoul(token_array[3], NULL, 0);
/* Field 3 is resctrl LLC occupancy value */
llc_occu_resc[runs] = strtoul(token_array[3], NULL, 0);
runs++; }
fclose(fp);
show_cache_info(sum_llc_occu_resc, no_of_bits, param->span);

show_cache_info(llc_occu_resc);

return 0; } @@ -104,62 +106,107 @@ void cqm_test_cleanup(void) remove(RESULT_FILE_NAME); } -int cqm_resctrl_val(int cpu_no, int n, char **benchmark_cmd) +/*

Change schemata from 1 to count_of_bits - 1. Write schemata to

specified

con_mon grp, mon_grp in resctrl FS. For each allocation, run

"NUM_OF_RUNS"

times to get average values.

*/

+static int cqm_setup(struct resctrl_val_param *p) {

int ret, mum_resctrlfs;

static int runs_per_allocation = 0, num_of_bits = 1;

unsigned long mask = 0;

char schemata[64];

int i, ret;

cache_size = 0;

mum_resctrlfs = 1;
if (runs_per_allocation >= NUM_OF_RUNS)
runs_per_allocation = 0;
ret = remount_resctrlfs(mum_resctrlfs);

if (ret)
return ret;
/* Only set up schemata once every NUM_OF_RUNS of allocations */

if (runs_per_allocation++ != 0)
return 0;
if (!validate_resctrl_feature_request("cqm"))

if (num_of_bits > count_of_bits - count_of_shareable_bits) return -1;
ret = get_cbm_mask("L3");

if (ret)
return ret;
long_mask = strtoul(cbm_mask, NULL, 16);
/* Prepare cbm mask without any shareable bits */

for (i = 0; i < num_of_bits; i++) {
mask <<= 1;
mask |= 1;
}

mask = ~shareable_mask & mask;
If I understand correctly this function assumes that the shareable bits will also be the high order bits of the schemata. I do not believe that this is part of a spec.

Yes, that's right. Will fix it.
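One way to avoid assuming where the shareable bits sit is sketched below. This is only an illustration and not the fix that was eventually applied: contiguous_exclusive_mask() is a hypothetical helper, and the sketch assumes that the hardware wants a contiguous bitmask. It finds the longest contiguous run of non-shareable bits in the full mask and returns the lowest nbits of that run, or 0 when the run is too short.

```c
#include <limits.h>

/*
 * Hypothetical helper: build a contiguous mask of 'nbits' bits out of the
 * bits that are set in full_mask but not in shareable_mask, wherever the
 * shareable bits happen to be located. Returns 0 if no such mask exists.
 */
static unsigned long contiguous_exclusive_mask(unsigned long full_mask,
					       unsigned long shareable_mask,
					       unsigned int nbits)
{
	const unsigned int width = sizeof(unsigned long) * CHAR_BIT;
	unsigned long usable = full_mask & ~shareable_mask;
	unsigned int best_start = 0, best_len = 0;
	unsigned int start = 0, len = 0, bit;

	/* Find the longest run of consecutive usable bits. */
	for (bit = 0; bit < width; bit++) {
		if (usable & (1UL << bit)) {
			if (len == 0)
				start = bit;
			len++;
			if (len > best_len) {
				best_len = len;
				best_start = start;
			}
		} else {
			len = 0;
		}
	}

	if (nbits == 0 || nbits > best_len)
		return 0;
	if (nbits == width)
		return ~0UL;	/* avoid shifting by the full word width */

	return ((1UL << nbits) - 1) << best_start;
}
```

For a 20-bit mask 0xfffff, three bits come out as 0x7 when the shareable bits are 0xc0000 (high order) and as 0x1c when they are 0x3 (low order).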

...

It also does not seem as though the code follows what the comment at the top of the function states. The comment states "Change schemata from 1 to count_of_bits - 1" while the code seems to change schemata from 1 to count_of_bits - count_of_shareable_bits ...

My bad! Will fix the comment.
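To make the range that the code actually walks easier to see, the stepping logic of cqm_setup() can be modelled as below. This is a simplified sketch, not the patch itself: struct alloc_stepper and alloc_step() are hypothetical names, the static variables of the patch are replaced by an explicit state struct, and max_bits stands for count_of_bits - count_of_shareable_bits.

```c
#define NUM_OF_RUNS 5

/* Hypothetical state holder replacing the static variables in the patch. */
struct alloc_stepper {
	int runs_in_step;	/* measurements taken for the current mask */
	int next_bits;		/* number of bits to program next, starts at 1 */
	int max_bits;		/* count_of_bits - count_of_shareable_bits */
};

/*
 * Called once per measurement.
 * Returns the number of bits to program when a new allocation starts,
 * 0 when the current allocation should be kept, and -1 when all
 * allocations from 1 to max_bits have been measured.
 */
static int alloc_step(struct alloc_stepper *s)
{
	if (s->runs_in_step >= NUM_OF_RUNS)
		s->runs_in_step = 0;

	/* Only change the schemata once every NUM_OF_RUNS measurements. */
	if (s->runs_in_step++ != 0)
		return 0;

	if (s->next_bits > s->max_bits)
		return -1;

	return s->next_bits++;
}
```

In this model the test takes NUM_OF_RUNS measurements for each of the max_bits allocations, so the schemata goes from 1 bit up to count_of_bits - count_of_shareable_bits bits, as the review comment observes.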

...

...
ret = get_cache_size(cpu_no, "L3", &cache_size);

sprintf(schemata, "%lx", mask);

ret = write_schemata(p->ctrlgrp, schemata, p->cpu_no, "cat"); if (ret) return ret;

printf("cache size :%lu\n", cache_size);

count_of_bits = count_bits(long_mask);

p->mask = mask;

num_of_bits++;
if (n < 1 || n > count_of_bits) {
printf("Invalid input value for numbr_of_bits n!\n");
printf("Please Enter value in range 1 to %d\n",
count_of_bits);
return -1;
}
return 0;

+} +int cqm_schemata_change(int cpu_no, int span, char *cache_type,
	char **benchmark_cmd)
+{ struct resctrl_val_param param = { .resctrl_val = "cqm", .ctrlgrp = "c1", .mongrp = "m1", .cpu_no = cpu_no,
.span		= span,
This function received the new function parameter "span" to be used here ... I am having trouble finding where this member is used within this test. Could you please help me navigate to this?

My bad. Will fix it.

...

...
.mum_resctrlfs	= 0,
.filename	= RESULT_FILE_NAME,
.mask		= ~(long_mask << n) & long_mask,
.span		= cache_size * n / count_of_bits,
.num_of_runs = 0,
.setup		= cqm_setup,
.setup		= cqm_setup
};
int ret;

char schemata[64];

unsigned long long_mask;
if (strcmp(benchmark_cmd[0], "fill_buf") == 0)
sprintf(benchmark_cmd[1], "%lu", param.span);
ret = remount_resctrlfs(1);

if (ret)
return ret;
Here resctrl is remounted and followed by some changes to the root group's schemata. That is followed by a call to resctrl_val that attempts to remount resctrl again, which will undo all the configurations in between.

No, it wouldn't because mum_resctrlfs is 0. When resctrl FS is already mounted and mum_resctrlfs is 0, then remount_resctrlfs() is a noop.
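The behaviour described in this reply can be written down as a small decision function. This is a model for illustration, not the selftest's remount_resctrlfs(): the enum and remount_decision() are hypothetical names. Only the no-op case (already mounted, mum_resctrlfs is 0) is stated in this thread; the other two branches are assumptions about what a remount helper would reasonably do.

```c
#include <stdbool.h>

/* Hypothetical description of what a remount request ends up doing. */
enum remount_action {
	REMOUNT_NOOP,			/* leave the existing mount untouched */
	REMOUNT_MOUNT_ONLY,		/* nothing mounted yet, just mount */
	REMOUNT_UMOUNT_THEN_MOUNT,	/* drop existing state, mount afresh */
};

static enum remount_action remount_decision(bool already_mounted,
					    bool mum_resctrlfs)
{
	if (!already_mounted)
		return REMOUNT_MOUNT_ONLY;		/* assumption */

	if (mum_resctrlfs)
		return REMOUNT_UMOUNT_THEN_MOUNT;	/* assumption */

	/* Stated in the thread: mounted and mum_resctrlfs == 0 is a no-op. */
	return REMOUNT_NOOP;
}
```

Under this model the schemata written between the first remount and the call to resctrl_val() survive, because the second request resolves to REMOUNT_NOOP.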

...

...
remove(RESULT_FILE_NAME);

/* Check for both 'cat' and 'cqm' because CQM is validated using CAT

*/
if (!validate_resctrl_feature_request("cqm"))
return -1;
if (!validate_resctrl_feature_request("cat"))
return -1;
ret = get_cache_size(cpu_no, cache_type, &cache_size);

if (ret)
return ret;
printf("# cache size: %lu\n", cache_size);

ret = get_cbm_mask(cache_type);

if (ret)
return ret;
long_mask = strtoul(cbm_mask, NULL, 16);

count_of_bits = count_bits(long_mask);

/*
* Change root con_mon grp schemata to minimum (i.e. '1' bit) so that
* test could use all other bits
*/
sprintf(schemata, "%x", 1);

ret = write_schemata("", schemata, cpu_no, "cqm");

if (ret)
return ret;
... here the schemata is written to resctrl

...
ret = get_shareable_mask(cache_type, &shareable_mask);

if (ret)
return ret;
count_of_shareable_bits = count_bits(shareable_mask);
ret = resctrl_val(benchmark_cmd, &param); if (ret) return ret;
here is the call to resctrl_val() that attempts to remount resctrl.

...
ret = check_results(&param, n);

ret = check_results(); if (ret) return ret;

diff --git a/tools/testing/selftests/resctrl/resctrl.h b/tools/testing/selftests/resctrl/resctrl.h index 79148cbbd7a4..cb67ad689475 100644 --- a/tools/testing/selftests/resctrl/resctrl.h +++ b/tools/testing/selftests/resctrl/resctrl.h @@ -103,7 +103,8 @@ int setup_critical_process(pid_t pid, struct resctrl_val_param *param); int run_critical_process(pid_t pid, struct resctrl_val_param *param); void cat_test_cleanup(void); int cat_perf_miss_val(int cpu_no, int no_of_bits, char *cache_type); -int cqm_resctrl_val(int cpu_no, int n, char **benchmark_cmd); +int cqm_schemata_change(int cpu_no, int span, char *cache_type,
	char **benchmark_cmd);
unsigned int count_bits(unsigned long n); void cqm_test_cleanup(void); int get_core_sibling(int cpu_no); diff --git a/tools/testing/selftests/resctrl/resctrl_tests.c b/tools/testing/selftests/resctrl/resctrl_tests.c index 60db128312a6..3c408c636b6d 100644 --- a/tools/testing/selftests/resctrl/resctrl_tests.c +++ b/tools/testing/selftests/resctrl/resctrl_tests.c @@ -180,9 +180,11 @@ int main(int argc, char **argv) if (cqm_test) { printf("# Starting CQM test ...\n");
if (!has_ben)
if (!has_ben) {
	sprintf(benchmark_cmd[1], "%d", span);
sprintf(benchmark_cmd[5], "%s", "cqm");
res = cqm_resctrl_val(cpu_no, no_of_bits, benchmark_cmd);
}
res = cqm_schemata_change(cpu_no, span, "L3", benchmark_cmd);
printf("%sok CQM: test\n", res ? "not " : ""); cqm_test_cleanup(); tests_run++;
diff --git a/tools/testing/selftests/resctrl/resctrl_val.c b/tools/testing/selftests/resctrl/resctrl_val.c index 271cb5c976f5..c59fad6cb9b0 100644 --- a/tools/testing/selftests/resctrl/resctrl_val.c +++ b/tools/testing/selftests/resctrl/resctrl_val.c @@ -705,29 +705,21 @@ int resctrl_val(char **benchmark_cmd, struct resctrl_val_param *param) goto out; }

/* Give benchmark enough time to fully run */

sleep(1);

/* Test runs until the callback setup() tells the test to stop. */ while (1) {
ret = param->setup(param);
if (ret) {
	ret = 0;
	break;
}
/* Measure vals sleeps for a second */
if ((strcmp(resctrl_val, "mbm") == 0) || (strcmp(resctrl_val, "mba") == 0)) {
	ret = param->setup(param);
	if (ret) {
		ret = 0;
		break;
	}
ret = measure_vals(param, &bw_resc_start);
if (ret)
	break;
} else if (strcmp(resctrl_val, "cqm") == 0) {
	ret = param->setup(param);
	if (ret) {
		ret = 0;
		break;
	}
	sleep(1);
ret = measure_cache_vals(param, bm_pid);
if (ret)
	break;
This change affects not just the cache monitoring test. Could this change be extracted into its own patch to be clear what is done here and how it impacts the other tests?

This change shouldn't impact other tests (i.e. CAT) because CAT will not call resctrl_val().

...

...
diff --git a/tools/testing/selftests/resctrl/resctrlfs.c b/tools/testing/selftests/resctrl/resctrlfs.c index 52452bb0178a..bd81a13ff9df 100644 --- a/tools/testing/selftests/resctrl/resctrlfs.c +++ b/tools/testing/selftests/resctrl/resctrlfs.c @@ -365,11 +365,7 @@ void run_benchmark(int signum, siginfo_t *info, void *ucontext) memflush = atoi(benchmark_cmd[3]); operation = atoi(benchmark_cmd[4]); sprintf(resctrl_val, "%s", benchmark_cmd[5]);
if (strcmp(resctrl_val, "cqm") != 0)
	buffer_span = span * MB;
else
	buffer_span = span;
buffer_span = span * MB;
This change seems to change the buffer_span used by the other tests. It is not obvious why this change is made to other tests while this commit intends to focus on the cache monitoring test. Perhaps this can be split into a separate patch to make this clear?

The change here is that we no longer need to check which test case is running. I think the previous CQM test passed the span directly in bytes, and hence span was not multiplied by MB. For the new CQM test case, span has to be multiplied by MB, the same as for MBM/MBA, so the check for the test case is dropped (please note that the CAT test doesn't get here, so only three tests are affected).

Regards, Sai

Reinette Chatre

5:19 p.m.

New subject: [PATCH V1 11/13] selftests/resctrl: Change Cache Quality Monitoring (CQM) test

Hi Sai,

On 3/10/2020 7:46 PM, Sai Praneeth Prakhya wrote:

...

On Tue, 2020-03-10 at 15:18 -0700, Reinette Chatre wrote:

...
On 3/6/2020 7:40 PM, Sai Praneeth Prakhya wrote:

...
.mum_resctrlfs	= 0,
.filename	= RESULT_FILE_NAME,
.mask		= ~(long_mask << n) & long_mask,
.span		= cache_size * n / count_of_bits,
.num_of_runs = 0,
.setup		= cqm_setup,
.setup		= cqm_setup
};
int ret;

char schemata[64];

unsigned long long_mask;
if (strcmp(benchmark_cmd[0], "fill_buf") == 0)
sprintf(benchmark_cmd[1], "%lu", param.span);
ret = remount_resctrlfs(1);

if (ret)
return ret;
Here resctrl is remounted and followed by some changes to the root group's schemata. That is followed by a call to resctrl_val that attempts to remount resctrl again, which will undo all the configurations in between.
No, it wouldn't because mum_resctrlfs is 0. When resctrl FS is already mounted and mum_resctrlfs is 0, then remount_resctrlfs() is a noop.

I missed that. Thank you.

fyi ... when I tried these tests I encountered the following error related to unmounting:

[SNIP]
ok Write schema "L3:1=7fff" to resctrl FS
ok Write schema "L3:1=ffff" to resctrl FS
ok Write schema "L3:1=1ffff" to resctrl FS
ok Write schema "L3:1=3ffff" to resctrl FS
# Unable to umount resctrl: Device or resource busy
# Results are displayed in (Bytes)
ok CQM: diff within 5% for mask 1
# alloc_llc_cache_size: 2883584
# avg_llc_occu_resc: 2973696
ok CQM: diff within 5% for mask 3
[SNIP]

This seems to originate from resctrl_val() that forces an unmount but if that fails the error is not propagated.

...

...
...
diff --git a/tools/testing/selftests/resctrl/resctrl_val.c b/tools/testing/selftests/resctrl/resctrl_val.c index 271cb5c976f5..c59fad6cb9b0 100644 --- a/tools/testing/selftests/resctrl/resctrl_val.c +++ b/tools/testing/selftests/resctrl/resctrl_val.c @@ -705,29 +705,21 @@ int resctrl_val(char **benchmark_cmd, struct resctrl_val_param *param) goto out; }

/* Give benchmark enough time to fully run */

sleep(1);

/* Test runs until the callback setup() tells the test to stop. */ while (1) {
ret = param->setup(param);
if (ret) {
	ret = 0;
	break;
}
/* Measure vals sleeps for a second */
if ((strcmp(resctrl_val, "mbm") == 0) || (strcmp(resctrl_val, "mba") == 0)) {
	ret = param->setup(param);
	if (ret) {
		ret = 0;
		break;
	}

(I refer to the above snippet in my comment below)

...

...
...
	ret = measure_vals(param, &bw_resc_start);
	if (ret)
		break;
} else if (strcmp(resctrl_val, "cqm") == 0) {
	ret = param->setup(param);
	if (ret) {
		ret = 0;
		break;
	}
	sleep(1);
ret = measure_cache_vals(param, bm_pid);
if (ret)
	break;
This change affects not just the cache monitoring test. Could this change be extracted into its own patch to be clear what is done here and how it impacts the other tests?
This change shouldn't impact other tests (i.e. CAT) because CAT will not call resctrl_val().

I was referring to the snippet above that seems to impact the "mbm" and "mba" tests by moving the call to "param->setup" for them.

...

...
...
diff --git a/tools/testing/selftests/resctrl/resctrlfs.c b/tools/testing/selftests/resctrl/resctrlfs.c index 52452bb0178a..bd81a13ff9df 100644 --- a/tools/testing/selftests/resctrl/resctrlfs.c +++ b/tools/testing/selftests/resctrl/resctrlfs.c @@ -365,11 +365,7 @@ void run_benchmark(int signum, siginfo_t *info, void *ucontext) memflush = atoi(benchmark_cmd[3]); operation = atoi(benchmark_cmd[4]); sprintf(resctrl_val, "%s", benchmark_cmd[5]);
if (strcmp(resctrl_val, "cqm") != 0)
	buffer_span = span * MB;
else
	buffer_span = span;
buffer_span = span * MB;
This change seems to change the buffer_span used by the other tests. It is not obvious why this change is made to other tests while this commit intends to focus on the cache monitoring test. Perhaps this can be split into a separate patch to make this clear?

Got it. Thank you.

Reinette

Sai Praneeth Prakhya

5:33 p.m.

New subject: [PATCH V1 11/13] selftests/resctrl: Change Cache Quality Monitoring (CQM) test

Hi Reinette,

On Wed, 2020-03-11 at 10:19 -0700, Reinette Chatre wrote:

...

Hi Sai,

On 3/10/2020 7:46 PM, Sai Praneeth Prakhya wrote:

...
On Tue, 2020-03-10 at 15:18 -0700, Reinette Chatre wrote:

...
On 3/6/2020 7:40 PM, Sai Praneeth Prakhya wrote:

...
.mum_resctrlfs	= 0,
.filename	= RESULT_FILE_NAME,
.mask		= ~(long_mask << n) & long_mask,
.span		= cache_size * n / count_of_bits,
.num_of_runs = 0,
.setup		= cqm_setup,
.setup		= cqm_setup
};
int ret;

char schemata[64];

unsigned long long_mask;
if (strcmp(benchmark_cmd[0], "fill_buf") == 0)
sprintf(benchmark_cmd[1], "%lu", param.span);
ret = remount_resctrlfs(1);

if (ret)
return ret;
Here resctrl is remounted and followed by some changes to the root group's schemata. That is followed by a call to resctrl_val that attempts to remount resctrl again, which will undo all the configurations in between.
No, it wouldn't because mum_resctrlfs is 0. When resctrl FS is already mounted and mum_resctrlfs is 0, then remount_resctrlfs() is a noop.
I missed that. Thank you.

fyi ... when I tried these tests I encountered the following error related to unmounting:

[SNIP]
ok Write schema "L3:1=7fff" to resctrl FS
ok Write schema "L3:1=ffff" to resctrl FS
ok Write schema "L3:1=1ffff" to resctrl FS
ok Write schema "L3:1=3ffff" to resctrl FS
# Unable to umount resctrl: Device or resource busy
# Results are displayed in (Bytes)
ok CQM: diff within 5% for mask 1
# alloc_llc_cache_size: 2883584
# avg_llc_occu_resc: 2973696
ok CQM: diff within 5% for mask 3
[SNIP]

This seems to originate from resctrl_val() that forces an unmount but if that fails the error is not propagated.

Yes, that's right and it's a good test. I didn't encounter this issue during my testing because I wasn't using resctrl FS from other terminals (I think you were using resctrl FS from other terminal and hence resctrl_test was unable to unmount it).

I think the error should not be propagated because unmounting resctrl FS shouldn't stop us from checking the results. If measuring values reports an error then we shouldn't check for results.
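The policy argued for here can be sketched as follows. This is an illustration only: finish_run() and its parameters are hypothetical and are not taken from the selftest sources. A failed unmount is reported but does not change the outcome, while a measurement error is returned so that the caller skips checking the results.

```c
#include <stdio.h>

/*
 * Hypothetical helper: combine the status of the measurement phase with
 * the status of the unmount done during cleanup.
 * A failed unmount is only logged; the measurement status decides whether
 * the caller goes on to check the results.
 */
static int finish_run(int measure_ret, int umount_ret)
{
	if (umount_ret)
		fprintf(stderr, "# Unable to umount resctrl (error %d)\n",
			umount_ret);

	return measure_ret;
}
```

A caller would then check results only when finish_run() returns 0, which matches the log above, where the CQM results are still reported after the "Unable to umount resctrl" message.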

...

...
...
...
diff --git a/tools/testing/selftests/resctrl/resctrl_val.c b/tools/testing/selftests/resctrl/resctrl_val.c index 271cb5c976f5..c59fad6cb9b0 100644 --- a/tools/testing/selftests/resctrl/resctrl_val.c +++ b/tools/testing/selftests/resctrl/resctrl_val.c @@ -705,29 +705,21 @@ int resctrl_val(char **benchmark_cmd, struct resctrl_val_param *param) goto out; }

/* Give benchmark enough time to fully run */

sleep(1);

/* Test runs until the callback setup() tells the test to

stop. */ while (1) {
ret = param->setup(param);
if (ret) {
	ret = 0;
	break;
}
/* Measure vals sleeps for a second */
if ((strcmp(resctrl_val, "mbm") == 0) || (strcmp(resctrl_val, "mba") == 0)) {
	ret = param->setup(param);
	if (ret) {
		ret = 0;
		break;
	}
(I refer to the above snippet in my comment below)

...
...
...
	ret = measure_vals(param, &bw_resc_start);
	if (ret)
		break;
} else if (strcmp(resctrl_val, "cqm") == 0) {
	ret = param->setup(param);
	if (ret) {
		ret = 0;
		break;
	}
	sleep(1);
ret = measure_cache_vals(param, bm_pid);
if (ret)
	break;
This change affects not just the cache monitoring test. Could this change be extracted into its own patch to be clear what is done here and how it impacts the other tests?
This change shouldn't impact other tests (i.e. CAT) because CAT will not call resctrl_val().
I was referring to the snippet above that seems to impact the "mbm" and "mba" tests by moving the call to "param->setup" for them.

OK, makes sense. Sure! I will split it into a separate patch.

Regards, Sai

Reinette Chatre

6:03 p.m.

New subject: [PATCH V1 11/13] selftests/resctrl: Change Cache Quality Monitoring (CQM) test

Hi Sai,

On 3/11/2020 10:33 AM, Sai Praneeth Prakhya wrote:

...

On Wed, 2020-03-11 at 10:19 -0700, Reinette Chatre wrote:

...
On 3/10/2020 7:46 PM, Sai Praneeth Prakhya wrote:

...
On Tue, 2020-03-10 at 15:18 -0700, Reinette Chatre wrote:

...
On 3/6/2020 7:40 PM, Sai Praneeth Prakhya wrote:

...
.mum_resctrlfs	= 0,
.filename	= RESULT_FILE_NAME,
.mask		= ~(long_mask << n) & long_mask,
.span		= cache_size * n / count_of_bits,
.num_of_runs = 0,
.setup		= cqm_setup,
.setup		= cqm_setup
};
int ret;

char schemata[64];

unsigned long long_mask;
if (strcmp(benchmark_cmd[0], "fill_buf") == 0)
sprintf(benchmark_cmd[1], "%lu", param.span);
ret = remount_resctrlfs(1);

if (ret)
return ret;
Here resctrl is remounted and followed by some changes to the root group's schemata. That is followed by a call to resctrl_val that attempts to remount resctrl again, which will undo all the configurations in between.
No, it wouldn't because mum_resctrlfs is 0. When resctrl FS is already mounted and mum_resctrlfs is 0, then remount_resctrlfs() is a noop.
I missed that. Thank you.

fyi ... when I tried these tests I encountered the following error related to unmounting:

[SNIP]
ok Write schema "L3:1=7fff" to resctrl FS
ok Write schema "L3:1=ffff" to resctrl FS
ok Write schema "L3:1=1ffff" to resctrl FS
ok Write schema "L3:1=3ffff" to resctrl FS
# Unable to umount resctrl: Device or resource busy
# Results are displayed in (Bytes)
ok CQM: diff within 5% for mask 1
# alloc_llc_cache_size: 2883584
# avg_llc_occu_resc: 2973696
ok CQM: diff within 5% for mask 3
[SNIP]

This seems to originate from resctrl_val() that forces an unmount but if that fails the error is not propagated.
Yes, that's right and it's a good test. I didn't encounter this issue during my testing because I wasn't using resctrl FS from other terminals (I think you were using resctrl FS from other terminal and hence resctrl_test was unable to unmount it).

I was not explicitly testing for this but this may have been the case.

As a sidenote ... could remount_resctrlfs() be called consistently? It seems to switch between being called with true/false and 1/0. Since its parameter type is boolean using true/false seems most appropriate.

...

I think the error should not be propagated because unmounting resctrl FS shouldn't stop us from checking the results. If measuring values reports an error then we shouldn't check for results.

This sounds right. It is inconsistent though ... the CQM test unmounts resctrl after it is run but the CAT test does not. Looking closer the CAT test seems to leave its artifacts around in resctrl and this should be cleaned up.

I am not sure about the expectations here. Unmounting resctrl after a test is run is indeed the easiest to clean up and may be ok. It may be a surprise to the user though. Perhaps there can be a snippet in the README that warns people about this?

Thank you very much

Reinette

Sai Praneeth Prakhya

6:07 p.m.

New subject: [PATCH V1 11/13] selftests/resctrl: Change Cache Quality Monitoring (CQM) test

Hi Reinette,

On Wed, 2020-03-11 at 11:03 -0700, Reinette Chatre wrote:

...


[SNIP]

...

Hi Sai,

On 3/11/2020 10:33 AM, Sai Praneeth Prakhya wrote:

...
On Wed, 2020-03-11 at 10:19 -0700, Reinette Chatre wrote:

...
On 3/10/2020 7:46 PM, Sai Praneeth Prakhya wrote:

...
On Tue, 2020-03-10 at 15:18 -0700, Reinette Chatre wrote:

...
On 3/6/2020 7:40 PM, Sai Praneeth Prakhya wrote:

I missed that. Thank you.

fyi ... when I tried these tests I encountered the following error related to unmounting:

[SNIP]
ok Write schema "L3:1=7fff" to resctrl FS
ok Write schema "L3:1=ffff" to resctrl FS
ok Write schema "L3:1=1ffff" to resctrl FS
ok Write schema "L3:1=3ffff" to resctrl FS
# Unable to umount resctrl: Device or resource busy
# Results are displayed in (Bytes)
ok CQM: diff within 5% for mask 1
# alloc_llc_cache_size: 2883584
# avg_llc_occu_resc: 2973696
ok CQM: diff within 5% for mask 3
[SNIP]

This seems to originate from resctrl_val() that forces an unmount but if that fails the error is not propagated.

Yes, that's right and it's a good test. I didn't encounter this issue during my testing because I wasn't using resctrl FS from other terminals (I think you were using resctrl FS from other terminal and hence resctrl_test was unable to unmount it).

I was not explicitly testing for this but this may have been the case.

As a sidenote ... could remount_resctrlfs() be called consistently? It seems to switch between being called with true/false and 1/0. Since its parameter type is boolean using true/false seems most appropriate.

Agreed, that makes sense. I will fix this in a separate patch.

...

...
I think the error should not be propagated because unmounting resctrl FS shouldn't stop us from checking the results. If measuring values reports an error then we shouldn't check for results.

This sounds right. It is inconsistent though ... the CQM test unmounts resctrl after it is run but the CAT test does not. Looking closer the CAT test seems to leave its artifacts around in resctrl and this should be cleaned up.

Yes, makes sense. I will fix the CAT test to clean things up.

...

I am not sure about the expectations here. Unmounting resctrl after a test is run is indeed the easiest to clean up and may be ok.

The main reason for unmounting is this: assuming the user hasn't mounted resctrl FS before running the test, we want to get back to the same state as before the test ran, and also to clean up any changes made to resctrl FS during the test.

...

It may be a surprise to the user though. Perhaps there can be a snippet in the README that warns people about this?

Sure, makes sense. I will add it.

Regards, Sai
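
The "same state as before" concern raised above depends on knowing whether resctrl was already mounted when the test started. The patches in this series do not do this; the following is only a sketch of one way such a check could look, using a hypothetical helper resctrl_listed() that scans text in the /proc/mounts format. It assumes well-formed lines with at least three whitespace-separated fields (device, mount point, filesystem type).

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Return true if any line of /proc/mounts-style text has "resctrl" as
 * its filesystem type (the third field). Assumes every line has at
 * least three fields, as real /proc/mounts lines do.
 */
bool resctrl_listed(const char *mounts_text)
{
	const char *line = mounts_text;
	char type[64];

	while (line && *line) {
		/* Skip device and mount point, keep the filesystem type. */
		if (sscanf(line, "%*s %*s %63s", type) == 1 &&
		    strcmp(type, "resctrl") == 0)
			return true;

		/* Advance to the start of the next line, if any. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}

	return false;
}
```

A caller could read /proc/mounts into a buffer before the test, pass it to this helper, and skip the final unmount when the result is true.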

Sai Praneeth Prakhya

7 Mar

3:40 a.m.

New subject: [PATCH V1 12/13] selftests/resctrl: Dynamically select buffer size for CAT test

Presently, while running CAT test case, if user hasn't given any input for '-n' option, the test defaults to 5 bits to determine the buffer size that is used during test. Instead of statically running always with 5 bits, change it such that the buffer size is always half of the cache size.

Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
---
 tools/testing/selftests/resctrl/cat_test.c      | 16 +++++++++++-----
 tools/testing/selftests/resctrl/resctrl.h       |  3 ++-
 tools/testing/selftests/resctrl/resctrl_tests.c |  7 ++++---
 3 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/tools/testing/selftests/resctrl/cat_test.c b/tools/testing/selftests/resctrl/cat_test.c
index f7a67f005fe5..d1c50430ab20 100644
--- a/tools/testing/selftests/resctrl/cat_test.c
+++ b/tools/testing/selftests/resctrl/cat_test.c
@@ -109,7 +109,8 @@ void cat_test_cleanup(void)
 	remove(RESULT_FILE_NAME);
 }
 
-static int prepare_masks_for_two_processes(int no_of_bits, char *cache_type)
+static int prepare_masks_for_two_processes(int *no_of_bits, bool user_bits,
+					   char *cache_type)
 {
 	int ret, i;
 	unsigned long long_mask, shareable_mask;
@@ -123,12 +124,15 @@ static int prepare_masks_for_two_processes(int no_of_bits, char *cache_type)
 	long_mask = strtoul(cbm_mask, NULL, 16);
 	count_of_bits = count_bits(long_mask);
 
+	if (!user_bits)
+		*no_of_bits = count_of_bits / 2;
+
 	/*
 	 * Max limit is count_of_bits - 1 because we need exclusive masks for
 	 * the two processes. So, the last saved bit will be used by the other
 	 * process.
 	 */
-	if (no_of_bits < 1 || no_of_bits > count_of_bits - 1) {
+	if (*no_of_bits < 1 || *no_of_bits > count_of_bits - 1) {
 		printf("Invalid input value for no_of_bits 'n'\n");
 		printf("Please Enter value in range 1 to %d\n",
 		       count_of_bits - 1);
@@ -140,7 +144,7 @@ static int prepare_masks_for_two_processes(int no_of_bits, char *cache_type)
 		return ret;
 
 	/* Prepare cbm mask without any shareable bits */
-	for (i = 0; i < no_of_bits; i++) {
+	for (i = 0; i < *no_of_bits; i++) {
 		p1_mask <<= 1;
 		p1_mask |= 1;
 	}
@@ -176,7 +180,8 @@ static int start_noisy_process(pid_t pid, int sibling_cpu_no)
 	return 0;
 }
 
-int cat_perf_miss_val(int cpu_no, int no_of_bits, char *cache_type)
+int cat_perf_miss_val(int cpu_no, int no_of_bits, bool user_bits,
+		      char *cache_type)
 {
 	int ret, sibling_cpu_no;
 	unsigned long buf_size;
@@ -194,7 +199,8 @@ int cat_perf_miss_val(int cpu_no, int no_of_bits, char *cache_type)
 	if (!validate_resctrl_feature_request("cat"))
 		return -1;
 
-	ret = prepare_masks_for_two_processes(no_of_bits, cache_type);
+	ret = prepare_masks_for_two_processes(&no_of_bits, user_bits,
+					      cache_type);
 	if (ret)
 		return ret;
 
diff --git a/tools/testing/selftests/resctrl/resctrl.h b/tools/testing/selftests/resctrl/resctrl.h
index cb67ad689475..393f2f34ccac 100644
--- a/tools/testing/selftests/resctrl/resctrl.h
+++ b/tools/testing/selftests/resctrl/resctrl.h
@@ -102,7 +102,8 @@ void ctrlc_handler(int signum, siginfo_t *info, void *ptr);
 int setup_critical_process(pid_t pid, struct resctrl_val_param *param);
 int run_critical_process(pid_t pid, struct resctrl_val_param *param);
 void cat_test_cleanup(void);
-int cat_perf_miss_val(int cpu_no, int no_of_bits, char *cache_type);
+int cat_perf_miss_val(int cpu_no, int no_of_bits, bool user_bits,
+		      char *cache_type);
 int cqm_schemata_change(int cpu_no, int span, char *cache_type,
 			char **benchmark_cmd);
 unsigned int count_bits(unsigned long n);
diff --git a/tools/testing/selftests/resctrl/resctrl_tests.c b/tools/testing/selftests/resctrl/resctrl_tests.c
index 3c408c636b6d..4461c3dc8cce 100644
--- a/tools/testing/selftests/resctrl/resctrl_tests.c
+++ b/tools/testing/selftests/resctrl/resctrl_tests.c
@@ -57,11 +57,11 @@ void tests_cleanup(void)
 int main(int argc, char **argv)
 {
 	bool has_ben = false, mbm_test = true, mba_test = true, cqm_test = true;
-	int res, c, cpu_no = 1, span = 250, argc_new = argc, i, no_of_bits = 5;
+	int res, c, cpu_no = 1, span = 250, argc_new = argc, i, no_of_bits;
 	char *benchmark_cmd[BENCHMARK_ARGS], bw_report[64], bm_type[64];
 	char benchmark_cmd_area[BENCHMARK_ARGS][BENCHMARK_ARG_SIZE];
 	int ben_ind, ben_count;
-	bool cat_test = true;
+	bool cat_test = true, user_bits = false;
 
 	for (i = 0; i < argc; i++) {
 		if (strcmp(argv[i], "-b") == 0) {
@@ -105,6 +105,7 @@ int main(int argc, char **argv)
 			cpu_no = atoi(optarg);
 			break;
 		case 'n':
+			user_bits = true;
 			no_of_bits = atoi(optarg);
 			break;
 		case 'h':
@@ -192,7 +193,7 @@ int main(int argc, char **argv)
 
 	if (cat_test) {
 		printf("# Starting CAT test ...\n");
-		res = cat_perf_miss_val(cpu_no, no_of_bits, "L3");
+		res = cat_perf_miss_val(cpu_no, no_of_bits, user_bits, "L3");
 		printf("%sok CAT: test\n", res ? "not " : "");
 		cat_test_cleanup();
 		tests_run++;

-- 2.7.4
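
As a rough illustration of the selection logic in this patch: when '-n' is not given, the number of bits defaults to half of the bits set in the full capacity bitmask, and the result must stay within 1 to count_of_bits - 1 so that the other process still gets an exclusive bit. The sketch below is standalone and hedged: count_bits() mirrors the prototype in resctrl.h but its body is an assumption, while select_no_of_bits() and low_bits_mask() are hypothetical helpers that do not exist in the selftest.

```c
#include <stdbool.h>

/* Count the bits set in a capacity bitmask (assumed implementation). */
unsigned int count_bits(unsigned long n)
{
	unsigned int count = 0;

	while (n) {
		count += n & 1;
		n >>= 1;
	}

	return count;
}

/*
 * Hypothetical helper: pick the number of mask bits for the test.
 * If the user passed '-n' (user_bits), use the requested value;
 * otherwise default to half of the bits in the full mask.
 * Returns -1 when the value is outside 1 .. count_of_bits - 1.
 */
int select_no_of_bits(unsigned long full_mask, bool user_bits, int requested)
{
	int count_of_bits = (int)count_bits(full_mask);
	int no_of_bits = user_bits ? requested : count_of_bits / 2;

	if (no_of_bits < 1 || no_of_bits > count_of_bits - 1)
		return -1;

	return no_of_bits;
}

/* Build a contiguous mask from the low no_of_bits bits, as the patch's loop does. */
unsigned long low_bits_mask(int no_of_bits)
{
	unsigned long mask = 0;
	int i;

	for (i = 0; i < no_of_bits; i++) {
		mask <<= 1;
		mask |= 1;
	}

	return mask;
}
```

For a 20-bit mask such as 0xfffff, the default works out to 10 bits and a mask of 0x3ff, whereas the old static default of 5 bits would give 0x1f regardless of the cache size.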

Reinette Chatre

10 Mar

10:19 p.m.

New subject: [PATCH V1 12/13] selftests/resctrl: Dynamically select buffer size for CAT test

Hi Sai,

On 3/6/2020 7:40 PM, Sai Praneeth Prakhya wrote:

...

Presently, while running CAT test case, if user hasn't given any input for '-n' option, the test defaults to 5 bits to determine the buffer size that is used during test. Instead of statically running always with 5 bits, change it such that the buffer size is always half of the cache size.

This seems more appropriate as a preparation patch to not have to make so many changes on top of the earlier patches included in this series.

Reinette

Sai Praneeth Prakhya

11 Mar

2:52 a.m.

New subject: [PATCH V1 12/13] selftests/resctrl: Dynamically select buffer size for CAT test

Hi Reinette,

On Tue, 2020-03-10 at 15:19 -0700, Reinette Chatre wrote:

...

Hi Sai,

On 3/6/2020 7:40 PM, Sai Praneeth Prakhya wrote:

...
Presently, while running CAT test case, if user hasn't given any input for '-n' option, the test defaults to 5 bits to determine the buffer size that is used during test. Instead of statically running always with 5 bits, change it such that the buffer size is always half of the cache size.

This seems more appropriate as a preparation patch to not have to make so many changes on top of the earlier patches included in this series.

Ok.. makes sense.

Regards, Sai

Sai Praneeth Prakhya

7 Mar

3:40 a.m.

New subject: [PATCH V1 13/13] selftests/resctrl: Cleanup fill_buff after changing CAT test

The previous CAT test reads the buffer only once and hence, to accommodate this use case, the name of the test case (i.e. cat) was passed as an argument to "fill_buff". Ideally, "fill_buff" doesn't need to know which test invoked it; hence, clean up "fill_buff" and the code that was carrying around this extra argument.

Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
---
 tools/testing/selftests/resctrl/cat_test.c      |  6 +----
 tools/testing/selftests/resctrl/fill_buf.c      | 29 ++++++++++---------------
 tools/testing/selftests/resctrl/resctrl.h       |  4 ++--
 tools/testing/selftests/resctrl/resctrl_tests.c | 11 +++-------
 tools/testing/selftests/resctrl/resctrlfs.c     |  4 +---
 5 files changed, 18 insertions(+), 36 deletions(-)

diff --git a/tools/testing/selftests/resctrl/cat_test.c b/tools/testing/selftests/resctrl/cat_test.c
index d1c50430ab20..b25c8f43d29c 100644
--- a/tools/testing/selftests/resctrl/cat_test.c
+++ b/tools/testing/selftests/resctrl/cat_test.c
@@ -169,11 +169,7 @@ static int start_noisy_process(pid_t pid, int sibling_cpu_no)
 	if (ret)
 		return ret;
 
-	/*
-	 * Passing 'cat' will not loop around buffer forever, hence don't pass
-	 * test name
-	 */
-	ret = run_fill_buf(buf_size, 1, 1, 0, "");
+	ret = run_fill_buf(buf_size, 1, 1, 0);
 	if (ret)
 		return ret;
 
diff --git a/tools/testing/selftests/resctrl/fill_buf.c b/tools/testing/selftests/resctrl/fill_buf.c
index 0500dab90b2e..486e6b4c9924 100644
--- a/tools/testing/selftests/resctrl/fill_buf.c
+++ b/tools/testing/selftests/resctrl/fill_buf.c
@@ -107,16 +107,13 @@ static void fill_one_span_write(void)
 	}
 }
 
-static int fill_cache_read(char *resctrl_val)
+static int fill_cache_read(void)
 {
 	int ret = 0;
 	FILE *fp;
 
-	while (1) {
+	while (1)
 		ret = fill_one_span_read();
-		if (!strcmp(resctrl_val, "cat"))
-			break;
-	}
 
 	/* Consume read result so that reading memory is not optimized out. */
 	fp = fopen("/dev/null", "w");
@@ -128,13 +125,10 @@ static int fill_cache_read(char *resctrl_val)
 	return 0;
 }
 
-static int fill_cache_write(char *resctrl_val)
+static int fill_cache_write(void)
 {
-	while (1) {
+	while (1)
 		fill_one_span_write();
-		if (!strcmp(resctrl_val, "cat"))
-			break;
-	}
 
 	return 0;
 }
@@ -202,14 +196,14 @@ int use_buffer_once(int op)
 	return 0;
 }
 
-int use_buffer_forever(int op, char *resctrl_val)
+int use_buffer_forever(int op)
 {
 	int ret;
 
 	if (op == 0)
-		ret = fill_cache_read(resctrl_val);
+		ret = fill_cache_read();
 	else
-		ret = fill_cache_write(resctrl_val);
+		ret = fill_cache_write();
 
 	if (ret) {
 		printf("\n Error in fill cache read/write...\n");
@@ -221,7 +215,7 @@ int use_buffer_forever(int op, char *resctrl_val)
 
 static int
 fill_cache(unsigned long long buf_size, int malloc_and_init, int memflush,
-	   int op, char *resctrl_val)
+	   int op)
 {
 	int ret;
 
@@ -229,7 +223,7 @@ fill_cache(unsigned long long buf_size, int malloc_and_init, int memflush,
 	if (ret)
 		return ret;
 
-	ret = use_buffer_forever(op, resctrl_val);
+	ret = use_buffer_forever(op);
 	if (ret)
 		return ret;
 
@@ -239,7 +233,7 @@ fill_cache(unsigned long long buf_size, int malloc_and_init, int memflush,
 }
 
 int run_fill_buf(unsigned long span, int malloc_and_init_memory,
-		 int memflush, int op, char *resctrl_val)
+		 int memflush, int op)
 {
 	unsigned long long cache_size = span;
 	int ret;
@@ -250,8 +244,7 @@ int run_fill_buf(unsigned long span, int malloc_and_init_memory,
 	if (signal(SIGHUP, ctrl_handler) == SIG_ERR)
 		printf("Failed to catch SIGHUP!\n");
 
-	ret = fill_cache(cache_size, malloc_and_init_memory, memflush, op,
-			 resctrl_val);
+	ret = fill_cache(cache_size, malloc_and_init_memory, memflush, op);
 	if (ret) {
 		printf("\n Error in fill cache\n");
 		return -1;
diff --git a/tools/testing/selftests/resctrl/resctrl.h b/tools/testing/selftests/resctrl/resctrl.h
index 393f2f34ccac..18e27e3f71ae 100644
--- a/tools/testing/selftests/resctrl/resctrl.h
+++ b/tools/testing/selftests/resctrl/resctrl.h
@@ -86,9 +86,9 @@ int perf_event_open(struct perf_event_attr *hw_event, pid_t pid, int cpu,
 		    int group_fd, unsigned long flags);
 int init_buffer(unsigned long long buf_size, int malloc_and_init, int memflush);
 int use_buffer_once(int op);
-int use_buffer_forever(int op, char *resctrl_val);
+int use_buffer_forever(int op);
 int run_fill_buf(unsigned long span, int malloc_and_init_memory, int memflush,
-		 int op, char *resctrl_va);
+		 int op);
 int resctrl_val(char **benchmark_cmd, struct resctrl_val_param *param);
 int mbm_bw_change(int span, int cpu_no, char *bw_report, char **benchmark_cmd);
 void tests_cleanup(void);
diff --git a/tools/testing/selftests/resctrl/resctrl_tests.c b/tools/testing/selftests/resctrl/resctrl_tests.c
index 4461c3dc8cce..503c68f2570f 100644
--- a/tools/testing/selftests/resctrl/resctrl_tests.c
+++ b/tools/testing/selftests/resctrl/resctrl_tests.c
@@ -141,7 +141,7 @@ int main(int argc, char **argv)
 		benchmark_cmd[ben_count] = NULL;
 	} else {
 		/* If no benchmark is given by "-b" argument, use fill_buf. */
-		for (i = 0; i < 6; i++)
+		for (i = 0; i < 5; i++)
 			benchmark_cmd[i] = benchmark_cmd_area[i];
 
 		strcpy(benchmark_cmd[0], "fill_buf");
@@ -149,8 +149,7 @@ int main(int argc, char **argv)
 		strcpy(benchmark_cmd[2], "1");
 		strcpy(benchmark_cmd[3], "1");
 		strcpy(benchmark_cmd[4], "0");
-		strcpy(benchmark_cmd[5], "");
-		benchmark_cmd[6] = NULL;
+		benchmark_cmd[5] = NULL;
 	}
 
 	sprintf(bw_report, "reads");
@@ -161,8 +160,6 @@ int main(int argc, char **argv)
 
 	if (!is_amd && mbm_test) {
 		printf("# Starting MBM BW change ...\n");
-		if (!has_ben)
-			sprintf(benchmark_cmd[5], "%s", "mba");
 		res = mbm_bw_change(span, cpu_no, bw_report, benchmark_cmd);
 		printf("%sok MBM: bw change\n", res ? "not " : "");
 		mbm_test_cleanup();
@@ -181,10 +178,8 @@ int main(int argc, char **argv)
 
 	if (cqm_test) {
 		printf("# Starting CQM test ...\n");
-		if (!has_ben) {
+		if (!has_ben)
 			sprintf(benchmark_cmd[1], "%d", span);
-			sprintf(benchmark_cmd[5], "%s", "cqm");
-		}
 		res = cqm_schemata_change(cpu_no, span, "L3", benchmark_cmd);
 		printf("%sok CQM: test\n", res ? "not " : "");
 		cqm_test_cleanup();
diff --git a/tools/testing/selftests/resctrl/resctrlfs.c b/tools/testing/selftests/resctrl/resctrlfs.c
index bd81a13ff9df..dcc9e70cbf30 100644
--- a/tools/testing/selftests/resctrl/resctrlfs.c
+++ b/tools/testing/selftests/resctrl/resctrlfs.c
@@ -345,7 +345,6 @@ void run_benchmark(int signum, siginfo_t *info, void *ucontext)
 	int operation, ret, malloc_and_init_memory, memflush;
 	unsigned long span, buffer_span;
 	char **benchmark_cmd;
-	char resctrl_val[64];
 	FILE *fp;
 
 	benchmark_cmd = info->si_ptr;
@@ -364,11 +363,10 @@ void run_benchmark(int signum, siginfo_t *info, void *ucontext)
 		malloc_and_init_memory = atoi(benchmark_cmd[2]);
 		memflush = atoi(benchmark_cmd[3]);
 		operation = atoi(benchmark_cmd[4]);
-		sprintf(resctrl_val, "%s", benchmark_cmd[5]);
 		buffer_span = span * MB;
 
 		if (run_fill_buf(buffer_span, malloc_and_init_memory, memflush,
-				 operation, resctrl_val))
+				 operation))
 			fprintf(stderr, "Error in running fill buffer\n");
 	} else {
 		/* Execute specified benchmark */

-- 2.7.4
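
The idea behind this cleanup is that the caller, not a test-name string, decides whether the buffer is traversed once or repeatedly. The sketch below illustrates that split in a standalone form. It is not the fill_buf.c code: read_one_span(), read_buffer_once() and read_buffer_repeatedly() are hypothetical names, the 64-byte cache line size is an assumption, and the max_passes bound exists only so the sketch terminates (the loop in the patch is `while (1)`).

```c
#include <stddef.h>

#define CL_SIZE 64	/* assumed cache line size, in bytes */

/* One read pass: touch one byte per cache line and return the sum. */
int read_one_span(const unsigned char *buf, size_t size)
{
	int sum = 0;
	size_t i;

	for (i = 0; i < size; i += CL_SIZE)
		sum += buf[i];

	return sum;
}

/* Single pass, for a caller that only needs to touch the buffer once. */
int read_buffer_once(const unsigned char *buf, size_t size)
{
	return read_one_span(buf, size);
}

/*
 * Repeated passes, for a caller that wants sustained traffic.
 * max_passes is a hypothetical bound added so this sketch returns;
 * it yields the sum from the last pass, or 0 if no pass was made.
 */
int read_buffer_repeatedly(const unsigned char *buf, size_t size,
			   unsigned long max_passes)
{
	int sum = 0;
	unsigned long pass;

	for (pass = 0; pass < max_passes; pass++)
		sum = read_one_span(buf, size);

	return sum;
}
```

Neither entry point needs to know which test called it, which is the property the patch restores by dropping the resctrl_val argument.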


linux-kselftest-mirror@lists.linaro.org
