[PATCH v2 0/2] selftests/resctrl: SNC kernel support discovery

List overview All Threads
Download

newer

older

Testing Quality Call notes -...

[PATCH v3 0/2] Allow userspace to...

Maciej Wieczor-Retman

15 May 2024 15 May '24

11:17 a.m.

Changes v2: - Removed patches 2 and 3 since now this part will be supported by the kernel.

Sub-Numa Clustering (SNC) allows splitting CPU cores, caches and memory into multiple NUMA nodes. When enabled, NUMA-aware applications can achieve better performance on bigger server platforms.

SNC support in the kernel is currently in review [1]. With SNC enabled and kernel support in place all the tests will function normally. There might be a problem when SNC is enabled but the system is still using an older kernel version without SNC support. Currently the only message displayed in that situation is a guess that SNC might be enabled and is causing issues. That message also is displayed whenever the test fails on an Intel platform.

Add a mechanism to discover kernel support for SNC which will add more meaning and certainty to the error message.

Series was tested on Ice Lake server platforms with SNC disabled, SNC-2 and SNC-4. The tests were also ran with and without kernel support for SNC.

Series applies cleanly on kselftest/next.

[1] https://lore.kernel.org/all/20240503203325.21512-1-tony.luck@intel.com/

Previous versions of this series: [v1] https://lore.kernel.org/all/cover.1709721159.git.maciej.wieczor-retman@intel...

Maciej Wieczor-Retman (2): selftests/resctrl: Adjust effective L3 cache size with SNC enabled selftests/resctrl: Adjust SNC support messages

tools/testing/selftests/resctrl/cat_test.c | 2 +- tools/testing/selftests/resctrl/cmt_test.c | 6 +- tools/testing/selftests/resctrl/mba_test.c | 2 + tools/testing/selftests/resctrl/mbm_test.c | 4 +- tools/testing/selftests/resctrl/resctrl.h | 8 +- tools/testing/selftests/resctrl/resctrlfs.c | 131 +++++++++++++++++++- 6 files changed, 144 insertions(+), 9 deletions(-)

-- 2.45.0

Show replies by date

Maciej Wieczor-Retman

15 May 15 May

11:18 a.m.

New subject: [PATCH v2 1/2] selftests/resctrl: Adjust effective L3 cache size with SNC enabled

Sub-NUMA Cluster divides CPUs sharing an L3 cache into separate NUMA nodes. Systems may support splitting into either two or four nodes.

When SNC mode is enabled the effective amount of L3 cache available for allocation is divided by the number of nodes per L3.

Detect which SNC mode is active by comparing the number of CPUs that share a cache with CPU0, with the number of CPUs on node0.

Signed-off-by: Tony Luck tony.luck@intel.com Co-developed-by: Maciej Wieczor-Retman maciej.wieczor-retman@intel.com Signed-off-by: Maciej Wieczor-Retman maciej.wieczor-retman@intel.com --- tools/testing/selftests/resctrl/resctrl.h | 3 ++ tools/testing/selftests/resctrl/resctrlfs.c | 59 +++++++++++++++++++++ 2 files changed, 62 insertions(+)

diff --git a/tools/testing/selftests/resctrl/resctrl.h b/tools/testing/selftests/resctrl/resctrl.h index 00d51fa7531c..3dd5d6779786 100644 --- a/tools/testing/selftests/resctrl/resctrl.h +++ b/tools/testing/selftests/resctrl/resctrl.h @@ -11,6 +11,7 @@ #include <signal.h> #include <dirent.h> #include <stdbool.h> +#include <ctype.h> #include <sys/stat.h> #include <sys/ioctl.h> #include <sys/mount.h> @@ -49,6 +50,7 @@ umount_resctrlfs(); \ exit(EXIT_FAILURE); \ } while (0) +#define MAX_SNC 4

/* * user_params: User supplied parameters @@ -131,6 +133,7 @@ extern pid_t bm_pid, ppid;

extern char llc_occup_path[1024];

+int snc_ways(void); int get_vendor(void); bool check_resctrlfs_support(void); int filter_dmesg(void); diff --git a/tools/testing/selftests/resctrl/resctrlfs.c b/tools/testing/selftests/resctrl/resctrlfs.c index 1cade75176eb..e4d3624a8817 100644 --- a/tools/testing/selftests/resctrl/resctrlfs.c +++ b/tools/testing/selftests/resctrl/resctrlfs.c @@ -156,6 +156,63 @@ int get_domain_id(const char *resource, int cpu_no, int *domain_id) return 0; }

+/* + * Count number of CPUs in a /sys bit map + */ +static unsigned int count_sys_bitmap_bits(char *name) +{ + FILE *fp = fopen(name, "r"); + int count = 0, c; + + if (!fp) + return 0; + + while ((c = fgetc(fp)) != EOF) { + if (!isxdigit(c)) + continue; + switch (c) { + case 'f': + count++; + case '7': case 'b': case 'd': case 'e': + count++; + case '3': case '5': case '6': case '9': case 'a': case 'c': + count++; + case '1': case '2': case '4': case '8': + count++; + } + } + fclose(fp); + + return count; +} + +/* + * Detect SNC by comparing #CPUs in node0 with #CPUs sharing LLC with CPU0. + * If some CPUs are offline the numbers may not be exact multiples of each + * other. Any offline CPUs on node0 will be also gone from shared_cpu_map of + * CPU0 but offline CPUs from other nodes will only make the cache_cpus value + * lower. Still try to get the ratio right by preventing the second possibility. + */ +int snc_ways(void) +{ + int node_cpus, cache_cpus, i; + + node_cpus = count_sys_bitmap_bits("/sys/devices/system/node/node0/cpumap"); + cache_cpus = count_sys_bitmap_bits("/sys/devices/system/cpu/cpu0/cache/index3/shared_cpu_map"); + + if (!node_cpus || !cache_cpus) { + fprintf(stderr, "Warning could not determine Sub-NUMA Cluster mode\n"); + return 1; + } + + for (i = 1; i <= MAX_SNC ; i++) { + if (i * node_cpus >= cache_cpus) + return i; + } + + return 1; +} + /* * get_cache_size - Get cache size for a specified CPU * @cpu_no: CPU number @@ -211,6 +268,8 @@ int get_cache_size(int cpu_no, const char *cache_type, unsigned long *cache_size break; }

+ if (cache_num == 3) + *cache_size /= snc_ways(); return 0; }

-- 2.45.0

Luck, Tony

4:48 p.m.

New subject: [PATCH v2 1/2] selftests/resctrl: Adjust effective L3 cache size with SNC enabled

If/when my SNC patches go upstream the SNC check could become:

snc_ways=$(ls -d /sys/fs/resctrl/mon_data/mon_L3_00/mon_sub_L3_* 2>/dev/null | wc -l)

assuming you have /sys/fs/resctrl mounted.

-Tony

Maciej Wieczor-Retman

16 May 16 May

6:01 a.m.

New subject: [PATCH v2 1/2] selftests/resctrl: Adjust effective L3 cache size with SNC enabled

On 2024-05-15 at 16:48:44 +0000, Luck, Tony wrote:

...

If/when my SNC patches go upstream the SNC check could become:

snc_ways=$(ls -d /sys/fs/resctrl/mon_data/mon_L3_00/mon_sub_L3_* 2>/dev/null | wc -l)

But this won't work without your kernel patches right?

If they are already in the kernel used by the person launching the selftests then there shouldn't be any problems to report. The idea was that if the CMT/MBM/MBA selftests fail, the message with "SNC is the problem" is only displayed if SNC is enabled and there is no kernel support for SNC.

...

assuming you have /sys/fs/resctrl mounted.

-Tony

-- Kind regards Maciej Wieczór-Retman

Reinette Chatre

30 May 30 May

11:07 p.m.

New subject: [PATCH v2 1/2] selftests/resctrl: Adjust effective L3 cache size with SNC enabled

Hi Maciej,

Regarding shortlog: L3 cache size should no longer be adjusted when SNC is enabled. You mention that the tests are passing when running with this adjustment ... I think that this may be because the test now just runs on a smaller portion of the cache?

On 5/15/24 4:18 AM, Maciej Wieczor-Retman wrote:

...

Sub-NUMA Cluster divides CPUs sharing an L3 cache into separate NUMA nodes. Systems may support splitting into either two or four nodes.

fyi ... from the most recent kernel submission 2, 3, or 4 nodes are possible: https://lore.kernel.org/lkml/20240528222006.58283-20-tony.luck@intel.com/

...

When SNC mode is enabled the effective amount of L3 cache available for allocation is divided by the number of nodes per L3.

This was a mistake in original implementation and no longer done.

...

Detect which SNC mode is active by comparing the number of CPUs that share a cache with CPU0, with the number of CPUs on node0.

Signed-off-by: Tony Luck tony.luck@intel.com Co-developed-by: Maciej Wieczor-Retman maciej.wieczor-retman@intel.com Signed-off-by: Maciej Wieczor-Retman maciej.wieczor-retman@intel.com

tools/testing/selftests/resctrl/resctrl.h | 3 ++ tools/testing/selftests/resctrl/resctrlfs.c | 59 +++++++++++++++++++++ 2 files changed, 62 insertions(+)

diff --git a/tools/testing/selftests/resctrl/resctrl.h b/tools/testing/selftests/resctrl/resctrl.h index 00d51fa7531c..3dd5d6779786 100644 --- a/tools/testing/selftests/resctrl/resctrl.h +++ b/tools/testing/selftests/resctrl/resctrl.h @@ -11,6 +11,7 @@ #include <signal.h> #include <dirent.h> #include <stdbool.h> +#include <ctype.h> #include <sys/stat.h> #include <sys/ioctl.h> #include <sys/mount.h> @@ -49,6 +50,7 @@ umount_resctrlfs(); \ exit(EXIT_FAILURE); \ } while (0) +#define MAX_SNC 4 /*

user_params: User supplied parameters

@@ -131,6 +133,7 @@ extern pid_t bm_pid, ppid; extern char llc_occup_path[1024]; +int snc_ways(void); int get_vendor(void); bool check_resctrlfs_support(void); int filter_dmesg(void); diff --git a/tools/testing/selftests/resctrl/resctrlfs.c b/tools/testing/selftests/resctrl/resctrlfs.c index 1cade75176eb..e4d3624a8817 100644 --- a/tools/testing/selftests/resctrl/resctrlfs.c +++ b/tools/testing/selftests/resctrl/resctrlfs.c @@ -156,6 +156,63 @@ int get_domain_id(const char *resource, int cpu_no, int *domain_id) return 0; } +/*

Count number of CPUs in a /sys bit map

*/

+static unsigned int count_sys_bitmap_bits(char *name) +{
FILE *fp = fopen(name, "r");

int count = 0, c;

if (!fp)
return 0;
while ((c = fgetc(fp)) != EOF) {
if (!isxdigit(c))
	continue;
switch (c) {
case 'f':
	count++;
case '7': case 'b': case 'd': case 'e':
	count++;
case '3': case '5': case '6': case '9': case 'a': case 'c':
	count++;
case '1': case '2': case '4': case '8':
	count++;
}
}

fclose(fp);

return count;
+}

+/*

Detect SNC by comparing #CPUs in node0 with #CPUs sharing LLC with CPU0.

If some CPUs are offline the numbers may not be exact multiples of each

other. Any offline CPUs on node0 will be also gone from shared_cpu_map of

CPU0 but offline CPUs from other nodes will only make the cache_cpus value

lower. Still try to get the ratio right by preventing the second possibility.

*/

+int snc_ways(void)

"ways" have a specific meaning in cache terminology. Perhaps rather something like "snc_nodes_per_cache()" or even copy the kernel's (which is still WIP though) snc_nodes_per_l3_cache()

...

+{
int node_cpus, cache_cpus, i;

node_cpus = count_sys_bitmap_bits("/sys/devices/system/node/node0/cpumap");

cache_cpus = count_sys_bitmap_bits("/sys/devices/system/cpu/cpu0/cache/index3/shared_cpu_map");

if (!node_cpus || !cache_cpus) {
fprintf(stderr, "Warning could not determine Sub-NUMA Cluster mode\n");

The tests just use "ksft_print_msg()" for error messages. The "Warning could ..." is somewhat unexpected, perhaps just "Could not determine ..." or "Warning: Could not ..."?

...

return 1;
}

for (i = 1; i <= MAX_SNC ; i++) {
if (i * node_cpus >= cache_cpus)
	return i;
}

This is not obvious to me. From the function comments this seems to address the scenarios when CPUs from other nodes are offline. It is not clear to me how this loop addresses this. For example, let's say there are four SNC nodes associated with a cache and only the node0 CPUs are online. The above would detect this as "1", not "4", if I read this right?

I wonder if it may not be easier to just follow what the kernel does (in the new version). User space can learn the number of online and present CPUs from /sys/devices/system/cpu/online and /sys/devices/system/cpu/present respectively. A simple string compare of the contents can be used to determine if they are identical and a warning can be printed if they are not. With a warning when accurate detection cannot be done the simple check will do.

Could you please add an informational message indicating how many SNC nodes were indeed detected?

...

return 1;

+}

/*

get_cache_size - Get cache size for a specified CPU

@cpu_no: CPU number

@@ -211,6 +268,8 @@ int get_cache_size(int cpu_no, const char *cache_type, unsigned long *cache_size break; }
if (cache_num == 3)
*cache_size /= snc_ways();
return 0; }

I think this can be dropped.

Reinette

Luck, Tony

11:46 p.m.

New subject: [PATCH v2 1/2] selftests/resctrl: Adjust effective L3 cache size with SNC enabled

...

...
When SNC mode is enabled the effective amount of L3 cache available for allocation is divided by the number of nodes per L3.

This was a mistake in original implementation and no longer done.

My original kernel code adjusted value reported in the "size" file in resctrl. That's no longer done because the effective size depends on how applications are allocating and using memory. Since the kernel can't know that, it seemed best to just report the total size of the cache.

But I think the resctrl tests still need to take this into account when running llc_occupancy tests.

E.g. on a 2-way SNC system with a 100MB L3 cache a test that allocates memory from its local SNC node (default behavior without using libnuma) will only see 50 MB llc_occupancy with a fully populated L3 mask in the schemata file.

-Tony

Reinette Chatre

11:51 p.m.

New subject: [PATCH v2 1/2] selftests/resctrl: Adjust effective L3 cache size with SNC enabled

Hi Tony,

On 5/30/24 4:46 PM, Luck, Tony wrote:

...

...
...
When SNC mode is enabled the effective amount of L3 cache available for allocation is divided by the number of nodes per L3.

This was a mistake in original implementation and no longer done.

My original kernel code adjusted value reported in the "size" file in resctrl. That's no longer done because the effective size depends on how applications are allocating and using memory. Since the kernel can't know that, it seemed best to just report the total size of the cache.

But I think the resctrl tests still need to take this into account when running llc_occupancy tests.

E.g. on a 2-way SNC system with a 100MB L3 cache a test that allocates memory from its local SNC node (default behavior without using libnuma) will only see 50 MB llc_occupancy with a fully populated L3 mask in the schemata file.

This seems to contradict the "Cache and memory bandwidth allocation features continue to operate at the scope of the L3 cache." statement from [1]?

Reinette

[1] https://lore.kernel.org/lkml/20240528222006.58283-1-tony.luck@intel.com/

Luck, Tony

31 May 31 May

12:34 a.m.

New subject: [PATCH v2 1/2] selftests/resctrl: Adjust effective L3 cache size with SNC enabled

...

...
...
...
When SNC mode is enabled the effective amount of L3 cache available for allocation is divided by the number of nodes per L3.

This was a mistake in original implementation and no longer done.

My original kernel code adjusted value reported in the "size" file in resctrl. That's no longer done because the effective size depends on how applications are allocating and using memory. Since the kernel can't know that, it seemed best to just report the total size of the cache.

But I think the resctrl tests still need to take this into account when running llc_occupancy tests.

E.g. on a 2-way SNC system with a 100MB L3 cache a test that allocates memory from its local SNC node (default behavior without using libnuma) will only see 50 MB llc_occupancy with a fully populated L3 mask in the schemata file.

This seems to contradict the "Cache and memory bandwidth allocation features continue to operate at the scope of the L3 cache." statement from [1]?

I'll clean that up. MBA isn't affected. But cache allocation is affected in that the amount of cache represented by each bit in the masks in the schemata file is reduced by a factor equal to SNC nodes per L3 cache.

-Tony

...

Reinette

[1] https://lore.kernel.org/lkml/20240528222006.58283-1-tony.luck@intel.com/

Reinette Chatre

4:17 p.m.

New subject: [PATCH v2 1/2] selftests/resctrl: Adjust effective L3 cache size with SNC enabled

Hi Tony and Maciej,

On 5/30/24 5:34 PM, Luck, Tony wrote:

...

...
...
...
...
When SNC mode is enabled the effective amount of L3 cache available for allocation is divided by the number of nodes per L3.

This was a mistake in original implementation and no longer done.

My original kernel code adjusted value reported in the "size" file in resctrl. That's no longer done because the effective size depends on how applications are allocating and using memory. Since the kernel can't know that, it seemed best to just report the total size of the cache.

But I think the resctrl tests still need to take this into account when running llc_occupancy tests.

E.g. on a 2-way SNC system with a 100MB L3 cache a test that allocates memory from its local SNC node (default behavior without using libnuma) will only see 50 MB llc_occupancy with a fully populated L3 mask in the schemata file.

This seems to contradict the "Cache and memory bandwidth allocation features continue to operate at the scope of the L3 cache." statement from [1]?

I'll clean that up. MBA isn't affected. But cache allocation is affected in that the amount of cache represented by each bit in the masks in the schemata file is reduced by a factor equal to SNC nodes per L3 cache.

Thanks Tony. I trust that this is what Maciej intended since the change is specifically named "Adjuct _effective_ L3 cache size". I'd like to recommend that your comments be added before the change to get_cache_size() ...

/* * The amount of cache represented by each bit in the masks * in the schemata file is reduced by a factor equal to SNC * nodes per L3 cache. * E.g. on a SNC-2 system with a 100MB L3 cache a test that * allocates memory from its local SNC node (default behavior * without using libnuma) will only see 50 MB llc_occupancy * with a fully populated L3 mask in the schemata file. */

Reinette

Maciej Wieczor-Retman

25 Jun 25 Jun

11:04 a.m.

New subject: [PATCH v2 1/2] selftests/resctrl: Adjust effective L3 cache size with SNC enabled

Hello, sorry it took me so long to get back to this. I prepared the next version with your comments applied and Tony's replies taken into account.

I wanted to briefly discuss this before posting:

On 2024-05-30 at 16:07:29 -0700, Reinette Chatre wrote:

...

On 5/15/24 4:18 AM, Maciej Wieczor-Retman wrote:

...
return 1;
}

for (i = 1; i <= MAX_SNC ; i++) {
if (i * node_cpus >= cache_cpus)
	return i;
}
This is not obvious to me. From the function comments this seems to address the scenarios when CPUs from other nodes are offline. It is not clear to me how this loop addresses this. For example, let's say there are four SNC nodes associated with a cache and only the node0 CPUs are online. The above would detect this as "1", not "4", if I read this right?

I wonder if it may not be easier to just follow what the kernel does (in the new version). User space can learn the number of online and present CPUs from /sys/devices/system/cpu/online and /sys/devices/system/cpu/present respectively. A simple string compare of the contents can be used to determine if they are identical and a warning can be printed if they are not. With a warning when accurate detection cannot be done the simple check will do.

Could you please add an informational message indicating how many SNC nodes were indeed detected?

Should the information "how many SNC nodes are detected?" get printed every time (by which I mean at the end of CMT and MBM tests) or only when we get the error "SNC enabled but kernel doesn't support it" happens? Of course in the first case if there is only 1 node detected nothing would be printed to avoid noise.

-- Kind regards Maciej Wieczór-Retman

Reinette Chatre

4:28 p.m.

New subject: [PATCH v2 1/2] selftests/resctrl: Adjust effective L3 cache size with SNC enabled

Hi Maciej,

On 6/25/24 4:04 AM, Maciej Wieczor-Retman wrote:

...

Hello, sorry it took me so long to get back to this. I prepared the next version with your comments applied and Tony's replies taken into account.

Thank you very much for sticking with this.

...

I wanted to briefly discuss this before posting:

On 2024-05-30 at 16:07:29 -0700, Reinette Chatre wrote:

...
On 5/15/24 4:18 AM, Maciej Wieczor-Retman wrote:

...
return 1;
}

for (i = 1; i <= MAX_SNC ; i++) {
if (i * node_cpus >= cache_cpus)
	return i;
}
This is not obvious to me. From the function comments this seems to address the scenarios when CPUs from other nodes are offline. It is not clear to me how this loop addresses this. For example, let's say there are four SNC nodes associated with a cache and only the node0 CPUs are online. The above would detect this as "1", not "4", if I read this right?

I wonder if it may not be easier to just follow what the kernel does (in the new version). User space can learn the number of online and present CPUs from /sys/devices/system/cpu/online and /sys/devices/system/cpu/present respectively. A simple string compare of the contents can be used to determine if they are identical and a warning can be printed if they are not. With a warning when accurate detection cannot be done the simple check will do.

Could you please add an informational message indicating how many SNC nodes were indeed detected?
Should the information "how many SNC nodes are detected?" get printed every time (by which I mean at the end of CMT and MBM tests) or only when we get the error "SNC enabled but kernel doesn't support it" happens? Of course in the first case if there is only 1 node detected nothing would be printed to avoid noise.

I agree that it is not needed to print something about SNC if it is disabled. hmmm ... so SNC impacts every test but it is only detected by default during CAT and CMT test, with MBA and MBM "detection" only triggered if the test fails?

What if the "SNC detection" is moved to be within run_single_test() but instead of repeating the detection from scratch every time it rather works like get_vendor() where the full detection is only done on first attempt? run_single_test() can detect if SNC is enabled and (if number of SNC nodes > 1) print an informational message that is inherited by all tests. Any test that needs to know the number of SNC nodes can continue to use the same function used for detection (that only does actual detection once).

What do you think?

Reinette

Maciej Wieczor-Retman

26 Jun 26 Jun

7:09 a.m.

New subject: [PATCH v2 1/2] selftests/resctrl: Adjust effective L3 cache size with SNC enabled

Hello!,

On 2024-06-25 at 09:28:55 -0700, Reinette Chatre wrote:

...

Hi Maciej,

On 6/25/24 4:04 AM, Maciej Wieczor-Retman wrote:

...
Hello, sorry it took me so long to get back to this. I prepared the next version with your comments applied and Tony's replies taken into account.

Thank you very much for sticking with this.

...
I wanted to briefly discuss this before posting:

On 2024-05-30 at 16:07:29 -0700, Reinette Chatre wrote:

...
On 5/15/24 4:18 AM, Maciej Wieczor-Retman wrote:

...
return 1;
}

for (i = 1; i <= MAX_SNC ; i++) {
if (i * node_cpus >= cache_cpus)
	return i;
}
This is not obvious to me. From the function comments this seems to address the scenarios when CPUs from other nodes are offline. It is not clear to me how this loop addresses this. For example, let's say there are four SNC nodes associated with a cache and only the node0 CPUs are online. The above would detect this as "1", not "4", if I read this right?

I wonder if it may not be easier to just follow what the kernel does (in the new version). User space can learn the number of online and present CPUs from /sys/devices/system/cpu/online and /sys/devices/system/cpu/present respectively. A simple string compare of the contents can be used to determine if they are identical and a warning can be printed if they are not. With a warning when accurate detection cannot be done the simple check will do.

Could you please add an informational message indicating how many SNC nodes were indeed detected?
Should the information "how many SNC nodes are detected?" get printed every time (by which I mean at the end of CMT and MBM tests) or only when we get the error "SNC enabled but kernel doesn't support it" happens? Of course in the first case if there is only 1 node detected nothing would be printed to avoid noise.
I agree that it is not needed to print something about SNC if it is disabled. hmmm ... so SNC impacts every test but it is only detected by default during CAT and CMT test, with MBA and MBM "detection" only triggered if the test fails?

Yes, snc_ways() ran before starting CAT and CMT to adjust cache size variable. And then after CAT,CMT,MBM and MBA if the return value indicated failure.

...

What if the "SNC detection" is moved to be within run_single_test() but instead of repeating the detection from scratch every time it rather works like get_vendor() where the full detection is only done on first attempt? run_single_test() can detect if SNC is enabled and (if number of SNC nodes > 1) print an informational message that is inherited by all tests. Any test that needs to know the number of SNC nodes can continue to use the same function used for detection (that only does actual detection once).

What do you think?

I think running the detection once at the start and then reusing the results is a good idea. You're proposing adding a value (global or passed through all the tests) that would get initialized on the first run_single_test()?

And then the SNC status (if enabled) + a warning if the detection could be wrong (because of the online/present cpus ratio) would happen before the test runs?

On the warning placement I think it should be moved out of being printed only on failure. I did some experiments using "chcpu" to enable/disable cores and then run selftests. They didn't have any problems succeeding even though SNC detection detected different mode every time (I added a printf() around the line that cache size is modified to show what SNC mode is detected). While I understand these tests shouldn't fail since they just use a different portion of the cache I think the user should be informed it's not really NUMA aware if the detection was wrong:

(this was a 2 socket machine with SNC-2 and 55296K L3 cache size)

This is without any changes: [root]# ./resctrl_tests -t CMT # dmesg: [ 11.464842] resctrl: Sub-NUMA Cluster mode detected with 2 nodes per L3 cache ... SNC NODES DETECTED : 2 # Cache size :28311552 ... # Average LLC val: 12413952 # Cache span (bytes): 11796480 ok 1 CMT: test

This is with all cores on node 1 disabled: [root]# ./resctrl_tests -t CMT # dmesg: [ 11.464842] resctrl: Sub-NUMA Cluster mode detected with 2 nodes per L3 cache ... SNC NODES DETECTED : 1 # Cache size :56623104 ... # Average LLC val: 22606848 # Cache span (bytes): 23592960 ok 1 CMT: test

And this with one core on node 0 disabled: [root]# ./resctrl_tests -t CMT # dmesg: [ 11.464842] resctrl: Sub-NUMA Cluster mode detected with 2 nodes per L3 cache ... SNC NODES DETECTED : 3 # Cache size :18874368 ... # Average LLC val: 7382016 # Cache span (bytes): 7864320 ok 1 CMT: test

CAT also succeeds although it reports bigger or smaller cache miss rates than normally:

SNC NODES DETECTED : 1 <-- all cpus on node 1 offline # Percent diff=12.7 # Percent diff=10.2 # Percent diff=7.8 # Percent diff=6.7 ok 1 L3_CAT: test

SNC NODES DETECTED : 2 <-- real # Percent diff=49.6 # Percent diff=37.8 # Percent diff=22.4 # Percent diff=16.0 ok 1 L3_CAT: test

SNC NODES DETECTED : 3 <-- one cpu on node 0 offline # Percent diff=76.6 # Percent diff=53.3 # Percent diff=35.1 # Percent diff=28.9 ok 1 L3_CAT: test

...

Reinette

-- Kind regards Maciej Wieczór-Retman

Reinette Chatre

4:46 p.m.

New subject: [PATCH v2 1/2] selftests/resctrl: Adjust effective L3 cache size with SNC enabled

Hi Maciej,

On 6/26/24 12:09 AM, Maciej Wieczor-Retman wrote:

...

Hello!,

On 2024-06-25 at 09:28:55 -0700, Reinette Chatre wrote:

...
Hi Maciej,

On 6/25/24 4:04 AM, Maciej Wieczor-Retman wrote:

...
Hello, sorry it took me so long to get back to this. I prepared the next version with your comments applied and Tony's replies taken into account.

Thank you very much for sticking with this.

...
I wanted to briefly discuss this before posting:

On 2024-05-30 at 16:07:29 -0700, Reinette Chatre wrote:

...
On 5/15/24 4:18 AM, Maciej Wieczor-Retman wrote:

...
return 1;
}

for (i = 1; i <= MAX_SNC ; i++) {
if (i * node_cpus >= cache_cpus)
	return i;
}
This is not obvious to me. From the function comments this seems to address the scenarios when CPUs from other nodes are offline. It is not clear to me how this loop addresses this. For example, let's say there are four SNC nodes associated with a cache and only the node0 CPUs are online. The above would detect this as "1", not "4", if I read this right?

I wonder if it may not be easier to just follow what the kernel does (in the new version). User space can learn the number of online and present CPUs from /sys/devices/system/cpu/online and /sys/devices/system/cpu/present respectively. A simple string compare of the contents can be used to determine if they are identical and a warning can be printed if they are not. With a warning when accurate detection cannot be done the simple check will do.

Could you please add an informational message indicating how many SNC nodes were indeed detected?
Should the information "how many SNC nodes are detected?" get printed every time (by which I mean at the end of CMT and MBM tests) or only when we get the error "SNC enabled but kernel doesn't support it" happens? Of course in the first case if there is only 1 node detected nothing would be printed to avoid noise.
I agree that it is not needed to print something about SNC if it is disabled. hmmm ... so SNC impacts every test but it is only detected by default during CAT and CMT test, with MBA and MBM "detection" only triggered if the test fails?
Yes, snc_ways() ran before starting CAT and CMT to adjust cache size variable. And then after CAT,CMT,MBM and MBA if the return value indicated failure.

...
What if the "SNC detection" is moved to be within run_single_test() but instead of repeating the detection from scratch every time it rather works like get_vendor() where the full detection is only done on first attempt? run_single_test() can detect if SNC is enabled and (if number of SNC nodes > 1) print an informational message that is inherited by all tests. Any test that needs to know the number of SNC nodes can continue to use the same function used for detection (that only does actual detection once).

What do you think?

I think running the detection once at the start and then reusing the results is a good idea. You're proposing adding a value (global or passed through all the tests) that would get initialized on the first run_single_test()?

I was thinking about a solution similar to get_vendor() that uses a static local variable. A global variable could work also.

...

And then the SNC status (if enabled) + a warning if the detection could be wrong (because of the online/present cpus ratio) would happen before the test runs?

I do not think this was part of previous tests, but yes, this is a concern where a warning would be helpful.

...

On the warning placement I think it should be moved out of being printed only on failure. I did some experiments using "chcpu" to enable/disable cores and then run selftests. They didn't have any problems succeeding even though SNC detection detected different mode every time (I added a printf() around the line

I am not surprised here since there has not been much tuning of the CAT test.

...

that cache size is modified to show what SNC mode is detected). While I understand these tests shouldn't fail since they just use a different portion of the cache I think the user should be informed it's not really NUMA aware if the detection was wrong:

Seems like there are two warnings to consider: (a) SNC detection may be wrong. (b) If SNC is enabled and kernel does not support SNC then the tests may fail.

For (a) I think that it is possible to know when SNC detection may be wrong. A test similar to the kernel test that compares the "online" and "present" CPUs [1] can be used. The /sys/devices/system/cpu/online and /sys/devices/system/cpu/present files are available for this. A simpler way may be to just print a warning if /sys/devices/system/cpu/offline is not empty and set the number of SNC nodes to 1. Instead, a new "snc_unreliable" global can be set that can be used to print additional information during test failure.

I do think that it is fair to print all the SNC details during detection but I am concerned that those messages will be out of sight during test failures and I thus do think it is useful to have extra information during test failure.

Reinette

[1] https://lore.kernel.org/lkml/20240621223859.43471-18-tony.luck@intel.com/

Maciej Wieczor-Retman

27 Jun 27 Jun

9:50 a.m.

New subject: [PATCH v2 1/2] selftests/resctrl: Adjust effective L3 cache size with SNC enabled

On 2024-06-26 at 09:46:01 -0700, Reinette Chatre wrote:

...

Hi Maciej,

On 6/26/24 12:09 AM, Maciej Wieczor-Retman wrote:

...
Hello!,

On 2024-06-25 at 09:28:55 -0700, Reinette Chatre wrote:

...
Hi Maciej,

On 6/25/24 4:04 AM, Maciej Wieczor-Retman wrote:

...
Hello, sorry it took me so long to get back to this. I prepared the next version with your comments applied and Tony's replies taken into account.

Thank you very much for sticking with this.

...
I wanted to briefly discuss this before posting:

On 2024-05-30 at 16:07:29 -0700, Reinette Chatre wrote:

...
On 5/15/24 4:18 AM, Maciej Wieczor-Retman wrote:

...
return 1;
}

for (i = 1; i <= MAX_SNC ; i++) {
if (i * node_cpus >= cache_cpus)
	return i;
}
This is not obvious to me. From the function comments this seems to address the scenarios when CPUs from other nodes are offline. It is not clear to me how this loop addresses this. For example, let's say there are four SNC nodes associated with a cache and only the node0 CPUs are online. The above would detect this as "1", not "4", if I read this right?

I wonder if it may not be easier to just follow what the kernel does (in the new version). User space can learn the number of online and present CPUs from /sys/devices/system/cpu/online and /sys/devices/system/cpu/present respectively. A simple string compare of the contents can be used to determine if they are identical and a warning can be printed if they are not. With a warning when accurate detection cannot be done the simple check will do.

Could you please add an informational message indicating how many SNC nodes were indeed detected?
Should the information "how many SNC nodes are detected?" get printed every time (by which I mean at the end of CMT and MBM tests) or only when we get the error "SNC enabled but kernel doesn't support it" happens? Of course in the first case if there is only 1 node detected nothing would be printed to avoid noise.
I agree that it is not needed to print something about SNC if it is disabled. hmmm ... so SNC impacts every test but it is only detected by default during CAT and CMT test, with MBA and MBM "detection" only triggered if the test fails?
Yes, snc_ways() ran before starting CAT and CMT to adjust cache size variable. And then after CAT,CMT,MBM and MBA if the return value indicated failure.

...
What if the "SNC detection" is moved to be within run_single_test() but instead of repeating the detection from scratch every time it rather works like get_vendor() where the full detection is only done on first attempt? run_single_test() can detect if SNC is enabled and (if number of SNC nodes > 1) print an informational message that is inherited by all tests. Any test that needs to know the number of SNC nodes can continue to use the same function used for detection (that only does actual detection once).

What do you think?

I think running the detection once at the start and then reusing the results is a good idea. You're proposing adding a value (global or passed through all the tests) that would get initialized on the first run_single_test()?
I was thinking about a solution similar to get_vendor() that uses a static local variable. A global variable could work also.

Oh yes, static local is good too, I'll try that.

...

...
And then the SNC status (if enabled) + a warning if the detection could be wrong (because of the online/present cpus ratio) would happen before the test runs?

I do not think this was part of previous tests, but yes, this is a concern where a warning would be helpful.

...
On the warning placement I think it should be moved out of being printed only on failure. I did some experiments using "chcpu" to enable/disable cores and then run selftests. They didn't have any problems succeeding even though SNC detection detected different mode every time (I added a printf() around the line

I am not surprised here since there has not been much tuning of the CAT test.

...
that cache size is modified to show what SNC mode is detected). While I understand these tests shouldn't fail since they just use a different portion of the cache I think the user should be informed it's not really NUMA aware if the detection was wrong:

Seems like there are two warnings to consider: (a) SNC detection may be wrong. (b) If SNC is enabled and kernel does not support SNC then the tests may fail.

For (a) I think that it is possible to know when SNC detection may be wrong. A test similar to the kernel test that compares the "online" and "present" CPUs [1] can be used. The /sys/devices/system/cpu/online and /sys/devices/system/cpu/present files are available for this. A simpler way may be to just print a warning if /sys/devices/system/cpu/offline is not empty and set the number of SNC nodes to 1. Instead, a new "snc_unreliable" global can be set that can be used to print additional information during test failure.

The empty offline file is a cool idea, less files to open :)

...

I do think that it is fair to print all the SNC details during detection but I am concerned that those messages will be out of sight during test failures and I thus do think it is useful to have extra information during test failure.

Yeah, I've been thinking about what is the best way to display these for a while. Maybe you're right that messages at the top will be lost. What about this set of messages:

1. First run of run_single_test() 1.1. For all tests: - detected snc mode (if > 1) - check if cpu/offline file is empty, set the global variable and print a message saying snc mode might be wrong

2. At the end of tests 2.1. For CMT, CAT, MBM, MBA: - test failed - snc detection reports it's enabled - kernel version doesn't support snc

2.2. Additional message for CMT, CAT (since the cache size is divided): - test failed or succeeded - snc detection reports the offline file is not empty - kernel version supports snc

The 1. message will be printed at the top since it's more informational (what is the SNC mode?) and then 2. messages will deal with possible issues / failures and will be nicely visible at the end. What do you think about this?

...

Reinette

[1] https://lore.kernel.org/lkml/20240621223859.43471-18-tony.luck@intel.com/

-- Kind regards Maciej Wieczór-Retman

Reinette Chatre

4:30 p.m.

New subject: [PATCH v2 1/2] selftests/resctrl: Adjust effective L3 cache size with SNC enabled

Hi Maciej

On 6/27/24 2:50 AM, Maciej Wieczor-Retman wrote:

...

Yeah, I've been thinking about what is the best way to display these for a while. Maybe you're right that messages at the top will be lost. What about this set of messages:

First run of run_single_test() 1.1. For all tests:

detected snc mode (if > 1)

check if cpu/offline file is empty, set the global variable and print a message saying snc mode might be wrong

When SNC detection is considered unreliable, everything else becomes unreliable also since kernel support for SNC is only visible (in future kernels) when SNC is enabled. I thus think that if it is found that SNC detection may be unreliable then the number of SNC nodes should be hardcoded to 1 and a default message about possible interference by SNC should be printed at all test failures.

...

At the end of tests 2.1. For CMT, CAT, MBM, MBA:

test failed

snc detection reports it's enabled

kernel version doesn't support snc

Sounds like the "all goes well" scenario when SNC support is reliably detected.

...

2.2. Additional message for CMT, CAT (since the cache size is divided): - test failed or succeeded - snc detection reports the offline file is not empty - kernel version supports snc

I am not able to follow what happens in these scenarios.

...

The 1. message will be printed at the top since it's more informational (what is the SNC mode?) and then 2. messages will deal with possible issues / failures and will be nicely visible at the end. What do you think about this?

It is not obvious to me what the messages may be but the times/locations when messages are printed sounds good to me.

Thank you

Reinette

Maciej Wieczor-Retman

28 Jun 28 Jun

7:52 a.m.

New subject: [PATCH v2 1/2] selftests/resctrl: Adjust effective L3 cache size with SNC enabled

On 2024-06-27 at 09:30:24 -0700, Reinette Chatre wrote:

...

Hi Maciej

On 6/27/24 2:50 AM, Maciej Wieczor-Retman wrote:

...
Yeah, I've been thinking about what is the best way to display these for a while. Maybe you're right that messages at the top will be lost. What about this set of messages:

First run of run_single_test() 1.1. For all tests:

detected snc mode (if > 1)

check if cpu/offline file is empty, set the global variable and print a message saying snc mode might be wrong

When SNC detection is considered unreliable, everything else becomes unreliable also since kernel support for SNC is only visible (in future kernels) when SNC is enabled. I thus think that if it is found that SNC detection may be unreliable then the number of SNC nodes should be hardcoded to 1 and a default message about possible interference by SNC should be printed at all test failures.

Okay, that sounds good.

...

...

At the end of tests 2.1. For CMT, CAT, MBM, MBA:

test failed

snc detection reports it's enabled

kernel version doesn't support snc

Sounds like the "all goes well" scenario when SNC support is reliably detected.

This was supposed to be the error message that was already there before - now I'd just add the information about whether SNC was actually enabled.

...

...
2.2. Additional message for CMT, CAT (since the cache size is divided): - test failed or succeeded - snc detection reports the offline file is not empty - kernel version supports snc

I am not able to follow what happens in these scenarios.

And this I intended as an explanation to the example I mentioned earlier - the test succeeds but the cache size was miscalculated due to offline cpus. But after applying your suggestion above to just set the snc mode to 1 when SNC detection is unreliable I guess this doesn't matter anymore.

...

...
The 1. message will be printed at the top since it's more informational (what is the SNC mode?) and then 2. messages will deal with possible issues / failures and will be nicely visible at the end. What do you think about this?

It is not obvious to me what the messages may be but the times/locations when messages are printed sounds good to me.

Thank you

Thanks for all the tips and great ideas, I hope to send the next version between monday and wednesday after trying to trigger all possible corner cases.

...

Reinette

-- Kind regards Maciej Wieczór-Retman

Maciej Wieczor-Retman

15 May 15 May

11:18 a.m.

New subject: [PATCH v2 2/2] selftests/resctrl: Adjust SNC support messages

Resctrl selftest prints a message on test failure that Sub-Numa Clustering (SNC) could be enabled and points the user to check theirs BIOS settings. No actual check is performed before printing that message so it is not very accurate in pinpointing a problem.

Figuring out if SNC is enabled is only one part of the problem, the other being whether the kernel supports it. As there is no easy interface that simply states SNC support in the kernel one can find that information by comparing L3 cache sizes from different sources. Cache size reported by /sys/devices/system/node/node0/cpu0/cache/index3/size will always show the full cache size even if it's split by enabled SNC. On the other hand /sys/fs/resctrl/size has information about L3 size, that with kernel support is adjusted for enabled SNC.

Add a function to find a cache size from /sys/fs/resctrl/size since finding that information from the other source is already implemented.

Add a function that compares the two cache sizes and use it to make the SNC support message more meaningful.

Add the SNC support message just after MBA's check_results() since MBA shares code with MBM and also can suffer from enabled SNC if there is no support in the kernel.

Signed-off-by: Maciej Wieczor-Retman maciej.wieczor-retman@intel.com --- Changelog v2: - Move snc_ways() checks from individual tests into snc_kernel_support(). - Write better comment for snc_kernel_support().

tools/testing/selftests/resctrl/cat_test.c | 2 +- tools/testing/selftests/resctrl/cmt_test.c | 6 +- tools/testing/selftests/resctrl/mba_test.c | 2 + tools/testing/selftests/resctrl/mbm_test.c | 4 +- tools/testing/selftests/resctrl/resctrl.h | 5 +- tools/testing/selftests/resctrl/resctrlfs.c | 72 ++++++++++++++++++++- 6 files changed, 82 insertions(+), 9 deletions(-)

diff --git a/tools/testing/selftests/resctrl/cat_test.c b/tools/testing/selftests/resctrl/cat_test.c index c7686fb6641a..722b4fcaf788 100644 --- a/tools/testing/selftests/resctrl/cat_test.c +++ b/tools/testing/selftests/resctrl/cat_test.c @@ -253,7 +253,7 @@ static int cat_run_test(const struct resctrl_test *test, const struct user_param return ret;

/* Get L3/L2 cache size */ - ret = get_cache_size(uparams->cpu, test->resource, &cache_total_size); + ret = get_sys_cache_size(uparams->cpu, test->resource, &cache_total_size); if (ret) return ret; ksft_print_msg("Cache size :%lu\n", cache_total_size); diff --git a/tools/testing/selftests/resctrl/cmt_test.c b/tools/testing/selftests/resctrl/cmt_test.c index a44e6fcd37b7..0ff232d38c26 100644 --- a/tools/testing/selftests/resctrl/cmt_test.c +++ b/tools/testing/selftests/resctrl/cmt_test.c @@ -112,7 +112,7 @@ static int cmt_run_test(const struct resctrl_test *test, const struct user_param if (ret) return ret;

- ret = get_cache_size(uparams->cpu, "L3", &cache_total_size); + ret = get_sys_cache_size(uparams->cpu, "L3", &cache_total_size); if (ret) return ret; ksft_print_msg("Cache size :%lu\n", cache_total_size); @@ -157,8 +157,8 @@ static int cmt_run_test(const struct resctrl_test *test, const struct user_param goto out;

ret = check_results(&param, span, n); - if (ret && (get_vendor() == ARCH_INTEL)) - ksft_print_msg("Intel CMT may be inaccurate when Sub-NUMA Clustering is enabled. Check BIOS configuration.\n"); + if (ret && (get_vendor() == ARCH_INTEL) && !snc_kernel_support()) + ksft_print_msg("Kernel doesn't support Sub-NUMA Clustering but it is enabled. Check BIOS configuration.\n");

out: free(span_str); diff --git a/tools/testing/selftests/resctrl/mba_test.c b/tools/testing/selftests/resctrl/mba_test.c index 5d6af9e8afed..74e1ebb14904 100644 --- a/tools/testing/selftests/resctrl/mba_test.c +++ b/tools/testing/selftests/resctrl/mba_test.c @@ -161,6 +161,8 @@ static int mba_run_test(const struct resctrl_test *test, const struct user_param return ret;

ret = check_results(); + if (ret && (get_vendor() == ARCH_INTEL) && !snc_kernel_support()) + ksft_print_msg("Kernel doesn't support Sub-NUMA Clustering but it is enabled. Check BIOS configuration.\n");

return ret; } diff --git a/tools/testing/selftests/resctrl/mbm_test.c b/tools/testing/selftests/resctrl/mbm_test.c index 3059ccc51a5a..e542938272f9 100644 --- a/tools/testing/selftests/resctrl/mbm_test.c +++ b/tools/testing/selftests/resctrl/mbm_test.c @@ -129,8 +129,8 @@ static int mbm_run_test(const struct resctrl_test *test, const struct user_param return ret;

ret = check_results(DEFAULT_SPAN); - if (ret && (get_vendor() == ARCH_INTEL)) - ksft_print_msg("Intel MBM may be inaccurate when Sub-NUMA Clustering is enabled. Check BIOS configuration.\n"); + if (ret && (get_vendor() == ARCH_INTEL) && !snc_kernel_support()) + ksft_print_msg("Kernel doesn't support Sub-NUMA Clustering but it is enabled. Check BIOS configuration.\n");

return ret; } diff --git a/tools/testing/selftests/resctrl/resctrl.h b/tools/testing/selftests/resctrl/resctrl.h index 3dd5d6779786..2bd7c3f71733 100644 --- a/tools/testing/selftests/resctrl/resctrl.h +++ b/tools/testing/selftests/resctrl/resctrl.h @@ -28,6 +28,7 @@ #define RESCTRL_PATH "/sys/fs/resctrl" #define PHYS_ID_PATH "/sys/devices/system/cpu/cpu" #define INFO_PATH "/sys/fs/resctrl/info" +#define SIZE_PATH "/sys/fs/resctrl/size"

/* * CPU vendor IDs @@ -165,12 +166,14 @@ unsigned long create_bit_mask(unsigned int start, unsigned int len); unsigned int count_contiguous_bits(unsigned long val, unsigned int *start); int get_full_cbm(const char *cache_type, unsigned long *mask); int get_mask_no_shareable(const char *cache_type, unsigned long *mask); -int get_cache_size(int cpu_no, const char *cache_type, unsigned long *cache_size); int resource_info_unsigned_get(const char *resource, const char *filename, unsigned int *val); +int get_sys_cache_size(int cpu_no, const char *cache_type, unsigned long *cache_size); +int get_resctrl_cache_size(const char *cache_type, unsigned long *cache_size); void ctrlc_handler(int signum, siginfo_t *info, void *ptr); int signal_handler_register(const struct resctrl_test *test); void signal_handler_unregister(void); unsigned int count_bits(unsigned long n); +int snc_kernel_support(void);

void perf_event_attr_initialize(struct perf_event_attr *pea, __u64 config); void perf_event_initialize_read_format(struct perf_event_read *pe_read); diff --git a/tools/testing/selftests/resctrl/resctrlfs.c b/tools/testing/selftests/resctrl/resctrlfs.c index e4d3624a8817..88f97db72246 100644 --- a/tools/testing/selftests/resctrl/resctrlfs.c +++ b/tools/testing/selftests/resctrl/resctrlfs.c @@ -214,14 +214,14 @@ int snc_ways(void) }

/* - * get_cache_size - Get cache size for a specified CPU + * get_sys_cache_size - Get cache size for a specified CPU * @cpu_no: CPU number * @cache_type: Cache level L2/L3 * @cache_size: pointer to cache_size * * Return: = 0 on success, < 0 on failure. */ -int get_cache_size(int cpu_no, const char *cache_type, unsigned long *cache_size) +int get_sys_cache_size(int cpu_no, const char *cache_type, unsigned long *cache_size) { char cache_path[1024], cache_str[64]; int length, i, cache_num; @@ -273,6 +273,44 @@ int get_cache_size(int cpu_no, const char *cache_type, unsigned long *cache_size return 0; }

+/* + * get_resctrl_cache_size - Get cache size as reported by resctrl + * @cache_type: Cache level L2/L3 + * @cache_size: pointer to cache_size + * + * Return: = 0 on success, < 0 on failure. + */ +int get_resctrl_cache_size(const char *cache_type, unsigned long *cache_size) +{ + char line[256], cache_prefix[16], *stripped_line, *token; + size_t len; + FILE *fp; + + strcpy(cache_prefix, cache_type); + strncat(cache_prefix, ":", 1); + + fp = fopen(SIZE_PATH, "r"); + if (!fp) { + ksft_print_msg("Failed to open %s : '%s'\n", + SIZE_PATH, strerror(errno)); + return -1; + } + + while (fgets(line, sizeof(line), fp)) { + stripped_line = strstr(line, cache_prefix); + + if (stripped_line) { + len = strlen(cache_prefix); + stripped_line += len; + token = strtok(stripped_line, ";"); + if (sscanf(token, "0=%lu", cache_size) <= 0) + return -1; + } + } + fclose(fp); + return 0; +} + #define CORE_SIBLINGS_PATH "/sys/bus/cpu/devices/cpu"

/* @@ -935,3 +973,33 @@ unsigned int count_bits(unsigned long n)

return count; } + +/** + * snc_kernel_support - Compare system reported cache size and resctrl + * reported cache size to get an idea if SNC is supported on the kernel side. + * + * Return: 0 if not supported, 1 if SNC is disabled or SNC is both enabled and + * supported, < 0 on failure. + */ +int snc_kernel_support(void) +{ + unsigned long resctrl_cache_size, node_cache_size; + int ret; + + /* If SNC is disabled then its kernel support isn't important. */ + if (snc_ways() == 1) + return 1; + + ret = get_sys_cache_size(0, "L3", &node_cache_size); + if (ret < 0) + return ret; + + ret = get_resctrl_cache_size("L3", &resctrl_cache_size); + if (ret < 0) + return ret; + + if (resctrl_cache_size == node_cache_size) + return 1; + + return 0; +}

-- 2.45.0

Reinette Chatre

30 May 30 May

11:07 p.m.

New subject: [PATCH v2 2/2] selftests/resctrl: Adjust SNC support messages

Hi Maciej,

On 5/15/24 4:18 AM, Maciej Wieczor-Retman wrote:

...

Resctrl selftest prints a message on test failure that Sub-Numa Clustering (SNC) could be enabled and points the user to check theirs BIOS settings. No actual check is performed before printing that message so it is not very accurate in pinpointing a problem.

Figuring out if SNC is enabled is only one part of the problem, the other being whether the kernel supports it. As there is no easy interface that simply states SNC support in the kernel one can find that information by comparing L3 cache sizes from different sources. Cache size reported by /sys/devices/system/node/node0/cpu0/cache/index3/size will always show the full cache size even if it's split by enabled SNC. On the other hand /sys/fs/resctrl/size has information about L3 size, that with kernel support is adjusted for enabled SNC.

Add a function to find a cache size from /sys/fs/resctrl/size since finding that information from the other source is already implemented.

Add a function that compares the two cache sizes and use it to make the SNC support message more meaningful.

Please note that the new version of SNC kernel support ([1]) that this series is based on no longer adjusts the cache size. Detecting kernel support for SNC (if the new solution is accepted) can be done with the test for the existence of the files Tony mentioned in [2].

...

Add the SNC support message just after MBA's check_results() since MBA shares code with MBM and also can suffer from enabled SNC if there is no support in the kernel.

Signed-off-by: Maciej Wieczor-Retman maciej.wieczor-retman@intel.com

Reinette

[1] https://lore.kernel.org/all/20240503203325.21512-1-tony.luck@intel.com/ [2] https://lore.kernel.org/lkml/SJ1PR11MB6083320F30DBCBB59574F0BDFCEC2@SJ1PR11M...

Maciej Wieczor-Retman

31 May 31 May

6:39 a.m.

New subject: [PATCH v2 2/2] selftests/resctrl: Adjust SNC support messages

Hello!

On 2024-05-30 at 16:07:34 -0700, Reinette Chatre wrote:

...

Hi Maciej,

On 5/15/24 4:18 AM, Maciej Wieczor-Retman wrote:

...
Resctrl selftest prints a message on test failure that Sub-Numa Clustering (SNC) could be enabled and points the user to check theirs BIOS settings. No actual check is performed before printing that message so it is not very accurate in pinpointing a problem.

Figuring out if SNC is enabled is only one part of the problem, the other being whether the kernel supports it. As there is no easy interface that simply states SNC support in the kernel one can find that information by comparing L3 cache sizes from different sources. Cache size reported by /sys/devices/system/node/node0/cpu0/cache/index3/size will always show the full cache size even if it's split by enabled SNC. On the other hand /sys/fs/resctrl/size has information about L3 size, that with kernel support is adjusted for enabled SNC.

Add a function to find a cache size from /sys/fs/resctrl/size since finding that information from the other source is already implemented.

Add a function that compares the two cache sizes and use it to make the SNC support message more meaningful.

Please note that the new version of SNC kernel support ([1]) that this series is based on no longer adjusts the cache size. Detecting kernel support for SNC (if the new solution is accepted) can be done with the test for the existence of the files Tony mentioned in [2].

Thank you for your comments on both patches, I don't know how I missed this fact. I'll revise my patches accordingly.

...

...
Add the SNC support message just after MBA's check_results() since MBA shares code with MBM and also can suffer from enabled SNC if there is no support in the kernel.

Signed-off-by: Maciej Wieczor-Retman maciej.wieczor-retman@intel.com

Reinette

[1] https://lore.kernel.org/all/20240503203325.21512-1-tony.luck@intel.com/ [2] https://lore.kernel.org/lkml/SJ1PR11MB6083320F30DBCBB59574F0BDFCEC2@SJ1PR11M...

-- Kind regards Maciej Wieczór-Retman

558

days inactive

602

days old

linux-kselftest-mirror@lists.linaro.org

18 comments

participants

tags (0)

participants (3)

Luck, Tony
Maciej Wieczor-Retman
Reinette Chatre