The resctrl selftest currently exhibits several failures on Hygon CPUs due to missing vendor detection and edge-case handling specific to Hygon's architecture.
This patch series addresses three distinct issues: 1. Missing CPU vendor detection, causing the test to fail with "# Can not get vendor info..." on Hygon CPUs. 2. A division-by-zero crash in SNC detection on Hygon CPUs. 3. Incorrect handling of non-contiguous CBM support on Hygon CPUs.
These changes enable resctrl selftest to run successfully on Hygon CPUs that support Platform QoS features.
Changelog: v3: - Patch 1: 1. Update the return types of detect_vendor() and get_vendor() from 'int' to 'unsigned int' to align with their usage as bitmask values and to prevent potentially risky type conversions (Fenghua). 2. Split the code changes of "define CPU vendor IDs as bits to match usage" from original patch 1 into a separate patch (this patch, suggested by Fenghua and Reinette). 3. Introduce the flag 'initialized' to simplify the get_vendor() -> detect_vendor() logic (Reinette). - Patch 2 (original patch 1): 1. Move the code changes of "define CPU vendor IDs as bits to match usage" into patch 1. - Patch 3 (original patch 2): 1. Fix a nit of code comment for affected platforms (Fenghua). 2. Add Reviewed-by: Fenghua Yu fenghuay@nvidia.com. - Patch 4 (original patch 3): 1. Fix a nit to avoid calling get_vendor() twice (Fenghua). 2. Add Reviewed-by: Fenghua Yu fenghuay@nvidia.com.
v2: - Patch 1: switch all of the vendor id bitmasks to use BIT() (Reinette) - Patch 2: add Reviewed-by: Reinette Chatre reinette.chatre@intel.com - Patch 3: add Reviewed-by: Reinette Chatre reinette.chatre@intel.com add a maintainer note to highlight it is not a candidate for backport (Reinette)
Xiaochen Shen (4): selftests/resctrl: Define CPU vendor IDs as bits to match usage selftests/resctrl: Add CPU vendor detection for Hygon selftests/resctrl: Fix a division by zero error on Hygon selftests/resctrl: Fix non-contiguous CBM check for Hygon
tools/testing/selftests/resctrl/cat_test.c | 6 ++-- tools/testing/selftests/resctrl/resctrl.h | 8 ++++-- .../testing/selftests/resctrl/resctrl_tests.c | 28 +++++++++++++------ tools/testing/selftests/resctrl/resctrlfs.c | 10 +++++++ 4 files changed, 39 insertions(+), 13 deletions(-)
The CPU vendor IDs are required to be unique bits because they're used for vendor_specific bitmask in the struct resctrl_test. Consider for example their usage in test_vendor_specific_check(): return get_vendor() & test->vendor_specific
However, the definitions of CPU vendor IDs in file resctrl.h is quite subtle as a bitmask value: #define ARCH_INTEL 1 #define ARCH_AMD 2
A clearer and more maintainable approach is to define these CPU vendor IDs using BIT(). This ensures each vendor corresponds to a distinct bit and makes it obvious when adding new vendor IDs.
Accordingly, update the return types of detect_vendor() and get_vendor() from 'int' to 'unsigned int' to align with their usage as bitmask values and to prevent potentially risky type conversions.
Furthermore, introduce a bool flag 'initialized' to simplify the get_vendor() -> detect_vendor() logic. This ensures the vendor ID is detected only once and resolves the ambiguity of using the same variable 'vendor' both as a value and as a state.
Suggested-by: Reinette Chatre reinette.chatre@intel.com Suggested-by: Fenghua Yu fenghuay@nvidia.com Signed-off-by: Xiaochen Shen shenxiaochen@open-hieco.net --- tools/testing/selftests/resctrl/resctrl.h | 7 ++--- .../testing/selftests/resctrl/resctrl_tests.c | 26 +++++++++++++------ 2 files changed, 22 insertions(+), 11 deletions(-)
diff --git a/tools/testing/selftests/resctrl/resctrl.h b/tools/testing/selftests/resctrl/resctrl.h index cd3adfc14969..d0f094360e6f 100644 --- a/tools/testing/selftests/resctrl/resctrl.h +++ b/tools/testing/selftests/resctrl/resctrl.h @@ -23,6 +23,7 @@ #include <asm/unistd.h> #include <linux/perf_event.h> #include <linux/compiler.h> +#include <linux/bits.h> #include "../kselftest.h"
#define MB (1024 * 1024) @@ -36,8 +37,8 @@ * Define as bits because they're used for vendor_specific bitmask in * the struct resctrl_test. */ -#define ARCH_INTEL 1 -#define ARCH_AMD 2 +#define ARCH_INTEL BIT(0) +#define ARCH_AMD BIT(1)
#define END_OF_TESTS 1
@@ -163,7 +164,7 @@ extern int snc_unreliable; extern char llc_occup_path[1024];
int snc_nodes_per_l3_cache(void); -int get_vendor(void); +unsigned int get_vendor(void); bool check_resctrlfs_support(void); int filter_dmesg(void); int get_domain_id(const char *resource, int cpu_no, int *domain_id); diff --git a/tools/testing/selftests/resctrl/resctrl_tests.c b/tools/testing/selftests/resctrl/resctrl_tests.c index 5154ffd821c4..08cbd094e936 100644 --- a/tools/testing/selftests/resctrl/resctrl_tests.c +++ b/tools/testing/selftests/resctrl/resctrl_tests.c @@ -23,16 +23,24 @@ static struct resctrl_test *resctrl_tests[] = { &l2_noncont_cat_test, };
-static int detect_vendor(void) +static unsigned int detect_vendor(void) { - FILE *inf = fopen("/proc/cpuinfo", "r"); - int vendor_id = 0; + static bool initialized; + static unsigned int vendor_id; + FILE *inf; char *s = NULL; char *res;
- if (!inf) + if (initialized) return vendor_id;
+ inf = fopen("/proc/cpuinfo", "r"); + if (!inf) { + vendor_id = 0; + initialized = true; + return vendor_id; + } + res = fgrep(inf, "vendor_id");
if (res) @@ -45,15 +53,17 @@ static int detect_vendor(void)
fclose(inf); free(res); + + initialized = true; return vendor_id; }
-int get_vendor(void) +unsigned int get_vendor(void) { - static int vendor = -1; + unsigned int vendor; + + vendor = detect_vendor();
- if (vendor == -1) - vendor = detect_vendor(); if (vendor == 0) ksft_print_msg("Can not get vendor info...\n");
The resctrl selftest currently fails on Hygon CPUs that support Platform QoS features, printing the error:
"# Can not get vendor info..."
This occurs because vendor detection is missing for Hygon CPUs.
Fix this by extending the CPU vendor detection logic to include Hygon's vendor ID.
Signed-off-by: Xiaochen Shen shenxiaochen@open-hieco.net --- tools/testing/selftests/resctrl/resctrl.h | 1 + tools/testing/selftests/resctrl/resctrl_tests.c | 2 ++ 2 files changed, 3 insertions(+)
diff --git a/tools/testing/selftests/resctrl/resctrl.h b/tools/testing/selftests/resctrl/resctrl.h index d0f094360e6f..2817c9e41797 100644 --- a/tools/testing/selftests/resctrl/resctrl.h +++ b/tools/testing/selftests/resctrl/resctrl.h @@ -39,6 +39,7 @@ */ #define ARCH_INTEL BIT(0) #define ARCH_AMD BIT(1) +#define ARCH_HYGON BIT(2)
#define END_OF_TESTS 1
diff --git a/tools/testing/selftests/resctrl/resctrl_tests.c b/tools/testing/selftests/resctrl/resctrl_tests.c index 08cbd094e936..92cc6aaef338 100644 --- a/tools/testing/selftests/resctrl/resctrl_tests.c +++ b/tools/testing/selftests/resctrl/resctrl_tests.c @@ -50,6 +50,8 @@ static unsigned int detect_vendor(void) vendor_id = ARCH_INTEL; else if (s && !strcmp(s, ": AuthenticAMD\n")) vendor_id = ARCH_AMD; + else if (s && !strcmp(s, ": HygonGenuine\n")) + vendor_id = ARCH_HYGON;
fclose(inf); free(res);
Commit
a1cd99e700ec ("selftests/resctrl: Adjust effective L3 cache size with SNC enabled")
introduced the snc_nodes_per_l3_cache() function to detect the Intel Sub-NUMA Clustering (SNC) feature by comparing #CPUs in node0 with #CPUs sharing LLC with CPU0. The function was designed to return: (1) >1: SNC mode is enabled. (2) 1: SNC mode is not enabled or not supported.
However, on certain Hygon CPUs, #CPUs sharing LLC with CPU0 is actually less than #CPUs in node0. This results in snc_nodes_per_l3_cache() returning 0 (calculated as cache_cpus / node_cpus).
This leads to a division by zero error in get_cache_size(): *cache_size /= snc_nodes_per_l3_cache();
Causing the resctrl selftest to fail with: "Floating point exception (core dumped)"
Fix the issue by ensuring snc_nodes_per_l3_cache() returns 1 when SNC mode is not supported on the platform.
Fixes: a1cd99e700ec ("selftests/resctrl: Adjust effective L3 cache size with SNC enabled") Signed-off-by: Xiaochen Shen shenxiaochen@open-hieco.net Reviewed-by: Reinette Chatre reinette.chatre@intel.com Reviewed-by: Fenghua Yu fenghuay@nvidia.com --- tools/testing/selftests/resctrl/resctrlfs.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
diff --git a/tools/testing/selftests/resctrl/resctrlfs.c b/tools/testing/selftests/resctrl/resctrlfs.c index 195f04c4d158..b9c1bfb6cc02 100644 --- a/tools/testing/selftests/resctrl/resctrlfs.c +++ b/tools/testing/selftests/resctrl/resctrlfs.c @@ -243,6 +243,16 @@ int snc_nodes_per_l3_cache(void) } snc_mode = cache_cpus / node_cpus;
+ /* + * On some platforms (e.g. Hygon), + * cache_cpus < node_cpus, the calculated snc_mode is 0. + * + * Set snc_mode = 1 to indicate that SNC mode is not + * supported on the platform. + */ + if (!snc_mode) + snc_mode = 1; + if (snc_mode > 1) ksft_print_msg("SNC-%d mode discovered.\n", snc_mode); }
The resctrl selftest currently fails on Hygon CPUs that always supports non-contiguous CBM, printing the error:
"# Hardware and kernel differ on non-contiguous CBM support!"
This occurs because the arch_supports_noncont_cat() function lacks vendor detection for Hygon CPUs, preventing proper identification of their non-contiguous CBM capability.
Fix this by adding Hygon vendor ID detection to arch_supports_noncont_cat().
Signed-off-by: Xiaochen Shen shenxiaochen@open-hieco.net Reviewed-by: Reinette Chatre reinette.chatre@intel.com Reviewed-by: Fenghua Yu fenghuay@nvidia.com --- Maintainer note: Even though this is a fix it is not a candidate for backport since it is based on another patch series (x86/resctrl: Fix Platform QoS issues for Hygon) which is in process of being added to resctrl.
tools/testing/selftests/resctrl/cat_test.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/resctrl/cat_test.c b/tools/testing/selftests/resctrl/cat_test.c index 94cfdba5308d..f00b622c1460 100644 --- a/tools/testing/selftests/resctrl/cat_test.c +++ b/tools/testing/selftests/resctrl/cat_test.c @@ -290,8 +290,10 @@ static int cat_run_test(const struct resctrl_test *test, const struct user_param
static bool arch_supports_noncont_cat(const struct resctrl_test *test) { - /* AMD always supports non-contiguous CBM. */ - if (get_vendor() == ARCH_AMD) + unsigned int vendor_id = get_vendor(); + + /* AMD and Hygon always support non-contiguous CBM. */ + if (vendor_id == ARCH_AMD || vendor_id == ARCH_HYGON) return true;
#if defined(__i386__) || defined(__x86_64__) /* arch */
Hi Xiaochen,
On 12/10/25 10:46 PM, Xiaochen Shen wrote:
The CPU vendor IDs are required to be unique bits because they're used for vendor_specific bitmask in the struct resctrl_test. Consider for example their usage in test_vendor_specific_check(): return get_vendor() & test->vendor_specific
However, the definitions of CPU vendor IDs in file resctrl.h is quite subtle as a bitmask value: #define ARCH_INTEL 1 #define ARCH_AMD 2
A clearer and more maintainable approach is to define these CPU vendor IDs using BIT(). This ensures each vendor corresponds to a distinct bit and makes it obvious when adding new vendor IDs.
Accordingly, update the return types of detect_vendor() and get_vendor() from 'int' to 'unsigned int' to align with their usage as bitmask values and to prevent potentially risky type conversions.
Furthermore, introduce a bool flag 'initialized' to simplify the get_vendor() -> detect_vendor() logic. This ensures the vendor ID is detected only once and resolves the ambiguity of using the same variable 'vendor' both as a value and as a state.
Suggested-by: Reinette Chatre reinette.chatre@intel.com Suggested-by: Fenghua Yu fenghuay@nvidia.com Signed-off-by: Xiaochen Shen shenxiaochen@open-hieco.net
tools/testing/selftests/resctrl/resctrl.h | 7 ++--- .../testing/selftests/resctrl/resctrl_tests.c | 26 +++++++++++++------ 2 files changed, 22 insertions(+), 11 deletions(-)
diff --git a/tools/testing/selftests/resctrl/resctrl.h b/tools/testing/selftests/resctrl/resctrl.h index cd3adfc14969..d0f094360e6f 100644 --- a/tools/testing/selftests/resctrl/resctrl.h +++ b/tools/testing/selftests/resctrl/resctrl.h @@ -23,6 +23,7 @@ #include <asm/unistd.h> #include <linux/perf_event.h> #include <linux/compiler.h> +#include <linux/bits.h> #include "../kselftest.h" #define MB (1024 * 1024)
I tried this series against latest upstream kernel and found a conflict with some recent kselftest refactoring via commit e6fbd1759c9e ("selftests: complete kselftest include centralization").
Usually the strategy for resctrl tests is to base them on "next" branch of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest.git ... but I notice that the conflicting change was routed differently and thus difficult to have anticipated.
Since we are in merge window the maintainer repos are not ready for new features yet. Until the repo is ready, could you please base on latest upstream?
Looking at the series it is not obvious how you want these patches handled though. Patch #3 is the only one with a "Fixes:" tag (and thus candidate for automatic backport) but it is in the middle of the series. It is usually best to have fixes at beginning of series to simplify their handling. Even so, all patches are fixes but only patch #4 has a note not to consider for backport. Could you please consider how you want these patches handled, communicate that clearly in cover letter, and re-organize the series to have the ones needing backport to be at beginning of series?
@@ -36,8 +37,8 @@
- Define as bits because they're used for vendor_specific bitmask in
- the struct resctrl_test.
*/ -#define ARCH_INTEL 1 -#define ARCH_AMD 2 +#define ARCH_INTEL BIT(0) +#define ARCH_AMD BIT(1) #define END_OF_TESTS 1 @@ -163,7 +164,7 @@ extern int snc_unreliable; extern char llc_occup_path[1024]; int snc_nodes_per_l3_cache(void); -int get_vendor(void); +unsigned int get_vendor(void); bool check_resctrlfs_support(void); int filter_dmesg(void); int get_domain_id(const char *resource, int cpu_no, int *domain_id); diff --git a/tools/testing/selftests/resctrl/resctrl_tests.c b/tools/testing/selftests/resctrl/resctrl_tests.c index 5154ffd821c4..08cbd094e936 100644 --- a/tools/testing/selftests/resctrl/resctrl_tests.c +++ b/tools/testing/selftests/resctrl/resctrl_tests.c @@ -23,16 +23,24 @@ static struct resctrl_test *resctrl_tests[] = { &l2_noncont_cat_test, }; -static int detect_vendor(void) +static unsigned int detect_vendor(void) {
- FILE *inf = fopen("/proc/cpuinfo", "r");
- int vendor_id = 0;
- static bool initialized;
- static unsigned int vendor_id;
- FILE *inf;
Please maintain the reverse fir ordering.
char *s = NULL; char *res;
- if (!inf)
- if (initialized) return vendor_id;
- inf = fopen("/proc/cpuinfo", "r");
- if (!inf) {
vendor_id = 0;initialized = true;return vendor_id;- }
- res = fgrep(inf, "vendor_id");
if (res) @@ -45,15 +53,17 @@ static int detect_vendor(void) fclose(inf); free(res);
- initialized = true; return vendor_id;
} -int get_vendor(void) +unsigned int get_vendor(void) {
- static int vendor = -1;
- unsigned int vendor;
- vendor = detect_vendor();
- if (vendor == -1)
if (vendor == 0) ksft_print_msg("Can not get vendor info...\n");vendor = detect_vendor();
Patch looks good to me.
Thank you
Reinette
Hi Xiaochen,
On 12/10/25 10:46 PM, Xiaochen Shen wrote:
The resctrl selftest currently fails on Hygon CPUs that support Platform QoS features, printing the error:
"# Can not get vendor info..."
This occurs because vendor detection is missing for Hygon CPUs.
Fix this by extending the CPU vendor detection logic to include Hygon's vendor ID.
Signed-off-by: Xiaochen Shen shenxiaochen@open-hieco.net
Changelog is clear this is a fix, does patch need backport?
Reviewed-by: Reinette Chatre reinette.chatre@intel.com
Reinette
linux-kselftest-mirror@lists.linaro.org