The resctrl selftest currently exhibits several failures on Hygon CPUs due to missing vendor detection and edge-case handling specific to Hygon's architecture.
This patch series addresses three distinct issues:
1. Missing CPU vendor detection, causing the test to fail with "# Can not get vendor info..." on Hygon CPUs.
2. A division-by-zero crash in SNC detection on Hygon CPUs.
3. Incorrect handling of non-contiguous CBM support on Hygon CPUs.
These changes enable resctrl selftest to run successfully on Hygon CPUs that support Platform QoS features.
Changelog:
v2:
- Patch 1: switch all of the vendor id bitmasks to use BIT() (Reinette)
- Patch 2: add Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
- Patch 3: add Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
           add a maintainer note to highlight it is not a candidate for backport (Reinette)
Xiaochen Shen (3):
  selftests/resctrl: Add CPU vendor detection for Hygon
  selftests/resctrl: Fix a division by zero error on Hygon
  selftests/resctrl: Fix non-contiguous CBM check for Hygon

 tools/testing/selftests/resctrl/cat_test.c      |  4 ++--
 tools/testing/selftests/resctrl/resctrl.h       |  6 ++++--
 tools/testing/selftests/resctrl/resctrl_tests.c |  2 ++
 tools/testing/selftests/resctrl/resctrlfs.c     | 10 ++++++++++
 4 files changed, 18 insertions(+), 4 deletions(-)
The resctrl selftest currently fails on Hygon CPUs that support Platform QoS features, printing the error:
"# Can not get vendor info..."
This occurs because vendor detection is missing for Hygon CPUs.
Fix this by extending the CPU vendor detection logic to include Hygon's vendor ID.
Signed-off-by: Xiaochen Shen <shenxiaochen@open-hieco.net>
---
 tools/testing/selftests/resctrl/resctrl.h       | 6 ++++--
 tools/testing/selftests/resctrl/resctrl_tests.c | 2 ++
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/resctrl/resctrl.h b/tools/testing/selftests/resctrl/resctrl.h
index cd3adfc14969..411ee10380a5 100644
--- a/tools/testing/selftests/resctrl/resctrl.h
+++ b/tools/testing/selftests/resctrl/resctrl.h
@@ -23,6 +23,7 @@
 #include <asm/unistd.h>
 #include <linux/perf_event.h>
 #include <linux/compiler.h>
+#include <linux/bits.h>
 #include "../kselftest.h"

 #define MB (1024 * 1024)
@@ -36,8 +37,9 @@
  * Define as bits because they're used for vendor_specific bitmask in
  * the struct resctrl_test.
  */
-#define ARCH_INTEL 1
-#define ARCH_AMD 2
+#define ARCH_INTEL BIT(0)
+#define ARCH_AMD BIT(1)
+#define ARCH_HYGON BIT(2)

 #define END_OF_TESTS 1

diff --git a/tools/testing/selftests/resctrl/resctrl_tests.c b/tools/testing/selftests/resctrl/resctrl_tests.c
index 5154ffd821c4..9bf35f3beb6b 100644
--- a/tools/testing/selftests/resctrl/resctrl_tests.c
+++ b/tools/testing/selftests/resctrl/resctrl_tests.c
@@ -42,6 +42,8 @@ static int detect_vendor(void)
 		vendor_id = ARCH_INTEL;
 	else if (s && !strcmp(s, ": AuthenticAMD\n"))
 		vendor_id = ARCH_AMD;
+	else if (s && !strcmp(s, ": HygonGenuine\n"))
+		vendor_id = ARCH_HYGON;

 	fclose(inf);
 	free(res);
Hi, Xiaochen,
On 12/5/25 01:25, Xiaochen Shen wrote:
The resctrl selftest currently fails on Hygon CPUs that support Platform QoS features, printing the error:
"# Can not get vendor info..."
This occurs because vendor detection is missing for Hygon CPUs.
Fix this by extending the CPU vendor detection logic to include Hygon's vendor ID.
Signed-off-by: Xiaochen Shen <shenxiaochen@open-hieco.net>
---
 tools/testing/selftests/resctrl/resctrl.h       | 6 ++++--
 tools/testing/selftests/resctrl/resctrl_tests.c | 2 ++
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/resctrl/resctrl.h b/tools/testing/selftests/resctrl/resctrl.h
index cd3adfc14969..411ee10380a5 100644
--- a/tools/testing/selftests/resctrl/resctrl.h
+++ b/tools/testing/selftests/resctrl/resctrl.h
@@ -23,6 +23,7 @@
 #include <asm/unistd.h>
 #include <linux/perf_event.h>
 #include <linux/compiler.h>
+#include <linux/bits.h>
 #include "../kselftest.h"

 #define MB (1024 * 1024)
@@ -36,8 +37,9 @@
  * Define as bits because they're used for vendor_specific bitmask in
  * the struct resctrl_test.
  */
-#define ARCH_INTEL 1
-#define ARCH_AMD 2
+#define ARCH_INTEL BIT(0)
+#define ARCH_AMD BIT(1)
+#define ARCH_HYGON BIT(2)

 #define END_OF_TESTS 1

diff --git a/tools/testing/selftests/resctrl/resctrl_tests.c b/tools/testing/selftests/resctrl/resctrl_tests.c
index 5154ffd821c4..9bf35f3beb6b 100644
--- a/tools/testing/selftests/resctrl/resctrl_tests.c
+++ b/tools/testing/selftests/resctrl/resctrl_tests.c
@@ -42,6 +42,8 @@ static int detect_vendor(void)
 		vendor_id = ARCH_INTEL;
 	else if (s && !strcmp(s, ": AuthenticAMD\n"))
 		vendor_id = ARCH_AMD;
+	else if (s && !strcmp(s, ": HygonGenuine\n"))
+		vendor_id = ARCH_HYGON;
Since vendor_id is now a bitmask and BIT() yields an unsigned long (UL) value, it would be better to define it as "unsigned int" (unsigned long is a bit overkill). Otherwise, the type conversion may be risky.

Would it be better to change vendor_id to "unsigned int", make the function "static unsigned int detect_vendor()", and update a couple of other places?
 	fclose(inf);
 	free(res);

Thanks.

-Fenghua
Commit
a1cd99e700ec ("selftests/resctrl: Adjust effective L3 cache size with SNC enabled")
introduced the snc_nodes_per_l3_cache() function to detect the Intel Sub-NUMA Clustering (SNC) feature by comparing #CPUs in node0 with #CPUs sharing LLC with CPU0. The function was designed to return:
  (1) >1: SNC mode is enabled.
  (2)  1: SNC mode is not enabled or not supported.
However, on certain Hygon CPUs, #CPUs sharing LLC with CPU0 is actually less than #CPUs in node0. This results in snc_nodes_per_l3_cache() returning 0 (calculated as cache_cpus / node_cpus).
This leads to a division by zero error in get_cache_size():
  *cache_size /= snc_nodes_per_l3_cache();

Causing the resctrl selftest to fail with:
  "Floating point exception (core dumped)"
Fix the issue by ensuring snc_nodes_per_l3_cache() returns 1 when SNC mode is not supported on the platform.
Fixes: a1cd99e700ec ("selftests/resctrl: Adjust effective L3 cache size with SNC enabled")
Signed-off-by: Xiaochen Shen <shenxiaochen@open-hieco.net>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
---
 tools/testing/selftests/resctrl/resctrlfs.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/tools/testing/selftests/resctrl/resctrlfs.c b/tools/testing/selftests/resctrl/resctrlfs.c
index 195f04c4d158..2b075e7334bf 100644
--- a/tools/testing/selftests/resctrl/resctrlfs.c
+++ b/tools/testing/selftests/resctrl/resctrlfs.c
@@ -243,6 +243,16 @@ int snc_nodes_per_l3_cache(void)
 		}
 		snc_mode = cache_cpus / node_cpus;

+		/*
+		 * On certain Hygon platforms:
+		 * cache_cpus < node_cpus, the calculated snc_mode is 0.
+		 *
+		 * Set snc_mode = 1 to indicate that SNC mode is not
+		 * supported on the platform.
+		 */
+		if (!snc_mode)
+			snc_mode = 1;
+
 		if (snc_mode > 1)
 			ksft_print_msg("SNC-%d mode discovered.\n", snc_mode);
 	}
Hi, Xiaochen,
On 12/5/25 01:25, Xiaochen Shen wrote:
Commit
a1cd99e700ec ("selftests/resctrl: Adjust effective L3 cache size with SNC enabled")
introduced the snc_nodes_per_l3_cache() function to detect the Intel Sub-NUMA Clustering (SNC) feature by comparing #CPUs in node0 with #CPUs sharing LLC with CPU0. The function was designed to return:
  (1) >1: SNC mode is enabled.
  (2)  1: SNC mode is not enabled or not supported.
However, on certain Hygon CPUs, #CPUs sharing LLC with CPU0 is actually less than #CPUs in node0. This results in snc_nodes_per_l3_cache() returning 0 (calculated as cache_cpus / node_cpus).
This leads to a division by zero error in get_cache_size():
  *cache_size /= snc_nodes_per_l3_cache();

Causing the resctrl selftest to fail with:
  "Floating point exception (core dumped)"
Fix the issue by ensuring snc_nodes_per_l3_cache() returns 1 when SNC mode is not supported on the platform.
Fixes: a1cd99e700ec ("selftests/resctrl: Adjust effective L3 cache size with SNC enabled")
Signed-off-by: Xiaochen Shen <shenxiaochen@open-hieco.net>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
---
 tools/testing/selftests/resctrl/resctrlfs.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/tools/testing/selftests/resctrl/resctrlfs.c b/tools/testing/selftests/resctrl/resctrlfs.c
index 195f04c4d158..2b075e7334bf 100644
--- a/tools/testing/selftests/resctrl/resctrlfs.c
+++ b/tools/testing/selftests/resctrl/resctrlfs.c
@@ -243,6 +243,16 @@ int snc_nodes_per_l3_cache(void)
 		}
 		snc_mode = cache_cpus / node_cpus;

+		/*
+		 * On certain Hygon platforms:

nit. This situation could happen on other platforms than Hygon. Maybe it's better to have a more generic comment here?

		 * On some platforms (e.g. Hygon),

Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>

+		 * cache_cpus < node_cpus, the calculated snc_mode is 0.
+		 *
+		 * Set snc_mode = 1 to indicate that SNC mode is not
+		 * supported on the platform.
+		 */
+		if (!snc_mode)
+			snc_mode = 1;
+
 		if (snc_mode > 1)
 			ksft_print_msg("SNC-%d mode discovered.\n", snc_mode);
 	}
Thanks. -Fenghua
The resctrl selftest currently fails on Hygon CPUs, which always support non-contiguous CBM, printing the error:
"# Hardware and kernel differ on non-contiguous CBM support!"
This occurs because the arch_supports_noncont_cat() function lacks vendor detection for Hygon CPUs, preventing proper identification of their non-contiguous CBM capability.
Fix this by adding Hygon vendor ID detection to arch_supports_noncont_cat().
Signed-off-by: Xiaochen Shen <shenxiaochen@open-hieco.net>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
---
Maintainer note:
Even though this is a fix it is not a candidate for backport since it is based on another patch series (x86/resctrl: Fix Platform QoS issues for Hygon) which is in process of being added to resctrl.

 tools/testing/selftests/resctrl/cat_test.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/resctrl/cat_test.c b/tools/testing/selftests/resctrl/cat_test.c
index 94cfdba5308d..59a0f80fdc5a 100644
--- a/tools/testing/selftests/resctrl/cat_test.c
+++ b/tools/testing/selftests/resctrl/cat_test.c
@@ -290,8 +290,8 @@ static int cat_run_test(const struct resctrl_test *test, const struct user_param

 static bool arch_supports_noncont_cat(const struct resctrl_test *test)
 {
-	/* AMD always supports non-contiguous CBM. */
-	if (get_vendor() == ARCH_AMD)
+	/* AMD and Hygon always supports non-contiguous CBM. */
+	if (get_vendor() == ARCH_AMD || get_vendor() == ARCH_HYGON)
 		return true;

 #if defined(__i386__) || defined(__x86_64__) /* arch */
On 12/5/25 01:25, Xiaochen Shen wrote:
The resctrl selftest currently fails on Hygon CPUs, which always support non-contiguous CBM, printing the error:
"# Hardware and kernel differ on non-contiguous CBM support!"
This occurs because the arch_supports_noncont_cat() function lacks vendor detection for Hygon CPUs, preventing proper identification of their non-contiguous CBM capability.
Fix this by adding Hygon vendor ID detection to arch_supports_noncont_cat().
Signed-off-by: Xiaochen Shen <shenxiaochen@open-hieco.net>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
---
Maintainer note:
Even though this is a fix it is not a candidate for backport since it is based on another patch series (x86/resctrl: Fix Platform QoS issues for Hygon) which is in process of being added to resctrl.

 tools/testing/selftests/resctrl/cat_test.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/resctrl/cat_test.c b/tools/testing/selftests/resctrl/cat_test.c
index 94cfdba5308d..59a0f80fdc5a 100644
--- a/tools/testing/selftests/resctrl/cat_test.c
+++ b/tools/testing/selftests/resctrl/cat_test.c
@@ -290,8 +290,8 @@ static int cat_run_test(const struct resctrl_test *test, const struct user_param
 static bool arch_supports_noncont_cat(const struct resctrl_test *test)
 {
-	/* AMD always supports non-contiguous CBM. */
-	if (get_vendor() == ARCH_AMD)
+	/* AMD and Hygon always supports non-contiguous CBM. */
+	if (get_vendor() == ARCH_AMD || get_vendor() == ARCH_HYGON)

nit. Better to avoid calling get_vendor() twice (or even more in the future)?

	unsigned int vendor_id = get_vendor();

	if (vendor_id == ARCH_AMD || vendor_id == ARCH_HYGON)

 		return true;

 #if defined(__i386__) || defined(__x86_64__) /* arch */
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Thanks.
-Fenghua
Hi Fenghua,
On 12/5/25 11:39 AM, Fenghua Yu wrote:
On 12/5/25 01:25, Xiaochen Shen wrote:
diff --git a/tools/testing/selftests/resctrl/cat_test.c b/tools/testing/selftests/resctrl/cat_test.c
index 94cfdba5308d..59a0f80fdc5a 100644
--- a/tools/testing/selftests/resctrl/cat_test.c
+++ b/tools/testing/selftests/resctrl/cat_test.c
@@ -290,8 +290,8 @@ static int cat_run_test(const struct resctrl_test *test, const struct user_param
 static bool arch_supports_noncont_cat(const struct resctrl_test *test)
 {
-	/* AMD always supports non-contiguous CBM. */
-	if (get_vendor() == ARCH_AMD)
+	/* AMD and Hygon always supports non-contiguous CBM. */
+	if (get_vendor() == ARCH_AMD || get_vendor() == ARCH_HYGON)

nit. Better to avoid calling get_vendor() twice (or even more in the future)?
Are you perhaps referring to detect_vendor()? detect_vendor() does the actual digging to determine the vendor ID and is indeed called just once by get_vendor(). In subsequent calls get_vendor() just returns the static ID.
Reinette
Hi, Reinette,
On 12/5/25 13:30, Reinette Chatre wrote:
Hi Fenghua,
On 12/5/25 11:39 AM, Fenghua Yu wrote:
On 12/5/25 01:25, Xiaochen Shen wrote:
diff --git a/tools/testing/selftests/resctrl/cat_test.c b/tools/testing/selftests/resctrl/cat_test.c
index 94cfdba5308d..59a0f80fdc5a 100644
--- a/tools/testing/selftests/resctrl/cat_test.c
+++ b/tools/testing/selftests/resctrl/cat_test.c
@@ -290,8 +290,8 @@ static int cat_run_test(const struct resctrl_test *test, const struct user_param
 static bool arch_supports_noncont_cat(const struct resctrl_test *test)
 {
-	/* AMD always supports non-contiguous CBM. */
-	if (get_vendor() == ARCH_AMD)
+	/* AMD and Hygon always supports non-contiguous CBM. */
+	if (get_vendor() == ARCH_AMD || get_vendor() == ARCH_HYGON)

nit. Better to avoid calling get_vendor() twice (or even more in the future)?
Are you perhaps referring to detect_vendor()? detect_vendor() does the actual digging to determine the vendor ID and is indeed called just once by get_vendor(). In subsequent calls get_vendor() just returns the static ID.
There is still a cost to calling get_vendor() (call, push, cmp, pop, ret, etc.) in subsequent calls. I just feel it's redundant to call it multiple times in a single statement.
Thanks.
-Fenghua