Hi all:
The core frequency of a part is subject to process variation in semiconductor manufacturing: not all cores can reach the maximum frequency within the platform's infrastructure limits. Consequently, AMD has redefined the concept of the maximum frequency of a part, meaning that only a fraction of the cores can reach the maximum frequency. To find the best process scheduling policy for a given scenario, the OS needs to know the core ordering, which the platform reports through the highest performance capability register of the CPPC interface.
Earlier implementations of amd-pstate preferred core supported only a static core ranking and targeted performance. The driver can now change the preferred core dynamically based on workload and platform conditions, accounting for thermals and aging.
The amd-pstate driver utilizes the functions and data structures provided by the ITMT architecture to enable the scheduler to favor scheduling on cores that can reach a higher frequency at lower voltage. We call this amd-pstate preferred core.
Here, sched_set_itmt_core_prio() is called to set priorities and sched_set_itmt_support() is called to enable the ITMT feature. The amd-pstate driver uses the highest performance value to indicate the priority of a CPU: a higher value means a higher priority.
The amd-pstate driver provides an initial core ordering at boot time. It relies on the CPPC interface to communicate the core ranking to the operating system and scheduler, so that the OS schedules processes on the highest-performance cores first. When the amd-pstate driver receives a message that the highest performance has changed, it updates the core ranking.
Changes from V4->V5:
- cpufreq: amd-pstate:
  - modify sysfs attribute for CPPC highest perf.
  - modify warning about comments
  - rebase linux-next
- cpufreq:
  - Modify warning about function declarations.
- Documentation: amd-pstate:
  - align with ``amd-pstate``
Changes from V3->V4:
- Documentation: amd-pstate:
  - Modify inappropriate descriptions.
Changes from V2->V3:
- x86:
  - Modify kconfig and description.
- cpufreq: amd-pstate:
  - Add Co-developed-by tag in commit message.
- cpufreq:
  - Modify commit message.
- Documentation: amd-pstate:
  - Modify inappropriate descriptions.
Changes from V1->V2:
- acpi: cppc:
  - Add reference link.
- cpufreq:
  - Modify link error.
- cpufreq: amd-pstate:
  - Init the priorities of all online CPUs
  - Use a single variable to represent the status of preferred core.
- Documentation:
  - Default enabled preferred core.
- Documentation: amd-pstate:
  - Modify inappropriate descriptions.
  - Default enabled preferred core.
  - Use a single variable to represent the status of preferred core.
Meng Li (7):
  x86: Drop CPU_SUP_INTEL from SCHED_MC_PRIO for the expansion.
  acpi: cppc: Add get the highest performance cppc control
  cpufreq: amd-pstate: Enable amd-pstate preferred core supporting.
  cpufreq: Add a notification message that the highest perf has changed
  cpufreq: amd-pstate: Update amd-pstate preferred core ranking dynamically
  Documentation: amd-pstate: introduce amd-pstate preferred core
  Documentation: introduce amd-pstate preferred core mode kernel command line options
 .../admin-guide/kernel-parameters.txt       |   5 +
 Documentation/admin-guide/pm/amd-pstate.rst |  68 ++++++-
 arch/x86/Kconfig                            |   5 +-
 drivers/acpi/cppc_acpi.c                    |  13 ++
 drivers/acpi/processor_driver.c             |   6 +
 drivers/cpufreq/amd-pstate.c                | 167 ++++++++++++++++--
 drivers/cpufreq/cpufreq.c                   |  13 ++
 include/acpi/cppc_acpi.h                    |   5 +
 include/linux/amd-pstate.h                  |  11 ++
 include/linux/cpufreq.h                     |   5 +
 10 files changed, 277 insertions(+), 21 deletions(-)
amd-pstate driver also uses SCHED_MC_PRIO, so decouple the requirement of CPU_SUP_INTEL from the dependencies to allow compilation in kernels without Intel CPU support.
Signed-off-by: Meng Li li.meng@amd.com
---
 arch/x86/Kconfig | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 8d9e4b362572..887421b5ee8f 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1052,8 +1052,9 @@ config SCHED_MC

 config SCHED_MC_PRIO
        bool "CPU core priorities scheduler support"
-       depends on SCHED_MC && CPU_SUP_INTEL
-       select X86_INTEL_PSTATE
+       depends on SCHED_MC
+       select X86_INTEL_PSTATE if CPU_SUP_INTEL
+       select X86_AMD_PSTATE if CPU_SUP_AMD
        select CPU_FREQ
        default y
        help
On 9/4/2023 20:51, Meng Li wrote:
amd-pstate driver also uses SCHED_MC_PRIO, so decouple the requirement of CPU_SUP_INTEL from the dependencies to allow compilation in kernels without Intel CPU support.
Signed-off-by: Meng Li li.meng@amd.com
Reviewed-by: Mario Limonciello mario.limonciello@amd.com
 arch/x86/Kconfig | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 8d9e4b362572..887421b5ee8f 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1052,8 +1052,9 @@ config SCHED_MC

 config SCHED_MC_PRIO
        bool "CPU core priorities scheduler support"
-       depends on SCHED_MC && CPU_SUP_INTEL
-       select X86_INTEL_PSTATE
+       depends on SCHED_MC
+       select X86_INTEL_PSTATE if CPU_SUP_INTEL
+       select X86_AMD_PSTATE if CPU_SUP_AMD
        select CPU_FREQ
        default y
        help
On Tue, Sep 05, 2023 at 09:51:10AM +0800, Meng, Li (Jassmine) wrote:
amd-pstate driver also uses SCHED_MC_PRIO, so decouple the requirement of CPU_SUP_INTEL from the dependencies to allow compilation in kernels without Intel CPU support.
Signed-off-by: Meng Li li.meng@amd.com
Acked-by: Huang Rui ray.huang@amd.com
 arch/x86/Kconfig | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 8d9e4b362572..887421b5ee8f 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1052,8 +1052,9 @@ config SCHED_MC

 config SCHED_MC_PRIO
        bool "CPU core priorities scheduler support"
-       depends on SCHED_MC && CPU_SUP_INTEL
-       select X86_INTEL_PSTATE
+       depends on SCHED_MC
+       select X86_INTEL_PSTATE if CPU_SUP_INTEL
+       select X86_AMD_PSTATE if CPU_SUP_AMD
        select CPU_FREQ
        default y
        help
-- 2.34.1
Add support for getting the highest performance to the generic CPPC driver. This enables downstream drivers such as amd-pstate to discover and use these values.
Please refer to the ACPI_Spec for details on continuous performance control of CPPC.
Signed-off-by: Meng Li li.meng@amd.com
Reviewed-by: Mario Limonciello mario.limonciello@amd.com
Reviewed-by: Wyes Karny wyes.karny@amd.com
Link: https://uefi.org/specs/ACPI/6.5/08_Processor_Configuration_and_Control.html?...
---
 drivers/acpi/cppc_acpi.c | 13 +++++++++++++
 include/acpi/cppc_acpi.h |  5 +++++
 2 files changed, 18 insertions(+)
diff --git a/drivers/acpi/cppc_acpi.c b/drivers/acpi/cppc_acpi.c
index 7ff269a78c20..ad388a0e8484 100644
--- a/drivers/acpi/cppc_acpi.c
+++ b/drivers/acpi/cppc_acpi.c
@@ -1154,6 +1154,19 @@ int cppc_get_nominal_perf(int cpunum, u64 *nominal_perf)
        return cppc_get_perf(cpunum, NOMINAL_PERF, nominal_perf);
 }

+/**
+ * cppc_get_highest_perf - Get the highest performance register value.
+ * @cpunum: CPU from which to get highest performance.
+ * @highest_perf: Return address.
+ *
+ * Return: 0 for success, -EIO otherwise.
+ */
+int cppc_get_highest_perf(int cpunum, u64 *highest_perf)
+{
+       return cppc_get_perf(cpunum, HIGHEST_PERF, highest_perf);
+}
+EXPORT_SYMBOL_GPL(cppc_get_highest_perf);
+
 /**
  * cppc_get_epp_perf - Get the epp register value.
  * @cpunum: CPU from which to get epp preference value.
diff --git a/include/acpi/cppc_acpi.h b/include/acpi/cppc_acpi.h
index 6126c977ece0..c0b69ffe7bdb 100644
--- a/include/acpi/cppc_acpi.h
+++ b/include/acpi/cppc_acpi.h
@@ -139,6 +139,7 @@ struct cppc_cpudata {
 #ifdef CONFIG_ACPI_CPPC_LIB
 extern int cppc_get_desired_perf(int cpunum, u64 *desired_perf);
 extern int cppc_get_nominal_perf(int cpunum, u64 *nominal_perf);
+extern int cppc_get_highest_perf(int cpunum, u64 *highest_perf);
 extern int cppc_get_perf_ctrs(int cpu, struct cppc_perf_fb_ctrs *perf_fb_ctrs);
 extern int cppc_set_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls);
 extern int cppc_set_enable(int cpu, bool enable);
@@ -165,6 +166,10 @@ static inline int cppc_get_nominal_perf(int cpunum, u64 *nominal_perf)
 {
        return -ENOTSUPP;
 }
+static inline int cppc_get_highest_perf(int cpunum, u64 *highest_perf)
+{
+       return -ENOTSUPP;
+}
 static inline int cppc_get_perf_ctrs(int cpu, struct cppc_perf_fb_ctrs *perf_fb_ctrs)
 {
        return -ENOTSUPP;
On Tue, Sep 05, 2023 at 09:51:11AM +0800, Meng, Li (Jassmine) wrote:
Add support for getting the highest performance to the generic CPPC driver. This enables downstream drivers such as amd-pstate to discover and use these values.
Please refer to the ACPI_Spec for details on continuous performance control of CPPC.
Signed-off-by: Meng Li li.meng@amd.com
Reviewed-by: Mario Limonciello mario.limonciello@amd.com
Reviewed-by: Wyes Karny wyes.karny@amd.com
Link: https://uefi.org/specs/ACPI/6.5/08_Processor_Configuration_and_Control.html?...
Acked-by: Huang Rui ray.huang@amd.com
 drivers/acpi/cppc_acpi.c | 13 +++++++++++++
 include/acpi/cppc_acpi.h |  5 +++++
 2 files changed, 18 insertions(+)
diff --git a/drivers/acpi/cppc_acpi.c b/drivers/acpi/cppc_acpi.c
index 7ff269a78c20..ad388a0e8484 100644
--- a/drivers/acpi/cppc_acpi.c
+++ b/drivers/acpi/cppc_acpi.c
@@ -1154,6 +1154,19 @@ int cppc_get_nominal_perf(int cpunum, u64 *nominal_perf)
        return cppc_get_perf(cpunum, NOMINAL_PERF, nominal_perf);
 }

+/**
+ * cppc_get_highest_perf - Get the highest performance register value.
+ * @cpunum: CPU from which to get highest performance.
+ * @highest_perf: Return address.
+ *
+ * Return: 0 for success, -EIO otherwise.
+ */
+int cppc_get_highest_perf(int cpunum, u64 *highest_perf)
+{
+       return cppc_get_perf(cpunum, HIGHEST_PERF, highest_perf);
+}
+EXPORT_SYMBOL_GPL(cppc_get_highest_perf);
+
 /**
  * cppc_get_epp_perf - Get the epp register value.
  * @cpunum: CPU from which to get epp preference value.
diff --git a/include/acpi/cppc_acpi.h b/include/acpi/cppc_acpi.h
index 6126c977ece0..c0b69ffe7bdb 100644
--- a/include/acpi/cppc_acpi.h
+++ b/include/acpi/cppc_acpi.h
@@ -139,6 +139,7 @@ struct cppc_cpudata {
 #ifdef CONFIG_ACPI_CPPC_LIB
 extern int cppc_get_desired_perf(int cpunum, u64 *desired_perf);
 extern int cppc_get_nominal_perf(int cpunum, u64 *nominal_perf);
+extern int cppc_get_highest_perf(int cpunum, u64 *highest_perf);
 extern int cppc_get_perf_ctrs(int cpu, struct cppc_perf_fb_ctrs *perf_fb_ctrs);
 extern int cppc_set_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls);
 extern int cppc_set_enable(int cpu, bool enable);
@@ -165,6 +166,10 @@ static inline int cppc_get_nominal_perf(int cpunum, u64 *nominal_perf)
 {
        return -ENOTSUPP;
 }
+static inline int cppc_get_highest_perf(int cpunum, u64 *highest_perf)
+{
+       return -ENOTSUPP;
+}
 static inline int cppc_get_perf_ctrs(int cpu, struct cppc_perf_fb_ctrs *perf_fb_ctrs)
 {
        return -ENOTSUPP;

-- 
2.34.1
The amd-pstate driver utilizes the functions and data structures provided by the ITMT architecture to enable the scheduler to favor scheduling on cores that can reach a higher frequency at lower voltage. We call this amd-pstate preferred core.
Here, sched_set_itmt_core_prio() is called to set priorities and sched_set_itmt_support() is called to enable the ITMT feature. The amd-pstate driver uses the highest performance value to indicate the priority of a CPU: a higher value means a higher priority.
The initial core rankings are set up by amd-pstate when the system boots.
Add a device attribute for hardware preferred core. It will check whether the processor and power firmware support the preferred core feature.
Add a device attribute for preferred core. Only when the hardware supports preferred core and the user has enabled it via the early parameter can it be set to enabled.
Add one new early parameter, `disable`, to allow the user to disable preferred core.
Signed-off-by: Perry Yuan Perry.Yuan@amd.com
Co-developed-by: Perry Yuan Perry.Yuan@amd.com
Signed-off-by: Meng Li li.meng@amd.com
Co-developed-by: Meng Li li.meng@amd.com
Reviewed-by: Mario Limonciello mario.limonciello@amd.com
---
 drivers/cpufreq/amd-pstate.c | 131 ++++++++++++++++++++++++++++++-----
 1 file changed, 115 insertions(+), 16 deletions(-)
diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
index 9a1e194d5cf8..454eb6e789e7 100644
--- a/drivers/cpufreq/amd-pstate.c
+++ b/drivers/cpufreq/amd-pstate.c
@@ -37,6 +37,7 @@
 #include <linux/uaccess.h>
 #include <linux/static_call.h>
 #include <linux/amd-pstate.h>
+#include <linux/topology.h>

 #include <acpi/processor.h>
 #include <acpi/cppc_acpi.h>
@@ -49,6 +50,8 @@

 #define AMD_PSTATE_TRANSITION_LATENCY	20000
 #define AMD_PSTATE_TRANSITION_DELAY	1000
+#define AMD_PSTATE_PREFCORE_THRESHOLD	166
+#define AMD_PSTATE_MAX_CPPC_PERF	255

 /*
  * TODO: We need more time to fine tune processors with shared memory solution
@@ -65,6 +68,12 @@ static struct cpufreq_driver amd_pstate_epp_driver;
 static int cppc_state = AMD_PSTATE_UNDEFINED;
 static bool cppc_enabled;

+/*HW preferred Core featue is supported*/
+static bool hw_prefcore = true;
+
+/*Preferred Core featue is supported*/
+static bool prefcore = true;
+
 /*
  * AMD Energy Preference Performance (EPP)
  * The EPP is used in the CCLK DPM controller to drive
@@ -290,23 +299,21 @@ static inline int amd_pstate_enable(bool enable)
 static int pstate_init_perf(struct amd_cpudata *cpudata)
 {
 	u64 cap1;
-	u32 highest_perf;

 	int ret = rdmsrl_safe_on_cpu(cpudata->cpu, MSR_AMD_CPPC_CAP1, &cap1);
 	if (ret)
 		return ret;

-	/*
-	 * TODO: Introduce AMD specific power feature.
-	 *
-	 * CPPC entry doesn't indicate the highest performance in some ASICs.
+	/* For platforms that do not support the preferred core feature, the
+	 * highest_pef may be configured with 166 or 255, to avoid max frequency
+	 * calculated wrongly. we take the AMD_CPPC_HIGHEST_PERF(cap1) value as
+	 * the default max perf.
 	 */
-	highest_perf = amd_get_highest_perf();
-	if (highest_perf > AMD_CPPC_HIGHEST_PERF(cap1))
-		highest_perf = AMD_CPPC_HIGHEST_PERF(cap1);
-
-	WRITE_ONCE(cpudata->highest_perf, highest_perf);
+	if (prefcore)
+		WRITE_ONCE(cpudata->highest_perf, AMD_PSTATE_PREFCORE_THRESHOLD);
+	else
+		WRITE_ONCE(cpudata->highest_perf, AMD_CPPC_HIGHEST_PERF(cap1));

 	WRITE_ONCE(cpudata->nominal_perf, AMD_CPPC_NOMINAL_PERF(cap1));
 	WRITE_ONCE(cpudata->lowest_nonlinear_perf, AMD_CPPC_LOWNONLIN_PERF(cap1));
@@ -318,17 +325,15 @@ static int pstate_init_perf(struct amd_cpudata *cpudata)
 static int cppc_init_perf(struct amd_cpudata *cpudata)
 {
 	struct cppc_perf_caps cppc_perf;
-	u32 highest_perf;

 	int ret = cppc_get_perf_caps(cpudata->cpu, &cppc_perf);
 	if (ret)
 		return ret;

-	highest_perf = amd_get_highest_perf();
-	if (highest_perf > cppc_perf.highest_perf)
-		highest_perf = cppc_perf.highest_perf;
-
-	WRITE_ONCE(cpudata->highest_perf, highest_perf);
+	if (prefcore)
+		WRITE_ONCE(cpudata->highest_perf, AMD_PSTATE_PREFCORE_THRESHOLD);
+	else
+		WRITE_ONCE(cpudata->highest_perf, cppc_perf.highest_perf);

 	WRITE_ONCE(cpudata->nominal_perf, cppc_perf.nominal_perf);
 	WRITE_ONCE(cpudata->lowest_nonlinear_perf,
@@ -676,6 +681,73 @@ static void amd_perf_ctl_reset(unsigned int cpu)
 	wrmsrl_on_cpu(cpu, MSR_AMD_PERF_CTL, 0);
 }

+/*
+ * Set amd-pstate preferred core enable can't be done directly from cpufreq callbacks
+ * due to locking, so queue the work for later.
+ */
+static void amd_pstste_sched_prefcore_workfn(struct work_struct *work)
+{
+	sched_set_itmt_support();
+}
+static DECLARE_WORK(sched_prefcore_work, amd_pstste_sched_prefcore_workfn);
+
+/*
+ * Get the highest performance register value.
+ * @cpu: CPU from which to get highest performance.
+ * @highest_perf: Return address.
+ *
+ * Return: 0 for success, -EIO otherwise.
+ */
+static int amd_pstate_get_highest_perf(int cpu, u64 *highest_perf)
+{
+	int ret;
+
+	if (boot_cpu_has(X86_FEATURE_CPPC)) {
+		u64 cap1;
+
+		ret = rdmsrl_safe_on_cpu(cpu, MSR_AMD_CPPC_CAP1, &cap1);
+		if (ret)
+			return ret;
+		WRITE_ONCE(*highest_perf, AMD_CPPC_HIGHEST_PERF(cap1));
+	} else {
+		ret = cppc_get_highest_perf(cpu, highest_perf);
+	}
+
+	return (ret);
+}
+
+static void amd_pstate_init_prefcore(void)
+{
+	int cpu, ret;
+	u64 highest_perf;
+
+	if (!prefcore)
+		return;
+
+	for_each_online_cpu(cpu) {
+		ret = amd_pstate_get_highest_perf(cpu, &highest_perf);
+		if (ret)
+			break;
+
+		sched_set_itmt_core_prio(highest_perf, cpu);
+
+		/* check if CPPC preferred core feature is enabled*/
+		if (highest_perf == AMD_PSTATE_MAX_CPPC_PERF) {
+			hw_prefcore = false;
+			prefcore = false;
+			return;
+		}
+	}
+
+	/*
+	 * This code can be run during CPU online under the
+	 * CPU hotplug locks, so sched_set_amd_prefcore_support()
+	 * cannot be called from here. Queue up a work item
+	 * to invoke it.
+	 */
+	schedule_work(&sched_prefcore_work);
+}
+
 static int amd_pstate_cpu_init(struct cpufreq_policy *policy)
 {
 	int min_freq, max_freq, nominal_freq, lowest_nonlinear_freq, ret;
@@ -1037,6 +1109,18 @@ static ssize_t status_store(struct device *a, struct device_attribute *b,
 	return ret < 0 ? ret : count;
 }

+static ssize_t hw_prefcore_show(struct device *dev,
+				struct device_attribute *attr, char *buf)
+{
+	return sysfs_emit(buf, "%s\n", hw_prefcore ? "supported" : "unsupported");
+}
+
+static ssize_t prefcore_show(struct device *dev,
+			     struct device_attribute *attr, char *buf)
+{
+	return sysfs_emit(buf, "%s\n", prefcore ? "enabled" : "disabled");
+}
+
 cpufreq_freq_attr_ro(amd_pstate_max_freq);
 cpufreq_freq_attr_ro(amd_pstate_lowest_nonlinear_freq);

@@ -1044,6 +1128,8 @@ cpufreq_freq_attr_ro(amd_pstate_highest_perf);
 cpufreq_freq_attr_rw(energy_performance_preference);
 cpufreq_freq_attr_ro(energy_performance_available_preferences);
 static DEVICE_ATTR_RW(status);
+static DEVICE_ATTR_RO(hw_prefcore);
+static DEVICE_ATTR_RO(prefcore);

 static struct freq_attr *amd_pstate_attr[] = {
 	&amd_pstate_max_freq,
@@ -1063,6 +1149,7 @@ static struct freq_attr *amd_pstate_epp_attr[] = {

 static struct attribute *pstate_global_attributes[] = {
 	&dev_attr_status.attr,
+	&dev_attr_prefcore.attr,
 	NULL
 };

@@ -1506,6 +1593,8 @@ static int __init amd_pstate_init(void)
 		}
 	}

+	amd_pstate_init_prefcore();
+
 	return ret;

 global_attr_free:
@@ -1527,7 +1616,17 @@ static int __init amd_pstate_param(char *str)

 	return amd_pstate_set_driver(mode_idx);
 }
+
+static int __init amd_prefcore_param(char *str)
+{
+	if (!strcmp(str, "disable"))
+		prefcore = false;
+
+	return 0;
+}
+
 early_param("amd_pstate", amd_pstate_param);
+early_param("amd_prefcore", amd_prefcore_param);

 MODULE_AUTHOR("Huang Rui ray.huang@amd.com");
 MODULE_DESCRIPTION("AMD Processor P-state Frequency Driver");
On 9/4/2023 20:51, Meng Li wrote:
The amd-pstate driver utilizes the functions and data structures provided by the ITMT architecture to enable the scheduler to favor scheduling on cores that can reach a higher frequency at lower voltage. We call this amd-pstate preferred core.
Here, sched_set_itmt_core_prio() is called to set priorities and sched_set_itmt_support() is called to enable the ITMT feature. The amd-pstate driver uses the highest performance value to indicate the priority of a CPU: a higher value means a higher priority.
The initial core rankings are set up by amd-pstate when the system boots.
Add a device attribute for hardware preferred core. It will check whether the processor and power firmware support the preferred core feature.
Add a device attribute for preferred core. Only when the hardware supports preferred core and the user has enabled it via the early parameter can it be set to enabled.
Add one new early parameter, `disable`, to allow the user to disable preferred core.
Signed-off-by: Perry Yuan Perry.Yuan@amd.com Co-developed-by: Perry Yuan Perry.Yuan@amd.com Signed-off-by: Meng Li li.meng@amd.com Co-developed-by: Meng Li li.meng@amd.com Reviewed-by: Mario Limonciello mario.limonciello@amd.com
You've got the tag order wrong I believe for this. Did checkpatch not make noise?
I think it's supposed to be:
Reviewed-by: Mario Limonciello mario.limonciello@amd.com
Co-developed-by: Perry Yuan Perry.Yuan@amd.com
Signed-off-by: Perry Yuan Perry.Yuan@amd.com
Signed-off-by: Meng Li li.meng@amd.com
 drivers/cpufreq/amd-pstate.c | 131 ++++++++++++++++++++++++++++++-----
 1 file changed, 115 insertions(+), 16 deletions(-)
diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
index 9a1e194d5cf8..454eb6e789e7 100644
--- a/drivers/cpufreq/amd-pstate.c
+++ b/drivers/cpufreq/amd-pstate.c
@@ -37,6 +37,7 @@
 #include <linux/uaccess.h>
 #include <linux/static_call.h>
 #include <linux/amd-pstate.h>
+#include <linux/topology.h>

 #include <acpi/processor.h>
 #include <acpi/cppc_acpi.h>
@@ -49,6 +50,8 @@

 #define AMD_PSTATE_TRANSITION_LATENCY	20000
 #define AMD_PSTATE_TRANSITION_DELAY	1000
+#define AMD_PSTATE_PREFCORE_THRESHOLD	166
+#define AMD_PSTATE_MAX_CPPC_PERF	255

 /*
  * TODO: We need more time to fine tune processors with shared memory solution
@@ -65,6 +68,12 @@ static struct cpufreq_driver amd_pstate_epp_driver;
 static int cppc_state = AMD_PSTATE_UNDEFINED;
 static bool cppc_enabled;

+/*HW preferred Core featue is supported*/
+static bool hw_prefcore = true;
+
+/*Preferred Core featue is supported*/
+static bool prefcore = true;
+
 /*
  * AMD Energy Preference Performance (EPP)
  * The EPP is used in the CCLK DPM controller to drive
@@ -290,23 +299,21 @@ static inline int amd_pstate_enable(bool enable)
 static int pstate_init_perf(struct amd_cpudata *cpudata)
 {
 	u64 cap1;
-	u32 highest_perf;

 	int ret = rdmsrl_safe_on_cpu(cpudata->cpu, MSR_AMD_CPPC_CAP1, &cap1);
 	if (ret)
 		return ret;

-	/*
-	 * TODO: Introduce AMD specific power feature.
-	 *
-	 * CPPC entry doesn't indicate the highest performance in some ASICs.
+	/* For platforms that do not support the preferred core feature, the
+	 * highest_pef may be configured with 166 or 255, to avoid max frequency
+	 * calculated wrongly. we take the AMD_CPPC_HIGHEST_PERF(cap1) value as
+	 * the default max perf.
 	 */
-	highest_perf = amd_get_highest_perf();
-	if (highest_perf > AMD_CPPC_HIGHEST_PERF(cap1))
-		highest_perf = AMD_CPPC_HIGHEST_PERF(cap1);
-
-	WRITE_ONCE(cpudata->highest_perf, highest_perf);
+	if (prefcore)
+		WRITE_ONCE(cpudata->highest_perf, AMD_PSTATE_PREFCORE_THRESHOLD);
+	else
+		WRITE_ONCE(cpudata->highest_perf, AMD_CPPC_HIGHEST_PERF(cap1));

 	WRITE_ONCE(cpudata->nominal_perf, AMD_CPPC_NOMINAL_PERF(cap1));
 	WRITE_ONCE(cpudata->lowest_nonlinear_perf, AMD_CPPC_LOWNONLIN_PERF(cap1));
@@ -318,17 +325,15 @@ static int pstate_init_perf(struct amd_cpudata *cpudata)
 static int cppc_init_perf(struct amd_cpudata *cpudata)
 {
 	struct cppc_perf_caps cppc_perf;
-	u32 highest_perf;

 	int ret = cppc_get_perf_caps(cpudata->cpu, &cppc_perf);
 	if (ret)
 		return ret;

-	highest_perf = amd_get_highest_perf();
-	if (highest_perf > cppc_perf.highest_perf)
-		highest_perf = cppc_perf.highest_perf;
-
-	WRITE_ONCE(cpudata->highest_perf, highest_perf);
+	if (prefcore)
+		WRITE_ONCE(cpudata->highest_perf, AMD_PSTATE_PREFCORE_THRESHOLD);
+	else
+		WRITE_ONCE(cpudata->highest_perf, cppc_perf.highest_perf);

 	WRITE_ONCE(cpudata->nominal_perf, cppc_perf.nominal_perf);
 	WRITE_ONCE(cpudata->lowest_nonlinear_perf,
@@ -676,6 +681,73 @@ static void amd_perf_ctl_reset(unsigned int cpu)
 	wrmsrl_on_cpu(cpu, MSR_AMD_PERF_CTL, 0);
 }

+/*
+ * Set amd-pstate preferred core enable can't be done directly from cpufreq callbacks
+ * due to locking, so queue the work for later.
+ */
+static void amd_pstste_sched_prefcore_workfn(struct work_struct *work)
+{
+	sched_set_itmt_support();
+}
+static DECLARE_WORK(sched_prefcore_work, amd_pstste_sched_prefcore_workfn);
+
+/*
+ * Get the highest performance register value.
+ * @cpu: CPU from which to get highest performance.
+ * @highest_perf: Return address.
+ *
+ * Return: 0 for success, -EIO otherwise.
+ */
+static int amd_pstate_get_highest_perf(int cpu, u64 *highest_perf)
+{
+	int ret;
+
+	if (boot_cpu_has(X86_FEATURE_CPPC)) {
+		u64 cap1;
+
+		ret = rdmsrl_safe_on_cpu(cpu, MSR_AMD_CPPC_CAP1, &cap1);
+		if (ret)
+			return ret;
+		WRITE_ONCE(*highest_perf, AMD_CPPC_HIGHEST_PERF(cap1));
+	} else {
+		ret = cppc_get_highest_perf(cpu, highest_perf);
+	}
+
+	return (ret);
+}
+
+static void amd_pstate_init_prefcore(void)
+{
+	int cpu, ret;
+	u64 highest_perf;
+
+	if (!prefcore)
+		return;
+
+	for_each_online_cpu(cpu) {
+		ret = amd_pstate_get_highest_perf(cpu, &highest_perf);
+		if (ret)
+			break;
+
+		sched_set_itmt_core_prio(highest_perf, cpu);
+
+		/* check if CPPC preferred core feature is enabled*/
+		if (highest_perf == AMD_PSTATE_MAX_CPPC_PERF) {
+			hw_prefcore = false;
+			prefcore = false;
+			return;
+		}
+	}
+
+	/*
+	 * This code can be run during CPU online under the
+	 * CPU hotplug locks, so sched_set_amd_prefcore_support()
+	 * cannot be called from here. Queue up a work item
+	 * to invoke it.
+	 */
+	schedule_work(&sched_prefcore_work);
+}
+
 static int amd_pstate_cpu_init(struct cpufreq_policy *policy)
 {
 	int min_freq, max_freq, nominal_freq, lowest_nonlinear_freq, ret;
@@ -1037,6 +1109,18 @@ static ssize_t status_store(struct device *a, struct device_attribute *b,
 	return ret < 0 ? ret : count;
 }

+static ssize_t hw_prefcore_show(struct device *dev,
+				struct device_attribute *attr, char *buf)
+{
+	return sysfs_emit(buf, "%s\n", hw_prefcore ? "supported" : "unsupported");
+}
+
+static ssize_t prefcore_show(struct device *dev,
+			     struct device_attribute *attr, char *buf)
+{
+	return sysfs_emit(buf, "%s\n", prefcore ? "enabled" : "disabled");
+}
+
 cpufreq_freq_attr_ro(amd_pstate_max_freq);
 cpufreq_freq_attr_ro(amd_pstate_lowest_nonlinear_freq);

@@ -1044,6 +1128,8 @@ cpufreq_freq_attr_ro(amd_pstate_highest_perf);
 cpufreq_freq_attr_rw(energy_performance_preference);
 cpufreq_freq_attr_ro(energy_performance_available_preferences);
 static DEVICE_ATTR_RW(status);
+static DEVICE_ATTR_RO(hw_prefcore);
+static DEVICE_ATTR_RO(prefcore);

 static struct freq_attr *amd_pstate_attr[] = {
 	&amd_pstate_max_freq,
@@ -1063,6 +1149,7 @@ static struct freq_attr *amd_pstate_epp_attr[] = {

 static struct attribute *pstate_global_attributes[] = {
 	&dev_attr_status.attr,
+	&dev_attr_prefcore.attr,
 	NULL
 };

@@ -1506,6 +1593,8 @@ static int __init amd_pstate_init(void)
 		}
 	}

+	amd_pstate_init_prefcore();
+
 	return ret;

 global_attr_free:
@@ -1527,7 +1616,17 @@ static int __init amd_pstate_param(char *str)

 	return amd_pstate_set_driver(mode_idx);
 }
+
+static int __init amd_prefcore_param(char *str)
+{
+	if (!strcmp(str, "disable"))
+		prefcore = false;
+
+	return 0;
+}
+
 early_param("amd_pstate", amd_pstate_param);
+early_param("amd_prefcore", amd_prefcore_param);

 MODULE_AUTHOR("Huang Rui ray.huang@amd.com");
 MODULE_DESCRIPTION("AMD Processor P-state Frequency Driver");
Hi Meng,
kernel test robot noticed the following build warnings:
[auto build test WARNING on rafael-pm/linux-next]
[also build test WARNING on linus/master v6.5 next-20230905]
[cannot apply to tip/x86/core]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Meng-Li/x86-Drop-CPU_SUP_INTE...
base:   https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
patch link:    https://lore.kernel.org/r/20230905015116.2268926-4-li.meng%40amd.com
patch subject: [PATCH V5 3/7] cpufreq: amd-pstate: Enable amd-pstate preferred core supporting.
config: x86_64-randconfig-r022-20230906 (https://download.01.org/0day-ci/archive/20230906/202309061049.2ag7qkvI-lkp@i...)
compiler: clang version 16.0.4 (https://github.com/llvm/llvm-project.git ae42196bc493ffe877a7e3dff8be32035dea4d07)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20230906/202309061049.2ag7qkvI-lkp@i...)
If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot lkp@intel.com | Closes: https://lore.kernel.org/oe-kbuild-all/202309061049.2ag7qkvI-lkp@intel.com/
All warnings (new ones prefixed by >>):
drivers/cpufreq/amd-pstate.c:1131:8: warning: unused variable 'dev_attr_hw_prefcore' [-Wunused-variable]
   static DEVICE_ATTR_RO(hw_prefcore);
          ^
   include/linux/device.h:198:26: note: expanded from macro 'DEVICE_ATTR_RO'
           struct device_attribute dev_attr_##_name = __ATTR_RO(_name)
                                   ^
   <scratch space>:91:1: note: expanded from here
   dev_attr_hw_prefcore
   ^
   1 warning generated.
vim +/dev_attr_hw_prefcore +1131 drivers/cpufreq/amd-pstate.c
  1126	
  1127	cpufreq_freq_attr_ro(amd_pstate_highest_perf);
  1128	cpufreq_freq_attr_rw(energy_performance_preference);
  1129	cpufreq_freq_attr_ro(energy_performance_available_preferences);
  1130	static DEVICE_ATTR_RW(status);
> 1131	static DEVICE_ATTR_RO(hw_prefcore);
  1132	static DEVICE_ATTR_RO(prefcore);
  1133	
Hi Meng,
kernel test robot noticed the following build warnings:
[auto build test WARNING on rafael-pm/linux-next]
[also build test WARNING on linus/master v6.5 next-20230906]
[cannot apply to tip/x86/core]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Meng-Li/x86-Drop-CPU_SUP_INTE...
base:   https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
patch link:    https://lore.kernel.org/r/20230905015116.2268926-4-li.meng%40amd.com
patch subject: [PATCH V5 3/7] cpufreq: amd-pstate: Enable amd-pstate preferred core supporting.
config: x86_64-defconfig (https://download.01.org/0day-ci/archive/20230906/202309061958.4wimkcbo-lkp@i...)
compiler: gcc-11 (Debian 11.3.0-12) 11.3.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20230906/202309061958.4wimkcbo-lkp@i...)
If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot lkp@intel.com | Closes: https://lore.kernel.org/oe-kbuild-all/202309061958.4wimkcbo-lkp@intel.com/
All warnings (new ones prefixed by >>):
In file included from include/linux/node.h:18,
                 from include/linux/cpu.h:17,
                 from include/linux/cpufreq.h:12,
                 from drivers/cpufreq/amd-pstate.c:30:
include/linux/device.h:198:33: warning: 'dev_attr_hw_prefcore' defined but not used [-Wunused-variable]
     198 | struct device_attribute dev_attr_##_name = __ATTR_RO(_name)
         |                                 ^~~~~~~~~
   drivers/cpufreq/amd-pstate.c:1131:8: note: in expansion of macro 'DEVICE_ATTR_RO'
    1131 | static DEVICE_ATTR_RO(hw_prefcore);
         |        ^~~~~~~~~~~~~~
vim +/dev_attr_hw_prefcore +198 include/linux/device.h
ced321bf9151535 Greg Kroah-Hartman 2013-07-14  197  #define DEVICE_ATTR_RO(_name) \
ced321bf9151535 Greg Kroah-Hartman 2013-07-14 @198  	struct device_attribute dev_attr_##_name = __ATTR_RO(_name)
On Tue, Sep 05, 2023 at 09:51:12AM +0800, Meng, Li (Jassmine) wrote:
amd-pstate driver utilizes the functions and data structures provided by the ITMT architecture to enable the scheduler to favor scheduling on cores which can achieve a higher frequency with lower voltage. We call it amd-pstate preferred core.
Here sched_set_itmt_core_prio() is called to set priorities and sched_set_itmt_support() is called to enable the ITMT feature. amd-pstate driver uses the highest performance value to indicate the priority of a CPU; a higher value means a higher priority.
The initial core rankings are set up by amd-pstate when the system boots.
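The ranking scheme described above can be sketched as a standalone model (made-up perf values, not kernel code — the real driver feeds CPPC highest-perf values to sched_set_itmt_core_prio()):

```c
#include <assert.h>

#define NR_CPUS 4

/* Per-CPU priority table; the kernel keeps this inside the ITMT code. */
static int core_prio[NR_CPUS];

/* Analogous to sched_set_itmt_core_prio(prio, cpu): record a CPU's rank. */
static void set_core_prio(int prio, int cpu)
{
	core_prio[cpu] = prio;
}

/* The scheduler favors the CPU with the highest recorded priority. */
static int preferred_cpu(void)
{
	int best = 0;

	for (int cpu = 1; cpu < NR_CPUS; cpu++)
		if (core_prio[cpu] > core_prio[best])
			best = cpu;
	return best;
}
```

With priorities {166, 231, 166, 180}, preferred_cpu() picks CPU 1, mirroring how ITMT steers tasks toward the fastest core first.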
Add a device attribute for the hardware preferred core. It checks whether the processor and power firmware support the preferred core feature.
Add a device attribute for the preferred core. It reports enabled only when the hardware supports the preferred core feature and the user has not disabled it via the early parameter.
Add a new early parameter value, `disable`, to allow the user to disable preferred core support.
Signed-off-by: Perry Yuan <Perry.Yuan@amd.com>
Co-developed-by: Perry Yuan <Perry.Yuan@amd.com>
Signed-off-by: Meng Li <li.meng@amd.com>
Co-developed-by: Meng Li <li.meng@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
 drivers/cpufreq/amd-pstate.c | 131 ++++++++++++++++++++++++++++++-----
 1 file changed, 115 insertions(+), 16 deletions(-)

diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
index 9a1e194d5cf8..454eb6e789e7 100644
--- a/drivers/cpufreq/amd-pstate.c
+++ b/drivers/cpufreq/amd-pstate.c
@@ -37,6 +37,7 @@
 #include <linux/uaccess.h>
 #include <linux/static_call.h>
 #include <linux/amd-pstate.h>
+#include <linux/topology.h>
 
 #include <acpi/processor.h>
 #include <acpi/cppc_acpi.h>
@@ -49,6 +50,8 @@
 #define AMD_PSTATE_TRANSITION_LATENCY	20000
 #define AMD_PSTATE_TRANSITION_DELAY	1000
+#define AMD_PSTATE_PREFCORE_THRESHOLD	166
+#define AMD_PSTATE_MAX_CPPC_PERF	255
 
 /*
  * TODO: We need more time to fine tune processors with shared memory solution
@@ -65,6 +68,12 @@ static struct cpufreq_driver amd_pstate_epp_driver;
 static int cppc_state = AMD_PSTATE_UNDEFINED;
 static bool cppc_enabled;
 
+/* HW preferred core feature is supported */
+static bool hw_prefcore = true;
+
+/* Preferred core feature is supported */
+static bool prefcore = true;
+
 /*
  * AMD Energy Preference Performance (EPP)
  * The EPP is used in the CCLK DPM controller to drive
@@ -290,23 +299,21 @@ static inline int amd_pstate_enable(bool enable)
 static int pstate_init_perf(struct amd_cpudata *cpudata)
 {
 	u64 cap1;
-	u32 highest_perf;
 
 	int ret = rdmsrl_safe_on_cpu(cpudata->cpu, MSR_AMD_CPPC_CAP1, &cap1);
 	if (ret)
 		return ret;
 
-	/*
-	 * TODO: Introduce AMD specific power feature.
-	 *
-	 * CPPC entry doesn't indicate the highest performance in some ASICs.
+	/* For platforms that do not support the preferred core feature, the
+	 * highest_perf may be configured with 166 or 255, to avoid max frequency
+	 * calculated wrongly. we take the AMD_CPPC_HIGHEST_PERF(cap1) value as
+	 * the default max perf.
 	 */
-	highest_perf = amd_get_highest_perf();
-	if (highest_perf > AMD_CPPC_HIGHEST_PERF(cap1))
-		highest_perf = AMD_CPPC_HIGHEST_PERF(cap1);
-
-	WRITE_ONCE(cpudata->highest_perf, highest_perf);
+	if (prefcore)
+		WRITE_ONCE(cpudata->highest_perf, AMD_PSTATE_PREFCORE_THRESHOLD);
+	else
+		WRITE_ONCE(cpudata->highest_perf, AMD_CPPC_HIGHEST_PERF(cap1));
 	WRITE_ONCE(cpudata->nominal_perf, AMD_CPPC_NOMINAL_PERF(cap1));
 	WRITE_ONCE(cpudata->lowest_nonlinear_perf, AMD_CPPC_LOWNONLIN_PERF(cap1));
@@ -318,17 +325,15 @@ static int pstate_init_perf(struct amd_cpudata *cpudata)
 static int cppc_init_perf(struct amd_cpudata *cpudata)
 {
 	struct cppc_perf_caps cppc_perf;
-	u32 highest_perf;
 
 	int ret = cppc_get_perf_caps(cpudata->cpu, &cppc_perf);
 	if (ret)
 		return ret;
 
-	highest_perf = amd_get_highest_perf();
-	if (highest_perf > cppc_perf.highest_perf)
-		highest_perf = cppc_perf.highest_perf;
-
-	WRITE_ONCE(cpudata->highest_perf, highest_perf);
+	if (prefcore)
+		WRITE_ONCE(cpudata->highest_perf, AMD_PSTATE_PREFCORE_THRESHOLD);
+	else
+		WRITE_ONCE(cpudata->highest_perf, cppc_perf.highest_perf);
 	WRITE_ONCE(cpudata->nominal_perf, cppc_perf.nominal_perf);
 	WRITE_ONCE(cpudata->lowest_nonlinear_perf,
@@ -676,6 +681,73 @@ static void amd_perf_ctl_reset(unsigned int cpu)
 	wrmsrl_on_cpu(cpu, MSR_AMD_PERF_CTL, 0);
 }
 
+/*
+ * Set amd-pstate preferred core enable can't be done directly from cpufreq
+ * callbacks due to locking, so queue the work for later.
+ */
+static void amd_pstste_sched_prefcore_workfn(struct work_struct *work)
+{
+	sched_set_itmt_support();
+}
+static DECLARE_WORK(sched_prefcore_work, amd_pstste_sched_prefcore_workfn);
+/*
+ * Get the highest performance register value.
+ * @cpu: CPU from which to get highest performance.
+ * @highest_perf: Return address.
+ *
+ * Return: 0 for success, -EIO otherwise.
+ */
+static int amd_pstate_get_highest_perf(int cpu, u64 *highest_perf)
+{
+	int ret;
+
+	if (boot_cpu_has(X86_FEATURE_CPPC)) {
+		u64 cap1;
+
+		ret = rdmsrl_safe_on_cpu(cpu, MSR_AMD_CPPC_CAP1, &cap1);
+		if (ret)
+			return ret;
+		WRITE_ONCE(*highest_perf, AMD_CPPC_HIGHEST_PERF(cap1));
+	} else {
+		ret = cppc_get_highest_perf(cpu, highest_perf);
+	}
+
+	return (ret);
+}
+static void amd_pstate_init_prefcore(void)
+{
+	int cpu, ret;
+	u64 highest_perf;
+
+	if (!prefcore)
+		return;
+
+	for_each_online_cpu(cpu) {
+		ret = amd_pstate_get_highest_perf(cpu, &highest_perf);
+		if (ret)
+			break;
+
+		sched_set_itmt_core_prio(highest_perf, cpu);
+
+		/* check if CPPC preferred core feature is enabled */
+		if (highest_perf == AMD_PSTATE_MAX_CPPC_PERF) {
+			hw_prefcore = false;
+			prefcore = false;
I think you should use a prefcore flag embedded in the cpudata structure instead of a global variable. Here you actually walk through all online CPUs, and the last CPU's status will overwrite the previous ones.
+			return;
+		}
+	}
+
+	/*
+	 * This code can be run during CPU online under the
+	 * CPU hotplug locks, so sched_set_amd_prefcore_support()
+	 * cannot be called from here. Queue up a work item
+	 * to invoke it.
+	 */
+	schedule_work(&sched_prefcore_work);
+}
 static int amd_pstate_cpu_init(struct cpufreq_policy *policy)
 {
 	int min_freq, max_freq, nominal_freq, lowest_nonlinear_freq, ret;
@@ -1037,6 +1109,18 @@ static ssize_t status_store(struct device *a, struct device_attribute *b,
 	return ret < 0 ? ret : count;
 }
 
+static ssize_t hw_prefcore_show(struct device *dev,
+				struct device_attribute *attr, char *buf)
+{
+	return sysfs_emit(buf, "%s\n", hw_prefcore ? "supported" : "unsupported");
+}
Is there any requirement from user space (cpupower or another tool) to query this capability at runtime? If not, we can simplify the code by using a kernel print instead, to let the user know whether the current CPU supports prefcore on the hardware side.
Thanks, Ray
+static ssize_t prefcore_show(struct device *dev,
+			     struct device_attribute *attr, char *buf)
+{
+	return sysfs_emit(buf, "%s\n", prefcore ? "enabled" : "disabled");
+}
 cpufreq_freq_attr_ro(amd_pstate_max_freq);
 cpufreq_freq_attr_ro(amd_pstate_lowest_nonlinear_freq);
@@ -1044,6 +1128,8 @@ cpufreq_freq_attr_ro(amd_pstate_highest_perf);
 cpufreq_freq_attr_rw(energy_performance_preference);
 cpufreq_freq_attr_ro(energy_performance_available_preferences);
 static DEVICE_ATTR_RW(status);
+static DEVICE_ATTR_RO(hw_prefcore);
+static DEVICE_ATTR_RO(prefcore);
 
 static struct freq_attr *amd_pstate_attr[] = {
 	&amd_pstate_max_freq,
@@ -1063,6 +1149,7 @@ static struct freq_attr *amd_pstate_epp_attr[] = {
 
 static struct attribute *pstate_global_attributes[] = {
 	&dev_attr_status.attr,
+	&dev_attr_prefcore.attr,
 	NULL
 };
 
@@ -1506,6 +1593,8 @@ static int __init amd_pstate_init(void)
 		}
 	}
 
+	amd_pstate_init_prefcore();
+
 	return ret;
 
 global_attr_free:
@@ -1527,7 +1616,17 @@ static int __init amd_pstate_param(char *str)
 	return amd_pstate_set_driver(mode_idx);
 }
 
+static int __init amd_prefcore_param(char *str)
+{
+	if (!strcmp(str, "disable"))
+		prefcore = false;
+
+	return 0;
+}
+
 early_param("amd_pstate", amd_pstate_param);
+early_param("amd_prefcore", amd_prefcore_param);
 
 MODULE_AUTHOR("Huang Rui <ray.huang@amd.com>");
 MODULE_DESCRIPTION("AMD Processor P-state Frequency Driver");
--
2.34.1
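For reference, the `amd_prefcore` early parameter added at the end of this patch is consumed from the kernel command line; a typical (illustrative) way to set it persistently is via the bootloader config:

```
# /etc/default/grub (illustrative; the only value the handler recognizes is "disable")
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_prefcore=disable"
```

After regenerating the grub config with the distribution's usual tool and rebooting, the driver skips preferred-core ranking.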
Hi Ray:
-----Original Message-----
From: Huang, Ray <Ray.Huang@amd.com>
Sent: Wednesday, September 6, 2023 9:53 PM
To: Meng, Li (Jassmine) <Li.Meng@amd.com>
Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>; linux-pm@vger.kernel.org; linux-kernel@vger.kernel.org; x86@kernel.org; linux-acpi@vger.kernel.org; Shuah Khan <skhan@linuxfoundation.org>; linux-kselftest@vger.kernel.org; Fontenot, Nathan <Nathan.Fontenot@amd.com>; Sharma, Deepak <Deepak.Sharma@amd.com>; Deucher, Alexander <Alexander.Deucher@amd.com>; Limonciello, Mario <Mario.Limonciello@amd.com>; Huang, Shimmer <Shimmer.Huang@amd.com>; Yuan, Perry <Perry.Yuan@amd.com>; Du, Xiaojian <Xiaojian.Du@amd.com>; Viresh Kumar <viresh.kumar@linaro.org>; Borislav Petkov <bp@alien8.de>
Subject: Re: [PATCH V5 3/7] cpufreq: amd-pstate: Enable amd-pstate preferred core supporting.
On Tue, Sep 05, 2023 at 09:51:12AM +0800, Meng, Li (Jassmine) wrote:

[...]
+		/* check if CPPC preferred core feature is enabled */
+		if (highest_perf == AMD_PSTATE_MAX_CPPC_PERF) {
+			hw_prefcore = false;
+			prefcore = false;
I think you should use prefcore which embeds into cpudata structure instead of global variable. Here, actually, you walked through all online cpus, the last cpu's status will overwrite the previous one.
[Meng, Li (Jassmine)] The variable "prefcore" is controlled by an early kernel parameter; the user can set its status to enabled or disabled. I think it cannot be embedded into the "cpudata" structure.
[...]
+static ssize_t hw_prefcore_show(struct device *dev,
+				struct device_attribute *attr, char *buf)
+{
+	return sysfs_emit(buf, "%s\n", hw_prefcore ? "supported" : "unsupported");
+}
Is there any requirement from user space (cpupower or another tool) to query this capability at runtime? If not, we can simplify the code by using a kernel print instead, to let the user know whether the current CPU supports prefcore on the hardware side.

Thanks, Ray
[Meng, Li (Jassmine)] I will modify it to pr_debug() message.
[...]
On 05 Sep 09:51, Meng Li wrote:
[...]
@@ -290,23 +299,21 @@ static inline int amd_pstate_enable(bool enable)
 static int pstate_init_perf(struct amd_cpudata *cpudata)
 {
 	u64 cap1;
-	u32 highest_perf;
 
 	int ret = rdmsrl_safe_on_cpu(cpudata->cpu, MSR_AMD_CPPC_CAP1, &cap1);
 	if (ret)
 		return ret;
 
-	/*
-	 * TODO: Introduce AMD specific power feature.
-	 *
-	 * CPPC entry doesn't indicate the highest performance in some ASICs.
+	/* For platforms that do not support the preferred core feature, the
+	 * highest_perf may be configured with 166 or 255, to avoid max frequency
+	 * calculated wrongly. we take the AMD_CPPC_HIGHEST_PERF(cap1) value as
+	 * the default max perf.
 	 */
-	highest_perf = amd_get_highest_perf();
-	if (highest_perf > AMD_CPPC_HIGHEST_PERF(cap1))
-		highest_perf = AMD_CPPC_HIGHEST_PERF(cap1);
-
-	WRITE_ONCE(cpudata->highest_perf, highest_perf);
+	if (prefcore)
+		WRITE_ONCE(cpudata->highest_perf, AMD_PSTATE_PREFCORE_THRESHOLD);
+	else
+		WRITE_ONCE(cpudata->highest_perf, AMD_CPPC_HIGHEST_PERF(cap1));
As mentioned in the v3, this should be checking hw_prefcore.
For example: let's consider a CPU whose cap1 value is 0xe752380d on a system that can reach 3.5 GHz. The system supports preferred core, but the user added the amd_prefcore=disable command line option.
The perf values after amd-pstate init would then be:

lowest_perf: 13
lowest_nonlinear_perf: 56
nominal_perf: 82
highest_perf: 231
Let's say the user selects the userspace governor and tries to set the frequency to 2.8 GHz. Then the des_perf calculation would be:

des_perf = 2800000 * 231 / 3500000 = 184

which is wrong, because HW only allows des_perf up to 166 (because HW supports preferred core).
If we restrict highest_perf to 166 in preferred core supported (by HW) system, even when we disable preferred core from SW, then the des_perf calculation stays correct.
des_perf = 2800000 * 166 / 3500000 = 132
Therefore, when the user disables preferred core with the amd_prefcore=disable command line option on a preferred-core-capable system, highest_perf should still be AMD_PSTATE_PREFCORE_THRESHOLD (166).
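The arithmetic above can be checked with a standalone sketch of the usual des_perf scaling (plain C, not driver code; frequencies in kHz, truncating integer division as in the kernel):

```c
#include <assert.h>
#include <stdint.h>

/* The des_perf scaling discussed above:
 * target_freq * highest_perf / max_freq, with integer truncation.
 */
static uint32_t des_perf(uint64_t target_khz, uint32_t highest_perf,
			 uint64_t max_khz)
{
	return (uint32_t)(target_khz * highest_perf / max_khz);
}
```

With highest_perf = 231 the 2.8 GHz request maps to 184 (above the hardware's 166 ceiling); keeping highest_perf at the 166 threshold yields 132, matching the numbers in the review.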
 	WRITE_ONCE(cpudata->nominal_perf, AMD_CPPC_NOMINAL_PERF(cap1));
 	WRITE_ONCE(cpudata->lowest_nonlinear_perf, AMD_CPPC_LOWNONLIN_PERF(cap1));
@@ -318,17 +325,15 @@ static int pstate_init_perf(struct amd_cpudata *cpudata)
 static int cppc_init_perf(struct amd_cpudata *cpudata)
 {
 	struct cppc_perf_caps cppc_perf;
-	u32 highest_perf;
 
 	int ret = cppc_get_perf_caps(cpudata->cpu, &cppc_perf);
 	if (ret)
 		return ret;
 
-	highest_perf = amd_get_highest_perf();
-	if (highest_perf > cppc_perf.highest_perf)
-		highest_perf = cppc_perf.highest_perf;
-
-	WRITE_ONCE(cpudata->highest_perf, highest_perf);
+	if (prefcore)
+		WRITE_ONCE(cpudata->highest_perf, AMD_PSTATE_PREFCORE_THRESHOLD);
+	else
+		WRITE_ONCE(cpudata->highest_perf, cppc_perf.highest_perf);
Same here.
Thanks, Wyes
On Tue, Sep 05, 2023 at 09:51:12AM +0800, Meng Li wrote:
+	/*
+	 * This code can be run during CPU online under the
+	 * CPU hotplug locks, so sched_set_amd_prefcore_support()
There is no such function... ?
+	 * cannot be called from here. Queue up a work item
+	 * to invoke it.
+	 */
+	schedule_work(&sched_prefcore_work);
On Tue, Sep 05, 2023 at 09:51:12AM +0800, Meng Li wrote:
+static void amd_pstate_init_prefcore(void)
+{
+	int cpu, ret;
+	u64 highest_perf;
+
+	if (!prefcore)
+		return;
+
+	for_each_online_cpu(cpu) {
+		ret = amd_pstate_get_highest_perf(cpu, &highest_perf);
+		if (ret)
+			break;
+
+		sched_set_itmt_core_prio(highest_perf, cpu);
+
+		/* check if CPPC preferred core feature is enabled */
+		if (highest_perf == AMD_PSTATE_MAX_CPPC_PERF) {
+			hw_prefcore = false;
+			prefcore = false;
+			return;
+		}
+	}
+
+	/*
+	 * This code can be run during CPU online under the
+	 * CPU hotplug locks, so sched_set_amd_prefcore_support()
+	 * cannot be called from here. Queue up a work item
+	 * to invoke it.
+	 */
+	schedule_work(&sched_prefcore_work);
+}
@@ -1506,6 +1593,8 @@ static int __init amd_pstate_init(void)
 		}
 	}
 
+	amd_pstate_init_prefcore();
+
 	return ret;
 
 global_attr_free:
I'm confused,... you call amd_pstate_init_prefcore() at device_initcall(). Once per boot.
Then it iterates all online CPUs..
But what if you boot with some CPUs offline and bring them online later?
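The gap raised above can be shown with a standalone model (plain C with made-up perf values, not kernel code): a one-shot init pass over the CPUs online at boot leaves any later-onlined CPU with no ranking.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4

/* Per-CPU priority; 0 means "never ranked". */
static int prio[NR_CPUS];

/* Models a one-shot amd_pstate_init_prefcore(): only CPUs online at the
 * time of the call get a priority.
 */
static void init_prefcore_once(const bool *online, const int *hw_perf)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (online[cpu])
			prio[cpu] = hw_perf[cpu];
}
```

If CPU 2 is offline during the init pass and onlined later, it keeps priority 0; addressing this would need per-CPU setup on the hotplug/online path rather than a single device_initcall-time pass.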
ACPI 6.5 section 8.4.6.1.1.1 specifies that Notify event 0x85 can be emitted to cause the OSPM to re-evaluate the highest performance register. Add support for this event.
Signed-off-by: Meng Li <li.meng@amd.com>
Link: https://uefi.org/specs/ACPI/6.5/08_Processor_Configuration_and_Control.html?...
---
 drivers/acpi/processor_driver.c | 6 ++++++
 drivers/cpufreq/cpufreq.c | 13 +++++++++++++
 include/linux/cpufreq.h | 5 +++++
 3 files changed, 24 insertions(+)
diff --git a/drivers/acpi/processor_driver.c b/drivers/acpi/processor_driver.c
index 4bd16b3f0781..29b2fb68a35d 100644
--- a/drivers/acpi/processor_driver.c
+++ b/drivers/acpi/processor_driver.c
@@ -27,6 +27,7 @@
 #define ACPI_PROCESSOR_NOTIFY_PERFORMANCE 0x80
 #define ACPI_PROCESSOR_NOTIFY_POWER 0x81
 #define ACPI_PROCESSOR_NOTIFY_THROTTLING 0x82
+#define ACPI_PROCESSOR_NOTIFY_HIGEST_PERF_CHANGED 0x85
 MODULE_AUTHOR("Paul Diefenbaugh");
 MODULE_DESCRIPTION("ACPI Processor Driver");
@@ -83,6 +84,11 @@ static void acpi_processor_notify(acpi_handle handle, u32 event, void *data)
 		acpi_bus_generate_netlink_event(device->pnp.device_class,
 						dev_name(&device->dev), event, 0);
 		break;
+	case ACPI_PROCESSOR_NOTIFY_HIGEST_PERF_CHANGED:
+		cpufreq_update_highest_perf(pr->id);
+		acpi_bus_generate_netlink_event(device->pnp.device_class,
+						dev_name(&device->dev), event, 0);
+		break;
 	default:
 		acpi_handle_debug(handle, "Unsupported event [0x%x]\n", event);
 		break;
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 60ed89000e82..4ada787ff105 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -2718,6 +2718,19 @@ void cpufreq_update_limits(unsigned int cpu)
 }
 EXPORT_SYMBOL_GPL(cpufreq_update_limits);
+/**
+ * cpufreq_update_highest_perf - Update highest performance for a given CPU.
+ * @cpu: CPU to update the highest performance for.
+ *
+ * Invoke the driver's ->update_highest_perf callback if present.
+ */
+void cpufreq_update_highest_perf(unsigned int cpu)
+{
+	if (cpufreq_driver->update_highest_perf)
+		cpufreq_driver->update_highest_perf(cpu);
+}
+EXPORT_SYMBOL_GPL(cpufreq_update_highest_perf);
+
 /*********************************************************************
  *                              BOOST                                *
  *********************************************************************/
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index 71d186d6933a..1cc1241fb698 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -235,6 +235,7 @@ int cpufreq_get_policy(struct cpufreq_policy *policy, unsigned int cpu);
 void refresh_frequency_limits(struct cpufreq_policy *policy);
 void cpufreq_update_policy(unsigned int cpu);
 void cpufreq_update_limits(unsigned int cpu);
+void cpufreq_update_highest_perf(unsigned int cpu);
 bool have_governor_per_policy(void);
 bool cpufreq_supports_freq_invariance(void);
 struct kobject *get_governor_parent_kobj(struct cpufreq_policy *policy);
@@ -263,6 +264,7 @@ static inline bool cpufreq_supports_freq_invariance(void)
 	return false;
 }
 static inline void disable_cpufreq(void) { }
+static inline void cpufreq_update_highest_perf(unsigned int cpu) { }
 #endif
 #ifdef CONFIG_CPU_FREQ_STAT
@@ -380,6 +382,9 @@ struct cpufreq_driver {
 	/* Called to update policy limits on firmware notifications. */
 	void		(*update_limits)(unsigned int cpu);
+	/* Called to update highest performance on firmware notifications. */
+	void		(*update_highest_perf)(unsigned int cpu);
+
 	/* optional */
 	int		(*bios_limit)(int cpu, unsigned int *limit);
On Tue, Sep 05, 2023 at 09:51:13AM +0800, Meng Li wrote:
ACPI 6.5 section 8.4.6.1.1.1 specifies that Notify event 0x85 can be emitted to cause the OSPM to re-evaluate the highest performance register. Add support for this event.

Signed-off-by: Meng Li li.meng@amd.com
Link: https://uefi.org/specs/ACPI/6.5/08_Processor_Configuration_and_Control.html?...
Does uefi.org guarantee this is a stable link?
On Tue, Sep 05, 2023 at 09:51:13AM +0800, Meng Li wrote:
ACPI 6.5 section 8.4.6.1.1.1 specifies that Notify event 0x85 can be emitted to cause the OSPM to re-evaluate the highest performance register. Add support for this event.

Signed-off-by: Meng Li li.meng@amd.com
Link: https://uefi.org/specs/ACPI/6.5/08_Processor_Configuration_and_Control.html?...

 drivers/acpi/processor_driver.c | 6 ++++++
 drivers/cpufreq/cpufreq.c       | 13 +++++++++++++
 include/linux/cpufreq.h         | 5 +++++
 3 files changed, 24 insertions(+)
diff --git a/drivers/acpi/processor_driver.c b/drivers/acpi/processor_driver.c
index 4bd16b3f0781..29b2fb68a35d 100644
--- a/drivers/acpi/processor_driver.c
+++ b/drivers/acpi/processor_driver.c
@@ -27,6 +27,7 @@
 #define ACPI_PROCESSOR_NOTIFY_PERFORMANCE 0x80
 #define ACPI_PROCESSOR_NOTIFY_POWER	0x81
 #define ACPI_PROCESSOR_NOTIFY_THROTTLING	0x82
+#define ACPI_PROCESSOR_NOTIFY_HIGEST_PERF_CHANGED	0x85

 MODULE_AUTHOR("Paul Diefenbaugh");
 MODULE_DESCRIPTION("ACPI Processor Driver");
@@ -83,6 +84,11 @@ static void acpi_processor_notify(acpi_handle handle, u32 event, void *data)
 		acpi_bus_generate_netlink_event(device->pnp.device_class,
 						dev_name(&device->dev), event, 0);
 		break;
+	case ACPI_PROCESSOR_NOTIFY_HIGEST_PERF_CHANGED:
+		cpufreq_update_highest_perf(pr->id);
+		acpi_bus_generate_netlink_event(device->pnp.device_class,
+						dev_name(&device->dev), event, 0);
+		break;
 	default:
 		acpi_handle_debug(handle, "Unsupported event [0x%x]\n", event);
 		break;
I've obviously not read the link, but the above seems to suggest that every CPU that has its limits changed gets the 'interrupt' ?
Hi Peter:
-----Original Message-----
From: Peter Zijlstra peterz@infradead.org
Sent: Friday, September 8, 2023 9:24 PM
To: Meng, Li (Jassmine) Li.Meng@amd.com
Cc: Rafael J . Wysocki rafael.j.wysocki@intel.com; Huang, Ray Ray.Huang@amd.com; linux-pm@vger.kernel.org; linux-kernel@vger.kernel.org; x86@kernel.org; linux-acpi@vger.kernel.org; Shuah Khan skhan@linuxfoundation.org; linux-kselftest@vger.kernel.org; Fontenot, Nathan Nathan.Fontenot@amd.com; Sharma, Deepak Deepak.Sharma@amd.com; Deucher, Alexander Alexander.Deucher@amd.com; Limonciello, Mario Mario.Limonciello@amd.com; Huang, Shimmer Shimmer.Huang@amd.com; Yuan, Perry Perry.Yuan@amd.com; Du, Xiaojian Xiaojian.Du@amd.com; Viresh Kumar viresh.kumar@linaro.org; Borislav Petkov bp@alien8.de
Subject: Re: [PATCH V5 4/7] cpufreq: Add a notification message that the highest perf has changed
On Tue, Sep 05, 2023 at 09:51:13AM +0800, Meng Li wrote:
ACPI 6.5 section 8.4.6.1.1.1 specifies that Notify event 0x85 can be emitted to cause the OSPM to re-evaluate the highest performance register. Add support for this event.

Signed-off-by: Meng Li li.meng@amd.com
Link: https://uefi.org/specs/ACPI/6.5/08_Processor_Configuration_and_Control.html?highlight=cppc#cpc-continuous-performance-control

 drivers/acpi/processor_driver.c | 6 ++++++
 drivers/cpufreq/cpufreq.c       | 13 +++++++++++++
 include/linux/cpufreq.h         | 5 +++++
 3 files changed, 24 insertions(+)
diff --git a/drivers/acpi/processor_driver.c b/drivers/acpi/processor_driver.c
index 4bd16b3f0781..29b2fb68a35d 100644
--- a/drivers/acpi/processor_driver.c
+++ b/drivers/acpi/processor_driver.c
@@ -27,6 +27,7 @@
 #define ACPI_PROCESSOR_NOTIFY_PERFORMANCE 0x80
 #define ACPI_PROCESSOR_NOTIFY_POWER	0x81
 #define ACPI_PROCESSOR_NOTIFY_THROTTLING	0x82
+#define ACPI_PROCESSOR_NOTIFY_HIGEST_PERF_CHANGED	0x85

 MODULE_AUTHOR("Paul Diefenbaugh");
 MODULE_DESCRIPTION("ACPI Processor Driver");
@@ -83,6 +84,11 @@ static void acpi_processor_notify(acpi_handle handle, u32 event, void *data)
 		acpi_bus_generate_netlink_event(device->pnp.device_class,
 						dev_name(&device->dev), event, 0);
 		break;
+	case ACPI_PROCESSOR_NOTIFY_HIGEST_PERF_CHANGED:
+		cpufreq_update_highest_perf(pr->id);
+		acpi_bus_generate_netlink_event(device->pnp.device_class,
+						dev_name(&device->dev), event, 0);
+		break;
 	default:
 		acpi_handle_debug(handle, "Unsupported event [0x%x]\n", event);
 		break;
I've obviously not read the link, but the above seems to suggest that every CPU that has its limits changed gets the 'interrupt' ?
[Meng, Li (Jassmine)] Yes. I will modify the link to https://uefi.org/specs/ACPI/6.5/05_ACPI_Software_Programming_Model.html#proc...
Preferred core rankings can be changed dynamically by the platform based on the workload and platform conditions, accounting for thermals and aging. When this occurs, the CPU priority needs to be set.
Signed-off-by: Meng Li li.meng@amd.com
Reviewed-by: Wyes Karny wyes.karny@amd.com
---
 drivers/cpufreq/amd-pstate.c | 36 ++++++++++++++++++++++++++++++++++--
 include/linux/amd-pstate.h   | 11 +++++++++++
 2 files changed, 45 insertions(+), 2 deletions(-)
diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
index 454eb6e789e7..8c19e1d50d29 100644
--- a/drivers/cpufreq/amd-pstate.c
+++ b/drivers/cpufreq/amd-pstate.c
@@ -318,6 +318,7 @@ static int pstate_init_perf(struct amd_cpudata *cpudata)
 	WRITE_ONCE(cpudata->nominal_perf, AMD_CPPC_NOMINAL_PERF(cap1));
 	WRITE_ONCE(cpudata->lowest_nonlinear_perf, AMD_CPPC_LOWNONLIN_PERF(cap1));
 	WRITE_ONCE(cpudata->lowest_perf, AMD_CPPC_LOWEST_PERF(cap1));
+	WRITE_ONCE(cpudata->cppc_highest_perf, AMD_CPPC_HIGHEST_PERF(cap1));

 	return 0;
 }
@@ -339,6 +340,7 @@ static int cppc_init_perf(struct amd_cpudata *cpudata)
 	WRITE_ONCE(cpudata->lowest_nonlinear_perf,
 		   cppc_perf.lowest_nonlinear_perf);
 	WRITE_ONCE(cpudata->lowest_perf, cppc_perf.lowest_perf);
+	WRITE_ONCE(cpudata->cppc_highest_perf, cppc_perf.highest_perf);

 	if (cppc_state == AMD_PSTATE_ACTIVE)
 		return 0;
@@ -545,7 +547,7 @@ static void amd_pstate_adjust_perf(unsigned int cpu,
 	if (target_perf < capacity)
 		des_perf = DIV_ROUND_UP(cap_perf * target_perf, capacity);

-	min_perf = READ_ONCE(cpudata->highest_perf);
+	min_perf = READ_ONCE(cpudata->lowest_perf);
 	if (_min_perf < capacity)
 		min_perf = DIV_ROUND_UP(cap_perf * _min_perf, capacity);

@@ -748,6 +750,34 @@ static void amd_pstate_init_prefcore(void)
 	schedule_work(&sched_prefcore_work);
 }

+static void amd_pstate_update_highest_perf(unsigned int cpu)
+{
+	struct cpufreq_policy *policy;
+	struct amd_cpudata *cpudata;
+	u32 prev_high = 0, cur_high = 0;
+	u64 highest_perf;
+	int ret;
+
+	if (!prefcore)
+		return;
+
+	ret = amd_pstate_get_highest_perf(cpu, &highest_perf);
+	if (ret)
+		return;
+
+	policy = cpufreq_cpu_get(cpu);
+	cpudata = policy->driver_data;
+	cur_high = highest_perf;
+	prev_high = READ_ONCE(cpudata->cppc_highest_perf);
+
+	if (prev_high != cur_high) {
+		WRITE_ONCE(cpudata->cppc_highest_perf, cur_high);
+		sched_set_itmt_core_prio(cur_high, cpu);
+	}
+
+	cpufreq_cpu_put(policy);
+}
+
 static int amd_pstate_cpu_init(struct cpufreq_policy *policy)
 {
 	int min_freq, max_freq, nominal_freq, lowest_nonlinear_freq, ret;
@@ -912,7 +942,7 @@ static ssize_t show_amd_pstate_highest_perf(struct cpufreq_policy *policy,
 	u32 perf;
 	struct amd_cpudata *cpudata = policy->driver_data;

-	perf = READ_ONCE(cpudata->highest_perf);
+	perf = READ_ONCE(cpudata->cppc_highest_perf);

 	return sysfs_emit(buf, "%u\n", perf);
 }
@@ -1479,6 +1509,7 @@ static struct cpufreq_driver amd_pstate_driver = {
 	.suspend	= amd_pstate_cpu_suspend,
 	.resume		= amd_pstate_cpu_resume,
 	.set_boost	= amd_pstate_set_boost,
+	.update_highest_perf = amd_pstate_update_highest_perf,
 	.name		= "amd-pstate",
 	.attr		= amd_pstate_attr,
 };
@@ -1493,6 +1524,7 @@ static struct cpufreq_driver amd_pstate_epp_driver = {
 	.online		= amd_pstate_epp_cpu_online,
 	.suspend	= amd_pstate_epp_suspend,
 	.resume		= amd_pstate_epp_resume,
+	.update_highest_perf = amd_pstate_update_highest_perf,
 	.name		= "amd-pstate-epp",
 	.attr		= amd_pstate_epp_attr,
 };
diff --git a/include/linux/amd-pstate.h b/include/linux/amd-pstate.h
index 446394f84606..2159fd5693fe 100644
--- a/include/linux/amd-pstate.h
+++ b/include/linux/amd-pstate.h
@@ -31,6 +31,11 @@ struct amd_aperf_mperf {
 	u64 mperf;
 	u64 tsc;
 };
+/* For platforms that do not support the preferred core feature, the
+ * highest_perf may be configured with 166 or 255, to avoid the max frequency
+ * being calculated wrongly. We take the AMD_CPPC_HIGHEST_PERF(cap1) value as
+ * the default max perf.
+ */

 /**
  * struct amd_cpudata - private CPU data for AMD P-State
@@ -39,11 +44,16 @@ struct amd_aperf_mperf {
  * @cppc_req_cached: cached performance request hints
  * @highest_perf: the maximum performance an individual processor may reach,
  *		  assuming ideal conditions
  *		  For platforms that do not support the preferred core feature, the
  *		  highest_perf may be configured with 166 or 255, to avoid the max
  *		  frequency being calculated wrongly. We take the fixed value as the
  *		  highest_perf.
  * @nominal_perf: the maximum sustained performance level of the processor,
  *		  assuming ideal operating conditions
  * @lowest_nonlinear_perf: the lowest performance level at which nonlinear power
  *			   savings are achieved
  * @lowest_perf: the absolute lowest performance level of the processor
+ * @cppc_highest_perf: the maximum performance an individual processor may reach,
+ *		       assuming ideal conditions
  * @max_freq: the frequency that mapped to highest_perf
  * @min_freq: the frequency that mapped to lowest_perf
  * @nominal_freq: the frequency that mapped to nominal_perf
@@ -70,6 +80,7 @@ struct amd_cpudata {
 	u32 nominal_perf;
 	u32 lowest_nonlinear_perf;
 	u32 lowest_perf;
+	u32 cppc_highest_perf;

 	u32 max_freq;
 	u32 min_freq;
On Tue, Sep 05, 2023 at 09:51:14AM +0800, Meng, Li (Jassmine) wrote:
Preferred core rankings can be changed dynamically by the platform based on the workload and platform conditions, accounting for thermals and aging. When this occurs, the CPU priority needs to be set.
Signed-off-by: Meng Li li.meng@amd.com
Reviewed-by: Wyes Karny wyes.karny@amd.com

 drivers/cpufreq/amd-pstate.c | 36 ++++++++++++++++++++++++++++++++++--
 include/linux/amd-pstate.h   | 11 +++++++++++
 2 files changed, 45 insertions(+), 2 deletions(-)
diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
index 454eb6e789e7..8c19e1d50d29 100644
--- a/drivers/cpufreq/amd-pstate.c
+++ b/drivers/cpufreq/amd-pstate.c
@@ -318,6 +318,7 @@ static int pstate_init_perf(struct amd_cpudata *cpudata)
 	WRITE_ONCE(cpudata->nominal_perf, AMD_CPPC_NOMINAL_PERF(cap1));
 	WRITE_ONCE(cpudata->lowest_nonlinear_perf, AMD_CPPC_LOWNONLIN_PERF(cap1));
 	WRITE_ONCE(cpudata->lowest_perf, AMD_CPPC_LOWEST_PERF(cap1));
+	WRITE_ONCE(cpudata->cppc_highest_perf, AMD_CPPC_HIGHEST_PERF(cap1));

 	return 0;
 }
@@ -339,6 +340,7 @@ static int cppc_init_perf(struct amd_cpudata *cpudata)
 	WRITE_ONCE(cpudata->lowest_nonlinear_perf,
 		   cppc_perf.lowest_nonlinear_perf);
 	WRITE_ONCE(cpudata->lowest_perf, cppc_perf.lowest_perf);
+	WRITE_ONCE(cpudata->cppc_highest_perf, cppc_perf.highest_perf);

 	if (cppc_state == AMD_PSTATE_ACTIVE)
 		return 0;
@@ -545,7 +547,7 @@ static void amd_pstate_adjust_perf(unsigned int cpu,
 	if (target_perf < capacity)
 		des_perf = DIV_ROUND_UP(cap_perf * target_perf, capacity);

-	min_perf = READ_ONCE(cpudata->highest_perf);
+	min_perf = READ_ONCE(cpudata->lowest_perf);
 	if (_min_perf < capacity)
 		min_perf = DIV_ROUND_UP(cap_perf * _min_perf, capacity);

@@ -748,6 +750,34 @@ static void amd_pstate_init_prefcore(void)
 	schedule_work(&sched_prefcore_work);
 }

+static void amd_pstate_update_highest_perf(unsigned int cpu)
+{
+	struct cpufreq_policy *policy;
+	struct amd_cpudata *cpudata;
+	u32 prev_high = 0, cur_high = 0;
+	u64 highest_perf;
+	int ret;
+
+	if (!prefcore)
+		return;
+
+	ret = amd_pstate_get_highest_perf(cpu, &highest_perf);
+	if (ret)
+		return;
+
+	policy = cpufreq_cpu_get(cpu);
+	cpudata = policy->driver_data;
+	cur_high = highest_perf;
+	prev_high = READ_ONCE(cpudata->cppc_highest_perf);
+
+	if (prev_high != cur_high) {
+		WRITE_ONCE(cpudata->cppc_highest_perf, cur_high);
+		sched_set_itmt_core_prio(cur_high, cpu);
+	}
+
+	cpufreq_cpu_put(policy);
+}
+
 static int amd_pstate_cpu_init(struct cpufreq_policy *policy)
 {
 	int min_freq, max_freq, nominal_freq, lowest_nonlinear_freq, ret;
@@ -912,7 +942,7 @@ static ssize_t show_amd_pstate_highest_perf(struct cpufreq_policy *policy,
 	u32 perf;
 	struct amd_cpudata *cpudata = policy->driver_data;

-	perf = READ_ONCE(cpudata->highest_perf);
+	perf = READ_ONCE(cpudata->cppc_highest_perf);

 	return sysfs_emit(buf, "%u\n", perf);
 }
@@ -1479,6 +1509,7 @@ static struct cpufreq_driver amd_pstate_driver = {
 	.suspend	= amd_pstate_cpu_suspend,
 	.resume		= amd_pstate_cpu_resume,
 	.set_boost	= amd_pstate_set_boost,
+	.update_highest_perf = amd_pstate_update_highest_perf,
 	.name		= "amd-pstate",
 	.attr		= amd_pstate_attr,
 };
@@ -1493,6 +1524,7 @@ static struct cpufreq_driver amd_pstate_epp_driver = {
 	.online		= amd_pstate_epp_cpu_online,
 	.suspend	= amd_pstate_epp_suspend,
 	.resume		= amd_pstate_epp_resume,
+	.update_highest_perf = amd_pstate_update_highest_perf,
 	.name		= "amd-pstate-epp",
 	.attr		= amd_pstate_epp_attr,
 };
diff --git a/include/linux/amd-pstate.h b/include/linux/amd-pstate.h
index 446394f84606..2159fd5693fe 100644
--- a/include/linux/amd-pstate.h
+++ b/include/linux/amd-pstate.h
@@ -31,6 +31,11 @@ struct amd_aperf_mperf {
 	u64 mperf;
 	u64 tsc;
 };
+/* For platforms that do not support the preferred core feature, the
+ * highest_perf may be configured with 166 or 255, to avoid the max frequency
+ * being calculated wrongly. We take the AMD_CPPC_HIGHEST_PERF(cap1) value as
+ * the default max perf.
+ */
This seems to be a duplicate comment?
Thanks, Ray
 /**
  * struct amd_cpudata - private CPU data for AMD P-State
@@ -39,11 +44,16 @@ struct amd_aperf_mperf {
  * @cppc_req_cached: cached performance request hints
  * @highest_perf: the maximum performance an individual processor may reach,
  *		  assuming ideal conditions
  *		  For platforms that do not support the preferred core feature, the
  *		  highest_perf may be configured with 166 or 255, to avoid the max
  *		  frequency being calculated wrongly. We take the fixed value as the
  *		  highest_perf.
  * @nominal_perf: the maximum sustained performance level of the processor,
  *		  assuming ideal operating conditions
  * @lowest_nonlinear_perf: the lowest performance level at which nonlinear power
  *			   savings are achieved
  * @lowest_perf: the absolute lowest performance level of the processor
+ * @cppc_highest_perf: the maximum performance an individual processor may reach,
+ *		       assuming ideal conditions
  * @max_freq: the frequency that mapped to highest_perf
  * @min_freq: the frequency that mapped to lowest_perf
  * @nominal_freq: the frequency that mapped to nominal_perf
@@ -70,6 +80,7 @@ struct amd_cpudata {
 	u32 nominal_perf;
 	u32 lowest_nonlinear_perf;
 	u32 lowest_perf;
+	u32 cppc_highest_perf;

 	u32 max_freq;
 	u32 min_freq;
--
2.34.1
Hi Meng Li,
On 05 Sep 09:51, Meng Li wrote:
Preferred core rankings can be changed dynamically by the platform based on the workload and platform conditions, accounting for thermals and aging. When this occurs, the CPU priority needs to be set.
Signed-off-by: Meng Li li.meng@amd.com
Reviewed-by: Wyes Karny wyes.karny@amd.com

 drivers/cpufreq/amd-pstate.c | 36 ++++++++++++++++++++++++++++++++++--
 include/linux/amd-pstate.h   | 11 +++++++++++
 2 files changed, 45 insertions(+), 2 deletions(-)
diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
index 454eb6e789e7..8c19e1d50d29 100644
--- a/drivers/cpufreq/amd-pstate.c
+++ b/drivers/cpufreq/amd-pstate.c
@@ -318,6 +318,7 @@ static int pstate_init_perf(struct amd_cpudata *cpudata)
 	WRITE_ONCE(cpudata->nominal_perf, AMD_CPPC_NOMINAL_PERF(cap1));
 	WRITE_ONCE(cpudata->lowest_nonlinear_perf, AMD_CPPC_LOWNONLIN_PERF(cap1));
 	WRITE_ONCE(cpudata->lowest_perf, AMD_CPPC_LOWEST_PERF(cap1));
+	WRITE_ONCE(cpudata->cppc_highest_perf, AMD_CPPC_HIGHEST_PERF(cap1));
Is there any reason to change this variable name from `prefcore_highest_perf` (in v3) to `cppc_highest_perf`? I feel `cppc_highest_perf` is a bit confusing, as there is already a `highest_perf` variable present. How about something like `prefcore_ranking` for the variable name?
Thanks, Wyes
 	return 0;
 }
@@ -339,6 +340,7 @@ static int cppc_init_perf(struct amd_cpudata *cpudata)
 	WRITE_ONCE(cpudata->lowest_nonlinear_perf,
 		   cppc_perf.lowest_nonlinear_perf);
 	WRITE_ONCE(cpudata->lowest_perf, cppc_perf.lowest_perf);
+	WRITE_ONCE(cpudata->cppc_highest_perf, cppc_perf.highest_perf);

 	if (cppc_state == AMD_PSTATE_ACTIVE)
 		return 0;
@@ -545,7 +547,7 @@ static void amd_pstate_adjust_perf(unsigned int cpu,
 	if (target_perf < capacity)
 		des_perf = DIV_ROUND_UP(cap_perf * target_perf, capacity);

-	min_perf = READ_ONCE(cpudata->highest_perf);
+	min_perf = READ_ONCE(cpudata->lowest_perf);
 	if (_min_perf < capacity)
 		min_perf = DIV_ROUND_UP(cap_perf * _min_perf, capacity);

@@ -748,6 +750,34 @@ static void amd_pstate_init_prefcore(void)
 	schedule_work(&sched_prefcore_work);
 }

+static void amd_pstate_update_highest_perf(unsigned int cpu)
+{
+	struct cpufreq_policy *policy;
+	struct amd_cpudata *cpudata;
+	u32 prev_high = 0, cur_high = 0;
+	u64 highest_perf;
+	int ret;
+
+	if (!prefcore)
+		return;
+
+	ret = amd_pstate_get_highest_perf(cpu, &highest_perf);
+	if (ret)
+		return;
+
+	policy = cpufreq_cpu_get(cpu);
+	cpudata = policy->driver_data;
+	cur_high = highest_perf;
+	prev_high = READ_ONCE(cpudata->cppc_highest_perf);
+
+	if (prev_high != cur_high) {
+		WRITE_ONCE(cpudata->cppc_highest_perf, cur_high);
+		sched_set_itmt_core_prio(cur_high, cpu);
+	}
+
+	cpufreq_cpu_put(policy);
+}
+
 static int amd_pstate_cpu_init(struct cpufreq_policy *policy)
 {
 	int min_freq, max_freq, nominal_freq, lowest_nonlinear_freq, ret;
@@ -912,7 +942,7 @@ static ssize_t show_amd_pstate_highest_perf(struct cpufreq_policy *policy,
 	u32 perf;
 	struct amd_cpudata *cpudata = policy->driver_data;

-	perf = READ_ONCE(cpudata->highest_perf);
+	perf = READ_ONCE(cpudata->cppc_highest_perf);

 	return sysfs_emit(buf, "%u\n", perf);
 }
@@ -1479,6 +1509,7 @@ static struct cpufreq_driver amd_pstate_driver = {
 	.suspend	= amd_pstate_cpu_suspend,
 	.resume		= amd_pstate_cpu_resume,
 	.set_boost	= amd_pstate_set_boost,
+	.update_highest_perf = amd_pstate_update_highest_perf,
 	.name		= "amd-pstate",
 	.attr		= amd_pstate_attr,
 };
@@ -1493,6 +1524,7 @@ static struct cpufreq_driver amd_pstate_epp_driver = {
 	.online		= amd_pstate_epp_cpu_online,
 	.suspend	= amd_pstate_epp_suspend,
 	.resume		= amd_pstate_epp_resume,
+	.update_highest_perf = amd_pstate_update_highest_perf,
 	.name		= "amd-pstate-epp",
 	.attr		= amd_pstate_epp_attr,
 };
diff --git a/include/linux/amd-pstate.h b/include/linux/amd-pstate.h
index 446394f84606..2159fd5693fe 100644
--- a/include/linux/amd-pstate.h
+++ b/include/linux/amd-pstate.h
@@ -31,6 +31,11 @@ struct amd_aperf_mperf {
 	u64 mperf;
 	u64 tsc;
 };
+/* For platforms that do not support the preferred core feature, the
+ * highest_perf may be configured with 166 or 255, to avoid the max frequency
+ * being calculated wrongly. We take the AMD_CPPC_HIGHEST_PERF(cap1) value as
+ * the default max perf.
+ */

 /**
  * struct amd_cpudata - private CPU data for AMD P-State
@@ -39,11 +44,16 @@ struct amd_aperf_mperf {
  * @cppc_req_cached: cached performance request hints
  * @highest_perf: the maximum performance an individual processor may reach,
  *		  assuming ideal conditions
  *		  For platforms that do not support the preferred core feature, the
  *		  highest_perf may be configured with 166 or 255, to avoid the max
  *		  frequency being calculated wrongly. We take the fixed value as the
  *		  highest_perf.
  * @nominal_perf: the maximum sustained performance level of the processor,
  *		  assuming ideal operating conditions
  * @lowest_nonlinear_perf: the lowest performance level at which nonlinear power
  *			   savings are achieved
  * @lowest_perf: the absolute lowest performance level of the processor
+ * @cppc_highest_perf: the maximum performance an individual processor may reach,
+ *		       assuming ideal conditions
  * @max_freq: the frequency that mapped to highest_perf
  * @min_freq: the frequency that mapped to lowest_perf
  * @nominal_freq: the frequency that mapped to nominal_perf
@@ -70,6 +80,7 @@ struct amd_cpudata {
 	u32 nominal_perf;
 	u32 lowest_nonlinear_perf;
 	u32 lowest_perf;
+	u32 cppc_highest_perf;

 	u32 max_freq;
 	u32 min_freq;
--
2.34.1
On Tue, Sep 05, 2023 at 09:51:14AM +0800, Meng Li wrote:
+static void amd_pstate_update_highest_perf(unsigned int cpu)
+{
+	struct cpufreq_policy *policy;
+	struct amd_cpudata *cpudata;
+	u32 prev_high = 0, cur_high = 0;
+	u64 highest_perf;
+	int ret;
+
+	if (!prefcore)
+		return;
+
+	ret = amd_pstate_get_highest_perf(cpu, &highest_perf);
+	if (ret)
+		return;
+
+	policy = cpufreq_cpu_get(cpu);
+	cpudata = policy->driver_data;
+	cur_high = highest_perf;
+	prev_high = READ_ONCE(cpudata->cppc_highest_perf);
+
+	if (prev_high != cur_high) {
+		WRITE_ONCE(cpudata->cppc_highest_perf, cur_high);
+		sched_set_itmt_core_prio(cur_high, cpu);
I just noticed, your cur_high is explicitly 'u32', but sched_set_itmt_core_prio() and the rest of the scheduler use 'int' (aka s32). If you somehow get that top bit set things might not work out as expected.
Please double check.
+	}
+
+	cpufreq_cpu_put(policy);
+}
On Tue, Sep 05, 2023 at 09:51:14AM +0800, Meng Li wrote:
diff --git a/include/linux/amd-pstate.h b/include/linux/amd-pstate.h
index 446394f84606..2159fd5693fe 100644
--- a/include/linux/amd-pstate.h
+++ b/include/linux/amd-pstate.h
@@ -31,6 +31,11 @@ struct amd_aperf_mperf {
 	u64 mperf;
 	u64 tsc;
 };
+/* For platforms that do not support the preferred core feature, the
+ * highest_perf may be configured with 166 or 255, to avoid the max frequency
+ * being calculated wrongly. We take the AMD_CPPC_HIGHEST_PERF(cap1) value as
+ * the default max perf.
+ */
Invalid comment style, also seems randomly (mis)placed.
Introduce amd-pstate preferred core.
check hardware preferred core state:
$ cat /sys/devices/system/cpu/amd-pstate/hw_prefcore

check preferred core state:
$ cat /sys/devices/system/cpu/amd-pstate/prefcore
Signed-off-by: Meng Li li.meng@amd.com
---
 Documentation/admin-guide/pm/amd-pstate.rst | 68 ++++++++++++++++++++-
 1 file changed, 67 insertions(+), 1 deletion(-)
diff --git a/Documentation/admin-guide/pm/amd-pstate.rst b/Documentation/admin-guide/pm/amd-pstate.rst
index 1cf40f69278c..204f5d49d47d 100644
--- a/Documentation/admin-guide/pm/amd-pstate.rst
+++ b/Documentation/admin-guide/pm/amd-pstate.rst
@@ -300,7 +300,7 @@ platforms. The AMD P-States mechanism is the more performance and energy
 efficiency frequency management method on AMD processors.

-AMD Pstate Driver Operation Modes
+``amd-pstate`` Driver Operation Modes
 =================================

 ``amd_pstate`` CPPC has 3 operation modes: autonomous (active) mode,
@@ -353,6 +353,48 @@ is activated. In this mode, driver requests minimum and maximum performance
 level and the platform autonomously selects a performance level in this range
 and appropriate to the current workload.

+``amd-pstate`` Preferred Core
+=================================
+
+The core frequency is subjected to the process variation in semiconductors.
+Not all cores are able to reach the maximum frequency respecting the
+infrastructure limits. Consequently, AMD has redefined the concept of
+maximum frequency of a part. This means that a fraction of cores can reach
+maximum frequency. To find the best process scheduling policy for a given
+scenario, OS needs to know the core ordering informed by the platform through
+highest performance capability register of the CPPC interface.
+
+``amd-pstate`` preferred core enables the scheduler to prefer scheduling on
+cores that can achieve a higher frequency with lower voltage. The preferred
+core rankings can dynamically change based on the workload, platform conditions,
+thermals and ageing.
+
+The priority metric will be initialized by the ``amd-pstate`` driver. The
+``amd-pstate`` driver will also determine whether or not ``amd-pstate``
+preferred core is supported by the platform.
+
+``amd-pstate`` driver will provide an initial core ordering when the system
+boots. The platform uses the CPPC interfaces to communicate the core ranking
+to the operating system and scheduler to make sure that OS is choosing the
+cores with highest performance firstly for scheduling the process. When
+``amd-pstate`` driver receives a message with the highest performance change,
+it will update the core ranking and set the cpu's priority.
+
+``amd-pstate`` Preferred Core Switch
+=================================
+Kernel Parameters
+-----------------
+
+``amd-pstate`` preferred core has two states: enable and disable.
+Enable/disable states can be chosen by different kernel parameters.
+``amd-pstate`` preferred core is enabled by default.
+
+``amd_prefcore=disable``
+
+For systems that support ``amd-pstate`` preferred core, the core rankings will
+always be advertised by the platform. But the OS can choose to ignore that via
+the kernel parameter ``amd_prefcore=disable``.
+
 User Space Interface in ``sysfs`` - General
 ===========================================

@@ -385,6 +427,30 @@ control its functionality at the system level. They are located in the
 	to the operation mode represented by that string - or to be
 	unregistered in the "disable" case.

+``hw_prefcore``
+	Preferred core state of hardware: "supported" or "unsupported".
+
+	"supported"
+		The processor and power firmware support preferred core.
+
+	"unsupported"
+		The processor and power firmware don't support preferred core.
+
+	This attribute is read-only to check the state of hardware preferred core.
+
+``prefcore``
+	Preferred core state of the driver: "enabled" or "disabled".
+
+	"enabled"
+		Enable the ``amd-pstate`` preferred core.
+
+	"disabled"
+		Disable the ``amd-pstate`` preferred core.
+
+	This attribute is read-only to check the state of preferred core.
+
 ``cpupower`` tool support for ``amd-pstate``
 ===============================================
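A quick way to exercise the two attributes documented above is a small probe loop; this is a sketch against the sysfs paths in the patch, which only exist on kernels carrying this series on supported hardware, so it falls back gracefully elsewhere:

```shell
# Probe the amd-pstate preferred-core sysfs attributes described above.
# On machines without the feature (or this patch series) the files are
# simply absent, so report "not available" instead of failing.
base=/sys/devices/system/cpu/amd-pstate
count=0
for attr in hw_prefcore prefcore; do
    if [ -r "$base/$attr" ]; then
        printf '%s: %s\n' "$attr" "$(cat "$base/$attr")"
    else
        printf '%s: not available\n' "$attr"
    fi
    count=$((count + 1))
done
```

Expected values are "supported"/"unsupported" for ``hw_prefcore`` and "enabled"/"disabled" for ``prefcore``, per the attribute descriptions in the patch.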
Hi Meng,
kernel test robot noticed the following build warnings:
[auto build test WARNING on rafael-pm/linux-next] [also build test WARNING on linus/master v6.5 next-20230906] [cannot apply to tip/x86/core] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Meng-Li/x86-Drop-CPU_SUP_INTE... base: https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next patch link: https://lore.kernel.org/r/20230905015116.2268926-7-li.meng%40amd.com patch subject: [PATCH V5 6/7] Documentation: amd-pstate: introduce amd-pstate preferred core reproduce: (https://download.01.org/0day-ci/archive/20230907/202309070502.YxzVpYTO-lkp@i...)
If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot lkp@intel.com | Closes: https://lore.kernel.org/oe-kbuild-all/202309070502.YxzVpYTO-lkp@intel.com/
All warnings (new ones prefixed by >>):
Documentation/admin-guide/pm/amd-pstate.rst:304: WARNING: Title underline too short.
vim +304 Documentation/admin-guide/pm/amd-pstate.rst
c22760885fd6f7 Huang Rui   2021-12-24  301
92e6088427c5da Perry Yuan  2023-01-31  302
3a7b575560efa6 Meng Li     2023-09-05  303  ``amd-pstate`` Driver Operation Modes
92e6088427c5da Perry Yuan  2023-01-31 @304  =================================
92e6088427c5da Perry Yuan  2023-01-31  305
The amd-pstate driver supports enabling/disabling the preferred core. It is enabled by default on platforms that support amd-pstate preferred core. Disable amd-pstate preferred core with "amd_prefcore=disable" added to the kernel command line.
Signed-off-by: Meng Li li.meng@amd.com
Reviewed-by: Mario Limonciello mario.limonciello@amd.com
Reviewed-by: Wyes Karny wyes.karny@amd.com
---
 Documentation/admin-guide/kernel-parameters.txt | 5 +++++
 1 file changed, 5 insertions(+)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 0c38a8af95ce..3145782b3c00 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -363,6 +363,11 @@
 			selects a performance level in this range and appropriate
 			to the current workload.

+	amd_prefcore=	[X86]
+			disable
+			  Disable amd-pstate preferred core.
+
 	amijoy.map=	[HW,JOY] Amiga joystick support
 			Map of devices attached to JOY0DAT and JOY1DAT
 			Format: <a>,<b>