Hi all:
The core frequency is subject to process variation in semiconductors. Not all cores can reach the maximum frequency while respecting the infrastructure limits. Consequently, AMD has redefined the concept of the maximum frequency of a part: only a fraction of the cores can reach it. To find the best process scheduling policy for a given scenario, the OS needs to know the core ordering, which the platform reports through the highest performance capability register of the CPPC interface.
Earlier implementations of AMD Pstate Preferred Core supported only a static core ranking and targeted performance. The platform can now change the preferred core dynamically, based on the workload and platform conditions, accounting for thermals and aging.
The AMD Pstate driver uses the functions and data structures provided by the ITMT architecture to let the scheduler favor cores that can reach a higher frequency at a lower voltage. We call this AMD Pstate Preferred Core.
Here sched_set_itmt_core_prio() is called to set priorities and sched_set_itmt_support() is called to enable the ITMT feature. The AMD Pstate driver uses the highest performance value as the priority of a CPU: a higher value means a higher priority.
The AMD Pstate driver provides an initial core ordering at boot time. It relies on the CPPC interface to communicate the core ranking to the operating system and scheduler, so that the OS schedules processes on the highest-performance cores first. When the AMD Pstate driver receives a notification that the highest performance has changed, it updates the core ranking.
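As a rough illustration of the ranking idea described above, here is a small userspace sketch. It is not kernel code: the scheduler does not literally sort an array, and `struct core_rank`, `rank_cores()` and `update_core_rank()` are hypothetical names used only for this example. It assumes nothing beyond what the cover letter states, namely that a higher highest-performance value means a higher priority and that the ranking is refreshed when a value changes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-core record; the kernel keeps this state elsewhere. */
struct core_rank {
	int cpu;			/* logical CPU number */
	unsigned int highest_perf;	/* CPPC highest performance value */
};

/* qsort() comparator: higher highest_perf first, lower CPU number on ties. */
static int cmp_core_rank(const void *a, const void *b)
{
	const struct core_rank *x = a;
	const struct core_rank *y = b;

	if (x->highest_perf != y->highest_perf)
		return x->highest_perf > y->highest_perf ? -1 : 1;
	return (x->cpu > y->cpu) - (x->cpu < y->cpu);
}

/* Order the cores so that the preferred ones come first. */
void rank_cores(struct core_rank *cores, size_t n)
{
	assert(cores != NULL || n == 0);
	if (n > 0)
		qsort(cores, n, sizeof(*cores), cmp_core_rank);
}

/* Apply a changed highest_perf value for one CPU, then re-rank. */
int update_core_rank(struct core_rank *cores, size_t n, int cpu,
		     unsigned int highest_perf)
{
	for (size_t i = 0; i < n; i++) {
		if (cores[i].cpu == cpu) {
			cores[i].highest_perf = highest_perf;
			rank_cores(cores, n);
			return 0;
		}
	}
	return -1;	/* unknown CPU */
}
```

Calling `rank_cores()` once models the initial ordering at boot, and `update_core_rank()` models the reaction to a "highest performance changed" notification.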
Meng Li (6):
  ACPI: CPPC: Add get the highest performance cppc control
  cpufreq: amd-pstate: Enable AMD Pstate Preferred Core Supporting.
  cpufreq: Add a notification message that the highest perf has changed
  cpufreq: amd-pstate: Update AMD Pstate Preferred Core ranking dynamically
  Documentation: amd-pstate: introduce AMD Pstate Preferred Core
  Documentation: introduce AMD Pstate Preferrd Core mode kernel command line options

 .../admin-guide/kernel-parameters.txt         |   5 +
 Documentation/admin-guide/pm/amd-pstate.rst   |  55 ++++++
 drivers/acpi/cppc_acpi.c                      |  13 ++
 drivers/acpi/processor_driver.c               |   6 +
 drivers/cpufreq/amd-pstate.c                  | 181 ++++++++++++++++--
 drivers/cpufreq/cpufreq.c                     |  13 ++
 include/acpi/cppc_acpi.h                      |   5 +
 include/linux/amd-pstate.h                    |   1 +
 include/linux/cpufreq.h                       |   4 +
 9 files changed, 267 insertions(+), 16 deletions(-)
Add support for getting the highest performance to the generic CPPC driver. This enables downstream drivers such as amd-pstate to discover and use these values.
Signed-off-by: Meng Li <li.meng@amd.com>
---
 drivers/acpi/cppc_acpi.c | 13 +++++++++++++
 include/acpi/cppc_acpi.h |  5 +++++
 2 files changed, 18 insertions(+)

diff --git a/drivers/acpi/cppc_acpi.c b/drivers/acpi/cppc_acpi.c
index 7ff269a78c20..ad388a0e8484 100644
--- a/drivers/acpi/cppc_acpi.c
+++ b/drivers/acpi/cppc_acpi.c
@@ -1154,6 +1154,19 @@ int cppc_get_nominal_perf(int cpunum, u64 *nominal_perf)
 	return cppc_get_perf(cpunum, NOMINAL_PERF, nominal_perf);
 }
 
+/**
+ * cppc_get_highest_perf - Get the highest performance register value.
+ * @cpunum: CPU from which to get highest performance.
+ * @highest_perf: Return address.
+ *
+ * Return: 0 for success, -EIO otherwise.
+ */
+int cppc_get_highest_perf(int cpunum, u64 *highest_perf)
+{
+	return cppc_get_perf(cpunum, HIGHEST_PERF, highest_perf);
+}
+EXPORT_SYMBOL_GPL(cppc_get_highest_perf);
+
 /**
  * cppc_get_epp_perf - Get the epp register value.
  * @cpunum: CPU from which to get epp preference value.
diff --git a/include/acpi/cppc_acpi.h b/include/acpi/cppc_acpi.h
index 6126c977ece0..c0b69ffe7bdb 100644
--- a/include/acpi/cppc_acpi.h
+++ b/include/acpi/cppc_acpi.h
@@ -139,6 +139,7 @@ struct cppc_cpudata {
 #ifdef CONFIG_ACPI_CPPC_LIB
 extern int cppc_get_desired_perf(int cpunum, u64 *desired_perf);
 extern int cppc_get_nominal_perf(int cpunum, u64 *nominal_perf);
+extern int cppc_get_highest_perf(int cpunum, u64 *highest_perf);
 extern int cppc_get_perf_ctrs(int cpu, struct cppc_perf_fb_ctrs *perf_fb_ctrs);
 extern int cppc_set_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls);
 extern int cppc_set_enable(int cpu, bool enable);
@@ -165,6 +166,10 @@ static inline int cppc_get_nominal_perf(int cpunum, u64 *nominal_perf)
 {
 	return -ENOTSUPP;
 }
+static inline int cppc_get_highest_perf(int cpunum, u64 *highest_perf)
+{
+	return -ENOTSUPP;
+}
 static inline int cppc_get_perf_ctrs(int cpu, struct cppc_perf_fb_ctrs *perf_fb_ctrs)
 {
 	return -ENOTSUPP;
On 8/8/2023 03:09, Meng Li wrote:
> Add support for getting the highest performance to the generic CPPC
> driver. This enables downstream drivers such as amd-pstate to discover
> and use these values.

I suggest adding this to commit message:

Link: https://uefi.org/specs/ACPI/6.5/08_Processor_Configuration_and_Control.html?...

> Signed-off-by: Meng Li <li.meng@amd.com>

Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
The AMD Pstate driver uses the functions and data structures provided by the ITMT architecture to let the scheduler favor cores that can reach a higher frequency at a lower voltage. We call this AMD Pstate Preferred Core.
Here sched_set_itmt_core_prio() is called to set priorities and sched_set_itmt_support() is called to enable the ITMT feature. The AMD Pstate driver uses the highest performance value as the priority of a CPU: a higher value means a higher priority.
The initial core rankings are set up by AMD Pstate when the system boots.
Add a device attribute that reports the preferred core state.
Add a new early parameter, `amd_prefcore=enable`, that lets the user enable the preferred core when the processor and power firmware support the feature.
Signed-off-by: Meng Li <li.meng@amd.com>
Signed-off-by: Perry Yuan <Perry.Yuan@amd.com>
---
 drivers/cpufreq/amd-pstate.c | 149 +++++++++++++++++++++++++++++++----
 1 file changed, 133 insertions(+), 16 deletions(-)

diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
index 9a1e194d5cf8..e919b3f4ab18 100644
--- a/drivers/cpufreq/amd-pstate.c
+++ b/drivers/cpufreq/amd-pstate.c
@@ -37,6 +37,7 @@
 #include <linux/uaccess.h>
 #include <linux/static_call.h>
 #include <linux/amd-pstate.h>
+#include <linux/topology.h>
 
 #include <acpi/processor.h>
 #include <acpi/cppc_acpi.h>
@@ -49,6 +50,8 @@
 
 #define AMD_PSTATE_TRANSITION_LATENCY	20000
 #define AMD_PSTATE_TRANSITION_DELAY	1000
+#define AMD_PSTATE_PREFCORE_THRESHOLD	166
+#define AMD_PSTATE_MAX_CPPC_PERF	255
 
 /*
  * TODO: We need more time to fine tune processors with shared memory solution
@@ -65,6 +68,14 @@ static struct cpufreq_driver amd_pstate_epp_driver;
 static int cppc_state = AMD_PSTATE_UNDEFINED;
 static bool cppc_enabled;
 
+/*
+ * CPPC Preferred Core feature is supported by power firmware
+ */
+static bool prefcore_enabled = false;
+
+/* Disable AMD Pstate Preferred Core loading */
+static bool no_prefcore __read_mostly = true;
+
 /*
  * AMD Energy Preference Performance (EPP)
  * The EPP is used in the CCLK DPM controller to drive
@@ -290,23 +301,21 @@ static inline int amd_pstate_enable(bool enable)
 static int pstate_init_perf(struct amd_cpudata *cpudata)
 {
 	u64 cap1;
-	u32 highest_perf;
 
 	int ret = rdmsrl_safe_on_cpu(cpudata->cpu, MSR_AMD_CPPC_CAP1,
 				     &cap1);
 	if (ret)
 		return ret;
 
-	/*
-	 * TODO: Introduce AMD specific power feature.
-	 *
-	 * CPPC entry doesn't indicate the highest performance in some ASICs.
+	/* For platforms that do not support the preferred core feature, the
+	 * highest_pef may be configured with 166 or 255, to avoid max frequency
+	 * calculated wrongly. we take the AMD_CPPC_HIGHEST_PERF(cap1) value as
+	 * the default max perf.
 	 */
-	highest_perf = amd_get_highest_perf();
-	if (highest_perf > AMD_CPPC_HIGHEST_PERF(cap1))
-		highest_perf = AMD_CPPC_HIGHEST_PERF(cap1);
-
-	WRITE_ONCE(cpudata->highest_perf, highest_perf);
+	if (!prefcore_enabled)
+		WRITE_ONCE(cpudata->highest_perf, AMD_CPPC_HIGHEST_PERF(cap1));
+	else
+		WRITE_ONCE(cpudata->highest_perf, AMD_PSTATE_PREFCORE_THRESHOLD);
 
 	WRITE_ONCE(cpudata->nominal_perf, AMD_CPPC_NOMINAL_PERF(cap1));
 	WRITE_ONCE(cpudata->lowest_nonlinear_perf, AMD_CPPC_LOWNONLIN_PERF(cap1));
@@ -318,17 +327,15 @@ static int pstate_init_perf(struct amd_cpudata *cpudata)
 static int cppc_init_perf(struct amd_cpudata *cpudata)
 {
 	struct cppc_perf_caps cppc_perf;
-	u32 highest_perf;
 
 	int ret = cppc_get_perf_caps(cpudata->cpu, &cppc_perf);
 	if (ret)
 		return ret;
 
-	highest_perf = amd_get_highest_perf();
-	if (highest_perf > cppc_perf.highest_perf)
-		highest_perf = cppc_perf.highest_perf;
-
-	WRITE_ONCE(cpudata->highest_perf, highest_perf);
+	if (!prefcore_enabled)
+		WRITE_ONCE(cpudata->highest_perf, cppc_perf.highest_perf);
+	else
+		WRITE_ONCE(cpudata->highest_perf, AMD_PSTATE_PREFCORE_THRESHOLD);
 
 	WRITE_ONCE(cpudata->nominal_perf, cppc_perf.nominal_perf);
 	WRITE_ONCE(cpudata->lowest_nonlinear_perf,
@@ -676,6 +683,90 @@ static void amd_perf_ctl_reset(unsigned int cpu)
 	wrmsrl_on_cpu(cpu, MSR_AMD_PERF_CTL, 0);
 }
 
+/*
+ * Set AMD Pstate Preferred Core enable can't be done directly from cpufreq callbacks
+ * due to locking, so queue the work for later.
+ */
+static void amd_pstste_sched_prefcore_workfn(struct work_struct *work)
+{
+	sched_set_itmt_support();
+}
+static DECLARE_WORK(sched_prefcore_work, amd_pstste_sched_prefcore_workfn);
+
+/**
+ * Get the highest performance register value.
+ * @cpu: CPU from which to get highest performance.
+ * @highest_perf: Return address.
+ *
+ * Return: 0 for success, -EIO otherwise.
+ */
+static int amd_pstate_get_highest_perf(int cpu, u64 *highest_perf)
+{
+	int ret;
+
+	if (boot_cpu_has(X86_FEATURE_CPPC)) {
+		u64 cap1;
+
+		ret = rdmsrl_safe_on_cpu(cpu, MSR_AMD_CPPC_CAP1, &cap1);
+		if (ret)
+			return ret;
+		WRITE_ONCE(*highest_perf, AMD_CPPC_HIGHEST_PERF(cap1));
+	} else {
+		ret = cppc_get_highest_perf(cpu, highest_perf);
+	}
+
+	return (ret);
+}
+
+static void amd_pstate_init_prefcore(void)
+{
+	int cpu, ret;
+	u64 highest_perf;
+
+	if (no_prefcore)
+		return;
+
+	for_each_possible_cpu(cpu) {
+		ret = amd_pstate_get_highest_perf(cpu, &highest_perf);
+		if (ret)
+			break;
+
+		sched_set_itmt_core_prio(highest_perf, cpu);
+	}
+
+	/*
+	 * This code can be run during CPU online under the
+	 * CPU hotplug locks, so sched_set_amd_prefcore_support()
+	 * cannot be called from here. Queue up a work item
+	 * to invoke it.
+	 */
+	schedule_work(&sched_prefcore_work);
+}
+
+/*
+ * Check if AMD Pstate Preferred core feature is supported and enabled
+ * 1) no_prefcore is used to enable or disable AMD Pstate Preferred Core
+ * loading when user would like to enable or disable it. Without that,
+ * AMD Pstate Preferred Core will be disabled by default if the processor
+ * and power firmware can support preferred core feature.
+ * 2) prefcore_enabled is used to indicate whether CPPC preferred core is enabled.
+ */
+static void check_prefcore_supported(int cpu)
+{
+	u64 highest_perf;
+	int ret;
+
+	if (no_prefcore)
+		return;
+
+	ret = amd_pstate_get_highest_perf(cpu, &highest_perf);
+	if (ret)
+		return;
+
+	if(highest_perf < AMD_PSTATE_MAX_CPPC_PERF)
+		prefcore_enabled = true;
+}
+
 static int amd_pstate_cpu_init(struct cpufreq_policy *policy)
 {
 	int min_freq, max_freq, nominal_freq, lowest_nonlinear_freq, ret;
@@ -697,6 +788,9 @@ static int amd_pstate_cpu_init(struct cpufreq_policy *policy)
 
 	cpudata->cpu = policy->cpu;
 
+	/* check if CPPC preferred core feature is enabled*/
+	check_prefcore_supported(policy->cpu);
+
 	ret = amd_pstate_init_perf(cpudata);
 	if (ret)
 		goto free_cpudata1;
@@ -1037,6 +1131,12 @@ static ssize_t status_store(struct device *a, struct device_attribute *b,
 	return ret < 0 ? ret : count;
 }
 
+static ssize_t prefcore_state_show(struct device *dev,
+				   struct device_attribute *attr, char *buf)
+{
+	return sysfs_emit(buf, "%s\n", prefcore_enabled ? "enabled" : "disabled");
+}
+
 cpufreq_freq_attr_ro(amd_pstate_max_freq);
 cpufreq_freq_attr_ro(amd_pstate_lowest_nonlinear_freq);
 
@@ -1044,6 +1144,7 @@ cpufreq_freq_attr_ro(amd_pstate_highest_perf);
 cpufreq_freq_attr_rw(energy_performance_preference);
 cpufreq_freq_attr_ro(energy_performance_available_preferences);
 static DEVICE_ATTR_RW(status);
+static DEVICE_ATTR_RO(prefcore_state);
 
 static struct freq_attr *amd_pstate_attr[] = {
 	&amd_pstate_max_freq,
@@ -1063,6 +1164,7 @@ static struct freq_attr *amd_pstate_epp_attr[] = {
 
 static struct attribute *pstate_global_attributes[] = {
 	&dev_attr_status.attr,
+	&dev_attr_prefcore_state.attr,
 	NULL
 };
 
@@ -1114,6 +1216,9 @@ static int amd_pstate_epp_cpu_init(struct cpufreq_policy *policy)
 	cpudata->cpu = policy->cpu;
 	cpudata->epp_policy = 0;
 
+	/* check if CPPC preferred core feature is supported*/
+	check_prefcore_supported(policy->cpu);
+
 	ret = amd_pstate_init_perf(cpudata);
 	if (ret)
 		goto free_cpudata1;
@@ -1506,6 +1611,8 @@ static int __init amd_pstate_init(void)
 		}
 	}
 
+	amd_pstate_init_prefcore();
+
 	return ret;
 
 global_attr_free:
@@ -1527,7 +1634,17 @@ static int __init amd_pstate_param(char *str)
 
 	return amd_pstate_set_driver(mode_idx);
 }
+
+static int __init amd_prefcore_param(char *str)
+{
+	if (!strcmp(str, "enable"))
+		no_prefcore = false;
+
+	return 0;
+}
+
 early_param("amd_pstate", amd_pstate_param);
+early_param("amd_prefcore", amd_prefcore_param);
 
 MODULE_AUTHOR("Huang Rui <ray.huang@amd.com>");
 MODULE_DESCRIPTION("AMD Processor P-state Frequency Driver");
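The detection and capping logic in this patch can be condensed into two pure functions. The following is only a userspace sketch of how the patch reads, not the driver itself: `prefcore_detected()` and `effective_highest_perf()` are hypothetical names, and unlike the patch (where `prefcore_enabled` is a global that stays set once any CPU passes the check) the sketch evaluates a single reading.

```c
#include <assert.h>
#include <stdbool.h>

#define PREFCORE_THRESHOLD	166	/* AMD_PSTATE_PREFCORE_THRESHOLD in the patch */
#define MAX_CPPC_PERF		255	/* AMD_PSTATE_MAX_CPPC_PERF in the patch */

/*
 * Models check_prefcore_supported(): the feature counts as present only
 * when the user asked for it and the reported value is below the maximum.
 */
bool prefcore_detected(bool user_enabled, unsigned int reported_highest_perf)
{
	assert(reported_highest_perf <= MAX_CPPC_PERF);
	return user_enabled && reported_highest_perf < MAX_CPPC_PERF;
}

/*
 * Models the choice made in pstate_init_perf()/cppc_init_perf(): with
 * preferred core on, the stored highest_perf is the fixed threshold;
 * otherwise it is the value the platform reported.
 */
unsigned int effective_highest_perf(bool prefcore_on, unsigned int reported)
{
	return prefcore_on ? PREFCORE_THRESHOLD : reported;
}
```

Under these assumptions, a core reporting 231 with the feature requested is detected as preferred-core capable and has its stored highest_perf set to 166, while a core reporting 255 keeps the reported value.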
On Tue, Aug 08, 2023 at 04:09:57PM +0800, Meng Li wrote:
> +static int amd_pstate_get_highest_perf(int cpu, u64 *highest_perf)
> +{
> +	int ret;
> +
> +	if (boot_cpu_has(X86_FEATURE_CPPC)) {
> +		u64 cap1;
> +
> +		ret = rdmsrl_safe_on_cpu(cpu, MSR_AMD_CPPC_CAP1, &cap1);
> +		if (ret)
> +			return ret;
> +		WRITE_ONCE(*highest_perf, AMD_CPPC_HIGHEST_PERF(cap1));
> +	} else {
> +		ret = cppc_get_highest_perf(cpu, highest_perf);
> +	}
> +
> +	return (ret);
> +}
> +
> +static void amd_pstate_init_prefcore(void)
> +{
> +	int cpu, ret;
> +	u64 highest_perf;
> +
> +	if (no_prefcore)
> +		return;
> +
> +	for_each_possible_cpu(cpu) {
> +		ret = amd_pstate_get_highest_perf(cpu, &highest_perf);
> +		if (ret)
> +			break;

So what is the intended behaviour when online != possible ?

> +
> +		sched_set_itmt_core_prio(highest_perf, cpu);
> +	}
> +
> +	/*
> +	 * This code can be run during CPU online under the
> +	 * CPU hotplug locks, so sched_set_amd_prefcore_support()
> +	 * cannot be called from here. Queue up a work item
> +	 * to invoke it.
> +	 */
> +	schedule_work(&sched_prefcore_work);
> +}
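One way to read the question above is sketched below as a userspace model, not kernel code. It assumes (this is an assumption, not something stated in the thread) that the register read fails for a CPU that is possible but offline. `struct model_cpu`, `model_get_highest_perf()` and `model_init_prefcore()` are hypothetical names. Under that assumption, the `break` in the loop means every CPU after the first offline one is left without a priority.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NO_PRIO (-1)

/* Hypothetical model of one possible CPU. */
struct model_cpu {
	bool online;
	int highest_perf;	/* value the firmware would report */
	int prio;		/* priority handed to the scheduler, or NO_PRIO */
};

/* Stand-in for the register read; assumed to fail for offline CPUs. */
static int model_get_highest_perf(const struct model_cpu *c, int *perf)
{
	if (!c->online)
		return -1;
	*perf = c->highest_perf;
	return 0;
}

/*
 * Mirrors the loop shape in the patch: walk every possible CPU and stop
 * at the first failed read. Returns how many CPUs received a priority.
 */
size_t model_init_prefcore(struct model_cpu *cpus, size_t n)
{
	size_t done = 0;

	assert(cpus != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		cpus[i].prio = NO_PRIO;

	for (size_t i = 0; i < n; i++) {
		int perf;

		if (model_get_highest_perf(&cpus[i], &perf))
			break;
		cpus[i].prio = perf;
		done++;
	}
	return done;
}
```

With three possible CPUs of which the second is offline, only the first one ends up with a priority in this model, even though the third is online.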
On 8/8/2023 03:09, Meng Li wrote:
> The AMD Pstate driver uses the functions and data structures provided
> by the ITMT architecture to let the scheduler favor cores that can
> reach a higher frequency at a lower voltage. We call this AMD Pstate
> Preferred Core.
>
> Here sched_set_itmt_core_prio() is called to set priorities and
> sched_set_itmt_support() is called to enable the ITMT feature. The AMD
> Pstate driver uses the highest performance value as the priority of a
> CPU: a higher value means a higher priority.
>
> The initial core rankings are set up by AMD Pstate when the system
> boots.
>
> Add a device attribute that reports the preferred core state.
>
> Add a new early parameter, `amd_prefcore=enable`, that lets the user
> enable the preferred core when the processor and power firmware
> support the feature.
>
> Signed-off-by: Meng Li <li.meng@amd.com>
> Signed-off-by: Perry Yuan <Perry.Yuan@amd.com>
> ---
>  drivers/cpufreq/amd-pstate.c | 149 +++++++++++++++++++++++++++++++----
>  1 file changed, 133 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
> index 9a1e194d5cf8..e919b3f4ab18 100644
> --- a/drivers/cpufreq/amd-pstate.c
> +++ b/drivers/cpufreq/amd-pstate.c
> @@ -37,6 +37,7 @@
>  #include <linux/uaccess.h>
>  #include <linux/static_call.h>
>  #include <linux/amd-pstate.h>
> +#include <linux/topology.h>
>  
>  #include <acpi/processor.h>
>  #include <acpi/cppc_acpi.h>
> @@ -49,6 +50,8 @@
>  
>  #define AMD_PSTATE_TRANSITION_LATENCY	20000
>  #define AMD_PSTATE_TRANSITION_DELAY	1000
> +#define AMD_PSTATE_PREFCORE_THRESHOLD	166
> +#define AMD_PSTATE_MAX_CPPC_PERF	255
>  
>  /*
>   * TODO: We need more time to fine tune processors with shared memory solution
> @@ -65,6 +68,14 @@ static struct cpufreq_driver amd_pstate_epp_driver;
>  static int cppc_state = AMD_PSTATE_UNDEFINED;
>  static bool cppc_enabled;
>  
> +/*
> + * CPPC Preferred Core feature is supported by power firmware
> + */
> +static bool prefcore_enabled = false;
> +
> +/* Disable AMD Pstate Preferred Core loading */
> +static bool no_prefcore __read_mostly = true;

I feel like it's confusing to have two variables to keep track of here when it comes to the state machine and determining if you're enabled or not.
As an alternative can't you just use a single boolean? You can initialize it as:
static bool prefcore = true;
When you process early params, you set it to false if it's turned off on the command line.
If it is still enabled when you set up prefcore, you check whether the firmware supports it; if it doesn't, you set it to false.
Your sysfs value can always accurately reflect the contents of this variable then.
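The single-boolean approach suggested above could look roughly like the following userspace sketch. It is an illustration under stated assumptions, not the driver: the `"disable"` keyword, `prefcore_param()`, `prefcore_setup()` and `prefcore_state()` are hypothetical, and the firmware check is reduced to a boolean argument.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Single source of truth: on unless something turns it off. */
static bool prefcore = true;

/* Hypothetical early-param parser: "disable" switches the feature off. */
int prefcore_param(const char *str)
{
	assert(str != NULL);
	if (!strcmp(str, "disable"))
		prefcore = false;
	return 0;
}

/* Called at setup time with the result of the firmware capability check. */
void prefcore_setup(bool firmware_supported)
{
	if (prefcore && !firmware_supported)
		prefcore = false;
}

/* What a sysfs show callback would report: always the same variable. */
const char *prefcore_state(void)
{
	return prefcore ? "enabled" : "disabled";
}
```

Because both the command line and the firmware check can only clear the flag, the reported state cannot disagree with what the driver actually does.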
On 8/8/2023 03:09, Meng Li wrote:
> The AMD Pstate driver uses the functions and data structures provided
> by the ITMT architecture to let the scheduler favor cores that can
> reach a higher frequency at a lower voltage. We call this AMD Pstate
> Preferred Core.
>
> Here sched_set_itmt_core_prio() is called to set priorities and
> sched_set_itmt_support() is called to enable the ITMT feature.

By using this function you also need to ensure that CONFIG_SCHED_MC_PRIO has been set up in drivers/cpufreq/Kconfig.x86 when amd-pstate is used.

Also I think it's worth changing arch/x86/Kconfig to:
1) Drop the requirement for CPU_SUP_INTEL
2) select X86_AMD_PSTATE

> The AMD Pstate driver uses the highest performance value as the
> priority of a CPU: a higher value means a higher priority.
>
> The initial core rankings are set up by AMD Pstate when the system
> boots.
>
> Add a device attribute that reports the preferred core state.
>
> Add a new early parameter, `amd_prefcore=enable`, that lets the user
> enable the preferred core when the processor and power firmware
> support the feature.
>
> Signed-off-by: Meng Li <li.meng@amd.com>
> Signed-off-by: Perry Yuan <Perry.Yuan@amd.com>
drivers/cpufreq/amd-pstate.c | 149 +++++++++++++++++++++++++++++++---- 1 file changed, 133 insertions(+), 16 deletions(-)
diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c index 9a1e194d5cf8..e919b3f4ab18 100644 --- a/drivers/cpufreq/amd-pstate.c +++ b/drivers/cpufreq/amd-pstate.c @@ -37,6 +37,7 @@ #include <linux/uaccess.h> #include <linux/static_call.h> #include <linux/amd-pstate.h> +#include <linux/topology.h> #include <acpi/processor.h> #include <acpi/cppc_acpi.h> @@ -49,6 +50,8 @@ #define AMD_PSTATE_TRANSITION_LATENCY 20000 #define AMD_PSTATE_TRANSITION_DELAY 1000 +#define AMD_PSTATE_PREFCORE_THRESHOLD 166 +#define AMD_PSTATE_MAX_CPPC_PERF 255 /*
- TODO: We need more time to fine tune processors with shared memory solution
@@ -65,6 +68,14 @@ static struct cpufreq_driver amd_pstate_epp_driver; static int cppc_state = AMD_PSTATE_UNDEFINED; static bool cppc_enabled; +/*
- CPPC Preferred Core feature is supported by power firmware
- */
+static bool prefcore_enabled = false;
+/* Disable AMD Pstate Preferred Core loading */ +static bool no_prefcore __read_mostly = true;
- /*
- AMD Energy Preference Performance (EPP)
- The EPP is used in the CCLK DPM controller to drive
@@ -290,23 +301,21 @@ static inline int amd_pstate_enable(bool enable) static int pstate_init_perf(struct amd_cpudata *cpudata) { u64 cap1;
- u32 highest_perf;
int ret = rdmsrl_safe_on_cpu(cpudata->cpu, MSR_AMD_CPPC_CAP1, &cap1); if (ret) return ret;
- /*
* TODO: Introduce AMD specific power feature.
*
* CPPC entry doesn't indicate the highest performance in some ASICs.
- /* For platforms that do not support the preferred core feature, the
* highest_pef may be configured with 166 or 255, to avoid max frequency
* calculated wrongly. we take the AMD_CPPC_HIGHEST_PERF(cap1) value as
*/* the default max perf.
- highest_perf = amd_get_highest_perf();
- if (highest_perf > AMD_CPPC_HIGHEST_PERF(cap1))
highest_perf = AMD_CPPC_HIGHEST_PERF(cap1);
- WRITE_ONCE(cpudata->highest_perf, highest_perf);
- if (!prefcore_enabled)
WRITE_ONCE(cpudata->highest_perf, AMD_CPPC_HIGHEST_PERF(cap1));
- else
WRITE_ONCE(cpudata->highest_perf, AMD_PSTATE_PREFCORE_THRESHOLD);
WRITE_ONCE(cpudata->nominal_perf, AMD_CPPC_NOMINAL_PERF(cap1)); WRITE_ONCE(cpudata->lowest_nonlinear_perf, AMD_CPPC_LOWNONLIN_PERF(cap1)); @@ -318,17 +327,15 @@ static int pstate_init_perf(struct amd_cpudata *cpudata) static int cppc_init_perf(struct amd_cpudata *cpudata) { struct cppc_perf_caps cppc_perf;
- u32 highest_perf;
int ret = cppc_get_perf_caps(cpudata->cpu, &cppc_perf); if (ret) return ret;
- highest_perf = amd_get_highest_perf();
- if (highest_perf > cppc_perf.highest_perf)
highest_perf = cppc_perf.highest_perf;
- WRITE_ONCE(cpudata->highest_perf, highest_perf);
- if (!prefcore_enabled)
WRITE_ONCE(cpudata->highest_perf, cppc_perf.highest_perf);
- else
WRITE_ONCE(cpudata->highest_perf, AMD_PSTATE_PREFCORE_THRESHOLD);
WRITE_ONCE(cpudata->nominal_perf, cppc_perf.nominal_perf); WRITE_ONCE(cpudata->lowest_nonlinear_perf, @@ -676,6 +683,90 @@ static void amd_perf_ctl_reset(unsigned int cpu) wrmsrl_on_cpu(cpu, MSR_AMD_PERF_CTL, 0); } +/*
- Set AMD Pstate Preferred Core enable can't be done directly from cpufreq callbacks
- due to locking, so queue the work for later.
- */
+static void amd_pstste_sched_prefcore_workfn(struct work_struct *work) +{
- sched_set_itmt_support();
+} +static DECLARE_WORK(sched_prefcore_work, amd_pstste_sched_prefcore_workfn);
+/**
- Get the highest performance register value.
- @cpu: CPU from which to get highest performance.
- @highest_perf: Return address.
- Return: 0 for success, -EIO otherwise.
- */
+static int amd_pstate_get_highest_perf(int cpu, u64 *highest_perf) +{
int ret;
if (boot_cpu_has(X86_FEATURE_CPPC)) {
u64 cap1;
ret = rdmsrl_safe_on_cpu(cpu, MSR_AMD_CPPC_CAP1, &cap1);
if (ret)
return ret;
WRITE_ONCE(*highest_perf, AMD_CPPC_HIGHEST_PERF(cap1));
} else {
ret = cppc_get_highest_perf(cpu, highest_perf);
}
return (ret);
+}
+static void amd_pstate_init_prefcore(void) +{
- int cpu, ret;
- u64 highest_perf;
- if (no_prefcore)
return;
- for_each_possible_cpu(cpu) {
ret = amd_pstate_get_highest_perf(cpu, &highest_perf);
if (ret)
break;
sched_set_itmt_core_prio(highest_perf, cpu);
- }
- /*
* This code can be run during CPU online under the
* CPU hotplug locks, so sched_set_amd_prefcore_support()
* cannot be called from here. Queue up a work item
* to invoke it.
*/
- schedule_work(&sched_prefcore_work);
+}
+/*
- Check if AMD Pstate Preferred core feature is supported and enabled
- no_prefcore is used to enable or disable AMD Pstate Preferred Core
- loading when user would like to enable or disable it. Without that,
- AMD Pstate Preferred Core will be disabled by default if the processor
- and power firmware can support preferred core feature.
- prefcore_enabled is used to indicate whether CPPC preferred core is enabled.
- */
+static void check_prefcore_supported(int cpu) +{
- u64 highest_perf;
- int ret;
- if (no_prefcore)
return;
- ret = amd_pstate_get_highest_perf(cpu, &highest_perf);
- if (ret)
return;
- if(highest_perf < AMD_PSTATE_MAX_CPPC_PERF)
prefcore_enabled = true;
+}
 static int amd_pstate_cpu_init(struct cpufreq_policy *policy)
 {
 	int min_freq, max_freq, nominal_freq, lowest_nonlinear_freq, ret;
@@ -697,6 +788,9 @@ static int amd_pstate_cpu_init(struct cpufreq_policy *policy)
 
 	cpudata->cpu = policy->cpu;
 
+	/* check if CPPC preferred core feature is enabled*/
+	check_prefcore_supported(policy->cpu);
+
 	ret = amd_pstate_init_perf(cpudata);
 	if (ret)
 		goto free_cpudata1;
@@ -1037,6 +1131,12 @@ static ssize_t status_store(struct device *a, struct device_attribute *b,
 	return ret < 0 ? ret : count;
 }
 
+static ssize_t prefcore_state_show(struct device *dev,
+				   struct device_attribute *attr, char *buf)
+{
+	return sysfs_emit(buf, "%s\n", prefcore_enabled ? "enabled" : "disabled");
+}
+
 cpufreq_freq_attr_ro(amd_pstate_max_freq);
 cpufreq_freq_attr_ro(amd_pstate_lowest_nonlinear_freq);
 
@@ -1044,6 +1144,7 @@ cpufreq_freq_attr_ro(amd_pstate_highest_perf);
 cpufreq_freq_attr_rw(energy_performance_preference);
 cpufreq_freq_attr_ro(energy_performance_available_preferences);
 static DEVICE_ATTR_RW(status);
+static DEVICE_ATTR_RO(prefcore_state);
 
 static struct freq_attr *amd_pstate_attr[] = {
 	&amd_pstate_max_freq,
@@ -1063,6 +1164,7 @@ static struct freq_attr *amd_pstate_epp_attr[] = {
 
 static struct attribute *pstate_global_attributes[] = {
 	&dev_attr_status.attr,
+	&dev_attr_prefcore_state.attr,
 	NULL
 };
 
@@ -1114,6 +1216,9 @@ static int amd_pstate_epp_cpu_init(struct cpufreq_policy *policy)
 	cpudata->cpu = policy->cpu;
 	cpudata->epp_policy = 0;
 
+	/* check if CPPC preferred core feature is supported*/
+	check_prefcore_supported(policy->cpu);
+
 	ret = amd_pstate_init_perf(cpudata);
 	if (ret)
 		goto free_cpudata1;
@@ -1506,6 +1611,8 @@ static int __init amd_pstate_init(void)
 		}
 	}
 
+	amd_pstate_init_prefcore();
+
 	return ret;
 
 global_attr_free:
@@ -1527,7 +1634,17 @@ static int __init amd_pstate_param(char *str)
 
 	return amd_pstate_set_driver(mode_idx);
 }
+
+static int __init amd_prefcore_param(char *str)
+{
+	if (!strcmp(str, "enable"))
+		no_prefcore = false;
+
+	return 0;
+}
+
 early_param("amd_pstate", amd_pstate_param);
+early_param("amd_prefcore", amd_prefcore_param);
 
 MODULE_AUTHOR("Huang Rui ray.huang@amd.com");
 MODULE_DESCRIPTION("AMD Processor P-state Frequency Driver");

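For readers following the series: the cover letter states that the driver uses the highest performance value as the CPU priority, with a higher value meaning a higher priority. The userspace sketch below only illustrates that ordering rule. It is not kernel code; `struct core_rank`, `cmp_rank()` and `rank_cores()` are hypothetical names, and breaking ties by CPU id is an assumption made here, not something the patch specifies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical pairing of a CPU id with the highest_perf value reported for it. */
struct core_rank {
	int cpu;
	unsigned int highest_perf;
};

/* Comparator: larger highest_perf sorts first; ties go to the lower CPU id (assumption). */
static int cmp_rank(const void *a, const void *b)
{
	const struct core_rank *x = a;
	const struct core_rank *y = b;

	if (x->highest_perf != y->highest_perf)
		return x->highest_perf > y->highest_perf ? -1 : 1;
	return (x->cpu > y->cpu) - (x->cpu < y->cpu);
}

/* Sort the table in place so that index 0 holds the most preferred core. */
static void rank_cores(struct core_rank *tbl, size_t n)
{
	assert(tbl != NULL || n == 0);
	if (n > 1)
		qsort(tbl, n, sizeof(*tbl), cmp_rank);
}
```

In the kernel the ordering itself is left to the scheduler; the driver only reports a per-CPU priority through sched_set_itmt_core_prio().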
Please refer to the ACPI_Spec for details on the highest performance and notify events of CPPC.
Signed-off-by: Meng Li li.meng@amd.com
Link: https://uefi.org/htmlspecs/AddCPI_Spec_6_4_html/08_Processor_Configuration_a...
---
 drivers/acpi/processor_driver.c | 6 ++++++
 drivers/cpufreq/cpufreq.c | 13 +++++++++++++
 include/linux/cpufreq.h | 4 ++++
 3 files changed, 23 insertions(+)

diff --git a/drivers/acpi/processor_driver.c b/drivers/acpi/processor_driver.c
index 4bd16b3f0781..29b2fb68a35d 100644
--- a/drivers/acpi/processor_driver.c
+++ b/drivers/acpi/processor_driver.c
@@ -27,6 +27,7 @@
 #define ACPI_PROCESSOR_NOTIFY_PERFORMANCE 0x80
 #define ACPI_PROCESSOR_NOTIFY_POWER 0x81
 #define ACPI_PROCESSOR_NOTIFY_THROTTLING 0x82
+#define ACPI_PROCESSOR_NOTIFY_HIGEST_PERF_CHANGED 0x85
 
 MODULE_AUTHOR("Paul Diefenbaugh");
 MODULE_DESCRIPTION("ACPI Processor Driver");
@@ -83,6 +84,11 @@ static void acpi_processor_notify(acpi_handle handle, u32 event, void *data)
 		acpi_bus_generate_netlink_event(device->pnp.device_class,
 						  dev_name(&device->dev), event, 0);
 		break;
+	case ACPI_PROCESSOR_NOTIFY_HIGEST_PERF_CHANGED:
+		cpufreq_update_highest_perf(pr->id);
+		acpi_bus_generate_netlink_event(device->pnp.device_class,
+						  dev_name(&device->dev), event, 0);
+		break;
 	default:
 		acpi_handle_debug(handle, "Unsupported event [0x%x]\n", event);
 		break;
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 50bbc969ffe5..842357abfae6 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -2675,6 +2675,19 @@ void cpufreq_update_limits(unsigned int cpu)
 }
 EXPORT_SYMBOL_GPL(cpufreq_update_limits);
 
+/**
+ * cpufreq_update_highest_perf - Update highest performance for a given CPU.
+ * @cpu: CPU to update the highest performance for.
+ *
+ * Invoke the driver's ->update_highest_perf callback if present
+ */
+void cpufreq_update_highest_perf(unsigned int cpu)
+{
+	if (cpufreq_driver->update_highest_perf)
+		cpufreq_driver->update_highest_perf(cpu);
+}
+EXPORT_SYMBOL_GPL(cpufreq_update_highest_perf);
+
 /*********************************************************************
  * BOOST *
  *********************************************************************/
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index 172ff51c1b2a..766c83a4fae7 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -231,6 +231,7 @@ int cpufreq_get_policy(struct cpufreq_policy *policy, unsigned int cpu);
 void refresh_frequency_limits(struct cpufreq_policy *policy);
 void cpufreq_update_policy(unsigned int cpu);
 void cpufreq_update_limits(unsigned int cpu);
+void cpufreq_update_highest_perf(unsigned int cpu);
 bool have_governor_per_policy(void);
 bool cpufreq_supports_freq_invariance(void);
 struct kobject *get_governor_parent_kobj(struct cpufreq_policy *policy);
@@ -376,6 +377,9 @@ struct cpufreq_driver {
 	/* Called to update policy limits on firmware notifications. */
 	void (*update_limits)(unsigned int cpu);
 
+	/* Called to update highest performance on firmware notifications. */
+	void (*update_highest_perf)(unsigned int cpu);
+
 	/* optional */
 	int (*bios_limit)(int cpu, unsigned int *limit);

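The notification path in this patch is: the ACPI processor driver receives event 0x85, calls cpufreq_update_highest_perf(), and that helper invokes the driver's ->update_highest_perf callback only if the driver provides one. A minimal userspace sketch of that optional-callback dispatch follows. All names in it (`struct mock_driver`, `active_driver`, `mock_update()`, `dispatch_highest_perf_changed()`) are hypothetical stand-ins, and the extra check for a missing driver is an assumption added for the mock, not part of the patch.

```c
#include <stddef.h>

/* Hypothetical stand-in for the relevant part of struct cpufreq_driver. */
struct mock_driver {
	void (*update_highest_perf)(unsigned int cpu);
};

/* Currently registered driver; NULL until one is installed (mock only). */
static struct mock_driver *active_driver;

/* Bookkeeping so the effect of the callback can be observed. */
static unsigned int last_cpu_notified = (unsigned int)-1;
static unsigned int notify_count;

/* Example callback a driver might install. */
static void mock_update(unsigned int cpu)
{
	last_cpu_notified = cpu;
	notify_count++;
}

/*
 * Forward a "highest perf changed" event to the driver.
 * Returns 1 if a callback ran, 0 if the event was ignored.
 */
static int dispatch_highest_perf_changed(unsigned int cpu)
{
	if (!active_driver || !active_driver->update_highest_perf)
		return 0;

	active_driver->update_highest_perf(cpu);
	return 1;
}
```

Keeping the callback optional means cpufreq drivers that do not implement it are unaffected by the new notify value.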
On Tue, Aug 08, 2023 at 04:09:58PM +0800, Meng Li wrote:
> Please refer to the ACPI_Spec for details on the highest performance and notify events of CPPC.

Please summarise so that we don't get to click random links on the interweb just to try and make sense of things.

> Signed-off-by: Meng Li li.meng@amd.com
> Link: https://uefi.org/htmlspecs/AddCPI_Spec_6_4_html/08_Processor_Configuration_a...
> ---
> drivers/acpi/processor_driver.c | 6 ++++++
> drivers/cpufreq/cpufreq.c | 13 +++++++++++++
> include/linux/cpufreq.h | 4 ++++
> 3 files changed, 23 insertions(+)
>
> diff --git a/drivers/acpi/processor_driver.c b/drivers/acpi/processor_driver.c
> index 4bd16b3f0781..29b2fb68a35d 100644
> --- a/drivers/acpi/processor_driver.c
> +++ b/drivers/acpi/processor_driver.c
> @@ -27,6 +27,7 @@
>  #define ACPI_PROCESSOR_NOTIFY_PERFORMANCE 0x80
>  #define ACPI_PROCESSOR_NOTIFY_POWER 0x81
>  #define ACPI_PROCESSOR_NOTIFY_THROTTLING 0x82
> +#define ACPI_PROCESSOR_NOTIFY_HIGEST_PERF_CHANGED 0x85

Isn't that spelled: 'highest' ? ^
Preferred core rankings can be changed dynamically by the platform based on workload and platform conditions, accounting for thermals and aging. When this occurs, the CPU priority needs to be updated.
Signed-off-by: Meng Li li.meng@amd.com
---
 drivers/cpufreq/amd-pstate.c | 32 ++++++++++++++++++++++++++++++++
 include/linux/amd-pstate.h | 1 +
 2 files changed, 33 insertions(+)

diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
index e919b3f4ab18..ba10aa971dcb 100644
--- a/drivers/cpufreq/amd-pstate.c
+++ b/drivers/cpufreq/amd-pstate.c
@@ -320,6 +320,7 @@ static int pstate_init_perf(struct amd_cpudata *cpudata)
 	WRITE_ONCE(cpudata->nominal_perf, AMD_CPPC_NOMINAL_PERF(cap1));
 	WRITE_ONCE(cpudata->lowest_nonlinear_perf, AMD_CPPC_LOWNONLIN_PERF(cap1));
 	WRITE_ONCE(cpudata->lowest_perf, AMD_CPPC_LOWEST_PERF(cap1));
+	WRITE_ONCE(cpudata->prefcore_highest_perf, AMD_CPPC_HIGHEST_PERF(cap1));
 
 	return 0;
 }
@@ -341,6 +342,7 @@ static int cppc_init_perf(struct amd_cpudata *cpudata)
 	WRITE_ONCE(cpudata->lowest_nonlinear_perf,
 		   cppc_perf.lowest_nonlinear_perf);
 	WRITE_ONCE(cpudata->lowest_perf, cppc_perf.lowest_perf);
+	WRITE_ONCE(cpudata->prefcore_highest_perf, cppc_perf.highest_perf);
 
 	if (cppc_state == AMD_PSTATE_ACTIVE)
 		return 0;
@@ -743,6 +745,34 @@ static void amd_pstate_init_prefcore(void)
 	schedule_work(&sched_prefcore_work);
 }
 
+static void amd_pstate_update_highest_perf(unsigned int cpu)
+{
+	struct cpufreq_policy *policy;
+	struct amd_cpudata *cpudata;
+	u32 prev_high = 0, cur_high = 0;
+	u64 highest_perf;
+	int ret;
+
+	if (!prefcore_enabled)
+		return;
+
+	ret = amd_pstate_get_highest_perf(cpu, &highest_perf);
+	if (ret)
+		return;
+
+	policy = cpufreq_cpu_get(cpu);
+	cpudata = policy->driver_data;
+	cur_high = highest_perf;
+	prev_high = READ_ONCE(cpudata->prefcore_highest_perf);
+
+	if (prev_high != cur_high) {
+		WRITE_ONCE(cpudata->prefcore_highest_perf, cur_high);
+		sched_set_itmt_core_prio(cur_high, cpu);
+	}
+
+	cpufreq_cpu_put(policy);
+}
+
 /*
  * Check if AMD Pstate Preferred core feature is supported and enabled
  * 1) no_prefcore is used to enable or disable AMD Pstate Preferred Core
@@ -1497,6 +1527,7 @@ static struct cpufreq_driver amd_pstate_driver = {
 	.suspend = amd_pstate_cpu_suspend,
 	.resume = amd_pstate_cpu_resume,
 	.set_boost = amd_pstate_set_boost,
+	.update_highest_perf = amd_pstate_update_highest_perf,
 	.name = "amd-pstate",
 	.attr = amd_pstate_attr,
 };
@@ -1511,6 +1542,7 @@ static struct cpufreq_driver amd_pstate_epp_driver = {
 	.online = amd_pstate_epp_cpu_online,
 	.suspend = amd_pstate_epp_suspend,
 	.resume = amd_pstate_epp_resume,
+	.update_highest_perf = amd_pstate_update_highest_perf,
 	.name = "amd-pstate-epp",
 	.attr = amd_pstate_epp_attr,
 };
diff --git a/include/linux/amd-pstate.h b/include/linux/amd-pstate.h
index 446394f84606..fa86bc953d3e 100644
--- a/include/linux/amd-pstate.h
+++ b/include/linux/amd-pstate.h
@@ -70,6 +70,7 @@ struct amd_cpudata {
 	u32 nominal_perf;
 	u32 lowest_nonlinear_perf;
 	u32 lowest_perf;
+	u32 prefcore_highest_perf;
 
 	u32 max_freq;
 	u32 min_freq;

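The update path above only touches the scheduler priority when the newly read highest performance differs from the cached one. One thing a reader may want to note: cpufreq_cpu_get() can return NULL when no policy exists for the CPU, and the patch as posted dereferences the result without a check. The userspace sketch below models the compare-and-update step with such a guard added. Everything in it (`struct mock_cpudata`, `mock_policy[]`, `mock_prio[]`, `update_highest_perf()`) is a hypothetical mock, not the driver's code.

```c
#include <stdbool.h>
#include <stddef.h>

#define NR_MOCK_CPUS 4

/* Hypothetical stand-in for the cached field added to struct amd_cpudata. */
struct mock_cpudata {
	unsigned int prefcore_highest_perf;
};

/* Per-CPU policy lookup table; a NULL entry models a failed policy lookup. */
static struct mock_cpudata *mock_policy[NR_MOCK_CPUS];

/* Priority last handed to the (mock) scheduler for each CPU. */
static int mock_prio[NR_MOCK_CPUS];

/*
 * Apply a freshly read highest_perf value to one CPU.
 * Returns true only when the cached value and the priority were changed.
 */
static bool update_highest_perf(unsigned int cpu, unsigned int cur_high)
{
	struct mock_cpudata *cpudata;

	if (cpu >= NR_MOCK_CPUS)
		return false;

	cpudata = mock_policy[cpu];
	if (!cpudata)		/* guard that the posted patch does not have */
		return false;

	if (cpudata->prefcore_highest_perf == cur_high)
		return false;	/* ranking unchanged, nothing to do */

	cpudata->prefcore_highest_perf = cur_high;
	mock_prio[cpu] = (int)cur_high;
	return true;
}
```

Skipping the priority write when the value is unchanged avoids needless scheduler updates on repeated notifications.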
Introduce AMD Pstate Preferred Core.
check preferred core state:
$ cat /sys/devices/system/cpu/amd-pstate/prefcore_state

Signed-off-by: Meng Li li.meng@amd.com
---
 Documentation/admin-guide/pm/amd-pstate.rst | 55 +++++++++++++++++++++
 1 file changed, 55 insertions(+)

diff --git a/Documentation/admin-guide/pm/amd-pstate.rst b/Documentation/admin-guide/pm/amd-pstate.rst
index 1cf40f69278c..4a30cf235425 100644
--- a/Documentation/admin-guide/pm/amd-pstate.rst
+++ b/Documentation/admin-guide/pm/amd-pstate.rst
@@ -353,6 +353,49 @@ is activated. In this mode, driver requests minimum and maximum performance
 level and the platform autonomously selects a performance level in this range
 and appropriate to the current workload.
 
+AMD Pstate Preferred Core
+=================================
+
+The core frequency is subjected to the process variation in semiconductors.
+Not all cores are able to reach the maximum frequency respecting the
+infrastructure limits. Consequently, AMD has redefined the concept of
+maximum frequency of a part. This means that a fraction of cores can reach
+maximum frequency. To find the best process scheduling policy for a given
+scenario, OS needs to know the core ordering informed by the platform through
+highest performance capability register of the CPPC interface.
+
+``AMD Pstate Preferred Core`` use ITMT arch provides functions and data structures
+for enabling the scheduler to favor scheduling on cores can be get a higher frequency
+with lower voltage under preferred core. And it has the ability to dynamically
+change the preferred core based on the workload and platform conditions and
+accounting for thermals and aging.
+
+The priority metric will be initialized by the AMD Pstate driver. The AMD Pstate
+driver will also determine whether or not ``AMD Pstate Preferred Core`` is
+supported by the platform.
+
+AMD Pstate driver will provide an initial core ordering when the system boots.
+The platform uses the CPPC interfaces to communicate the core ranking to the
+operating system and scheduler to make sure that OS is choosing the cores
+with highest performance firstly for scheduling the process. When AMD Pstate
+driver receives a message with the highest performance change, it will
+update the core ranking and set the cpu's priority.
+
+AMD Preferred Core Switch
+=================================
+Kernel Parameters
+-----------------
+
+``AMD Pstate Preferred Core`` has two states: enable and disable.
+Enable/disable states can be chosen by different kernel parameters.
+Default disable ``AMD Pstate Preferred Core``.
+
+``amd_prefcore=enable``
+
+If ``amd_prefcore=enable`` is passed to kernel command line option
+then enable ``AMD Pstate Preferred Core`` if the processor and power
+firmware can support preferred core feature.
+
 User Space Interface in ``sysfs`` - General
 ===========================================
 
@@ -385,6 +428,18 @@ control its functionality at the system level. They are located in the
 to the operation mode represented by that string - or to be unregistered in the
 "disable" case.
 
+``prefcore_state``
+	Preferred Core state of the driver: "enabled" or "disabled".
+
+	"enabled"
+		Enable the AMD Preferred Core.
+
+	"disabled"
+		Disable the AMD Preferred Core
+
+
+	This attribute is read-only to check the state of Preferred Core.
+
 ``cpupower`` tool support for ``amd-pstate``
 ===============================================

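The `prefcore_state` attribute documented above emits either "enabled" or "disabled" followed by a newline. As a rough illustration of how a userspace tool could consume it, here is a small sketch; `parse_prefcore_state()` and `read_prefcore_state()` are hypothetical helper names, and the sysfs path shown in the comment is the one from this revision of the patch and may change in later revisions.

```c
#include <stdio.h>
#include <string.h>

/*
 * Interpret the text of the attribute.
 * Returns 1 for "enabled", 0 for "disabled", -1 for anything else.
 */
static int parse_prefcore_state(const char *text)
{
	size_t len = strcspn(text, "\n");	/* ignore the trailing newline */

	if (len == strlen("enabled") && !strncmp(text, "enabled", len))
		return 1;
	if (len == strlen("disabled") && !strncmp(text, "disabled", len))
		return 0;
	return -1;
}

/*
 * Read and interpret the attribute at the given path, for example
 * /sys/devices/system/cpu/amd-pstate/prefcore_state in this revision.
 * Returns -1 if the file cannot be opened or its content is unexpected.
 */
static int read_prefcore_state(const char *path)
{
	char buf[32];
	FILE *f = fopen(path, "r");
	int state = -1;

	if (!f)
		return -1;
	if (fgets(buf, sizeof(buf), f))
		state = parse_prefcore_state(buf);
	fclose(f);
	return state;
}
```

Treating a missing file as "unknown" rather than "disabled" lets a caller distinguish kernels without this series from kernels where the feature is simply off.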
On 8/8/2023 03:10, Meng Li wrote:
> Introduce AMD Pstate Preferred Core.
>
> check preferred core state:
> $ cat /sys/devices/system/cpu/amd-pstate/prefcore_state
>
> Signed-off-by: Meng Li li.meng@amd.com
> ---
> Documentation/admin-guide/pm/amd-pstate.rst | 55 +++++++++++++++++++++
> 1 file changed, 55 insertions(+)
>
> diff --git a/Documentation/admin-guide/pm/amd-pstate.rst b/Documentation/admin-guide/pm/amd-pstate.rst
> index 1cf40f69278c..4a30cf235425 100644
> --- a/Documentation/admin-guide/pm/amd-pstate.rst
> +++ b/Documentation/admin-guide/pm/amd-pstate.rst
> @@ -353,6 +353,49 @@ is activated. In this mode, driver requests minimum and maximum performance
>  level and the platform autonomously selects a performance level in this range
>  and appropriate to the current workload.
>
> +AMD Pstate Preferred Core
> +=================================
> +
> +The core frequency is subjected to the process variation in semiconductors.
> +Not all cores are able to reach the maximum frequency respecting the
> +infrastructure limits. Consequently, AMD has redefined the concept of
> +maximum frequency of a part. This means that a fraction of cores can reach
> +maximum frequency. To find the best process scheduling policy for a given
> +scenario, OS needs to know the core ordering informed by the platform through
> +highest performance capability register of the CPPC interface.
> +
> +``AMD Pstate Preferred Core`` use ITMT arch provides functions and data structures
> +for enabling the scheduler to favor scheduling on cores can be get a higher frequency
> +with lower voltage under preferred core.

This sentence was useful for the commit message, but I don't think it should be in the user facing documentation.

> And it has the ability to dynamically
> +change the preferred core based on the workload and platform conditions and
> +accounting for thermals and aging.
> +
> +The priority metric will be initialized by the AMD Pstate driver. The AMD Pstate
> +driver will also determine whether or not ``AMD Pstate Preferred Core`` is
> +supported by the platform.
> +
> +AMD Pstate driver will provide an initial core ordering when the system boots.
> +The platform uses the CPPC interfaces to communicate the core ranking to the
> +operating system and scheduler to make sure that OS is choosing the cores
> +with highest performance firstly for scheduling the process. When AMD Pstate
> +driver receives a message with the highest performance change, it will
> +update the core ranking and set the cpu's priority.
> +
> +AMD Preferred Core Switch
> +=================================
> +Kernel Parameters
> +-----------------
> +
> +``AMD Pstate Preferred Core`` has two states: enable and disable.
> +Enable/disable states can be chosen by different kernel parameters.
> +Default disable ``AMD Pstate Preferred Core``.

Why default disable?

> +``amd_prefcore=enable``
> +
> +If ``amd_prefcore=enable`` is passed to kernel command line option
> +then enable ``AMD Pstate Preferred Core`` if the processor and power
> +firmware can support preferred core feature.

This can be simplified as "platform can support the preferred core feature".

>  User Space Interface in ``sysfs`` - General
>
> @@ -385,6 +428,18 @@ control its functionality at the system level. They are located in the
>  to the operation mode represented by that string - or to be unregistered in the
>  "disable" case.
>
> +``prefcore_state``
> +	Preferred Core state of the driver: "enabled" or "disabled".
> +
> +	"enabled"
> +		Enable the AMD Preferred Core.
> +
> +	"disabled"
> +		Disable the AMD Preferred Core
> +
> +	This attribute is read-only to check the state of Preferred Core.

As the attribute is read only and won't change at runtime, I don't think it makes sense to include the word "state" in the sysfs file name.

You can just rename it to "prefcore".

>  ``cpupower`` tool support for ``amd-pstate``

AMD Pstate driver support enable/disable Preferred core. Default disabled on platforms supporting AMD Preferred Core. Enable AMD Pstate Preferred Core with "amd_prefcore=enable" added to the kernel command line.
Signed-off-by: Meng Li li.meng@amd.com
---
 Documentation/admin-guide/kernel-parameters.txt | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 2de235d52fac..bc92e178431b 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -363,6 +363,11 @@
 			  selects a performance level in this range and appropriate
 			  to the current workload.
 
+	amd_prefcore=
+			[X86]
+			enable
+			  Enable AMD Pstate Preferred Core.
+
 	amijoy.map=	[HW,JOY] Amiga joystick support
 			Map of devices attached to JOY0DAT and JOY1DAT
 			Format: <a>,<b>

On 8/8/2023 03:10, Meng Li wrote:
> AMD Pstate driver support enable/disable Preferred core. Default disabled on platforms supporting AMD Preferred Core.

Why default disabled?

Shouldn't this default enabled and then let users decide they don't want to use it if it's causing a problem for them?

> Enable AMD Pstate Preferred Core with "amd_prefcore=enable" added to the kernel command line.
>
> Signed-off-by: Meng Li li.meng@amd.com
> ---
> Documentation/admin-guide/kernel-parameters.txt | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 2de235d52fac..bc92e178431b 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -363,6 +363,11 @@
>  			  selects a performance level in this range and appropriate
>  			  to the current workload.
>
> +	amd_prefcore=
> +			[X86]
> +			enable
> +			  Enable AMD Pstate Preferred Core.
> +
>  	amijoy.map=	[HW,JOY] Amiga joystick support
>  			Map of devices attached to JOY0DAT and JOY1DAT
>  			Format: <a>,<b>