This patchset introduces CPPC (Collaborative Processor Performance Control) as a backend to the PID governor. The PID governor from intel_pstate.c maps cleanly onto some CPPC interfaces. For example, CPU performance requests are made on a continuous scale rather than as discrete P-state levels, and CPU performance feedback over an interval is gauged using platform specific counters, which are also described by CPPC.
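The counter-based feedback mentioned above can be sketched in plain C. This is only an illustration, assuming a pair of free-running "delivered" and "reference" counters in the spirit of aperf/mperf; the struct and function names are hypothetical and are not taken from the patchset or from the ACPI spec.

```c
#include <stdint.h>

/* Hypothetical snapshot of two free-running platform counters. */
struct perf_counters {
	uint64_t delivered;	/* advances at the rate of actual performance */
	uint64_t reference;	/* advances at a fixed reference rate */
};

/*
 * Percentage of reference performance delivered between two snapshots,
 * rounded to the nearest integer. Returns -1 if the reference counter
 * did not advance. Assumes the interval is short enough that
 * delta * 100 does not overflow 64 bits.
 */
static int delivered_pct(const struct perf_counters *prev,
			 const struct perf_counters *now)
{
	/* Unsigned subtraction gives the right delta across a wraparound. */
	uint64_t d = now->delivered - prev->delivered;
	uint64_t r = now->reference - prev->reference;

	if (r == 0)
		return -1;

	return (int)((d * 100 + r / 2) / r);
}
```

Working on deltas rather than absolute values is what lets the governor judge only the interval since its last request.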
Although CPPC describes several other registers that provide more hints to the platform, Linux as of today does not have the infrastructure to make use of those registers. Some of the CPPC specific information could be made available from the scheduler as part of the CPUfreq and scheduler integration work. Until then, PID can be used as the front end for CPPC.
Beyond code restructuring and renaming, this patchset does not change the logic of the intel_pstate.c driver. Kernel compilation times were compared across the original intel_pstate.c, the Intel backend (intel_pid_ctrl.c) and the CPPC backend, and no significant overhead was noticed.
Testing was performed on a Thinkpad X240 laptop.
PID_CTRL + INTEL_PSTATE:
=======================
real	5m37.742s
user	18m42.575s
sys	1m0.521s

PID_CTRL + CPPC_PID_CTRL:
========================
real	5m48.321s
user	18m24.487s
sys	0m59.327s

ORIGINAL INTEL_PSTATE:
======================
real	5m40.642s
user	18m37.411s
sys	1m0.185s
The complete patchset including the PCC hacks used for testing is available in [4].
Changes since V0: [1]
- Split intel_pstate.c into a generic PID governor and platform specific backend.
- Add CPPC accessors as PID backend.
CPPC:
=====

CPPC (Collaborative Processor Performance Control) is a new way to control CPU performance using an abstract continuous scale, rather than a discretized P-state scale which is tied to CPU frequency only. It is defined in the ACPI 5.0+ spec. In brief, the basic operation involves:

- The OS makes a CPU performance request. (It can provide min and max tolerable bounds.)
- The platform (such as a BMC) is free to optimize the request within the requested bounds, depending on power/thermal budgets etc.
- The platform conveys its decision back to the OS.
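The three steps above can be sketched as a small C model. This is a simplified illustration only: the struct, its fields and platform_decide() are hypothetical names, not the ACPI register names, and a real platform may apply a far more elaborate policy than the single cap assumed here.

```c
#include <stdint.h>

/* Hypothetical request record; not the ACPI CPPC register layout. */
struct perf_request {
	uint32_t desired;	/* performance level the OS would like */
	uint32_t min;		/* lowest level the OS can tolerate */
	uint32_t max;		/* highest level the OS can tolerate */
};

/*
 * Stand-in for the platform's decision. The desired level is first
 * limited by an assumed platform budget (e.g. a thermal cap), then
 * clamped into the [min, max] window the OS asked for. If the cap is
 * below min, this sketch lets the OS minimum win.
 */
static uint32_t platform_decide(const struct perf_request *req,
				uint32_t platform_cap)
{
	uint32_t granted = req->desired;

	if (granted > platform_cap)
		granted = platform_cap;
	if (granted > req->max)
		granted = req->max;
	if (granted < req->min)
		granted = req->min;

	return granted;	/* the value conveyed back to the OS */
}
```

The return value plays the role of the decision conveyed back to the OS, which can then compare it with what it requested.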
The communication between the OS and the platform occurs through another medium called the Platform Communication Channel (PCC). This is a generic mailbox-like mechanism which includes doorbell semantics to indicate register updates. The PCC driver is being discussed in a separate patchset [3] and is not included here, since CPPC is only one client of PCC.
Finer details about the PCC and CPPC spec are available in the latest ACPI 5.1 specification [2].
[1] - http://lwn.net/Articles/608715/
[2] - http://www.uefi.org/sites/default/files/resources/ACPI_5_1release.pdf
[3] - http://comments.gmane.org/gmane.linux.acpi.devel/70299
[4] - http://git.linaro.org/people/ashwin.chaugule/leg-kernel.git/shortlog/refs/he...
Ashwin Chaugule (6):
  PID Controller governor
  PID: Move Turbo detection into backend driver
  PID: Move Baytrail specific accessors into backend driver
  PID: Add new function pointers to read multiple registers
  PID: Rename counters to make them more generic
  PID: Add CPPC (Collaborative Processor Performance) backend driver
 Documentation/cpu-freq/intel-pstate.txt |   43 --
 Documentation/cpu-freq/pid_ctrl.txt     |   41 ++
 drivers/cpufreq/Kconfig                 |   19 +
 drivers/cpufreq/Kconfig.x86             |    2 +-
 drivers/cpufreq/Makefile                |    4 +-
 drivers/cpufreq/cppc_pid_ctrl.c         |  406 +++++++++++++
 drivers/cpufreq/intel_pid_ctrl.c        |  408 +++++++++++++
 drivers/cpufreq/intel_pstate.c          | 1012 -------------------------------
 drivers/cpufreq/pid_ctrl.c              |  615 +++++++++++++++++++
 drivers/cpufreq/pid_ctrl.h              |  113 ++++
 10 files changed, 1606 insertions(+), 1057 deletions(-)
 delete mode 100644 Documentation/cpu-freq/intel-pstate.txt
 create mode 100644 Documentation/cpu-freq/pid_ctrl.txt
 create mode 100644 drivers/cpufreq/cppc_pid_ctrl.c
 create mode 100644 drivers/cpufreq/intel_pid_ctrl.c
 delete mode 100644 drivers/cpufreq/intel_pstate.c
 create mode 100644 drivers/cpufreq/pid_ctrl.c
 create mode 100644 drivers/cpufreq/pid_ctrl.h
The intel_pstate.c driver contains its own governor, which is an implementation of PID control theory. Using PID to control CPU performance has several advantages, some of which are:
(1) CPU performance is requested on a continuous scale, which exploits the full range of CPU clocking abilities.
(2) It uses platform counters to accurately gauge what happened in the interval since its last request. This idea is generally more applicable to modern CPUs, where the platform modifies CPU performance without the OS's involvement.
(3) PID tunables are exposed via sysfs for platform specific tuning if required.
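To make the control loop concrete, here is a user-space sketch of one PID step in 8-bit fixed point, modelled on pid_calc() from the driver being split. It is an approximation for illustration, not the patch's code: struct pid_state, pid_init() and pid_step() are names made up for this sketch, and fp_toint() assumes an arithmetic right shift of negative values (true for GCC and Clang, implementation-defined in ISO C).

```c
#include <stdint.h>

#define FRAC_BITS 8

static int32_t int_tofp(int x) { return x * (1 << FRAC_BITS); }
/* Assumes arithmetic right shift for negative values. */
static int32_t fp_toint(int32_t x) { return x >> FRAC_BITS; }

static int32_t mul_fp(int32_t x, int32_t y)
{
	return (int32_t)(((int64_t)x * y) >> FRAC_BITS);
}

static int32_t div_fp(int32_t x, int32_t y)
{
	return (int32_t)(((int64_t)x * (1 << FRAC_BITS)) / y);
}

/* Hypothetical controller state for this sketch. */
struct pid_state {
	int setpoint;			/* target busy percentage */
	int deadband;			/* errors within +/- deadband are ignored */
	int32_t p_gain, i_gain, d_gain;	/* fixed-point gains */
	int32_t integral;
	int32_t last_err;
};

/* Gains are given as percentages, like the *_gain_pct tunables. */
static void pid_init(struct pid_state *pid, int setpoint, int deadband,
		     int p_pct, int i_pct, int d_pct)
{
	pid->setpoint = setpoint;
	pid->deadband = deadband;
	pid->p_gain = div_fp(int_tofp(p_pct), int_tofp(100));
	pid->i_gain = div_fp(int_tofp(i_pct), int_tofp(100));
	pid->d_gain = div_fp(int_tofp(d_pct), int_tofp(100));
	pid->integral = 0;
	/* Start as if the CPU had been 100% busy. */
	pid->last_err = int_tofp(setpoint) - int_tofp(100);
}

/*
 * One control step. busy_fp is the measured busy percentage in fixed
 * point. A negative result means "raise performance by that many
 * steps", a positive result means "lower it".
 */
static int pid_step(struct pid_state *pid, int32_t busy_fp)
{
	int32_t err = int_tofp(pid->setpoint) - busy_fp;
	int32_t abs_err = err < 0 ? -err : err;
	int32_t limit = int_tofp(30);
	int32_t pterm, dterm, result;

	if (abs_err <= int_tofp(pid->deadband))
		return 0;

	pterm = mul_fp(pid->p_gain, err);

	/* Accumulate the error, limiting the integral term. */
	pid->integral += err;
	if (pid->integral > limit)
		pid->integral = limit;
	if (pid->integral < -limit)
		pid->integral = -limit;

	dterm = mul_fp(pid->d_gain, err - pid->last_err);
	pid->last_err = err;

	result = pterm + mul_fp(pid->integral, pid->i_gain) + dterm;
	result += 1 << (FRAC_BITS - 1);	/* round to nearest */
	return (int)fp_toint(result);
}
```

With the Core defaults from the driver (setpoint 97, p_gain_pct 20, other gains 0), a fully busy sample yields a request to step performance up, and a sample exactly at the setpoint yields no change.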
To prepare the code to be reusable across architectures which support platform counters similar to aperf/mperf, this patch starts off by splitting intel_pstate.c into the following structure:
(1) A PID controller governor.
(2) A backend driver that accesses the counters used by the PID controller.
The PID governor still has a few x86 specific things, which are moved out in the following patch.
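The governor/backend split described above can be pictured with a table of function pointers that a backend hands to the governor. The sketch below is a stand-alone illustration under that assumption; struct backend_ops, register_backend(), governor_request() and the toy backend are hypothetical names for this example and are not the interfaces defined in pid_ctrl.h.

```c
#include <stddef.h>

/* Operations a backend is assumed to provide; names are illustrative. */
struct backend_ops {
	int  (*get_min)(void);
	int  (*get_max)(void);
	void (*set)(int level);
};

static const struct backend_ops *active_backend;

/* Governor side: accept exactly one fully populated backend. */
static int register_backend(const struct backend_ops *ops)
{
	if (!ops || !ops->get_min || !ops->get_max || !ops->set)
		return -1;
	if (active_backend)
		return -1;

	active_backend = ops;
	return 0;
}

/*
 * Governor side: clamp a requested level to the backend's range and
 * forward it. Returns the level actually programmed, or -1 if no
 * backend has been registered yet.
 */
static int governor_request(int level)
{
	int lo, hi;

	if (!active_backend)
		return -1;

	lo = active_backend->get_min();
	hi = active_backend->get_max();
	if (level < lo)
		level = lo;
	if (level > hi)
		level = hi;

	active_backend->set(level);
	return level;
}

/* Toy backend standing in for a platform specific driver. */
static int toy_level;
static int toy_get_min(void) { return 8; }
static int toy_get_max(void) { return 24; }
static void toy_set(int level) { toy_level = level; }

static const struct backend_ops toy_backend = {
	.get_min = toy_get_min,
	.get_max = toy_get_max,
	.set = toy_set,
};
```

The governor never touches a register directly in this arrangement, so a second backend only has to supply its own three functions.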
Signed-off-by: Ashwin Chaugule <ashwin.chaugule@linaro.org>
---
 Documentation/cpu-freq/intel-pstate.txt |   43 --
 Documentation/cpu-freq/pid_ctrl.txt     |   41 ++
 drivers/cpufreq/Kconfig                 |    9 +
 drivers/cpufreq/Kconfig.x86             |    2 +-
 drivers/cpufreq/Makefile                |    3 +-
 drivers/cpufreq/intel_pid_ctrl.c        |  347 +++++++++++
 drivers/cpufreq/intel_pstate.c          | 1012 -------------------------------
 drivers/cpufreq/pid_ctrl.c              |  638 +++++++++++++++++++
 drivers/cpufreq/pid_ctrl.h              |  120 ++++
 9 files changed, 1158 insertions(+), 1057 deletions(-)
 delete mode 100644 Documentation/cpu-freq/intel-pstate.txt
 create mode 100644 Documentation/cpu-freq/pid_ctrl.txt
 create mode 100644 drivers/cpufreq/intel_pid_ctrl.c
 delete mode 100644 drivers/cpufreq/intel_pstate.c
 create mode 100644 drivers/cpufreq/pid_ctrl.c
 create mode 100644 drivers/cpufreq/pid_ctrl.h
diff --git a/Documentation/cpu-freq/intel-pstate.txt b/Documentation/cpu-freq/intel-pstate.txt deleted file mode 100644 index a69ffe1..0000000 --- a/Documentation/cpu-freq/intel-pstate.txt +++ /dev/null @@ -1,43 +0,0 @@ -Intel P-state driver --------------------- - -This driver implements a scaling driver with an internal governor for -Intel Core processors. The driver follows the same model as the -Transmeta scaling driver (longrun.c) and implements the setpolicy() -instead of target(). Scaling drivers that implement setpolicy() are -assumed to implement internal governors by the cpufreq core. All the -logic for selecting the current P state is contained within the -driver; no external governor is used by the cpufreq core. - -Intel SandyBridge+ processors are supported. - -New sysfs files for controlling P state selection have been added to -/sys/devices/system/cpu/intel_pstate/ - - max_perf_pct: limits the maximum P state that will be requested by - the driver stated as a percentage of the available performance. The - available (P states) performance may be reduced by the no_turbo - setting described below. - - min_perf_pct: limits the minimum P state that will be requested by - the driver stated as a percentage of the max (non-turbo) - performance level. - - no_turbo: limits the driver to selecting P states below the turbo - frequency range. - -For contemporary Intel processors, the frequency is controlled by the -processor itself and the P-states exposed to software are related to -performance levels. The idea that frequency can be set to a single -frequency is fiction for Intel Core processors. Even if the scaling -driver selects a single P state the actual frequency the processor -will run at is selected by the processor itself. 
- -New debugfs files have also been added to /sys/kernel/debug/pstate_snb/ - - deadband - d_gain_pct - i_gain_pct - p_gain_pct - sample_rate_ms - setpoint diff --git a/Documentation/cpu-freq/pid_ctrl.txt b/Documentation/cpu-freq/pid_ctrl.txt new file mode 100644 index 0000000..324064e --- /dev/null +++ b/Documentation/cpu-freq/pid_ctrl.txt @@ -0,0 +1,41 @@ +PID controller driver +--------------------- + +drivers/cpufreq/pid_ctrl.c implements a scaling driver similar to the +Transmeta scaling driver (longrun.c) and is independant of the cpufreq +core. + +drivers/cpufreq/intel_pid_ctrl.c implements the Intel specific backend +to access counters required by the PID controller governor. +Intel SandyBridge+ processors are supported. + +New sysfs files for controlling P state selection have been added to +/sys/devices/system/cpu/pid_ctrl/ + + max_perf_pct: limits the maximum P state that will be requested by + the driver stated as a percentage of the available performance. The + available (P states) performance may be reduced by the no_turbo + setting described below. + + min_perf_pct: limits the minimum P state that will be requested by + the driver stated as a percentage of the max (non-turbo) + performance level. + + no_turbo: limits the driver to selecting P states below the turbo + frequency range. + +For contemporary Intel processors, the frequency is controlled by the +processor itself and the P-states exposed to software are related to +performance levels. The idea that frequency can be set to a single +frequency is fiction for Intel Core processors. Even if the scaling +driver selects a single P state the actual frequency the processor +will run at is selected by the processor itself. 
+ +New debugfs files have also been added to /sys/kernel/debug/pstate_snb/ + + deadband + d_gain_pct + i_gain_pct + p_gain_pct + sample_rate_ms + setpoint diff --git a/drivers/cpufreq/Kconfig b/drivers/cpufreq/Kconfig index ffe350f..bbc19ac 100644 --- a/drivers/cpufreq/Kconfig +++ b/drivers/cpufreq/Kconfig @@ -196,6 +196,15 @@ config GENERIC_CPUFREQ_CPU0
If in doubt, say N.
+config PID_CTRL
+	bool "PID Controller Governor"
+	help
+	  This CPU performance governor implements a controller based on
+	  the Proportional-Integral-Derivative control theory. PID specific
+	  knobs are exposed through sysfs for platform specific tuning. This
+	  governor requires platform specific backend drivers to access
+	  counters. See Documentation/cpu-freq/pid_ctrl.txt
+
 menu "x86 CPU frequency scaling drivers"
 depends on X86
 source "drivers/cpufreq/Kconfig.x86"
diff --git a/drivers/cpufreq/Kconfig.x86 b/drivers/cpufreq/Kconfig.x86
index 89ae88f..3ffa46a 100644
--- a/drivers/cpufreq/Kconfig.x86
+++ b/drivers/cpufreq/Kconfig.x86
@@ -4,7 +4,7 @@
 config X86_INTEL_PSTATE
 	bool "Intel P state control"
-	depends on X86
+	depends on X86 && PID_CTRL
 	help
 	  This driver provides a P state for Intel core processors.
 	  The driver implements an internal governor and will become
diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile
index db6d9a2..6d1a4d0 100644
--- a/drivers/cpufreq/Makefile
+++ b/drivers/cpufreq/Makefile
@@ -39,7 +39,8 @@ obj-$(CONFIG_X86_SPEEDSTEP_SMI) += speedstep-smi.o
 obj-$(CONFIG_X86_SPEEDSTEP_CENTRINO) += speedstep-centrino.o
 obj-$(CONFIG_X86_P4_CLOCKMOD) += p4-clockmod.o
 obj-$(CONFIG_X86_CPUFREQ_NFORCE2) += cpufreq-nforce2.o
-obj-$(CONFIG_X86_INTEL_PSTATE) += intel_pstate.o
+obj-$(CONFIG_PID_CTRL) += pid_ctrl.o
+obj-$(CONFIG_X86_INTEL_PSTATE) += intel_pid_ctrl.o
 obj-$(CONFIG_X86_AMD_FREQ_SENSITIVITY) += amd_freq_sensitivity.o
################################################################################## diff --git a/drivers/cpufreq/intel_pid_ctrl.c b/drivers/cpufreq/intel_pid_ctrl.c new file mode 100644 index 0000000..ebab074 --- /dev/null +++ b/drivers/cpufreq/intel_pid_ctrl.c @@ -0,0 +1,347 @@ +/* + * intel_pid_ctrl.c: Native P state management for Intel processors + * + * (C) Copyright 2012 Intel Corporation + * Author: Dirk Brandewie dirk.j.brandewie@intel.com + * + * (C) Copyright 2014 Linaro Ltd. + * Author: Ashwin Chaugule ashwin.chaugule@linaro.org + * - Restructured intel_pstate.c into a generic PID controller + * governor and separate backend platform specific driver. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; version 2 + * of the License. + */ + +#include <linux/module.h> +#include <linux/types.h> +#include <linux/acpi.h> + +#include <asm/msr.h> +#include <asm/cpu_device_id.h> + +#include "pid_ctrl.h" + +#define BYT_RATIOS 0x66a +#define BYT_VIDS 0x66b +#define BYT_TURBO_RATIOS 0x66c +#define BYT_TURBO_VIDS 0x66d + +struct perf_limits limits = { + .no_turbo = 0, + .max_perf_pct = 100, + .max_perf = int_tofp(1), + .min_perf_pct = 0, + .min_perf = 0, + .max_policy_pct = 100, + .max_sysfs_pct = 100, +}; + +static int byt_get_min_pstate(void) +{ + u64 value; + + rdmsrl(BYT_RATIOS, value); + return (value >> 8) & 0x7F; +} + +static int byt_get_max_pstate(void) +{ + u64 value; + + rdmsrl(BYT_RATIOS, value); + return (value >> 16) & 0x7F; +} + +static int byt_get_turbo_pstate(void) +{ + u64 value; + + rdmsrl(BYT_TURBO_RATIOS, value); + return value & 0x7F; +} + +static void byt_set_pstate(struct cpudata *cpudata, int pstate) +{ + u64 val; + int32_t vid_fp; + u32 vid; + + val = pstate << 8; + if (limits.no_turbo && !limits.turbo_disabled) + val |= (u64)1 << 32; + + vid_fp = cpudata->vid.min + mul_fp( + int_tofp(pstate - 
cpudata->pstate.min_pstate), + cpudata->vid.ratio); + + vid_fp = clamp_t(int32_t, vid_fp, cpudata->vid.min, cpudata->vid.max); + vid = fp_toint(vid_fp); + + if (pstate > cpudata->pstate.max_pstate) + vid = cpudata->vid.turbo; + + val |= vid; + + wrmsrl(MSR_IA32_PERF_CTL, val); +} + +static void byt_get_vid(struct cpudata *cpudata) +{ + u64 value; + + rdmsrl(BYT_VIDS, value); + cpudata->vid.min = int_tofp((value >> 8) & 0x7f); + cpudata->vid.max = int_tofp((value >> 16) & 0x7f); + cpudata->vid.ratio = div_fp( + cpudata->vid.max - cpudata->vid.min, + int_tofp(cpudata->pstate.max_pstate - + cpudata->pstate.min_pstate)); + + rdmsrl(BYT_TURBO_VIDS, value); + cpudata->vid.turbo = value & 0x7f; +} + +static int core_get_min_pstate(void) +{ + u64 value; + + rdmsrl(MSR_PLATFORM_INFO, value); + return (value >> 40) & 0xFF; +} + +static int core_get_max_pstate(void) +{ + u64 value; + + rdmsrl(MSR_PLATFORM_INFO, value); + return (value >> 8) & 0xFF; +} + +static int core_get_turbo_pstate(void) +{ + u64 value; + int nont, ret; + + rdmsrl(MSR_NHM_TURBO_RATIO_LIMIT, value); + nont = core_get_max_pstate(); + ret = ((value) & 255); + if (ret <= nont) + ret = nont; + return ret; +} + +static void core_set_pstate(struct cpudata *cpudata, int pstate) +{ + u64 val; + + val = pstate << 8; + if (limits.no_turbo && !limits.turbo_disabled) + val |= (u64)1 << 32; + + wrmsrl_on_cpu(cpudata->cpu, MSR_IA32_PERF_CTL, val); +} + +static struct cpu_defaults core_params = { + .pid_policy = { + .sample_rate_ms = 10, + .deadband = 0, + .setpoint = 97, + .p_gain_pct = 20, + .d_gain_pct = 0, + .i_gain_pct = 0, + }, + .funcs = { + .get_max = core_get_max_pstate, + .get_min = core_get_min_pstate, + .get_turbo = core_get_turbo_pstate, + .set = core_set_pstate, + }, +}; + +static struct cpu_defaults byt_params = { + .pid_policy = { + .sample_rate_ms = 10, + .deadband = 0, + .setpoint = 97, + .p_gain_pct = 14, + .d_gain_pct = 0, + .i_gain_pct = 4, + }, + .funcs = { + .get_max = byt_get_max_pstate, + 
.get_min = byt_get_min_pstate, + .get_turbo = byt_get_turbo_pstate, + .set = byt_set_pstate, + .get_vid = byt_get_vid, + }, +}; + + +#define ICPU(model, policy) \ + { X86_VENDOR_INTEL, 6, model, X86_FEATURE_APERFMPERF,\ + (unsigned long)&policy } + +static const struct x86_cpu_id intel_pid_ctrl_cpu_ids[] = { + ICPU(0x2a, core_params), + ICPU(0x2d, core_params), + ICPU(0x37, byt_params), + ICPU(0x3a, core_params), + ICPU(0x3c, core_params), + ICPU(0x3d, core_params), + ICPU(0x3e, core_params), + ICPU(0x3f, core_params), + ICPU(0x45, core_params), + ICPU(0x46, core_params), + ICPU(0x4f, core_params), + ICPU(0x56, core_params), + {} +}; +MODULE_DEVICE_TABLE(x86cpu, intel_pid_ctrl_cpu_ids); + +static int no_load __initdata; + +static int intel_pid_ctrl_msrs_not_valid(struct cpu_defaults *cpuinfo) +{ + /* Check that all the msr's we are using are valid. */ + u64 aperf, mperf, tmp; + + rdmsrl(MSR_IA32_APERF, aperf); + rdmsrl(MSR_IA32_MPERF, mperf); + + if (!cpuinfo->funcs.get_max() || + !cpuinfo->funcs.get_min() || + !cpuinfo->funcs.get_turbo()) + return -ENODEV; + + rdmsrl(MSR_IA32_APERF, tmp); + if (!(tmp - aperf)) + return -ENODEV; + + rdmsrl(MSR_IA32_MPERF, tmp); + if (!(tmp - mperf)) + return -ENODEV; + + return 0; +} + +#if IS_ENABLED(CONFIG_ACPI) +#include <acpi/processor.h> + +static bool intel_pid_ctrl_no_acpi_pss(void) +{ + int i; + + for_each_possible_cpu(i) { + acpi_status status; + union acpi_object *pss; + struct acpi_buffer buffer = { ACPI_ALLOCATE_BUFFER, NULL }; + struct acpi_processor *pr = per_cpu(processors, i); + + if (!pr) + continue; + + status = acpi_evaluate_object(pr->handle, "_PSS", NULL, + &buffer); + if (ACPI_FAILURE(status)) + continue; + + pss = buffer.pointer; + if (pss && pss->type == ACPI_TYPE_PACKAGE) { + kfree(pss); + return false; + } + + kfree(pss); + } + + return true; +} + +struct hw_vendor_info { + u16 valid; + char oem_id[ACPI_OEM_ID_SIZE]; + char oem_table_id[ACPI_OEM_TABLE_ID_SIZE]; +}; + +/* Hardware vendor-specific info that 
has its own power management modes */ +static struct hw_vendor_info vendor_info[] = { + {1, "HP ", "ProLiant"}, + {0, "", ""}, +}; + +static bool intel_pid_ctrl_platform_pwr_mgmt_exists(void) +{ + struct acpi_table_header hdr; + struct hw_vendor_info *v_info; + + if (acpi_disabled + || ACPI_FAILURE(acpi_get_table_header(ACPI_SIG_FADT, 0, &hdr))) + return false; + + for (v_info = vendor_info; v_info->valid; v_info++) { + if (!strncmp(hdr.oem_id, v_info->oem_id, ACPI_OEM_ID_SIZE) + && !strncmp(hdr.oem_table_id, v_info->oem_table_id, + ACPI_OEM_TABLE_ID_SIZE) + && intel_pid_ctrl_no_acpi_pss()) + return true; + } + + return false; +} +#else /* CONFIG_ACPI not enabled */ +static inline bool intel_pid_ctrl_platform_pwr_mgmt_exists(void) +{ + return false; +} +#endif /* CONFIG_ACPI */ + +static int __init intel_pid_ctrl_init(void) +{ + const struct x86_cpu_id *id; + struct cpu_defaults *cpu_info; + + if (no_load) + return -ENODEV; + + id = x86_match_cpu(intel_pid_ctrl_cpu_ids); + if (!id) + return -ENODEV; + + /* + * The Intel PID controller driver will be ignored if the platform + * firmware has its own power management modes. 
+ */ + if (intel_pid_ctrl_platform_pwr_mgmt_exists()) + return -ENODEV; + + cpu_info = (struct cpu_defaults *)id->driver_data; + + if (intel_pid_ctrl_msrs_not_valid(cpu_info)) + return -ENODEV; + + pr_info("Intel PID controller driver initializing.\n"); + + register_pid_params(&cpu_info->pid_policy); + register_cpu_funcs(&cpu_info->funcs); + + return 0; +} +device_initcall(intel_pid_ctrl_init); + +static int __init intel_pid_ctrl_setup(char *str) +{ + if (!str) + return -EINVAL; + + if (!strcmp(str, "disable")) + no_load = 1; + return 0; +} +early_param("intel_pid_ctrl", intel_pid_ctrl_setup); + +MODULE_AUTHOR("Dirk Brandewie dirk.j.brandewie@intel.com"); +MODULE_DESCRIPTION("'intel_pid_ctrl' - P state driver Intel Core processors"); +MODULE_LICENSE("GPL"); diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c deleted file mode 100644 index 86631cb..0000000 --- a/drivers/cpufreq/intel_pstate.c +++ /dev/null @@ -1,1012 +0,0 @@ -/* - * intel_pstate.c: Native P state management for Intel processors - * - * (C) Copyright 2012 Intel Corporation - * Author: Dirk Brandewie dirk.j.brandewie@intel.com - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License - * as published by the Free Software Foundation; version 2 - * of the License. 
- */ - -#include <linux/kernel.h> -#include <linux/kernel_stat.h> -#include <linux/module.h> -#include <linux/ktime.h> -#include <linux/hrtimer.h> -#include <linux/tick.h> -#include <linux/slab.h> -#include <linux/sched.h> -#include <linux/list.h> -#include <linux/cpu.h> -#include <linux/cpufreq.h> -#include <linux/sysfs.h> -#include <linux/types.h> -#include <linux/fs.h> -#include <linux/debugfs.h> -#include <linux/acpi.h> -#include <trace/events/power.h> - -#include <asm/div64.h> -#include <asm/msr.h> -#include <asm/cpu_device_id.h> - -#define BYT_RATIOS 0x66a -#define BYT_VIDS 0x66b -#define BYT_TURBO_RATIOS 0x66c -#define BYT_TURBO_VIDS 0x66d - - -#define FRAC_BITS 8 -#define int_tofp(X) ((int64_t)(X) << FRAC_BITS) -#define fp_toint(X) ((X) >> FRAC_BITS) - - -static inline int32_t mul_fp(int32_t x, int32_t y) -{ - return ((int64_t)x * (int64_t)y) >> FRAC_BITS; -} - -static inline int32_t div_fp(int32_t x, int32_t y) -{ - return div_s64((int64_t)x << FRAC_BITS, (int64_t)y); -} - -struct sample { - int32_t core_pct_busy; - u64 aperf; - u64 mperf; - int freq; - ktime_t time; -}; - -struct pstate_data { - int current_pstate; - int min_pstate; - int max_pstate; - int turbo_pstate; -}; - -struct vid_data { - int min; - int max; - int turbo; - int32_t ratio; -}; - -struct _pid { - int setpoint; - int32_t integral; - int32_t p_gain; - int32_t i_gain; - int32_t d_gain; - int deadband; - int32_t last_err; -}; - -struct cpudata { - int cpu; - - struct timer_list timer; - - struct pstate_data pstate; - struct vid_data vid; - struct _pid pid; - - ktime_t last_sample_time; - u64 prev_aperf; - u64 prev_mperf; - struct sample sample; -}; - -static struct cpudata **all_cpu_data; -struct pstate_adjust_policy { - int sample_rate_ms; - int deadband; - int setpoint; - int p_gain_pct; - int d_gain_pct; - int i_gain_pct; -}; - -struct pstate_funcs { - int (*get_max)(void); - int (*get_min)(void); - int (*get_turbo)(void); - void (*set)(struct cpudata*, int pstate); - void 
(*get_vid)(struct cpudata *); -}; - -struct cpu_defaults { - struct pstate_adjust_policy pid_policy; - struct pstate_funcs funcs; -}; - -static struct pstate_adjust_policy pid_params; -static struct pstate_funcs pstate_funcs; - -struct perf_limits { - int no_turbo; - int turbo_disabled; - int max_perf_pct; - int min_perf_pct; - int32_t max_perf; - int32_t min_perf; - int max_policy_pct; - int max_sysfs_pct; -}; - -static struct perf_limits limits = { - .no_turbo = 0, - .max_perf_pct = 100, - .max_perf = int_tofp(1), - .min_perf_pct = 0, - .min_perf = 0, - .max_policy_pct = 100, - .max_sysfs_pct = 100, -}; - -static inline void pid_reset(struct _pid *pid, int setpoint, int busy, - int deadband, int integral) { - pid->setpoint = setpoint; - pid->deadband = deadband; - pid->integral = int_tofp(integral); - pid->last_err = int_tofp(setpoint) - int_tofp(busy); -} - -static inline void pid_p_gain_set(struct _pid *pid, int percent) -{ - pid->p_gain = div_fp(int_tofp(percent), int_tofp(100)); -} - -static inline void pid_i_gain_set(struct _pid *pid, int percent) -{ - pid->i_gain = div_fp(int_tofp(percent), int_tofp(100)); -} - -static inline void pid_d_gain_set(struct _pid *pid, int percent) -{ - - pid->d_gain = div_fp(int_tofp(percent), int_tofp(100)); -} - -static signed int pid_calc(struct _pid *pid, int32_t busy) -{ - signed int result; - int32_t pterm, dterm, fp_error; - int32_t integral_limit; - - fp_error = int_tofp(pid->setpoint) - busy; - - if (abs(fp_error) <= int_tofp(pid->deadband)) - return 0; - - pterm = mul_fp(pid->p_gain, fp_error); - - pid->integral += fp_error; - - /* limit the integral term */ - integral_limit = int_tofp(30); - if (pid->integral > integral_limit) - pid->integral = integral_limit; - if (pid->integral < -integral_limit) - pid->integral = -integral_limit; - - dterm = mul_fp(pid->d_gain, fp_error - pid->last_err); - pid->last_err = fp_error; - - result = pterm + mul_fp(pid->integral, pid->i_gain) + dterm; - result = result + (1 << 
(FRAC_BITS-1)); - return (signed int)fp_toint(result); -} - -static inline void intel_pstate_busy_pid_reset(struct cpudata *cpu) -{ - pid_p_gain_set(&cpu->pid, pid_params.p_gain_pct); - pid_d_gain_set(&cpu->pid, pid_params.d_gain_pct); - pid_i_gain_set(&cpu->pid, pid_params.i_gain_pct); - - pid_reset(&cpu->pid, - pid_params.setpoint, - 100, - pid_params.deadband, - 0); -} - -static inline void intel_pstate_reset_all_pid(void) -{ - unsigned int cpu; - for_each_online_cpu(cpu) { - if (all_cpu_data[cpu]) - intel_pstate_busy_pid_reset(all_cpu_data[cpu]); - } -} - -/************************** debugfs begin ************************/ -static int pid_param_set(void *data, u64 val) -{ - *(u32 *)data = val; - intel_pstate_reset_all_pid(); - return 0; -} -static int pid_param_get(void *data, u64 *val) -{ - *val = *(u32 *)data; - return 0; -} -DEFINE_SIMPLE_ATTRIBUTE(fops_pid_param, pid_param_get, - pid_param_set, "%llu\n"); - -struct pid_param { - char *name; - void *value; -}; - -static struct pid_param pid_files[] = { - {"sample_rate_ms", &pid_params.sample_rate_ms}, - {"d_gain_pct", &pid_params.d_gain_pct}, - {"i_gain_pct", &pid_params.i_gain_pct}, - {"deadband", &pid_params.deadband}, - {"setpoint", &pid_params.setpoint}, - {"p_gain_pct", &pid_params.p_gain_pct}, - {NULL, NULL} -}; - -static struct dentry *debugfs_parent; -static void intel_pstate_debug_expose_params(void) -{ - int i = 0; - - debugfs_parent = debugfs_create_dir("pstate_snb", NULL); - if (IS_ERR_OR_NULL(debugfs_parent)) - return; - while (pid_files[i].name) { - debugfs_create_file(pid_files[i].name, 0660, - debugfs_parent, pid_files[i].value, - &fops_pid_param); - i++; - } -} - -/************************** debugfs end ************************/ - -/************************** sysfs begin ************************/ -#define show_one(file_name, object) \ - static ssize_t show_##file_name \ - (struct kobject *kobj, struct attribute *attr, char *buf) \ - { \ - return sprintf(buf, "%u\n", limits.object); \ - } - 
-static ssize_t store_no_turbo(struct kobject *a, struct attribute *b, - const char *buf, size_t count) -{ - unsigned int input; - int ret; - ret = sscanf(buf, "%u", &input); - if (ret != 1) - return -EINVAL; - limits.no_turbo = clamp_t(int, input, 0 , 1); - if (limits.turbo_disabled) { - pr_warn("Turbo disabled by BIOS or unavailable on processor\n"); - limits.no_turbo = limits.turbo_disabled; - } - return count; -} - -static ssize_t store_max_perf_pct(struct kobject *a, struct attribute *b, - const char *buf, size_t count) -{ - unsigned int input; - int ret; - ret = sscanf(buf, "%u", &input); - if (ret != 1) - return -EINVAL; - - limits.max_sysfs_pct = clamp_t(int, input, 0 , 100); - limits.max_perf_pct = min(limits.max_policy_pct, limits.max_sysfs_pct); - limits.max_perf = div_fp(int_tofp(limits.max_perf_pct), int_tofp(100)); - return count; -} - -static ssize_t store_min_perf_pct(struct kobject *a, struct attribute *b, - const char *buf, size_t count) -{ - unsigned int input; - int ret; - ret = sscanf(buf, "%u", &input); - if (ret != 1) - return -EINVAL; - limits.min_perf_pct = clamp_t(int, input, 0 , 100); - limits.min_perf = div_fp(int_tofp(limits.min_perf_pct), int_tofp(100)); - - return count; -} - -show_one(no_turbo, no_turbo); -show_one(max_perf_pct, max_perf_pct); -show_one(min_perf_pct, min_perf_pct); - -define_one_global_rw(no_turbo); -define_one_global_rw(max_perf_pct); -define_one_global_rw(min_perf_pct); - -static struct attribute *intel_pstate_attributes[] = { - &no_turbo.attr, - &max_perf_pct.attr, - &min_perf_pct.attr, - NULL -}; - -static struct attribute_group intel_pstate_attr_group = { - .attrs = intel_pstate_attributes, -}; -static struct kobject *intel_pstate_kobject; - -static void intel_pstate_sysfs_expose_params(void) -{ - int rc; - - intel_pstate_kobject = kobject_create_and_add("intel_pstate", - &cpu_subsys.dev_root->kobj); - BUG_ON(!intel_pstate_kobject); - rc = sysfs_create_group(intel_pstate_kobject, - &intel_pstate_attr_group); - 
BUG_ON(rc); -} - -/************************** sysfs end ************************/ -static int byt_get_min_pstate(void) -{ - u64 value; - rdmsrl(BYT_RATIOS, value); - return (value >> 8) & 0x7F; -} - -static int byt_get_max_pstate(void) -{ - u64 value; - rdmsrl(BYT_RATIOS, value); - return (value >> 16) & 0x7F; -} - -static int byt_get_turbo_pstate(void) -{ - u64 value; - rdmsrl(BYT_TURBO_RATIOS, value); - return value & 0x7F; -} - -static void byt_set_pstate(struct cpudata *cpudata, int pstate) -{ - u64 val; - int32_t vid_fp; - u32 vid; - - val = pstate << 8; - if (limits.no_turbo && !limits.turbo_disabled) - val |= (u64)1 << 32; - - vid_fp = cpudata->vid.min + mul_fp( - int_tofp(pstate - cpudata->pstate.min_pstate), - cpudata->vid.ratio); - - vid_fp = clamp_t(int32_t, vid_fp, cpudata->vid.min, cpudata->vid.max); - vid = fp_toint(vid_fp); - - if (pstate > cpudata->pstate.max_pstate) - vid = cpudata->vid.turbo; - - val |= vid; - - wrmsrl(MSR_IA32_PERF_CTL, val); -} - -static void byt_get_vid(struct cpudata *cpudata) -{ - u64 value; - - - rdmsrl(BYT_VIDS, value); - cpudata->vid.min = int_tofp((value >> 8) & 0x7f); - cpudata->vid.max = int_tofp((value >> 16) & 0x7f); - cpudata->vid.ratio = div_fp( - cpudata->vid.max - cpudata->vid.min, - int_tofp(cpudata->pstate.max_pstate - - cpudata->pstate.min_pstate)); - - rdmsrl(BYT_TURBO_VIDS, value); - cpudata->vid.turbo = value & 0x7f; -} - - -static int core_get_min_pstate(void) -{ - u64 value; - rdmsrl(MSR_PLATFORM_INFO, value); - return (value >> 40) & 0xFF; -} - -static int core_get_max_pstate(void) -{ - u64 value; - rdmsrl(MSR_PLATFORM_INFO, value); - return (value >> 8) & 0xFF; -} - -static int core_get_turbo_pstate(void) -{ - u64 value; - int nont, ret; - rdmsrl(MSR_NHM_TURBO_RATIO_LIMIT, value); - nont = core_get_max_pstate(); - ret = ((value) & 255); - if (ret <= nont) - ret = nont; - return ret; -} - -static void core_set_pstate(struct cpudata *cpudata, int pstate) -{ - u64 val; - - val = pstate << 8; - if 
(limits.no_turbo && !limits.turbo_disabled) - val |= (u64)1 << 32; - - wrmsrl_on_cpu(cpudata->cpu, MSR_IA32_PERF_CTL, val); -} - -static struct cpu_defaults core_params = { - .pid_policy = { - .sample_rate_ms = 10, - .deadband = 0, - .setpoint = 97, - .p_gain_pct = 20, - .d_gain_pct = 0, - .i_gain_pct = 0, - }, - .funcs = { - .get_max = core_get_max_pstate, - .get_min = core_get_min_pstate, - .get_turbo = core_get_turbo_pstate, - .set = core_set_pstate, - }, -}; - -static struct cpu_defaults byt_params = { - .pid_policy = { - .sample_rate_ms = 10, - .deadband = 0, - .setpoint = 97, - .p_gain_pct = 14, - .d_gain_pct = 0, - .i_gain_pct = 4, - }, - .funcs = { - .get_max = byt_get_max_pstate, - .get_min = byt_get_min_pstate, - .get_turbo = byt_get_turbo_pstate, - .set = byt_set_pstate, - .get_vid = byt_get_vid, - }, -}; - - -static void intel_pstate_get_min_max(struct cpudata *cpu, int *min, int *max) -{ - int max_perf = cpu->pstate.turbo_pstate; - int max_perf_adj; - int min_perf; - if (limits.no_turbo) - max_perf = cpu->pstate.max_pstate; - - max_perf_adj = fp_toint(mul_fp(int_tofp(max_perf), limits.max_perf)); - *max = clamp_t(int, max_perf_adj, - cpu->pstate.min_pstate, cpu->pstate.turbo_pstate); - - min_perf = fp_toint(mul_fp(int_tofp(max_perf), limits.min_perf)); - *min = clamp_t(int, min_perf, - cpu->pstate.min_pstate, max_perf); -} - -static void intel_pstate_set_pstate(struct cpudata *cpu, int pstate) -{ - int max_perf, min_perf; - - intel_pstate_get_min_max(cpu, &min_perf, &max_perf); - - pstate = clamp_t(int, pstate, min_perf, max_perf); - - if (pstate == cpu->pstate.current_pstate) - return; - - trace_cpu_frequency(pstate * 100000, cpu->cpu); - - cpu->pstate.current_pstate = pstate; - - pstate_funcs.set(cpu, pstate); -} - -static inline void intel_pstate_pstate_increase(struct cpudata *cpu, int steps) -{ - int target; - target = cpu->pstate.current_pstate + steps; - - intel_pstate_set_pstate(cpu, target); -} - -static inline void 
intel_pstate_pstate_decrease(struct cpudata *cpu, int steps) -{ - int target; - target = cpu->pstate.current_pstate - steps; - intel_pstate_set_pstate(cpu, target); -} - -static void intel_pstate_get_cpu_pstates(struct cpudata *cpu) -{ - cpu->pstate.min_pstate = pstate_funcs.get_min(); - cpu->pstate.max_pstate = pstate_funcs.get_max(); - cpu->pstate.turbo_pstate = pstate_funcs.get_turbo(); - - if (pstate_funcs.get_vid) - pstate_funcs.get_vid(cpu); - intel_pstate_set_pstate(cpu, cpu->pstate.min_pstate); -} - -static inline void intel_pstate_calc_busy(struct cpudata *cpu) -{ - struct sample *sample = &cpu->sample; - int64_t core_pct; - int32_t rem; - - core_pct = int_tofp(sample->aperf) * int_tofp(100); - core_pct = div_u64_rem(core_pct, int_tofp(sample->mperf), &rem); - - if ((rem << 1) >= int_tofp(sample->mperf)) - core_pct += 1; - - sample->freq = fp_toint( - mul_fp(int_tofp(cpu->pstate.max_pstate * 1000), core_pct)); - - sample->core_pct_busy = (int32_t)core_pct; -} - -static inline void intel_pstate_sample(struct cpudata *cpu) -{ - u64 aperf, mperf; - - rdmsrl(MSR_IA32_APERF, aperf); - rdmsrl(MSR_IA32_MPERF, mperf); - - aperf = aperf >> FRAC_BITS; - mperf = mperf >> FRAC_BITS; - - cpu->last_sample_time = cpu->sample.time; - cpu->sample.time = ktime_get(); - cpu->sample.aperf = aperf; - cpu->sample.mperf = mperf; - cpu->sample.aperf -= cpu->prev_aperf; - cpu->sample.mperf -= cpu->prev_mperf; - - intel_pstate_calc_busy(cpu); - - cpu->prev_aperf = aperf; - cpu->prev_mperf = mperf; -} - -static inline void intel_pstate_set_sample_time(struct cpudata *cpu) -{ - int sample_time, delay; - - sample_time = pid_params.sample_rate_ms; - delay = msecs_to_jiffies(sample_time); - mod_timer_pinned(&cpu->timer, jiffies + delay); -} - -static inline int32_t intel_pstate_get_scaled_busy(struct cpudata *cpu) -{ - int32_t core_busy, max_pstate, current_pstate, sample_ratio; - u32 duration_us; - u32 sample_time; - - core_busy = cpu->sample.core_pct_busy; - max_pstate = 
int_tofp(cpu->pstate.max_pstate); - current_pstate = int_tofp(cpu->pstate.current_pstate); - core_busy = mul_fp(core_busy, div_fp(max_pstate, current_pstate)); - - sample_time = (pid_params.sample_rate_ms * USEC_PER_MSEC); - duration_us = (u32) ktime_us_delta(cpu->sample.time, - cpu->last_sample_time); - if (duration_us > sample_time * 3) { - sample_ratio = div_fp(int_tofp(sample_time), - int_tofp(duration_us)); - core_busy = mul_fp(core_busy, sample_ratio); - } - - return core_busy; -} - -static inline void intel_pstate_adjust_busy_pstate(struct cpudata *cpu) -{ - int32_t busy_scaled; - struct _pid *pid; - signed int ctl = 0; - int steps; - - pid = &cpu->pid; - busy_scaled = intel_pstate_get_scaled_busy(cpu); - - ctl = pid_calc(pid, busy_scaled); - - steps = abs(ctl); - - if (ctl < 0) - intel_pstate_pstate_increase(cpu, steps); - else - intel_pstate_pstate_decrease(cpu, steps); -} - -static void intel_pstate_timer_func(unsigned long __data) -{ - struct cpudata *cpu = (struct cpudata *) __data; - struct sample *sample; - - intel_pstate_sample(cpu); - - sample = &cpu->sample; - - intel_pstate_adjust_busy_pstate(cpu); - - trace_pstate_sample(fp_toint(sample->core_pct_busy), - fp_toint(intel_pstate_get_scaled_busy(cpu)), - cpu->pstate.current_pstate, - sample->mperf, - sample->aperf, - sample->freq); - - intel_pstate_set_sample_time(cpu); -} - -#define ICPU(model, policy) \ - { X86_VENDOR_INTEL, 6, model, X86_FEATURE_APERFMPERF,\ - (unsigned long)&policy } - -static const struct x86_cpu_id intel_pstate_cpu_ids[] = { - ICPU(0x2a, core_params), - ICPU(0x2d, core_params), - ICPU(0x37, byt_params), - ICPU(0x3a, core_params), - ICPU(0x3c, core_params), - ICPU(0x3d, core_params), - ICPU(0x3e, core_params), - ICPU(0x3f, core_params), - ICPU(0x45, core_params), - ICPU(0x46, core_params), - ICPU(0x4f, core_params), - ICPU(0x56, core_params), - {} -}; -MODULE_DEVICE_TABLE(x86cpu, intel_pstate_cpu_ids); - -static int intel_pstate_init_cpu(unsigned int cpunum) -{ - struct cpudata 
*cpu; - - all_cpu_data[cpunum] = kzalloc(sizeof(struct cpudata), GFP_KERNEL); - if (!all_cpu_data[cpunum]) - return -ENOMEM; - - cpu = all_cpu_data[cpunum]; - - cpu->cpu = cpunum; - intel_pstate_get_cpu_pstates(cpu); - - init_timer_deferrable(&cpu->timer); - cpu->timer.function = intel_pstate_timer_func; - cpu->timer.data = - (unsigned long)cpu; - cpu->timer.expires = jiffies + HZ/100; - intel_pstate_busy_pid_reset(cpu); - intel_pstate_sample(cpu); - - add_timer_on(&cpu->timer, cpunum); - - pr_info("Intel pstate controlling: cpu %d\n", cpunum); - - return 0; -} - -static unsigned int intel_pstate_get(unsigned int cpu_num) -{ - struct sample *sample; - struct cpudata *cpu; - - cpu = all_cpu_data[cpu_num]; - if (!cpu) - return 0; - sample = &cpu->sample; - return sample->freq; -} - -static int intel_pstate_set_policy(struct cpufreq_policy *policy) -{ - struct cpudata *cpu; - - cpu = all_cpu_data[policy->cpu]; - - if (!policy->cpuinfo.max_freq) - return -ENODEV; - - if (policy->policy == CPUFREQ_POLICY_PERFORMANCE) { - limits.min_perf_pct = 100; - limits.min_perf = int_tofp(1); - limits.max_perf_pct = 100; - limits.max_perf = int_tofp(1); - limits.no_turbo = limits.turbo_disabled; - return 0; - } - limits.min_perf_pct = (policy->min * 100) / policy->cpuinfo.max_freq; - limits.min_perf_pct = clamp_t(int, limits.min_perf_pct, 0 , 100); - limits.min_perf = div_fp(int_tofp(limits.min_perf_pct), int_tofp(100)); - - limits.max_policy_pct = policy->max * 100 / policy->cpuinfo.max_freq; - limits.max_policy_pct = clamp_t(int, limits.max_policy_pct, 0 , 100); - limits.max_perf_pct = min(limits.max_policy_pct, limits.max_sysfs_pct); - limits.max_perf = div_fp(int_tofp(limits.max_perf_pct), int_tofp(100)); - - return 0; -} - -static int intel_pstate_verify_policy(struct cpufreq_policy *policy) -{ - cpufreq_verify_within_cpu_limits(policy); - - if ((policy->policy != CPUFREQ_POLICY_POWERSAVE) && - (policy->policy != CPUFREQ_POLICY_PERFORMANCE)) - return -EINVAL; - - return 0; -} - 
-static void intel_pstate_stop_cpu(struct cpufreq_policy *policy) -{ - int cpu_num = policy->cpu; - struct cpudata *cpu = all_cpu_data[cpu_num]; - - pr_info("intel_pstate CPU %d exiting\n", cpu_num); - - del_timer_sync(&all_cpu_data[cpu_num]->timer); - intel_pstate_set_pstate(cpu, cpu->pstate.min_pstate); - kfree(all_cpu_data[cpu_num]); - all_cpu_data[cpu_num] = NULL; -} - -static int intel_pstate_cpu_init(struct cpufreq_policy *policy) -{ - struct cpudata *cpu; - int rc; - u64 misc_en; - - rc = intel_pstate_init_cpu(policy->cpu); - if (rc) - return rc; - - cpu = all_cpu_data[policy->cpu]; - - rdmsrl(MSR_IA32_MISC_ENABLE, misc_en); - if (misc_en & MSR_IA32_MISC_ENABLE_TURBO_DISABLE || - cpu->pstate.max_pstate == cpu->pstate.turbo_pstate) { - limits.turbo_disabled = 1; - limits.no_turbo = 1; - } - if (limits.min_perf_pct == 100 && limits.max_perf_pct == 100) - policy->policy = CPUFREQ_POLICY_PERFORMANCE; - else - policy->policy = CPUFREQ_POLICY_POWERSAVE; - - policy->min = cpu->pstate.min_pstate * 100000; - policy->max = cpu->pstate.turbo_pstate * 100000; - - /* cpuinfo and default policy values */ - policy->cpuinfo.min_freq = cpu->pstate.min_pstate * 100000; - policy->cpuinfo.max_freq = cpu->pstate.turbo_pstate * 100000; - policy->cpuinfo.transition_latency = CPUFREQ_ETERNAL; - cpumask_set_cpu(policy->cpu, policy->cpus); - - return 0; -} - -static struct cpufreq_driver intel_pstate_driver = { - .flags = CPUFREQ_CONST_LOOPS, - .verify = intel_pstate_verify_policy, - .setpolicy = intel_pstate_set_policy, - .get = intel_pstate_get, - .init = intel_pstate_cpu_init, - .stop_cpu = intel_pstate_stop_cpu, - .name = "intel_pstate", -}; - -static int __initdata no_load; - -static int intel_pstate_msrs_not_valid(void) -{ - /* Check that all the msr's we are using are valid. 
*/ - u64 aperf, mperf, tmp; - - rdmsrl(MSR_IA32_APERF, aperf); - rdmsrl(MSR_IA32_MPERF, mperf); - - if (!pstate_funcs.get_max() || - !pstate_funcs.get_min() || - !pstate_funcs.get_turbo()) - return -ENODEV; - - rdmsrl(MSR_IA32_APERF, tmp); - if (!(tmp - aperf)) - return -ENODEV; - - rdmsrl(MSR_IA32_MPERF, tmp); - if (!(tmp - mperf)) - return -ENODEV; - - return 0; -} - -static void copy_pid_params(struct pstate_adjust_policy *policy) -{ - pid_params.sample_rate_ms = policy->sample_rate_ms; - pid_params.p_gain_pct = policy->p_gain_pct; - pid_params.i_gain_pct = policy->i_gain_pct; - pid_params.d_gain_pct = policy->d_gain_pct; - pid_params.deadband = policy->deadband; - pid_params.setpoint = policy->setpoint; -} - -static void copy_cpu_funcs(struct pstate_funcs *funcs) -{ - pstate_funcs.get_max = funcs->get_max; - pstate_funcs.get_min = funcs->get_min; - pstate_funcs.get_turbo = funcs->get_turbo; - pstate_funcs.set = funcs->set; - pstate_funcs.get_vid = funcs->get_vid; -} - -#if IS_ENABLED(CONFIG_ACPI) -#include <acpi/processor.h> - -static bool intel_pstate_no_acpi_pss(void) -{ - int i; - - for_each_possible_cpu(i) { - acpi_status status; - union acpi_object *pss; - struct acpi_buffer buffer = { ACPI_ALLOCATE_BUFFER, NULL }; - struct acpi_processor *pr = per_cpu(processors, i); - - if (!pr) - continue; - - status = acpi_evaluate_object(pr->handle, "_PSS", NULL, &buffer); - if (ACPI_FAILURE(status)) - continue; - - pss = buffer.pointer; - if (pss && pss->type == ACPI_TYPE_PACKAGE) { - kfree(pss); - return false; - } - - kfree(pss); - } - - return true; -} - -struct hw_vendor_info { - u16 valid; - char oem_id[ACPI_OEM_ID_SIZE]; - char oem_table_id[ACPI_OEM_TABLE_ID_SIZE]; -}; - -/* Hardware vendor-specific info that has its own power management modes */ -static struct hw_vendor_info vendor_info[] = { - {1, "HP ", "ProLiant"}, - {0, "", ""}, -}; - -static bool intel_pstate_platform_pwr_mgmt_exists(void) -{ - struct acpi_table_header hdr; - struct hw_vendor_info 
*v_info; - - if (acpi_disabled - || ACPI_FAILURE(acpi_get_table_header(ACPI_SIG_FADT, 0, &hdr))) - return false; - - for (v_info = vendor_info; v_info->valid; v_info++) { - if (!strncmp(hdr.oem_id, v_info->oem_id, ACPI_OEM_ID_SIZE) - && !strncmp(hdr.oem_table_id, v_info->oem_table_id, ACPI_OEM_TABLE_ID_SIZE) - && intel_pstate_no_acpi_pss()) - return true; - } - - return false; -} -#else /* CONFIG_ACPI not enabled */ -static inline bool intel_pstate_platform_pwr_mgmt_exists(void) { return false; } -#endif /* CONFIG_ACPI */ - -static int __init intel_pstate_init(void) -{ - int cpu, rc = 0; - const struct x86_cpu_id *id; - struct cpu_defaults *cpu_info; - - if (no_load) - return -ENODEV; - - id = x86_match_cpu(intel_pstate_cpu_ids); - if (!id) - return -ENODEV; - - /* - * The Intel pstate driver will be ignored if the platform - * firmware has its own power management modes. - */ - if (intel_pstate_platform_pwr_mgmt_exists()) - return -ENODEV; - - cpu_info = (struct cpu_defaults *)id->driver_data; - - copy_pid_params(&cpu_info->pid_policy); - copy_cpu_funcs(&cpu_info->funcs); - - if (intel_pstate_msrs_not_valid()) - return -ENODEV; - - pr_info("Intel P-state driver initializing.\n"); - - all_cpu_data = vzalloc(sizeof(void *) * num_possible_cpus()); - if (!all_cpu_data) - return -ENOMEM; - - rc = cpufreq_register_driver(&intel_pstate_driver); - if (rc) - goto out; - - intel_pstate_debug_expose_params(); - intel_pstate_sysfs_expose_params(); - - return rc; -out: - get_online_cpus(); - for_each_online_cpu(cpu) { - if (all_cpu_data[cpu]) { - del_timer_sync(&all_cpu_data[cpu]->timer); - kfree(all_cpu_data[cpu]); - } - } - - put_online_cpus(); - vfree(all_cpu_data); - return -ENODEV; -} -device_initcall(intel_pstate_init); - -static int __init intel_pstate_setup(char *str) -{ - if (!str) - return -EINVAL; - - if (!strcmp(str, "disable")) - no_load = 1; - return 0; -} -early_param("intel_pstate", intel_pstate_setup); - -MODULE_AUTHOR("Dirk Brandewie 
dirk.j.brandewie@intel.com"); -MODULE_DESCRIPTION("'intel_pstate' - P state driver Intel Core processors"); -MODULE_LICENSE("GPL"); diff --git a/drivers/cpufreq/pid_ctrl.c b/drivers/cpufreq/pid_ctrl.c new file mode 100644 index 0000000..b273ce1 --- /dev/null +++ b/drivers/cpufreq/pid_ctrl.c @@ -0,0 +1,638 @@ +/* + * pid_ctrl.c: Native P state management for Intel processors + * + * (C) Copyright 2012 Intel Corporation + * Author: Dirk Brandewie dirk.j.brandewie@intel.com + * + * (C) Copyright 2014 Linaro Ltd. + * Author: Ashwin Chaugule ashwin.chaugule@linaro.org + * - Restructured intel_pstate.c into a generic PID controller + * governor and separate backend platform specific driver. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; version 2 + * of the License. + */ + +#include <linux/kernel.h> +#include <linux/kernel_stat.h> +#include <linux/module.h> +#include <linux/ktime.h> +#include <linux/hrtimer.h> +#include <linux/tick.h> +#include <linux/slab.h> +#include <linux/sched.h> +#include <linux/list.h> +#include <linux/cpu.h> +#include <linux/cpufreq.h> +#include <linux/sysfs.h> +#include <linux/types.h> +#include <linux/fs.h> +#include <linux/debugfs.h> +#include <linux/acpi.h> +#include <trace/events/power.h> + +#include <asm/div64.h> + +#include "pid_ctrl.h" + +static struct cpudata **all_cpu_data; +static struct pstate_adjust_policy pid_params; +static struct pstate_funcs pstate_funcs; + +static inline void pid_reset(struct _pid *pid, int setpoint, int busy, + int deadband, int integral) { + pid->setpoint = setpoint; + pid->deadband = deadband; + pid->integral = int_tofp(integral); + pid->last_err = int_tofp(setpoint) - int_tofp(busy); +} + +static inline void pid_p_gain_set(struct _pid *pid, int percent) +{ + pid->p_gain = div_fp(int_tofp(percent), int_tofp(100)); +} + +static inline void pid_i_gain_set(struct _pid *pid, int 
percent) +{ + pid->i_gain = div_fp(int_tofp(percent), int_tofp(100)); +} + +static inline void pid_d_gain_set(struct _pid *pid, int percent) +{ + + pid->d_gain = div_fp(int_tofp(percent), int_tofp(100)); +} + +static signed int pid_calc(struct _pid *pid, int32_t busy) +{ + signed int result; + int32_t pterm, dterm, fp_error; + int32_t integral_limit; + + fp_error = int_tofp(pid->setpoint) - busy; + + if (abs(fp_error) <= int_tofp(pid->deadband)) + return 0; + + pterm = mul_fp(pid->p_gain, fp_error); + + pid->integral += fp_error; + + /* limit the integral term */ + integral_limit = int_tofp(30); + if (pid->integral > integral_limit) + pid->integral = integral_limit; + if (pid->integral < -integral_limit) + pid->integral = -integral_limit; + + dterm = mul_fp(pid->d_gain, fp_error - pid->last_err); + pid->last_err = fp_error; + + result = pterm + mul_fp(pid->integral, pid->i_gain) + dterm; + result = result + (1 << (FRAC_BITS-1)); + return (signed int)fp_toint(result); +} + +static inline void pid_ctrl_busy_pid_reset(struct cpudata *cpu) +{ + pid_p_gain_set(&cpu->pid, pid_params.p_gain_pct); + pid_d_gain_set(&cpu->pid, pid_params.d_gain_pct); + pid_i_gain_set(&cpu->pid, pid_params.i_gain_pct); + + pid_reset(&cpu->pid, + pid_params.setpoint, + 100, + pid_params.deadband, + 0); +} + +static inline void pid_ctrl_reset_all_pid(void) +{ + unsigned int cpu; + + for_each_online_cpu(cpu) { + if (all_cpu_data[cpu]) + pid_ctrl_busy_pid_reset(all_cpu_data[cpu]); + } +} + +/************************** debugfs begin ************************/ +static int pid_param_set(void *data, u64 val) +{ + *(u32 *)data = val; + pid_ctrl_reset_all_pid(); + return 0; +} +static int pid_param_get(void *data, u64 *val) +{ + *val = *(u32 *)data; + return 0; +} +DEFINE_SIMPLE_ATTRIBUTE(fops_pid_param, pid_param_get, + pid_param_set, "%llu\n"); + +struct pid_param { + char *name; + void *value; +}; + +static struct pid_param pid_files[] = { + {"sample_rate_ms", &pid_params.sample_rate_ms}, + 
{"d_gain_pct", &pid_params.d_gain_pct}, + {"i_gain_pct", &pid_params.i_gain_pct}, + {"deadband", &pid_params.deadband}, + {"setpoint", &pid_params.setpoint}, + {"p_gain_pct", &pid_params.p_gain_pct}, + {NULL, NULL} +}; + +static struct dentry *debugfs_parent; +static void pid_ctrl_debug_expose_params(void) +{ + int i = 0; + + debugfs_parent = debugfs_create_dir("pstate_snb", NULL); + if (IS_ERR_OR_NULL(debugfs_parent)) + return; + while (pid_files[i].name) { + debugfs_create_file(pid_files[i].name, 0660, + debugfs_parent, pid_files[i].value, + &fops_pid_param); + i++; + } +} + +/************************** debugfs end ************************/ + +/************************** sysfs begin ************************/ +#define show_one(file_name, object) \ + static ssize_t show_##file_name \ + (struct kobject *kobj, struct attribute *attr, char *buf) \ + { \ + return sprintf(buf, "%u\n", limits.object); \ + } + +static ssize_t store_no_turbo(struct kobject *a, struct attribute *b, + const char *buf, size_t count) +{ + unsigned int input; + int ret; + + ret = sscanf(buf, "%u", &input); + if (ret != 1) + return -EINVAL; + limits.no_turbo = clamp_t(int, input, 0 , 1); + if (limits.turbo_disabled) { + pr_warn("Turbo disabled by BIOS or unavailable on processor\n"); + limits.no_turbo = limits.turbo_disabled; + } + return count; +} + +static ssize_t store_max_perf_pct(struct kobject *a, struct attribute *b, + const char *buf, size_t count) +{ + unsigned int input; + int ret; + + ret = sscanf(buf, "%u", &input); + if (ret != 1) + return -EINVAL; + + limits.max_sysfs_pct = clamp_t(int, input, 0 , 100); + limits.max_perf_pct = min(limits.max_policy_pct, limits.max_sysfs_pct); + limits.max_perf = div_fp(int_tofp(limits.max_perf_pct), int_tofp(100)); + return count; +} + +static ssize_t store_min_perf_pct(struct kobject *a, struct attribute *b, + const char *buf, size_t count) +{ + unsigned int input; + int ret; + + ret = sscanf(buf, "%u", &input); + if (ret != 1) + return -EINVAL; + 
limits.min_perf_pct = clamp_t(int, input, 0 , 100); + limits.min_perf = div_fp(int_tofp(limits.min_perf_pct), int_tofp(100)); + + return count; +} + +show_one(no_turbo, no_turbo); +show_one(max_perf_pct, max_perf_pct); +show_one(min_perf_pct, min_perf_pct); + +define_one_global_rw(no_turbo); +define_one_global_rw(max_perf_pct); +define_one_global_rw(min_perf_pct); + +static struct attribute *pid_ctrl_attributes[] = { + &no_turbo.attr, + &max_perf_pct.attr, + &min_perf_pct.attr, + NULL +}; + +static struct attribute_group pid_ctrl_attr_group = { + .attrs = pid_ctrl_attributes, +}; +static struct kobject *pid_ctrl_kobject; + +static void pid_ctrl_sysfs_expose_params(void) +{ + int rc; + + pid_ctrl_kobject = kobject_create_and_add("pid_ctrl", + &cpu_subsys.dev_root->kobj); + BUG_ON(!pid_ctrl_kobject); + rc = sysfs_create_group(pid_ctrl_kobject, + &pid_ctrl_attr_group); + BUG_ON(rc); +} + +/************************** sysfs end ************************/ + +static void pid_ctrl_get_min_max(struct cpudata *cpu, int *min, int *max) +{ + int max_perf = cpu->pstate.turbo_pstate; + int max_perf_adj; + int min_perf; + + if (limits.no_turbo) + max_perf = cpu->pstate.max_pstate; + + max_perf_adj = fp_toint(mul_fp(int_tofp(max_perf), limits.max_perf)); + *max = clamp_t(int, max_perf_adj, + cpu->pstate.min_pstate, cpu->pstate.turbo_pstate); + + min_perf = fp_toint(mul_fp(int_tofp(max_perf), limits.min_perf)); + *min = clamp_t(int, min_perf, + cpu->pstate.min_pstate, max_perf); +} + +static void pid_ctrl_set_pstate(struct cpudata *cpu, int pstate) +{ + int max_perf, min_perf; + + pid_ctrl_get_min_max(cpu, &min_perf, &max_perf); + + pstate = clamp_t(int, pstate, min_perf, max_perf); + + if (pstate == cpu->pstate.current_pstate) + return; + + trace_cpu_frequency(pstate * 100000, cpu->cpu); + + cpu->pstate.current_pstate = pstate; + + pstate_funcs.set(cpu, pstate); +} + +static inline void pid_ctrl_pstate_increase(struct cpudata *cpu, int steps) +{ + int target; + + target = 
cpu->pstate.current_pstate + steps; + + pid_ctrl_set_pstate(cpu, target); +} + +static inline void pid_ctrl_pstate_decrease(struct cpudata *cpu, int steps) +{ + int target; + + target = cpu->pstate.current_pstate - steps; + pid_ctrl_set_pstate(cpu, target); +} + +static void pid_ctrl_get_cpu_pstates(struct cpudata *cpu) +{ + cpu->pstate.min_pstate = pstate_funcs.get_min(); + cpu->pstate.max_pstate = pstate_funcs.get_max(); + cpu->pstate.turbo_pstate = pstate_funcs.get_turbo(); + + if (pstate_funcs.get_vid) + pstate_funcs.get_vid(cpu); + pid_ctrl_set_pstate(cpu, cpu->pstate.min_pstate); +} + +static inline void pid_ctrl_calc_busy(struct cpudata *cpu) +{ + struct sample *sample = &cpu->sample; + int64_t core_pct; + int32_t rem; + + core_pct = int_tofp(sample->aperf) * int_tofp(100); + core_pct = div_u64_rem(core_pct, int_tofp(sample->mperf), &rem); + + if ((rem << 1) >= int_tofp(sample->mperf)) + core_pct += 1; + + sample->freq = fp_toint( + mul_fp(int_tofp(cpu->pstate.max_pstate * 1000), core_pct)); + + sample->core_pct_busy = (int32_t)core_pct; +} + +static inline void pid_ctrl_sample(struct cpudata *cpu) +{ + u64 aperf, mperf; + + rdmsrl(MSR_IA32_APERF, aperf); + rdmsrl(MSR_IA32_MPERF, mperf); + + aperf = aperf >> FRAC_BITS; + mperf = mperf >> FRAC_BITS; + + cpu->last_sample_time = cpu->sample.time; + cpu->sample.time = ktime_get(); + cpu->sample.aperf = aperf; + cpu->sample.mperf = mperf; + cpu->sample.aperf -= cpu->prev_aperf; + cpu->sample.mperf -= cpu->prev_mperf; + + pid_ctrl_calc_busy(cpu); + + cpu->prev_aperf = aperf; + cpu->prev_mperf = mperf; +} + +static inline void pid_ctrl_set_sample_time(struct cpudata *cpu) +{ + int sample_time, delay; + + sample_time = pid_params.sample_rate_ms; + delay = msecs_to_jiffies(sample_time); + mod_timer_pinned(&cpu->timer, jiffies + delay); +} + +static inline int32_t pid_ctrl_get_scaled_busy(struct cpudata *cpu) +{ + int32_t core_busy, max_pstate, current_pstate, sample_ratio; + + u32 duration_us; + u32 sample_time; + + 
core_busy = cpu->sample.core_pct_busy; + max_pstate = int_tofp(cpu->pstate.max_pstate); + current_pstate = int_tofp(cpu->pstate.current_pstate); + core_busy = mul_fp(core_busy, div_fp(max_pstate, current_pstate)); + + sample_time = (pid_params.sample_rate_ms * USEC_PER_MSEC); + duration_us = (u32) ktime_us_delta(cpu->sample.time, + cpu->last_sample_time); + if (duration_us > sample_time * 3) { + sample_ratio = div_fp(int_tofp(sample_time), + int_tofp(duration_us)); + core_busy = mul_fp(core_busy, sample_ratio); + } + + return core_busy; +} + +static inline void pid_ctrl_adjust_busy_pstate(struct cpudata *cpu) +{ + int32_t busy_scaled; + struct _pid *pid; + signed int ctl = 0; + int steps; + + pid = &cpu->pid; + busy_scaled = pid_ctrl_get_scaled_busy(cpu); + + ctl = pid_calc(pid, busy_scaled); + + steps = abs(ctl); + + if (ctl < 0) + pid_ctrl_pstate_increase(cpu, steps); + else + pid_ctrl_pstate_decrease(cpu, steps); +} + +static void pid_ctrl_timer_func(unsigned long __data) +{ + struct cpudata *cpu = (struct cpudata *) __data; + struct sample *sample; + + pid_ctrl_sample(cpu); + + sample = &cpu->sample; + + pid_ctrl_adjust_busy_pstate(cpu); + + trace_pstate_sample(fp_toint(sample->core_pct_busy), + fp_toint(pid_ctrl_get_scaled_busy(cpu)), + cpu->pstate.current_pstate, + sample->mperf, + sample->aperf, + sample->freq); + + pid_ctrl_set_sample_time(cpu); +} + +static int pid_ctrl_init_cpu(unsigned int cpunum) +{ + struct cpudata *cpu; + + all_cpu_data[cpunum] = kzalloc(sizeof(struct cpudata), GFP_KERNEL); + if (!all_cpu_data[cpunum]) + return -ENOMEM; + + cpu = all_cpu_data[cpunum]; + + cpu->cpu = cpunum; + pid_ctrl_get_cpu_pstates(cpu); + + init_timer_deferrable(&cpu->timer); + cpu->timer.function = pid_ctrl_timer_func; + cpu->timer.data = + (unsigned long)cpu; + cpu->timer.expires = jiffies + HZ/100; + pid_ctrl_busy_pid_reset(cpu); + pid_ctrl_sample(cpu); + + add_timer_on(&cpu->timer, cpunum); + + pr_info("Intel pstate controlling: cpu %d\n", cpunum); + + return 
0; +} + +static unsigned int pid_ctrl_get(unsigned int cpu_num) +{ + struct sample *sample; + struct cpudata *cpu; + + cpu = all_cpu_data[cpu_num]; + if (!cpu) + return 0; + sample = &cpu->sample; + return sample->freq; +} + +static int pid_ctrl_set_policy(struct cpufreq_policy *policy) +{ + struct cpudata *cpu; + + cpu = all_cpu_data[policy->cpu]; + + if (!policy->cpuinfo.max_freq) + return -ENODEV; + + if (policy->policy == CPUFREQ_POLICY_PERFORMANCE) { + limits.min_perf_pct = 100; + limits.min_perf = int_tofp(1); + limits.max_perf_pct = 100; + limits.max_perf = int_tofp(1); + limits.no_turbo = limits.turbo_disabled; + return 0; + } + limits.min_perf_pct = (policy->min * 100) / policy->cpuinfo.max_freq; + limits.min_perf_pct = clamp_t(int, limits.min_perf_pct, 0 , 100); + limits.min_perf = div_fp(int_tofp(limits.min_perf_pct), int_tofp(100)); + + limits.max_policy_pct = policy->max * 100 / policy->cpuinfo.max_freq; + limits.max_policy_pct = clamp_t(int, limits.max_policy_pct, 0 , 100); + limits.max_perf_pct = min(limits.max_policy_pct, limits.max_sysfs_pct); + limits.max_perf = div_fp(int_tofp(limits.max_perf_pct), int_tofp(100)); + + return 0; +} + +static int pid_ctrl_verify_policy(struct cpufreq_policy *policy) +{ + cpufreq_verify_within_cpu_limits(policy); + + if ((policy->policy != CPUFREQ_POLICY_POWERSAVE) && + (policy->policy != CPUFREQ_POLICY_PERFORMANCE)) + return -EINVAL; + + return 0; +} + +static void pid_ctrl_stop_cpu(struct cpufreq_policy *policy) +{ + int cpu_num = policy->cpu; + struct cpudata *cpu = all_cpu_data[cpu_num]; + + pr_info("pid_ctrl CPU %d exiting\n", cpu_num); + + del_timer_sync(&all_cpu_data[cpu_num]->timer); + pid_ctrl_set_pstate(cpu, cpu->pstate.min_pstate); + kfree(all_cpu_data[cpu_num]); + all_cpu_data[cpu_num] = NULL; +} + +static int pid_ctrl_cpu_init(struct cpufreq_policy *policy) +{ + struct cpudata *cpu; + int rc; + u64 misc_en; + + rc = pid_ctrl_init_cpu(policy->cpu); + if (rc) + return rc; + + cpu = 
all_cpu_data[policy->cpu]; + + rdmsrl(MSR_IA32_MISC_ENABLE, misc_en); + if (misc_en & MSR_IA32_MISC_ENABLE_TURBO_DISABLE || + cpu->pstate.max_pstate == cpu->pstate.turbo_pstate) { + limits.turbo_disabled = 1; + limits.no_turbo = 1; + } + if (limits.min_perf_pct == 100 && limits.max_perf_pct == 100) + policy->policy = CPUFREQ_POLICY_PERFORMANCE; + else + policy->policy = CPUFREQ_POLICY_POWERSAVE; + + policy->min = cpu->pstate.min_pstate * 100000; + policy->max = cpu->pstate.turbo_pstate * 100000; + + /* cpuinfo and default policy values */ + policy->cpuinfo.min_freq = cpu->pstate.min_pstate * 100000; + policy->cpuinfo.max_freq = cpu->pstate.turbo_pstate * 100000; + policy->cpuinfo.transition_latency = CPUFREQ_ETERNAL; + cpumask_set_cpu(policy->cpu, policy->cpus); + + return 0; +} + +static struct cpufreq_driver pid_ctrl_driver = { + .flags = CPUFREQ_CONST_LOOPS, + .verify = pid_ctrl_verify_policy, + .setpolicy = pid_ctrl_set_policy, + .get = pid_ctrl_get, + .init = pid_ctrl_cpu_init, + .stop_cpu = pid_ctrl_stop_cpu, + .name = "pid_ctrl", +}; + +void register_pid_params(struct pstate_adjust_policy *policy) +{ + pid_params.sample_rate_ms = policy->sample_rate_ms; + pid_params.p_gain_pct = policy->p_gain_pct; + pid_params.i_gain_pct = policy->i_gain_pct; + pid_params.d_gain_pct = policy->d_gain_pct; + pid_params.deadband = policy->deadband; + pid_params.setpoint = policy->setpoint; +} +EXPORT_SYMBOL_GPL(register_pid_params); + +void register_cpu_funcs(struct pstate_funcs *funcs) +{ + pstate_funcs.get_max = funcs->get_max; + pstate_funcs.get_min = funcs->get_min; + pstate_funcs.get_turbo = funcs->get_turbo; + pstate_funcs.set = funcs->set; + pstate_funcs.get_vid = funcs->get_vid; +} +EXPORT_SYMBOL_GPL(register_cpu_funcs); + +static int __init pid_ctrl_init(void) +{ + int cpu, rc = 0; + + + if (!pstate_funcs.get_max || + !pstate_funcs.get_min || + !pstate_funcs.set || + !pid_params.sample_rate_ms) { + pr_err("Err registering pstate func accessors\n"); + return -ENODEV; + 
} + + pr_info("PID controller driver initializing.\n"); + + all_cpu_data = vzalloc(sizeof(void *) * num_possible_cpus()); + if (!all_cpu_data) + return -ENOMEM; + + rc = cpufreq_register_driver(&pid_ctrl_driver); + if (rc) + goto out; + + pid_ctrl_debug_expose_params(); + pid_ctrl_sysfs_expose_params(); + + return rc; +out: + get_online_cpus(); + for_each_online_cpu(cpu) { + if (all_cpu_data[cpu]) { + del_timer_sync(&all_cpu_data[cpu]->timer); + kfree(all_cpu_data[cpu]); + } + } + + put_online_cpus(); + vfree(all_cpu_data); + return -ENODEV; +} +late_initcall(pid_ctrl_init); + diff --git a/drivers/cpufreq/pid_ctrl.h b/drivers/cpufreq/pid_ctrl.h new file mode 100644 index 0000000..ab56415 --- /dev/null +++ b/drivers/cpufreq/pid_ctrl.h @@ -0,0 +1,120 @@ +/* + * pid_ctrl.h: Native P state management for Intel processors + * + * (C) Copyright 2012 Intel Corporation + * Author: Dirk Brandewie dirk.j.brandewie@intel.com + * + * (C) Copyright 2014 Linaro Ltd. + * Author: Ashwin Chaugule ashwin.chaugule@linaro.org + * - Restructured intel_pstate.c into a generic PID controller + * governor and separate backend platform specific driver. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; version 2 + * of the License. 
+ */ + +#ifndef __PID_CTRL_H_ +#define __PID_CTRL_H_ + +#define FRAC_BITS 8 +#define int_tofp(X) ((int64_t)(X) << FRAC_BITS) +#define fp_toint(X) ((X) >> FRAC_BITS) + +struct sample { + int32_t core_pct_busy; + u64 aperf; + u64 mperf; + int freq; + ktime_t time; +}; + +struct vid_data { + int min; + int max; + int turbo; + int32_t ratio; +}; + +struct _pid { + int setpoint; + int32_t integral; + int32_t p_gain; + int32_t i_gain; + int32_t d_gain; + int deadband; + int32_t last_err; +}; + +struct pstate_data { + int current_pstate; + int min_pstate; + int max_pstate; + int turbo_pstate; +}; + +struct pstate_adjust_policy { + int sample_rate_ms; + int deadband; + int setpoint; + int p_gain_pct; + int d_gain_pct; + int i_gain_pct; +}; + +struct cpudata { + int cpu; + + struct timer_list timer; + + struct pstate_data pstate; + struct vid_data vid; + struct _pid pid; + + ktime_t last_sample_time; + u64 prev_aperf; + u64 prev_mperf; + struct sample sample; +}; + +struct pstate_funcs { + int (*get_max)(void); + int (*get_min)(void); + int (*get_turbo)(void); + void (*set)(struct cpudata*, int pstate); + void (*get_vid)(struct cpudata *); +}; + +struct cpu_defaults { + struct pstate_adjust_policy pid_policy; + struct pstate_funcs funcs; +}; + +struct perf_limits { + int no_turbo; + int turbo_disabled; + int max_perf_pct; + int min_perf_pct; + int32_t max_perf; + int32_t min_perf; + int max_policy_pct; + int max_sysfs_pct; +}; + +extern void register_pid_params(struct pstate_adjust_policy *); +extern void register_cpu_funcs(struct pstate_funcs *); + +extern struct perf_limits limits; + +static inline int32_t mul_fp(int32_t x, int32_t y) +{ + return ((int64_t)x * (int64_t)y) >> FRAC_BITS; +} + +static inline int32_t div_fp(int32_t x, int32_t y) +{ + return div_s64((int64_t)x << FRAC_BITS, (int64_t)y); +} + +#endif /* __PID_CTRL_H_ */
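For readers who want to try the controller arithmetic outside the kernel, here is a minimal user-space sketch of the fixed-point helpers and one PID step, mirroring pid_calc() and pid_reset() from the patch above. The names pid_state, pid_init and pid_step are hypothetical and exist only for this illustration. Plain C division stands in for the kernel's div_s64(), and the code assumes, as the kernel does, that right-shifting a negative value is an arithmetic shift.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Same Q8 fixed-point layout as pid_ctrl.h: 8 fractional bits. */
#define FRAC_BITS 8
#define int_tofp(X) ((int64_t)(X) << FRAC_BITS)
#define fp_toint(X) ((X) >> FRAC_BITS)

static inline int32_t mul_fp(int32_t x, int32_t y)
{
	return (int32_t)(((int64_t)x * (int64_t)y) >> FRAC_BITS);
}

/* Assumption: plain 64-bit division replaces the kernel's div_s64(). */
static inline int32_t div_fp(int32_t x, int32_t y)
{
	assert(y != 0);
	return (int32_t)(((int64_t)x << FRAC_BITS) / (int64_t)y);
}

/* Hypothetical user-space mirror of struct _pid. */
struct pid_state {
	int setpoint;
	int deadband;
	int32_t p_gain, i_gain, d_gain;	/* Q8 gains */
	int32_t integral, last_err;	/* Q8 accumulators */
};

/* Gains are given in percent, as in struct pstate_adjust_policy. */
static void pid_init(struct pid_state *pid, int setpoint, int deadband,
		     int p_pct, int i_pct, int d_pct)
{
	int32_t hundred = (int32_t)int_tofp(100);

	pid->setpoint = setpoint;
	pid->deadband = deadband;
	pid->p_gain = div_fp((int32_t)int_tofp(p_pct), hundred);
	pid->i_gain = div_fp((int32_t)int_tofp(i_pct), hundred);
	pid->d_gain = div_fp((int32_t)int_tofp(d_pct), hundred);
	pid->integral = 0;
	/* pid_reset() seeds the error as if the CPU were 100% busy. */
	pid->last_err = (int32_t)(int_tofp(setpoint) - int_tofp(100));
}

/*
 * One controller step. busy is a Q8 percentage. A negative return
 * value asks for more performance, a positive one for less.
 */
static int pid_step(struct pid_state *pid, int32_t busy)
{
	int32_t fp_error, pterm, dterm, result;
	int32_t integral_limit = (int32_t)int_tofp(30);

	fp_error = (int32_t)int_tofp(pid->setpoint) - busy;
	if (abs(fp_error) <= int_tofp(pid->deadband))
		return 0;

	pterm = mul_fp(pid->p_gain, fp_error);

	/* Accumulate and clamp the integral term. */
	pid->integral += fp_error;
	if (pid->integral > integral_limit)
		pid->integral = integral_limit;
	if (pid->integral < -integral_limit)
		pid->integral = -integral_limit;

	dterm = mul_fp(pid->d_gain, fp_error - pid->last_err);
	pid->last_err = fp_error;

	result = pterm + mul_fp(pid->integral, pid->i_gain) + dterm;
	result += 1 << (FRAC_BITS - 1);	/* add 0.5 before truncating */
	return (int)fp_toint(result);
}
```

With the core_params defaults (setpoint 97, p_gain_pct 20, other gains 0), a fully busy sample yields a small negative step and a half-idle sample yields a larger positive one, which matches how pid_ctrl_adjust_busy_pstate() chooses between increase and decrease.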
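Move the x86-specific turbo detection logic (the MSR_IA32_MISC_ENABLE check) out of the generic PID governor and into the platform-specific PID backend driver. The get_turbo callback becomes optional: when a backend does not provide it, the governor uses the maximum pstate as the turbo pstate.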
Signed-off-by: Ashwin Chaugule <ashwin.chaugule@linaro.org>
---
 drivers/cpufreq/intel_pid_ctrl.c | 15 ++++++++++++++-
 drivers/cpufreq/pid_ctrl.c       | 13 +++++++------
 2 files changed, 21 insertions(+), 7 deletions(-)

diff --git a/drivers/cpufreq/intel_pid_ctrl.c b/drivers/cpufreq/intel_pid_ctrl.c
index ebab074..9ad7d5e 100644
--- a/drivers/cpufreq/intel_pid_ctrl.c
+++ b/drivers/cpufreq/intel_pid_ctrl.c
@@ -58,6 +58,11 @@ static int byt_get_max_pstate(void)
 static int byt_get_turbo_pstate(void)
 {
 	u64 value;
+	u64 misc_en;
+
+	rdmsrl(MSR_IA32_MISC_ENABLE, misc_en);
+	if (misc_en & MSR_IA32_MISC_ENABLE_TURBO_DISABLE)
+		return byt_get_max_pstate();
 
 	rdmsrl(BYT_TURBO_RATIOS, value);
 	return value & 0x7F;
@@ -124,12 +129,20 @@ static int core_get_turbo_pstate(void)
 {
 	u64 value;
 	int nont, ret;
+	u64 misc_en;
 
-	rdmsrl(MSR_NHM_TURBO_RATIO_LIMIT, value);
 	nont = core_get_max_pstate();
+	rdmsrl(MSR_IA32_MISC_ENABLE, misc_en);
+
+	if (misc_en & MSR_IA32_MISC_ENABLE_TURBO_DISABLE)
+		return nont;
+
+	rdmsrl(MSR_NHM_TURBO_RATIO_LIMIT, value);
 	ret = ((value) & 255);
+
 	if (ret <= nont)
 		ret = nont;
+
 	return ret;
 }
 
diff --git a/drivers/cpufreq/pid_ctrl.c b/drivers/cpufreq/pid_ctrl.c
index b273ce1..516b95f 100644
--- a/drivers/cpufreq/pid_ctrl.c
+++ b/drivers/cpufreq/pid_ctrl.c
@@ -313,10 +313,15 @@ static void pid_ctrl_get_cpu_pstates(struct cpudata *cpu)
 {
 	cpu->pstate.min_pstate = pstate_funcs.get_min();
 	cpu->pstate.max_pstate = pstate_funcs.get_max();
-	cpu->pstate.turbo_pstate = pstate_funcs.get_turbo();
+
+	if (pstate_funcs.get_turbo)
+		cpu->pstate.turbo_pstate = pstate_funcs.get_turbo();
+	else
+		cpu->pstate.turbo_pstate = cpu->pstate.max_pstate;
 
 	if (pstate_funcs.get_vid)
 		pstate_funcs.get_vid(cpu);
+
 	pid_ctrl_set_pstate(cpu, cpu->pstate.min_pstate);
 }
 
@@ -532,7 +537,6 @@ static int pid_ctrl_cpu_init(struct cpufreq_policy *policy)
 {
 	struct cpudata *cpu;
 	int rc;
-	u64 misc_en;
 
 	rc = pid_ctrl_init_cpu(policy->cpu);
 	if (rc)
@@ -540,9 +544,7 @@ static int pid_ctrl_cpu_init(struct cpufreq_policy *policy)
 
 	cpu = all_cpu_data[policy->cpu];
 
-	rdmsrl(MSR_IA32_MISC_ENABLE, misc_en);
-	if (misc_en & MSR_IA32_MISC_ENABLE_TURBO_DISABLE ||
-		cpu->pstate.max_pstate == cpu->pstate.turbo_pstate) {
+	if (cpu->pstate.max_pstate == cpu->pstate.turbo_pstate) {
 		limits.turbo_disabled = 1;
 		limits.no_turbo = 1;
 	}
@@ -598,7 +600,6 @@ static int __init pid_ctrl_init(void)
 {
 	int cpu, rc = 0;
 
-
 	if (!pstate_funcs.get_max ||
 	    !pstate_funcs.get_min ||
 	    !pstate_funcs.set ||
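The optional get_turbo handling in the patch above can be sketched in isolation as follows. This is an illustration under stated assumptions rather than the driver code: backend_ops, pstate_range, read_pstate_range and the demo_* callbacks are hypothetical names, and the returned ratios are made up. Unlike the driver, which only ever sets limits.turbo_disabled, the sketch recomputes the flag on every call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for struct pstate_funcs. */
struct backend_ops {
	int (*get_max)(void);
	int (*get_min)(void);
	int (*get_turbo)(void);	/* optional, may be NULL */
};

/* Hypothetical subset of struct pstate_data plus the turbo flag. */
struct pstate_range {
	int min_pstate;
	int max_pstate;
	int turbo_pstate;
	int turbo_disabled;
};

static void read_pstate_range(const struct backend_ops *ops,
			      struct pstate_range *r)
{
	/* get_min and get_max stay mandatory, as in pid_ctrl_init(). */
	assert(ops && ops->get_min && ops->get_max);

	r->min_pstate = ops->get_min();
	r->max_pstate = ops->get_max();

	/* A backend without turbo support simply reports max as turbo. */
	if (ops->get_turbo)
		r->turbo_pstate = ops->get_turbo();
	else
		r->turbo_pstate = r->max_pstate;

	/* The governor no longer reads MSRs; it only compares the two. */
	r->turbo_disabled = (r->turbo_pstate == r->max_pstate);
}

/* Made-up ratios, for illustration only. */
static int demo_get_min(void) { return 8; }
static int demo_get_max(void) { return 24; }
static int demo_get_turbo(void) { return 30; }
```

A backend whose get_turbo returns the maximum pstate when turbo is disabled in MISC_ENABLE (as byt_get_turbo_pstate() and core_get_turbo_pstate() now do) ends up on the same turbo_disabled path as a backend that leaves the callback NULL.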
The Baytrail series needs additional information (the VID data read from BYT_VIDS and BYT_TURBO_VIDS) when setting a target CPU performance value. To keep the PID governor generic, move the VID handling out of struct cpudata and into the platform-specific backend driver.
Signed-off-by: Ashwin Chaugule <ashwin.chaugule@linaro.org>
---
 drivers/cpufreq/intel_pid_ctrl.c | 57 ++++++++++++++++++++++++----------------
 drivers/cpufreq/pid_ctrl.c       |  4 ---
 drivers/cpufreq/pid_ctrl.h       |  9 -------
 3 files changed, 35 insertions(+), 35 deletions(-)

diff --git a/drivers/cpufreq/intel_pid_ctrl.c b/drivers/cpufreq/intel_pid_ctrl.c
index 9ad7d5e..a858981 100644
--- a/drivers/cpufreq/intel_pid_ctrl.c
+++ b/drivers/cpufreq/intel_pid_ctrl.c
@@ -29,6 +29,15 @@
 #define BYT_TURBO_RATIOS	0x66c
 #define BYT_TURBO_VIDS		0x66d
 
+struct vid_data {
+	int min;
+	int max;
+	int turbo;
+	int32_t ratio;
+};
+
+static struct vid_data vid_data;
+
 struct perf_limits limits = {
 	.no_turbo = 0,
 	.max_perf_pct = 100,
@@ -39,6 +48,21 @@ struct perf_limits limits = {
 	.max_sysfs_pct = 100,
 };
 
+static void byt_get_vid(int max, int min)
+{
+	u64 value;
+
+	rdmsrl(BYT_VIDS, value);
+	vid_data.min = int_tofp((value >> 8) & 0x7f);
+	vid_data.max = int_tofp((value >> 16) & 0x7f);
+	vid_data.ratio = div_fp(
+		vid_data.max - vid_data.min,
+		int_tofp(max - min));
+
+	rdmsrl(BYT_TURBO_VIDS, value);
+	vid_data.turbo = value & 0x7f;
+}
+
 static int byt_get_min_pstate(void)
 {
 	u64 value;
@@ -50,9 +74,15 @@ static int byt_get_min_pstate(void)
 static int byt_get_max_pstate(void)
 {
 	u64 value;
+	int max, min;
 
 	rdmsrl(BYT_RATIOS, value);
-	return (value >> 16) & 0x7F;
+	max = (value >> 16) & 0x7F;
+	min = byt_get_min_pstate();
+
+	byt_get_vid(max, min);
+
+	return max;
 }
 
 static int byt_get_turbo_pstate(void)
@@ -78,37 +108,21 @@ static void byt_set_pstate(struct cpudata *cpudata, int pstate)
 	if (limits.no_turbo && !limits.turbo_disabled)
 		val |= (u64)1 << 32;
 
-	vid_fp = cpudata->vid.min + mul_fp(
+	vid_fp = vid_data.min + mul_fp(
 		int_tofp(pstate - cpudata->pstate.min_pstate),
-		cpudata->vid.ratio);
+		vid_data.ratio);
 
-	vid_fp = clamp_t(int32_t, vid_fp, cpudata->vid.min, cpudata->vid.max);
+	vid_fp = clamp_t(int32_t, vid_fp, vid_data.min, vid_data.max);
 	vid = fp_toint(vid_fp);
 
 	if (pstate > cpudata->pstate.max_pstate)
-		vid = cpudata->vid.turbo;
+		vid = vid_data.turbo;
 
 	val |= vid;
 
 	wrmsrl(MSR_IA32_PERF_CTL, val);
 }
 
-static void byt_get_vid(struct cpudata *cpudata)
-{
-	u64 value;
-
-	rdmsrl(BYT_VIDS, value);
-	cpudata->vid.min = int_tofp((value >> 8) & 0x7f);
-	cpudata->vid.max = int_tofp((value >> 16) & 0x7f);
-	cpudata->vid.ratio = div_fp(
-		cpudata->vid.max - cpudata->vid.min,
-		int_tofp(cpudata->pstate.max_pstate -
-			cpudata->pstate.min_pstate));
-
-	rdmsrl(BYT_TURBO_VIDS, value);
-	cpudata->vid.turbo = value & 0x7f;
-}
-
 static int core_get_min_pstate(void)
 {
 	u64 value;
@@ -188,7 +202,6 @@ static struct cpu_defaults byt_params = {
 		.get_min = byt_get_min_pstate,
 		.get_turbo = byt_get_turbo_pstate,
 		.set = byt_set_pstate,
-		.get_vid = byt_get_vid,
 	},
 };
diff --git a/drivers/cpufreq/pid_ctrl.c b/drivers/cpufreq/pid_ctrl.c index 516b95f..8eb9739 100644 --- a/drivers/cpufreq/pid_ctrl.c +++ b/drivers/cpufreq/pid_ctrl.c @@ -319,9 +319,6 @@ static void pid_ctrl_get_cpu_pstates(struct cpudata *cpu) else cpu->pstate.turbo_pstate = cpu->pstate.max_pstate;
- if (pstate_funcs.get_vid) - pstate_funcs.get_vid(cpu); - pid_ctrl_set_pstate(cpu, cpu->pstate.min_pstate); }
@@ -592,7 +589,6 @@ void register_cpu_funcs(struct pstate_funcs *funcs) pstate_funcs.get_min = funcs->get_min; pstate_funcs.get_turbo = funcs->get_turbo; pstate_funcs.set = funcs->set; - pstate_funcs.get_vid = funcs->get_vid; } EXPORT_SYMBOL_GPL(register_cpu_funcs);
diff --git a/drivers/cpufreq/pid_ctrl.h b/drivers/cpufreq/pid_ctrl.h index ab56415..40b352a 100644 --- a/drivers/cpufreq/pid_ctrl.h +++ b/drivers/cpufreq/pid_ctrl.h @@ -30,13 +30,6 @@ struct sample { ktime_t time; };
-struct vid_data { - int min; - int max; - int turbo; - int32_t ratio; -}; - struct _pid { int setpoint; int32_t integral; @@ -69,7 +62,6 @@ struct cpudata { struct timer_list timer;
struct pstate_data pstate; - struct vid_data vid; struct _pid pid;
ktime_t last_sample_time; @@ -83,7 +75,6 @@ struct pstate_funcs { int (*get_min)(void); int (*get_turbo)(void); void (*set)(struct cpudata*, int pstate); - void (*get_vid)(struct cpudata *); };
struct cpu_defaults {
There are cases where it is more efficient to read multiple counters or registers at once. To enable such reads, add new function pointers for the backend drivers. This is especially useful when the registers are memory mapped (e.g. in PCC) and the platform is signalled via a doorbell to update multiple registers at once. Without batched accessors, we would have to ring the doorbell for every register read.
Signed-off-by: Ashwin Chaugule <ashwin.chaugule@linaro.org>
---
 drivers/cpufreq/intel_pid_ctrl.c | 49 ++++++++++++++++++++++++++++++++++------
 drivers/cpufreq/pid_ctrl.c       | 34 ++++++----------------------
 drivers/cpufreq/pid_ctrl.h       |  2 ++
 3 files changed, 51 insertions(+), 34 deletions(-)
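To illustrate the doorbell argument from the commit message, here is a small userspace sketch that counts doorbell rings for per-register reads versus one batched read through a function pointer. Everything in it (mock_platform, ring_doorbell, read_one, batched_get_sample, backend_funcs) is a hypothetical stand-in and not part of the patch or of any PCC driver.

```c
#include <stdint.h>

enum mock_reg { REG_DELIVERED, REG_REFERENCE, REG_COUNT };

/* Hypothetical platform: live counters plus a shared region that is
 * only refreshed when the doorbell is rung. */
struct mock_platform {
	uint64_t live[REG_COUNT];
	uint64_t shared[REG_COUNT];
	unsigned int doorbell_rings;
};

struct sample {
	uint64_t delivered;
	uint64_t reference;
};

/* One ring asks the platform to refresh every shared register. */
static void ring_doorbell(struct mock_platform *p)
{
	int i;

	for (i = 0; i < REG_COUNT; i++)
		p->shared[i] = p->live[i];
	p->doorbell_rings++;
}

/* Per-register accessor: one doorbell ring per value read. */
static uint64_t read_one(struct mock_platform *p, enum mock_reg reg)
{
	ring_doorbell(p);
	return p->shared[reg];
}

/* Batched accessor: one doorbell ring, then read both counters. */
static void batched_get_sample(struct mock_platform *p, struct sample *s)
{
	ring_doorbell(p);
	s->delivered = p->shared[REG_DELIVERED];
	s->reference = p->shared[REG_REFERENCE];
}

/* Backend hook in the spirit of the new get_sample pointer. */
struct backend_funcs {
	void (*get_sample)(struct mock_platform *, struct sample *);
};
```

Reading both counters through read_one() rings the doorbell twice, while a single call through get_sample rings it once.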
diff --git a/drivers/cpufreq/intel_pid_ctrl.c b/drivers/cpufreq/intel_pid_ctrl.c index a858981..73faaf8 100644 --- a/drivers/cpufreq/intel_pid_ctrl.c +++ b/drivers/cpufreq/intel_pid_ctrl.c @@ -37,6 +37,7 @@ struct vid_data { };
static struct vid_data vid_data; +static struct cpu_defaults *cpuinfo;
struct perf_limits limits = { .no_turbo = 0, @@ -171,6 +172,37 @@ static void core_set_pstate(struct cpudata *cpudata, int pstate) wrmsrl_on_cpu(cpudata->cpu, MSR_IA32_PERF_CTL, val); }
+static void intel_get_pstates(struct cpudata *cpu) +{ + cpu->pstate.min_pstate = cpuinfo->funcs.get_min(); + cpu->pstate.max_pstate = cpuinfo->funcs.get_max(); + + if (cpuinfo->funcs.get_turbo) + cpu->pstate.turbo_pstate = cpuinfo->funcs.get_turbo(); + else + cpu->pstate.turbo_pstate = cpu->pstate.max_pstate; +} + +static void intel_get_sample(struct cpudata *cpu) +{ + u64 aperf, mperf; + + rdmsrl(MSR_IA32_APERF, aperf); + rdmsrl(MSR_IA32_MPERF, mperf); + + aperf = aperf >> FRAC_BITS; + mperf = mperf >> FRAC_BITS; + + cpu->sample.aperf = aperf; + cpu->sample.mperf = mperf; + + cpu->sample.aperf -= cpu->prev_aperf; + cpu->sample.mperf -= cpu->prev_mperf; + + cpu->prev_aperf = aperf; + cpu->prev_mperf = mperf; +} + static struct cpu_defaults core_params = { .pid_policy = { .sample_rate_ms = 10, @@ -181,9 +213,11 @@ static struct cpu_defaults core_params = { .i_gain_pct = 0, }, .funcs = { + .get_turbo = core_get_turbo_pstate, + .get_sample = intel_get_sample, + .get_pstates = intel_get_pstates, .get_max = core_get_max_pstate, .get_min = core_get_min_pstate, - .get_turbo = core_get_turbo_pstate, .set = core_set_pstate, }, }; @@ -198,9 +232,11 @@ static struct cpu_defaults byt_params = { .i_gain_pct = 4, }, .funcs = { + .get_turbo = byt_get_turbo_pstate, + .get_sample = intel_get_sample, + .get_pstates = intel_get_pstates, .get_max = byt_get_max_pstate, .get_min = byt_get_min_pstate, - .get_turbo = byt_get_turbo_pstate, .set = byt_set_pstate, }, }; @@ -327,7 +363,6 @@ static inline bool intel_pid_ctrl_platform_pwr_mgmt_exists(void) static int __init intel_pid_ctrl_init(void) { const struct x86_cpu_id *id; - struct cpu_defaults *cpu_info;
if (no_load) return -ENODEV; @@ -343,15 +378,15 @@ static int __init intel_pid_ctrl_init(void) if (intel_pid_ctrl_platform_pwr_mgmt_exists()) return -ENODEV;
- cpu_info = (struct cpu_defaults *)id->driver_data; + cpuinfo = (struct cpu_defaults *)id->driver_data;
- if (intel_pid_ctrl_msrs_not_valid(cpu_info)) + if (intel_pid_ctrl_msrs_not_valid(cpuinfo)) return -ENODEV;
pr_info("Intel PID controller driver initializing.\n");
- register_pid_params(&cpu_info->pid_policy); - register_cpu_funcs(&cpu_info->funcs); + register_pid_params(&cpuinfo->pid_policy); + register_cpu_funcs(&cpuinfo->funcs);
return 0; } diff --git a/drivers/cpufreq/pid_ctrl.c b/drivers/cpufreq/pid_ctrl.c index 8eb9739..a011f05 100644 --- a/drivers/cpufreq/pid_ctrl.c +++ b/drivers/cpufreq/pid_ctrl.c @@ -311,14 +311,7 @@ static inline void pid_ctrl_pstate_decrease(struct cpudata *cpu, int steps)
static void pid_ctrl_get_cpu_pstates(struct cpudata *cpu) { - cpu->pstate.min_pstate = pstate_funcs.get_min(); - cpu->pstate.max_pstate = pstate_funcs.get_max(); - - if (pstate_funcs.get_turbo) - cpu->pstate.turbo_pstate = pstate_funcs.get_turbo(); - else - cpu->pstate.turbo_pstate = cpu->pstate.max_pstate; - + pstate_funcs.get_pstates(cpu); pid_ctrl_set_pstate(cpu, cpu->pstate.min_pstate); }
@@ -342,25 +335,12 @@ static inline void pid_ctrl_calc_busy(struct cpudata *cpu)
static inline void pid_ctrl_sample(struct cpudata *cpu) { - u64 aperf, mperf; - - rdmsrl(MSR_IA32_APERF, aperf); - rdmsrl(MSR_IA32_MPERF, mperf); - - aperf = aperf >> FRAC_BITS; - mperf = mperf >> FRAC_BITS; - cpu->last_sample_time = cpu->sample.time; cpu->sample.time = ktime_get(); - cpu->sample.aperf = aperf; - cpu->sample.mperf = mperf; - cpu->sample.aperf -= cpu->prev_aperf; - cpu->sample.mperf -= cpu->prev_mperf;
- pid_ctrl_calc_busy(cpu); + pstate_funcs.get_sample(cpu);
- cpu->prev_aperf = aperf; - cpu->prev_mperf = mperf; + pid_ctrl_calc_busy(cpu); }
static inline void pid_ctrl_set_sample_time(struct cpudata *cpu) @@ -585,8 +565,8 @@ EXPORT_SYMBOL_GPL(register_pid_params);
void register_cpu_funcs(struct pstate_funcs *funcs) { - pstate_funcs.get_max = funcs->get_max; - pstate_funcs.get_min = funcs->get_min; + pstate_funcs.get_pstates = funcs->get_pstates; + pstate_funcs.get_sample = funcs->get_sample; pstate_funcs.get_turbo = funcs->get_turbo; pstate_funcs.set = funcs->set; } @@ -596,8 +576,8 @@ static int __init pid_ctrl_init(void) { int cpu, rc = 0;
- if (!pstate_funcs.get_max || - !pstate_funcs.get_min || + if (!pstate_funcs.get_pstates || + !pstate_funcs.get_sample || !pstate_funcs.set || !pid_params.sample_rate_ms) { pr_err("Err registering pstate func accessors\n"); diff --git a/drivers/cpufreq/pid_ctrl.h b/drivers/cpufreq/pid_ctrl.h index 40b352a..7f732e6 100644 --- a/drivers/cpufreq/pid_ctrl.h +++ b/drivers/cpufreq/pid_ctrl.h @@ -73,6 +73,8 @@ struct cpudata { struct pstate_funcs { int (*get_max)(void); int (*get_min)(void); + void (*get_pstates)(struct cpudata *); + void (*get_sample)(struct cpudata *); int (*get_turbo)(void); void (*set)(struct cpudata*, int pstate); };
APERF/MPERF are x86-specific names, but they effectively count delivered and reference CPU performance. Rename the corresponding fields to make them more generic.
Signed-off-by: Ashwin Chaugule <ashwin.chaugule@linaro.org>
---
 drivers/cpufreq/intel_pid_ctrl.c | 12 ++++++------
 drivers/cpufreq/pid_ctrl.c       | 10 +++++-----
 drivers/cpufreq/pid_ctrl.h       |  8 ++++----
 3 files changed, 15 insertions(+), 15 deletions(-)
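As a note on how the renamed counters are consumed: the governor turns the delivered/reference deltas into a fixed-point busy percentage, rounded to nearest. The sketch below reproduces that arithmetic in userspace under the assumption that FRAC_BITS is 8; busy_pct_fp is a hypothetical name, and the zero-reference guard is an addition of the sketch.

```c
#include <stdint.h>

#define FRAC_BITS 8   /* assumed fixed-point precision */

static uint64_t int_tofp(uint64_t x) { return x << FRAC_BITS; }

/* Return 100 * delivered / reference as a fixed-point value,
 * rounded to the nearest representable step. */
static uint64_t busy_pct_fp(uint64_t delivered, uint64_t reference)
{
	uint64_t num, den, quot, rem;

	if (reference == 0)   /* guard added for the sketch */
		return 0;

	num = int_tofp(delivered) * int_tofp(100);
	den = int_tofp(reference);
	quot = num / den;
	rem = num % den;

	/* Round up when the remainder is at least half the divisor. */
	if ((rem << 1) >= den)
		quot += 1;

	return quot;
}
```

Shifting the result right by FRAC_BITS yields the integer percentage, so equal delivered and reference deltas give 100.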
diff --git a/drivers/cpufreq/intel_pid_ctrl.c b/drivers/cpufreq/intel_pid_ctrl.c index 73faaf8..e0b007d 100644 --- a/drivers/cpufreq/intel_pid_ctrl.c +++ b/drivers/cpufreq/intel_pid_ctrl.c @@ -193,14 +193,14 @@ static void intel_get_sample(struct cpudata *cpu) aperf = aperf >> FRAC_BITS; mperf = mperf >> FRAC_BITS;
- cpu->sample.aperf = aperf; - cpu->sample.mperf = mperf; + cpu->sample.delivered = aperf; + cpu->sample.reference = mperf;
- cpu->sample.aperf -= cpu->prev_aperf; - cpu->sample.mperf -= cpu->prev_mperf; + cpu->sample.delivered -= cpu->prev_delivered; + cpu->sample.reference -= cpu->prev_reference;
- cpu->prev_aperf = aperf; - cpu->prev_mperf = mperf; + cpu->prev_delivered = aperf; + cpu->prev_reference = mperf; }
static struct cpu_defaults core_params = { diff --git a/drivers/cpufreq/pid_ctrl.c b/drivers/cpufreq/pid_ctrl.c index a011f05..2c197b2 100644 --- a/drivers/cpufreq/pid_ctrl.c +++ b/drivers/cpufreq/pid_ctrl.c @@ -321,10 +321,10 @@ static inline void pid_ctrl_calc_busy(struct cpudata *cpu) int64_t core_pct; int32_t rem;
- core_pct = int_tofp(sample->aperf) * int_tofp(100); - core_pct = div_u64_rem(core_pct, int_tofp(sample->mperf), &rem); + core_pct = int_tofp(sample->delivered) * int_tofp(100); + core_pct = div_u64_rem(core_pct, int_tofp(sample->reference), &rem);
- if ((rem << 1) >= int_tofp(sample->mperf)) + if ((rem << 1) >= int_tofp(sample->reference)) core_pct += 1;
sample->freq = fp_toint( @@ -410,8 +410,8 @@ static void pid_ctrl_timer_func(unsigned long __data) trace_pstate_sample(fp_toint(sample->core_pct_busy), fp_toint(pid_ctrl_get_scaled_busy(cpu)), cpu->pstate.current_pstate, - sample->mperf, - sample->aperf, + sample->reference, + sample->delivered, sample->freq);
pid_ctrl_set_sample_time(cpu); diff --git a/drivers/cpufreq/pid_ctrl.h b/drivers/cpufreq/pid_ctrl.h index 7f732e6..65f08bc 100644 --- a/drivers/cpufreq/pid_ctrl.h +++ b/drivers/cpufreq/pid_ctrl.h @@ -24,8 +24,8 @@
struct sample { int32_t core_pct_busy; - u64 aperf; - u64 mperf; + u64 delivered; + u64 reference; int freq; ktime_t time; }; @@ -65,8 +65,8 @@ struct cpudata { struct _pid pid;
ktime_t last_sample_time; - u64 prev_aperf; - u64 prev_mperf; + u64 prev_delivered; + u64 prev_reference; struct sample sample; };
CPPC (Collaborative Processor Performance Control) is defined in the ACPI 5.0+ spec. It is a method for controlling CPU performance on a continuous scale using performance feedback registers. The PID governor concepts of CPU performance management map cleanly onto CPPC. This patch implements the PID backend interfaces using CPPC semantics.
Signed-off-by: Ashwin Chaugule <ashwin.chaugule@linaro.org>
---
 drivers/cpufreq/Kconfig         |  10 +
 drivers/cpufreq/Makefile        |   1 +
 drivers/cpufreq/cppc_pid_ctrl.c | 406 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 417 insertions(+)
 create mode 100644 drivers/cpufreq/cppc_pid_ctrl.c
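To make the continuous-scale request model concrete, the following userspace sketch writes a desired performance value into a mocked communication region and then issues one write command. The mock_* names are hypothetical, and clamping to the lowest/highest bounds inside the setter is an assumption of the sketch; the patch's cppc_set_pstate() writes the value it is given.

```c
#include <stdint.h>

/* Hypothetical subset of the per-CPU CPC registers. */
enum mock_cpc_reg {
	MOCK_HIGHEST_PERF,
	MOCK_LOWEST_PERF,
	MOCK_DESIRED_PERF,
	MOCK_REG_COUNT
};

/* Stand-in for the PCC communication space shared with the platform. */
struct mock_comm_space {
	uint64_t regs[MOCK_REG_COUNT];
	unsigned int write_cmds;   /* counts mocked PCC write commands */
};

static uint64_t clamp_u64(uint64_t v, uint64_t lo, uint64_t hi)
{
	if (v < lo)
		return lo;
	if (v > hi)
		return hi;
	return v;
}

/* Request a performance level on the abstract scale.
 * Returns the value actually written after clamping. */
static uint64_t mock_set_desired(struct mock_comm_space *cs, uint64_t desired)
{
	uint64_t req = clamp_u64(desired,
				 cs->regs[MOCK_LOWEST_PERF],
				 cs->regs[MOCK_HIGHEST_PERF]);

	cs->regs[MOCK_DESIRED_PERF] = req;
	cs->write_cmds++;   /* stands in for a PCC write command plus doorbell */
	return req;
}
```

Any value between the lowest and highest performance levels is a valid request here, which is the main difference from indexing into a discrete P-state table.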
diff --git a/drivers/cpufreq/Kconfig b/drivers/cpufreq/Kconfig index bbc19ac..90b71d3 100644 --- a/drivers/cpufreq/Kconfig +++ b/drivers/cpufreq/Kconfig @@ -205,6 +205,16 @@ config PID_CTRL governor requires platform specific backend drivers to access counters. See Documentation/cpu-freq/pid_ctrl.txt
+config CPPC_PID_CTRL + bool "PID CPPC backend driver" + depends on ACPI_PCC && PID_CTRL + help + CPPC is Collaborative Processor Performance Control. It allows the OS + to request CPU performance with an abstract metric and lets the platform + (e.g. BMC) interpret and optimize it for power and performance in a + platform specific manner. This driver implements the backend interfaces + using CPPC semantics for the PID governor. + menu "x86 CPU frequency scaling drivers" depends on X86 source "drivers/cpufreq/Kconfig.x86" diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile index 6d1a4d0..0778013 100644 --- a/drivers/cpufreq/Makefile +++ b/drivers/cpufreq/Makefile @@ -41,6 +41,7 @@ obj-$(CONFIG_X86_P4_CLOCKMOD) += p4-clockmod.o obj-$(CONFIG_X86_CPUFREQ_NFORCE2) += cpufreq-nforce2.o obj-$(CONFIG_PID_CTRL) += pid_ctrl.o obj-$(CONFIG_X86_INTEL_PSTATE) += intel_pid_ctrl.o +obj-$(CONFIG_CPPC_PID_CTRL) += cppc_pid_ctrl.o obj-$(CONFIG_X86_AMD_FREQ_SENSITIVITY) += amd_freq_sensitivity.o
################################################################################## diff --git a/drivers/cpufreq/cppc_pid_ctrl.c b/drivers/cpufreq/cppc_pid_ctrl.c new file mode 100644 index 0000000..53ea5e0 --- /dev/null +++ b/drivers/cpufreq/cppc_pid_ctrl.c @@ -0,0 +1,406 @@ +/* + * Copyright (C) 2014 Linaro Ltd. + * Author: Ashwin Chaugule ashwin.chaugule@linaro.org + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * PID algo bits are from intel_pstate.c and modified to use CPPC + * accessors. + * + */ + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#include <linux/cpu.h> +#include <linux/types.h> +#include <linux/acpi.h> +#include <linux/errno.h> + +#include <acpi/processor.h> +#include <acpi/actypes.h> + +#include "pid_ctrl.h" + +#define CPPC_EN 1 +#define PCC_CMD_COMPLETE 1 +#define MAX_CPC_REG_ENT 19 + +static u64 pcc_comm_base_addr; +static void __iomem *comm_base_addr; +static s8 pcc_subspace_idx = -1; +extern int get_pcc_comm_channel(u32 ss_idx, u64* addr, int *len); +extern u16 send_pcc_cmd(u8 cmd, u8 sci, u32 ss_idx, u64 * __iomem base_addr); + +/* PCC Commands used by CPPC */ +enum cppc_ppc_cmds { + PCC_CMD_READ, + PCC_CMD_WRITE, + RESERVED, +}; + +/* These are indexes into the per-cpu cpc_regs[]. Order is important. 
*/ +enum cppc_pcc_regs { + HIGHEST_PERF, /* Highest Performance */ + NOMINAL_PERF, /* Nominal Performance */ + LOW_NON_LINEAR_PERF, /* Lowest Nonlinear Performance */ + LOWEST_PERF, /* Lowest Performance */ + GUARANTEED_PERF, /* Guaranteed Performance Register */ + DESIRED_PERF, /* Desired Performance Register */ + MIN_PERF, /* Minimum Performance Register */ + MAX_PERF, /* Maximum Performance Register */ + PERF_REDUC_TOLERANCE, /* Performance Reduction Tolerance Register */ + TIME_WINDOW, /* Time Window Register */ + CTR_WRAP_TIME, /* Counter Wraparound Time */ + REFERENCE_CTR, /* Reference Counter Register */ + DELIVERED_CTR, /* Delivered Counter Register */ + PERF_LIMITED, /* Performance Limited Register */ + ENABLE, /* Enable Register */ + AUTO_SEL_ENABLE, /* Autonomous Selection Enable */ + AUTO_ACT_WINDOW, /* Autonomous Activity Window */ + ENERGY_PERF, /* Energy Performance Preference Register */ + REFERENCE_PERF, /* Reference Performance */ +}; + +/* Each register in the CPC table has the following format */ +static struct cpc_register_resource { + u8 descriptor; + u16 length; + u8 space_id; + u8 bit_width; + u8 bit_offset; + u8 access_width; + u64 __iomem address; +} __attribute__ ((packed)); + +static struct cpc_desc { + unsigned int num_entries; + unsigned int version; + struct cpc_register_resource cpc_regs[MAX_CPC_REG_ENT]; +}; +static DEFINE_PER_CPU(struct cpc_desc *, cpc_desc_ptr); + +struct perf_limits limits = { + .no_turbo = 0, + .max_perf_pct = 100, + .max_perf = int_tofp(1), + .min_perf_pct = 0, + .min_perf = 0, + .max_policy_pct = 100, + .max_sysfs_pct = 100, +}; + +u64 cpc_read64(struct cpc_register_resource *reg, void __iomem *base_addr) +{ + u64 err = 0; + u64 val; + + switch (reg->space_id) { + case ACPI_ADR_SPACE_PLATFORM_COMM: + err = readq((void *) (reg->address + *(u64 *)base_addr)); + break; + case ACPI_ADR_SPACE_FIXED_HARDWARE: + rdmsrl(reg->address, val); + return val; + break; + default: + pr_err("unknown space_id detected in cpc 
reg: %d\n", reg->space_id); + break; + } + + return err; +} + +int cpc_write64(u64 val, struct cpc_register_resource *reg, void __iomem *base_addr) +{ + unsigned int err = 0; + + switch (reg->space_id) { + case ACPI_ADR_SPACE_PLATFORM_COMM: + writeq(val, (void *)(reg->address + *(u64 *)base_addr)); + break; + case ACPI_ADR_SPACE_FIXED_HARDWARE: + wrmsrl(reg->address, val); + break; + default: + pr_err("unknown space_id detected in cpc reg: %d\n", reg->space_id); + break; + } + + return err; +} + +static int cppc_processor_probe(void) +{ + struct acpi_buffer output = {ACPI_ALLOCATE_BUFFER, NULL}; + union acpi_object *out_obj, *cpc_obj; + struct cpc_desc *current_cpu_cpc; + struct cpc_register_resource *gas_t; + char proc_name[11]; + unsigned int num_ent, ret = 0, i, cpu, len; + acpi_handle handle; + acpi_status status; + + /*Parse the ACPI _CPC table for each CPU. */ + for_each_possible_cpu(cpu) { + sprintf(proc_name, "\_PR.CPU%d", cpu); + + status = acpi_get_handle(NULL, proc_name, &handle); + if (ACPI_FAILURE(status)) { + ret = -ENODEV; + goto out_free; + } + + if (!acpi_has_method(handle, "_CPC")) { + ret = -ENODEV; + goto out_free; + } + + status = acpi_evaluate_object(handle, "_CPC", NULL, &output); + if (ACPI_FAILURE(status)) { + ret = -ENODEV; + goto out_free; + } + + out_obj = (union acpi_object *) output.pointer; + if (out_obj->type != ACPI_TYPE_PACKAGE) { + ret = -ENODEV; + goto out_free; + } + + current_cpu_cpc = kzalloc(sizeof(struct cpc_desc), GFP_KERNEL); + if (!current_cpu_cpc) { + pr_err("Could not allocate per cpu CPC descriptors\n"); + return -ENOMEM; + } + num_ent = out_obj->package.count; + current_cpu_cpc->num_entries = num_ent; + + pr_debug("num_ent in CPC table:%d\n", num_ent); + + /* Iterate through each entry in _CPC */ + for (i = 2; i < num_ent; i++) { + cpc_obj = &out_obj->package.elements[i]; + + if (cpc_obj->type != ACPI_TYPE_BUFFER) { + pr_err("Malformed PCC entry in CPC table\n"); + ret = -EINVAL; + goto out_free; + } + + gas_t = 
(struct cpc_register_resource *) cpc_obj->buffer.pointer; + + if (gas_t->space_id == ACPI_ADR_SPACE_PLATFORM_COMM) { + if (pcc_subspace_idx < 0) + pcc_subspace_idx = gas_t->access_width; + } + + current_cpu_cpc->cpc_regs[i-2] = (struct cpc_register_resource) { + .space_id = gas_t->space_id, + .length = gas_t->length, + .bit_width = gas_t->bit_width, + .bit_offset = gas_t->bit_offset, + .address = gas_t->address, + .access_width = gas_t->access_width, + }; + } + per_cpu(cpc_desc_ptr, cpu) = current_cpu_cpc; + } + + pr_debug("Completed parsing , now onto PCC init\n"); + + if (pcc_subspace_idx >= 0) { + ret = get_pcc_comm_channel(pcc_subspace_idx, &pcc_comm_base_addr, &len); + if (ret) { + pr_err("No PCC Communication Channel found\n"); + ret = -ENODEV; + goto out_free; + } + + //XXX: PCC HACK: The PCC hack in drivers/acpi/pcc.c just + //returns a kmallocd address, so no point in ioremapping + //it here. Instead we'll just use it directly. + //Normally, we'd ioremap the address specified in the PCCT + //header for this PCC subspace. 
+ + comm_base_addr = &pcc_comm_base_addr; + + // comm_base_addr = ioremap_nocache(pcc_comm_base_addr, len); + + // if (!comm_base_addr) { + // pr_err("ioremapping pcc comm space failed\n"); + // ret = -ENOMEM; + // goto out_free; + // } + pr_debug("PCC ioremapd space:%p, PCCT addr: %lld\n", comm_base_addr, pcc_comm_base_addr); + + } else { + pr_err("No PCC subspace detected in any CPC structure!\n"); + ret = -EINVAL; + goto out_free; + } + + /* Everything looks okay */ + pr_info("Successfully parsed all CPC structs\n"); + pr_debug("Enable CPPC_EN\n"); + /*XXX: Send write cmd to enable CPPC */ + + kfree(output.pointer); + return 0; + +out_free: + for_each_online_cpu(cpu) { + current_cpu_cpc = per_cpu(cpc_desc_ptr, cpu); + if (current_cpu_cpc) + kfree(current_cpu_cpc); + } + + kfree(output.pointer); + return -ENODEV; +} + +static void cppc_get_pstates(struct cpudata *cpu) +{ + unsigned int cpunum = cpu->cpu; + struct cpc_desc *cpc_desc = per_cpu(cpc_desc_ptr, cpunum); + struct cpc_register_resource *highest_reg, *lowest_reg; + int status; + + if (!cpc_desc) { + pr_err("No CPC descriptor for CPU:%d\n", cpunum); + return; + } + + pr_debug("Sending PCC READ to update COMM space\n"); + status = send_pcc_cmd(PCC_CMD_READ, 0, pcc_subspace_idx, + comm_base_addr); + + if (!(status & PCC_CMD_COMPLETE)) { + pr_err("Err updating PCC comm space\n"); + return; + } + + highest_reg = &cpc_desc->cpc_regs[HIGHEST_PERF]; + lowest_reg = &cpc_desc->cpc_regs[LOWEST_PERF]; + + cpu->pstate.max_pstate = cpc_read64(highest_reg, comm_base_addr); + cpu->pstate.min_pstate = cpc_read64(lowest_reg, comm_base_addr); + + if (!cpu->pstate.max_pstate || !cpu->pstate.min_pstate) { + pr_err("Err reading CPU performance limits\n"); + return; + } + + cpu->pstate.turbo_pstate = cpu->pstate.max_pstate; +} + +static void cppc_get_sample(struct cpudata *cpu) +{ + unsigned int cpunum = cpu->cpu; + struct cpc_desc *cpc_desc = per_cpu(cpc_desc_ptr, cpunum); + struct cpc_register_resource *delivered_reg, 
*reference_reg; + int status; + u64 delivered, reference; + + if (!cpc_desc) { + pr_err("No CPC descriptor for CPU:%d\n", cpunum); + return; + } + + pr_debug("Sending PCC READ to update COMM space\n"); + status = send_pcc_cmd(PCC_CMD_READ, 0, pcc_subspace_idx, + comm_base_addr); + + if (!(status & PCC_CMD_COMPLETE)) { + pr_err("Err updating PCC comm space\n"); + return; + } + + delivered_reg = &cpc_desc->cpc_regs[DELIVERED_CTR]; + reference_reg = &cpc_desc->cpc_regs[REFERENCE_CTR]; + + delivered = cpc_read64(delivered_reg, comm_base_addr); + reference = cpc_read64(reference_reg, comm_base_addr); + + if (!delivered || !reference) { + pr_err("Err reading CPU counters\n"); + return; + } + + delivered = delivered >> FRAC_BITS; + reference = reference >> FRAC_BITS; + + cpu->sample.delivered = delivered; + cpu->sample.reference = reference; + + cpu->sample.delivered -= cpu->prev_delivered; + cpu->sample.reference -= cpu->prev_reference; + + cpu->prev_delivered = delivered; + cpu->prev_reference = reference; +} + +static void cppc_set_pstate(struct cpudata *cpudata, int pstate) +{ + unsigned int cpu = cpudata->cpu; + struct cpc_desc *cpc_desc = per_cpu(cpc_desc_ptr, cpu); + struct cpc_register_resource *desired_reg; + int status; + + if (!cpc_desc) { + pr_err("No CPC descriptor for CPU:%d\n", cpu); + return; + } + + desired_reg = &cpc_desc->cpc_regs[DESIRED_PERF]; + cpc_write64(pstate, desired_reg, comm_base_addr); + + pr_debug("Sending PCC WRITE to update COMM space\n"); + status = send_pcc_cmd(PCC_CMD_WRITE, 0, pcc_subspace_idx, + comm_base_addr); + + if (!(status & PCC_CMD_COMPLETE)) { + pr_err("Err updating PCC comm space\n"); + return; + } +} + +static struct cpu_defaults cppc_params = { + .pid_policy = { + .sample_rate_ms = 10, + .deadband = 0, + .setpoint = 97, + .p_gain_pct = 14, + .d_gain_pct = 0, + .i_gain_pct = 4, + }, + .funcs = { + .get_sample = cppc_get_sample, + .get_pstates = cppc_get_pstates, + .set = cppc_set_pstate, + }, +}; + +static int __init 
cppc_init(void) +{ + if(acpi_disabled || cppc_processor_probe()) { + pr_err("Err initializing CPC structures or ACPI is disabled\n"); + return -ENODEV; + } + + pr_info("CPPC PID driver initializing.\n"); + + register_pid_params(&cppc_params.pid_policy); + register_cpu_funcs(&cppc_params.funcs); + + return 0; +} +device_initcall(cppc_init); +
Hi Ashwin,
I think the CPPC based driver should be a separate driver.
We made the conscious decision to not use any of the ACPI mechanisms to enumerate or control P-state selection. Experience over the years has shown that the quality and accuracy of the BIOS/ACPI implementations vary widely across OEMs and across platform types from a single OEM. Features that always work on a server platform from a given OEM may not work, or may provide bad information, on client platforms, for example.
Another reason for doing intel_pstate was to be able to land Intel-specific features and fixes without breaking other architectures as the power management capabilities of the platform evolve. As processors that support Hardware P-states (HWP), as described in section 14.4 of the current SDM, come onto the market, intel_pstate will change to doing little other than enabling HWP and providing an interface to forward user configuration requests to the processor if the user chooses to enable HWP; otherwise the current mechanisms will be used. This is why the intel_pstate sysfs interface is the way it is: to be able to map cleanly to HWP and provide an abstract interface going forward.
Having separate drivers allows the system integrator/user to select the most appropriate mechanism for their system.
--Dirk
On 09/09/2014 03:12 PM, Ashwin Chaugule wrote:
This patchset introduces CPPC (Collaborative Processor Performance Control) as a backend to the PID governor. The PID governor from intel_pstate.c maps cleanly onto some CPPC interfaces; e.g., CPU performance requests are made on a continuous scale rather than as discrete P-state levels. The CPU performance feedback over an interval is gauged using platform-specific counters, which are also described by CPPC.
Although CPPC describes several other registers to provide more hints to the platform, Linux as of today does not have the infrastructure to make use of those registers. Some of the CPPC-specific information could be made available from the scheduler as part of the CPUfreq and scheduler integration work. Until then, PID can be used as the front end for CPPC.
Beyond code restructuring and renaming, this patchset does not change the logic from the intel_pstate.c driver. Kernel compilation times were compared across the original intel_pstate.c, the Intel backend (intel_pid_ctrl.c), and the CPPC backend, and no significant overhead was noticed.
Testing was performed on a Thinkpad X240 laptop.
PID_CTRL + INTEL_PSTATE:
real 5m37.742s user 18m42.575s sys 1m0.521s
PID_CTRL + CPPC_PID_CTRL:
real 5m48.321s user 18m24.487s sys 0m59.327s
ORIGINAL INTEL_PSTATE:
real 5m40.642s user 18m37.411s sys 1m0.185s
The complete patchset including the PCC hacks used for testing is available in [4].
Changes since V0: [1]
- Split intel_pstate.c into a generic PID governor and platform specific backend.
- Add CPPC accessors as PID backend.
CPPC:
CPPC (Collaborative Processor Performance Control) is a new way to control CPU performance using an abstract continuous scale rather than a discretized P-state scale that is tied to CPU frequency only. It is defined in the ACPI 5.0+ spec. In brief, the basic operation involves:
- OS makes a CPU performance request. (Can provide min and max tolerable bounds)
- Platform (such as BMC) is free to optimize request within requested bounds depending on power/thermal budgets etc.
- Platform conveys its decision back to OS
The communication between OS and platform occurs through another medium called the Platform Communication Channel (PCC). This is a generic mailbox-like mechanism which includes doorbell semantics to indicate register updates. The PCC driver is being discussed in a separate patchset [3] and is not included here, since CPPC is only one client of PCC.
Finer details about the PCC and CPPC spec are available in the latest ACPI 5.1 specification.[2]
[1] - http://lwn.net/Articles/608715/
[2] - http://www.uefi.org/sites/default/files/resources/ACPI_5_1release.pdf
[3] - http://comments.gmane.org/gmane.linux.acpi.devel/70299
[4] - http://git.linaro.org/people/ashwin.chaugule/leg-kernel.git/shortlog/refs/he...
Ashwin Chaugule (6):
  PID Controller governor
  PID: Move Turbo detection into backend driver
  PID: Move Baytrail specific accessors into backend driver
  PID: Add new function pointers to read multiple registers
  PID: Rename counters to make them more generic
  PID: Add CPPC (Collaborative Processor Performance) backend driver
 Documentation/cpu-freq/intel-pstate.txt |   43 --
 Documentation/cpu-freq/pid_ctrl.txt     |   41 ++
 drivers/cpufreq/Kconfig                 |   19 +
 drivers/cpufreq/Kconfig.x86             |    2 +-
 drivers/cpufreq/Makefile                |    4 +-
 drivers/cpufreq/cppc_pid_ctrl.c         |  406 +++++++++++++
 drivers/cpufreq/intel_pid_ctrl.c        |  408 +++++++++++++
 drivers/cpufreq/intel_pstate.c          | 1012 -------------------------------
 drivers/cpufreq/pid_ctrl.c              |  615 +++++++++++++++++++
 drivers/cpufreq/pid_ctrl.h              |  113 ++++
 10 files changed, 1606 insertions(+), 1057 deletions(-)
 delete mode 100644 Documentation/cpu-freq/intel-pstate.txt
 create mode 100644 Documentation/cpu-freq/pid_ctrl.txt
 create mode 100644 drivers/cpufreq/cppc_pid_ctrl.c
 create mode 100644 drivers/cpufreq/intel_pid_ctrl.c
 delete mode 100644 drivers/cpufreq/intel_pstate.c
 create mode 100644 drivers/cpufreq/pid_ctrl.c
 create mode 100644 drivers/cpufreq/pid_ctrl.h
On 10 September 2014 11:44, Dirk Brandewie <dirk.brandewie@gmail.com> wrote:
Hi Ashwin,
Hi Dirk,
I think the CPPC based driver should be a separate driver.
With the current split I think you will still be able to maintain Intel specific changes for the future in the backend driver. The PID algorithm seems platform independent anyway and the PID knobs are exported to userspace for platform specific tuning. The Intel backend driver should be unaffected by the CPPC (ACPI) backend. We can also make them mutually exclusive at runtime.
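One way the runtime mutual exclusion mentioned above could look is a registration hook that accepts only the first backend. This is a minimal userspace sketch under stated assumptions: register_backend, backend_funcs, and active_backend are hypothetical names, the patchset's register_cpu_funcs() returns void and performs no such check, and no locking is shown.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical backend descriptor; only the fields the sketch needs. */
struct backend_funcs {
	const char *name;
	void (*set)(int pstate);
};

/* The single backend currently bound to the governor, if any. */
static const struct backend_funcs *active_backend;

/* Placeholder setter so the example descriptors are complete. */
static void noop_set(int pstate)
{
	(void)pstate;
}

/* Accept the first valid backend and reject any later one.
 * A real driver would need locking around this check. */
static int register_backend(const struct backend_funcs *funcs)
{
	if (!funcs || !funcs->set)
		return -EINVAL;
	if (active_backend)
		return -EBUSY;

	active_backend = funcs;
	return 0;
}
```

With this shape, whichever backend initializes first wins and the other backs off with -EBUSY, so the choice between them still depends on init order or an explicit policy.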
Or are you suggesting using PID + CPPC as another driver? IIUC, that would lead to a lot of redundancy.
Cheers, Ashwin
On 09/10/2014 09:11 AM, Ashwin Chaugule wrote:
With the current split I think you will still be able to maintain Intel specific changes for the future in the backend driver. The PID algorithm seems platform independent anyway and the PID knobs are exported to userspace for platform specific tuning. The Intel backend driver should be unaffected by the CPPC (ACPI) backend. We can also make them mutually exclusive at runtime.
We could make it runtime selectable whether to use CPPC or the native mechanisms for P state enumeration and selection but we would get into an awful black/white list situation that would not make anyone happy.
Using CPPC on Intel platforms implies using HWP which is already planned for in intel_pstate. I am not aware of any effort to support CPPC on Intel platforms that do not support HWP. For Intel platforms using CPPC is NOT needed or desirable IMHO. We had many conversations over many months while CPPC was being defined and made the decision to not use this mechanism on Intel Linux platforms.
For other platforms that plan on conforming to ACPI 5.x with respect to P state enumeration and selection, I would like to leave it to them to herd all the cats at the OEMs to get CPPC correct on all their platforms.
Or are you suggesting using PID + CPPC as another driver? IIUC, that would lead to a lot of redundancy.
The redundancy is actually pretty small IMHO. If you take out the enumeration/init code, the code shared at runtime is pretty small: sample/calc_busy/PID.
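To make the size of that shared runtime path concrete, here is a rough userspace sketch of a calc_busy step followed by one PID iteration. Every name in it (pid_sketch, calc_busy_pct, pid_step, the gain fields) is made up for illustration and is not taken from intel_pstate.c or pid_ctrl.c; the sign convention (a positive result means "ask for more performance") is also just an assumption of this sketch.

```c
#include <stdint.h>

/* Hypothetical state for one CPU; not the layout used by the driver. */
struct pid_sketch {
	int32_t setpoint;    /* target busy percentage */
	int32_t p_gain_pct;  /* proportional gain, in percent */
	int32_t i_gain_pct;  /* integral gain, in percent */
	int32_t d_gain_pct;  /* derivative gain, in percent */
	int32_t integral;    /* accumulated error */
	int32_t last_err;    /* error seen on the previous sample */
};

/* calc_busy: turn counter deltas from one sample interval into a percentage. */
static int32_t calc_busy_pct(uint64_t delivered_delta, uint64_t reference_delta)
{
	if (reference_delta == 0)
		return 0; /* no reference ticks elapsed, treat the CPU as idle */
	return (int32_t)((delivered_delta * 100) / reference_delta);
}

/*
 * PID: one iteration. Returns a signed adjustment to the performance
 * request; positive means the CPU was busier than the setpoint.
 */
static int32_t pid_step(struct pid_sketch *pid, int32_t busy_pct)
{
	int32_t err = busy_pct - pid->setpoint;
	int32_t deriv = err - pid->last_err;

	pid->integral += err;
	pid->last_err = err;

	return (pid->p_gain_pct * err +
		pid->i_gain_pct * pid->integral +
		pid->d_gain_pct * deriv) / 100;
}
```

Nothing in this loop depends on how the counters were read or how the resulting request is written back, which is the part that differs between platforms.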
Cheers, Ashwin
On 10 September 2014 13:31, Dirk Brandewie dirk.brandewie@gmail.com wrote:
We could make it runtime selectable whether to use CPPC or the native mechanisms for P state enumeration and selection but we would get into an awful black/white list situation that would not make anyone happy.
Using CPPC on Intel platforms implies using HWP which is already planned for in intel_pstate. I am not aware of any effort to support CPPC on Intel platforms that do not support HWP. For Intel platforms using CPPC is NOT needed or desirable IMHO. We had many conversations over many months while CPPC was being defined and made the decision to not use this mechanism on Intel Linux platforms.
Ok. There is no intention to force CPPC usage on Intel platforms. We could make the CPPC backend unavailable on Intel platforms at compile time. The idea behind this patchset is mainly to separate out the PID algorithm so it can be used by anyone who can support it, with or without CPPC. For ARM64, using CPPC is useful to unify all the ARM implementations, which choose to design counters as either memory mapped or sysregs or whatever, while keeping the PID algorithm the same.
Or are you suggesting using PID + CPPC as another driver? IIUC, that would lead to a lot of redundancy.
The redundancy is actually pretty small IMHO. If you take out the enumeration/init code, the code shared at runtime is pretty small: sample/calc_busy/PID.
This is exactly all there is in pid_ctrl.c. If HWP is enabled, do you plan to modify these generic PID bits in a platform specific manner? If not, then it seems that the HWP accessors could live in the intel pid backend driver?
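As a rough illustration of the split being discussed, the generic PID code could talk to a platform backend through a small table of accessors. This is only a sketch: pid_backend_ops, pid_register_backend and the stub backend are hypothetical names, and the actual interface in pid_ctrl.h may look quite different.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical accessor table a backend (Intel, CPPC, ...) would fill in. */
struct pid_backend_ops {
	const char *name;
	/* Read the feedback counters for one CPU. */
	int (*read_counters)(unsigned int cpu, uint64_t *delivered,
			     uint64_t *reference);
	/* Write a performance request for one CPU. */
	int (*set_perf)(unsigned int cpu, uint32_t perf);
};

static const struct pid_backend_ops *active_backend;

/*
 * Accept at most one backend, so backends stay mutually exclusive
 * at runtime. Returns 0 on success, a negative value otherwise.
 */
static int pid_register_backend(const struct pid_backend_ops *ops)
{
	if (ops == NULL || ops->read_counters == NULL || ops->set_perf == NULL)
		return -1; /* incomplete backend */
	if (active_backend != NULL)
		return -2; /* another backend already owns the governor */
	active_backend = ops;
	return 0;
}

/* Stub backend standing in for platform code; the values are made up. */
static uint32_t stub_last_perf;

static int stub_read_counters(unsigned int cpu, uint64_t *delivered,
			      uint64_t *reference)
{
	(void)cpu;
	*delivered = 80;
	*reference = 100;
	return 0;
}

static int stub_set_perf(unsigned int cpu, uint32_t perf)
{
	(void)cpu;
	stub_last_perf = perf;
	return 0;
}

static const struct pid_backend_ops stub_backend = {
	.name = "stub",
	.read_counters = stub_read_counters,
	.set_perf = stub_set_perf,
};
```

With something along these lines, HWP specific accessors would be one more set of callbacks in the Intel backend, and the generic sample/calc_busy/PID code would not need to know about them.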
Cheers, Ashwin