Hi all,
This patch set introduces a mechanism for managing power in a way that is closer to the scheduler. It is a draft of research work which I am going to continue, hence the RFC tag.
It is based heavily on IPA (the PID approach) and borrows design from schedutil. It also uses the idle injection mechanism, although that should be replaced with the CFS bandwidth controller.
Regards, Lukasz Luba
Lukasz Luba (43):
  thermal: add interface for changing 'weight' of cpu cooling
  thermal: add sched driven feature to thermal_instance
  sched/power: add power and thermal governance
  thermal: add function which triggers update of all zones
  sched/power: update thermal subsystem after CPUs reweight
  DT: arm64: exynos5433: Add SoC thermal zone for IPA
  thermal: add containers with thermal zones
  DT: arm64: exynos: add support fot container thermal zone
  thermal: add virtual temperature sensor
  DT: arm64: exynos: add capacitance for every cpu core
  DT: arm64: exynos: Change polling time for soc thermal zone
  sched/power: change update filter function
  idle_inject: move needed structure to header file
  thermal: cpu_cooling: add function showing CPUs in cooling dev
  sched/power: add idle injection
  sched/sched.h: add new function for changing cluster power 'weight'
  sched/fair: testing sched power interfaces
  sched/power: change 'weight' type allowing to decrese value
  sched/power: refactor update function
  sched: power: add thermal governor and move some headers
  thermal: cpu_cooling: add function which copies cpumask
  sched/power: add thermal governor cooling devices and zones
  sched/power: register only the real cooling device not instances
  sched/power: cleanup warnings
  sched/power: change calc budget params
  thermal: add trip flag for control algorithm
  sched/power: add support of trip control algorithm
  arm64: dts: exynos5433: add ctrl-alg flag in trip points
  sched/power: allocate power for cooling devices
  sched/power: implement simple PI with integral decaying
  arm64: dts: exynos5433: change the trip points to passive
  sched/power: alwas take into current integral into account
  arm64: dts: exynos5433: change virtual temperature sensor
  sched/power: handle cluster weight change requests
  sched/power: set sum_power to 1 and finish calculation
  arm64: dts: exynos5433: change trip point 'SoC' thresholds
  sched: fix flags in sched_power
  sched/power: improve share power algorithm
  sched/power: collect unused power for future split
  sched/power: add check for max possible frequency of cooling device
  sched/power: max allowed state capping fixed
  sched/power: add recalc power budget and trace events
  sched/power: do not change state for single cooling zone
 .../dts/exynos/exynos5433-tm2-common.dtsi     |    4 +
 .../arm64/boot/dts/exynos/exynos5433-tmu.dtsi |  120 +-
 arch/arm64/boot/dts/exynos/exynos5433.dtsi    |   14 +
 drivers/powercap/idle_inject.c                |   15 +-
 drivers/thermal/Kconfig                       |    5 +
 drivers/thermal/Makefile                      |    2 +
 drivers/thermal/cpu_cooling.c                 |   35 +
 drivers/thermal/fair_share.c                  |    3 +-
 drivers/thermal/gov_bang_bang.c               |    2 +-
 drivers/thermal/hisi_thermal.c                |    3 +-
 drivers/thermal/of-thermal.c                  |  129 +-
 drivers/thermal/power_allocator.c             |    3 +-
 drivers/thermal/qoriq_thermal.c               |    3 +-
 drivers/thermal/rcar_gen3_thermal.c           |    3 +-
 drivers/thermal/samsung/exynos_tmu.c          |    3 +-
 drivers/thermal/step_wise.c                   |    3 +-
 drivers/thermal/tegra/soctherm.c              |    2 +-
 drivers/thermal/thermal_core.c                |   91 +-
 drivers/thermal/thermal_helpers.c             |    3 +-
 drivers/thermal/thermal_sysfs.c               |    3 +-
 drivers/thermal/uniphier_thermal.c            |    3 +-
 drivers/thermal/user_space.c                  |    3 +-
 drivers/thermal/virt_tsens.c                  |  126 ++
 include/linux/cpu_cooling.h                   |   25 +
 include/linux/idle_inject.h                   |   39 +-
 include/linux/sched/power.h                   |   15 +
 include/linux/thermal.h                       |   17 +
 .../thermal => include/linux}/thermal_core.h  |   10 +
 include/trace/events/sched_power.h            |   60 +
 kernel/sched/Makefile                         |    2 +-
 kernel/sched/fair.c                           |   13 +
 kernel/sched/power.c                          | 1128 +++++++++++++++++
 kernel/sched/power.h                          |  115 ++
 kernel/sched/sched.h                          |   37 +
 34 files changed, 1950 insertions(+), 89 deletions(-)
 create mode 100644 drivers/thermal/virt_tsens.c
 create mode 100644 include/linux/sched/power.h
 rename {drivers/thermal => include/linux}/thermal_core.h (93%)
 create mode 100644 include/trace/events/sched_power.h
 create mode 100644 kernel/sched/power.c
 create mode 100644 kernel/sched/power.h
This patch adds an interface for changing the weight of a CPU cooling device. Changing the weight alters the allocated power in every thermal zone where this device is pinned.
The weight ranges from 0 to 1024 (0.0 to 1.0) and is multiplied by the requested power in the IPA thermal governor. With this patch it is possible to tune the algorithm which calculates and grants power for thermal actors, so that some CPUs get higher frequencies as their operating limits.
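To illustrate the 0..1024 scale described above, here is a minimal standalone sketch of how a fixed-point weight could scale a requested power. It is not the governor's code: the names weighted_request_mw(), WEIGHT_SCALE_SHIFT and WEIGHT_MAX are hypothetical, and the clamp to 1024 is an assumption.

```c
#include <stdint.h>

/* Hypothetical names; only the 0..1024 scale comes from the commit message. */
#define WEIGHT_SCALE_SHIFT	10			/* 1024 represents 1.0 */
#define WEIGHT_MAX		(1u << WEIGHT_SCALE_SHIFT)

/* Scale a requested power (in mW) by a fixed-point weight in 0..1024. */
static uint32_t weighted_request_mw(uint32_t req_power_mw, uint32_t weight)
{
	/* Assumption: values above 1024 are clamped to 1.0. */
	if (weight > WEIGHT_MAX)
		weight = WEIGHT_MAX;

	/* A 64-bit intermediate avoids overflow in the multiplication. */
	return (uint32_t)(((uint64_t)req_power_mw * weight) >> WEIGHT_SCALE_SHIFT);
}
```

With a weight of 512 (0.5), a 2000 mW request would be scaled down to 1000 mW; a weight of 1024 leaves the request unchanged.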
Signed-off-by: Lukasz Luba <l.luba@partner.samsung.com>
---
 drivers/thermal/cpu_cooling.c  | 19 +++++++++++
 drivers/thermal/thermal_core.c | 59 ++++++++++++++++++++++++++++++++++
 include/linux/cpu_cooling.h    | 11 +++++++
 include/linux/thermal.h        |  3 ++
 4 files changed, 92 insertions(+)

diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
index dfd23245f778..0360cdd0826b 100644
--- a/drivers/thermal/cpu_cooling.c
+++ b/drivers/thermal/cpu_cooling.c
@@ -736,6 +736,25 @@ cpufreq_cooling_register(struct cpufreq_policy *policy)
 }
 EXPORT_SYMBOL_GPL(cpufreq_cooling_register);

+bool cpufreq_cooling_test_related_cpu(struct thermal_cooling_device *cdev,
+				      int cpu)
+{
+	struct cpufreq_cooling_device *cpufreq_cdev;
+	struct cpufreq_policy *policy;
+
+	if (cdev && cdev->devdata) {
+		cpufreq_cdev = cdev->devdata;
+		if (cpufreq_cdev->policy) {
+			policy = cpufreq_cdev->policy;
+			if (cpumask_test_cpu(cpu, policy->related_cpus))
+				return true;
+		}
+	}
+
+	return false;
+}
+EXPORT_SYMBOL_GPL(cpufreq_cooling_test_related_cpu);
+
 /**
  * of_cpufreq_cooling_register - function to create cpufreq cooling device.
  * @policy: cpufreq policy
diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index 6ab982309e6a..2fc143ac8552 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -9,6 +9,7 @@

 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

+#include <linux/cpu_cooling.h>
 #include <linux/module.h>
 #include <linux/device.h>
 #include <linux/err.h>
@@ -779,6 +780,64 @@ int thermal_zone_bind_cooling_device(struct thermal_zone_device *tz,
 }
 EXPORT_SYMBOL_GPL(thermal_zone_bind_cooling_device);

+static void
+thermal_update_weight(struct thermal_instance *instance, unsigned long weight)
+{
+	instance->weight = weight;
+}
+
+static bool thermal_cdev_in_tz(struct thermal_cooling_device *cdev,
+			       struct thermal_zone_device *tz)
+{
+	struct thermal_instance *pos;
+
+	list_for_each_entry(pos, &tz->thermal_instances, tz_node)
+		if (pos->cdev == cdev)
+			return true;
+
+	return false;
+}
+
+int thermal_cpu_cdev_set_weight(int cpu, unsigned long weight)
+{
+	struct thermal_cooling_device *cdev;
+	struct thermal_instance *instance;
+	struct thermal_zone_device *tz;
+
+	mutex_lock(&thermal_list_lock);
+	list_for_each_entry(cdev, &thermal_cdev_list, node) {
+		if (cpufreq_cooling_test_related_cpu(cdev, cpu))
+			goto update;
+	}
+
+	mutex_unlock(&thermal_list_lock);
+	return -ENODEV;
+
+update:
+	mutex_unlock(&thermal_list_lock);
+
+	mutex_lock(&cdev->lock);
+	list_for_each_entry(instance, &cdev->thermal_instances, cdev_node)
+		thermal_update_weight(instance, weight);
+	mutex_unlock(&cdev->lock);
+
+	/* Update thermal zones which are pinned to this cooling device.
+	 * It will trigger recalculation of the states.
+	 */
+	mutex_lock(&thermal_list_lock);
+	list_for_each_entry(tz, &thermal_tz_list, node) {
+		if (thermal_cdev_in_tz(cdev, tz))
+			thermal_zone_device_update(tz,
+					THERMAL_EVENT_UNSPECIFIED);
+
+	}
+	mutex_unlock(&thermal_list_lock);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(thermal_cpu_cdev_set_weight);
+
+
 /**
  * thermal_zone_unbind_cooling_device() - unbind a cooling device from a
  * thermal zone.
diff --git a/include/linux/cpu_cooling.h b/include/linux/cpu_cooling.h
index de0dafb9399d..c875cde35879 100644
--- a/include/linux/cpu_cooling.h
+++ b/include/linux/cpu_cooling.h
@@ -44,6 +44,10 @@ cpufreq_cooling_register(struct cpufreq_policy *policy);
  */
 void cpufreq_cooling_unregister(struct thermal_cooling_device *cdev);

+
+bool cpufreq_cooling_test_related_cpu(struct thermal_cooling_device *cdev,
+				      int cpu);
+
 #else /* !CONFIG_CPU_THERMAL */
 static inline struct thermal_cooling_device *
 cpufreq_cooling_register(struct cpufreq_policy *policy)
@@ -56,6 +60,13 @@ void cpufreq_cooling_unregister(struct thermal_cooling_device *cdev)
 {
 	return;
 }
+
+static inline
+bool cpufreq_cooling_test_related_cpu(struct thermal_cooling_device *cdev,
+				      int cpu)
+{
+	return false;
+}
 #endif /* CONFIG_CPU_THERMAL */

 #if defined(CONFIG_THERMAL_OF) && defined(CONFIG_CPU_THERMAL)
diff --git a/include/linux/thermal.h b/include/linux/thermal.h
index 5f4705f46c2f..baef42ccb3a5 100644
--- a/include/linux/thermal.h
+++ b/include/linux/thermal.h
@@ -458,6 +458,7 @@ struct thermal_instance *get_thermal_instance(struct thermal_zone_device *,
 		struct thermal_cooling_device *, int);
 void thermal_cdev_update(struct thermal_cooling_device *);
 void thermal_notify_framework(struct thermal_zone_device *, int);
+int thermal_cpu_cdev_set_weight(int cpu, unsigned long weight);
 #else
 static inline bool cdev_is_power_actor(struct thermal_cooling_device *cdev)
 { return false; }
@@ -529,6 +530,8 @@ static inline void thermal_cdev_update(struct thermal_cooling_device *cdev)
 static inline void thermal_notify_framework(struct thermal_zone_device *tz,
 	int trip)
 { }
+static inline
+int thermal_cpu_cdev_set_weight(int cpu, unsigned long weight) { return -ENODEV; }
 #endif /* CONFIG_THERMAL */

 #if defined(CONFIG_NET) && IS_ENABLED(CONFIG_THERMAL)
This patch adds a new flag and an init function for thermal_instance in order to provide a connection with the scheduler.
Signed-off-by: Lukasz Luba <l.luba@partner.samsung.com>
---
 drivers/thermal/thermal_core.c | 6 ++++++
 drivers/thermal/thermal_core.h | 1 +
 2 files changed, 7 insertions(+)

diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index 2fc143ac8552..a3d80553b590 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -19,6 +19,7 @@
 #include <linux/thermal.h>
 #include <linux/reboot.h>
 #include <linux/string.h>
+#include <linux/sched/power.h>
 #include <linux/of.h>
 #include <net/netlink.h>
 #include <net/genetlink.h>
@@ -837,6 +838,11 @@ int thermal_cpu_cdev_set_weight(int cpu, unsigned long weight)
 }
 EXPORT_SYMBOL_GPL(thermal_cpu_cdev_set_weight);

+static void thermal_cpu_cdev_init_weight(int cpu, unsigned int weight)
+{
+	sched_power_cpu_reinit_weight(cpu, weight);
+}
+

 /**
  * thermal_zone_unbind_cooling_device() - unbind a cooling device from a
diff --git a/drivers/thermal/thermal_core.h b/drivers/thermal/thermal_core.h
index 0df190ed82a7..f6ccd8fc2ba5 100644
--- a/drivers/thermal/thermal_core.h
+++ b/drivers/thermal/thermal_core.h
@@ -37,6 +37,7 @@ struct thermal_instance {
 	struct list_head tz_node; /* node in tz->thermal_instances */
 	struct list_head cdev_node; /* node in cdev->thermal_instances */
 	unsigned int weight; /* The weight of the cooling device */
+	bool sched_driven;
 };

 #define to_thermal_zone(_dev) \
This patch adds a new feature to the scheduler and provides a connection with the thermal subsystem, which grants power to cooling devices (in DVFS - CPU devices). .....
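As a rough standalone illustration of the fixed-point share arithmetic used in this patch's cpu_power_calc_group_weight() and cpu_power_calc_group_capacity(), the sketch below works on a plain array instead of per-CPU data. The function names are hypothetical, and the zero-weight guard is an assumption that the draft code does not have.

```c
#include <stddef.h>
#include <stdint.h>

/* Average weight of a group; hypothetical array-based stand-in. */
static unsigned int group_weight_avg(const unsigned int *weights, size_t n)
{
	unsigned int sum = 0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += weights[i];

	return n ? sum / (unsigned int)n : 0;
}

/* Largest per-CPU power share: gr_power * weight / average weight. */
static uint64_t group_max_power(uint64_t gr_power,
				const unsigned int *weights, size_t n)
{
	unsigned int gr_weight = group_weight_avg(weights, n);
	uint64_t max_power = 0;
	size_t i;

	/* Assumption: an empty or all-zero group gets no power. */
	if (!gr_weight)
		return 0;

	for (i = 0; i < n; i++) {
		/* Shift by 10 keeps precision in the integer division. */
		uint64_t p = gr_power * ((uint64_t)weights[i] << 10) / gr_weight;

		p >>= 10;
		if (p > max_power)
			max_power = p;
	}

	return max_power;
}
```

For weights {512, 1024} the average is 768, so a group power of 1500 yields shares of 1000 and 2000, and the function returns 2000.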
Signed-off-by: Lukasz Luba <l.luba@partner.samsung.com>
---
 include/linux/sched/power.h |  15 ++
 kernel/sched/Makefile       |   2 +-
 kernel/sched/fair.c         |   6 +
 kernel/sched/power.c        | 268 ++++++++++++++++++++++++++++++++++++
 kernel/sched/power.h        |  56 ++++++++
 kernel/sched/sched.h        |  22 +++
 6 files changed, 368 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/sched/power.h
 create mode 100644 kernel/sched/power.c
 create mode 100644 kernel/sched/power.h

diff --git a/include/linux/sched/power.h b/include/linux/sched/power.h
new file mode 100644
index 000000000000..7827ba02a65c
--- /dev/null
+++ b/include/linux/sched/power.h
@@ -0,0 +1,15 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Scheduler CPU power
+ *
+ * Copyright (C) 2018 Samsung
+ */
+
+#ifndef __INC_SCHED_POWER_H__
+#define __INC_SCHED_POWER_H__
+
+
+
+int sched_power_cpu_reinit_weight(int cpu, int weight);
+
+#endif
diff --git a/kernel/sched/Makefile b/kernel/sched/Makefile
index 7fe183404c38..c1ccc0a9dc9b 100644
--- a/kernel/sched/Makefile
+++ b/kernel/sched/Makefile
@@ -20,7 +20,7 @@ obj-y += core.o loadavg.o clock.o cputime.o
 obj-y += idle.o fair.o rt.o deadline.o
 obj-y += wait.o wait_bit.o swait.o completion.o

-obj-$(CONFIG_SMP) += cpupri.o cpudeadline.o topology.o stop_task.o pelt.o
+obj-$(CONFIG_SMP) += cpupri.o cpudeadline.o topology.o stop_task.o pelt.o power.o
 obj-$(CONFIG_SCHED_AUTOGROUP) += autogroup.o
 obj-$(CONFIG_SCHEDSTATS) += stats.o
 obj-$(CONFIG_SCHED_DEBUG) += debug.o
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 908c9cdae2f0..c03c709ccc68 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4172,6 +4172,7 @@ entity_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr, int queued)
 	 */
 	update_curr(cfs_rq);

+
 	/*
 	 * Ensure that runnable average is periodically updated.
 	 */
@@ -6357,6 +6358,9 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_f
 	}
 	rcu_read_unlock();

+	if (prev_cpu != new_cpu)
+		sched_power_change_cpu_weight(new_cpu, 512, 0);
+
 	return new_cpu;
 }

@@ -9658,6 +9662,8 @@ static void task_tick_fair(struct rq *rq, struct task_struct *curr, int queued)

 	if (static_branch_unlikely(&sched_numa_balancing))
 		task_tick_numa(rq, curr);
+
+	/* sched_power_change_cpu_weight(cpu_of(rq), 768, 0); */
 }

 /*
diff --git a/kernel/sched/power.c b/kernel/sched/power.c
new file mode 100644
index 000000000000..c2fc0811bf37
--- /dev/null
+++ b/kernel/sched/power.c
@@ -0,0 +1,268 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Scheduler CPU power
+ *
+ * Copyright (C) 2018 Samsung
+ */
+
+
+#include <linux/sched.h>
+#include <linux/thermal.h>
+
+#include "power.h"
+
+#define THERMAL_REQUEST_KFIFO_SIZE (64 * sizeof(struct power_request))
+#define DEFAULT_CPU_WEIGHT 1024
+
+static DEFINE_PER_CPU(struct cpu_power, cpu_power);
+DEFINE_PER_CPU(struct update_sched_power *, update_cpu_power);
+
+static struct sched_power sched_power;
+
+void sched_power_set_update_func(int cpu, struct update_sched_power *update,
+		void (*fn)(struct update_sched_power *, int, unsigned int, int,
+			   int))
+{
+
+	if (WARN_ON(!update || !fn))
+		return;
+
+	if (WARN_ON(per_cpu(update_cpu_power, cpu)))
+		return;
+
+	update->func = fn;
+	rcu_assign_pointer(per_cpu(update_cpu_power, cpu), update);
+}
+
+void sched_power_clean_update_func(int cpu)
+{
+	rcu_assign_pointer(per_cpu(update_cpu_power, cpu), NULL);
+}
+
+
+/////////////////////////////////////////////////////////////////////////
+
+
+unsigned int cpu_power_calc_group_weight(int cpu)
+{
+	cpumask_t *span_cpus = NULL;
+	struct cpu_power *power;
+	unsigned int w = 0;
+	int i;
+	int num_cpus;
+
+
+	num_cpus = cpumask_weight(span_cpus);
+
+	for_each_cpu(i, span_cpus) {
+		power = (&per_cpu(cpu_power, i));
+		w += power->weight;
+	}
+
+	if (num_cpus)
+		w /= num_cpus;
+
+	return w;
+}
+
+int get_state_for_power(int cpu, unsigned long power)
+{
+	/* unsigned long gr_load; */
+
+
+	return 0;
+}
+
+int cpu_power_calc_group_capacity(unsigned long gr_power, unsigned gr_weight,
+				  int cpu)
+{
+	cpumask_t *span_cpus = NULL;
+	int num_cpus;
+	struct cpu_power *power;
+	unsigned long p;
+	int i, state;
+	/* int size = 0; */
+	unsigned long max_power = 0;
+
+	num_cpus = cpumask_weight(span_cpus);
+
+	for_each_cpu(i, span_cpus) {
+		power = (&per_cpu(cpu_power, i));
+		p = gr_power * (power->weight << 10) / gr_weight;
+		p >>= 10;
+
+		if (max_power < p)
+			max_power = p;
+	}
+
+
+	state = get_state_for_power(cpu, max_power);
+
+
+	return 0;
+}
+
+int sched_power_cpu_reinit_weight(int cpu, int weight)
+{
+	struct cpu_power *cpower = &per_cpu(cpu_power, cpu);
+
+	if (!cpower->operating)
+		return -EAGAIN;
+
+	raw_spin_lock(&cpower->update_lock);
+	cpower->weight = weight;
+	raw_spin_unlock(&cpower->update_lock);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(sched_power_cpu_reinit_weight);
+
+//////////////////////////////////////////////////////////////
+
+
+static bool should_update_next_weight(int time)
+{
+	return 1;
+}
+
+static void sched_power_work(struct kthread_work *work)
+{
+	struct sched_power *sp = container_of(work, struct sched_power, work);
+	int i;
+	struct cpu_power *cpower = NULL;
+	struct power_request req;
+
+	for_each_online_cpu(i) {
+		cpower = (&per_cpu(cpu_power, i));
+		raw_spin_lock(&cpower->update_lock);
+		req = cpower->req;
+		cpower->req.time = 0;
+		raw_spin_unlock(&cpower->update_lock);
+
+		if (should_update_next_weight(req.time)) {
+			pr_info("cpower req poped\n");
+			thermal_cpu_cdev_set_weight(req.cpu, req.weight);
+		}
+	}
+
+	sp->work_in_progress = false;
+}
+
+static void sched_power_irq_work(struct irq_work *irq_work)
+{
+	struct sched_power *power;
+
+	power = container_of(irq_work, struct sched_power, irq_work);
+
+	kthread_queue_work(&power->worker, &power->work);
+}
+
+static void sched_power_update(struct update_sched_power *update, int cpu,
+			       unsigned int weight, int flags, int time)
+{
+	struct cpu_power *cpower = container_of(update, struct cpu_power,
+						update_power);
+	struct sched_power *sp;
+
+	if (!cpower->operating)
+		return;
+
+	sp = cpower->sched_power;
+
+	/* Filter to frequent changes */
+	if (!should_update_next_weight(time))
+		return;
+
+	raw_spin_lock(&cpower->update_lock);
+	cpower->req.weight = weight;
+	cpower->req.cpu = cpu;
+	cpower->req.time = time;
+	raw_spin_unlock(&cpower->update_lock);
+
+	if (!sp->work_in_progress) {
+		sp->work_in_progress = true;
+		irq_work_queue(&sp->irq_work);
+	}
+}
+
+
+static int sched_power_create_thread(struct sched_power *power)
+{
+	int ret;
+	struct task_struct *thread;
+	struct sched_attr attr = {
+		.sched_policy = SCHED_DEADLINE,
+		.sched_nice = 0,
+		.sched_priority = 0,
+		.sched_flags = 0,
+		.sched_runtime = 1000000,
+		.sched_deadline = 10000000,
+		.sched_period = 10000000,
+	};
+
+	kthread_init_work(&power->work, sched_power_work);
+	kthread_init_worker(&power->worker);
+	thread = kthread_create(kthread_worker_fn, &power->worker,
+				"sched_power/a");
+
+	if (IS_ERR(thread)) {
+		pr_err("failed to create sched_power thread %ld\n",
+		       PTR_ERR(thread));
+		return PTR_ERR(thread);
+	}
+
+	ret = sched_setattr_nocheck(thread, &attr);
+	if (ret) {
+		kthread_stop(thread);
+		pr_warn("failed to set SCHED_DEADLINE for sched_power %d\n",
+			ret);
+		return ret;
+	}
+
+	power->thread = thread;
+	mutex_init(&power->work_lock);
+	init_irq_work(&power->irq_work, sched_power_irq_work);
+	wake_up_process(thread);
+
+	return 0;
+}
+
+static void sched_power_disable_thread(struct sched_power *sp)
+{
+	kthread_flush_worker(&sp->worker);
+	kthread_stop(sp->thread);
+	mutex_destroy(&sp->work_lock);
+}
+
+static int sched_power_setup(struct sched_power *sp)
+{
+	int i;
+	struct cpu_power *cpower;
+
+	for_each_possible_cpu(i) {
+		cpower = (&per_cpu(cpu_power, i));
+		cpower->weight = DEFAULT_CPU_WEIGHT;
+		cpower->sched_power = sp;
+		sched_power_set_update_func(i, &cpower->update_power,
+					    sched_power_update);
+		raw_spin_lock_init(&cpower->update_lock);
+		cpower->operating = true;
+	}
+
+	return 0;
+}
+
+
+static int __init sched_power_init(void)
+{
+	int ret = 0;
+
+	ret = sched_power_create_thread(&sched_power);
+	if (ret)
+		return ret;
+
+	sched_power_setup(&sched_power);
+
+	return ret;
+}
+fs_initcall(sched_power_init);
diff --git a/kernel/sched/power.h b/kernel/sched/power.h
new file mode 100644
index 000000000000..f08277efd50d
--- /dev/null
+++ b/kernel/sched/power.h
@@ -0,0 +1,56 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Scheduler CPU power
+ *
+ * Copyright (C) 2018 Samsung
+ */
+
+#ifndef __SCHED_POWER_H__
+#define __SCHED_POWER_H__
+
+#include "sched.h"
+
+// struct update_sched_power {
+//	void (*func)(struct update_sched_power *, int, unsigned int, int);
+// };
+
+struct power_budget {
+	s64 temp;
+	s64 temp_limit;
+	s64 avail_power;
+};
+
+struct sched_power {
+	struct task_struct *thread;
+	struct irq_work irq_work;
+	struct kthread_work work;
+	struct kthread_worker worker;
+	bool work_in_progress;
+	struct mutex work_lock;
+};
+
+struct power_request {
+	unsigned int weight;
+	int cpu;
+	int time;
+};
+
+struct cpu_power {
+	struct update_sched_power update_power;
+	unsigned int max_capacity;
+	unsigned int capacity;
+	unsigned int vcapacity;
+	int opp_state;
+	u64 opp_power_cost;
+	unsigned long vidle;
+	unsigned int vrun; /* from 0..1024 (100%) */
+	unsigned int weight; /* 0..1024 (100%) */
+	struct sched_power *sched_power;
+	struct power_request req;
+	bool operating;
+	/* lock shared with thermal framework and/or cpufreq */
+	raw_spinlock_t update_lock;
+};
+
+
+#endif
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 9683f458aec7..c1714ef73669 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2244,3 +2244,25 @@ unsigned long scale_irq_capacity(unsigned long util, unsigned long irq, unsigned
 	return util;
 }
 #endif
+
+#ifdef CONFIG_THERMAL
+struct update_sched_power {
+	void (*func)(struct update_sched_power *, int, unsigned int, int, int);
+};
+DECLARE_PER_CPU(struct update_sched_power *, update_cpu_power);
+
+static inline void sched_power_change_cpu_weight(int cpu, unsigned long weight,
+						 int flags)
+{
+	struct update_sched_power *update;
+	int time = 0;
+
+
+	update = rcu_dereference_sched(*per_cpu_ptr(&update_cpu_power, cpu));
+	if (update)
+		update->func(update, cpu, weight, flags, time);
+}
+#else
+static inline void sched_power_change_cpu_weight(int cpu, unsigned int weight,
+						 int flags) {}
+#endif /* CONFIG_THERMAL */
The patch....
Signed-off-by: Lukasz Luba <l.luba@partner.samsung.com>
---
 drivers/thermal/thermal_core.c | 25 +++++++++++++------------
 include/linux/thermal.h        |  2 ++
 2 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index a3d80553b590..7b998297db20 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -799,6 +799,19 @@ static bool thermal_cdev_in_tz(struct thermal_cooling_device *cdev,
 	return false;
 }

+void thermal_all_zones_recalc_power(void)
+{
+	struct thermal_zone_device *tz;
+
+	mutex_lock(&thermal_list_lock);
+	list_for_each_entry(tz, &thermal_tz_list, node)
+		thermal_zone_device_update(tz, THERMAL_EVENT_UNSPECIFIED);
+
+	mutex_unlock(&thermal_list_lock);
+
+}
+EXPORT_SYMBOL_GPL(thermal_all_zones_recalc_power);
+
 int thermal_cpu_cdev_set_weight(int cpu, unsigned long weight)
 {
 	struct thermal_cooling_device *cdev;
@@ -822,18 +835,6 @@ int thermal_cpu_cdev_set_weight(int cpu, unsigned long weight)
 		thermal_update_weight(instance, weight);
 	mutex_unlock(&cdev->lock);

-	/* Update thermal zones which are pinned to this cooling device.
-	 * It will trigger recalculation of the states.
-	 */
-	mutex_lock(&thermal_list_lock);
-	list_for_each_entry(tz, &thermal_tz_list, node) {
-		if (thermal_cdev_in_tz(cdev, tz))
-			thermal_zone_device_update(tz,
-					THERMAL_EVENT_UNSPECIFIED);
-
-	}
-	mutex_unlock(&thermal_list_lock);
-
 	return 0;
 }
 EXPORT_SYMBOL_GPL(thermal_cpu_cdev_set_weight);
diff --git a/include/linux/thermal.h b/include/linux/thermal.h
index baef42ccb3a5..6c944ab78d17 100644
--- a/include/linux/thermal.h
+++ b/include/linux/thermal.h
@@ -459,6 +459,7 @@ struct thermal_instance *get_thermal_instance(struct thermal_zone_device *,
 void thermal_cdev_update(struct thermal_cooling_device *);
 void thermal_notify_framework(struct thermal_zone_device *, int);
 int thermal_cpu_cdev_set_weight(int cpu, unsigned long weight);
+void thermal_all_zones_recalc_power(void);
 #else
 static inline bool cdev_is_power_actor(struct thermal_cooling_device *cdev)
 { return false; }
@@ -532,6 +533,7 @@ static inline void thermal_notify_framework(struct thermal_zone_device *tz,
 { }
 static inline
 int thermal_cpu_cdev_set_weight(int cpu, unsigned long weight) { return -ENODEV; }
+static inline void thermal_all_zones_recalc_power(void) {}
 #endif /* CONFIG_THERMAL */

 #if defined(CONFIG_NET) && IS_ENABLED(CONFIG_THERMAL)
The patch....
Signed-off-by: Lukasz Luba <l.luba@partner.samsung.com>
---
 kernel/sched/power.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/power.c b/kernel/sched/power.c
index c2fc0811bf37..0dcb4579b474 100644
--- a/kernel/sched/power.c
+++ b/kernel/sched/power.c
@@ -131,20 +131,28 @@ static void sched_power_work(struct kthread_work *work)
 	int i;
 	struct cpu_power *cpower = NULL;
 	struct power_request req;
+	unsigned int w;
+	bool need_update = false;

 	for_each_online_cpu(i) {
 		cpower = (&per_cpu(cpu_power, i));
 		raw_spin_lock(&cpower->update_lock);
+		w = cpower->weight;
 		req = cpower->req;
 		cpower->req.time = 0;
+		cpower->weight = req.weight;
 		raw_spin_unlock(&cpower->update_lock);

 		if (should_update_next_weight(req.time)) {
 			pr_info("cpower req poped\n");
 			thermal_cpu_cdev_set_weight(req.cpu, req.weight);
+			need_update = true;
 		}
 	}

+	if (need_update)
+		thermal_all_zones_recalc_power();
+
 	sp->work_in_progress = false;
 }

@@ -167,12 +175,12 @@ static void sched_power_update(struct update_sched_power *update, int cpu,
 	if (!cpower->operating)
 		return;

-	sp = cpower->sched_power;
-
-	/* Filter to frequent changes */
+	/* Filter to frequent changes or not needed*/
 	if (!should_update_next_weight(time))
 		return;

+	sp = cpower->sched_power;
+
 	raw_spin_lock(&cpower->update_lock);
 	cpower->req.weight = weight;
 	cpower->req.cpu = cpu;
Draft support for future IPA tests.
Signed-off-by: Lukasz Luba <l.luba@partner.samsung.com>
---
 .../arm64/boot/dts/exynos/exynos5433-tmu.dtsi | 32 +++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/arch/arm64/boot/dts/exynos/exynos5433-tmu.dtsi b/arch/arm64/boot/dts/exynos/exynos5433-tmu.dtsi
index fe3a0b14bee6..c3f6dac6743a 100644
--- a/arch/arm64/boot/dts/exynos/exynos5433-tmu.dtsi
+++ b/arch/arm64/boot/dts/exynos/exynos5433-tmu.dtsi
@@ -289,5 +289,37 @@ thermal-zones {
 			};
 		};
 	};
+
+	soc_thermal_ipa: soc-thermal-ipa {
+		thermal-sensors = <&tmu_atlas0>;
+		polling-delay = <100>;
+		polling-delay-passive = <1000>;
+		sustainable-power = <2500>;
+		trips {
+			threshold: threshold {
+				temperature = <55000>; /* millicelsius */
+				hysteresis = <1000>; /* millicelsius */
+				type = "passive";
+			};
+			target: target {
+				temperature = <70000>; /* millicelsius */
+				hysteresis = <1000>; /* millicelsius */
+				type = "passive";
+			};
+		};
+
+		cooling-maps {
+			map0 {
+				trip = <&target>;
+				cooling-device = <&cpu4 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
+				contribution = <1024>;
+			};
+			map1 {
+				trip = <&target>;
+				cooling-device = <&cpu0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
+				contribution = <2048>;
+			};
+		};
+	};
 };
 };
This patch adds a new concept of a container thermal zone, which aggregates subzones. Only one level of nesting is allowed. It relies on a virtual temperature sensor which calculates the temperature based on the real sensors of the subzones.
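The aggregation formula of the virtual sensor is not part of this patch. As one possible reading, the sketch below computes a capacitance-weighted average of the subzone temperatures. The struct, the function name and the weighting itself are assumptions made for illustration, not the driver's actual method.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot of one subzone: temperature and its capacitance. */
struct subzone_sample {
	int temp_mc;		/* millicelsius */
	uint32_t capacitance;	/* relative weight of the subzone */
};

/*
 * Assumed aggregation: capacitance-weighted average of subzone temperatures.
 * Returns 0 on success, -1 when there is nothing to average.
 */
static int virt_temp_mc(const struct subzone_sample *sz, size_t n, int *temp_mc)
{
	int64_t acc = 0;
	int64_t total = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		acc += (int64_t)sz[i].temp_mc * sz[i].capacitance;
		total += sz[i].capacitance;
	}

	if (!total)
		return -1;

	*temp_mc = (int)(acc / total);
	return 0;
}
```

With subzones at 50000, 60000 and 70000 millicelsius and capacitances of 50, 100 and 100, this sketch would report 62000 millicelsius.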
Signed-off-by: Lukasz Luba <l.luba@partner.samsung.com>
---
 drivers/thermal/of-thermal.c   | 104 +++++++++++++++++++++++++++++++--
 drivers/thermal/thermal_core.h |   1 +
 include/linux/thermal.h        |   3 +
 3 files changed, 102 insertions(+), 6 deletions(-)

diff --git a/drivers/thermal/of-thermal.c b/drivers/thermal/of-thermal.c
index 4f2816559205..1825d30e9b0b 100644
--- a/drivers/thermal/of-thermal.c
+++ b/drivers/thermal/of-thermal.c
@@ -801,7 +801,7 @@ static int thermal_of_populate_trip(struct device_node *np,
  * check the return value with help of IS_ERR() helper.
  */
 static struct __thermal_zone
-__init *thermal_of_build_thermal_zone(struct device_node *np)
+__init *thermal_of_build_thermal_zone(struct device_node *np, bool container)
 {
 	struct device_node *child = NULL, *gchild;
 	struct __thermal_zone *tz;
@@ -813,6 +813,15 @@ __init *thermal_of_build_thermal_zone(struct device_node *np)
 		return ERR_PTR(-EINVAL);
 	}

+	/*
+	 * Make sure we are populating the right zone type in the
+	 * right phase
+	 */
+	if (of_property_read_bool(np, "container") ^ container) {
+		return ERR_PTR(-EAGAIN);
+	}
+
+
 	tz = kzalloc(sizeof(*tz), GFP_KERNEL);
 	if (!tz)
 		return ERR_PTR(-ENOMEM);
@@ -931,6 +940,69 @@ static inline void of_thermal_free_zone(struct __thermal_zone *tz)
 	kfree(tz);
 }

+static int thermal_add_subzone(struct thermal_zone_device *container,
+			       struct thermal_zone_device *zone,
+			       u32 capacitance)
+{
+	if (IS_ERR_OR_NULL(container) || IS_ERR_OR_NULL(zone))
+		return -EINVAL;
+
+	mutex_lock(&zone->lock);
+	zone->capacitance = capacitance;
+	mutex_unlock(&zone->lock);
+
+	mutex_lock(&container->lock);
+	list_add_tail(&zone->subzones, &container->subzones);
+	mutex_unlock(&container->lock);
+
+	return 0;
+}
+
+static int thermal_build_subzones(struct thermal_zone_device *zone,
+				  struct device_node *np)
+{
+	int i = 0;
+	int ret = 0;
+
+	INIT_LIST_HEAD(&zone->subzones);
+
+	for ( ;; i++) {
+		struct thermal_zone_device *tz;
+		struct of_phandle_args zone_specs;
+		struct device_node *child;
+		u32 capacitance;
+
+		ret = of_parse_phandle_with_fixed_args(np, "subzones", 1, i,
+						       &zone_specs);
+		if (ret) {
+			of_node_put(zone_specs.np);
+			break;
+		}
+
+
+		tz = thermal_zone_get_zone_by_name(zone_specs.np->name);
+		if (IS_ERR_OR_NULL(tz)) {
+			pr_warn("Parsing thermal subzone %s failed %d\n",
+				of_node_full_name(zone_specs.np), ret);
+			of_node_put(zone_specs.np);
+			continue;
+		}
+
+		if (zone_specs.args_count == 1)
+			capacitance = zone_specs.args[0];
+		else
+			capacitance = THERMAL_SUBZONE_DEFAULT_CAPACITANCE;
+
+
+		ret = thermal_add_subzone(zone, tz, capacitance);
+
+		of_node_put(zone_specs.np);
+
+	}
+
+	return ret;
+}
+
 /**
  * of_parse_thermal_zones - parse device tree thermal data
  *
@@ -943,11 +1015,12 @@ static inline void of_thermal_free_zone(struct __thermal_zone *tz)
  * Return: 0 on success, proper error code otherwise
  *
  */
-int __init of_parse_thermal_zones(void)
+int __init of_parse_thermal_zones_containers(bool container)
 {
 	struct device_node *np, *child;
 	struct __thermal_zone *tz;
 	struct thermal_zone_device_ops *ops;
+	int ret;

 	np = of_find_node_by_name(NULL, "thermal-zones");
 	if (!np) {
@@ -961,11 +1034,12 @@ int __init of_parse_thermal_zones(void)
 		int i, mask = 0;
 		u32 prop;

-		tz = thermal_of_build_thermal_zone(child);
+		tz = thermal_of_build_thermal_zone(child, container);
 		if (IS_ERR(tz)) {
-			pr_err("failed to build thermal zone %s: %ld\n",
-			       child->name,
-			       PTR_ERR(tz));
+			if (tz != ERR_PTR(-EAGAIN))
+				pr_err("failed to build thermal zone %s: %ld\n",
+				       child->name,
+				       PTR_ERR(tz));
 			continue;
 		}

@@ -1005,7 +1079,13 @@ int __init of_parse_thermal_zones(void)
 			of_thermal_free_zone(tz);
 			/* attempting to build remaining zones still */
 		}
+
+		zone->container = container;
+		if (container) {
+			ret = thermal_build_subzones(zone, child);
+		}
 	}
+
 	of_node_put(np);

 	return 0;
@@ -1021,6 +1101,18 @@ int __init of_parse_thermal_zones(void)
 	return -ENOMEM;
 }

+
+int __init of_parse_thermal_zones(void)
+{
+	int ret;
+
+	/* First parse normal zones, then containers */
+	ret = of_parse_thermal_zones_containers(false);
+	if (ret)
+		return ret;
+	return of_parse_thermal_zones_containers(true);
+}
+
 /**
  * of_thermal_destroy_zones - remove all zones parsed and allocated resources
  *
diff --git a/drivers/thermal/thermal_core.h b/drivers/thermal/thermal_core.h
index f6ccd8fc2ba5..66527810a06c 100644
--- a/drivers/thermal/thermal_core.h
+++ b/drivers/thermal/thermal_core.h
@@ -14,6 +14,7 @@

 /* Initial state of a cooling device during binding */
 #define THERMAL_NO_TARGET -1UL
+#define THERMAL_SUBZONE_DEFAULT_CAPACITANCE 1

 /*
  * This structure is used to describe the behavior of
diff --git a/include/linux/thermal.h b/include/linux/thermal.h
index 6c944ab78d17..0b47cb72b96e 100644
--- a/include/linux/thermal.h
+++ b/include/linux/thermal.h
@@ -218,6 +218,9 @@ struct thermal_zone_device {
 	struct mutex lock;
 	struct list_head node;
 	struct delayed_work poll_queue;
+	bool container;
+	struct list_head subzones;
+	u32 capacitance;
 	enum thermal_notify_event notify_event;
 };
Add basic support for a new type of thermal zone: a container with a new virtual temperature sensor.
Signed-off-by: Lukasz Luba l.luba@partner.samsung.com --- .../boot/dts/exynos/exynos5433-tm2-common.dtsi | 4 ++++ arch/arm64/boot/dts/exynos/exynos5433-tmu.dtsi | 14 +++++++++----- arch/arm64/boot/dts/exynos/exynos5433.dtsi | 6 ++++++ 3 files changed, 19 insertions(+), 5 deletions(-)
diff --git a/arch/arm64/boot/dts/exynos/exynos5433-tm2-common.dtsi b/arch/arm64/boot/dts/exynos/exynos5433-tm2-common.dtsi index 6380d2751d15..d94bd6a8c140 100644 --- a/arch/arm64/boot/dts/exynos/exynos5433-tm2-common.dtsi +++ b/arch/arm64/boot/dts/exynos/exynos5433-tm2-common.dtsi @@ -1276,6 +1276,10 @@ status = "okay"; };
+&vtsens { + status = "okay"; +}; + &usbdrd30 { vdd33-supply = <&ldo10_reg>; vdd10-supply = <&ldo6_reg>; diff --git a/arch/arm64/boot/dts/exynos/exynos5433-tmu.dtsi b/arch/arm64/boot/dts/exynos/exynos5433-tmu.dtsi index c3f6dac6743a..6057c9101f0e 100644 --- a/arch/arm64/boot/dts/exynos/exynos5433-tmu.dtsi +++ b/arch/arm64/boot/dts/exynos/exynos5433-tmu.dtsi @@ -291,19 +291,23 @@ thermal-zones { };
soc_thermal_ipa: soc-thermal-ipa { - thermal-sensors = <&tmu_atlas0>; + container; + thermal-sensors = <&vtsens>; + #thermal-subzone-cells = <1>; + subzones = <&apollo_thermal 50 &atlas0_thermal 100 &g3d_thermal 100>; polling-delay = <100>; polling-delay-passive = <1000>; sustainable-power = <2500>; + trips { threshold: threshold { - temperature = <55000>; /* millicelsius */ - hysteresis = <1000>; /* millicelsius */ + temperature = <55000>; + hysteresis = <1000>; type = "passive"; }; target: target { - temperature = <70000>; /* millicelsius */ - hysteresis = <1000>; /* millicelsius */ + temperature = <70000>; + hysteresis = <1000>; type = "passive"; }; }; diff --git a/arch/arm64/boot/dts/exynos/exynos5433.dtsi b/arch/arm64/boot/dts/exynos/exynos5433.dtsi index 2131f12364cb..7527d579114d 100644 --- a/arch/arm64/boot/dts/exynos/exynos5433.dtsi +++ b/arch/arm64/boot/dts/exynos/exynos5433.dtsi @@ -678,6 +678,12 @@ status = "disabled"; };
+ vtsens: vtsens@0 { + compatible = "thermal,virt-tsens"; + #thermal-sensor-cells = <0>; + status = "disabled"; + }; + mct@101c0000 { compatible = "samsung,exynos4210-mct"; reg = <0x101c0000 0x800>;
The new type of sensor calculates the temperature of a thermal zone container. It relies on the temperatures of the real thermal zones and provides a basic weighted-average algorithm.
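The weighted average described above can be sketched in plain userspace C. This is only an illustration, not the driver code: `struct subzone_sample` and `container_temp()` are hypothetical names, and the sketch assumes each subzone contributes its temperature in millicelsius multiplied by its capacitance.

```c
#include <stddef.h>

/* Hypothetical stand-in for one subzone: its weight and last reading. */
struct subzone_sample {
	unsigned int capacitance;	/* relative weight, e.g. 50 or 100 */
	int temp_mc;			/* temperature in millicelsius */
};

/*
 * Weighted average of the subzone temperatures.
 * Returns 0 on success, -1 when there is nothing to average.
 */
int container_temp(const struct subzone_sample *s, size_t n, int *out)
{
	long long acc = 0;
	long long weights = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		acc += (long long)s[i].capacitance * s[i].temp_mc;
		weights += s[i].capacitance;
	}

	/* No subzones, or all weights zero: no meaningful result. */
	if (!weights)
		return -1;

	*out = (int)(acc / weights);
	return 0;
}
```

With the weights used for the SoC zone in this series (50, 100, 100) and readings of 40000, 60000 and 70000 millicelsius, the sketch yields 60000. Unlike the driver, it leaves `*out` untouched on failure.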
Signed-off-by: Lukasz Luba l.luba@partner.samsung.com --- drivers/thermal/Makefile | 2 + drivers/thermal/virt_tsens.c | 126 +++++++++++++++++++++++++++++++++++ 2 files changed, 128 insertions(+) create mode 100644 drivers/thermal/virt_tsens.c
diff --git a/drivers/thermal/Makefile b/drivers/thermal/Makefile index 610344eb3e03..c0d29120379a 100644 --- a/drivers/thermal/Makefile +++ b/drivers/thermal/Makefile @@ -27,6 +27,8 @@ thermal_sys-$(CONFIG_CLOCK_THERMAL) += clock_cooling.o # devfreq cooling thermal_sys-$(CONFIG_DEVFREQ_THERMAL) += devfreq_cooling.o
+obj-y += virt_tsens.o + # platform thermal drivers obj-y += broadcom/ obj-$(CONFIG_QCOM_SPMI_TEMP_ALARM) += qcom-spmi-temp-alarm.o diff --git a/drivers/thermal/virt_tsens.c b/drivers/thermal/virt_tsens.c new file mode 100644 index 000000000000..59dffeacecbf --- /dev/null +++ b/drivers/thermal/virt_tsens.c @@ -0,0 +1,126 @@ +/*license */ + +#define pr_fmt(fmt) "VTSENS: " fmt + +#include <linux/device.h> +#include <linux/init.h> +#include <linux/io.h> +#include <linux/kernel.h> +#include <linux/module.h> +#include <linux/mutex.h> +#include <linux/of.h> +#include <linux/of_device.h> +#include <linux/platform_device.h> +#include <linux/thermal.h> +#include <linux/types.h> + + + +struct vtsens { + struct thermal_zone_device *tzd; + struct mutex lock; +}; + +static int vtsens_get_temp(void *p, int *temp) +{ + struct vtsens *vts = p; + struct thermal_zone_device *tzd = vts->tzd; + struct thermal_zone_device *tz; + int ret = 0; + int subtz_temp; + int avg_temp = 0; + int subzones_capacitance = 0; + + /* We are virtual temp sensor so must relay on real sensors + * which are present in subzones. */ + if (!tzd || !tzd->container) + return -EINVAL; + + + if (list_empty(&tzd->subzones)) + return -EINVAL; + + list_for_each_entry(tz, &tzd->subzones, subzones) { + /* Avoid checking the first entry and prevent from + aggregating containers (circular dependencies, etc.) 
*/ + if (tz->container) + continue; + ret = thermal_zone_get_temp(tz, &subtz_temp); + if (ret) { + return ret; + } + avg_temp += tz->capacitance * subtz_temp; + subzones_capacitance += tz->capacitance; + /* pr_info("%s, subtz_temp = %d\n", tzd->type, subtz_temp); */ + } + + if (subzones_capacitance) + avg_temp /= subzones_capacitance; + else + ret = -EINVAL; + + *temp = avg_temp; + + return ret; +} + +static const struct thermal_zone_of_device_ops vtsens_ops = { + .get_temp = vtsens_get_temp, +}; + +static int vtsens_probe(struct platform_device *pdev) +{ + struct vtsens *vts; + struct device *dev = &pdev->dev; + int ret; + + if (!dev || !dev->of_node) + return -EINVAL; + + vts = devm_kzalloc(&pdev->dev, sizeof(struct vtsens), + GFP_KERNEL); + if (!vts) + return -ENOMEM; + + platform_set_drvdata(pdev, vts); + mutex_init(&vts->lock); + + vts->tzd = thermal_zone_of_sensor_register(&pdev->dev, 0, vts, + &vtsens_ops); + + if (IS_ERR(vts->tzd)) { + ret = PTR_ERR(vts->tzd); + dev_err(&pdev->dev, "Failed during registration of sensor %d\n", + ret); + return ret; + } + + return 0; +} + +static int vtsens_remove(struct platform_device *pdev) +{ + + return 0; +} + +static const struct of_device_id of_virt_tsens_match[] = { + { .compatible = "thermal,virt-tsens", }, + { /* end */ } +}; +MODULE_DEVICE_TABLE(of, of_virt_tsens_match); + + +static struct platform_driver vtsens_thermal = { + .driver = { + .name = "vtsens_thermal", + .of_match_table = of_virt_tsens_match, + }, + .probe = vtsens_probe, + .remove = vtsens_remove, +}; +module_platform_driver(vtsens_thermal); + +MODULE_AUTHOR("Samsung, Inc."); +MODULE_DESCRIPTION("Virtual temperature sensor driver for agregated thermal zones"); +MODULE_LICENSE("GPL v2");
This patch adds the per-CPU dynamic-power-coefficient values needed by the IPA cooling device functions. The values are experimental and may not be correct.
Signed-off-by: Lukasz Luba l.luba@partner.samsung.com --- arch/arm64/boot/dts/exynos/exynos5433.dtsi | 8 ++++++++ 1 file changed, 8 insertions(+)
diff --git a/arch/arm64/boot/dts/exynos/exynos5433.dtsi b/arch/arm64/boot/dts/exynos/exynos5433.dtsi index 7527d579114d..05445159e87b 100644 --- a/arch/arm64/boot/dts/exynos/exynos5433.dtsi +++ b/arch/arm64/boot/dts/exynos/exynos5433.dtsi @@ -37,6 +37,7 @@ clock-names = "apolloclk"; operating-points-v2 = <&cluster_a53_opp_table>; #cooling-cells = <2>; + dynamic-power-coefficient = <200>; };
cpu1: cpu@101 { @@ -47,6 +48,7 @@ clock-frequency = <1300000000>; operating-points-v2 = <&cluster_a53_opp_table>; #cooling-cells = <2>; + dynamic-power-coefficient = <200>; };
cpu2: cpu@102 { @@ -57,6 +59,7 @@ clock-frequency = <1300000000>; operating-points-v2 = <&cluster_a53_opp_table>; #cooling-cells = <2>; + dynamic-power-coefficient = <200>; };
cpu3: cpu@103 { @@ -67,6 +70,7 @@ clock-frequency = <1300000000>; operating-points-v2 = <&cluster_a53_opp_table>; #cooling-cells = <2>; + dynamic-power-coefficient = <200>; };
cpu4: cpu@0 { @@ -79,6 +83,7 @@ clock-names = "atlasclk"; operating-points-v2 = <&cluster_a57_opp_table>; #cooling-cells = <2>; + dynamic-power-coefficient = <500>; };
cpu5: cpu@1 { @@ -89,6 +94,7 @@ clock-frequency = <1900000000>; operating-points-v2 = <&cluster_a57_opp_table>; #cooling-cells = <2>; + dynamic-power-coefficient = <500>; };
cpu6: cpu@2 { @@ -99,6 +105,7 @@ clock-frequency = <1900000000>; operating-points-v2 = <&cluster_a57_opp_table>; #cooling-cells = <2>; + dynamic-power-coefficient = <500>; };
cpu7: cpu@3 { @@ -109,6 +116,7 @@ clock-frequency = <1900000000>; operating-points-v2 = <&cluster_a57_opp_table>; #cooling-cells = <2>; + dynamic-power-coefficient = <500>; }; };
The patch swaps the polling interval values: polling-delay changes from 100 to 1000 and polling-delay-passive from 1000 to 100.
Signed-off-by: Lukasz Luba l.luba@partner.samsung.com --- arch/arm64/boot/dts/exynos/exynos5433-tmu.dtsi | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/boot/dts/exynos/exynos5433-tmu.dtsi b/arch/arm64/boot/dts/exynos/exynos5433-tmu.dtsi index 6057c9101f0e..5beb4538dfdc 100644 --- a/arch/arm64/boot/dts/exynos/exynos5433-tmu.dtsi +++ b/arch/arm64/boot/dts/exynos/exynos5433-tmu.dtsi @@ -295,8 +295,8 @@ thermal-zones { thermal-sensors = <&vtsens>; #thermal-subzone-cells = <1>; subzones = <&apollo_thermal 50 &atlas0_thermal 100 &g3d_thermal 100>; - polling-delay = <100>; - polling-delay-passive = <1000>; + polling-delay = <1000>; + polling-delay-passive = <100>; sustainable-power = <2500>;
trips {
Add a basic mechanism for filtering out too frequent requests. At the same time, create a bypass for RT or deadline bandwidth requests, which must always be served.
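One possible shape of such a filter, as a userspace sketch. It assumes the intent is to accept a request only when a minimum interval has passed since the last accepted one. The names `should_accept_request()` and `FORCE_UPDATE_RT` are illustrative, and the comparison against a stored `last_ns` timestamp is an assumption, not what the diff below does.

```c
#include <stdbool.h>
#include <stdint.h>

#define MINIMUM_UPDATE_TIME_NS	10000000ULL	/* 10 ms, as in the patch */
#define FORCE_UPDATE_RT		0x01		/* mirrors SCHED_POWER_FORCE_UPDATE_RT */

/*
 * Decide whether a request stamped 'now_ns' should be accepted, given
 * the timestamp of the last accepted request ('last_ns').
 */
bool should_accept_request(uint64_t now_ns, uint64_t last_ns, int flags)
{
	/* RT/deadline requests bypass the filter. */
	if (flags & FORCE_UPDATE_RT)
		return true;

	/* Accept only if at least the minimum interval has elapsed. */
	return now_ns - last_ns >= MINIMUM_UPDATE_TIME_NS;
}
```

A request arriving 5 ms after the previous one is dropped unless it carries the RT flag. One arriving 10 ms or later is accepted.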
Signed-off-by: Lukasz Luba l.luba@partner.samsung.com --- kernel/sched/power.c | 24 +++++++++++++++--------- kernel/sched/power.h | 3 ++- kernel/sched/sched.h | 3 ++- 3 files changed, 19 insertions(+), 11 deletions(-)
diff --git a/kernel/sched/power.c b/kernel/sched/power.c index 0dcb4579b474..28f3b6c8c0a3 100644 --- a/kernel/sched/power.c +++ b/kernel/sched/power.c @@ -13,6 +13,7 @@
#define THERMAL_REQUEST_KFIFO_SIZE (64 * sizeof(struct power_request)) #define DEFAULT_CPU_WEIGHT 1024 +#define MINIMUM_UPDATE_TIME 10000000 /* 10 ms */
static DEFINE_PER_CPU(struct cpu_power, cpu_power); DEFINE_PER_CPU(struct update_sched_power *, update_cpu_power); @@ -120,9 +121,15 @@ EXPORT_SYMBOL_GPL(sched_power_cpu_reinit_weight); //////////////////////////////////////////////////////////////
-static bool should_update_next_weight(int time) +static bool should_update_next_weight(u64 time, int flags) { - return 1; + if (flags & SCHED_POWER_FORCE_UPDATE_RT) + return 1; + + if (time >= sched_clock() + MINIMUM_UPDATE_TIME) + return 1; + + return 0; }
static void sched_power_work(struct kthread_work *work) @@ -139,15 +146,13 @@ static void sched_power_work(struct kthread_work *work) raw_spin_lock(&cpower->update_lock); w = cpower->weight; req = cpower->req; - cpower->req.time = 0; + cpower->req.time = sched_clock(); cpower->weight = req.weight; raw_spin_unlock(&cpower->update_lock);
- if (should_update_next_weight(req.time)) { - pr_info("cpower req poped\n"); - thermal_cpu_cdev_set_weight(req.cpu, req.weight); - need_update = true; - } + pr_info("cpower req poped\n"); + thermal_cpu_cdev_set_weight(req.cpu, req.weight); + need_update = true; }
if (need_update) @@ -176,7 +181,7 @@ static void sched_power_update(struct update_sched_power *update, int cpu, return;
/* Filter to frequent changes or not needed*/ - if (!should_update_next_weight(time)) + if (!should_update_next_weight(time, flags)) return;
sp = cpower->sched_power; @@ -185,6 +190,7 @@ static void sched_power_update(struct update_sched_power *update, int cpu, cpower->req.weight = weight; cpower->req.cpu = cpu; cpower->req.time = time; + cpower->req.flags = flags; raw_spin_unlock(&cpower->update_lock);
if (!sp->work_in_progress) { diff --git a/kernel/sched/power.h b/kernel/sched/power.h index f08277efd50d..1992e637d53f 100644 --- a/kernel/sched/power.h +++ b/kernel/sched/power.h @@ -32,7 +32,8 @@ struct sched_power { struct power_request { unsigned int weight; int cpu; - int time; + u64 time; + int flags; };
struct cpu_power { diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index c1714ef73669..7c8dea6df31a 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -2245,6 +2245,7 @@ unsigned long scale_irq_capacity(unsigned long util, unsigned long irq, unsigned } #endif
+#define SCHED_POWER_FORCE_UPDATE_RT 0x01 #ifdef CONFIG_THERMAL struct update_sched_power { void (*func)(struct update_sched_power *, int, unsigned int, int, int); @@ -2255,7 +2256,7 @@ static inline void sched_power_change_cpu_weight(int cpu, unsigned long weight, int flags) { struct update_sched_power *update; - int time = 0; + u64 time = sched_clock();
update = rcu_dereference_sched(*per_cpu_ptr(&update_cpu_power, cpu));
Small refactoring: move struct idle_inject_device to the header so that files in other locations can use it.
Signed-off-by: Lukasz Luba l.luba@partner.samsung.com --- drivers/powercap/idle_inject.c | 15 +-------------- include/linux/idle_inject.h | 15 +++++++++++++-- 2 files changed, 14 insertions(+), 16 deletions(-)
diff --git a/drivers/powercap/idle_inject.c b/drivers/powercap/idle_inject.c index 24ff2a068978..a019971ad89c 100644 --- a/drivers/powercap/idle_inject.c +++ b/drivers/powercap/idle_inject.c @@ -39,6 +39,7 @@
#include <linux/cpu.h> #include <linux/hrtimer.h> +#include <linux/idle_inject.h> #include <linux/kthread.h> #include <linux/sched.h> #include <linux/slab.h> @@ -56,20 +57,6 @@ struct idle_inject_thread { int should_run; };
-/** - * struct idle_inject_device - idle injection data - * @timer: idle injection period timer - * @idle_duration_ms: duration of CPU idle time to inject - * @run_duration_ms: duration of CPU run time to allow - * @cpumask: mask of CPUs affected by idle injection - */ -struct idle_inject_device { - struct hrtimer timer; - unsigned int idle_duration_ms; - unsigned int run_duration_ms; - unsigned long int cpumask[0]; -}; - static DEFINE_PER_CPU(struct idle_inject_thread, idle_inject_thread); static DEFINE_PER_CPU(struct idle_inject_device *, idle_inject_device);
diff --git a/include/linux/idle_inject.h b/include/linux/idle_inject.h index bdc0293fb6cb..4c60f91ef7a2 100644 --- a/include/linux/idle_inject.h +++ b/include/linux/idle_inject.h @@ -8,8 +8,19 @@ #ifndef __IDLE_INJECT_H__ #define __IDLE_INJECT_H__
-/* private idle injection device structure */ -struct idle_inject_device; +/** + * struct idle_inject_device - idle injection data + * @timer: idle injection period timer + * @idle_duration_ms: duration of CPU idle time to inject + * @run_duration_ms: duration of CPU run time to allow + * @cpumask: mask of CPUs affected by idle injection + */ +struct idle_inject_device { + struct hrtimer timer; + unsigned int idle_duration_ms; + unsigned int run_duration_ms; + unsigned long int cpumask[0]; +};
struct idle_inject_device *idle_inject_register(struct cpumask *cpumask);
Implement a basic function which returns the mask of affected CPUs for a given CPU in the cooling device.
Signed-off-by: Lukasz Luba l.luba@partner.samsung.com --- drivers/thermal/thermal_core.c | 18 ++++++++++++++++++ include/linux/cpu_cooling.h | 8 ++++++++ 2 files changed, 26 insertions(+)
diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c index 7b998297db20..ecff9740cbf0 100644 --- a/drivers/thermal/thermal_core.c +++ b/drivers/thermal/thermal_core.c @@ -844,6 +844,24 @@ static void thermal_cpu_cdev_init_weight(int cpu, unsigned int weight) sched_power_cpu_reinit_weight(cpu, weight); }
+struct cpumask *thermal_cpu_cooling_get_cpumask(int cpu) +{ + struct cpumask *cpus = NULL; + struct thermal_cooling_device *cdev; + struct thermal_instance *instance; + struct thermal_zone_device *tz; + + mutex_lock(&thermal_list_lock); + list_for_each_entry(cdev, &thermal_cdev_list, node) { + if (cpufreq_cooling_test_related_cpu(cdev, cpu)) { + /* cpus = cpufreq_cooling_get_related_cpumask(cdev); */ + break; + } + } + + mutex_unlock(&thermal_list_lock); + return cpus; +}
/** * thermal_zone_unbind_cooling_device() - unbind a cooling device from a diff --git a/include/linux/cpu_cooling.h b/include/linux/cpu_cooling.h index c875cde35879..ea609f7bdf55 100644 --- a/include/linux/cpu_cooling.h +++ b/include/linux/cpu_cooling.h @@ -47,6 +47,8 @@ void cpufreq_cooling_unregister(struct thermal_cooling_device *cdev);
bool cpufreq_cooling_test_related_cpu(struct thermal_cooling_device *cdev, int cpu); +struct cpumask +*cpufreq_cooling_get_related_cpumask(struct thermal_cooling_device *cdev);
#else /* !CONFIG_CPU_THERMAL */ static inline struct thermal_cooling_device * @@ -67,6 +69,12 @@ bool cpufreq_cooling_test_related_cpu(struct thermal_cooling_device *cdev, { return false; } + +struct cpumask +*cpufreq_cooling_get_related_cpumask(struct thermal_cooling_device *cdev) +{ + return NULL; +} #endif /* CONFIG_CPU_THERMAL */
#if defined(CONFIG_THERMAL_OF) && defined(CONFIG_CPU_THERMAL)
Idle injection is used to keep the cluster within its thermal envelope. Some of the CPUs in the cluster run the idle task to limit power consumption at a higher OPP. One CPU runs at the highest speed available at that OPP, and one task benefits from the extra cycles during this period.
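The run/idle split behind idle injection can be sketched as follows. This mirrors the check in sched_power_idle_play() below (the period must be longer than the idle part), but `struct idle_split` and both helper names are hypothetical and exist only for illustration.

```c
/* Hypothetical container for the run/idle split of one injection period. */
struct idle_split {
	unsigned int run_ms;
	unsigned int idle_ms;
};

/*
 * Split 'period_ms' into run and idle parts. Returns 0 on success or -1
 * when the idle part would leave no run time.
 */
int idle_split_setup(unsigned int period_ms, unsigned int idle_ms,
		     struct idle_split *out)
{
	if (period_ms <= idle_ms)
		return -1;

	out->idle_ms = idle_ms;
	out->run_ms = period_ms - idle_ms;
	return 0;
}

/* Share of the period spent running, in permille (0..1000). */
unsigned int idle_split_run_permille(const struct idle_split *s)
{
	return s->run_ms * 1000u / (s->run_ms + s->idle_ms);
}
```

For the 10 ms run and 4 ms idle values hardcoded in this patch, the CPU runs for roughly 71% of each 14 ms period.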
Signed-off-by: Lukasz Luba l.luba@partner.samsung.com --- kernel/sched/power.c | 175 +++++++++++++++++++++++++++++++++++++++++++ kernel/sched/power.h | 6 +- 2 files changed, 180 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/power.c b/kernel/sched/power.c index 28f3b6c8c0a3..c9893c86efe6 100644 --- a/kernel/sched/power.c +++ b/kernel/sched/power.c @@ -8,12 +8,14 @@
#include <linux/sched.h> #include <linux/thermal.h> +#include <linux/idle_inject.h>
#include "power.h"
#define THERMAL_REQUEST_KFIFO_SIZE (64 * sizeof(struct power_request)) #define DEFAULT_CPU_WEIGHT 1024 #define MINIMUM_UPDATE_TIME 10000000 /* 10 ms */ +#define MAX_CAPACITY_REQEST_PERIOD 50 /* ms */
static DEFINE_PER_CPU(struct cpu_power, cpu_power); DEFINE_PER_CPU(struct update_sched_power *, update_cpu_power); @@ -118,6 +120,13 @@ int sched_power_cpu_reinit_weight(int cpu, int weight) } EXPORT_SYMBOL_GPL(sched_power_cpu_reinit_weight);
+static int vidle_setup(int cpu, int rate, int period) +{ + + return 0; +} + + //////////////////////////////////////////////////////////////
@@ -132,6 +141,99 @@ static bool should_update_next_weight(u64 time, int flags) return 0; }
+static int play_idle_setup; + +static void sched_power_idle_stop(struct cpu_power *cpower) +{ + + raw_spin_lock(&cpower->update_lock); + cpower->vidle = 0; + cpower->vrun = 0; + raw_spin_unlock(&cpower->update_lock); + + idle_inject_stop(cpower->ii_dev); + idle_inject_set_duration(cpower->ii_dev, 0, 0); +} + +static int sched_power_idle_play(struct cpu_power *cpower, unsigned int period, + unsigned int idle) +{ + unsigned int run; + + if (period <= idle) + return -EINVAL; + + raw_spin_lock(&cpower->update_lock); + cpower->vidle = idle; + cpower->vrun = period - idle; + raw_spin_unlock(&cpower->update_lock); + + idle_inject_set_duration(cpower->ii_dev, cpower->vrun, cpower->vidle); + idle_inject_start(cpower->ii_dev); +} + + + +static u64 cluster_power_budget(struct cpumask *cpus) +{ + + + return 100; +} + +static int +sched_power_reweight_cluster(int cpu, struct cpumask *cpus, unsigned int capacity, + unsigned int period, int flags) +{ + int ret, i; + struct cpu_power *cpower = NULL; + int opp_curr_state, opp_curr_cost; + int opp_next_state, opp_next_cost; + u64 cluster_udget; + u64 total_weight = 0; + + /* opp_next_state = get_opp_for_capacity(cpu, capacity); */ + /* opp_next_cost = get_opp_cost(cpu, opp_next_state); */ + /* */ + /* cluster_budget = cluster_power_budget(cpus); */ + /* */ + /* for_each_cpu(i, cpus) { */ + /* cpower = (&per_cpu(cpu_power, i)); */ + /* raw_spin_lock(&cpower->update_lock); */ + /* total_weight += cpower->weight; */ + /* raw_spin_unlock(&cpower->update_lock); */ + /* } */ + /* */ + /* for_each_cpu(i, cpus) { */ + /* cpower = (&per_cpu(cpu_power, i)); */ + /* raw_spin_lock(&cpower->update_lock); */ + /* budget = cluster_budget * cpower->weight << 10; */ + /* raw_spin_unlock(&cpower->update_lock); */ + /* budget /= total_weight; */ + /* budget >>= 10; */ + /* } */ + + return 0; +} + +static int sched_power_cpu_capacity_request(int cpu, unsigned int capacity, + unsigned int period, int flags) +{ + int ret; + struct cpu_power *cpower; + + 
if (period > MAX_CAPACITY_REQEST_PERIOD) + return -EINVAL; + + cpower = (&per_cpu(cpu_power, cpu)); + + //for cluster OR system wise + ret = sched_power_reweight_cluster(cpu, cpower->cluster_mask, capacity, + period, flags); + + return ret; +} + static void sched_power_work(struct kthread_work *work) { struct sched_power *sp = container_of(work, struct sched_power, work); @@ -159,6 +261,7 @@ static void sched_power_work(struct kthread_work *work) thermal_all_zones_recalc_power();
sp->work_in_progress = false; + }
static void sched_power_irq_work(struct irq_work *irq_work) @@ -197,6 +300,12 @@ static void sched_power_update(struct update_sched_power *update, int cpu, sp->work_in_progress = true; irq_work_queue(&sp->irq_work); } + + if (!play_idle_setup && cpu == 4) { + play_idle_setup = 1; + idle_inject_set_duration(cpower->ii_dev, 10, 4); + idle_inject_start(cpower->ii_dev); + } }
@@ -266,6 +375,68 @@ static int sched_power_setup(struct sched_power *sp) return 0; }
+static int sched_power_idle_init(struct sched_power *sp) +{ + struct idle_inject_device *ii_dev; + struct cpumask *cpus; + int i, last_cpu; + struct cpu_power *cpower; + + + cpus = kzalloc(cpumask_size(), GFP_KERNEL); + if (!cpus) + return -ENOMEM; + + for_each_possible_cpu(i) { + cpumask_set_cpu(i, cpus); + + ii_dev = idle_inject_register(cpus); + if (IS_ERR_OR_NULL(ii_dev)) { + last_cpu; + goto cleanup; + } + + cpower = (&per_cpu(cpu_power, i)); + raw_spin_lock(&cpower->update_lock); + cpower->ii_dev = ii_dev; + raw_spin_unlock(&cpower->update_lock); + + cpumask_clear_cpu(i, cpus); + } + + kfree(cpus); + return 0; + +cleanup: + kfree(cpus); + + for_each_possible_cpu(i) { + if (i == last_cpu) + break; + + cpower = (&per_cpu(cpu_power, i)); + raw_spin_lock(&cpower->update_lock); + idle_inject_unregister(cpower->ii_dev); + raw_spin_unlock(&cpower->update_lock); + } + + return -ENODEV; +} + +static void sched_power_idle_unregister(struct sched_power *sp) +{ + struct idle_inject_device *ii_dev; + int i, last_cpu; + struct cpu_power *cpower; + + for_each_possible_cpu(i) { + cpower = (&per_cpu(cpu_power, i)); + raw_spin_lock(&cpower->update_lock); + idle_inject_unregister(cpower->ii_dev); + raw_spin_unlock(&cpower->update_lock); + } + +}
static int __init sched_power_init(void) { @@ -277,6 +448,10 @@ static int __init sched_power_init(void)
sched_power_setup(&sched_power);
+ ret = sched_power_idle_init(&sched_power); + if (ret) + sched_power_disable_thread(&sched_power); + return ret; } fs_initcall(sched_power_init); diff --git a/kernel/sched/power.h b/kernel/sched/power.h index 1992e637d53f..8ce1409ea538 100644 --- a/kernel/sched/power.h +++ b/kernel/sched/power.h @@ -8,6 +8,8 @@ #ifndef __SCHED_POWER_H__ #define __SCHED_POWER_H__
+#include <linux/idle_inject.h> + #include "sched.h"
// struct update_sched_power { @@ -43,7 +45,7 @@ struct cpu_power { unsigned int vcapacity; int opp_state; u64 opp_power_cost; - unsigned long vidle; + unsigned int vidle; unsigned int vrun; /* from 0..1024 (100%) */ unsigned int weight; /* 0..1024 (100%) */ struct sched_power *sched_power; @@ -51,6 +53,8 @@ struct cpu_power { bool operating; /* lock shared with thermal framework and/or cpufreq */ raw_spinlock_t update_lock; + struct idle_inject_device *ii_dev; + struct cpumask *cluster_mask; };
This patch adds a new sched power interface which controls power allocation for CPUs.
Signed-off-by: Lukasz Luba l.luba@partner.samsung.com --- kernel/sched/sched.h | 11 +++++++++++ 1 file changed, 11 insertions(+)
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index 7c8dea6df31a..725e0c19ae96 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -2246,6 +2246,7 @@ unsigned long scale_irq_capacity(unsigned long util, unsigned long irq, unsigned #endif
#define SCHED_POWER_FORCE_UPDATE_RT 0x01 +#define SCHED_POWER_CLUSTER_REQ 0x2 #ifdef CONFIG_THERMAL struct update_sched_power { void (*func)(struct update_sched_power *, int, unsigned int, int, int); @@ -2263,7 +2264,17 @@ static inline void sched_power_change_cpu_weight(int cpu, unsigned long weight, if (update) update->func(update, cpu, weight, flags, time); } + +static inline +void sched_power_change_cluster_weight(int cpu, unsigned long weight, int flags) +{ + sched_power_change_cpu_weight(cpu, weight, flags | + SCHED_POWER_CLUSTER_REQ); +} #else static inline void sched_power_change_cpu_weight(int cpu, unsigned int weight, int flags) {} +static inline +void sched_power_change_cluster_weight(int cpu, unsigned int weight, int flags) +{} #endif /* CONFIG_THERMAL */
Signed-off-by: Lukasz Luba l.luba@partner.samsung.com --- kernel/sched/fair.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index c03c709ccc68..8b0d693370bb 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -6358,8 +6358,15 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_f } rcu_read_unlock();
- if (prev_cpu != new_cpu) - sched_power_change_cpu_weight(new_cpu, 512, 0); + if (prev_cpu != new_cpu) { + if (new_cpu == 0) + sched_power_change_cpu_weight(new_cpu, 1000, 0x1); + else if (new_cpu < 4) + sched_power_change_cpu_weight(new_cpu, 700, 0x1); + else + sched_power_change_cluster_weight(new_cpu, 300, 0x2); + } +
return new_cpu; }
Signed-off-by: Lukasz Luba l.luba@partner.samsung.com --- kernel/sched/power.c | 4 ++-- kernel/sched/sched.h | 15 ++++++++------- 2 files changed, 10 insertions(+), 9 deletions(-)
diff --git a/kernel/sched/power.c b/kernel/sched/power.c index c9893c86efe6..89e447d442cd 100644 --- a/kernel/sched/power.c +++ b/kernel/sched/power.c @@ -23,7 +23,7 @@ DEFINE_PER_CPU(struct update_sched_power *, update_cpu_power); static struct sched_power sched_power;
void sched_power_set_update_func(int cpu, struct update_sched_power *update, - void (*fn)(struct update_sched_power *, int, unsigned int, int, + void (*fn)(struct update_sched_power *, int, int, int, int)) {
@@ -274,7 +274,7 @@ static void sched_power_irq_work(struct irq_work *irq_work) }
static void sched_power_update(struct update_sched_power *update, int cpu, - unsigned int weight, int flags, int time) + int weight, int flags, int time) { struct cpu_power *cpower = container_of(update, struct cpu_power, update_power); diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index 725e0c19ae96..ebaf42e81afa 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -2249,12 +2249,12 @@ unsigned long scale_irq_capacity(unsigned long util, unsigned long irq, unsigned #define SCHED_POWER_CLUSTER_REQ 0x2 #ifdef CONFIG_THERMAL struct update_sched_power { - void (*func)(struct update_sched_power *, int, unsigned int, int, int); + void (*func)(struct update_sched_power *, int, int, int, int); }; DECLARE_PER_CPU(struct update_sched_power *, update_cpu_power);
-static inline void sched_power_change_cpu_weight(int cpu, unsigned long weight, - int flags) +static inline +void sched_power_change_cpu_weight(int cpu, int weight, int flags) { struct update_sched_power *update; u64 time = sched_clock(); @@ -2266,15 +2266,16 @@ static inline void sched_power_change_cpu_weight(int cpu, unsigned long weight, }
static inline -void sched_power_change_cluster_weight(int cpu, unsigned long weight, int flags) +void sched_power_change_cluster_weight(int cpu, int weight, int flags) { sched_power_change_cpu_weight(cpu, weight, flags | SCHED_POWER_CLUSTER_REQ); } #else -static inline void sched_power_change_cpu_weight(int cpu, unsigned int weight, - int flags) {} static inline -void sched_power_change_cluster_weight(int cpu, unsigned int weight, int flags) +void sched_power_change_cpu_weight(int cpu, int weight, int flags) {} + +static inline +void sched_power_change_cluster_weight(int cpu, int weight, int flags) {} #endif /* CONFIG_THERMAL */
The update function must handle a few types of requests, which are recognized based on 'flags'. Add a switch which triggers different paths in the code.
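A sketch of flag-based dispatch, written as standalone C. The `REQ_*` constants mirror the values defined in this patch, while `enum req_path` and `classify_request()` are hypothetical. The sketch masks out the request-type bits before switching, which is an assumption about how modifier bits such as the RT flag should combine with a type. The diff below switches on 'flags' directly.

```c
#define REQ_FORCE_UPDATE_RT	0x1	/* mirrors SCHED_POWER_FORCE_UPDATE_RT */
#define REQ_CPU_WEIGHT		0x2	/* mirrors SCHED_POWER_CPU_WEIGHT */
#define REQ_CLUSTER_WEIGHT	0x4	/* mirrors SCHED_POWER_CLUSTER_WEIGHT */

enum req_path {
	REQ_PATH_NONE,
	REQ_PATH_CPU,
	REQ_PATH_CLUSTER,
};

/*
 * Pick the handling path for a request. The request type is extracted
 * with a mask first, so modifier bits such as REQ_FORCE_UPDATE_RT do not
 * change which case is taken.
 */
enum req_path classify_request(int flags)
{
	switch (flags & (REQ_CPU_WEIGHT | REQ_CLUSTER_WEIGHT)) {
	case REQ_CPU_WEIGHT:
		return REQ_PATH_CPU;
	case REQ_CLUSTER_WEIGHT:
		return REQ_PATH_CLUSTER;
	default:
		/* No type bit, or both set: not a valid request. */
		return REQ_PATH_NONE;
	}
}
```

With this masking, a cluster request that also sets the RT flag still reaches the cluster path. An exact match on 'flags' would send it to the default case.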
Signed-off-by: Lukasz Luba l.luba@partner.samsung.com --- kernel/sched/power.c | 33 ++++++++++++++++++++++++++------- kernel/sched/sched.h | 6 ++++-- 2 files changed, 30 insertions(+), 9 deletions(-)
diff --git a/kernel/sched/power.c b/kernel/sched/power.c index 89e447d442cd..808d8db5c02d 100644 --- a/kernel/sched/power.c +++ b/kernel/sched/power.c @@ -273,6 +273,21 @@ static void sched_power_irq_work(struct irq_work *irq_work) kthread_queue_work(&power->worker, &power->work); }
+static void +sched_power_update_cpu_weight(struct cpu_power *cpower, int cpu, int weight, + int flags, int time) +{ + struct sched_power *sp; + + raw_spin_lock(&cpower->update_lock); + cpower->req.weight = weight; + cpower->req.cpu = cpu; + cpower->req.time = time; + cpower->req.flags = flags; + raw_spin_unlock(&cpower->update_lock); +} + + static void sched_power_update(struct update_sched_power *update, int cpu, int weight, int flags, int time) { @@ -287,14 +302,18 @@ static void sched_power_update(struct update_sched_power *update, int cpu, if (!should_update_next_weight(time, flags)) return;
- sp = cpower->sched_power; + switch (flags) { + case SCHED_POWER_CPU_WEIGHT: + sched_power_update_cpu_weight(cpower, cpu, weight, flags, time); + break; + case SCHED_POWER_CLUSTER_WEIGHT:
- raw_spin_lock(&cpower->update_lock); - cpower->req.weight = weight; - cpower->req.cpu = cpu; - cpower->req.time = time; - cpower->req.flags = flags; - raw_spin_unlock(&cpower->update_lock); + break; + default: + return; + } + + sp = cpower->sched_power;
if (!sp->work_in_progress) { sp->work_in_progress = true; diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index ebaf42e81afa..0b8f8505d3bc 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -2246,7 +2246,9 @@ unsigned long scale_irq_capacity(unsigned long util, unsigned long irq, unsigned #endif
#define SCHED_POWER_FORCE_UPDATE_RT 0x01 -#define SCHED_POWER_CLUSTER_REQ 0x2 +#define SCHED_POWER_CPU_WEIGHT 0x2 +#define SCHED_POWER_CLUSTER_WEIGHT 0x4 + #ifdef CONFIG_THERMAL struct update_sched_power { void (*func)(struct update_sched_power *, int, int, int, int); @@ -2269,7 +2271,7 @@ static inline void sched_power_change_cluster_weight(int cpu, int weight, int flags) { sched_power_change_cpu_weight(cpu, weight, flags | - SCHED_POWER_CLUSTER_REQ); + SCHED_POWER_CLUSTER_WEIGHT); } #else static inline
Introduce a thermal governor which is able to serve the scheduler's needs. The basic algorithm tracks the current temperature and the limit. An estimated power budget is calculated from the difference between them.
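A minimal sketch of deriving a budget from the temperature difference, assuming a purely proportional rule around the sustainable power. The function name, the parameters and the gain are illustrative assumptions. The governor in this patch may compute its budget differently.

```c
/*
 * Estimate a power budget in mW from the distance to the temperature
 * limit. All names and the gain are illustrative assumptions.
 */
unsigned int estimate_power_budget(int temp_mc, int limit_mc,
				   unsigned int sustainable_mw,
				   unsigned int gain_mw_per_c)
{
	/* Positive when there is headroom, negative when over the limit. */
	long long err_mc = (long long)limit_mc - temp_mc;
	long long budget = (long long)sustainable_mw +
			   err_mc * gain_mw_per_c / 1000;

	/* Never hand out a negative budget. */
	if (budget < 0)
		budget = 0;

	return (unsigned int)budget;
}
```

With the 70000 millicelsius target and 2500 mW sustainable power from the device tree in this series, and an assumed gain of 100 mW per degree, a reading of 65000 gives 3000 mW and a reading of 75000 gives 2000 mW.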
Signed-off-by: Lukasz Luba l.luba@partner.samsung.com --- drivers/thermal/Kconfig | 5 ++ drivers/thermal/fair_share.c | 3 +- drivers/thermal/gov_bang_bang.c | 2 +- drivers/thermal/hisi_thermal.c | 3 +- drivers/thermal/of-thermal.c | 3 +- drivers/thermal/power_allocator.c | 3 +- drivers/thermal/qoriq_thermal.c | 3 +- drivers/thermal/rcar_gen3_thermal.c | 3 +- drivers/thermal/samsung/exynos_tmu.c | 3 +- drivers/thermal/step_wise.c | 3 +- drivers/thermal/tegra/soctherm.c | 2 +- drivers/thermal/thermal_core.c | 7 +- drivers/thermal/thermal_helpers.c | 3 +- drivers/thermal/thermal_sysfs.c | 3 +- drivers/thermal/uniphier_thermal.c | 3 +- drivers/thermal/user_space.c | 3 +- include/linux/idle_inject.h | 24 +++++ .../thermal => include/linux}/thermal_core.h | 8 ++ kernel/sched/power.c | 88 +++++++++++++++++++ 19 files changed, 145 insertions(+), 27 deletions(-) rename {drivers/thermal => include/linux}/thermal_core.h (94%)
diff --git a/drivers/thermal/Kconfig b/drivers/thermal/Kconfig index 0e69edc77d18..208acacbc311 100644 --- a/drivers/thermal/Kconfig +++ b/drivers/thermal/Kconfig @@ -116,6 +116,11 @@ config THERMAL_DEFAULT_GOV_POWER_ALLOCATOR
endchoice
+config THERMAL_GOV_SCHED_POWER + bool "Sched power thermal governor" + help + Enable this to manage platform thermals using sched power governor. + config THERMAL_GOV_FAIR_SHARE bool "Fair-share thermal governor" help diff --git a/drivers/thermal/fair_share.c b/drivers/thermal/fair_share.c index d3469fbc5207..ecd52fee77bf 100644 --- a/drivers/thermal/fair_share.c +++ b/drivers/thermal/fair_share.c @@ -23,10 +23,9 @@ */
#include <linux/thermal.h> +#include <linux/thermal_core.h> #include <trace/events/thermal.h>
-#include "thermal_core.h" - /** * get_trip_level: - obtains the current trip level for a zone * @tz: thermal zone device diff --git a/drivers/thermal/gov_bang_bang.c b/drivers/thermal/gov_bang_bang.c index fc5e5057f0de..8f94e3285283 100644 --- a/drivers/thermal/gov_bang_bang.c +++ b/drivers/thermal/gov_bang_bang.c @@ -20,8 +20,8 @@ */
#include <linux/thermal.h> +#include <linux/thermal_core.h>
-#include "thermal_core.h"
static void thermal_zone_trip_update(struct thermal_zone_device *tz, int trip) { diff --git a/drivers/thermal/hisi_thermal.c b/drivers/thermal/hisi_thermal.c index 761d0559c268..d79db9c04917 100644 --- a/drivers/thermal/hisi_thermal.c +++ b/drivers/thermal/hisi_thermal.c @@ -24,8 +24,7 @@ #include <linux/platform_device.h> #include <linux/io.h> #include <linux/of_device.h> - -#include "thermal_core.h" +#include <linux/thermal_core.h>
#define HI6220_TEMP0_LAG (0x0) #define HI6220_TEMP0_TH (0x4) diff --git a/drivers/thermal/of-thermal.c b/drivers/thermal/of-thermal.c index 1825d30e9b0b..f7049ee16e98 100644 --- a/drivers/thermal/of-thermal.c +++ b/drivers/thermal/of-thermal.c @@ -6,6 +6,7 @@ * Copyright (C) 2013 Eduardo Valentin eduardo.valentin@ti.com */ #include <linux/thermal.h> +#include <linux/thermal_core.h> #include <linux/slab.h> #include <linux/types.h> #include <linux/of_device.h> @@ -14,8 +15,6 @@ #include <linux/export.h> #include <linux/string.h>
-#include "thermal_core.h" - /*** Private data structures to represent thermal device tree data ***/
/** diff --git a/drivers/thermal/power_allocator.c b/drivers/thermal/power_allocator.c index 3055f9a12a17..1e67f32613c6 100644 --- a/drivers/thermal/power_allocator.c +++ b/drivers/thermal/power_allocator.c @@ -18,12 +18,11 @@ #include <linux/rculist.h> #include <linux/slab.h> #include <linux/thermal.h> +#include <linux/thermal_core.h>
#define CREATE_TRACE_POINTS #include <trace/events/thermal_power_allocator.h>
-#include "thermal_core.h" - #define INVALID_TRIP -1
#define FRAC_BITS 10 diff --git a/drivers/thermal/qoriq_thermal.c b/drivers/thermal/qoriq_thermal.c index 450ed66edf58..c8268a0b5ea7 100644 --- a/drivers/thermal/qoriq_thermal.c +++ b/drivers/thermal/qoriq_thermal.c @@ -9,8 +9,7 @@ #include <linux/of.h> #include <linux/of_address.h> #include <linux/thermal.h> - -#include "thermal_core.h" +#include <linux/thermal_core.h>
#define SITES_MAX 16
diff --git a/drivers/thermal/rcar_gen3_thermal.c b/drivers/thermal/rcar_gen3_thermal.c index 7aed5337bdd3..62e793b96f0a 100644 --- a/drivers/thermal/rcar_gen3_thermal.c +++ b/drivers/thermal/rcar_gen3_thermal.c @@ -17,8 +17,7 @@ #include <linux/spinlock.h> #include <linux/sys_soc.h> #include <linux/thermal.h> - -#include "thermal_core.h" +#include <linux/thermal_core.h>
/* Register offsets */ #define REG_GEN3_IRQSTR 0x04 diff --git a/drivers/thermal/samsung/exynos_tmu.c b/drivers/thermal/samsung/exynos_tmu.c index 48eef552cba4..e6c2ddd96487 100644 --- a/drivers/thermal/samsung/exynos_tmu.c +++ b/drivers/thermal/samsung/exynos_tmu.c @@ -34,11 +34,10 @@ #include <linux/of_irq.h> #include <linux/platform_device.h> #include <linux/regulator/consumer.h> +#include <linux/thermal_core.h>
#include <dt-bindings/thermal/thermal_exynos.h>
-#include "../thermal_core.h" - /* Exynos generic registers */ #define EXYNOS_TMU_REG_TRIMINFO 0x0 #define EXYNOS_TMU_REG_CONTROL 0x20 diff --git a/drivers/thermal/step_wise.c b/drivers/thermal/step_wise.c index ee047ca43084..e51c5b642e56 100644 --- a/drivers/thermal/step_wise.c +++ b/drivers/thermal/step_wise.c @@ -23,10 +23,9 @@ */
#include <linux/thermal.h> +#include <linux/thermal_core.h> #include <trace/events/thermal.h>
-#include "thermal_core.h" - /* * If the temperature is higher than a trip point, * a. if the trend is THERMAL_TREND_RAISING, use higher cooling diff --git a/drivers/thermal/tegra/soctherm.c b/drivers/thermal/tegra/soctherm.c index ed28110a3535..68ae5c235303 100644 --- a/drivers/thermal/tegra/soctherm.c +++ b/drivers/thermal/tegra/soctherm.c @@ -27,10 +27,10 @@ #include <linux/platform_device.h> #include <linux/reset.h> #include <linux/thermal.h> +#include <linux/thermal_core.h>
#include <dt-bindings/thermal/tegra124-soctherm.h>
-#include "../thermal_core.h" #include "soctherm.h"
#define SENSOR_CONFIG0 0 diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c index ecff9740cbf0..106601da9a20 100644 --- a/drivers/thermal/thermal_core.c +++ b/drivers/thermal/thermal_core.c @@ -17,6 +17,7 @@ #include <linux/kdev_t.h> #include <linux/idr.h> #include <linux/thermal.h> +#include <linux/thermal_core.h> #include <linux/reboot.h> #include <linux/string.h> #include <linux/sched/power.h> @@ -28,7 +29,6 @@ #define CREATE_TRACE_POINTS #include <trace/events/thermal.h>
-#include "thermal_core.h" #include "thermal_hwmon.h"
MODULE_AUTHOR("Zhang Rui"); @@ -249,6 +249,10 @@ static int __init thermal_register_governors(void) { int result;
+ result = thermal_gov_sched_power_register(); + if (result) + return result; + result = thermal_gov_step_wise_register(); if (result) return result; @@ -270,6 +274,7 @@ static int __init thermal_register_governors(void)
static void thermal_unregister_governors(void) { + thermal_gov_sched_power_unregister(); thermal_gov_step_wise_unregister(); thermal_gov_fair_share_unregister(); thermal_gov_bang_bang_unregister(); diff --git a/drivers/thermal/thermal_helpers.c b/drivers/thermal/thermal_helpers.c index 2ba756af76b7..fb6c979bf4f7 100644 --- a/drivers/thermal/thermal_helpers.c +++ b/drivers/thermal/thermal_helpers.c @@ -17,11 +17,10 @@ #include <linux/err.h> #include <linux/slab.h> #include <linux/string.h> +#include <linux/thermal_core.h>
#include <trace/events/thermal.h>
-#include "thermal_core.h" - int get_tz_trend(struct thermal_zone_device *tz, int trip) { enum thermal_trend trend; diff --git a/drivers/thermal/thermal_sysfs.c b/drivers/thermal/thermal_sysfs.c index 2241ceae7d7f..fb625f6522e2 100644 --- a/drivers/thermal/thermal_sysfs.c +++ b/drivers/thermal/thermal_sysfs.c @@ -18,8 +18,7 @@ #include <linux/slab.h> #include <linux/string.h> #include <linux/jiffies.h> - -#include "thermal_core.h" +#include <linux/thermal_core.h>
/* sys I/F for thermal zone */
diff --git a/drivers/thermal/uniphier_thermal.c b/drivers/thermal/uniphier_thermal.c index 55477d74d591..ee0397b9c5d3 100644 --- a/drivers/thermal/uniphier_thermal.c +++ b/drivers/thermal/uniphier_thermal.c @@ -27,8 +27,7 @@ #include <linux/platform_device.h> #include <linux/regmap.h> #include <linux/thermal.h> - -#include "thermal_core.h" +#include <linux/thermal_core.h>
/* * block registers diff --git a/drivers/thermal/user_space.c b/drivers/thermal/user_space.c index 8e92a06ef48a..4a6a80ddb556 100644 --- a/drivers/thermal/user_space.c +++ b/drivers/thermal/user_space.c @@ -23,10 +23,9 @@ */
#include <linux/thermal.h> +#include <linux/thermal_core.h> #include <linux/slab.h>
-#include "thermal_core.h" - /** * notify_user_space - Notifies user space about thermal events * @tz - thermal_zone_device diff --git a/include/linux/idle_inject.h b/include/linux/idle_inject.h index 4c60f91ef7a2..90583e37022c 100644 --- a/include/linux/idle_inject.h +++ b/include/linux/idle_inject.h @@ -22,6 +22,7 @@ struct idle_inject_device { unsigned long int cpumask[0]; };
+#ifdef CONFIG_IDLE_INJECT struct idle_inject_device *idle_inject_register(struct cpumask *cpumask);
void idle_inject_unregister(struct idle_inject_device *ii_dev); @@ -37,4 +38,27 @@ void idle_inject_set_duration(struct idle_inject_device *ii_dev, void idle_inject_get_duration(struct idle_inject_device *ii_dev, unsigned int *run_duration_ms, unsigned int *idle_duration_ms); +#else /* CONFIG_IDLE_INJECT */ +static inline struct idle_inject_device *idle_inject_register(struct cpumask *cpumask) +{ + return NULL; +} + +static inline void idle_inject_unregister(struct idle_inject_device *ii_dev) {} + +static inline int idle_inject_start(struct idle_inject_device *ii_dev) +{ + return -ENODEV; +} + +static inline void idle_inject_stop(struct idle_inject_device *ii_dev) {} + +static inline void idle_inject_set_duration(struct idle_inject_device *ii_dev, + unsigned int run_duration_ms, + unsigned int idle_duration_ms) {} + +static inline void idle_inject_get_duration(struct idle_inject_device *ii_dev, + unsigned int *run_duration_ms, + unsigned int *idle_duration_ms) {} +#endif /* CONFIG_IDLE_INJECT */ #endif /* __IDLE_INJECT_H__ */ diff --git a/drivers/thermal/thermal_core.h b/include/linux/thermal_core.h similarity index 94% rename from drivers/thermal/thermal_core.h rename to include/linux/thermal_core.h index 66527810a06c..adff231eec2a 100644 --- a/drivers/thermal/thermal_core.h +++ b/include/linux/thermal_core.h @@ -76,6 +76,14 @@ thermal_cooling_device_stats_update(struct thermal_cooling_device *cdev, unsigned long new_state) {} #endif /* CONFIG_THERMAL_STATISTICS */
+#ifdef CONFIG_THERMAL_GOV_SCHED_POWER +int thermal_gov_sched_power_register(void); +void thermal_gov_sched_power_unregister(void); +#else +static inline int thermal_gov_sched_power_register(void) { return 0; } +static inline void thermal_gov_sched_power_unregister(void) {} +#endif /* CONFIG_THERMAL_GOV_SCHED_POWER */ + #ifdef CONFIG_THERMAL_GOV_STEP_WISE int thermal_gov_step_wise_register(void); void thermal_gov_step_wise_unregister(void); diff --git a/kernel/sched/power.c b/kernel/sched/power.c index 808d8db5c02d..1976e5db8efc 100644 --- a/kernel/sched/power.c +++ b/kernel/sched/power.c @@ -8,6 +8,7 @@
#include <linux/sched.h> #include <linux/thermal.h> +#include <linux/thermal_core.h> #include <linux/idle_inject.h>
#include "power.h" @@ -474,3 +475,90 @@ static int __init sched_power_init(void) return ret; } fs_initcall(sched_power_init); + + + +/////////////////thermal governor//////////////////////// + +static int cdev_get_min_power(struct thermal_cooling_device *cdev, + u64 *min_power) +{ + return 0; +} + +static u64 estimate_total_min_power(struct thermal_zone_device *tz, int trip) +{ + u64 total_min_power = 0; + u64 min_power; + struct thermal_instance *inst; + + list_for_each_entry(inst, &tz->thermal_instances, tz_node) { + cdev_get_min_power(inst->cdev, &min_power); + total_min_power += min_power; + } + + return total_min_power; +} + +static u64 calc_power_budget(struct thermal_zone_device *tz, int desire_temp) +{ + s64 temp_diff; + s64 power_budget; + + /* temperature is represented in milidegress */ + temp_diff = desire_temp - tz->temperature; + + power_budget = temp_diff; + + power_budget = max(0, power_budget); + + return power_budget; +} + +static int controller_prepare_coefficents(struct thermal_zone_device *tz, + int trip) +{ + return 0; +} + +static int sched_power_gov_throttle(struct thermal_zone_device *tz, int trip) +{ + + return 0; +} + +static int sched_power_gov_bind(struct thermal_zone_device *tz) +{ + struct thermal_instance *inst; + + list_for_each_entry(inst, &tz->thermal_instances, tz_node) { + + + } + + return 0; +} + +static void sched_power_gov_unbind(struct thermal_zone_device *tz) +{ + +} + +static struct thermal_governor sched_power_gov = { + .name = "sched_power", + .bind_to_tz = sched_power_gov_bind, + .unbind_from_tz = sched_power_gov_unbind, + .throttle = sched_power_gov_throttle, +}; + +int thermal_gov_sched_power_register(void) +{ + return thermal_register_governor(&sched_power_gov); +} + +void thermal_gov_sched_power_unregister(void) +{ + thermal_unregister_governor(&sched_power_gov); +} + +/////////////////////////////////////////////////////////
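For reference, the config-gated stub pattern that thermal_core.h uses for the governor register/unregister pair can be tried outside the kernel. The sketch below is a user-space illustration only: DEMO_HAVE_IDLE_INJECT and all demo_* names are hypothetical stand-ins and not part of this series. It shows why header stubs are declared 'static inline': every file that includes the header gets a private copy, so no duplicate symbols reach the linker.

```c
#include <errno.h>
#include <stddef.h>

/* Opaque, hypothetical stand-in for a device handle type. */
struct demo_idle_dev;

#ifdef DEMO_HAVE_IDLE_INJECT	/* hypothetical switch, plays the role of a CONFIG_ option */
/* Feature built in: only declarations live in the header. */
struct demo_idle_dev *demo_idle_register(void);
int demo_idle_start(struct demo_idle_dev *dev);
#else
/*
 * Feature compiled out: 'static inline' stubs give each includer its
 * own copy, so the header can be included from many .c files without
 * "multiple definition" link errors.
 */
static inline struct demo_idle_dev *demo_idle_register(void)
{
	return NULL;
}

static inline int demo_idle_start(struct demo_idle_dev *dev)
{
	(void)dev;	/* unused in the stub */
	return -ENODEV;
}
#endif
```

Callers can then use the same code path in both configurations and only have to handle the NULL / -ENODEV results.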
The patch adds a function which reports which CPUs are pinned to the related cpufreq cooling device by copying their cpumask.
Signed-off-by: Lukasz Luba l.luba@partner.samsung.com --- drivers/thermal/cpu_cooling.c | 16 ++++++++++++++++ include/linux/cpu_cooling.h | 6 ++++++ 2 files changed, 22 insertions(+)
diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c index 0360cdd0826b..f27f1636803c 100644 --- a/drivers/thermal/cpu_cooling.c +++ b/drivers/thermal/cpu_cooling.c @@ -755,6 +755,22 @@ bool cpufreq_cooling_test_related_cpu(struct thermal_cooling_device *cdev, } EXPORT_SYMBOL_GPL(cpufreq_cooling_test_related_cpu);
+void cpufreq_cooling_copy_cpumask(struct thermal_cooling_device *cdev, + struct cpumask *dstmask) +{ + struct cpufreq_cooling_device *cpufreq_cdev; + struct cpufreq_policy *policy; + + if (cdev && cdev->devdata) { + cpufreq_cdev = cdev->devdata; + if (cpufreq_cdev->policy) { + policy = cpufreq_cdev->policy; + cpumask_copy(dstmask, policy->related_cpus); + } + } +} +EXPORT_SYMBOL_GPL(cpufreq_cooling_copy_cpumask); + /** * of_cpufreq_cooling_register - function to create cpufreq cooling device. * @policy: cpufreq policy diff --git a/include/linux/cpu_cooling.h b/include/linux/cpu_cooling.h index ea609f7bdf55..fee20e0a5fed 100644 --- a/include/linux/cpu_cooling.h +++ b/include/linux/cpu_cooling.h @@ -50,6 +50,9 @@ bool cpufreq_cooling_test_related_cpu(struct thermal_cooling_device *cdev, struct cpumask *cpufreq_cooling_get_related_cpumask(struct thermal_cooling_device *cdev);
+void cpufreq_cooling_copy_cpumask(struct thermal_cooling_device *cdev, + struct cpumask *dstmask); + #else /* !CONFIG_CPU_THERMAL */ static inline struct thermal_cooling_device * cpufreq_cooling_register(struct cpufreq_policy *policy) @@ -75,6 +78,9 @@ struct cpumask { return NULL; } + +static inline void cpufreq_cooling_copy_cpumask(struct thermal_cooling_device *cdev, + struct cpumask *dstmask) {} #endif /* CONFIG_CPU_THERMAL */
#if defined(CONFIG_THERMAL_OF) && defined(CONFIG_CPU_THERMAL)
The patch populates cooling devices for clusters and CPUs.
Signed-off-by: Lukasz Luba l.luba@partner.samsung.com --- kernel/sched/power.c | 158 +++++++++++++++++++++++++++++++++++++++++++ kernel/sched/power.h | 4 ++ 2 files changed, 162 insertions(+)
diff --git a/kernel/sched/power.c b/kernel/sched/power.c index 1976e5db8efc..22d5758b89ef 100644 --- a/kernel/sched/power.c +++ b/kernel/sched/power.c @@ -5,6 +5,7 @@ * Copyright (C) 2018 Samsung */
+#define pr_fmt(fmt) "SCHED_POWER: " fmt
#include <linux/sched.h> #include <linux/thermal.h> @@ -480,6 +481,28 @@ fs_initcall(sched_power_init);
/////////////////thermal governor////////////////////////
+struct _thermal_zone { + struct thermal_zone_device *tz; + bool single_cooling_dev; + struct list_head node; +}; + +struct _cooling_dev { + struct thermal_cooling_device *cdev; + struct list_head node; + int max_single_state; /* max state (0-highest) which means frequency */ + int min_sum_single_idle; /* minimum sum of idle calculated for 'single' + zone. Other zones should not go below this + value for the same 'state' (frequency) */ + bool cpu_dev; + unsigned long int cpus[0]; +}; + +static LIST_HEAD(tz_list); +static LIST_HEAD(cdev_list); +static DEFINE_MUTEX(tz_list_lock); +static DEFINE_MUTEX(cdev_list_lock); + static int cdev_get_min_power(struct thermal_cooling_device *cdev, u64 *min_power) { @@ -521,22 +544,157 @@ static int controller_prepare_coefficents(struct thermal_zone_device *tz, return 0; }
+static struct _thermal_zone *find_zone(struct thermal_zone_device *tz) +{ + struct _thermal_zone *zone; + + mutex_lock(&tz_list_lock); + list_for_each_entry(zone, &tz_list, node) { + if (zone->tz == tz) { + mutex_unlock(&tz_list_lock); + return zone; + } + } + mutex_unlock(&tz_list_lock); + + return NULL; +} +static int inject_more_idle(struct _thermal_zone *zone) +{ + + return 0; +} + +static int throttle_single_cdev(struct _thermal_zone *zone) +{ + + + return 0; +} + static int sched_power_gov_throttle(struct thermal_zone_device *tz, int trip) { + struct thermal_cooling_device *cdev; + struct thermal_instance *inst; + u32 dev_power; + struct _thermal_zone *zone; + int ret; + + zone = find_zone(tz); + if (!zone) + return -EINVAL; + + if (zone->single_cooling_dev) { + /* ret = calc_power_budget(tz); */ + ret = inject_more_idle(zone); + if (ret) { + throttle_single_cdev(zone); + power_actor_set_power(cdev, inst, dev_power); + } + } else { + + } +
return 0; }
+static inline bool is_cpufreq_cooling(struct thermal_cooling_device *cdev) +{ + const char dev_name[] = "thermal-cpufreq-"; + + return strncmp(dev_name, cdev->type, strlen(dev_name)) ? + false : true; +} + +static void cleanup_percpu_cooling_dev(struct _cooling_dev *cooling) +{ + int i; + struct cpumask *cpumask; + struct cpu_power *cpower; + + cpumask = to_cpumask(cooling->cpus); + for_each_cpu(i, cpumask) { + cpower = (&per_cpu(cpu_power, i)); + raw_spin_lock(&cpower->update_lock); + cpower->cooling = NULL; + raw_spin_unlock(&cpower->update_lock); + } + kfree(cooling); +} + static int sched_power_gov_bind(struct thermal_zone_device *tz) { struct thermal_instance *inst; + struct _thermal_zone *zone; + struct _cooling_dev *cooling, *prev_cooling; + struct thermal_cooling_device *cdev; + int i = 0; + int cpu; + int cpus_size; + struct cpumask *cpumask; + struct cpu_power *cpower; + + zone = kzalloc(sizeof(*zone), GFP_KERNEL); + if (!zone) + return -ENOMEM;
+ mutex_lock(&cdev_list_lock); list_for_each_entry(inst, &tz->thermal_instances, tz_node) { + cdev = inst->cdev; + if (is_cpufreq_cooling(cdev)) + cpus_size = cpumask_size(); + else + cpus_size = 0; + + cooling = kzalloc(sizeof(*cooling) + cpus_size, GFP_KERNEL); + if (!cooling) + goto cleanup; + + cooling->cpu_dev = !!cpus_size; + cooling->cdev = cdev; + if (cooling->cpu_dev) { + cpumask = to_cpumask(cooling->cpus); + + cpufreq_cooling_copy_cpumask(cdev, cpumask); + + for_each_cpu(cpu, cpumask) { + cpower = (&per_cpu(cpu_power, cpu)); + raw_spin_lock(&cpower->update_lock); + cpower->cooling = cooling; + raw_spin_unlock(&cpower->update_lock); + } + } + list_add(&cooling->node, &cdev_list);
+ pr_info("registred new cooling device for CPUs\n");
+ prev_cooling = cooling; + i++; } + mutex_unlock(&cdev_list_lock); + + zone->single_cooling_dev = (i == 1 ? true : false); + zone->tz = tz; + + mutex_lock(&tz_list_lock); + list_add(&zone->node, &tz_list); + mutex_unlock(&tz_list_lock);
return 0; + +cleanup: + list_for_each_entry_reverse(cooling, &prev_cooling->node, node) { + if (i-- == 0) + break; + list_del(&prev_cooling->node); + cleanup_percpu_cooling_dev(prev_cooling); + prev_cooling = cooling; + } + mutex_unlock(&cdev_list_lock); + kfree(zone); + + return -ENOMEM; }
static void sched_power_gov_unbind(struct thermal_zone_device *tz) diff --git a/kernel/sched/power.h b/kernel/sched/power.h index 8ce1409ea538..da969b9bc30f 100644 --- a/kernel/sched/power.h +++ b/kernel/sched/power.h @@ -9,6 +9,7 @@ #define __SCHED_POWER_H__
#include <linux/idle_inject.h> +#include <linux/cpu_cooling.h>
#include "sched.h"
@@ -38,6 +39,8 @@ struct power_request { int flags };
+struct _cooling_dev; + struct cpu_power { struct update_sched_power update_power; unsigned int max_capacity; @@ -51,6 +54,7 @@ struct cpu_power { struct sched_power *sched_power; struct power_request req; bool operating; + struct _cooling_dev *cooling; /* lock shared with thermal framework and/or cpufreq */ raw_spinlock_t update_lock; struct idle_inject_device *ii_dev;
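The bind path above allocates struct _cooling_dev with a trailing CPU mask only when the cooling device type starts with "thermal-cpufreq-". A minimal user-space sketch of that allocation scheme follows, assuming calloc() in place of kzalloc() and a caller-supplied mask size in place of cpumask_size(); the demo_* names are hypothetical and exist only for this illustration.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical user-space analogue of struct _cooling_dev. */
struct demo_cooling {
	const char *type;
	bool cpu_dev;
	unsigned long cpus[];	/* trailing mask, allocated only for CPU devices */
};

/* Prefix match on the device type, as is_cpufreq_cooling() does. */
static bool demo_is_cpufreq_cooling(const char *type)
{
	static const char prefix[] = "thermal-cpufreq-";

	return strncmp(prefix, type, strlen(prefix)) == 0;
}

/*
 * Allocate the descriptor plus, for CPU cooling devices only,
 * mask_bytes of zeroed storage for the trailing mask.
 */
static struct demo_cooling *demo_cooling_alloc(const char *type,
					       size_t mask_bytes)
{
	size_t extra = demo_is_cpufreq_cooling(type) ? mask_bytes : 0;
	struct demo_cooling *c = calloc(1, sizeof(*c) + extra);

	if (!c)
		return NULL;

	c->type = type;
	c->cpu_dev = extra != 0;
	return c;
}
```

A single allocation keeps the mask and the descriptor together, so one free() (kfree() in the patch) releases both.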
Check whether the instance points to an already known cooling device. Skip registering instances and take into account only the real cooling devices.
Signed-off-by: Lukasz Luba l.luba@partner.samsung.com --- kernel/sched/power.c | 60 ++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 58 insertions(+), 2 deletions(-)
diff --git a/kernel/sched/power.c b/kernel/sched/power.c index 22d5758b89ef..d2ba44befaae 100644 --- a/kernel/sched/power.c +++ b/kernel/sched/power.c @@ -485,6 +485,7 @@ struct _thermal_zone { struct thermal_zone_device *tz; bool single_cooling_dev; struct list_head node; + struct list_head cooling_list; };
struct _cooling_dev { @@ -498,6 +499,11 @@ struct _cooling_dev { unsigned long int cpus[0]; };
+struct _cooling_instance { + struct list_head node; + struct _cooling_dev *cooling; +}; + static LIST_HEAD(tz_list); static LIST_HEAD(cdev_list); static DEFINE_MUTEX(tz_list_lock); @@ -623,11 +629,37 @@ static void cleanup_percpu_cooling_dev(struct _cooling_dev *cooling) kfree(cooling); }
+/* called with 'cdev_list_lock' taken */ +static struct _cooling_dev +*find_registered_cooling_dev(struct thermal_cooling_device *cdev) +{ + struct _cooling_dev *cooling; + + list_for_each_entry(cooling, &cdev_list, node) + if (cdev == cooling->cdev) + return cooling; + + return NULL; +} + +static bool cdev_in_instance_list(struct _thermal_zone *zone, + struct thermal_cooling_device *cdev) +{ + struct _cooling_instance *inst; + + list_for_each_entry(inst, &zone->cooling_list, node) + if (cdev == inst->cooling->cdev) + return true; + + return false; +} + static int sched_power_gov_bind(struct thermal_zone_device *tz) { struct thermal_instance *inst; struct _thermal_zone *zone; struct _cooling_dev *cooling, *prev_cooling; + struct _cooling_instance *_inst, *tmp; struct thermal_cooling_device *cdev; int i = 0; int cpu; @@ -639,9 +671,19 @@ static int sched_power_gov_bind(struct thermal_zone_device *tz) if (!zone) return -ENOMEM;
+ INIT_LIST_HEAD(&zone->cooling_list); + mutex_lock(&cdev_list_lock); list_for_each_entry(inst, &tz->thermal_instances, tz_node) { cdev = inst->cdev; + + if (cdev_in_instance_list(zone, cdev)) + continue; + + cooling = find_registered_cooling_dev(cdev); + if (cooling) + goto handle_cooling_instance; + if (is_cpufreq_cooling(cdev)) cpus_size = cpumask_size(); else @@ -666,11 +708,20 @@ static int sched_power_gov_bind(struct thermal_zone_device *tz) } } list_add(&cooling->node, &cdev_list); - - pr_info("registred new cooling device for CPUs\n"); + pr_info("created new CPU cooling device\n");
prev_cooling = cooling; i++; + +handle_cooling_instance: + _inst = kmalloc(sizeof(*_inst), GFP_KERNEL); + if (!_inst) + goto cleanup; + + _inst->cooling = cooling; + list_add(&_inst->node, &zone->cooling_list); + + pr_info("pinned cooling device into zone\n"); } mutex_unlock(&cdev_list_lock);
@@ -692,6 +743,11 @@ static int sched_power_gov_bind(struct thermal_zone_device *tz) prev_cooling = cooling; } mutex_unlock(&cdev_list_lock); + + list_for_each_entry_safe(_inst, tmp, &zone->cooling_list, node) { + list_del(&_inst->node); + kfree(_inst); + } kfree(zone);
return -ENOMEM;
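The lookup-before-create step from this patch (find_registered_cooling_dev() followed by allocation only on a miss) can be illustrated in isolation. The sketch below is a simplified user-space version under stated assumptions: a singly linked list replaces the kernel list_head, no locking is shown (the patch holds cdev_list_lock), and the demo_* names are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical analogue of struct _cooling_dev, keyed by device pointer. */
struct demo_cooling {
	const void *cdev;		/* identity of the underlying device */
	struct demo_cooling *next;
};

/* Return the registered entry for cdev, or NULL if there is none. */
static struct demo_cooling *demo_find(struct demo_cooling *head,
				      const void *cdev)
{
	for (; head; head = head->next)
		if (head->cdev == cdev)
			return head;

	return NULL;
}

/*
 * Return the existing entry for cdev or create a new one.
 * *created is set to 1 only when a new entry was allocated.
 * Returns NULL on allocation failure.
 */
static struct demo_cooling *demo_get(struct demo_cooling **head,
				     const void *cdev, int *created)
{
	struct demo_cooling *c = demo_find(*head, cdev);

	*created = 0;
	if (c)
		return c;

	c = calloc(1, sizeof(*c));
	if (!c)
		return NULL;

	c->cdev = cdev;
	c->next = *head;
	*head = c;
	*created = 1;
	return c;
}
```

With this shape several thermal instances that share one cooling device resolve to the same descriptor, which is the effect the patch aims for.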
Clean up compiler warnings. This patch will be squashed.
Signed-off-by: Lukasz Luba l.luba@partner.samsung.com --- kernel/sched/power.c | 54 +++++++++++++++++++------------------------- kernel/sched/power.h | 2 +- 2 files changed, 24 insertions(+), 32 deletions(-)
diff --git a/kernel/sched/power.c b/kernel/sched/power.c index d2ba44befaae..327f49a670d2 100644 --- a/kernel/sched/power.c +++ b/kernel/sched/power.c @@ -122,12 +122,6 @@ int sched_power_cpu_reinit_weight(int cpu, int weight) } EXPORT_SYMBOL_GPL(sched_power_cpu_reinit_weight);
-static int vidle_setup(int cpu, int rate, int period) -{ - - return 0; -} -
//////////////////////////////////////////////////////////////
@@ -160,8 +154,6 @@ static void sched_power_idle_stop(struct cpu_power *cpower) static int sched_power_idle_play(struct cpu_power *cpower, unsigned int period, unsigned int idle) { - unsigned int run; - if (period <= idle) return -EINVAL;
@@ -187,13 +179,13 @@ static int sched_power_reweight_cluster(int cpu, struct cpumask *cpus, unsigned int capacity, unsigned int period, int flags) { - int ret, i; - struct cpu_power *cpower = NULL; - int opp_curr_state, opp_curr_cost; - int opp_next_state, opp_next_cost; - u64 cluster_udget; - u64 total_weight = 0; - + /* int ret, i; */ + /* struct cpu_power *cpower = NULL; */ + /* int opp_curr_state, opp_curr_cost; */ + /* int opp_next_state, opp_next_cost; */ + /* u64 cluster_udget; */ + /* u64 total_weight = 0; */ + /* */ /* opp_next_state = get_opp_for_capacity(cpu, capacity); */ /* opp_next_cost = get_opp_cost(cpu, opp_next_state); */ /* */ @@ -279,8 +271,6 @@ static void sched_power_update_cpu_weight(struct cpu_power *cpower, int cpu, int weight, int flags, int time) { - struct sched_power *sp; - raw_spin_lock(&cpower->update_lock); cpower->req.weight = weight; cpower->req.cpu = cpu; @@ -400,7 +390,7 @@ static int sched_power_idle_init(struct sched_power *sp) { struct idle_inject_device *ii_dev; struct cpumask *cpus; - int i, last_cpu; + int i, last_cpu = 0; struct cpu_power *cpower;
@@ -413,7 +403,6 @@ static int sched_power_idle_init(struct sched_power *sp)
ii_dev = idle_inject_register(cpus); if (IS_ERR_OR_NULL(ii_dev)) { - last_cpu; goto cleanup; }
@@ -423,6 +412,7 @@ static int sched_power_idle_init(struct sched_power *sp) raw_spin_unlock(&cpower->update_lock);
cpumask_clear_cpu(i, cpus); + last_cpu = i; }
kfree(cpus); @@ -446,8 +436,7 @@ static int sched_power_idle_init(struct sched_power *sp)
static void sched_power_idle_unregister(struct sched_power *sp) { - struct idle_inject_device *ii_dev; - int i, last_cpu; + int i; struct cpu_power *cpower;
for_each_possible_cpu(i) { @@ -539,7 +528,7 @@ static u64 calc_power_budget(struct thermal_zone_device *tz, int desire_temp)
power_budget = temp_diff;
- power_budget = max(0, power_budget); + power_budget = max(0LL, power_budget);
return power_budget; } @@ -578,10 +567,11 @@ static int throttle_single_cdev(struct _thermal_zone *zone) return 0; }
+ static int sched_power_gov_throttle(struct thermal_zone_device *tz, int trip) { - struct thermal_cooling_device *cdev; - struct thermal_instance *inst; + struct thermal_cooling_device *cdev = NULL; + struct thermal_instance *inst = NULL; u32 dev_power; struct _thermal_zone *zone; int ret; @@ -658,7 +648,7 @@ static int sched_power_gov_bind(struct thermal_zone_device *tz) { struct thermal_instance *inst; struct _thermal_zone *zone; - struct _cooling_dev *cooling, *prev_cooling; + struct _cooling_dev *cooling, *prev_cooling = NULL; struct _cooling_instance *_inst, *tmp; struct thermal_cooling_device *cdev; int i = 0; @@ -735,12 +725,14 @@ static int sched_power_gov_bind(struct thermal_zone_device *tz) return 0;
cleanup: - list_for_each_entry_reverse(cooling, &prev_cooling->node, node) { - if (i-- == 0) - break; - list_del(&prev_cooling->node); - cleanup_percpu_cooling_dev(prev_cooling); - prev_cooling = cooling; + if (prev_cooling) { + list_for_each_entry_reverse(cooling, &prev_cooling->node, node) { + if (i-- == 0) + break; + list_del(&prev_cooling->node); + cleanup_percpu_cooling_dev(prev_cooling); + prev_cooling = cooling; + } } mutex_unlock(&cdev_list_lock);
diff --git a/kernel/sched/power.h b/kernel/sched/power.h index da969b9bc30f..7fafaa9f6609 100644 --- a/kernel/sched/power.h +++ b/kernel/sched/power.h @@ -36,7 +36,7 @@ struct power_request { unsigned int weight; int cpu; u64 time; - int flags + int flags; };
struct _cooling_dev;
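Change the calculation parameters: calc_power_budget() now takes the sched_power zone and a control temperature, and inject_more_idle() takes the resulting power budget.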
Signed-off-by: Lukasz Luba l.luba@partner.samsung.com --- kernel/sched/power.c | 15 ++++++++++----- 1 file changed, 10 insertions(+), 5 deletions(-)
diff --git a/kernel/sched/power.c b/kernel/sched/power.c index 327f49a670d2..c66c8dca2465 100644 --- a/kernel/sched/power.c +++ b/kernel/sched/power.c @@ -518,13 +518,14 @@ static u64 estimate_total_min_power(struct thermal_zone_device *tz, int trip) return total_min_power; }
-static u64 calc_power_budget(struct thermal_zone_device *tz, int desire_temp) +static u32 calc_power_budget(struct _thermal_zone *zone, int control_temp) { s64 temp_diff; s64 power_budget; + struct thermal_zone_device *tz = zone->tz;
/* temperature is represented in milidegress */ - temp_diff = desire_temp - tz->temperature; + temp_diff = control_temp - tz->temperature;
power_budget = temp_diff;
@@ -554,7 +555,7 @@ static struct _thermal_zone *find_zone(struct thermal_zone_device *tz)
return NULL; } -static int inject_more_idle(struct _thermal_zone *zone) +static int inject_more_idle(struct _thermal_zone *zone, u32 power_budget) {
return 0; @@ -573,6 +574,8 @@ static int sched_power_gov_throttle(struct thermal_zone_device *tz, int trip) struct thermal_cooling_device *cdev = NULL; struct thermal_instance *inst = NULL; u32 dev_power; + u32 power_budget; + int control_temp = 0; struct _thermal_zone *zone; int ret;
@@ -580,9 +583,11 @@ static int sched_power_gov_throttle(struct thermal_zone_device *tz, int trip) if (!zone) return -EINVAL;
+ if (zone->single_cooling_dev) { - /* ret = calc_power_budget(tz); */ - ret = inject_more_idle(zone); + power_budget = calc_power_budget(zone, control_temp); + /* only deadline tasks can ask for over-speed with idlers */ + ret = inject_more_idle(zone, power_budget); if (ret) { throttle_single_cdev(zone); power_actor_set_power(cdev, inst, dev_power);
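At this stage calc_power_budget() is only the difference between the control temperature and the zone temperature, clamped at zero. The stand-alone sketch below restates that arithmetic with fixed-width user-space types; demo_power_budget() is a hypothetical name, and the temperatures are assumed to be in millidegrees Celsius as in the thermal framework. Note that the caller in this revision still passes control_temp = 0.

```c
#include <stdint.h>

/*
 * Budget is the headroom between the control temperature and the
 * current temperature (both in millidegrees Celsius), never negative.
 */
static uint32_t demo_power_budget(int control_temp, int current_temp)
{
	/* widen before subtracting so the difference cannot overflow int */
	int64_t diff = (int64_t)control_temp - current_temp;

	if (diff < 0)
		return 0;

	return (uint32_t)diff;
}
```

For example, a 70000 control temperature with the zone at 65000 yields a budget of 5000, while a zone at 75000 yields 0.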
The patch adds support for a new flag coming from the DT trip point description. The control algorithm uses the flag to identify its two thresholds: the 'switch_on' and 'desired' temperatures.
Signed-off-by: Lukasz Luba l.luba@partner.samsung.com --- drivers/thermal/of-thermal.c | 22 ++++++++++++++++++++++ include/linux/thermal.h | 9 +++++++++ 2 files changed, 31 insertions(+)
diff --git a/drivers/thermal/of-thermal.c b/drivers/thermal/of-thermal.c index f7049ee16e98..a746ce323ff6 100644 --- a/drivers/thermal/of-thermal.c +++ b/drivers/thermal/of-thermal.c @@ -288,6 +288,20 @@ static int of_thermal_get_trip_type(struct thermal_zone_device *tz, int trip, return 0; }
+static int +of_thermal_get_trip_ctrl_alg_flag(struct thermal_zone_device *tz,int trip, + enum thermal_trip_ctrl_alg *flag) +{ + struct __thermal_zone *data = tz->devdata; + + if (trip >= data->ntrips || trip < 0) + return -EDOM; + + *flag = data->trips[trip].alg_flag; + + return 0; +} + static int of_thermal_get_trip_temp(struct thermal_zone_device *tz, int trip, int *temp) { @@ -370,6 +384,7 @@ static struct thermal_zone_device_ops of_thermal_ops = { .set_mode = of_thermal_set_mode,
.get_trip_type = of_thermal_get_trip_type, + .get_trip_ctrl_alg_flag = of_thermal_get_trip_ctrl_alg_flag, .get_trip_temp = of_thermal_get_trip_temp, .set_trip_temp = of_thermal_set_trip_temp, .get_trip_hyst = of_thermal_get_trip_hyst, @@ -778,6 +793,13 @@ static int thermal_of_populate_trip(struct device_node *np, return ret; }
+ ret = of_property_read_u32(np, "ctrl-alg", &prop); + if (ret < 0) { + trip->alg_flag = THERMAL_TRIP_CTRL_ALG_NONE; + } else { + trip->alg_flag = (enum thermal_trip_ctrl_alg) prop; + } + /* Required for cooling map matching */ trip->np = np; of_node_get(np); diff --git a/include/linux/thermal.h b/include/linux/thermal.h index 0b47cb72b96e..419cf0f0de27 100644 --- a/include/linux/thermal.h +++ b/include/linux/thermal.h @@ -70,6 +70,12 @@ enum thermal_trip_type { THERMAL_TRIP_CRITICAL, };
+enum thermal_trip_ctrl_alg { + THERMAL_TRIP_CTRL_ALG_NONE = 0, + THERMAL_TRIP_CTRL_ALG_SWITCH_ON = 1, + THERMAL_TRIP_CTRL_ALG_DESIRED = 2, +}; + enum thermal_trend { THERMAL_TREND_STABLE, /* temperature is stable */ THERMAL_TREND_RAISING, /* temperature is raising */ @@ -103,6 +109,8 @@ struct thermal_zone_device_ops { enum thermal_device_mode); int (*get_trip_type) (struct thermal_zone_device *, int, enum thermal_trip_type *); + int (*get_trip_ctrl_alg_flag) (struct thermal_zone_device *, int, + enum thermal_trip_ctrl_alg *); int (*get_trip_temp) (struct thermal_zone_device *, int, int *); int (*set_trip_temp) (struct thermal_zone_device *, int, int); int (*get_trip_hyst) (struct thermal_zone_device *, int, int *); @@ -374,6 +382,7 @@ struct thermal_trip { int temperature; int hysteresis; enum thermal_trip_type type; + enum thermal_trip_ctrl_alg alg_flag; };
/* Function declarations */
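The patch casts the raw "ctrl-alg" cell straight to enum thermal_trip_ctrl_alg. A stricter mapping that rejects unknown values could look like the sketch below. This is an assumption about a possible follow-up, not what the patch does; the demo_* names and the have_prop parameter (standing in for the of_property_read_u32() return check) are hypothetical.

```c
#include <stdint.h>

/* Mirrors the values of enum thermal_trip_ctrl_alg from the patch. */
enum demo_trip_ctrl_alg {
	DEMO_TRIP_CTRL_ALG_NONE = 0,
	DEMO_TRIP_CTRL_ALG_SWITCH_ON = 1,
	DEMO_TRIP_CTRL_ALG_DESIRED = 2,
};

/*
 * Map an optional property cell to the flag. A missing property or an
 * unknown value both fall back to NONE, so a malformed DT cannot
 * inject an out-of-range enum value.
 */
static enum demo_trip_ctrl_alg demo_parse_ctrl_alg(int have_prop,
						   uint32_t prop)
{
	if (!have_prop)
		return DEMO_TRIP_CTRL_ALG_NONE;

	switch (prop) {
	case 1:
		return DEMO_TRIP_CTRL_ALG_SWITCH_ON;
	case 2:
		return DEMO_TRIP_CTRL_ALG_DESIRED;
	default:
		return DEMO_TRIP_CTRL_ALG_NONE;
	}
}
```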
Adds handling of the trip flags and temperatures for the control algorithm's 'switch_on' and 'desired' thresholds.
Signed-off-by: Lukasz Luba l.luba@partner.samsung.com --- kernel/sched/power.c | 78 +++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 77 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/power.c b/kernel/sched/power.c index c66c8dca2465..3ae497d2ba34 100644 --- a/kernel/sched/power.c +++ b/kernel/sched/power.c @@ -470,11 +470,19 @@ fs_initcall(sched_power_init);
/////////////////thermal governor////////////////////////
+struct trip_ctrl_alg { + int switch_on_id; + int switch_on_temp; + int desired_id; + int desired_temp; +}; + struct _thermal_zone { struct thermal_zone_device *tz; bool single_cooling_dev; struct list_head node; struct list_head cooling_list; + struct trip_ctrl_alg trip_ctrl_alg; };
struct _cooling_dev { @@ -649,6 +657,68 @@ static bool cdev_in_instance_list(struct _thermal_zone *zone, return false; }
+static int sched_power_setup_trips(struct _thermal_zone *zone) +{ + int i; + struct thermal_zone_device *tz = zone->tz; + enum thermal_trip_ctrl_alg alg_flag; + int ret; + int temp; + + zone->trip_ctrl_alg.switch_on_id = -EINVAL; + zone->trip_ctrl_alg.desired_id = -EINVAL; + + for (i = 0; i < tz->trips; i++) { + ret = tz->ops->get_trip_ctrl_alg_flag(tz, i, + &alg_flag); + if (ret < 0) + continue; + + switch (alg_flag) { + case THERMAL_TRIP_CTRL_ALG_SWITCH_ON: + zone->trip_ctrl_alg.switch_on_id = i; + break; + case THERMAL_TRIP_CTRL_ALG_DESIRED: + zone->trip_ctrl_alg.desired_id = i; + break; + default: break; + } + } + + if (zone->trip_ctrl_alg.desired_id >= 0) { + int trip; + + trip = zone->trip_ctrl_alg.desired_id; + ret = tz->ops->get_trip_temp(tz, trip, &temp); + if (!ret) { + zone->trip_ctrl_alg.desired_temp = temp; + } else { + zone->trip_ctrl_alg.desired_id = -EINVAL; + pr_warn("missing 'desired' temp\n"); + } + + trip = zone->trip_ctrl_alg.switch_on_id; + if (trip >= 0) { + ret = tz->ops->get_trip_temp(tz, trip, &temp); + if (!ret) { + zone->trip_ctrl_alg.switch_on_temp = temp; + } else { + zone->trip_ctrl_alg.switch_on_id = -EINVAL; + pr_warn("missing 'switch_on' temp\n"); + } + } else { + pr_warn("missing 'switch_on' temp\n"); + } + } + + if (zone->trip_ctrl_alg.desired_id < 0) { + pr_warn("could not find temperature settings\n"); + ret = -EINVAL; + } + + return ret; +} + static int sched_power_gov_bind(struct thermal_zone_device *tz) { struct thermal_instance *inst; @@ -656,7 +726,7 @@ static int sched_power_gov_bind(struct thermal_zone_device *tz) struct _cooling_dev *cooling, *prev_cooling = NULL; struct _cooling_instance *_inst, *tmp; struct thermal_cooling_device *cdev; - int i = 0; + int i = 0, ret; int cpu; int cpus_size; struct cpumask *cpumask; @@ -723,6 +793,12 @@ static int sched_power_gov_bind(struct thermal_zone_device *tz) zone->single_cooling_dev = (i == 1 ? true : false); zone->tz = tz;
+ ret = sched_power_setup_trips(zone); + if (ret < 0) { + pr_info("lack of temp settings in DT allowing to control \ + the algorithm\n"); + } + mutex_lock(&tz_list_lock); list_add(&zone->node, &tz_list); mutex_unlock(&tz_list_lock);
The flags enable sched_power to work with these thermal zones.
Signed-off-by: Lukasz Luba l.luba@partner.samsung.com --- arch/arm64/boot/dts/exynos/exynos5433-tmu.dtsi | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/arch/arm64/boot/dts/exynos/exynos5433-tmu.dtsi b/arch/arm64/boot/dts/exynos/exynos5433-tmu.dtsi index 5beb4538dfdc..c58edfbce239 100644 --- a/arch/arm64/boot/dts/exynos/exynos5433-tmu.dtsi +++ b/arch/arm64/boot/dts/exynos/exynos5433-tmu.dtsi @@ -23,6 +23,7 @@ thermal-zones { temperature = <70000>; /* millicelsius */ hysteresis = <1000>; /* millicelsius */ type = "active"; + ctrl-alg = <1>; }; atlas0_alert_2: atlas0-alert-2 { temperature = <75000>; /* millicelsius */ @@ -43,6 +44,7 @@ thermal-zones { temperature = <90000>; /* millicelsius */ hysteresis = <1000>; /* millicelsius */ type = "active"; + ctrl-alg = <2>; }; atlas0_alert_6: atlas0-alert-6 { temperature = <95000>; /* millicelsius */ @@ -195,6 +197,7 @@ thermal-zones { temperature = <75000>; /* millicelsius */ hysteresis = <1000>; /* millicelsius */ type = "active"; + ctrl-alg = <1>; }; apollo_alert_3: apollo-alert-3 { temperature = <80000>; /* millicelsius */ @@ -210,6 +213,7 @@ thermal-zones { temperature = <90000>; /* millicelsius */ hysteresis = <1000>; /* millicelsius */ type = "active"; + ctrl-alg = <2>; }; apollo_alert_6: apollo-alert-6 { temperature = <95000>; /* millicelsius */ @@ -304,11 +308,13 @@ thermal-zones { temperature = <55000>; hysteresis = <1000>; type = "passive"; + ctrl-alg = <1>; }; target: target { temperature = <70000>; hysteresis = <1000>; type = "passive"; + ctrl-alg = <2>; }; };
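The way the governor picks up these flags can be sketched in plain userspace C. This is only an illustration, not the kernel code: the enum names and the mapping of the DT values (1 for the switch-on trip, 2 for the desired trip) are assumptions made for this sketch, and find_trip_roles() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed mapping of the DT 'ctrl-alg' cell; names are hypothetical. */
enum ctrl_alg_flag {
	CTRL_ALG_NONE = 0,
	CTRL_ALG_SWITCH_ON = 1,	/* trip at which the controller starts */
	CTRL_ALG_DESIRED = 2,	/* trip holding the target temperature */
};

struct trip {
	int temperature;	/* millicelsius */
	enum ctrl_alg_flag flag;
};

struct trip_roles {
	int switch_on_id;	/* -1 when no trip carries the flag */
	int desired_id;
};

/* Scan all trips once and remember which index plays which role. */
static struct trip_roles find_trip_roles(const struct trip *trips, size_t n)
{
	struct trip_roles roles = { -1, -1 };
	size_t i;

	assert(trips != NULL || n == 0);

	for (i = 0; i < n; i++) {
		switch (trips[i].flag) {
		case CTRL_ALG_SWITCH_ON:
			roles.switch_on_id = (int)i;
			break;
		case CTRL_ALG_DESIRED:
			roles.desired_id = (int)i;
			break;
		default:
			break;
		}
	}

	return roles;
}
```

A zone whose trips carry no flag ends up with both ids at -1, which corresponds to the "could not find temperature settings" case in sched_power_setup_trips().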
Simple implementation of throttling the CPUs.
Signed-off-by: Lukasz Luba l.luba@partner.samsung.com --- kernel/sched/power.c | 136 +++++++++++++++++++++++++++++++++++++------ kernel/sched/power.h | 3 + 2 files changed, 120 insertions(+), 19 deletions(-)
diff --git a/kernel/sched/power.c b/kernel/sched/power.c index 3ae497d2ba34..f74c59e83d4f 100644 --- a/kernel/sched/power.c +++ b/kernel/sched/power.c @@ -480,6 +480,7 @@ struct trip_ctrl_alg { struct _thermal_zone { struct thermal_zone_device *tz; bool single_cooling_dev; + int num_cooling; struct list_head node; struct list_head cooling_list; struct trip_ctrl_alg trip_ctrl_alg; @@ -499,6 +500,7 @@ struct _cooling_dev { struct _cooling_instance { struct list_head node; struct _cooling_dev *cooling; + u32 weight; };
static LIST_HEAD(tz_list); @@ -526,14 +528,15 @@ static u64 estimate_total_min_power(struct thermal_zone_device *tz, int trip) return total_min_power; }
-static u32 calc_power_budget(struct _thermal_zone *zone, int control_temp) +static u32 calc_power_budget(struct _thermal_zone *zone) { s64 temp_diff; + int desired_temp = zone->trip_ctrl_alg.desired_temp; s64 power_budget; struct thermal_zone_device *tz = zone->tz;
/* temperature is represented in milidegress */ - temp_diff = control_temp - tz->temperature; + temp_diff = desired_temp - tz->temperature;
power_budget = temp_diff;
@@ -576,14 +579,111 @@ static int throttle_single_cdev(struct _thermal_zone *zone) return 0; }
+static int cooling_dev_set_state(struct _thermal_zone *zone, + struct _cooling_dev *cooling, + unsigned long target) +{ + struct thermal_cooling_device *cdev = cooling->cdev; + unsigned long curr_state; + + cdev->ops->get_cur_state(cdev, &curr_state); + + if (curr_state == target) + return 0; + + /* check if we can it go with higher freq for zone with a few devices*/ + if (!zone->single_cooling_dev && cooling->max_single_state > target) + return -EINVAL; + + if (zone->single_cooling_dev && cooling->max_single_state > target) + cooling->max_single_state = target; + + mutex_lock(&cdev->lock); + if (!cdev->ops->set_cur_state(cdev, target)) + thermal_cooling_device_stats_update(cdev, target); + + cdev->updated = true; + mutex_unlock(&cdev->lock); + + return 0; +} + +static int set_power(struct _cooling_instance *inst, struct _thermal_zone *zone, + u32 power) +{ + int ret; + unsigned long state; + struct thermal_cooling_device *cdev; + struct thermal_zone_device *tz = zone->tz; + + cdev = inst->cooling->cdev; + ret = cdev->ops->power2state(cdev, tz, power, &state); + + pr_info("set_power=%u, state=%lu, temp=%d\n", power, state, + tz->temperature); + if (!ret) { + ret = cooling_dev_set_state(zone, inst->cooling, state); + } + + return ret; +} + +static int get_requested_power(struct _cooling_instance *inst, + struct _thermal_zone *zone, u32 *power) +{ + struct thermal_cooling_device *cdev; + struct thermal_zone_device *tz = zone->tz; + + cdev = inst->cooling->cdev; + return cdev->ops->get_requested_power(cdev, tz, power); +} + +static int share_power_budget(struct _thermal_zone *zone, u32 power_budget) +{ + struct _cooling_instance *inst; + u32 *power; + u64 sum_power = 0; + int i = 0; + + power = kzalloc(sizeof(u32) * zone->num_cooling, GFP_KERNEL); + if (!power) + return -ENOMEM; + + /* estimate cooling dev's power and total power */ + list_for_each_entry(inst, &zone->cooling_list, node) { + get_requested_power(inst, zone, &power[i]); + + power[i] = (inst->weight * 
power[i]) >> 10; + sum_power += power[i]; + i++; + } + + if (sum_power <= 0) + goto cleanup; + + /* split power budget to cooling devices */ + i = 0; + list_for_each_entry(inst, &zone->cooling_list, node) { + power[i] = (power[i] * power_budget) / sum_power; + i++; + } + + /* set the new state for cooling device based on its granted power */ + i = 0; + list_for_each_entry(inst, &zone->cooling_list, node) { + set_power(inst, zone, power[i]); + i++; + } + +cleanup: + kfree(power); + + return 0; +}
static int sched_power_gov_throttle(struct thermal_zone_device *tz, int trip) { - struct thermal_cooling_device *cdev = NULL; - struct thermal_instance *inst = NULL; - u32 dev_power; u32 power_budget; - int control_temp = 0; struct _thermal_zone *zone; int ret;
@@ -591,19 +691,16 @@ static int sched_power_gov_throttle(struct thermal_zone_device *tz, int trip) if (!zone) return -EINVAL;
+ if (zone->trip_ctrl_alg.desired_id < 0) + return -EINVAL;
- if (zone->single_cooling_dev) { - power_budget = calc_power_budget(zone, control_temp); - /* only deadline tasks can ask for over-speed with idlers */ - ret = inject_more_idle(zone, power_budget); - if (ret) { - throttle_single_cdev(zone); - power_actor_set_power(cdev, inst, dev_power); - } - } else { + /* skip calls from other trip points */ + if (trip != zone->trip_ctrl_alg.desired_id) + return 0;
- } + power_budget = calc_power_budget(zone);
+ ret = share_power_budget(zone, power_budget);
return 0; } @@ -784,7 +881,9 @@ static int sched_power_gov_bind(struct thermal_zone_device *tz) goto cleanup;
_inst->cooling = cooling; + _inst->weight = DEFAULT_WEIGHT; list_add(&_inst->node, &zone->cooling_list); + zone->num_cooling++;
pr_info("pinned cooling device into zone\n"); } @@ -795,15 +894,14 @@ static int sched_power_gov_bind(struct thermal_zone_device *tz)
ret = sched_power_setup_trips(zone); if (ret < 0) { - pr_info("lack of temp settings in DT allowing to control \ - the algorithm\n"); + pr_warn("lack of temp settings needed in control algorithm\n"); }
mutex_lock(&tz_list_lock); list_add(&zone->node, &tz_list); mutex_unlock(&tz_list_lock);
- return 0; + return ret;
cleanup: if (prev_cooling) { diff --git a/kernel/sched/power.h b/kernel/sched/power.h index 7fafaa9f6609..77516a6f5809 100644 --- a/kernel/sched/power.h +++ b/kernel/sched/power.h @@ -17,6 +17,9 @@ // void (*func)(struct update_sched_power *, int, unsigned int, int); // };
+#define MAX_WEIGHT 1024 +#define DEFAULT_WEIGHT MAX_WEIGHT + struct power_budget { s64 temp; s64 temp_limit;
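The split done in share_power_budget() can be sketched in userspace C as below. It is a simplified illustration under a few assumptions: arrays stand in for the cooling_list, share_budget() is a hypothetical name, and the intermediate math is done in 64 bits (the patch multiplies u32 values directly).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_WEIGHT 1024u	/* weight is a fixed-point fraction of 1024 */

/*
 * Scale every requested power by weight/1024, then hand out the budget
 * in proportion to the scaled requests.
 */
static void share_budget(const uint32_t *requested, const uint32_t *weight,
			 uint32_t *granted, size_t n, uint32_t budget)
{
	uint64_t sum = 0;
	size_t i;

	assert(n > 0);

	for (i = 0; i < n; i++) {
		granted[i] = (uint32_t)(((uint64_t)weight[i] * requested[i]) >> 10);
		sum += granted[i];
	}

	/* Nothing requested: all grants are already 0 (choice of this sketch). */
	if (sum == 0)
		return;

	for (i = 0; i < n; i++)
		granted[i] = (uint32_t)(((uint64_t)granted[i] * budget) / sum);
}
```

For example, two devices requesting 1000 and 3000 with weights 1024 and 512 are scaled to 1000 and 1500; a budget of 2000 is then split into 800 and 1200.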
The patch adds a simple control algorithm based on a PI controller, with additional decay of the integral term.
Signed-off-by: Lukasz Luba l.luba@partner.samsung.com --- kernel/sched/power.c | 26 ++++++++++++++++++++++++-- kernel/sched/power.h | 2 ++ 2 files changed, 26 insertions(+), 2 deletions(-)
diff --git a/kernel/sched/power.c b/kernel/sched/power.c index f74c59e83d4f..20f5618d8f2b 100644 --- a/kernel/sched/power.c +++ b/kernel/sched/power.c @@ -475,6 +475,9 @@ struct trip_ctrl_alg { int switch_on_temp; int desired_id; int desired_temp; + int k_p; + int k_i; + s64 integral; };
struct _thermal_zone { @@ -532,16 +535,33 @@ static u32 calc_power_budget(struct _thermal_zone *zone) { s64 temp_diff; int desired_temp = zone->trip_ctrl_alg.desired_temp; - s64 power_budget; + s64 power_budget, p, i; + s64 decay; + int k_i = zone->trip_ctrl_alg.k_i; struct thermal_zone_device *tz = zone->tz;
/* temperature is represented in milidegress */ temp_diff = desired_temp - tz->temperature;
- power_budget = temp_diff; + p = temp_diff * zone->trip_ctrl_alg.k_p; + + i = zone->trip_ctrl_alg.integral * k_i; + + if (temp_diff < 0) { + s64 i_0 = i + k_i * temp_diff; + + zone->trip_ctrl_alg.integral += temp_diff; + } + + power_budget = p + i; + power_budget >>= 10;
power_budget = max(0LL, power_budget);
+ /* decay of 1/8 (~12.5%), after a while will reach 0 */ + decay = zone->trip_ctrl_alg.integral >> 3; + zone->trip_ctrl_alg.integral -= decay; + return power_budget; }
@@ -764,6 +784,8 @@ static int sched_power_setup_trips(struct _thermal_zone *zone)
zone->trip_ctrl_alg.switch_on_id = -EINVAL; zone->trip_ctrl_alg.desired_id = -EINVAL; + zone->trip_ctrl_alg.k_p = DEFAULT_K_P; + zone->trip_ctrl_alg.k_i = DEFAULT_K_I;
for (i = 0; i < tz->trips; i++) { ret = tz->ops->get_trip_ctrl_alg_flag(tz, i, diff --git a/kernel/sched/power.h b/kernel/sched/power.h index 77516a6f5809..340b3924ecd7 100644 --- a/kernel/sched/power.h +++ b/kernel/sched/power.h @@ -19,6 +19,8 @@
#define MAX_WEIGHT 1024 #define DEFAULT_WEIGHT MAX_WEIGHT +#define DEFAULT_K_P 500 +#define DEFAULT_K_I 50
struct power_budget { s64 temp;
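The PI step with integral decay can be sketched in userspace C as follows. It follows the structure of calc_power_budget() at this point in the series, but it is only an approximation: struct pi_state, pi_power_budget() and floor_shift() are hypothetical names, and floor_shift() is used because right-shifting a negative value is implementation-defined in portable C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct pi_state {
	int k_p;		/* proportional gain, scaled by 1024 */
	int k_i;		/* integral gain, scaled by 1024 */
	int64_t integral;	/* accumulated overshoot, millicelsius */
};

/* Shift right with rounding toward minus infinity, also for v < 0. */
static int64_t floor_shift(int64_t v, unsigned int shift)
{
	if (v >= 0)
		return v >> shift;
	return -((-v + ((int64_t)1 << shift) - 1) >> shift);
}

static uint32_t pi_power_budget(struct pi_state *s, int desired_temp, int temp)
{
	int64_t temp_diff = (int64_t)desired_temp - temp;
	int64_t p = temp_diff * s->k_p;
	int64_t i = s->integral * s->k_i;
	int64_t budget;

	assert(s != NULL);

	/* Only overshoot (temp above desired) feeds the integral. */
	if (temp_diff < 0)
		s->integral += temp_diff;

	budget = floor_shift(p + i, 10);
	if (budget < 0)
		budget = 0;

	/* Decay by 1/8 per call so an old overshoot is forgotten over time. */
	s->integral -= floor_shift(s->integral, 3);

	return (uint32_t)budget;
}
```

With k_p = 500 and k_i = 50, a zone 1 degree below an 80 degree target gets a budget of 488. After one sample 1 degree above the target, the integral is -875 and the same 1 degree of headroom yields only 445.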
Change the trip points to the passive type and enable polling mode.
Signed-off-by: Lukasz Luba l.luba@partner.samsung.com --- .../arm64/boot/dts/exynos/exynos5433-tmu.dtsi | 78 +++++++++---------- 1 file changed, 39 insertions(+), 39 deletions(-)
diff --git a/arch/arm64/boot/dts/exynos/exynos5433-tmu.dtsi b/arch/arm64/boot/dts/exynos/exynos5433-tmu.dtsi index c58edfbce239..11322bf8aa5f 100644 --- a/arch/arm64/boot/dts/exynos/exynos5433-tmu.dtsi +++ b/arch/arm64/boot/dts/exynos/exynos5433-tmu.dtsi @@ -11,45 +11,45 @@ thermal-zones { atlas0_thermal: atlas0-thermal { thermal-sensors = <&tmu_atlas0>; - polling-delay-passive = <0>; - polling-delay = <0>; + polling-delay-passive = <100>; + polling-delay = <1000>; trips { atlas0_alert_0: atlas0-alert-0 { temperature = <65000>; /* millicelsius */ hysteresis = <1000>; /* millicelsius */ - type = "active"; + type = "passive"; }; atlas0_alert_1: atlas0-alert-1 { temperature = <70000>; /* millicelsius */ hysteresis = <1000>; /* millicelsius */ - type = "active"; + type = "passive"; ctrl-alg = <1>; }; atlas0_alert_2: atlas0-alert-2 { temperature = <75000>; /* millicelsius */ hysteresis = <1000>; /* millicelsius */ - type = "active"; + type = "passive"; }; atlas0_alert_3: atlas0-alert-3 { temperature = <80000>; /* millicelsius */ hysteresis = <1000>; /* millicelsius */ - type = "active"; + type = "passive"; }; atlas0_alert_4: atlas0-alert-4 { temperature = <85000>; /* millicelsius */ hysteresis = <1000>; /* millicelsius */ - type = "active"; + type = "passive"; }; atlas0_alert_5: atlas0-alert-5 { temperature = <90000>; /* millicelsius */ hysteresis = <1000>; /* millicelsius */ - type = "active"; + type = "passive"; ctrl-alg = <2>; }; atlas0_alert_6: atlas0-alert-6 { temperature = <95000>; /* millicelsius */ hysteresis = <1000>; /* millicelsius */ - type = "active"; + type = "passive"; }; };
@@ -100,37 +100,37 @@ thermal-zones { atlas1_alert_0: atlas1-alert-0 { temperature = <65000>; /* millicelsius */ hysteresis = <1000>; /* millicelsius */ - type = "active"; + type = "passive"; }; atlas1_alert_1: atlas1-alert-1 { temperature = <70000>; /* millicelsius */ hysteresis = <1000>; /* millicelsius */ - type = "active"; + type = "passive"; }; atlas1_alert_2: atlas1-alert-2 { temperature = <75000>; /* millicelsius */ hysteresis = <1000>; /* millicelsius */ - type = "active"; + type = "passive"; }; atlas1_alert_3: atlas1-alert-3 { temperature = <80000>; /* millicelsius */ hysteresis = <1000>; /* millicelsius */ - type = "active"; + type = "passive"; }; atlas1_alert_4: atlas1-alert-4 { temperature = <85000>; /* millicelsius */ hysteresis = <1000>; /* millicelsius */ - type = "active"; + type = "passive"; }; atlas1_alert_5: atlas1-alert-5 { temperature = <90000>; /* millicelsius */ hysteresis = <1000>; /* millicelsius */ - type = "active"; + type = "passive"; }; atlas1_alert_6: atlas1-alert-6 { temperature = <95000>; /* millicelsius */ hysteresis = <1000>; /* millicelsius */ - type = "active"; + type = "passive"; }; }; }; @@ -143,82 +143,82 @@ thermal-zones { g3d_alert_0: g3d-alert-0 { temperature = <70000>; /* millicelsius */ hysteresis = <1000>; /* millicelsius */ - type = "active"; + type = "passive"; }; g3d_alert_1: g3d-alert-1 { temperature = <75000>; /* millicelsius */ hysteresis = <1000>; /* millicelsius */ - type = "active"; + type = "passive"; }; g3d_alert_2: g3d-alert-2 { temperature = <80000>; /* millicelsius */ hysteresis = <1000>; /* millicelsius */ - type = "active"; + type = "passive"; }; g3d_alert_3: g3d-alert-3 { temperature = <85000>; /* millicelsius */ hysteresis = <1000>; /* millicelsius */ - type = "active"; + type = "passive"; }; g3d_alert_4: g3d-alert-4 { temperature = <90000>; /* millicelsius */ hysteresis = <1000>; /* millicelsius */ - type = "active"; + type = "passive"; }; g3d_alert_5: g3d-alert-5 { temperature = <95000>; /* 
millicelsius */ hysteresis = <1000>; /* millicelsius */ - type = "active"; + type = "passive"; }; g3d_alert_6: g3d-alert-6 { temperature = <100000>; /* millicelsius */ hysteresis = <1000>; /* millicelsius */ - type = "active"; + type = "passive"; }; }; };
apollo_thermal: apollo-thermal { thermal-sensors = <&tmu_apollo>; - polling-delay-passive = <0>; - polling-delay = <0>; + polling-delay-passive = <100>; + polling-delay = <1000>; trips { apollo_alert_0: apollo-alert-0 { temperature = <65000>; /* millicelsius */ hysteresis = <1000>; /* millicelsius */ - type = "active"; + type = "passive"; }; apollo_alert_1: apollo-alert-1 { temperature = <70000>; /* millicelsius */ hysteresis = <1000>; /* millicelsius */ - type = "active"; + type = "passive"; }; apollo_alert_2: apollo-alert-2 { temperature = <75000>; /* millicelsius */ hysteresis = <1000>; /* millicelsius */ - type = "active"; + type = "passive"; ctrl-alg = <1>; }; apollo_alert_3: apollo-alert-3 { temperature = <80000>; /* millicelsius */ hysteresis = <1000>; /* millicelsius */ - type = "active"; + type = "passive"; }; apollo_alert_4: apollo-alert-4 { temperature = <85000>; /* millicelsius */ hysteresis = <1000>; /* millicelsius */ - type = "active"; + type = "passive"; }; apollo_alert_5: apollo-alert-5 { temperature = <90000>; /* millicelsius */ hysteresis = <1000>; /* millicelsius */ - type = "active"; + type = "passive"; ctrl-alg = <2>; }; apollo_alert_6: apollo-alert-6 { temperature = <95000>; /* millicelsius */ hysteresis = <1000>; /* millicelsius */ - type = "active"; + type = "passive"; }; };
@@ -259,37 +259,37 @@ thermal-zones { isp_alert_0: isp-alert-0 { temperature = <80000>; /* millicelsius */ hysteresis = <1000>; /* millicelsius */ - type = "active"; + type = "passive"; }; isp_alert_1: isp-alert-1 { temperature = <85000>; /* millicelsius */ hysteresis = <1000>; /* millicelsius */ - type = "active"; + type = "passive"; }; isp_alert_2: isp-alert-2 { temperature = <90000>; /* millicelsius */ hysteresis = <1000>; /* millicelsius */ - type = "active"; + type = "passive"; }; isp_alert_3: isp-alert-3 { temperature = <95000>; /* millicelsius */ hysteresis = <1000>; /* millicelsius */ - type = "active"; + type = "passive"; }; isp_alert_4: isp-alert-4 { temperature = <100000>; /* millicelsius */ hysteresis = <1000>; /* millicelsius */ - type = "active"; + type = "passive"; }; isp_alert_5: isp-alert-5 { temperature = <105000>; /* millicelsius */ hysteresis = <1000>; /* millicelsius */ - type = "active"; + type = "passive"; }; isp_alert_6: isp-alert-6 { temperature = <110000>; /* millicelsius */ hysteresis = <1000>; /* millicelsius */ - type = "active"; + type = "passive"; }; }; };
In the controller algorithm, take the current overshoot error into account immediately instead of waiting until the next period.
Signed-off-by: Lukasz Luba l.luba@partner.samsung.com
---
 kernel/sched/power.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/power.c b/kernel/sched/power.c
index 20f5618d8f2b..8bd8502681e3 100644
--- a/kernel/sched/power.c
+++ b/kernel/sched/power.c
@@ -548,7 +548,7 @@ static u32 calc_power_budget(struct _thermal_zone *zone)
 	i = zone->trip_ctrl_alg.integral * k_i;
 
 	if (temp_diff < 0) {
-		s64 i_0 = i + k_i * temp_diff;
+		i += k_i * temp_diff;
 
 		zone->trip_ctrl_alg.integral += temp_diff;
 	}
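The effect of this one-line change can be shown with a small userspace sketch. integral_term() is a hypothetical helper; the include_current switch exists only to compare the behaviour before and after the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Integral contribution to the power budget.
 * include_current == 0: only the error accumulated in earlier periods.
 * include_current != 0: the overshoot measured in this period is added
 *                       right away, as done by the patch.
 */
static int64_t integral_term(int64_t integral, int k_i, int64_t temp_diff,
			     int include_current)
{
	int64_t i = integral * k_i;

	assert(k_i >= 0);

	if (include_current && temp_diff < 0)
		i += k_i * temp_diff;

	return i;
}
```

With k_i = 50, a stored integral of -875 and a current overshoot of -1000, the term changes from -43750 to -93750, so the budget drops in the same period in which the overshoot is seen.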
Skip broken GPU sensor.
Signed-off-by: Lukasz Luba l.luba@partner.samsung.com
---
 arch/arm64/boot/dts/exynos/exynos5433-tmu.dtsi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/boot/dts/exynos/exynos5433-tmu.dtsi b/arch/arm64/boot/dts/exynos/exynos5433-tmu.dtsi
index 11322bf8aa5f..9cae1006af3b 100644
--- a/arch/arm64/boot/dts/exynos/exynos5433-tmu.dtsi
+++ b/arch/arm64/boot/dts/exynos/exynos5433-tmu.dtsi
@@ -298,7 +298,7 @@ thermal-zones {
 		container;
 		thermal-sensors = <&vtsens>;
 		#thermal-subzone-cells = <1>;
-		subzones = <&apollo_thermal 50 &atlas0_thermal 100 &g3d_thermal 100>;
+		subzones = <&apollo_thermal 50 &atlas0_thermal 100>;
 		polling-delay = <1000>;
 		polling-delay-passive = <100>;
 		sustainable-power = <2500>;
Add basic support for handling cluster weight requests.
Signed-off-by: Lukasz Luba l.luba@partner.samsung.com --- kernel/sched/power.c | 134 ++++++++++++++++++++++++++++--------------- kernel/sched/power.h | 36 ++++++++++++ 2 files changed, 123 insertions(+), 47 deletions(-)
diff --git a/kernel/sched/power.c b/kernel/sched/power.c index 8bd8502681e3..6c72a9bee8ac 100644 --- a/kernel/sched/power.c +++ b/kernel/sched/power.c @@ -22,6 +22,11 @@ static DEFINE_PER_CPU(struct cpu_power, cpu_power); DEFINE_PER_CPU(struct update_sched_power *, update_cpu_power);
+static LIST_HEAD(tz_list); +static LIST_HEAD(cdev_list); +static DEFINE_MUTEX(tz_list_lock); +static DEFINE_MUTEX(cdev_list_lock); + static struct sched_power sched_power;
void sched_power_set_update_func(int cpu, struct update_sched_power *update, @@ -176,7 +181,7 @@ static u64 cluster_power_budget(struct cpumask *cpus) }
static int -sched_power_reweight_cluster(int cpu, struct cpumask *cpus, unsigned int capacity, +sched_power_reweight_cpu(int cpu, struct cpumask *cpus, unsigned int capacity, unsigned int period, int flags) { /* int ret, i; */ @@ -210,6 +215,43 @@ sched_power_reweight_cluster(int cpu, struct cpumask *cpus, unsigned int capacit return 0; }
+static int sched_power_reweight_cluster(int cpu, struct cpumask *cpus, + int weight, int flags) +{ + struct _thermal_zone *zone, *system_zone = NULL; + struct _cooling_dev *cooling; + struct _cooling_instance *inst, *cluster_inst = NULL; + struct cpumask *cpumask; + + mutex_lock(&tz_list_lock); + list_for_each_entry(zone, &tz_list, node) { + if (zone->single_cooling_dev) + continue; + list_for_each_entry(inst, &zone->cooling_list, node) { + cpumask = to_cpumask(inst->cooling->cpus); + + if (cpumask_test_cpu(cpu, cpumask)) { + system_zone = zone; + cluster_inst = inst; + break; + } + } + } + mutex_unlock(&tz_list_lock); + + if (system_zone) { + mutex_lock(&system_zone->lock); + cluster_inst->weight = weight; + mutex_unlock(&system_zone->lock); + thermal_notify_framework(system_zone->tz, + system_zone->trip_ctrl_alg.desired_id); + } else { + return -ENODEV; + } + + return 0; +} + static int sched_power_cpu_capacity_request(int cpu, unsigned int capacity, unsigned int period, int flags) { @@ -222,16 +264,38 @@ static int sched_power_cpu_capacity_request(int cpu, unsigned int capacity, cpower = (&per_cpu(cpu_power, cpu));
//for cluster OR system wise - ret = sched_power_reweight_cluster(cpu, cpower->cluster_mask, capacity, + ret = sched_power_reweight_cpu(cpu, cpower->cluster_mask, capacity, period, flags);
return ret; }
+static int sched_power_handle_request(struct power_request *req) +{ + int ret; + struct cpu_power *cpower; + + cpower = (&per_cpu(cpu_power, req->cpu)); + + switch (req->flags) { + case SCHED_POWER_CPU_WEIGHT: + break; + case SCHED_POWER_CLUSTER_WEIGHT: + ret = sched_power_reweight_cluster(req->cpu, + cpower->cluster_mask, + req->weight, req->flags); + break; + default: + return; + } + + return 0; +} + static void sched_power_work(struct kthread_work *work) { struct sched_power *sp = container_of(work, struct sched_power, work); - int i; + int i, ret; struct cpu_power *cpower = NULL; struct power_request req; unsigned int w; @@ -240,15 +304,21 @@ static void sched_power_work(struct kthread_work *work) for_each_online_cpu(i) { cpower = (&per_cpu(cpu_power, i)); raw_spin_lock(&cpower->update_lock); - w = cpower->weight; + if (!cpower->req.time) { + raw_spin_unlock(&cpower->update_lock); + continue; + } req = cpower->req; - cpower->req.time = sched_clock(); + w = cpower->weight; + cpower->req.time = 0; cpower->weight = req.weight; raw_spin_unlock(&cpower->update_lock);
pr_info("cpower req poped\n"); - thermal_cpu_cdev_set_weight(req.cpu, req.weight); - need_update = true; + /* thermal_cpu_cdev_set_weight(req.cpu, req.weight); */ + ret = sched_power_handle_request(&req); + if (!ret) + need_update = true; }
if (need_update) @@ -470,46 +540,7 @@ fs_initcall(sched_power_init);
/////////////////thermal governor////////////////////////
-struct trip_ctrl_alg { - int switch_on_id; - int switch_on_temp; - int desired_id; - int desired_temp; - int k_p; - int k_i; - s64 integral; -}; - -struct _thermal_zone { - struct thermal_zone_device *tz; - bool single_cooling_dev; - int num_cooling; - struct list_head node; - struct list_head cooling_list; - struct trip_ctrl_alg trip_ctrl_alg; -}; - -struct _cooling_dev { - struct thermal_cooling_device *cdev; - struct list_head node; - int max_single_state; /* max state (0-highest) which means freqency */ - int min_sum_single_idle; /* minimum sum of idle calculated for 'single' - zone. Other zones should not go bellow this - value for the same 'state' (frequency) */ - bool cpu_dev; - unsigned long int cpus[0]; -}; - -struct _cooling_instance { - struct list_head node; - struct _cooling_dev *cooling; - u32 weight; -};
-static LIST_HEAD(tz_list); -static LIST_HEAD(cdev_list); -static DEFINE_MUTEX(tz_list_lock); -static DEFINE_MUTEX(cdev_list_lock);
static int cdev_get_min_power(struct thermal_cooling_device *cdev, u64 *min_power) @@ -669,14 +700,22 @@ static int share_power_budget(struct _thermal_zone *zone, u32 power_budget) if (!power) return -ENOMEM;
- /* estimate cooling dev's power and total power */ + /* The calculation two-loops split is needed due to taking + zone->lock only for protecting 'weights'. */ list_for_each_entry(inst, &zone->cooling_list, node) { get_requested_power(inst, zone, &power[i]); + i++; + }
+ mutex_lock(&zone->lock); + /* estimate cooling dev's power and total power */ + i = 0; + list_for_each_entry(inst, &zone->cooling_list, node) { power[i] = (inst->weight * power[i]) >> 10; sum_power += power[i]; i++; } + mutex_unlock(&zone->lock);
if (sum_power <= 0) goto cleanup; @@ -856,6 +895,7 @@ static int sched_power_gov_bind(struct thermal_zone_device *tz) return -ENOMEM;
INIT_LIST_HEAD(&zone->cooling_list); + mutex_init(&zone->lock);
mutex_lock(&cdev_list_lock); list_for_each_entry(inst, &tz->thermal_instances, tz_node) { diff --git a/kernel/sched/power.h b/kernel/sched/power.h index 340b3924ecd7..c7e92295cab1 100644 --- a/kernel/sched/power.h +++ b/kernel/sched/power.h @@ -66,5 +66,41 @@ struct cpu_power { struct cpumask *cluster_mask; };
+struct trip_ctrl_alg { + int switch_on_id; + int switch_on_temp; + int desired_id; + int desired_temp; + int k_p; + int k_i; + s64 integral; +}; + +struct _thermal_zone { + struct thermal_zone_device *tz; + bool single_cooling_dev; + int num_cooling; + struct list_head node; + struct list_head cooling_list; + struct trip_ctrl_alg trip_ctrl_alg; + struct mutex lock; +}; + +struct _cooling_dev { + struct thermal_cooling_device *cdev; + struct list_head node; + int max_single_state; /* max state (0-highest) which means freqency */ + int min_sum_single_idle; /* minimum sum of idle calculated for 'single' + zone. Other zones should not go bellow this + value for the same 'state' (frequency) */ + bool cpu_dev; + unsigned long int cpus[0]; +}; + +struct _cooling_instance { + struct list_head node; + struct _cooling_dev *cooling; + u32 weight; +};
#endif
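The request hand-off used by sched_power_work() in this patch, where req.time == 0 marks an empty slot, can be sketched as below. This is a single-threaded userspace illustration: locking (raw_spin_lock in the patch) is left out on purpose, and struct cpu_slot and pop_request() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct power_request {
	int cpu;
	int weight;
	uint64_t time;	/* 0 means "no pending request" */
};

struct cpu_slot {
	struct power_request req;
	int weight;	/* last weight taken over from a request */
};

/*
 * Take the pending request out of the slot, if there is one.
 * Returns 1 and fills *out when a request was pending, 0 otherwise.
 */
static int pop_request(struct cpu_slot *slot, struct power_request *out)
{
	assert(slot != NULL && out != NULL);

	if (!slot->req.time)
		return 0;

	*out = slot->req;
	slot->req.time = 0;		/* mark the slot as consumed */
	slot->weight = out->weight;

	return 1;
}
```

Clearing the timestamp after the copy means a CPU without a new request is skipped on the next pass of the worker instead of being processed again.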
Setting sum_power to 1 when it is 0 allows the calculation to finish and the devices to be throttled.
Signed-off-by: Lukasz Luba l.luba@partner.samsung.com
---
 kernel/sched/power.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/power.c b/kernel/sched/power.c
index 6c72a9bee8ac..5639e6c2825d 100644
--- a/kernel/sched/power.c
+++ b/kernel/sched/power.c
@@ -707,6 +707,7 @@ static int share_power_budget(struct _thermal_zone *zone, u32 power_budget)
 		i++;
 	}
 
+	/* protect 'weights' changes in the background */
 	mutex_lock(&zone->lock);
 	/* estimate cooling dev's power and total power */
 	i = 0;
@@ -717,8 +718,8 @@ static int share_power_budget(struct _thermal_zone *zone, u32 power_budget)
 	}
 	mutex_unlock(&zone->lock);
 
-	if (sum_power <= 0)
-		goto cleanup;
+	if (sum_power == 0)
+		sum_power = 1;
 
 	/* split power budget to cooling devices */
 	i = 0;
@@ -727,6 +728,8 @@ static int share_power_budget(struct _thermal_zone *zone, u32 power_budget)
 		i++;
 	}
 
+	/* clamp max possible power for devices and re-share the rest */
+
 	/* set the new state for cooling device based on its granted power */
 	i = 0;
 	list_for_each_entry(inst, &zone->cooling_list, node) {
@@ -734,7 +737,6 @@ static int share_power_budget(struct _thermal_zone *zone, u32 power_budget)
 		i++;
 	}
 
-cleanup:
 	kfree(power);
 
 	return 0;
Raise the 'SoC' trip point thresholds so that the sched_power algorithm is triggered later.
Signed-off-by: Lukasz Luba l.luba@partner.samsung.com
---
 arch/arm64/boot/dts/exynos/exynos5433-tmu.dtsi | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/boot/dts/exynos/exynos5433-tmu.dtsi b/arch/arm64/boot/dts/exynos/exynos5433-tmu.dtsi
index 9cae1006af3b..c17b1ad4493e 100644
--- a/arch/arm64/boot/dts/exynos/exynos5433-tmu.dtsi
+++ b/arch/arm64/boot/dts/exynos/exynos5433-tmu.dtsi
@@ -305,13 +305,13 @@ thermal-zones {
 
 		trips {
 			threshold: threshold {
-				temperature = <55000>;
+				temperature = <70000>;
 				hysteresis = <1000>;
 				type = "passive";
 				ctrl-alg = <1>;
 			};
 			target: target {
-				temperature = <70000>;
+				temperature = <80000>;
 				hysteresis = <1000>;
 				type = "passive";
 				ctrl-alg = <2>;
Fix the values of the sched_power flags.
Signed-off-by: Lukasz Luba l.luba@partner.samsung.com --- kernel/sched/power.c | 28 +++++++++++++++++----------- kernel/sched/sched.h | 6 +++--- 2 files changed, 20 insertions(+), 14 deletions(-)
diff --git a/kernel/sched/power.c b/kernel/sched/power.c index 5639e6c2825d..fded1ba78b5e 100644 --- a/kernel/sched/power.c +++ b/kernel/sched/power.c @@ -131,15 +131,15 @@ EXPORT_SYMBOL_GPL(sched_power_cpu_reinit_weight); //////////////////////////////////////////////////////////////
-static bool should_update_next_weight(u64 time, int flags) +static bool should_process_next_request(u64 time, int flags) { if (flags & SCHED_POWER_FORCE_UPDATE_RT) return 1;
- if (time >= sched_clock() + MINIMUM_UPDATE_TIME) - return 1; + /* if (time >= sched_clock() + MINIMUM_UPDATE_TIME) */ + /* return 1; */
- return 0; + return 1; }
static int play_idle_setup; @@ -240,12 +240,14 @@ static int sched_power_reweight_cluster(int cpu, struct cpumask *cpus, mutex_unlock(&tz_list_lock);
if (system_zone) { + pr_info("zone found\n"); mutex_lock(&system_zone->lock); cluster_inst->weight = weight; mutex_unlock(&system_zone->lock); thermal_notify_framework(system_zone->tz, system_zone->trip_ctrl_alg.desired_id); } else { + pr_info("no such zone\n"); return -ENODEV; }
@@ -277,6 +279,7 @@ static int sched_power_handle_request(struct power_request *req)
cpower = (&per_cpu(cpu_power, req->cpu));
+ pr_info("req->flags=%d\n", req->flags); switch (req->flags) { case SCHED_POWER_CPU_WEIGHT: break; @@ -338,7 +341,7 @@ static void sched_power_irq_work(struct irq_work *irq_work) }
static void -sched_power_update_cpu_weight(struct cpu_power *cpower, int cpu, int weight, +sched_power_update_weight(struct cpu_power *cpower, int cpu, int weight, int flags, int time) { raw_spin_lock(&cpower->update_lock); @@ -361,15 +364,17 @@ static void sched_power_update(struct update_sched_power *update, int cpu, return;
/* Filter to frequent changes or not needed*/ - if (!should_update_next_weight(time, flags)) + if (!should_process_next_request(time, flags)) return;
+ flags &= 0x3; + switch (flags) { case SCHED_POWER_CPU_WEIGHT: - sched_power_update_cpu_weight(cpower, cpu, weight, flags, time); + sched_power_update_weight(cpower, cpu, weight, flags, time); break; case SCHED_POWER_CLUSTER_WEIGHT: - + sched_power_update_weight(cpower, cpu, weight, flags, time); break; default: return; @@ -382,7 +387,8 @@ static void sched_power_update(struct update_sched_power *update, int cpu, irq_work_queue(&sp->irq_work); }
- if (!play_idle_setup && cpu == 4) { + if (!play_idle_setup) { + pr_info("play idle demo\n"); play_idle_setup = 1; idle_inject_set_duration(cpower->ii_dev, 10, 4); idle_inject_start(cpower->ii_dev); @@ -670,8 +676,8 @@ static int set_power(struct _cooling_instance *inst, struct _thermal_zone *zone, cdev = inst->cooling->cdev; ret = cdev->ops->power2state(cdev, tz, power, &state);
- pr_info("set_power=%u, state=%lu, temp=%d\n", power, state, - tz->temperature); + pr_info("inst_weight=%u, set_power=%u, state=%lu, temp=%d\n", + inst->weight, power, state, tz->temperature); if (!ret) { ret = cooling_dev_set_state(zone, inst->cooling, state); } diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index 0b8f8505d3bc..838bb7d318be 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -2245,9 +2245,9 @@ unsigned long scale_irq_capacity(unsigned long util, unsigned long irq, unsigned } #endif
-#define SCHED_POWER_FORCE_UPDATE_RT 0x01 -#define SCHED_POWER_CPU_WEIGHT 0x2 -#define SCHED_POWER_CLUSTER_WEIGHT 0x4 +#define SCHED_POWER_CPU_WEIGHT 0x1 +#define SCHED_POWER_CLUSTER_WEIGHT 0x2 +#define SCHED_POWER_FORCE_UPDATE_RT 0x04
#ifdef CONFIG_THERMAL struct update_sched_power {
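Why the flag values had to change can be seen from a small userspace sketch. The three defines are copied from the patch; REQUEST_TYPE_MASK is a hypothetical name for the literal 0x3 used in sched_power_update(), and both helpers are illustrations only.

```c
#include <assert.h>

#define SCHED_POWER_CPU_WEIGHT		0x1
#define SCHED_POWER_CLUSTER_WEIGHT	0x2
#define SCHED_POWER_FORCE_UPDATE_RT	0x04

/* Hypothetical name for the 0x3 mask applied before the switch. */
#define REQUEST_TYPE_MASK		0x3

/* Request type lives in the two low bits. */
static int request_type(int flags)
{
	return flags & REQUEST_TYPE_MASK;
}

/* The force bit sits above the type bits, so masking does not lose it. */
static int is_forced(int flags)
{
	int forced = !!(flags & SCHED_POWER_FORCE_UPDATE_RT);

	assert(forced == 0 || forced == 1);
	return forced;
}
```

With the old values (cluster weight 0x4, force update 0x01), masking with 0x3 would have dropped the cluster bit and kept the force bit, so the switch could never match a cluster request.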
Improve the algorithm that shares the power budget among cooling devices.
Signed-off-by: Lukasz Luba l.luba@partner.samsung.com --- kernel/sched/power.c | 69 +++++++++++++++++++++++++++++++++++++------- kernel/sched/power.h | 8 +++++ 2 files changed, 67 insertions(+), 10 deletions(-)
diff --git a/kernel/sched/power.c b/kernel/sched/power.c
index fded1ba78b5e..b0e103695fa2 100644
--- a/kernel/sched/power.c
+++ b/kernel/sched/power.c
@@ -665,7 +665,7 @@ static int cooling_dev_set_state(struct _thermal_zone *zone,
 	return 0;
 }

-static int set_power(struct _cooling_instance *inst, struct _thermal_zone *zone,
+static int _set_power(struct _cooling_instance *inst, struct _thermal_zone *zone,
 		     u32 power)
 {
 	int ret;
@@ -685,7 +685,7 @@ static int set_power(struct _cooling_instance *inst, struct _thermal_zone *zone,
 	return ret;
 }

-static int get_requested_power(struct _cooling_instance *inst,
+static int _get_requested_power(struct _cooling_instance *inst,
 			       struct _thermal_zone *zone, u32 *power)
 {
 	struct thermal_cooling_device *cdev;
@@ -695,21 +695,40 @@ static int get_requested_power(struct _cooling_instance *inst,
 	return cdev->ops->get_requested_power(cdev, tz, power);
 }

+static int _get_max_power(struct _cooling_instance *inst,
+			  struct _thermal_zone *zone, u32 *power)
+{
+	struct thermal_cooling_device *cdev;
+	struct thermal_zone_device *tz = zone->tz;
+
+	cdev = inst->cooling->cdev;
+	return cdev->ops->state2power(cdev, tz, MAX_POWER_STATE_ID, power);
+}
+
 static int share_power_budget(struct _thermal_zone *zone, u32 power_budget)
 {
 	struct _cooling_instance *inst;
-	u32 *power;
+	struct power_info *power, *p;
 	u64 sum_power = 0;
+	u64 extra_power = 0;
+	u64 wish_power = 0;
+	u64 total_weight = 0;
+	u64 left_power_budget = power_budget;
 	int i = 0;
+	int ret;

-	power = kzalloc(sizeof(u32) * zone->num_cooling, GFP_KERNEL);
+	power = kzalloc(sizeof(struct power_info) * zone->num_cooling,
+			GFP_KERNEL);
 	if (!power)
 		return -ENOMEM;

 	/* The calculation two-loops split is needed due to taking zone->lock
 	   only for protecting 'weights'. */
 	list_for_each_entry(inst, &zone->cooling_list, node) {
-		get_requested_power(inst, zone, &power[i]);
+		p = &power[i];
+		ret = _get_requested_power(inst, zone, &p->requested);
+		ret = _get_max_power(inst, zone, &p->max_possible);
+		/* pr_info("req=%u\n", p->requested); */
 		i++;
 	}

@@ -718,8 +737,11 @@ static int share_power_budget(struct _thermal_zone *zone, u32 power_budget)
 	/* estimate cooling dev's power and total power */
 	i = 0;
 	list_for_each_entry(inst, &zone->cooling_list, node) {
-		power[i] = (inst->weight * power[i]) >> 10;
-		sum_power += power[i];
+		p = &power[i];
+		p->requested = (inst->weight * p->requested) >> 10;
+		sum_power += p->requested;
+		total_weight += inst->weight;
+		/* pr_info("req=%u\n", p->requested); */
 		i++;
 	}
 	mutex_unlock(&zone->lock);
@@ -730,16 +752,43 @@ static int share_power_budget(struct _thermal_zone *zone, u32 power_budget)
 	/* split power budget to cooling devices */
 	i = 0;
 	list_for_each_entry(inst, &zone->cooling_list, node) {
-		power[i] = (power[i] * power_budget) / sum_power;
+		p = &power[i];
+		p->requested = (p->requested * power_budget) / sum_power;
+		/* pr_info("req=%u max=%u\n", p->requested, p->max_possible); */
+		if (p->requested > p->max_possible) {
+			extra_power += p->requested - p->max_possible;
+			p->requested = p->max_possible;
+		}
+		wish_power += p->max_possible - p->requested;
+		left_power_budget -= p->requested;
 		i++;
 	}

-	/* clamp max possible power for devices and re-share the rest */
+	/* re-share the rest extra power to devices according to their wish and
+	 * weight*/
+	if ((extra_power || left_power_budget) && wish_power) {
+		u32 headroom;
+		extra_power = max(left_power_budget, extra_power);
+		extra_power = min(wish_power, extra_power);
+
+		i = 0;
+		list_for_each_entry(inst, &zone->cooling_list, node) {
+			p = &power[i];
+			headroom = p->max_possible - p->requested;
+			/* headroom *= inst->weight; */
+			/* headroom /= total_weight; */
+			p->requested += (headroom * extra_power) / wish_power;
+			/* pr_info("req=%u\n", p->requested); */
+			i++;
+		}
+	}
+

 	/* set the new state for cooling device based on its granted power */
 	i = 0;
 	list_for_each_entry(inst, &zone->cooling_list, node) {
-		set_power(inst, zone, power[i]);
+		p = &power[i];
+		_set_power(inst, zone, p->requested);
 		i++;
 	}

diff --git a/kernel/sched/power.h b/kernel/sched/power.h
index c7e92295cab1..1a234a3ef924 100644
--- a/kernel/sched/power.h
+++ b/kernel/sched/power.h
@@ -21,6 +21,9 @@
 #define DEFAULT_WEIGHT MAX_WEIGHT
 #define DEFAULT_K_P 500
 #define DEFAULT_K_I 50
+/* MAX_POWER_STATE_ID for current implementation in thermal framework
+ * 'freq table' (DESC ordered) */
+#define MAX_POWER_STATE_ID 0

 struct power_budget {
 	s64 temp;
@@ -44,6 +47,11 @@ struct power_request {
 	int flags;
 };

+struct power_info {
+	u32 max_possible;
+	u32 requested;
+};
+
 struct _cooling_dev;

 struct cpu_power {
Power that cannot be applied to a cooling device is collected so that it can be reused in the next phase. Errors from power2state() and set_cur_state() are now propagated to the caller.
Signed-off-by: Lukasz Luba <l.luba@partner.samsung.com>
---
 kernel/sched/power.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)
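Note: the bookkeeping this patch adds to the last loop of share_power_budget() amounts to "apply each grant, and if a device refuses it, zero the grant and remember the power for later". A minimal userspace sketch, assuming a hypothetical callback `apply_fn` in place of _set_power() and a made-up `reject_odd_devices()` that exists only to exercise the error path:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback standing in for _set_power(): returns 0 on success. */
typedef int (*apply_fn)(size_t dev, uint32_t power);

/* Demo callback (an assumption for illustration): odd-numbered devices fail. */
static int reject_odd_devices(size_t dev, uint32_t power)
{
	(void)power;
	return (dev & 1) ? -1 : 0;
}

/* Apply every grant; return the power that could not be applied. */
static uint64_t apply_grants(uint32_t *grant, size_t n, apply_fn apply)
{
	uint64_t unused = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (apply(i, grant[i])) {
			/* Device refused: keep the power for a later phase. */
			unused += grant[i];
			grant[i] = 0;
		}
	}
	return unused;
}
```

For grants of 100, 200 and 300 with `reject_odd_devices`, only the middle device fails, so 200 units are returned as unused and its grant is reset to 0.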
diff --git a/kernel/sched/power.c b/kernel/sched/power.c
index b0e103695fa2..a630f4a532ef 100644
--- a/kernel/sched/power.c
+++ b/kernel/sched/power.c
@@ -642,6 +642,7 @@ static int cooling_dev_set_state(struct _thermal_zone *zone,
 {
 	struct thermal_cooling_device *cdev = cooling->cdev;
 	unsigned long curr_state;
+	int ret = 0;

 	cdev->ops->get_cur_state(cdev, &curr_state);

@@ -658,11 +659,13 @@ static int cooling_dev_set_state(struct _thermal_zone *zone,
 	mutex_lock(&cdev->lock);
 	if (!cdev->ops->set_cur_state(cdev, target))
 		thermal_cooling_device_stats_update(cdev, target);
+	else
+		ret = -EINVAL;

 	cdev->updated = true;
 	mutex_unlock(&cdev->lock);

-	return 0;
+	return ret;
 }

 static int _set_power(struct _cooling_instance *inst, struct _thermal_zone *zone,
@@ -675,6 +678,8 @@ static int _set_power(struct _cooling_instance *inst, struct _thermal_zone *zone

 	cdev = inst->cooling->cdev;
 	ret = cdev->ops->power2state(cdev, tz, power, &state);
+	if (ret)
+		return ret;

 	pr_info("inst_weight=%u, set_power=%u, state=%lu, temp=%d\n",
 		inst->weight, power, state, tz->temperature);
@@ -788,7 +793,13 @@ static int share_power_budget(struct _thermal_zone *zone, u32 power_budget)
 	i = 0;
 	list_for_each_entry(inst, &zone->cooling_list, node) {
 		p = &power[i];
-		_set_power(inst, zone, p->requested);
+		ret = _set_power(inst, zone, p->requested);
+		if (ret) {
+			/* TODO: collect power which cannot be used for other
+			 * devices */
+			extra_power += p->requested;
+			p->requested = 0;
+		}
 		i++;
 	}
Add a basic check for a possible state change: the 'target' state is compared with the maximum state allowed for a single cooling device.
Signed-off-by: Lukasz Luba <l.luba@partner.samsung.com>
---
 kernel/sched/power.c | 51 +++++++++++++++++++++++++++-----------------
 1 file changed, 31 insertions(+), 20 deletions(-)
diff --git a/kernel/sched/power.c b/kernel/sched/power.c
index a630f4a532ef..cf56484f95ca 100644
--- a/kernel/sched/power.c
+++ b/kernel/sched/power.c
@@ -240,14 +240,14 @@ static int sched_power_reweight_cluster(int cpu, struct cpumask *cpus,
 	mutex_unlock(&tz_list_lock);

 	if (system_zone) {
-		pr_info("zone found\n");
+		/* pr_info("zone found\n"); */
 		mutex_lock(&system_zone->lock);
 		cluster_inst->weight = weight;
 		mutex_unlock(&system_zone->lock);
 		thermal_notify_framework(system_zone->tz,
 					 system_zone->trip_ctrl_alg.desired_id);
 	} else {
-		pr_info("no such zone\n");
+		/* pr_info("no such zone\n"); */
 		return -ENODEV;
 	}

@@ -279,7 +279,7 @@ static int sched_power_handle_request(struct power_request *req)

 	cpower = (&per_cpu(cpu_power, req->cpu));

-	pr_info("req->flags=%d\n", req->flags);
+	/* pr_info("req->flags=%d\n", req->flags); */
 	switch (req->flags) {
 	case SCHED_POWER_CPU_WEIGHT:
 		break;
@@ -317,8 +317,7 @@ static void sched_power_work(struct kthread_work *work)
 		cpower->weight = req.weight;
 		raw_spin_unlock(&cpower->update_lock);

-		pr_info("cpower req poped\n");
-		/* thermal_cpu_cdev_set_weight(req.cpu, req.weight); */
+		/* pr_info("cpower req poped\n"); */
 		ret = sched_power_handle_request(&req);
 		if (!ret)
 			need_update = true;
@@ -638,23 +637,32 @@ static int throttle_single_cdev(struct _thermal_zone *zone)

 static int cooling_dev_set_state(struct _thermal_zone *zone,
 				 struct _cooling_dev *cooling,
-				 unsigned long target)
+				 unsigned long target, unsigned long *set_state)
 {
 	struct thermal_cooling_device *cdev = cooling->cdev;
 	unsigned long curr_state;
 	int ret = 0;

-	cdev->ops->get_cur_state(cdev, &curr_state);
+	ret = cdev->ops->get_cur_state(cdev, &curr_state);
+
+	if (ret)
+		return ret;

 	if (curr_state == target)
 		return 0;

-	/* check if we can it go with higher freq for zone with a few devices*/
-	if (!zone->single_cooling_dev && cooling->max_single_state > target)
-		return -EINVAL;
-
-	if (zone->single_cooling_dev && cooling->max_single_state > target)
+	/* check if we can go with higher freq for zone with a few devices*/
+	if (zone->single_cooling_dev) {
+		/* we treat single-cooling-dev-zone as a guard for max temp */
 		cooling->max_single_state = target;
+	} else {
+		if (cooling->max_single_state < target) {
+			target = cooling->max_single_state;
+			ret = -EAGAIN;
+		}
+	}
+
+	*set_state = target;

 	mutex_lock(&cdev->lock);
 	if (!cdev->ops->set_cur_state(cdev, target))
@@ -672,19 +680,22 @@ static int _set_power(struct _cooling_instance *inst, struct _thermal_zone *zone
 		     u32 power)
 {
 	int ret;
-	unsigned long state;
+	unsigned long state = 0, target;
 	struct thermal_cooling_device *cdev;
 	struct thermal_zone_device *tz = zone->tz;

 	cdev = inst->cooling->cdev;
-	ret = cdev->ops->power2state(cdev, tz, power, &state);
-	if (ret)
-		return ret;
-
-	pr_info("inst_weight=%u, set_power=%u, state=%lu, temp=%d\n",
-		inst->weight, power, state, tz->temperature);
+	ret = cdev->ops->power2state(cdev, tz, power, &target);
 	if (!ret) {
-		ret = cooling_dev_set_state(zone, inst->cooling, state);
+		ret = cooling_dev_set_state(zone, inst->cooling, target,
+					    &state);
+		if (!ret)
+			pr_info("inst_weight=%u, set_power=%u, target=%lu, state=%lu, temp=%d\n",
+				inst->weight, power, target, state,
+				tz->temperature);
+		else
+			pr_info("_set_power: ret=%d\n, target=%lu, state=%lu\n",
+				ret, target, state);
 	}

 	return ret;
Signed-off-by: Lukasz Luba <l.luba@partner.samsung.com>
---
 kernel/sched/power.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
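Note: cooling states are ordered so that a lower state means a higher frequency, so "do not exceed the limit" translates into "do not go below the limiting state". The corrected comparison can be sketched as a small helper. The name `clamp_target_state()` is hypothetical and is not part of the patch; the -EAGAIN return mirrors what cooling_dev_set_state() reports when it has to adjust the target.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical helper: 'limit' plays the role of cooling->max_single_state.
 * A target below the limit would mean a higher frequency than allowed,
 * so it is raised to the limit and the caller is told to try again.
 */
static int clamp_target_state(unsigned long limit, unsigned long *target)
{
	if (limit > *target) {
		*target = limit;
		return -EAGAIN;
	}
	return 0;
}
```

With a limit of 3, a requested target of 1 is raised to 3 and the helper returns -EAGAIN, while a target of 5 (a lower frequency) passes through unchanged.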
diff --git a/kernel/sched/power.c b/kernel/sched/power.c
index cf56484f95ca..923f689860d1 100644
--- a/kernel/sched/power.c
+++ b/kernel/sched/power.c
@@ -656,7 +656,10 @@ static int cooling_dev_set_state(struct _thermal_zone *zone,
 		/* we treat single-cooling-dev-zone as a guard for max temp */
 		cooling->max_single_state = target;
 	} else {
-		if (cooling->max_single_state < target) {
+		/* Lower 'target' state means higher frequency. Prevent going to
+		 * higher freq if there was a limit due to temperature value
+		 * from sensor closer to the device. */
+		if (cooling->max_single_state > target) {
 			target = cooling->max_single_state;
 			ret = -EAGAIN;
 		}
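When sharing the power budget fails, recalculate the split and try again, up to MAX_REBALANCE_TRIES_POWER_BUDGET times. Also replace the pr_info() debug output with tracepoints.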
Signed-off-by: Lukasz Luba <l.luba@partner.samsung.com>
---
 include/trace/events/sched_power.h | 60 ++++++++++++++++++++++++++++++
 kernel/sched/power.c               | 51 +++++++++++++++++++------
 kernel/sched/power.h               |  1 +
 3 files changed, 100 insertions(+), 12 deletions(-)
 create mode 100644 include/trace/events/sched_power.h
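Note: the retry idea (share, try to apply, lower the ceiling of any device that refused, share again) can be sketched as a bounded loop in userspace C. Everything here is an assumption made for illustration: `struct dev`, its `accepts` field, and `share_with_retry()` are hypothetical, the split is a plain proportional share by ceiling, and `MAX_TRIES` only mirrors MAX_REBALANCE_TRIES_POWER_BUDGET.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TRIES 100	/* mirrors MAX_REBALANCE_TRIES_POWER_BUDGET */

struct dev {
	uint32_t max_possible;	/* current estimate of the device's ceiling */
	uint32_t accepts;	/* what the device really takes right now */
	uint32_t grant;		/* power granted in the last pass */
};

/* Returns the number of passes needed, or -1 if MAX_TRIES was exhausted. */
static int share_with_retry(struct dev *d, size_t n, uint32_t budget)
{
	int pass;

	for (pass = 1; pass <= MAX_TRIES; pass++) {
		uint64_t sum = 0;	/* per-pass state starts from zero */
		int failed = 0;
		size_t i;

		for (i = 0; i < n; i++)
			sum += d[i].max_possible;

		for (i = 0; i < n; i++) {
			uint64_t share = sum ?
				(uint64_t)budget * d[i].max_possible / sum : 0;

			if (share > d[i].max_possible)
				share = d[i].max_possible;
			d[i].grant = (uint32_t)share;

			if (d[i].grant > d[i].accepts) {
				/* Refused: remember the real ceiling, retry. */
				d[i].max_possible = d[i].accepts;
				failed++;
			}
		}
		if (!failed)
			return pass;
	}
	return -1;
}
```

Keeping the index and the running sums local to one pass means every retry starts from a clean state, and the loop ends as soon as a pass completes without a refusal. For two devices with ceilings of 1000 each, where the first really accepts only 400, a budget of 1200 settles on the second pass with grants of 342 and 857.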
diff --git a/include/trace/events/sched_power.h b/include/trace/events/sched_power.h
new file mode 100644
index 000000000000..ab29da831335
--- /dev/null
+++ b/include/trace/events/sched_power.h
@@ -0,0 +1,60 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM sched_power
+
+#if !defined(_TRACE_SCHED_POWER_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_SCHED_POWER_H
+
+#include <linux/tracepoint.h>
+
+
+TRACE_EVENT(sched_power_set_power,
+	TP_PROTO(int id, int cid, u32 weight, u32 power, unsigned long target,
+		 unsigned long state, int ret),
+	TP_ARGS(id, cid, weight, power, target, state, ret),
+	TP_STRUCT__entry(
+		__field(int, id )
+		__field(int, cid )
+		__field(u32, weight )
+		__field(u32, power )
+		__field(unsigned long, target )
+		__field(unsigned long, state )
+		__field(int, ret )
+	),
+	TP_fast_assign(
+		__entry->id = id;
+		__entry->cid = cid;
+		__entry->weight = weight;
+		__entry->power = power;
+		__entry->target = target;
+		__entry->state = state;
+		__entry->ret = ret;
+	),
+
+	TP_printk("zone_id=%d cool_id=%d weight=%u power=%u target=%lu state=%lu ret=%d",
+		  __entry->id, __entry->cid, __entry->weight,
+		  __entry->power, __entry->target, __entry->state, __entry->ret)
+);
+
+TRACE_EVENT(sched_power_calc_budget,
+	TP_PROTO(int id, int temp, s64 power),
+	TP_ARGS(id, temp, power),
+	TP_STRUCT__entry(
+		__field(int, id)
+		__field(int, temp)
+		__field(s32, power)
+	),
+	TP_fast_assign(
+		__entry->id = id;
+		__entry->temp = temp;
+		__entry->power = power;
+	),
+
+	TP_printk("zone_id=%d temp=%d power=%d",
+		  __entry->id, __entry->temp, __entry->power)
+);
+
+#endif /* _TRACE_SCHED_POWER_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/kernel/sched/power.c b/kernel/sched/power.c
index 923f689860d1..daee84b00b37 100644
--- a/kernel/sched/power.c
+++ b/kernel/sched/power.c
@@ -12,6 +12,9 @@
 #include <linux/thermal_core.h>
 #include <linux/idle_inject.h>

+#define CREATE_TRACE_POINTS
+#include <trace/events/sched_power.h>
+
 #include "power.h"

 #define THERMAL_REQUEST_KFIFO_SIZE (64 * sizeof(struct power_request))
@@ -598,6 +601,9 @@ static u32 calc_power_budget(struct _thermal_zone *zone)
 	decay = zone->trip_ctrl_alg.integral >> 3;
 	zone->trip_ctrl_alg.integral -= decay;

+	trace_sched_power_calc_budget(zone->tz->id, tz->temperature,
+				      power_budget);
+
 	return power_budget;
 }

@@ -680,10 +686,10 @@ static int cooling_dev_set_state(struct _thermal_zone *zone,
 }

 static int _set_power(struct _cooling_instance *inst, struct _thermal_zone *zone,
-		     u32 power)
+		     u32 power, unsigned long *state)
 {
 	int ret;
-	unsigned long state = 0, target;
+	unsigned long target;
 	struct thermal_cooling_device *cdev;
 	struct thermal_zone_device *tz = zone->tz;

@@ -691,14 +697,11 @@ static int _set_power(struct _cooling_instance *inst, struct _thermal_zone *zone
 	ret = cdev->ops->power2state(cdev, tz, power, &target);
 	if (!ret) {
 		ret = cooling_dev_set_state(zone, inst->cooling, target,
-					    &state);
-		if (!ret)
-			pr_info("inst_weight=%u, set_power=%u, target=%lu, state=%lu, temp=%d\n",
-				inst->weight, power, target, state,
-				tz->temperature);
-		else
-			pr_info("_set_power: ret=%d\n, target=%lu, state=%lu\n",
-				ret, target, state);
+					    state);
+
+		trace_sched_power_set_power(zone->tz->id, cdev->id,
+					    inst->weight, power, target, *state,
+					    ret);
 	}

 	return ret;
@@ -724,6 +727,17 @@ static int _get_max_power(struct _cooling_instance *inst,
 	return cdev->ops->state2power(cdev, tz, MAX_POWER_STATE_ID, power);
 }

+static int _get_power_for_state(struct _cooling_instance *inst,
+				struct _thermal_zone *zone, u32 *power,
+				unsigned long state)
+{
+	struct thermal_cooling_device *cdev;
+	struct thermal_zone_device *tz = zone->tz;
+
+	cdev = inst->cooling->cdev;
+	return cdev->ops->state2power(cdev, tz, state, power);
+}
+
 static int share_power_budget(struct _thermal_zone *zone, u32 power_budget)
 {
 	struct _cooling_instance *inst;
@@ -735,18 +749,24 @@ static int share_power_budget(struct _thermal_zone *zone, u32 power_budget)
 	u64 left_power_budget = power_budget;
 	int i = 0;
 	int ret;
+	unsigned long state = 0;
+	int rebalance = 0;

 	power = kzalloc(sizeof(struct power_info) * zone->num_cooling,
 			GFP_KERNEL);
 	if (!power)
 		return -ENOMEM;

+try_balance:
+	if (rebalance > MAX_REBALANCE_TRIES_POWER_BUDGET)
+		goto balanced;
 	/* The calculation two-loops split is needed due to taking zone->lock
 	   only for protecting 'weights'. */
 	list_for_each_entry(inst, &zone->cooling_list, node) {
 		p = &power[i];
 		ret = _get_requested_power(inst, zone, &p->requested);
-		ret = _get_max_power(inst, zone, &p->max_possible);
+		if (!rebalance)
+			ret = _get_max_power(inst, zone, &p->max_possible);
 		/* pr_info("req=%u\n", p->requested); */
 		i++;
 	}
@@ -807,16 +827,23 @@ static int share_power_budget(struct _thermal_zone *zone, u32 power_budget)
 	i = 0;
 	list_for_each_entry(inst, &zone->cooling_list, node) {
 		p = &power[i];
-		ret = _set_power(inst, zone, p->requested);
+		ret = _set_power(inst, zone, p->requested, &state);
 		if (ret) {
 			/* TODO: collect power which cannot be used for other
 			 * devices */
+			rebalance++;
 			extra_power += p->requested;
 			p->requested = 0;
+			ret = _get_power_for_state(inst, zone, &p->max_possible,
+						   state);
 		}
 		i++;
 	}

+	if (rebalance)
+		goto try_balance;
+
+balanced:
 	kfree(power);

 	return 0;
diff --git a/kernel/sched/power.h b/kernel/sched/power.h
index 1a234a3ef924..3f30dc2b9374 100644
--- a/kernel/sched/power.h
+++ b/kernel/sched/power.h
@@ -24,6 +24,7 @@
 /* MAX_POWER_STATE_ID for current implementation in thermal framework
  * 'freq table' (DESC ordered) */
 #define MAX_POWER_STATE_ID 0
+#define MAX_REBALANCE_TRIES_POWER_BUDGET 100

 struct power_budget {
 	s64 temp;
Skip changing the state when the zone has a single cooling device and the target state is lower than the current one; only update the maximum allowed state.
Signed-off-by: Lukasz Luba <l.luba@partner.samsung.com>
---
 kernel/sched/power.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)
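Note: the intent for a single-cooling-device zone is that it acts as a guard. It always records the new limit, but it only touches the device when the target would throttle further (a higher state), leaving any increase in frequency to the zone that splits power for the whole chip. A rough userspace sketch, assuming a hypothetical `struct cooling` with a cached `cur_state` and a hypothetical `guard_zone_update()` helper, neither of which exists in the patch:

```c
#include <assert.h>

/* Hypothetical, simplified view of one cooling device. */
struct cooling {
	unsigned long max_single_state;	/* limit recorded by the guard zone */
	unsigned long cur_state;	/* state currently set on the device */
};

/* Returns 1 if the device state was changed, 0 if only the limit moved. */
static int guard_zone_update(struct cooling *c, unsigned long target)
{
	c->max_single_state = target;

	/* A lower state means a higher frequency: leave that to the main zone. */
	if (target < c->cur_state)
		return 0;

	c->cur_state = target;
	return 1;
}
```

Starting from state 3, a target of 1 only lowers the recorded limit to 1 and leaves the device at state 3, while a target of 5 moves both the limit and the device state to 5.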
diff --git a/kernel/sched/power.c b/kernel/sched/power.c
index daee84b00b37..a4d31ec5f638 100644
--- a/kernel/sched/power.c
+++ b/kernel/sched/power.c
@@ -1,8 +1,12 @@
 // SPDX-License-Identifier: GPL-2.0
 /*
- * Scheduler CPU power
+ * Scheduler CPU power allocation.
+ * It is based heavily on IPA (PID approach), so related copyright attached.
+ * It also uses design from schedutil.
  *
- * Copyright (C) 2018 Samsung
+ * Copyright (C) 2014 ARM Ltd.
+ * Copyright (C) 2018 Samsung Electronics co., Ltd
+ * Author: Lukasz Luba <l.luba@partner.samsung.com>
  */

 #define pr_fmt(fmt) "SCHED_POWER: " fmt
@@ -659,8 +663,11 @@ static int cooling_dev_set_state(struct _thermal_zone *zone,

 	/* check if we can go with higher freq for zone with a few devices*/
 	if (zone->single_cooling_dev) {
-		/* we treat single-cooling-dev-zone as a guard for max temp */
+		/* We treat single-cooling-dev-zone as a guard for max temp
+		 * and which does not disturb the power split for whole chip. */
 		cooling->max_single_state = target;
+		if (target < curr_state)
+			goto skip_change;
 	} else {
 		/* Lower 'target' state means higher frequency. Prevent going to
 		 * higher freq if there was a limit due to temperature value
@@ -682,6 +689,7 @@ static int cooling_dev_set_state(struct _thermal_zone *zone,
 	cdev->updated = true;
 	mutex_unlock(&cdev->lock);

+skip_change:
 	return ret;
 }