Hi,
A number of patch sets related to power-efficient scheduling have been
posted over the last couple of months. Most of them do not have much
data to back them up, so I decided to do some testing.
Common to all of the patch sets I have tested, except one, is that they
attempt to pack tasks onto as few cpus as possible to allow the
remaining cpus to enter deeper sleep states - a strategy that should
make sense on most platforms that support per-cpu power-gating, as well
as on multi-socket machines.
Kernel: 3.9
Patch sets:
rlb-v4: sched: use runnable load based balance (Alex Shi)
<https://lkml.org/lkml/2013/4/27/13>
pas-v7: sched: power aware scheduling (Alex Shi)
<https://lkml.org/lkml/2013/4/3/732>
pst-v3: sched: packing small tasks (Vincent Guittot)
<https://lkml.org/lkml/2013/3/22/183>
pst-v4: sched: packing small tasks (Vincent Guittot)
<https://lkml.org/lkml/2013/4/25/396>
Configuration:
pas-v7: Set to "powersaving" mode.
pst-v4: Set to "Full" packing mode.
Platform:
ARM TC2 (test-chip), 2xCortex-A15 + 3xCortex-A7. Cortex-A15s disabled.
Measurement technique:
Time spent non-idle (not in idle state) for each cpu based on cpuidle
ftrace events. TC2 does not have per-core power-gating, so packing
inside the A7 cluster does not lead to any significant power savings.
Note that any product grade hardware (TC2 is a test-chip) will very
likely have per-core power-gating, so in those cases packing will have
an appreciable effect on power savings.
Measuring non-idle time rather than power should give a clearer idea of
the effect of the patch sets, given that the idle back-end is highly
implementation specific.
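For anyone who wants to reproduce the numbers, the accounting is
straightforward: walk the cpu_idle trace events and sum, per cpu, the
time spent between an idle-exit and the following idle-entry. A rough,
self-contained sketch is below; it assumes the usual text trace format
(cpu_idle: state=<n> cpu_id=<cpu>, with state 4294967295 marking the
exit from idle) and illustrates the method rather than being the exact
tool used for the results below.

/* Rough sketch only: accumulate per-cpu non-idle time from cpu_idle
 * ftrace events in their text form, e.g.
 *   <idle>-0  [002]  123.456789: cpu_idle: state=1 cpu_id=2
 * where state 4294967295 (i.e. -1) marks leaving idle. This illustrates
 * the accounting, it is not the exact tool used for the measurements.
 */
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define MAX_CPUS   8
#define EXIT_STATE 4294967295u

int main(void)
{
	double busy_since[MAX_CPUS] = { 0 };	/* timestamp of last idle exit */
	double nonidle[MAX_CPUS] = { 0 };	/* accumulated non-idle time   */
	int busy[MAX_CPUS] = { 0 };
	double first = -1.0, last = 0.0, ts;
	unsigned int state, cpu;
	char line[512];

	while (fgets(line, sizeof(line), stdin)) {
		char *p = strstr(line, ": cpu_idle:");
		char *t;

		if (!p)
			continue;

		/* the timestamp is the token right before ": cpu_idle:" */
		for (t = p; t > line &&
		     (isdigit((unsigned char)t[-1]) || t[-1] == '.'); t--)
			;
		if (sscanf(t, "%lf", &ts) != 1)
			continue;
		if (sscanf(p + 2, "cpu_idle: state=%u cpu_id=%u",
			   &state, &cpu) != 2 || cpu >= MAX_CPUS)
			continue;

		if (first < 0)
			first = ts;
		last = ts;

		if (state == EXIT_STATE) {		/* leaving idle */
			busy_since[cpu] = ts;
			busy[cpu] = 1;
		} else if (busy[cpu]) {			/* entering idle */
			nonidle[cpu] += ts - busy_since[cpu];
			busy[cpu] = 0;
		}
	}

	for (cpu = 0; cpu < MAX_CPUS; cpu++)
		printf("cpu %u: %6.2f%% non-idle\n", cpu,
		       last > first ?
		       100.0 * nonidle[cpu] / (last - first) : 0.0);

	return 0;
}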
Benchmarks:
audio playback (Android): 30s mp3 file playback on Android.
bbench+audio (Android): Web page rendering while doing mp3 playback.
andebench_native (Android): Android benchmark running in native mode.
cyclictest: Short periodic tasks.
Results:
Two runs for each patch set.
audio playback (Android) SMP
non-idle %    cpu 0    cpu 1    cpu 2
3.9_1         11.96     2.86     2.48
3.9_2         12.64     2.81     1.88
rlb-v4_1      12.61     2.44     1.90
rlb-v4_2      12.45     2.44     1.90
pas-v7_1      16.17     0.03     0.24
pas-v7_2      16.08     0.28     0.07
pst-v3_1      15.18     2.76     1.70
pst-v3_2      15.13     0.80     0.38
pst-v4_1      16.14     0.05     0.00
pst-v4_2      16.34     0.06     0.00
bbench+audio (Android) SMP
non-idle %    cpu 0    cpu 1    cpu 2    render time
3.9_1         25.00    20.73    21.22    812
3.9_2         24.29    19.78    22.34    795
rlb-v4_1      23.84    19.36    22.74    782
rlb-v4_2      24.07    19.36    22.74    797
pas-v7_1      28.29    17.86    16.01    869
pas-v7_2      28.62    18.54    15.05    908
pst-v3_1      29.14    20.59    21.72    830
pst-v3_2      27.69    18.81    20.06    830
pst-v4_1      42.20    13.63     2.29    880
pst-v4_2      41.56    14.40     2.17    935
andebench_native (8 threads) (Android) SMP
non-idle %    cpu 0    cpu 1    cpu 2    Score
3.9_1         99.22    98.88    99.61    4139
3.9_2         99.56    99.31    99.46    4148
rlb-v4_1      99.49    99.61    99.53    4153
rlb-v4_2      99.56    99.61    99.53    4149
pas-v7_1      99.53    99.59    99.29    4149
pas-v7_2      99.42    99.63    99.48    4150
pst-v3_1      97.89    99.33    99.42    4097
pst-v3_2      99.16    99.62    99.42    4097
pst-v4_1      99.34    99.01    99.59    4146
pst-v4_2      99.49    99.52    99.20    4146
cyclictest SMP
non-idle %    cpu 0    cpu 1    cpu 2
3.9_1          9.13     8.88     8.41
3.9_2         10.27     8.02     6.30
rlb-v4_1       8.88     8.09     8.11
rlb-v4_2       8.49     8.09     8.11
pas-v7_1      10.20     0.02    11.50
pas-v7_2       7.86    14.31     0.02
pst-v3_1      20.44     8.68     7.97
pst-v3_2      20.41     0.78     1.00
pst-v4_1      21.32     0.21     0.05
pst-v4_2      21.56     0.21     0.04
Overall, pas-v7 seems to do a fairly good job at packing. The idle time
distribution seems to be somewhere between pst-v3 and the more
aggressive pst-v4 for all the benchmarks. pst-v4 manages to keep two
cpus nearly idle (<0.25% non-idle) for both cyclictest and audio, which
is better than both pst-v3 and pas-v7. pas-v7 fails to pack cyclictest.
Packing does come at a cost, which can be seen for bbench+audio, where
pst-v3 and rlb-v4 get better render times than pas-v7 and pst-v4, which
do more aggressive packing. rlb-v4 does not pack; it is only included
for reference.
From a packing perspective, pst-v4 seems to do the best job for the
workloads that I have tested on ARM TC2. The less aggressive packing in
pst-v3 may be a better choice in terms of performance.
I'm well aware that these tests are heavily focused on mobile workloads.
I would therefore encourage people to share their test results for their
workloads on their platforms to complete the picture. Comments are also
welcome.
Thanks,
Morten
The number of cpuidle drivers keeps increasing; today we have 24 drivers
in total. A lot of the code is duplicated, at least in the
initialization paths. A consolidation effort has been carried out during
this year:
* a lot of code cleanup in all the drivers
* time keeping is now part of the framework
* timer broadcast is now part of the framework
* a WFI state function for ARM is defined and used in the drivers
* an init function has been proposed to factor out the initialization across
the drivers (patchset pending)
What has been observed is a lot of code duplication, framework changes
that take a while to reach drivers still using an old API, duplicated
routines, bugs, etc.
The drivers also belong to different trees: the cpuidle framework is
under linux-pm, while the drivers live in the per-SoC trees.
Communication is made difficult by the different mailing lists to which
the cpuidle patches are submitted.
After this work, it is time to prevent these problems from occurring
again. I propose moving the cpuidle drivers to the drivers/cpuidle
directory, hence having a single submission path for cpuidle, in order
to keep the cpuidle framework and the different drivers in sync.
This series moves the AT91 cpuidle driver under drivers/cpuidle. That
does not change the rule that the patches must be acked by the author of
the driver.
Note that the calxeda and kirkwood drivers are already in drivers/cpuidle.
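To illustrate what such a move boils down to in practice (a generic
sketch only, not the at91.c from this series; foo_enter_standby() is a
placeholder for whatever standby hook the mach code exports), a driver
under drivers/cpuidle typically ends up as little more than a
cpuidle_driver definition plus a call to the generic cpuidle_register()
helper, i.e. the pending init function mentioned above:

/* Generic sketch of a cpuidle driver living under drivers/cpuidle.
 * foo_enter_standby() is a placeholder for the platform standby hook
 * (e.g. the encapsulated at91 standby routine from patch 1/2).
 */
#include <linux/cpuidle.h>
#include <linux/init.h>
#include <linux/module.h>
#include <asm/cpuidle.h>

extern void foo_enter_standby(void);	/* exported by the mach code */

static int foo_enter_idle(struct cpuidle_device *dev,
			  struct cpuidle_driver *drv, int index)
{
	foo_enter_standby();
	return index;
}

static struct cpuidle_driver foo_idle_driver = {
	.name			= "foo_idle",
	.owner			= THIS_MODULE,
	.states[0]		= ARM_CPUIDLE_WFI_STATE,
	.states[1]		= {
		.enter			= foo_enter_idle,
		.exit_latency		= 10,
		.target_residency	= 10000,
		.flags			= CPUIDLE_FLAG_TIME_VALID,
		.name			= "standby",
		.desc			= "CPU clock gated",
	},
	.state_count		= 2,
};

static int __init foo_cpuidle_init(void)
{
	return cpuidle_register(&foo_idle_driver, NULL);
}
device_initcall(foo_cpuidle_init);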
Daniel Lezcano (2):
ARM: at91: cpuidle: encapsulate the standby code
ARM: at91: cpuidle: move the driver to drivers/cpuidle directory
arch/arm/mach-at91/Makefile | 1 -
arch/arm/mach-at91/cpuidle.c | 66 ------------------------------------------
arch/arm/mach-at91/pm.c | 8 ++++-
drivers/cpuidle/Makefile | 1 +
drivers/cpuidle/at91.c | 55 +++++++++++++++++++++++++++++++++++
5 files changed, 63 insertions(+), 68 deletions(-)
delete mode 100644 arch/arm/mach-at91/cpuidle.c
create mode 100644 drivers/cpuidle/at91.c
--
1.7.9.5
Hi Todd and others,
If we have a multi-package system, with multiple instances of struct
cpufreq_policy (one per package), we currently can't have multiple
instances of the same governor, i.e. we can't have one instance of the
Interactive governor per package.
This is a bottleneck for multi-cluster systems, where we want different
packages to use the Interactive governor, but with different tunables.
---------x------------x---------
I have recently upstreamed this support in 3.10-rc1 for the cpufreq core
and the Ondemand and Conservative governors. This is an attempt to do
the same for the Interactive governor.
I didn't have any clue which kernel to rebase my patches on, as I
couldn't find a 3.10-rc based branch in your tree, so I based them on
experimental/android-3.9.
So, this is what this patchset does:
- Backports some important patches from v3.10-rc1/2 to v3.9: first 8 patches
- Adds a few more supporting patches which might go into rc3: next 4 patches
- Finally, updates the Interactive governor: last 4 patches
So review is probably only required for the last 4 patches. The last
patch is a bit long, but it is mostly a rearrangement of code rather
than a major update. It is based on the patchset I wrote for the
Ondemand/Conservative governors.
This has been tested on an ARM big.LITTLE platform which has multiple
packages requiring separate tunables.
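For reference, the core idea of the per-policy conversion (as merged in
mainline for the Ondemand/Conservative governors and reused here for
Interactive) is that governor tunables hang off the policy's own kobject
when the driver asks for a governor per policy. The helpers below are a
paraphrase of the mainline cpufreq core, so treat them as a sketch
rather than the exact code in this series:

/* Paraphrase of the per-policy helpers in the mainline cpufreq core;
 * cpufreq_driver here is the core's private pointer to the registered
 * driver, so this snippet only makes sense inside drivers/cpufreq/cpufreq.c.
 */
#include <linux/cpufreq.h>

bool have_governor_per_policy(void)
{
	return cpufreq_driver->flags & CPUFREQ_HAVE_GOVERNOR_PER_POLICY;
}

struct kobject *get_governor_parent_kobj(struct cpufreq_policy *policy)
{
	if (have_governor_per_policy())
		return &policy->kobj;		/* per-policy tunables dir */
	else
		return cpufreq_global_kobject;	/* one shared set of tunables */
}

A governor that creates its sysfs attributes under
get_governor_parent_kobj(policy) therefore gets one set of tunables per
policy on drivers that set CPUFREQ_HAVE_GOVERNOR_PER_POLICY, and the old
single global set otherwise.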
Nathan Zimmer (1):
cpufreq: Convert the cpufreq_driver_lock to a rwlock
Stratos Karafotis (1):
cpufreq: governors: Calculate iowait time only when necessary
Viresh Kumar (14):
cpufreq: Add per policy governor-init/exit infrastructure
cpufreq: governor: Implement per policy instances of governors
cpufreq: Call __cpufreq_governor() with correct policy->cpus mask
cpufreq: Don't call __cpufreq_governor() for drivers without target()
cpufreq: governors: Fix CPUFREQ_GOV_POLICY_{INIT|EXIT} notifiers
cpufreq: Issue CPUFREQ_GOV_POLICY_EXIT notifier before dropping
policy refcount
cpufreq: Add EXPORT_SYMBOL_GPL for have_governor_per_policy
cpufreq: governors: Move get_governor_parent_kobj() to cpufreq.c
cpufreq: Drop rwsem lock around CPUFREQ_GOV_POLICY_EXIT
cpufreq: Move get_cpu_idle_time() to cpufreq.c
cpufreq: interactive: Use generic get_cpu_idle_time() from cpufreq.c
cpufreq: interactive: Remove unnecessary cpu_online() check
cpufreq: interactive: Move definition of cpufreq_gov_interactive
downwards
cpufreq: Interactive: Implement per policy instances of governor
drivers/cpufreq/cpufreq.c | 157 ++++++--
drivers/cpufreq/cpufreq_conservative.c | 195 ++++++----
drivers/cpufreq/cpufreq_governor.c | 273 +++++++-------
drivers/cpufreq/cpufreq_governor.h | 120 +++++-
drivers/cpufreq/cpufreq_interactive.c | 663 +++++++++++++++++++--------------
drivers/cpufreq/cpufreq_ondemand.c | 274 ++++++++------
include/linux/cpufreq.h | 19 +-
7 files changed, 1043 insertions(+), 658 deletions(-)
--
1.7.12.rc2.18.g61b472e
This patch series does the following:
1) Factors out possible common code, unifies the clk structures used
for PLL35xx & PLL36xx, and uses clk->base instead of clk->con0
2) Defines a common rate_table which contains the recommended p, m, s and k
values for the supported rates, which need to be programmed to change the
corresponding PLL's rate (a rough sketch of the table layout follows below)
3) Adds set_rate() and round_rate() clk_ops for PLL35xx and PLL36xx
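As an illustration of item 2 above, the rate table is essentially one
entry per supported rate carrying the recommended PLL coefficients.
Field names follow this series but may differ in detail, and the values
below are placeholders rather than recommended settings:

/* Shape of the common PLL rate table (illustrative; field names follow
 * this series but may differ in detail, and the values below are
 * placeholders rather than recommended settings).
 */
struct samsung_pll_rate_table {
	unsigned int rate;	/* target rate in Hz */
	unsigned int pdiv;	/* pre divider (p) */
	unsigned int mdiv;	/* main divider (m) */
	unsigned int sdiv;	/* scaler (s) */
	unsigned int kdiv;	/* fractional part (k), PLL36xx only */
};

static const struct samsung_pll_rate_table example_epll_rates[] = {
	/*       rate  p    m  s      k */
	{ 192000000,   2,  64, 2,     0 },
	{  49152000,   3, 196, 5, 39846 },
	{ /* sentinel */ },
};

/* round_rate() walks such a table and returns the closest supported rate;
 * set_rate() looks up the matching entry, programs p/m/s (and k for
 * PLL36xx) into the PLL con registers, then waits for the PLL to lock.
 */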
changes since v2:
- Added a new patch to reorder the mout_vpllsrc MUX registration before
the PLL registrations, and to add an alias for the mout_vpllsrc MUX.
- Added a check to confirm the parent rate while registering the PLL
rate tables.
changes since v1:
- removed sorting and bsearch
- modified the definition of struct "samsung_pll_rate_table"
- added generic round_rate()
- rectified the ops assignment for "rate table passed as NULL"
during PLL registration
This series is rebased on kgene's "for-next" branch:
https://git.kernel.org/cgit/linux/kernel/git/kgene/linux-samsung.git/log/?h…
These patches were tested on a Chromebook, on our Chrome tree, for the
EPLL settings used by audio.
Vikas Sajjan (3):
clk: samsung: Add set_rate() clk_ops for PLL36xx
clk: samsung: Add alias for mout_vpllsrc and reorder MUX registration
for it
clk: samsung: Add EPLL and VPLL freq table for exynos5250 SoC
Yadwinder Singh Brar (3):
clk: samsung: Use clk->base instead of directly using clk->con0 for
PLL3xxx
clk: samsung: Add support to register rate_table for PLL3xxx
clk: samsung: Add set_rate() clk_ops for PLL35xx
drivers/clk/samsung/clk-exynos4.c | 10 +-
drivers/clk/samsung/clk-exynos5250.c | 69 +++++++++--
drivers/clk/samsung/clk-pll.c | 226 ++++++++++++++++++++++++++++++----
drivers/clk/samsung/clk-pll.h | 35 +++++-
drivers/clk/samsung/clk.h | 2 +
5 files changed, 300 insertions(+), 42 deletions(-)
--
1.7.9.5
also cc linaro kernel
Hi,
This patch forwards the target residency information from the
arm_big_little driver to MCPM.
If multiple powerdown states are used, the vendor-specific code will
need a way to distinguish the intended C-state.
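The change essentially amounts to handing the selected state's target
residency to MCPM instead of a constant. Below is a heavily simplified
sketch of the arm_big_little powerdown path showing the idea (it assumes
mcpm_cpu_suspend() takes the expected residency; it is not the actual
diff in this patch):

/* Heavily simplified sketch of the arm_big_little powerdown path with
 * the target residency routed down to MCPM; not the actual diff in this
 * series.
 */
#include <linux/cpuidle.h>
#include <linux/cpu_pm.h>
#include <asm/cputype.h>
#include <asm/mcpm.h>
#include <asm/suspend.h>

static int bl_powerdown_finisher(unsigned long arg)
{
	unsigned int mpidr = read_cpuid_mpidr();
	unsigned int cluster = MPIDR_AFFINITY_LEVEL(mpidr, 1);
	unsigned int cpu = MPIDR_AFFINITY_LEVEL(mpidr, 0);

	mcpm_set_entry_vector(cpu, cluster, cpu_resume);
	mcpm_cpu_suspend(arg);	/* arg = expected residency in us */
	return 1;
}

static int bl_enter_powerdown(struct cpuidle_device *dev,
			      struct cpuidle_driver *drv, int idx)
{
	cpu_pm_enter();

	/* route the selected state's target residency down to MCPM */
	cpu_suspend(drv->states[idx].target_residency, bl_powerdown_finisher);

	cpu_pm_exit();
	return idx;
}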
I do not have TC2 hardware to verify this. Would someone be able to help
verify this change on TC2?
Thanks!
Sebastian
Sebastian Capella (1):
cpuidle: arm_big_little: route target residency to mcpm
drivers/cpuidle/arm_big_little.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
--
1.7.9.5
This patch series does the following:
1) Factors out possible common code, unifies the clk structures used
for PLL35xx & PLL36xx, and uses clk->base instead of clk->con0
2) Defines a common rate_table which contains the recommended p, m, s and k
values for the supported rates, which need to be programmed to change the
corresponding PLL's rate
3) Adds set_rate() and round_rate() clk_ops for PLL35xx and PLL36xx
changes since v1:
- removed sorting and bsearch
- modified the definition of struct "samsung_pll_rate_table"
- added generic round_rate()
- rectified the ops assignment for "rate table passed as NULL"
during PLL registration
This series is rebased on kgene's "for-next" branch:
https://git.kernel.org/cgit/linux/kernel/git/kgene/linux-samsung.git/log/?h…
These patches were tested on a Chromebook, on our Chrome tree, for the
EPLL settings used by audio.
Vikas Sajjan (2):
clk: samsung: Add set_rate() clk_ops for PLL36xx
clk: samsung: Add EPLL and VPLL freq table for exynos5250 SoC
Yadwinder Singh Brar (3):
clk: samsung: Use clk->base instead of directly using clk->con0 for
PLL3xxx
clk: samsung: Add support to register rate_table for PLL3xxx
clk: samsung: Add set_rate() clk_ops for PLL35xx
drivers/clk/samsung/clk-exynos4.c | 10 +-
drivers/clk/samsung/clk-exynos5250.c | 30 +++--
drivers/clk/samsung/clk-pll.c | 226 ++++++++++++++++++++++++++++++----
drivers/clk/samsung/clk-pll.h | 33 ++++-
4 files changed, 260 insertions(+), 39 deletions(-)
--
1.7.9.5
Currently, the RTC IRQ is never wakeup-enabled, so it is not capable of
bringing the system out of suspend.
On OMAP platforms, we have gotten by without this because the TWL RTC
is on an I2C-connected chip which is capable of waking up the OMAP via
the IO ring when the OMAP is in low-power states.
However, if the OMAP suspends without hitting the low-power states
(and the IO ring is not enabled), RTC wakeups will not work because
the IRQ is not wakeup enabled.
To fix, ensure the RTC IRQ is wakeup enabled whenever the RTC alarm is
set.
Cc: Alessandro Zummo <a.zummo(a)towertech.it>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Tony Lindgren <tony(a)atomide.com>
Signed-off-by: Kevin Hilman <khilman(a)linaro.org>
---
drivers/rtc/rtc-twl.c | 16 ++++++++++++++--
1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/drivers/rtc/rtc-twl.c b/drivers/rtc/rtc-twl.c
index 8751a52..bbda0fd 100644
--- a/drivers/rtc/rtc-twl.c
+++ b/drivers/rtc/rtc-twl.c
@@ -213,12 +213,24 @@ static int mask_rtc_irq_bit(unsigned char bit)
static int twl_rtc_alarm_irq_enable(struct device *dev, unsigned enabled)
{
+ struct platform_device *pdev = to_platform_device(dev);
+ int irq = platform_get_irq(pdev, 0);
+ static bool twl_rtc_wake_enabled;
int ret;
- if (enabled)
+ if (enabled) {
ret = set_rtc_irq_bit(BIT_RTC_INTERRUPTS_REG_IT_ALARM_M);
- else
+ if (device_can_wakeup(dev) && !twl_rtc_wake_enabled) {
+ enable_irq_wake(irq);
+ twl_rtc_wake_enabled = true;
+ }
+ } else {
ret = mask_rtc_irq_bit(BIT_RTC_INTERRUPTS_REG_IT_ALARM_M);
+ if (twl_rtc_wake_enabled) {
+ disable_irq_wake(irq);
+ twl_rtc_wake_enabled = false;
+ }
+ }
return ret;
}
--
1.8.2
I have run into a sequence where the Idle Load Balance was sometimes not
triggered for a while on my platform:
CPU 0 and CPU 1 are running tasks and CPU 2 is idle
CPU 1 kicks the Idle Load Balance
CPU 1 selects CPU 2 as the new Idle Load Balancer
CPU 1 sets NOHZ_BALANCE_KICK for CPU 2
CPU 1 sends a reschedule IPI to CPU 2
While CPU 2 wakes up, CPU 0 or CPU 1 migrates a waking task A onto CPU 2
CPU 2 finally wakes up, runs task A and discards the Idle Load Balance
Task A quickly goes back to sleep (before a tick occurs on CPU 2)
CPU 2 goes back to idle with NOHZ_BALANCE_KICK set
Whenever CPU 2 is selected for the ILB, the reschedule IPI will not be
sent to CPU 2, which is idle, because NOHZ_BALANCE_KICK is already set,
and no Idle Load Balance will be performed.
We must wait for the sched softirq to be raised on CPU 2 by some other
part of the kernel to clear NOHZ_BALANCE_KICK and get back to a normal
situation.
The proposed solution clears NOHZ_BALANCE_KICK in scheduler_ipi() if we
can't raise the sched softirq for the Idle Load Balance.
Signed-off-by: Vincent Guittot <vincent.guittot(a)linaro.org>
---
kernel/sched/core.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 58453b8..51fc715 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1420,7 +1420,8 @@ void scheduler_ipi(void)
if (unlikely(got_nohz_idle_kick() && !need_resched())) {
this_rq()->idle_balance = 1;
raise_softirq_irqoff(SCHED_SOFTIRQ);
- }
+ } else
+ clear_bit(NOHZ_BALANCE_KICK, nohz_flags(smp_processor_id()));
irq_exit();
}
--
1.7.9.5
Commit bf4d1b5ddb78f86078ac6ae0415802d5f0c68f92 brought the multiple
driver support. The code added a couple of new APIs to register a driver
per cpu. That led to some code complexity in handling the kernel config
options for whether the multiple driver support is enabled or not, which
is not really necessary: the code has to stay compatible when the
multiple driver support is not enabled, and the multiple driver support
has to be compatible with the old API.
This patch removes that API (not yet used by any driver, but needed for
the HMP cpuidle drivers which will come soon) and replaces its usage with
a cpumask pointer in the cpuidle driver structure telling which cpus are
handled by the driver. That lets the cpuidle_[un]register_driver API be
used for the multiple driver support, as well as the cpuidle_[un]register
functions added recently to the cpuidle framework.
The current code, rather sparse on comments, has been commented and
simplified.
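To make the new scheme concrete: a driver covering only a subset of cpus
(for instance one cluster of an HMP system) now just fills in the
cpumask field before registering. The sketch below is illustrative only,
with made-up names and a single WFI-like state:

/* Illustrative only: one cpuidle driver instance covering a subset of
 * cpus via the new ->cpumask field. Names and cpu numbering are made up.
 */
#include <linux/cpuidle.h>
#include <linux/cpumask.h>
#include <linux/init.h>
#include <linux/module.h>
#include <asm/proc-fns.h>

static int big_idle_enter(struct cpuidle_device *dev,
			  struct cpuidle_driver *drv, int index)
{
	cpu_do_idle();	/* plain WFI */
	return index;
}

static struct cpumask big_cpus;

static struct cpuidle_driver big_idle_driver = {
	.name		= "big_idle",
	.owner		= THIS_MODULE,
	.states[0]	= {
		.enter			= big_idle_enter,
		.exit_latency		= 1,
		.target_residency	= 1,
		.flags			= CPUIDLE_FLAG_TIME_VALID,
		.name			= "WFI",
		.desc			= "ARM WFI",
	},
	.state_count	= 1,
};

static int __init big_idle_init(void)
{
	/* say cpus 0 and 1 form the "big" cluster in this example */
	cpumask_set_cpu(0, &big_cpus);
	cpumask_set_cpu(1, &big_cpus);

	/* tell the framework which cpus this driver handles */
	big_idle_driver.cpumask = &big_cpus;

	return cpuidle_register_driver(&big_idle_driver);
}
device_initcall(big_idle_init);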
Signed-off-by: Daniel Lezcano <daniel.lezcano(a)linaro.org>
---
[V2]:
- fixed bad refcount check
- inverted clockevent notify off order at unregister time
drivers/cpuidle/cpuidle.c | 4 +-
drivers/cpuidle/driver.c | 324 ++++++++++++++++++++++++++++-----------------
include/linux/cpuidle.h | 21 +--
3 files changed, 213 insertions(+), 136 deletions(-)
diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index c3a93fe..fdc432f 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -466,7 +466,7 @@ void cpuidle_unregister(struct cpuidle_driver *drv)
int cpu;
struct cpuidle_device *device;
- for_each_possible_cpu(cpu) {
+ for_each_cpu(cpu, drv->cpumask) {
device = &per_cpu(cpuidle_dev, cpu);
cpuidle_unregister_device(device);
}
@@ -498,7 +498,7 @@ int cpuidle_register(struct cpuidle_driver *drv,
return ret;
}
- for_each_possible_cpu(cpu) {
+ for_each_cpu(cpu, drv->cpumask) {
device = &per_cpu(cpuidle_dev, cpu);
device->cpu = cpu;
diff --git a/drivers/cpuidle/driver.c b/drivers/cpuidle/driver.c
index 8dfaaae..0268346 100644
--- a/drivers/cpuidle/driver.c
+++ b/drivers/cpuidle/driver.c
@@ -18,206 +18,266 @@
DEFINE_SPINLOCK(cpuidle_driver_lock);
-static void __cpuidle_set_cpu_driver(struct cpuidle_driver *drv, int cpu);
-static struct cpuidle_driver * __cpuidle_get_cpu_driver(int cpu);
+#ifdef CONFIG_CPU_IDLE_MULTIPLE_DRIVERS
-static void cpuidle_setup_broadcast_timer(void *arg)
+static DEFINE_PER_CPU(struct cpuidle_driver *, cpuidle_drivers);
+
+/**
+ * __cpuidle_get_cpu_driver: returns the cpuidle driver tied with the specified
+ * cpu.
+ *
+ * @cpu: an integer specifying the cpu number
+ *
+ * Returns a pointer to struct cpuidle_driver, NULL if no driver has been
+ * registered for this driver
+ */
+static struct cpuidle_driver *__cpuidle_get_cpu_driver(int cpu)
{
- int cpu = smp_processor_id();
- clockevents_notify((long)(arg), &cpu);
+ return per_cpu(cpuidle_drivers, cpu);
}
-static void __cpuidle_driver_init(struct cpuidle_driver *drv, int cpu)
+/**
+ * __cpuidle_set_driver: assigns the driver pointer to the per-cpu variable for
+ * each cpu covered by the driver's cpumask.
+ *
+ * @drv: a pointer to a struct cpuidle_driver
+ *
+ * Returns 0 on success, < 0 otherwise
+ */
+static inline int __cpuidle_set_driver(struct cpuidle_driver *drv)
{
- int i;
+ int cpu;
- drv->refcnt = 0;
+ for_each_cpu(cpu, drv->cpumask) {
- for (i = drv->state_count - 1; i >= 0 ; i--) {
+ if (__cpuidle_get_cpu_driver(cpu))
+ return -EBUSY;
- if (!(drv->states[i].flags & CPUIDLE_FLAG_TIMER_STOP))
- continue;
-
- drv->bctimer = 1;
- on_each_cpu_mask(get_cpu_mask(cpu), cpuidle_setup_broadcast_timer,
- (void *)CLOCK_EVT_NOTIFY_BROADCAST_ON, 1);
- break;
+ per_cpu(cpuidle_drivers, cpu) = drv;
}
+
+ return 0;
}
-static int __cpuidle_register_driver(struct cpuidle_driver *drv, int cpu)
+/**
+ * __cpuidle_unset_driver: for each cpu the driver is handling, sets the per-cpu
+ * driver variable to NULL.
+ *
+ * @drv: a pointer to a struct cpuidle_driver
+ */
+static inline void __cpuidle_unset_driver(struct cpuidle_driver *drv)
{
- if (!drv || !drv->state_count)
- return -EINVAL;
-
- if (cpuidle_disabled())
- return -ENODEV;
-
- if (__cpuidle_get_cpu_driver(cpu))
- return -EBUSY;
+ int cpu;
- __cpuidle_driver_init(drv, cpu);
+ for_each_cpu(cpu, drv->cpumask) {
- __cpuidle_set_cpu_driver(drv, cpu);
+ if (drv != __cpuidle_get_cpu_driver(cpu))
+ continue;
- return 0;
+ per_cpu(cpuidle_drivers, cpu) = NULL;
+ }
}
-static void __cpuidle_unregister_driver(struct cpuidle_driver *drv, int cpu)
-{
- if (drv != __cpuidle_get_cpu_driver(cpu))
- return;
+#else
- if (!WARN_ON(drv->refcnt > 0))
- __cpuidle_set_cpu_driver(NULL, cpu);
+static struct cpuidle_driver *cpuidle_curr_driver;
- if (drv->bctimer) {
- drv->bctimer = 0;
- on_each_cpu_mask(get_cpu_mask(cpu), cpuidle_setup_broadcast_timer,
- (void *)CLOCK_EVT_NOTIFY_BROADCAST_OFF, 1);
- }
+/**
+ * __cpuidle_get_cpu_driver: returns the global cpuidle driver pointer.
+ *
+ * @cpu: an integer specifying the cpu number, this parameter is ignored
+ *
+ * Returns a pointer to a struct cpuidle_driver, NULL if no driver was
+ * previously registered
+ */
+static inline struct cpuidle_driver *__cpuidle_get_cpu_driver(int cpu)
+{
+ return cpuidle_curr_driver;
}
-#ifdef CONFIG_CPU_IDLE_MULTIPLE_DRIVERS
+/**
+ * __cpuidle_set_driver: assigns the cpuidle driver pointer to the global cpuidle
+ * driver variable.
+ *
+ * @drv: a pointer to a struct cpuidle_driver
+ *
+ * Returns 0 on success, < 0 otherwise
+ */
+static inline int __cpuidle_set_driver(struct cpuidle_driver *drv)
+{
+ if (cpuidle_curr_driver)
+ return -EBUSY;
-static DEFINE_PER_CPU(struct cpuidle_driver *, cpuidle_drivers);
+ cpuidle_curr_driver = drv;
-static void __cpuidle_set_cpu_driver(struct cpuidle_driver *drv, int cpu)
-{
- per_cpu(cpuidle_drivers, cpu) = drv;
+ return 0;
}
-static struct cpuidle_driver *__cpuidle_get_cpu_driver(int cpu)
+/**
+ * __cpuidle_unset_driver: resets the global cpuidle driver variable if the
+ * cpuidle driver pointer matches it.
+ *
+ * @drv: a pointer to a struct cpuidle_driver
+ */
+static inline void __cpuidle_unset_driver(struct cpuidle_driver *drv)
{
- return per_cpu(cpuidle_drivers, cpu);
+ if (drv == cpuidle_curr_driver)
+ cpuidle_curr_driver = NULL;
}
-static void __cpuidle_unregister_all_cpu_driver(struct cpuidle_driver *drv)
+#endif
+
+/**
+ * cpuidle_setup_broadcast_timer: set the broadcast timer notification for the
+ * current cpu. This function is called in a per-cpu context, invoked by
+ * an smp cross call. It is not supposed to be called directly.
+ *
+ * @arg: a void pointer, actually used to match the smp cross call api but used
+ * as a long with two values:
+ * - CLOCK_EVT_NOTIFY_BROADCAST_ON
+ * - CLOCK_EVT_NOTIFY_BROADCAST_OFF
+ */
+static void cpuidle_setup_broadcast_timer(void *arg)
{
- int cpu;
- for_each_present_cpu(cpu)
- __cpuidle_unregister_driver(drv, cpu);
+ int cpu = smp_processor_id();
+ clockevents_notify((long)(arg), &cpu);
}
-static int __cpuidle_register_all_cpu_driver(struct cpuidle_driver *drv)
+/**
+ * __cpuidle_driver_init: initialize the driver internal data.
+ *
+ * @drv: a valid pointer to a struct cpuidle_driver
+ *
+ * Returns 0 on success, < 0 otherwise
+ */
+static int __cpuidle_driver_init(struct cpuidle_driver *drv)
{
- int ret = 0;
- int i, cpu;
+ int i;
- for_each_present_cpu(cpu) {
- ret = __cpuidle_register_driver(drv, cpu);
- if (ret)
- break;
- }
+ drv->refcnt = 0;
- if (ret)
- for_each_present_cpu(i) {
- if (i == cpu)
- break;
- __cpuidle_unregister_driver(drv, i);
- }
+ /*
+ * we default here to all cpu possible because if the kernel
+ * boots with some cpus offline and then we online one of them
+ * the cpu notifier won't know which driver to assign
+ */
+ if (!drv->cpumask)
+ drv->cpumask = (struct cpumask *)cpu_possible_mask;
+
+ /*
+ * we look for the timer stop flag in the different states,
+ * so know we have to setup the broadcast timer. The loop is
+ * in reverse order, because usually the deeper state has this
+ * flag set
+ */
+ for (i = drv->state_count - 1; i >= 0 ; i--) {
+ if (!(drv->states[i].flags & CPUIDLE_FLAG_TIMER_STOP))
+ continue;
- return ret;
+ drv->bctimer = 1;
+ break;
+ }
+
+ return 0;
}
-int cpuidle_register_cpu_driver(struct cpuidle_driver *drv, int cpu)
+/**
+ * __cpuidle_register_driver: does some sanity checks, initializes the driver,
+ * assigns the driver to the global cpuidle driver variable(s) and sets up the
+ * broadcast timer if the cpuidle driver has some states which shut down the
+ * local timer.
+ *
+ * @drv: a valid pointer to a struct cpuidle_driver
+ *
+ * Returns 0 on success, < 0 otherwise
+ */
+static int __cpuidle_register_driver(struct cpuidle_driver *drv)
{
int ret;
- spin_lock(&cpuidle_driver_lock);
- ret = __cpuidle_register_driver(drv, cpu);
- spin_unlock(&cpuidle_driver_lock);
+ if (!drv || !drv->state_count)
+ return -EINVAL;
- return ret;
-}
+ if (cpuidle_disabled())
+ return -ENODEV;
-void cpuidle_unregister_cpu_driver(struct cpuidle_driver *drv, int cpu)
-{
- spin_lock(&cpuidle_driver_lock);
- __cpuidle_unregister_driver(drv, cpu);
- spin_unlock(&cpuidle_driver_lock);
-}
+ ret = __cpuidle_driver_init(drv);
+ if (ret)
+ return ret;
-/**
- * cpuidle_register_driver - registers a driver
- * @drv: the driver
- */
-int cpuidle_register_driver(struct cpuidle_driver *drv)
-{
- int ret;
+ ret = __cpuidle_set_driver(drv);
+ if (ret)
+ return ret;
- spin_lock(&cpuidle_driver_lock);
- ret = __cpuidle_register_all_cpu_driver(drv);
- spin_unlock(&cpuidle_driver_lock);
+ if (drv->bctimer)
+ on_each_cpu_mask(drv->cpumask, cpuidle_setup_broadcast_timer,
+ (void *)CLOCK_EVT_NOTIFY_BROADCAST_ON, 1);
- return ret;
+ return 0;
}
-EXPORT_SYMBOL_GPL(cpuidle_register_driver);
/**
- * cpuidle_unregister_driver - unregisters a driver
- * @drv: the driver
+ * __cpuidle_unregister_driver: checks that the driver is no longer in use,
+ * resets the global cpuidle driver variable(s) and disables the timer broadcast
+ * notification mechanism if it was in use.
+ *
+ * @drv: a valid pointer to a struct cpuidle_driver
+ *
*/
-void cpuidle_unregister_driver(struct cpuidle_driver *drv)
+static void __cpuidle_unregister_driver(struct cpuidle_driver *drv)
{
- spin_lock(&cpuidle_driver_lock);
- __cpuidle_unregister_all_cpu_driver(drv);
- spin_unlock(&cpuidle_driver_lock);
-}
-EXPORT_SYMBOL_GPL(cpuidle_unregister_driver);
-
-#else
-
-static struct cpuidle_driver *cpuidle_curr_driver;
+ if (WARN_ON(drv->refcnt > 0))
+ return;
-static inline void __cpuidle_set_cpu_driver(struct cpuidle_driver *drv, int cpu)
-{
- cpuidle_curr_driver = drv;
-}
+ if (drv->bctimer) {
+ drv->bctimer = 0;
+ on_each_cpu_mask(drv->cpumask, cpuidle_setup_broadcast_timer,
+ (void *)CLOCK_EVT_NOTIFY_BROADCAST_OFF, 1);
+ }
-static inline struct cpuidle_driver *__cpuidle_get_cpu_driver(int cpu)
-{
- return cpuidle_curr_driver;
+ __cpuidle_unset_driver(drv);
}
/**
- * cpuidle_register_driver - registers a driver
- * @drv: the driver
+ * cpuidle_register_driver: registers a driver by taking a lock to prevent
+ * multiple callers from [un]registering a driver at the same time.
+ *
+ * @drv: a pointer to a valid struct cpuidle_driver
+ *
+ * Returns 0 on success, < 0 otherwise
*/
int cpuidle_register_driver(struct cpuidle_driver *drv)
{
- int ret, cpu;
+ int ret;
- cpu = get_cpu();
spin_lock(&cpuidle_driver_lock);
- ret = __cpuidle_register_driver(drv, cpu);
+ ret = __cpuidle_register_driver(drv);
spin_unlock(&cpuidle_driver_lock);
- put_cpu();
return ret;
}
EXPORT_SYMBOL_GPL(cpuidle_register_driver);
/**
- * cpuidle_unregister_driver - unregisters a driver
- * @drv: the driver
+ * cpuidle_unregister_driver: unregisters a driver by taking a lock to prevent
+ * multiple callers from [un]registering a driver at the same time. The specified
+ * driver must match the driver currently registered.
+ *
+ * @drv: a pointer to a valid struct cpuidle_driver
*/
void cpuidle_unregister_driver(struct cpuidle_driver *drv)
{
- int cpu;
-
- cpu = get_cpu();
spin_lock(&cpuidle_driver_lock);
- __cpuidle_unregister_driver(drv, cpu);
+ __cpuidle_unregister_driver(drv);
spin_unlock(&cpuidle_driver_lock);
- put_cpu();
}
EXPORT_SYMBOL_GPL(cpuidle_unregister_driver);
-#endif
/**
- * cpuidle_get_driver - return the current driver
+ * cpuidle_get_driver: returns the driver tied with the current cpu.
+ *
+ * Returns a struct cpuidle_driver pointer, or NULL if no driver is registered
*/
struct cpuidle_driver *cpuidle_get_driver(void)
{
@@ -233,7 +293,12 @@ struct cpuidle_driver *cpuidle_get_driver(void)
EXPORT_SYMBOL_GPL(cpuidle_get_driver);
/**
- * cpuidle_get_cpu_driver - return the driver tied with a cpu
+ * cpuidle_get_cpu_driver: returns the driver registered with a cpu.
+ *
+ * @dev: a valid pointer to a struct cpuidle_device
+ *
+ * Returns a struct cpuidle_driver pointer, or NULL if no driver is registered
+ * for the specified cpu
*/
struct cpuidle_driver *cpuidle_get_cpu_driver(struct cpuidle_device *dev)
{
@@ -244,6 +309,13 @@ struct cpuidle_driver *cpuidle_get_cpu_driver(struct cpuidle_device *dev)
}
EXPORT_SYMBOL_GPL(cpuidle_get_cpu_driver);
+/**
+ * cpuidle_driver_ref: gets a refcount for the driver. Note this function takes
+ * a refcount for the driver assigned to the current cpu.
+ *
+ * Returns a struct cpuidle_driver pointer, or NULL if no driver is registered
+ * for the current cpu
+ */
struct cpuidle_driver *cpuidle_driver_ref(void)
{
struct cpuidle_driver *drv;
@@ -257,6 +329,10 @@ struct cpuidle_driver *cpuidle_driver_ref(void)
return drv;
}
+/**
+ * cpuidle_driver_unref: puts down the refcount for the driver. Note this
+ * function decrements the refcount for the driver assigned to the current cpu.
+ */
void cpuidle_driver_unref(void)
{
struct cpuidle_driver *drv = cpuidle_get_driver();
diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
index 8f04062..63d78b1 100644
--- a/include/linux/cpuidle.h
+++ b/include/linux/cpuidle.h
@@ -101,16 +101,20 @@ static inline int cpuidle_get_last_residency(struct cpuidle_device *dev)
****************************/
struct cpuidle_driver {
- const char *name;
- struct module *owner;
- int refcnt;
+ const char *name;
+ struct module *owner;
+ int refcnt;
/* used by the cpuidle framework to setup the broadcast timer */
- unsigned int bctimer:1;
+ unsigned int bctimer:1;
+
/* states array must be ordered in decreasing power consumption */
- struct cpuidle_state states[CPUIDLE_STATE_MAX];
- int state_count;
- int safe_state_index;
+ struct cpuidle_state states[CPUIDLE_STATE_MAX];
+ int state_count;
+ int safe_state_index;
+
+ /* the driver handles the cpus in cpumask */
+ struct cpumask *cpumask;
};
#ifdef CONFIG_CPU_IDLE
@@ -135,9 +139,6 @@ extern void cpuidle_disable_device(struct cpuidle_device *dev);
extern int cpuidle_play_dead(void);
extern struct cpuidle_driver *cpuidle_get_cpu_driver(struct cpuidle_device *dev);
-extern int cpuidle_register_cpu_driver(struct cpuidle_driver *drv, int cpu);
-extern void cpuidle_unregister_cpu_driver(struct cpuidle_driver *drv, int cpu);
-
#else
static inline void disable_cpuidle(void) { }
static inline int cpuidle_idle_call(void) { return -ENODEV; }
--
1.7.9.5