Hi All,
This is V2 Resend of my sched_select_cpu() work. Resend because didn't got much
attention on V2. Including more guys now in cc :)
In order to save power, it would be useful to schedule work onto non-IDLE cpus
instead of waking up an IDLE one.
To achieve this, we need scheduler to guide kernel frameworks (like: timers &
workqueues) on which is the most preferred CPU that must be used for these
tasks.
This patchset is about implementing this concept.
- The first patch adds sched_select_cpu() routine which returns the preferred
cpu which is non-idle.
- Second patch removes idle_cpu() calls from timer & hrtimer.
- Third patch is about adapting this change in workqueue framework.
- Fourth patch add migration capability in running timer
Earlier discussions over v1 can be found here:
http://www.mail-archive.com/linaro-dev@lists.linaro.org/msg13342.html
Earlier discussions over this concept were done at last LPC:
http://summit.linuxplumbersconf.org/lpc-2012/meeting/90/lpc2012-sched-timer…
Module created for testing this behavior is present here:
http://git.linaro.org/gitweb?p=people/vireshk/module.git;a=summary
Following are the steps followed in test module:
1. Run single work on each cpu
2. This work will start a timer after x (tested with 10) jiffies of delay
3. Timer routine queues a work... (This may be called from idle or non-idle cpu)
and starts the same timer again STEP 3 is done for n number of times (i.e.
queuing n works, one after other)
4. All works will call a single routine, which will count following per cpu:
- Total works processed by a CPU
- Total works processed by a CPU, which are queued from it
- Total works processed by a CPU, which aren't queued from it
Setup:
-----
- ARM Vexpress TC2 - big.LITTLE CPU
- Core 0-1: A15, 2-4: A7
- rootfs: linaro-ubuntu-nano
Results:
-------
Without Workqueue Modification, i.e. PATCH 3/3:
[ 2493.022335] Workqueue Analyser: works processsed by CPU0, Total: 1000, Own: 0, migrated: 0
[ 2493.047789] Workqueue Analyser: works processsed by CPU1, Total: 1000, Own: 0, migrated: 0
[ 2493.072918] Workqueue Analyser: works processsed by CPU2, Total: 1000, Own: 0, migrated: 0
[ 2493.098576] Workqueue Analyser: works processsed by CPU3, Total: 1000, Own: 0, migrated: 0
[ 2493.123702] Workqueue Analyser: works processsed by CPU4, Total: 1000, Own: 0, migrated: 0
With Workqueue Modification, i.e. PATCH 3/3:
[ 2493.022335] Workqueue Analyser: works processsed by CPU0, Total: 1002, Own: 999, migrated: 3
[ 2493.047789] Workqueue Analyser: works processsed by CPU1, Total: 998, Own: 997, migrated: 1
[ 2493.072918] Workqueue Analyser: works processsed by CPU2, Total: 1013, Own: 996, migrated: 17
[ 2493.098576] Workqueue Analyser: works processsed by CPU3, Total: 998, Own: 993, migrated: 5
[ 2493.123702] Workqueue Analyser: works processsed by CPU4, Total: 989, Own: 987, migrated: 2
V2->V2-Resend
-------------
- Included timer migration patch in the same thread.
V1->V2
-----
- New SD_* macros removed now and earlier ones used
- sched_select_cpu() rewritten and it includes the check on current cpu's
idleness.
- cpu_idle() calls from timer and hrtimer removed now.
- Patch 2/3 from V1, removed as it doesn't apply to latest workqueue branch from
tejun.
- CONFIG_MIGRATE_WQ removed and so is wq_select_cpu()
- sched_select_cpu() called only from __queue_work()
- got tejun/for-3.7 branch in my tree, before making workqueue changes.
Viresh Kumar (4):
sched: Create sched_select_cpu() to give preferred CPU for power
saving
timer: hrtimer: Don't check idle_cpu() before calling
get_nohz_timer_target()
workqueue: Schedule work on non-idle cpu instead of current one
timer: Migrate running timer
include/linux/sched.h | 16 ++++++++++--
include/linux/timer.h | 2 ++
kernel/hrtimer.c | 2 +-
kernel/sched/core.c | 69 +++++++++++++++++++++++++++++++--------------------
kernel/timer.c | 50 ++++++++++++++++++++++---------------
kernel/workqueue.c | 4 +--
6 files changed, 91 insertions(+), 52 deletions(-)
--
1.7.12.rc2.18.g61b472e
I am attaching a patch by Thomas Gleixner which adds a kernel
command line parameter to set the defauilt IRQ affinity mask. Could
you please integrate this in your tree for the next Linaro release?
I've been using this patch for sometime now and it doesn't introduce
any regressions. There is a possibility that this patch will make it
upstream via the RT patches in the near future but in the meanwhile,
we'd like to carry this patch as well.
Cheers,
Punit
>From 52a7d44f58a262e166575abc57aa0bd3bfc8cfbb Mon Sep 17 00:00:00 2001
From: Thomas Gleixner <tglx(a)linutronix.de>
Date: Fri, 25 May 2012 16:59:47 +0200
Subject: [PATCH] genirq: Add default affinity mask command line option
If we isolate CPUs, then we don't want random device interrupts on
them. Even w/o the user space irq balancer enabled we can end up with
irqs on non boot cpus.
Allow to restrict the default irq affinity mask.
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
---
Documentation/kernel-parameters.txt | 9 +++++++++
kernel/irq/irqdesc.c | 21 +++++++++++++++++++--
2 files changed, 28 insertions(+), 2 deletions(-)
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 7d82468..00fedab 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1164,6 +1164,15 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
See comment before ip2_setup() in
drivers/char/ip2/ip2base.c.
+ irqaffinity= [SMP] Set the default irq affinity mask
+ Format:
+ <cpu number>,...,<cpu number>
+ or
+ <cpu number>-<cpu number>
+ (must be a positive range in ascending order)
+ or a mixture
+ <cpu number>,...,<cpu number>-<cpu number>
+
irqfixup [HW]
When an interrupt is not handled search all handlers
for it. Intended to get systems with badly broken
diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
index 192a302..473b2b6 100644
--- a/kernel/irq/irqdesc.c
+++ b/kernel/irq/irqdesc.c
@@ -23,10 +23,27 @@
static struct lock_class_key irq_desc_lock_class;
#if defined(CONFIG_SMP)
+static int __init irq_affinity_setup(char *str)
+{
+ zalloc_cpumask_var(&irq_default_affinity, GFP_NOWAIT);
+ cpulist_parse(str, irq_default_affinity);
+ /*
+ * Set at least the boot cpu. We don't want to end up with
+ * bugreports caused by random comandline masks
+ */
+ cpumask_set_cpu(smp_processor_id(), irq_default_affinity);
+ return 1;
+}
+__setup("irqaffinity=", irq_affinity_setup);
+
static void __init init_irq_default_affinity(void)
{
- alloc_cpumask_var(&irq_default_affinity, GFP_NOWAIT);
- cpumask_setall(irq_default_affinity);
+#ifdef CONFIG_CPUMASK_OFFSTACK
+ if (!irq_default_affinity)
+ zalloc_cpumask_var(&irq_default_affinity, GFP_NOWAIT);
+#endif
+ if (cpumask_empty(irq_default_affinity))
+ cpumask_setall(irq_default_affinity);
}
#else
static void __init init_irq_default_affinity(void)
--
1.7.9.5
Dear All,
I'd like to ask about the power aware scheduler development:
https://blueprints.launchpad.net/linaro-power-kernel/+spec/power-aware-
scheduler
Why I'm interested?
I'd like to test and further develop methods to put CPU to IDLE or
changing its operating frequency. I'm especially interested in
packing as much as possible tasks to a CPU and put the other one to
deep idle (RFTS - policy). I'm also curious how aggressive
SCHED_POLICY_POWERSAVING is going to be? (are there any special
requirements)
In the above page at the "Work items 2013.01" section states that
there is a work TODO in the "max_power and current_power" for DVFS.
Could you share code (if available) and plans for this development?
I've looked to the:
git://git.linaro.org/arm/big.LITTLE/mp.git tree (branch: power-aware-
scheduling-v4), but didn't find the power related code (especially
DVFS).
Probably I've looked at wrong place, so any guidance would be
appreciate.
As a side question:
On the linaro-dev mailing list there are some patches for tuning cpufreq
governor. Is there any roadmap for this effort?
--
Best regards,
Lukasz Majewski
Samsung R&D Poland (SRPOL) | Linux Platform Group
The nr_busy_cpus field of the sched_group_power is sometime different from 0
whereas the platform is fully idle. This serie fixes 3 use cases:
- when some CPUs enter idle state while booting all CPUs
- when a CPU is unplug and/or replug
Change since V1:
- remove the patch for SCHED softirq on an idle core use case as it was
a side effect of the other use cases.
Vincent Guittot (2):
sched: fix init NOHZ_IDLE flag
sched: fix update NOHZ_IDLE flag
kernel/sched/core.c | 1 +
kernel/sched/fair.c | 2 +-
2 files changed, 2 insertions(+), 1 deletion(-)
--
1.7.9.5
With the recent changes in cpufreq core, we just need to set mask of all
possible cpus into policy->cpus. Rest would be done by core.
Signed-off-by: Viresh Kumar <viresh.kumar(a)linaro.org>
---
drivers/cpufreq/exynos-cpufreq.c | 14 +-------------
1 file changed, 1 insertion(+), 13 deletions(-)
diff --git a/drivers/cpufreq/exynos-cpufreq.c b/drivers/cpufreq/exynos-cpufreq.c
index 7012ea8..81eb84a 100644
--- a/drivers/cpufreq/exynos-cpufreq.c
+++ b/drivers/cpufreq/exynos-cpufreq.c
@@ -227,19 +227,7 @@ static int exynos_cpufreq_cpu_init(struct cpufreq_policy *policy)
/* set the transition latency value */
policy->cpuinfo.transition_latency = 100000;
- /*
- * EXYNOS4 multi-core processors has 2 cores
- * that the frequency cannot be set independently.
- * Each cpu is bound to the same speed.
- * So the affected cpu is all of the cpus.
- */
- if (num_online_cpus() == 1) {
- cpumask_copy(policy->related_cpus, cpu_possible_mask);
- cpumask_copy(policy->cpus, cpu_online_mask);
- } else {
- policy->shared_type = CPUFREQ_SHARED_TYPE_ANY;
- cpumask_setall(policy->cpus);
- }
+ cpumask_setall(policy->cpus);
return cpufreq_frequency_table_cpuinfo(policy, exynos_info->freq_table);
}
--
1.7.12.rc2.18.g61b472e
Set devfreq device min and max frequency limits when device
is added to devfreq, provided frequency table is supplied.
This helps governors to suggest target frequency with in
limits.
Signed-off-by: Rajagopal Venkat <rajagopal.venkat(a)linaro.org>
---
drivers/devfreq/devfreq.c | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)
diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c
index a8f0173..5782c9b 100644
--- a/drivers/devfreq/devfreq.c
+++ b/drivers/devfreq/devfreq.c
@@ -69,6 +69,29 @@ static struct devfreq *find_device_devfreq(struct device *dev)
}
/**
+ * devfreq_set_freq_limits() - Set min and max frequency from freq_table
+ * @devfreq: the devfreq instance
+ */
+static void devfreq_set_freq_limits(struct devfreq *devfreq)
+{
+ int idx;
+ unsigned long min = ~0, max = 0;
+
+ if (!devfreq->profile->freq_table)
+ return;
+
+ for (idx = 0; idx < devfreq->profile->max_state; idx++) {
+ if (min > devfreq->profile->freq_table[idx])
+ min = devfreq->profile->freq_table[idx];
+ if (max < devfreq->profile->freq_table[idx])
+ max = devfreq->profile->freq_table[idx];
+ }
+
+ devfreq->min_freq = min;
+ devfreq->max_freq = max;
+}
+
+/**
* devfreq_get_freq_level() - Lookup freq_table for the frequency
* @devfreq: the devfreq instance
* @freq: the target frequency
@@ -476,6 +499,7 @@ struct devfreq *devfreq_add_device(struct device *dev,
devfreq->profile->max_state,
GFP_KERNEL);
devfreq->last_stat_updated = jiffies;
+ devfreq_set_freq_limits(devfreq);
dev_set_name(&devfreq->dev, dev_name(dev));
err = device_register(&devfreq->dev);
--
1.7.10.4
These patches are against the v3.8-rc4.
Guenter Roeck, Jean Delvare,
Please help to have a look at this.
Samuel Ortiz,
Please review 1/2 and leave your Acked-by: if no problem, I think these patches
should be merged into HWMON tree.
Thanks for all.
Hongbo Zhang (2):
ARM: ux500: rename ab8500 to abx500 for hwmon driver
hwmon: add ST-Ericsson ABX500 hwmon driver
drivers/hwmon/Kconfig | 13 +
drivers/hwmon/Makefile | 1 +
drivers/hwmon/ab8500.c | 160 +++++++++++
drivers/hwmon/abx500.c | 681 ++++++++++++++++++++++++++++++++++++++++++++++
drivers/hwmon/abx500.h | 88 ++++++
drivers/mfd/ab8500-core.c | 6 +-
6 files changed, 946 insertions(+), 3 deletions(-)
create mode 100644 drivers/hwmon/ab8500.c
create mode 100644 drivers/hwmon/abx500.c
create mode 100644 drivers/hwmon/abx500.h
--
1.8.0