Hi All,
This is V2 Resend of my sched_select_cpu() work. Resend because didn't got much
attention on V2. Including more guys now in cc :)
In order to save power, it would be useful to schedule work onto non-IDLE cpus
instead of waking up an IDLE one.
To achieve this, we need scheduler to guide kernel frameworks (like: timers &
workqueues) on which is the most preferred CPU that must be used for these
tasks.
This patchset is about implementing this concept.
- The first patch adds sched_select_cpu() routine which returns the preferred
cpu which is non-idle.
- Second patch removes idle_cpu() calls from timer & hrtimer.
- Third patch is about adapting this change in workqueue framework.
- Fourth patch add migration capability in running timer
Earlier discussions over v1 can be found here:
http://www.mail-archive.com/linaro-dev@lists.linaro.org/msg13342.html
Earlier discussions over this concept were done at last LPC:
http://summit.linuxplumbersconf.org/lpc-2012/meeting/90/lpc2012-sched-timer…
Module created for testing this behavior is present here:
http://git.linaro.org/gitweb?p=people/vireshk/module.git;a=summary
Following are the steps followed in test module:
1. Run single work on each cpu
2. This work will start a timer after x (tested with 10) jiffies of delay
3. Timer routine queues a work... (This may be called from idle or non-idle cpu)
and starts the same timer again STEP 3 is done for n number of times (i.e.
queuing n works, one after other)
4. All works will call a single routine, which will count following per cpu:
- Total works processed by a CPU
- Total works processed by a CPU, which are queued from it
- Total works processed by a CPU, which aren't queued from it
Setup:
-----
- ARM Vexpress TC2 - big.LITTLE CPU
- Core 0-1: A15, 2-4: A7
- rootfs: linaro-ubuntu-nano
Results:
-------
Without Workqueue Modification, i.e. PATCH 3/3:
[ 2493.022335] Workqueue Analyser: works processsed by CPU0, Total: 1000, Own: 0, migrated: 0
[ 2493.047789] Workqueue Analyser: works processsed by CPU1, Total: 1000, Own: 0, migrated: 0
[ 2493.072918] Workqueue Analyser: works processsed by CPU2, Total: 1000, Own: 0, migrated: 0
[ 2493.098576] Workqueue Analyser: works processsed by CPU3, Total: 1000, Own: 0, migrated: 0
[ 2493.123702] Workqueue Analyser: works processsed by CPU4, Total: 1000, Own: 0, migrated: 0
With Workqueue Modification, i.e. PATCH 3/3:
[ 2493.022335] Workqueue Analyser: works processsed by CPU0, Total: 1002, Own: 999, migrated: 3
[ 2493.047789] Workqueue Analyser: works processsed by CPU1, Total: 998, Own: 997, migrated: 1
[ 2493.072918] Workqueue Analyser: works processsed by CPU2, Total: 1013, Own: 996, migrated: 17
[ 2493.098576] Workqueue Analyser: works processsed by CPU3, Total: 998, Own: 993, migrated: 5
[ 2493.123702] Workqueue Analyser: works processsed by CPU4, Total: 989, Own: 987, migrated: 2
V2->V2-Resend
-------------
- Included timer migration patch in the same thread.
V1->V2
-----
- New SD_* macros removed now and earlier ones used
- sched_select_cpu() rewritten and it includes the check on current cpu's
idleness.
- cpu_idle() calls from timer and hrtimer removed now.
- Patch 2/3 from V1, removed as it doesn't apply to latest workqueue branch from
tejun.
- CONFIG_MIGRATE_WQ removed and so is wq_select_cpu()
- sched_select_cpu() called only from __queue_work()
- got tejun/for-3.7 branch in my tree, before making workqueue changes.
Viresh Kumar (4):
sched: Create sched_select_cpu() to give preferred CPU for power
saving
timer: hrtimer: Don't check idle_cpu() before calling
get_nohz_timer_target()
workqueue: Schedule work on non-idle cpu instead of current one
timer: Migrate running timer
include/linux/sched.h | 16 ++++++++++--
include/linux/timer.h | 2 ++
kernel/hrtimer.c | 2 +-
kernel/sched/core.c | 69 +++++++++++++++++++++++++++++++--------------------
kernel/timer.c | 50 ++++++++++++++++++++++---------------
kernel/workqueue.c | 4 +--
6 files changed, 91 insertions(+), 52 deletions(-)
--
1.7.12.rc2.18.g61b472e
I am attaching a patch by Thomas Gleixner which adds a kernel
command line parameter to set the defauilt IRQ affinity mask. Could
you please integrate this in your tree for the next Linaro release?
I've been using this patch for sometime now and it doesn't introduce
any regressions. There is a possibility that this patch will make it
upstream via the RT patches in the near future but in the meanwhile,
we'd like to carry this patch as well.
Cheers,
Punit
>From 52a7d44f58a262e166575abc57aa0bd3bfc8cfbb Mon Sep 17 00:00:00 2001
From: Thomas Gleixner <tglx(a)linutronix.de>
Date: Fri, 25 May 2012 16:59:47 +0200
Subject: [PATCH] genirq: Add default affinity mask command line option
If we isolate CPUs, then we don't want random device interrupts on
them. Even w/o the user space irq balancer enabled we can end up with
irqs on non boot cpus.
Allow to restrict the default irq affinity mask.
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
---
Documentation/kernel-parameters.txt | 9 +++++++++
kernel/irq/irqdesc.c | 21 +++++++++++++++++++--
2 files changed, 28 insertions(+), 2 deletions(-)
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 7d82468..00fedab 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1164,6 +1164,15 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
See comment before ip2_setup() in
drivers/char/ip2/ip2base.c.
+ irqaffinity= [SMP] Set the default irq affinity mask
+ Format:
+ <cpu number>,...,<cpu number>
+ or
+ <cpu number>-<cpu number>
+ (must be a positive range in ascending order)
+ or a mixture
+ <cpu number>,...,<cpu number>-<cpu number>
+
irqfixup [HW]
When an interrupt is not handled search all handlers
for it. Intended to get systems with badly broken
diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
index 192a302..473b2b6 100644
--- a/kernel/irq/irqdesc.c
+++ b/kernel/irq/irqdesc.c
@@ -23,10 +23,27 @@
static struct lock_class_key irq_desc_lock_class;
#if defined(CONFIG_SMP)
+static int __init irq_affinity_setup(char *str)
+{
+ zalloc_cpumask_var(&irq_default_affinity, GFP_NOWAIT);
+ cpulist_parse(str, irq_default_affinity);
+ /*
+ * Set at least the boot cpu. We don't want to end up with
+ * bugreports caused by random comandline masks
+ */
+ cpumask_set_cpu(smp_processor_id(), irq_default_affinity);
+ return 1;
+}
+__setup("irqaffinity=", irq_affinity_setup);
+
static void __init init_irq_default_affinity(void)
{
- alloc_cpumask_var(&irq_default_affinity, GFP_NOWAIT);
- cpumask_setall(irq_default_affinity);
+#ifdef CONFIG_CPUMASK_OFFSTACK
+ if (!irq_default_affinity)
+ zalloc_cpumask_var(&irq_default_affinity, GFP_NOWAIT);
+#endif
+ if (cpumask_empty(irq_default_affinity))
+ cpumask_setall(irq_default_affinity);
}
#else
static void __init init_irq_default_affinity(void)
--
1.7.9.5
The nr_busy_cpus field of the sched_group_power is sometime different from 0
whereas the platform is fully idle. This serie fixes 3 use cases:
- when the SCHED softirq is raised on an idle core for idle load balance but
the platform doesn't go out of the cpuidle state
- when some CPUs enter idle state while booting all CPUs
- when a CPU is unplug and/or replug
Vincent Guittot (3):
sched: fix nr_busy_cpus with coupled cpuidle
sched: fix init NOHZ_IDLE flag
sched: fix update NOHZ_IDLE flag
kernel/sched/core.c | 1 +
kernel/sched/fair.c | 2 +-
kernel/time/tick-sched.c | 2 ++
3 files changed, 4 insertions(+), 1 deletion(-)
--
1.7.9.5
cpufreq core doesn't manage offline cpus and if driver->init() has returned
mask including offline cpus, it may result in unwanted behavior by cpufreq core
or governors.
We need to get only online cpus in this mask. There are two places to fix this
mask, cpufreq core and cpufreq driver. It makes sense to do this at common place
and hence is done in core.
Signed-off-by: Viresh Kumar <viresh.kumar(a)linaro.org>
---
drivers/cpufreq/cpufreq.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 1f93dbd..de99517 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -970,6 +970,13 @@ static int cpufreq_add_dev(struct device *dev, struct subsys_interface *sif)
pr_debug("initialization failed\n");
goto err_unlock_policy;
}
+
+ /*
+ * affected cpus must always be the one, which are online. We aren't
+ * managing offline cpus here.
+ */
+ cpumask_and(policy->cpus, policy->cpus, cpu_online_mask);
+
policy->user_policy.min = policy->min;
policy->user_policy.max = policy->max;
--
1.7.12.rc2.18.g61b472e
The states could be sorted in the power backward order, we get rid of
some lines of code and by this way that fixes also a bug in the dynamic
c-states.
Changelog:
V2:
* added the optimization in the select menu governor
Daniel Lezcano (2):
cpuidle - remove the power_specified field in the driver
cpuidle - optimize the select function for the 'menu' governor
drivers/cpuidle/cpuidle.c | 17 ++++-------------
drivers/cpuidle/driver.c | 25 -------------------------
drivers/cpuidle/governors/menu.c | 20 ++++++++------------
include/linux/cpuidle.h | 2 +-
4 files changed, 13 insertions(+), 51 deletions(-)
--
1.7.5.4
Hi,
I tried booting linux-linaro-2.6.37 kernel on my beagle board C4. I executed
following:
1. Installed linaro on a 4 GB SD card using linaro-image-tools 0.4.1 with
hwpack daily snapshot hwpack_linaro-omap3_20110125-0_armel_supported.tar.gz
and linaro-natty-headless-tar-20101202-1.tar.gz. It was booting properly on
my BB.
2. Cloned linux-linaro-2.6.37. Changed to source directory
3. make ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- omap2plus_defconfig
4. make ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- menuconfig (enabled
EARLY_PRINTK)
5. make ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- uImage
6. make ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- modules
7. make ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- modules_install
INSTALL_MOD_PATH=/media/rootfs
8. cp arch/arm/boot/uImage /media/boot; sync
Everything went on smoothly. Then I put the SD card on BB and powered it on.
I got a kernel panic: http://paste.ubuntu.com/560562
Please help me figuring out the problem. Is it because I didn't create
uInitrd? If so, then how to create it for ARM?
Regards,
Avik
Hi all,
this patch series implements Xen support for ARMv7 with virtualization
extensions. It allows a Linux guest to boot as dom0 and
as domU on Xen on ARM. PV console, disk and network frontends and
backends are all working correctly.
It has been tested on a Versatile Express Cortex A15 emulator, using the
latest Xen ARM developement branch
(git://xenbits.xen.org/people/ianc/xen-unstable.git arm-for-4.3) plus
the "ARM hypercall ABI: 64 bit ready" patch series
(http://marc.info/?l=xen-devel&m=134426267205408), and a simple ad-hoc
tool to build guest domains (marc.info/?l=xen-devel&m=134089788016546).
The patch marked with [HACK] has been dropped from this series, however
you can find it here:
http://marc.info/?l=linux-kernel&m=134513277823527&w=2.
I am also attaching to this email the dts'es that I am currently using
for dom0 and domU: vexpress-v2p-ca15-tc1.dts (that includes
vexpress-v2m-rs1-rtsm.dtsi) is the dts used for dom0 and it is passed to
Linux by Xen, while vexpress-virt.dts is the dts used for other domUs
and it is appended in binary form to the guest kernel image. I am not
sure where they are supposed to live yet, so I am just attaching them
here so that people can actually try out this series if they want to.
Comments are very welcome!
Patch #21 "arm/v2m: initialize arch_timers even if v2m_timer is not
present" touches generic ARM code and still needs to be acked/reviewed.
Arnd, Russell, what do you think about this series? If you are OK with
it, to whom should I submit it?
Changes in v4:
- rebase on 3.6-rc5;
- devicetree: "xen,xen" should be last as it is less specific;
- devicetree: use 2 address-cells and 2 size-cells in the reg property;
- do not xs_reset_watches on dom0;
- compile drivers/xen/pcpu.c only on x86;
- use "+=" instead of ":=" for dom0- targets;
- add a patch to update the MAINTAINERS file.
Changes in v3:
- move patches that have been picked up by Konrad at the end of the
series;
- improve comments;
- add a doc to describe the Xen Device Tree format;
- do not use xen_ulong_t for multicalls and apic_physbase;
- add a patch at the end of the series to use the new __HVC macro;
- add missing pvclock-abi.h include to ia64 header files;
- do not use an anonymous union in struct xen_add_to_physmap.
Changes in v2:
- fix up many comments and commit messages;
- remove the early_printk patches: rely on the emulated serial for now;
- remove the xen_guest_init patch: without any PV early_printk, we don't
need any early call to xen_guest_init, we can rely on core_initcall
alone;
- define an HYPERCALL macro for 5 arguments hypercall wrappers, even if
at the moment is unused;
- use ldm instead of pop in the hypercall wrappers;
- return -ENOSYS rather than -1 from the unimplemented grant_table
functions;
- remove the pvclock ifdef in the Xen headers;
- remove include linux/types.h from xen/interface/xen.h;
- replace pr_info with pr_debug in xen_guest_init;
- add a new patch to introduce xen_ulong_t and use it top replace all
the occurences of unsigned long in the public Xen interface;
- explicitely size all the pointers to 64 bit on ARM, so that the
hypercall ABI is "64 bit ready";
- clean up xenbus_init;
- make pci.o depend on CONFIG_PCI and acpi.o depend on CONFIG_ACPI;
- mark Xen guest support on ARM as EXPERIMENTAL;
- introduce GRANT_TABLE_PHYSADDR;
- remove unneeded initialization of boot_max_nr_grant_frames;
- add a new patch to clear IRQ_NOAUTOEN and IRQ_NOREQUEST in events.c;
- return -EINVAL from xen_remap_domain_mfn_range if
auto_translated_physmap;
- retain binary compatibility in xen_add_to_physmap: use a union to
introduce foreign_domid.
Shortlog and diffstat:
Stefano Stabellini (24):
arm: initial Xen support
xen/arm: hypercalls
xen/arm: page.h definitions
xen/arm: sync_bitops
xen/arm: empty implementation of grant_table arch specific functions
docs: Xen ARM DT bindings
xen/arm: Xen detection and shared_info page mapping
xen/arm: Introduce xen_pfn_t for pfn and mfn types
xen/arm: Introduce xen_ulong_t for unsigned long
xen/arm: compile and run xenbus
xen: do not compile manage, balloon, pci, acpi, pcpu and cpu_hotplug on ARM
xen/arm: introduce CONFIG_XEN on ARM
xen/arm: get privilege status
xen/arm: initialize grant_table on ARM
xen/arm: receive Xen events on ARM
xen: clear IRQ_NOAUTOEN and IRQ_NOREQUEST
xen/arm: implement alloc/free_xenballooned_pages with alloc_pages/kfree
xen: allow privcmd for HVM guests
xen/arm: compile blkfront and blkback
xen/arm: compile netback
arm/v2m: initialize arch_timers even if v2m_timer is not present
xen: missing includes
xen: update xen_add_to_physmap interface
MAINTAINERS: add myself as Xen ARM maintainer
Documentation/devicetree/bindings/arm/xen.txt | 22 ++++
MAINTAINERS | 7 +
arch/arm/Kconfig | 10 ++
arch/arm/Makefile | 1 +
arch/arm/include/asm/hypervisor.h | 6 +
arch/arm/include/asm/sync_bitops.h | 27 ++++
arch/arm/include/asm/xen/events.h | 18 +++
arch/arm/include/asm/xen/hypercall.h | 69 ++++++++++
arch/arm/include/asm/xen/hypervisor.h | 19 +++
arch/arm/include/asm/xen/interface.h | 73 +++++++++++
arch/arm/include/asm/xen/page.h | 82 ++++++++++++
arch/arm/mach-vexpress/v2m.c | 11 +-
arch/arm/xen/Makefile | 1 +
arch/arm/xen/enlighten.c | 168 +++++++++++++++++++++++++
arch/arm/xen/grant-table.c | 53 ++++++++
arch/arm/xen/hypercall.S | 106 ++++++++++++++++
arch/ia64/include/asm/xen/interface.h | 8 +-
arch/x86/include/asm/xen/interface.h | 8 ++
arch/x86/xen/enlighten.c | 1 +
arch/x86/xen/irq.c | 1 +
arch/x86/xen/mmu.c | 3 +
arch/x86/xen/xen-ops.h | 1 -
drivers/block/xen-blkback/blkback.c | 1 +
drivers/net/xen-netback/netback.c | 1 +
drivers/net/xen-netfront.c | 1 +
drivers/tty/hvc/hvc_xen.c | 2 +
drivers/xen/Makefile | 13 ++-
drivers/xen/events.c | 18 +++-
drivers/xen/grant-table.c | 1 +
drivers/xen/privcmd.c | 4 -
drivers/xen/xenbus/xenbus_comms.c | 2 +-
drivers/xen/xenbus/xenbus_probe.c | 62 +++++++---
drivers/xen/xenbus/xenbus_probe_frontend.c | 1 +
drivers/xen/xenbus/xenbus_xs.c | 3 +-
include/xen/events.h | 2 +
include/xen/interface/features.h | 3 +
include/xen/interface/grant_table.h | 4 +-
include/xen/interface/io/protocols.h | 3 +
include/xen/interface/memory.h | 21 ++--
include/xen/interface/physdev.h | 2 +-
include/xen/interface/platform.h | 4 +-
include/xen/interface/version.h | 2 +-
include/xen/interface/xen.h | 7 +-
include/xen/privcmd.h | 3 +-
include/xen/xen.h | 2 +-
45 files changed, 796 insertions(+), 61 deletions(-)
A branch based on 3.6-rc5 is available here:
git://xenbits.xen.org/people/sstabellini/linux-pvhvm.git 3.6-rc5-arm-4
Cheers,
Stefano
To enable CPU hotplug the need to provide some boot code at the reset
vector and which survives after the kernel has booted without being
overwritten. We achieve this by the getting the linker script to place
the code in boot.S at address zero. This now means we can delete the
code that relocates the secondary CPU pen code to "a location less
likely to be overridden".
We then modify the boot protocol slightly to allow hot-plugging of any
CPU, including CPU #0, when the system is already booted. This is done
by checking if SYS_FLAGS is already set before the normal check for CPU0
and the boot-or-wait decision made.
This patch is based on work by Nicolas Pitre.
Signed-off-by: Nicolas Pitre <nico(a)linaro.org>
Signed-off-by: Jon Medhurst <tixy(a)linaro.org>
---
boot.S | 41 ++++++++++++++++++++++++-----------------
model.lds.S | 3 ++-
2 files changed, 26 insertions(+), 18 deletions(-)
diff --git a/boot.S b/boot.S
index dd453e3..fb56693 100644
--- a/boot.S
+++ b/boot.S
@@ -12,6 +12,8 @@
.arch_extension virt
.text
+ b start @ Must be first instruction for power on reset vector
+
.macro enter_hyp
@ We assume we're entered in Secure Supervisor mode. To
@ get to Hyp mode we have to pass through Monitor mode
@@ -119,27 +121,32 @@ start:
orr r0, r0, r1
mcr p15, 0, r0, c1, c1, 2
- @ Check CPU nr again
- mrc p15, 0, r0, c0, c0, 5 @ MPIDR (ARMv7 only)
- bfc r0, #24, #8 @ CPU number, taking multicluster into account
- cmp r0, #0 @ primary CPU?
- beq 2f
-
- @
- @ Secondary CPUs (following the RealView SMP booting protocol)
- @
- enter_hyp
-
- ldr r1, =fs_start - 0x100
- adr r2, 1f
- ldmia r2, {r3 - r7} @ move the code to a location
- stmia r1, {r3 - r7} @ less likely to be overridden
+ /*
+ * If SYS_FLAGS is already set, this is a warm boot and we blindly
+ * branch to the indicated address right away, irrespective of the
+ * CPU we are.
+ */
#ifdef VEXPRESS
ldr r0, =0x1c010030 @ VE SYS_FLAGS register
#else
ldr r0, =0x10000030 @ RealView SYS_FLAGS register
#endif
- mov pc, r1 @ branch to the relocated code
+ ldr r1, [r0]
+ cmp r1, #0
+ beq 1f
+ enter_hyp
+ bx r1
+1:
+ /*
+ * Otherwise this is a cold boot. In this case it depends if
+ * we are the primary CPU or not. The primary CPU boots the system
+ * while the secondaries wait for the primary to set SYS_FLAGS.
+ */
+ mrc p15, 0, r1, c0, c0, 5
+ bics r1, #0xff000000
+ beq 2f
+
+ enter_hyp
1:
#ifdef VEXPRESS
wfe
@@ -147,7 +154,7 @@ start:
ldr r1, [r0]
cmp r1, #0
beq 1b
- mov pc, r1 @ branch to the given address
+ bx r1 @ branch to the given address
#endif
2:
diff --git a/model.lds.S b/model.lds.S
index 793df89..f37824e 100644
--- a/model.lds.S
+++ b/model.lds.S
@@ -27,7 +27,8 @@ STACKTOP = 0xff000000;
SECTIONS
{
- . = PHYS_OFFSET;
+ . = 0;
+ .boot : { boot.o }
. = PHYS_OFFSET + 0x8000 - 0x40;
--
1.7.10.4
Hi Viresh,
Here are the latest patches for HMP tunables to be included in the MP branch for the 12.11
release. They depend on Vincent Guittot's patches that you have on -exp-v12 branch which
we need included in the PULL request. Testing shows that they improve power consumption for HMP.
Thanks,
Liviu