This patchset implements a "kiosk" mode for the KDB debugger and is a
continuation of previous work by Anton Vorontsov (dating back to late
2012).
When kiosk mode is engaged, several kdb commands are disabled, leaving
only the status-reporting functions working normally. In particular,
arbitrary memory reads and writes are prevented and it is no longer
possible to alter program flow.
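Conceptually the whitelisting amounts to a check like the following
before each command is dispatched (a minimal sketch in C using the
KDB_SAFE/KDB_SAFE_NO_ARGS flags this series introduces; the helper and
the kdb_kiosk_mode variable are illustrative names, not the final code):

  /* Illustrative only: is this command permitted in kiosk mode?
   * argc here excludes the command name itself. */
  static bool kdb_command_permitted(const kdbtab_t *cmd, int argc)
  {
          if (!kdb_kiosk_mode)
                  return true;    /* everything is allowed */
          if (cmd->cmd_flags & KDB_SAFE)
                  return true;    /* status reporting, always safe */
          if ((cmd->cmd_flags & KDB_SAFE_NO_ARGS) && argc == 0)
                  return true;    /* e.g. bt without arguments */
          return false;           /* e.g. md, mm, go, bp */
  }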
Note that the commands that remain enabled are sufficient to run the
post-mortem macro commands dumpcommon, dumpall and dumpcpu. One of the
motivating use-cases for this work is post-mortem debugging on embedded
devices (such as phones) without allowing the debug facility to be
easily exploited to compromise user privacy. In principle this means
the feature can be enabled on production devices.
This series contains a few patches: some are simple cleanups, some are
churn-ish but inevitable cleanups, and the rest implement the mode
itself -- after all the preparation, everything is pretty
straightforward. The first patch is actually a pure bug fix (arguably
unrelated to kiosk mode) but it collides with the kiosk code to honour
the sysrq mask, so I have included it here.
Changes since v1 (circa 2012):
* ef (Display exception frame) is essentially an overly complex peek
and has therefore been marked unsafe
* bt (Stack traceback) has been marked safe only with no arguments
* sr (Magic SysRq key) honours the sysrq mask when called in kiosk
mode
* Fixed over-zealous blocking of macro commands
* Symbol lookup is forbidden by kdbgetaddrarg (more robust, better
error reporting to user)
* Fix deadlock in sr (Magic SysRq key)
* Better help text in kiosk mode
* Default (kiosk on/off) can be changed from the config file.
Anton Vorontsov (7):
kdb: Remove currently unused kdbtab_t->cmd_flags
kdb: Rename kdb_repeat_t to kdb_cmdflags_t, cmd_repeat to cmd_flags
kdb: Rename kdb_register_repeat() to kdb_register_flags()
kdb: Use KDB_REPEAT_* values as flags
kdb: Remove KDB_REPEAT_NONE flag
kdb: Mark safe commands as KDB_SAFE and KDB_SAFE_NO_ARGS
kdb: Add kiosk mode
Daniel Thompson (3):
sysrq: Implement __handle_sysrq_nolock to avoid recursive locking in
kdb
kdb: Improve usability of help text when running in kiosk mode
kdb: Allow access to sensitive commands to be restricted by default
drivers/tty/sysrq.c | 11 ++-
include/linux/kdb.h | 20 ++--
include/linux/sysrq.h | 1 +
kernel/debug/kdb/kdb_bp.c | 22 ++---
kernel/debug/kdb/kdb_main.c | 207 +++++++++++++++++++++++------------------
kernel/debug/kdb/kdb_private.h | 3 +-
kernel/trace/trace_kdb.c | 4 +-
lib/Kconfig.kgdb | 21 +++++
8 files changed, 172 insertions(+), 117 deletions(-)
--
1.9.0
Part of this patchset was previously part of the larger tasks packing
patchset [1]. I have split the latter into (at least) 3 different
patchsets to make things easier:
- configuration of sched_domain topology [2]
- update and consolidation of cpu_power (this patchset)
- tasks packing algorithm
SMT systems are no longer the only systems whose CPUs can have an
original capacity that differs from the default value. We need to
extend the use of cpu_power_orig to all kinds of platforms so the
scheduler will have both the maximum capacity
(cpu_power_orig/power_orig) and the current capacity (cpu_power/power)
of CPUs and sched_groups. A new function, arch_scale_cpu_power, has
been created and replaces arch_scale_smt_power, which is SMT-specific,
in the computation of the capacity of a CPU.
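For a platform this means implementing a single hook; an ARM flavour
could plausibly follow the existing cpu_scale logic in
arch/arm/kernel/topology.c, along the lines of this sketch
(illustrative, not the exact patch):

  /* Sketch: report the platform-measured original capacity of a CPU. */
  unsigned long arch_scale_cpu_power(struct sched_domain *sd, int cpu)
  {
          return per_cpu(cpu_scale, cpu);
  }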
During load balance, the scheduler evaluates the number of tasks that a
group of CPUs can handle. The current method assumes that tasks have a
fixed load of SCHED_LOAD_SCALE and that CPUs have a default capacity of
SCHED_POWER_SCALE. This assumption generates wrong decisions, creating
ghost cores or removing real ones, when the original capacity of the
CPUs differs from the default SCHED_POWER_SCALE.
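The rounding that creates ghost cores or removes real ones can be seen
in a simplified form of the old computation (illustrative, not the
exact kernel code):

  /* Sketch: the old method rounds group power to a whole number of
   * SCHED_POWER_SCALE-sized "cores". */
  static unsigned int group_capacity_factor(unsigned long group_power)
  {
          return DIV_ROUND_CLOSEST(group_power, SCHED_POWER_SCALE);
  }

  /* Example: a dual Cortex-A7 group with cpu_power 606 per CPU gives
   * group_power = 1212, so capacity_factor = 1 and one real core
   * vanishes; two CPUs with cpu_power 1441 give capacity_factor = 3
   * and a ghost core appears. */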
Now that we have the original capacity of a CPU and its
activity/utilization, we can evaluate the capacity of a group of CPUs
more accurately.
This patchset mainly replaces the old capacity method with a new one
and keeps the policy almost unchanged, although we can certainly take
advantage of this new statistic in several other places in the load
balancer.
TODO:
- align variable and field names with the renaming [3]
Test results:
I have put below the results of 2 tests:
- hackbench -l 500 -s 4096
- scp of a 100MB file on the platform
on a dual Cortex-A7:
                  hackbench         scp
tip/master        25.75s(+/-0.25)   5.16MB/s(+/-1.49)
+ patches 1,2     25.89s(+/-0.31)   5.18MB/s(+/-1.45)
+ patches 3-10    25.68s(+/-0.22)   7.00MB/s(+/-1.88)
+ irq accounting  25.80s(+/-0.25)   8.06MB/s(+/-0.05)
on a quad Cortex-A15:
                  hackbench         scp
tip/master        15.69s(+/-0.16)    9.70MB/s(+/-0.04)
+ patches 1,2     15.53s(+/-0.13)    9.72MB/s(+/-0.05)
+ patches 3-10    15.56s(+/-0.22)    9.88MB/s(+/-0.05)
+ irq accounting  15.99s(+/-0.08)   10.37MB/s(+/-0.03)
The improvement in scp bandwidth happens when tasks and IRQs run on
different CPUs, which is somewhat random without the IRQ accounting
config.
Changes since V1:
- add 3 fixes
- correct some commit messages
- replace capacity computation by activity
- take into account current cpu capacity
[1] https://lkml.org/lkml/2013/10/18/121
[2] https://lkml.org/lkml/2014/3/19/377
[3] https://lkml.org/lkml/2014/5/14/622
Vincent Guittot (11):
sched: fix imbalance flag reset
sched: remove a wake_affine condition
sched: fix avg_load computation
sched: Allow all archs to set the power_orig
ARM: topology: use new cpu_power interface
sched: add per rq cpu_power_orig
Revert "sched: Put rq's sched_avg under CONFIG_FAIR_GROUP_SCHED"
sched: get CPU's activity statistic
sched: test the cpu's capacity in wake affine
sched: move cfs task on a CPU with higher capacity
sched: replace capacity by activity
arch/arm/kernel/topology.c | 4 +-
kernel/sched/core.c | 2 +-
kernel/sched/fair.c | 229 ++++++++++++++++++++++-----------------------
kernel/sched/sched.h | 5 +-
4 files changed, 118 insertions(+), 122 deletions(-)
--
1.9.1
When building a multi_v7_defconfig kernel it is not possible to
configure DEBUG_LL to use any serial device except an ARM PrimeCell
PL01X or, more accurately and worse, it is possible to configure a
different serial device but Kconfig does not honour this request.
This happens because the multi-platform mode may include ARCH_SPEAR13XX,
which forcibly engages DEBUG_UART_PL01X to provide some kind of
compatibility with single platform builds (SPEAr supports both single
and multi-platform). This in turn causes DEBUG_LL_INCLUDE to wedge at
debug/pl01x.S.
The problem is fixed by deploying the compatibility options for SPEAr
only when ARCH_MULTIPLATFORM is not set.
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
---
arch/arm/Kconfig.debug | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/arm/Kconfig.debug b/arch/arm/Kconfig.debug
index 0531da8..f10c784 100644
--- a/arch/arm/Kconfig.debug
+++ b/arch/arm/Kconfig.debug
@@ -991,9 +991,9 @@ config DEBUG_LL_INCLUDE
config DEBUG_UART_PL01X
def_bool ARCH_EP93XX || \
ARCH_INTEGRATOR || \
- ARCH_SPEAR3XX || \
- ARCH_SPEAR6XX || \
- ARCH_SPEAR13XX || \
+ (ARCH_SPEAR3XX && !ARCH_MULTIPLATFORM) || \
+ (ARCH_SPEAR6XX && !ARCH_MULTIPLATFORM) || \
+ (ARCH_SPEAR13XX && !ARCH_MULTIPLATFORM) || \
ARCH_VERSATILE
# Compatibility options for 8250
--
1.9.0
This patchset makes it possible to use the kgdb NMI infrastructure
on ARM platforms.
The kgdb NMI infrastructure works by re-routing a UART's interrupt
signal from IRQ to FIQ. The UART will no longer function normally and
will instead be managed by kgdb using the polled I/O functions. Any
character delivered to the UART causes the kgdb handler function to be
called. Each serial driver explicitly consents (or not) to this abuse
by calling the appropriate registration functions.
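Once the interrupt is re-routed, the only safe way to drive the UART is
through the polled hooks that struct uart_ops already provides under
CONFIG_CONSOLE_POLL. A sketch of their shape, with made-up register
offsets and status bits:

  #include <linux/io.h>
  #include <linux/serial_core.h>

  /* Illustrative polled I/O pair; 0x00/0x18 and the FIFO bits are
   * hypothetical, not any real UART's register layout. */
  static int my_uart_poll_get_char(struct uart_port *port)
  {
          if (readl(port->membase + 0x18) & 0x10) /* RX FIFO empty? */
                  return NO_POLL_CHAR;
          return readl(port->membase + 0x00);     /* data register */
  }

  static void my_uart_poll_put_char(struct uart_port *port, unsigned char c)
  {
          while (readl(port->membase + 0x18) & 0x20) /* TX FIFO full? */
                  cpu_relax();
          writel(c, port->membase + 0x00);
  }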
[PATCH 1/8] arm: fiq: Allow EOI to be communicated to the intc
[PATCH 2/8] irqchip: gic: Provide support for interrupt grouping
Both of these patches lay the groundwork to allow modern ARM
interrupt controllers to support FIQ correctly.
[PATCH 3/8] ARM: Move some macros from entry-armv to entry-header
[PATCH 4/8] ARM: Add KGDB/KDB FIQ debugger generic code
This is the heart of the patch series, allowing FIQs to be
registered with KGDB and handled by KGDB.
[PATCH 5/8] serial: amba-pl011: Pass on FIQ information to KGDB.
[PATCH 6/8] serial: asc: Add support for KGDB's FIQ/NMI mode
Extend two UART drivers to allow them to register the appropriate FIQ
(implicitly promising to behave properly when their own IRQ handler
is cut off).
[PATCH 7/8] ARM: VIC: Add vic_set_fiq function to select if an...
[PATCH 8/8] arm: fiq: Hack FIQ routing backdoors into GIC and VIC
Here we hit the serious request-for-comment section. It is not clear
what the best way is to get the interrupt controller to re-route an
interrupt source from the IRQ signal to the FIQ signal. Clearly the
approach here is wrong, but it has been enough for me to test my work
so far.
Anton Vorontsov (2):
ARM: Move some macros from entry-armv to entry-header
ARM: Add KGDB/KDB FIQ debugger generic code
Arve Hjønnevåg (1):
ARM: VIC: Add vic_set_fiq function to select if an interrupt should
generate an IRQ or FIQ
Daniel Thompson (5):
arm: fiq: Allow EOI to be communicated to the intc
irqchip: gic: Provide support for interrupt grouping
serial: amba-pl011: Pass on FIQ information to KGDB.
serial: asc: Add support for KGDB's FIQ/NMI mode
arm: fiq: Hack FIQ routing backdoors into GIC and VIC
arch/arm/Kconfig | 2 +
arch/arm/Kconfig.debug | 18 ++++
arch/arm/boot/dts/stih416.dtsi | 2 +-
arch/arm/boot/dts/vexpress-v2m-rs1.dtsi | 2 +-
arch/arm/include/asm/fiq.h | 1 +
arch/arm/include/asm/kgdb.h | 7 ++
arch/arm/kernel/Makefile | 1 +
arch/arm/kernel/entry-armv.S | 151 +----------------------------
arch/arm/kernel/entry-header.S | 164 ++++++++++++++++++++++++++++++++
arch/arm/kernel/fiq.c | 50 ++++++++++
arch/arm/kernel/kgdb_fiq.c | 117 +++++++++++++++++++++++
arch/arm/kernel/kgdb_fiq_entry.S | 87 +++++++++++++++++
drivers/irqchip/irq-gic.c | 62 +++++++++++-
drivers/irqchip/irq-vic.c | 23 +++++
drivers/tty/serial/amba-pl011.c | 18 +++-
drivers/tty/serial/st-asc.c | 25 +++++
include/linux/irqchip/arm-gic.h | 3 +
include/linux/irqchip/arm-vic.h | 1 +
18 files changed, 576 insertions(+), 158 deletions(-)
create mode 100644 arch/arm/kernel/kgdb_fiq.c
create mode 100644 arch/arm/kernel/kgdb_fiq_entry.S
--
1.9.0
A new atomic modeset/pageflip ioctl being developed in DRM requires
get_user() to work for 64-bit types (in addition to just put_user()).
v1: original
v2: pass correct size to check_uaccess, and better handling of narrowing
double word read with __get_user_xb() (Russell King's suggestion)
v3: fix a couple of checkpatch issues
Signed-off-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Cc: Russell King - ARM Linux <linux@arm.linux.org.uk>
---
Notes:
I'd like to wake this patch up again. It was rejected back in 2012 on
the grounds that other architectures (notably x86-32) didn't implement
this, so adding it for ARM risked portability problems in drivers.
However, shortly after the discussion (in fact, I believe as a *result*
of that discussion) support for 64-bit get_user() was added for x86-32.
A quick review of the uaccess.h of different architectures shows that
ARM is in the minority (even after excluding 64-bit architectures) in
not implementing this feature.
The reasons to wake it up are the same as before. Recent contributions,
including to DRM [1] and binder [2], would prefer to use 64-bit values
in their interfaces without gotchas like having to use copy_from_user().
[1] http://thread.gmane.org/gmane.comp.video.dri.devel/102135/focus=102149
[2] http://thread.gmane.org/gmane.linux.kernel/1653448/focus=1653449
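For illustration, this is the sort of code the change enables (a sketch;
the ioctl, struct and field names are made up):

  #include <linux/uaccess.h>

  struct foo_args {
          __u64 fence;
  };

  /* Sketch: fetch a 64-bit value straight from userspace. Without
   * __get_user_8 this would have to fall back to copy_from_user(). */
  static int foo_ioctl(struct foo_args __user *uargs)
  {
          u64 fence;

          if (get_user(fence, &uargs->fence))
                  return -EFAULT;
          /* ... use fence ... */
          return 0;
  }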
arch/arm/include/asm/uaccess.h | 18 +++++++++++++++++-
arch/arm/lib/getuser.S | 17 ++++++++++++++++-
2 files changed, 33 insertions(+), 2 deletions(-)
diff --git a/arch/arm/include/asm/uaccess.h b/arch/arm/include/asm/uaccess.h
index 75d9579..5f7db3fb 100644
--- a/arch/arm/include/asm/uaccess.h
+++ b/arch/arm/include/asm/uaccess.h
@@ -107,6 +107,7 @@ static inline void set_fs(mm_segment_t fs)
extern int __get_user_1(void *);
extern int __get_user_2(void *);
extern int __get_user_4(void *);
+extern int __get_user_8(void *);
#define __GUP_CLOBBER_1 "lr", "cc"
#ifdef CONFIG_CPU_USE_DOMAINS
@@ -115,6 +116,7 @@ extern int __get_user_4(void *);
#define __GUP_CLOBBER_2 "lr", "cc"
#endif
#define __GUP_CLOBBER_4 "lr", "cc"
+#define __GUP_CLOBBER_8 "lr", "cc"
#define __get_user_x(__r2,__p,__e,__l,__s) \
__asm__ __volatile__ ( \
@@ -125,11 +127,19 @@ extern int __get_user_4(void *);
: "0" (__p), "r" (__l) \
: __GUP_CLOBBER_##__s)
+/* narrowing a double-word get into a single 32bit word register: */
+#ifdef BIG_ENDIAN
+#define __get_user_xb(__r2, __p, __e, __l, __s) \
+ __get_user_x(__r2, (uintptr_t)__p + 4, __e, __l, __s)
+#else
+#define __get_user_xb __get_user_x
+#endif
+
#define __get_user_check(x,p) \
({ \
unsigned long __limit = current_thread_info()->addr_limit - 1; \
register const typeof(*(p)) __user *__p asm("r0") = (p);\
- register unsigned long __r2 asm("r2"); \
+ register typeof(x) __r2 asm("r2"); \
register unsigned long __l asm("r1") = __limit; \
register int __e asm("r0"); \
switch (sizeof(*(__p))) { \
@@ -142,6 +152,12 @@ extern int __get_user_4(void *);
case 4: \
__get_user_x(__r2, __p, __e, __l, 4); \
break; \
+ case 8: \
+ if (sizeof((x)) < 8) \
+ __get_user_xb(__r2, __p, __e, __l, 4); \
+ else \
+ __get_user_x(__r2, __p, __e, __l, 8); \
+ break; \
default: __e = __get_user_bad(); break; \
} \
x = (typeof(*(p))) __r2; \
diff --git a/arch/arm/lib/getuser.S b/arch/arm/lib/getuser.S
index 9b06bb4..ed98707 100644
--- a/arch/arm/lib/getuser.S
+++ b/arch/arm/lib/getuser.S
@@ -18,7 +18,7 @@
* Inputs: r0 contains the address
* r1 contains the address limit, which must be preserved
* Outputs: r0 is the error code
- * r2 contains the zero-extended value
+ * r2, r3 contains the zero-extended value
* lr corrupted
*
* No other registers must be altered. (see <asm/uaccess.h>
@@ -66,6 +66,19 @@ ENTRY(__get_user_4)
mov pc, lr
ENDPROC(__get_user_4)
+ENTRY(__get_user_8)
+ check_uaccess r0, 8, r1, r2, __get_user_bad
+#ifdef CONFIG_THUMB2_KERNEL
+5: TUSER(ldr) r2, [r0]
+6: TUSER(ldr) r3, [r0, #4]
+#else
+5: TUSER(ldr) r2, [r0], #4
+6: TUSER(ldr) r3, [r0]
+#endif
+ mov r0, #0
+ mov pc, lr
+ENDPROC(__get_user_8)
+
__get_user_bad:
mov r2, #0
mov r0, #-EFAULT
@@ -77,4 +90,6 @@ ENDPROC(__get_user_bad)
.long 2b, __get_user_bad
.long 3b, __get_user_bad
.long 4b, __get_user_bad
+ .long 5b, __get_user_bad
+ .long 6b, __get_user_bad
.popsection
--
1.9.3
This patchset started out as a simple patch to introduce the irqs
command from Android's FIQ debugger to kdb. However, it has since grown
more powerful, because allowing kdb to reuse existing kernel
infrastructure gives us extra opportunities.
Based on the comments at the top of irqdesc.h (plotting to take the
irq_desc structure private to kernel/irq) and the relative similarity
between the FIQ debugger's irqs command and the contents of
/proc/interrupts, we start by adding a kdb feature to print seq_files.
This forms the foundation for a new command, interrupts.
I have also been able to implement a much more generic command,
seq_file, that can display a good number of files from pseudo
filesystems. This command is very powerful, although that power does
mean care must be taken to deploy it safely. It is deliberately and by
default aimed at your foot!
Note that the risk associated with the seq_file command is why I
implemented the interrupts command in C (in principle it could have been
a kdb macro). Doing it in C codifies the need for show_interrupts() to
continue using spin locks as its locking strategy.
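Internally the framework boils down to a direct walk of a
seq_operations table, along these lines (a minimal sketch using the
real callback signatures from <linux/seq_file.h>; kdb_walk_seq is an
illustrative name, not the series' final code):

  #include <linux/err.h>
  #include <linux/seq_file.h>

  /* Sketch: iterate a seq_operations and let it render into m->buf,
   * which kdb can then print a page at a time. Treat any non-zero
   * return from ->show() as a reason to stop, for simplicity. */
  static void kdb_walk_seq(const struct seq_operations *ops,
                           struct seq_file *m)
  {
          loff_t pos = 0;
          void *v = ops->start(m, &pos);

          while (v && !IS_ERR(v)) {
                  if (ops->show(m, v))
                          break;
                  v = ops->next(m, v, &pos);
          }
          ops->stop(m, v);
  }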
To give an idea of what can be done with this command, the following
seq_operations structures worked correctly and reported no errors:
cpuinfo_op
extfrag_op
fragmentation_op
gpiolib_seq_ops
int_seq_ops (a.k.a. /proc/interrupts)
pagetypeinfo_op
unusable_op
vmalloc_op
zoneinfo_op
The following displayed the information correctly but triggered errors
(sleeping function called from invalid context) with lock debugging
enabled:
consoles_op
crypto_seq_ops
diskstats_op
partitions_op
slabinfo_op
vmstat_op
All tests were run on an ARM multi_v7_defconfig kernel (plus lots of
debug features), halted using magic SysRq so that kdb runs in interrupt
context. Note also that some of the seq_operations structures hook into
driver-supplied code that will only be called if that driver is
enabled, so the tests above are useful but cannot be exhaustive.
Daniel Thompson (3):
kdb: Add framework to display sequence files
proc: Provide access to /proc/interrupts from kdb
kdb: Implement seq_file command
fs/proc/interrupts.c | 10 +++++++++
include/linux/kdb.h | 3 +++
kernel/debug/kdb/kdb_io.c | 51 +++++++++++++++++++++++++++++++++++++++++++++
kernel/debug/kdb/kdb_main.c | 28 +++++++++++++++++++++++++
4 files changed, 92 insertions(+)
--
1.9.0
The following driver is for the exynos4210. I have not yet finished the
other boards, so I created a specific driver for the 4210 which could be
merged later.
The driver is based on Colin Cross's driver found at:
https://android.googlesource.com/kernel/exynos/+/e686b1ec67423c40b4fdf811f9…
That one was based on a 3.4 kernel and an old API. It has been
refreshed, simplified and rebased on the recent code cleanup I sent
today.
The AFTR state can only be entered when all the CPUs (except CPU0) are
down. In order to reach this situation, the coupled idle states are
used.
There is a sync barrier at the entry and the exit of the low power
function, so all CPUs will enter and exit the function at the same time.
At this point, CPU0 knows the other CPU will power itself down. CPU0
waits for CPU1 to be powered down and then initiates the AFTR power down
sequence.
No interrupts are handled by CPU1; this is why we switch to the
broadcast timer even though the local timer is not affected by the idle
state.
When CPU0 wakes up, it powers up CPU1 and waits for it to boot. Then
they both exit the idle function.
This driver allows the exynos4210 to have the same power consumption at
idle time as when CPU1 had to be unplugged in order to let CPU0 reach
the AFTR state.
This patch is an RFC because we have to find a way to remove the macro
definitions and the CPU powerdown function without pulling in the
arch-dependent headers.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
---
arch/arm/mach-exynos/common.c | 11 +-
drivers/cpuidle/Kconfig.arm | 8 ++
drivers/cpuidle/Makefile | 1 +
drivers/cpuidle/cpuidle-exynos4210.c | 226 ++++++++++++++++++++++++++++++++++
4 files changed, 245 insertions(+), 1 deletion(-)
create mode 100644 drivers/cpuidle/cpuidle-exynos4210.c
diff --git a/arch/arm/mach-exynos/common.c b/arch/arm/mach-exynos/common.c
index d5fa21e..1765a98 100644
--- a/arch/arm/mach-exynos/common.c
+++ b/arch/arm/mach-exynos/common.c
@@ -299,9 +299,18 @@ static struct platform_device exynos_cpuidle = {
.id = -1,
};
+static struct platform_device exynos4210_cpuidle = {
+ .name = "exynos4210-cpuidle",
+ .dev.platform_data = exynos_sys_powerdown_aftr,
+ .id = -1,
+};
+
void __init exynos_cpuidle_init(void)
{
- platform_device_register(&exynos_cpuidle);
+ if (soc_is_exynos4210())
+ platform_device_register(&exynos4210_cpuidle);
+ else
+ platform_device_register(&exynos_cpuidle);
}
void __init exynos_cpufreq_init(void)
diff --git a/drivers/cpuidle/Kconfig.arm b/drivers/cpuidle/Kconfig.arm
index 92f0c12..2772130 100644
--- a/drivers/cpuidle/Kconfig.arm
+++ b/drivers/cpuidle/Kconfig.arm
@@ -51,3 +51,11 @@ config ARM_EXYNOS_CPUIDLE
depends on ARCH_EXYNOS
help
Select this to enable cpuidle for Exynos processors
+
+config ARM_EXYNOS4210_CPUIDLE
+ bool "Cpu Idle Driver for the Exynos 4210 processor"
+ default y
+ depends on ARCH_EXYNOS
+ select ARCH_NEEDS_CPU_IDLE_COUPLED if SMP
+ help
+ Select this to enable cpuidle for the Exynos 4210 processors
diff --git a/drivers/cpuidle/Makefile b/drivers/cpuidle/Makefile
index 0d1540a..e0ec9bc 100644
--- a/drivers/cpuidle/Makefile
+++ b/drivers/cpuidle/Makefile
@@ -14,6 +14,7 @@ obj-$(CONFIG_ARM_ZYNQ_CPUIDLE) += cpuidle-zynq.o
obj-$(CONFIG_ARM_U8500_CPUIDLE) += cpuidle-ux500.o
obj-$(CONFIG_ARM_AT91_CPUIDLE) += cpuidle-at91.o
obj-$(CONFIG_ARM_EXYNOS_CPUIDLE) += cpuidle-exynos.o
+obj-$(CONFIG_ARM_EXYNOS4210_CPUIDLE) += cpuidle-exynos4210.o
###############################################################################
# POWERPC drivers
diff --git a/drivers/cpuidle/cpuidle-exynos4210.c b/drivers/cpuidle/cpuidle-exynos4210.c
new file mode 100644
index 0000000..56f6d51
--- /dev/null
+++ b/drivers/cpuidle/cpuidle-exynos4210.c
@@ -0,0 +1,226 @@
+/*
+ * Copyright (c) 2014 Samsung Electronics Co., Ltd.
+ * http://www.samsung.com
+ *
+ * Copyright (c) 2014 Linaro : Daniel Lezcano <daniel.lezcano@linaro.org>
+ * http://www.linaro.org
+ *
+ * Based on the work of Colin Cross <ccross@android.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/cpuidle.h>
+#include <linux/cpu_pm.h>
+#include <linux/io.h>
+#include <linux/platform_device.h>
+
+#include <asm/proc-fns.h>
+#include <asm/suspend.h>
+#include <asm/cpuidle.h>
+
+#include <plat/pm.h>
+#include <plat/cpu.h>
+#include <plat/map-base.h>
+#include <plat/map-s5p.h>
+
+static atomic_t exynos_idle_barrier;
+static atomic_t cpu1_wakeup = ATOMIC_INIT(0);
+
+#define BOOT_VECTOR S5P_VA_SYSRAM
+#define S5P_VA_PMU S3C_ADDR(0x02180000)
+#define S5P_PMUREG(x) (S5P_VA_PMU + (x))
+#define S5P_ARM_CORE1_CONFIGURATION S5P_PMUREG(0x2080)
+#define S5P_ARM_CORE1_STATUS S5P_PMUREG(0x2084)
+
+static void (*exynos_aftr)(void);
+
+static int cpu_suspend_finish(unsigned long flags)
+{
+ if (flags)
+ exynos_aftr();
+
+ cpu_do_idle();
+
+ return -1;
+}
+
+static int exynos_cpu0_enter_aftr(void)
+{
+ int ret = -1;
+
+ /*
+ * If the other cpu is powered on, we have to power it off, because
+ * the AFTR state won't work otherwise
+ */
+ if (cpu_online(1)) {
+
+ /*
+ * We reach a sync point with the coupled idle state, we know
+ * the other cpu will power down itself or will abort the
+ * sequence, let's wait for one of these to happen
+ */
+ while (__raw_readl(S5P_ARM_CORE1_STATUS) & 3) {
+
+ /*
+ * The other cpu may skip idle and boot back
+ * up again
+ */
+ if (atomic_read(&cpu1_wakeup))
+ goto abort;
+
+ /*
+ * The other cpu may bounce through idle and
+ * boot back up again, getting stuck in the
+ * boot rom code
+ */
+ if (__raw_readl(BOOT_VECTOR) == 0)
+ goto abort;
+
+ cpu_relax();
+ }
+ }
+
+ cpu_pm_enter();
+
+ ret = cpu_suspend(1, cpu_suspend_finish);
+
+ cpu_pm_exit();
+
+abort:
+ if (cpu_online(1)) {
+ /*
+ * Set the boot vector to something non-zero
+ */
+ __raw_writel(virt_to_phys(s3c_cpu_resume),
+ BOOT_VECTOR);
+ dsb();
+
+ /*
+ * Turn on cpu1 and wait for it to be on
+ */
+ __raw_writel(0x3, S5P_ARM_CORE1_CONFIGURATION);
+ while ((__raw_readl(S5P_ARM_CORE1_STATUS) & 3) != 3)
+ cpu_relax();
+
+ /*
+ * Wait for cpu1 to get stuck in the boot rom
+ */
+ while ((__raw_readl(BOOT_VECTOR) != 0) &&
+ !atomic_read(&cpu1_wakeup))
+ cpu_relax();
+
+ if (!atomic_read(&cpu1_wakeup)) {
+ /*
+ * Poke cpu1 out of the boot rom
+ */
+ __raw_writel(virt_to_phys(s3c_cpu_resume),
+ BOOT_VECTOR);
+ dsb_sev();
+ }
+
+ /*
+ * Wait for cpu1 to finish booting
+ */
+ while (!atomic_read(&cpu1_wakeup))
+ cpu_relax();
+ }
+
+ return ret;
+}
+
+static int exynos_powerdown_cpu1(void)
+{
+ int ret = -1;
+
+ /*
+ * Idle sequence for cpu1
+ */
+ if (cpu_pm_enter())
+ goto cpu1_aborted;
+
+ /*
+ * Turn off cpu 1
+ */
+ __raw_writel(0, S5P_ARM_CORE1_CONFIGURATION);
+
+ ret = cpu_suspend(0, cpu_suspend_finish);
+
+ cpu_pm_exit();
+
+cpu1_aborted:
+ dsb();
+ /*
+ * Notify cpu 0 that cpu 1 is awake
+ */
+ atomic_set(&cpu1_wakeup, 1);
+
+ return ret;
+}
+
+static int exynos_enter_aftr(struct cpuidle_device *dev,
+ struct cpuidle_driver *drv, int index)
+{
+ int ret;
+
+ __raw_writel(virt_to_phys(s3c_cpu_resume), BOOT_VECTOR);
+
+ /*
+ * Waiting all cpus to reach this point at the same moment
+ */
+ cpuidle_coupled_parallel_barrier(dev, &exynos_idle_barrier);
+
+ /*
+ * Both cpus will reach this point at the same time
+ */
+ ret = dev->cpu ? exynos_powerdown_cpu1() : exynos_cpu0_enter_aftr();
+ if (ret)
+ index = ret;
+
+ /*
+ * Waiting all cpus to finish the power sequence before going further
+ */
+ cpuidle_coupled_parallel_barrier(dev, &exynos_idle_barrier);
+
+ atomic_set(&cpu1_wakeup, 0);
+
+ return index;
+}
+
+static struct cpuidle_driver exynos_idle_driver = {
+ .name = "exynos4210_idle",
+ .owner = THIS_MODULE,
+ .states = {
+ ARM_CPUIDLE_WFI_STATE,
+ [1] = {
+ .enter = exynos_enter_aftr,
+ .exit_latency = 5000,
+ .target_residency = 10000,
+ .flags = CPUIDLE_FLAG_TIME_VALID |
+ CPUIDLE_FLAG_COUPLED | CPUIDLE_FLAG_TIMER_STOP,
+ .name = "C1",
+ .desc = "ARM power down",
+ },
+ },
+ .state_count = 2,
+ .safe_state_index = 0,
+};
+
+static int exynos_cpuidle_probe(struct platform_device *pdev)
+{
+ exynos_aftr = (void *)(pdev->dev.platform_data);
+
+ return cpuidle_register(&exynos_idle_driver, cpu_possible_mask);
+}
+
+static struct platform_driver exynos_cpuidle_driver = {
+ .driver = {
+ .name = "exynos4210-cpuidle",
+ .owner = THIS_MODULE,
+ },
+ .probe = exynos_cpuidle_probe,
+};
+
+module_platform_driver(exynos_cpuidle_driver);
--
1.7.9.5
Part of this patchset was previously part of the larger tasks packing
patchset [1]. I have split the latter into (at least) 3 different
patchsets to make things easier:
- configuration of sched_domain topology [2]
- update and consolidation of cpu_power (this patchset)
- tasks packing algorithm
SMT systems are no longer the only systems whose CPUs can have an
original capacity that differs from the default value. We need to
extend the use of cpu_power_orig to all kinds of platforms so the
scheduler will have both the maximum capacity
(cpu_power_orig/power_orig) and the current capacity (cpu_power/power)
of CPUs and sched_groups. A new function, arch_scale_cpu_power, has
been created and replaces arch_scale_smt_power, which is SMT-specific,
in the computation of the capacity of a CPU.
During load balance, the scheduler evaluates the number of tasks that a
group of CPUs can handle. The current method assumes that tasks have a
fixed load of SCHED_LOAD_SCALE and that CPUs have a default capacity of
SCHED_POWER_SCALE. This assumption generates wrong decisions, creating
ghost cores or removing real ones, when the original capacity of the
CPUs differs from the default SCHED_POWER_SCALE. We no longer try to
evaluate the number of available cores based on group_capacity; instead
we detect when the group is fully utilized, as sketched below.
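The shape of the new test is roughly the following (illustrative names
and an arbitrary margin, not the final code):

  /* Sketch: a group is "full" when its measured utilization approaches
   * its real capacity, regardless of the nominal number of cores. */
  static bool group_is_fully_utilized(unsigned long group_utilization,
                                      unsigned long group_capacity)
  {
          /* leave a little headroom, in the spirit of imbalance_pct */
          return group_utilization * 100 >= group_capacity * 95;
  }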
Now that we have the original capacity of the CPUs and their
activity/utilization, we can evaluate the capacity and the level of
utilization of a group of CPUs more accurately.
This patchset mainly replaces the old capacity method with a new one
and keeps the policy almost unchanged, although we could certainly take
advantage of this new statistic in several other places in the load
balancer.
Test results:
I have put below the results of 3 kinds of tests:
- hackbench -l 500 -s 4096
- scp of a 100MB file on the platform
- ebizzy with various numbers of threads
on 3 kernels:
tip       = tip/sched/core
patch     = tip + this patchset
patch+irq = tip + this patchset + irq accounting
Each test has been run 6 times and the figures below show the stdev and
the diff compared to the tip kernel.
Dual Cortex-A7    tip         | patch              | patch+irq
                  stdev       | diff      stdev    | diff      stdev
hackbench    (+/-)1.02%       | +0.42%  (+/-)1.29% | -0.65%  (+/-)0.44%
scp          (+/-)0.41%       | +0.18%  (+/-)0.10% | +78.05% (+/-)0.70%
ebizzy -t 1  (+/-)1.72%       | +1.43%  (+/-)1.62% | +2.58%  (+/-)2.11%
ebizzy -t 2  (+/-)0.42%       | +0.06%  (+/-)0.45% | +1.45%  (+/-)4.05%
ebizzy -t 4  (+/-)0.73%       | +8.39%  (+/-)13.25%| +4.25%  (+/-)10.08%
ebizzy -t 6  (+/-)10.30%      | +2.19%  (+/-)3.59% | +0.58%  (+/-)1.80%
ebizzy -t 8  (+/-)1.45%       | -0.05%  (+/-)2.18% | +2.53%  (+/-)3.40%
ebizzy -t 10 (+/-)3.78%       | -2.71%  (+/-)2.79% | -3.16%  (+/-)3.06%
ebizzy -t 12 (+/-)3.21%       | +1.13%  (+/-)2.02% | -1.13%  (+/-)4.43%
ebizzy -t 14 (+/-)2.05%       | +0.15%  (+/-)3.47% | -2.08%  (+/-)1.40%
Quad Cortex-A15   tip         | patch              | patch+irq
                  stdev       | diff      stdev    | diff      stdev
hackbench    (+/-)0.55%       | -0.58%  (+/-)0.90% | +0.62%  (+/-)0.43%
scp          (+/-)0.21%       | -0.10%  (+/-)0.10% | +5.70%  (+/-)0.53%
ebizzy -t 1  (+/-)0.42%       | -0.58%  (+/-)0.48% | -0.29%  (+/-)0.18%
ebizzy -t 2  (+/-)0.52%       | -0.83%  (+/-)0.20% | -2.07%  (+/-)0.35%
ebizzy -t 4  (+/-)0.22%       | -1.39%  (+/-)0.49% | -1.78%  (+/-)0.67%
ebizzy -t 6  (+/-)0.44%       | -0.78%  (+/-)0.15% | -1.79%  (+/-)1.10%
ebizzy -t 8  (+/-)0.43%       | +0.13%  (+/-)0.92% | -0.17%  (+/-)0.67%
ebizzy -t 10 (+/-)0.71%       | +0.10%  (+/-)0.93% | -0.36%  (+/-)0.77%
ebizzy -t 12 (+/-)0.65%       | -1.07%  (+/-)1.13% | -1.13%  (+/-)0.70%
ebizzy -t 14 (+/-)0.92%       | -0.28%  (+/-)1.25% | +2.84%  (+/-)9.33%
I haven't been able to fully test the patchset on an SMT system to check
that the regression reported by Preethi has been solved, but the various
tests that I have done don't show any regression so far. The correction
of the SD_PREFER_SIBLING mode and its use at the SMT level should have
fixed the regression.
Changes since V2:
- rebase on top of the capacity renaming
- fix wake_affine statistic update
- rework nohz_kick_needed
- optimize the active migration of a task from a CPU with reduced
  capacity
- rename group_activity to group_utilization and remove unused
  total_utilization
- repair SD_PREFER_SIBLING and use it for the SMT level
- reorder the patchset to gather patches with the same topics
Changes since V1:
- add 3 fixes
- correct some commit messages
- replace capacity computation by activity
- take into account current cpu capacity
[1] https://lkml.org/lkml/2013/10/18/121
[2] https://lkml.org/lkml/2014/3/19/377
Vincent Guittot (12):
sched: fix imbalance flag reset
sched: remove a wake_affine condition
sched: fix avg_load computation
sched: Allow all archs to set the power_orig
ARM: topology: use new cpu_power interface
sched: add per rq cpu_power_orig
sched: test the cpu's capacity in wake affine
sched: move cfs task on a CPU with higher capacity
Revert "sched: Put rq's sched_avg under CONFIG_FAIR_GROUP_SCHED"
sched: get CPU's utilization statistic
sched: replace capacity_factor by utilization
sched: add SD_PREFER_SIBLING for SMT level
arch/arm/kernel/topology.c | 4 +-
kernel/sched/core.c | 3 +-
kernel/sched/fair.c | 290 +++++++++++++++++++++++----------------------
kernel/sched/sched.h | 5 +-
4 files changed, 158 insertions(+), 144 deletions(-)
--
1.9.1