This patchset is a first attempt at consolidating idle code for the ARM processor architecture, and a request for comments on the proposed methodology. It relies on existing kernel features such as suspend/resume, PM notifiers, and common code for cpu_reset().
It integrates the latest patches posted to the linux-arm-kernel mailing list for CPU PM notifiers and a cpu_reset() function hook. Those patches are included in the patchset for completeness.
The patchset depends on the following patches:
https://patchwork.kernel.org/patch/882892/
https://patchwork.kernel.org/patch/873692/
https://patchwork.kernel.org/patch/873682/
https://patchwork.kernel.org/patch/873672/
The idle framework defines a common entry point, implemented in sr_entry.S:
cpu_enter_idle(cstate, rstate, flags)
where:
C-state [CPU state]:
0 - RUN MODE
1 - STANDBY
2 - DORMANT (not supported by this patch)
3 - SHUTDOWN
R-state [CLUSTER state]:
0 - RUN
1 - STANDBY (not supported by this patch)
2 - L2 RAM retention
3 - SHUTDOWN
flags:
SR_SAVE_L2:  L2 registers saved and restored on shutdown
SR_SAVE_SCU: SCU reset on cluster wake-up
The assembly entry point checks the target state and either executes wfi, entering a shallow C-state, or calls into the SR framework to put the CPU and cluster into a low-power state. If the target is a deep low-power state, it saves the current stack pointer and registers on the stack for the resume path.
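For illustration, a cpuidle driver sitting on top of the framework would call the entry point roughly like this (a sketch only: the state encodings are the ones listed above, while the surrounding function and its wiring into cpuidle are hypothetical):

/* Sketch, not part of the patchset: request CPU SHUTDOWN (C-state 3)
 * with cluster L2 RAM retention (R-state 2), asking the framework to
 * save/restore L2 registers and reset the SCU on wake-up. */
static void platform_enter_deep_idle(void)
{
	cpu_enter_idle(3, 2, SR_SAVE_L2 | SR_SAVE_SCU);
}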
On deep low-power state entry, since the CPU is heading for the off state, the code switches to page tables (cloned from init_mm at boot) that provide a 1:1 mapping of kernel code, data, and the uncached reserved memory pages obtained from platform code through a hook:
platform_context_pointer(size)
Every platform using the framework should implement this hook to return reserved memory pages, which are mapped uncached and 1:1 to cater for the MMU-off resume path. This choice avoids fiddling with L2 when the CPU enters low-power: context must be flushed to L3 so that a CPU can fetch it from memory while the MMU is off.
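A platform implementation could look like the following sketch (only the hook name comes from the patchset; the return type and the bootmem allocation are assumptions, shown as one possible choice):

#include <linux/bootmem.h>

/* Sketch: return page-aligned reserved pages that the framework will
 * map uncached and 1:1 for the MMU-off resume path. */
void *platform_context_pointer(unsigned int size)
{
	return alloc_bootmem_pages(ALIGN(size, PAGE_SIZE));
}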
On the resume path the CPU loads a non-cacheable stack pointer to cater for the MMU-enabling path, and after switching page tables it returns to the OS.
The non-cacheable stack simplifies L2 management: for single-CPU shutdown the L2 is still enabled, so stack data written before the MMU is turned on might otherwise still be present and valid in L2, leading to corruption. After the MMU is enabled, a few bytes of the stack frame are copied back to the Linux stack and execution resumes.
Generic subsystem save/restore is triggered by CPU PM notifiers, which save and restore GIC, VFP, and PMU state automatically.
The patchset introduces a new notifier chain that notifies listeners of a required platform shutdown/wake-up. Platform code should register with the chain and, when called, execute all actions required to put the system into low-power mode. The chain is called within a virtual address space cloned from init_mm at arch_initcall time.
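Registration follows the usual notifier pattern; a platform would hook the chain along these lines (a sketch with made-up names, since only the chain's existence is described here - the real interface is in the sr_platform_api.h patch):

/* Sketch, hypothetical names: run SoC-specific power sequencing when
 * the idle framework signals a platform shutdown or wake-up. */
static int mysoc_idle_notify(struct notifier_block *nb,
			     unsigned long event, void *data)
{
	/* program the power controller for shutdown or wake-up here */
	return NOTIFY_OK;
}

static struct notifier_block mysoc_idle_nb = {
	.notifier_call = mysoc_idle_notify,
};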
On cluster shutdown, L2 cache memory must either be cleaned (complete shutdown) or just the L2 logic saved (L2 RAM retained). This is a major issue, since on power-down the stack points to cacheable memory that must be cleaned from L2 before the L2 controller is disabled. The current code performing that action is a hack and provides ground for discussion. The stack might be switched to non-cacheable memory on power-down, but doing so breaks code relying on thread_info unless that struct is copied across the stack switch.
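The gist of the contentious step, expressed with the existing outer-cache API, is the following sketch (stack_base and stack_size are illustrative; the hard part is getting this range right):

/* Clean the live, cacheable stack out to memory before the L2
 * controller is disabled on cluster shutdown; if this range is
 * wrong, the resume path can see stale data. */
outer_clean_range(__pa(stack_base), __pa(stack_base) + stack_size);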
Atomicity is provided by a strongly ordered locking algorithm (Lamport's bakery), since normal spinlocks based on ldrex/strex are not functional when CPUs are out of coherency and D-cache look-ups are disabled. Atomicity of L2 clean/invalidate and SCU reset is fundamental to guarantee system stability. Lamport's bakery code is provided for completeness and can be ignored; please refer to the patch commit note.
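For reference, the core of the algorithm is sketched below in plain C (generic bakery, not the patch's implementation: the real code in arch/arm/kernel/lb_lock.c adds the barriers and cache maintenance this correctness argument relies on, and the lock storage is mapped strongly ordered):

struct bakery_lock {
	volatile int choosing[CONFIG_NR_CPUS];
	volatile int number[CONFIG_NR_CPUS];
};

static void bakery_lock(struct bakery_lock *l, int cpu)
{
	int i, max = 0;

	/* take a ticket one greater than any ticket in use */
	l->choosing[cpu] = 1;
	for (i = 0; i < CONFIG_NR_CPUS; i++)
		if (l->number[i] > max)
			max = l->number[i];
	l->number[cpu] = max + 1;
	l->choosing[cpu] = 0;

	for (i = 0; i < CONFIG_NR_CPUS; i++) {
		/* wait for CPU i to finish taking its ticket ... */
		while (l->choosing[i])
			;
		/* ... then defer to it if its (ticket, cpu) pair is lower */
		while (l->number[i] &&
		       (l->number[i] < l->number[cpu] ||
			(l->number[i] == l->number[cpu] && i < cpu)))
			;
	}
}

static void bakery_unlock(struct bakery_lock *l, int cpu)
{
	l->number[cpu] = 0;
}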
Entry into low-power mode is performed through a function pointer (*sr_sleep), allowing platforms to override the default behaviour (and possibly execute from different memory spaces).
Tested on a dual-core A9 cluster through all system low-power states supported by the patchset. A8 and A5 support is compile-tested only.
Colin Cross (3):
  ARM: Add cpu power management notifiers
  ARM: gic: Use cpu pm notifiers to save gic state
  ARM: vfp: Use cpu pm notifiers to save vfp state
Lorenzo Pieralisi (13):
  ARM: kernel: save/restore kernel IF
  ARM: kernel: save/restore generic infrastructure
  ARM: kernel: save/restore v7 assembly helpers
  ARM: kernel: save/restore arch runtime support
  ARM: kernel: v7 resets support
  ARM: kernel: save/restore v7 infrastructure support
  ARM: kernel: add support for Lamport's bakery locks
  ARM: kernel: add SCU reset hook
  ARM: mm: L2x0 save/restore support
  ARM: kernel: save/restore 1:1 page tables
  ARM: perf: use cpu pm notifiers to save pmu state
  ARM: PM: enhance idle pm notifiers
  ARM: kernel: save/restore build infrastructure
Will Deacon (1):
  ARM: proc: add definition of cpu_reset for ARMv6 and ARMv7 cores
 arch/arm/Kconfig                       |   18 ++
 arch/arm/common/gic.c                  |  212 +++++++++++++++++++++++
 arch/arm/include/asm/cpu_pm.h          |   69 ++++++++
 arch/arm/include/asm/lb_lock.h         |   34 ++++
 arch/arm/include/asm/outercache.h      |   22 +++
 arch/arm/include/asm/smp_scu.h         |    3 +-
 arch/arm/include/asm/sr_platform_api.h |   28 +++
 arch/arm/kernel/Makefile               |    5 +
 arch/arm/kernel/cpu_pm.c               |  265 ++++++++++++++++++++++++++++
 arch/arm/kernel/lb_lock.c              |   85 +++++++++
 arch/arm/kernel/perf_event.c           |   22 +++
 arch/arm/kernel/reset_v7.S             |  109 ++++++++++++
 arch/arm/kernel/smp_scu.c              |   33 ++++-
 arch/arm/kernel/sr.h                   |  162 +++++++++++++++++
 arch/arm/kernel/sr_api.c               |  197 +++++++++++++++++++++
 arch/arm/kernel/sr_arch.c              |   74 ++++++++
 arch/arm/kernel/sr_context.c           |   23 +++
 arch/arm/kernel/sr_entry.S             |  213 +++++++++++++++++++++++
 arch/arm/kernel/sr_helpers.h           |   56 ++++++
 arch/arm/kernel/sr_mapping.c           |   78 +++++++++
 arch/arm/kernel/sr_platform.c          |   48 +++++
 arch/arm/kernel/sr_power.c             |   26 +++
 arch/arm/kernel/sr_v7.c                |  298 ++++++++++++++++++++++++++++++++
 arch/arm/kernel/sr_v7_helpers.S        |   47 +++++
 arch/arm/mm/cache-l2x0.c               |   63 +++++++
 arch/arm/mm/proc-v6.S                  |    5 +
 arch/arm/mm/proc-v7.S                  |    7 +
 arch/arm/vfp/vfpmodule.c               |   40 +++++
 28 files changed, 2238 insertions(+), 4 deletions(-)
 create mode 100644 arch/arm/include/asm/cpu_pm.h
 create mode 100644 arch/arm/include/asm/lb_lock.h
 create mode 100644 arch/arm/include/asm/sr_platform_api.h
 create mode 100644 arch/arm/kernel/cpu_pm.c
 create mode 100644 arch/arm/kernel/lb_lock.c
 create mode 100644 arch/arm/kernel/reset_v7.S
 create mode 100644 arch/arm/kernel/sr.h
 create mode 100644 arch/arm/kernel/sr_api.c
 create mode 100644 arch/arm/kernel/sr_arch.c
 create mode 100644 arch/arm/kernel/sr_context.c
 create mode 100644 arch/arm/kernel/sr_entry.S
 create mode 100644 arch/arm/kernel/sr_helpers.h
 create mode 100644 arch/arm/kernel/sr_mapping.c
 create mode 100644 arch/arm/kernel/sr_platform.c
 create mode 100644 arch/arm/kernel/sr_power.c
 create mode 100644 arch/arm/kernel/sr_v7.c
 create mode 100644 arch/arm/kernel/sr_v7_helpers.S
From: Will Deacon will.deacon@arm.com
This patch adds simple definitions of cpu_reset for ARMv6 and ARMv7 cores, which disable the MMU via the SCTLR.
Signed-off-by: Will Deacon will.deacon@arm.com
---
 arch/arm/mm/proc-v6.S |    5 +++++
 arch/arm/mm/proc-v7.S |    7 +++++++
 2 files changed, 12 insertions(+), 0 deletions(-)
diff --git a/arch/arm/mm/proc-v6.S b/arch/arm/mm/proc-v6.S
index 1d2b845..f3b5232 100644
--- a/arch/arm/mm/proc-v6.S
+++ b/arch/arm/mm/proc-v6.S
@@ -56,6 +56,11 @@ ENTRY(cpu_v6_proc_fin)
  */
 	.align	5
 ENTRY(cpu_v6_reset)
+	mrc	p15, 0, r1, c1, c0, 0		@ ctrl register
+	bic	r1, r1, #0x1			@ ...............m
+	mcr	p15, 0, r1, c1, c0, 0		@ disable MMU
+	mov	r1, #0
+	mcr	p15, 0, r1, c7, c5, 4		@ ISB
 	mov	pc, r0
 /*
diff --git a/arch/arm/mm/proc-v7.S b/arch/arm/mm/proc-v7.S
index 089c0b5..15d6191 100644
--- a/arch/arm/mm/proc-v7.S
+++ b/arch/arm/mm/proc-v7.S
@@ -58,9 +58,16 @@ ENDPROC(cpu_v7_proc_fin)
  *	to what would be the reset vector.
  *
  *	- loc   - location to jump to for soft reset
+ *
+ *	This code must be executed using a flat identity mapping with
+ *	caches disabled.
  */
 	.align	5
 ENTRY(cpu_v7_reset)
+	mrc	p15, 0, r1, c1, c0, 0		@ ctrl register
+	bic	r1, r1, #0x1			@ ...............m
+	mcr	p15, 0, r1, c1, c0, 0		@ disable MMU
+	isb
 	mov	pc, r0
 ENDPROC(cpu_v7_reset)
Minor nit,
On 7/7/2011 8:50 AM, Lorenzo Pieralisi wrote:
From: Will Deacon will.deacon@arm.com
This patch adds simple definitions of cpu_reset for ARMv6 and ARMv7 cores, which disable the MMU via the SCTLR.
Signed-off-by: Will Deacon will.deacon@arm.com
 arch/arm/mm/proc-v6.S |    5 +++++
 arch/arm/mm/proc-v7.S |    7 +++++++
 2 files changed, 12 insertions(+), 0 deletions(-)
diff --git a/arch/arm/mm/proc-v6.S b/arch/arm/mm/proc-v6.S
index 1d2b845..f3b5232 100644
--- a/arch/arm/mm/proc-v6.S
+++ b/arch/arm/mm/proc-v6.S
@@ -56,6 +56,11 @@ ENTRY(cpu_v6_proc_fin)
  */
 	.align	5
 ENTRY(cpu_v6_reset)
mrc p15, 0, r1, c1, c0, 0 @ ctrl register
bic r1, r1, #0x1 @ ...............m
mcr p15, 0, r1, c1, c0, 0 @ disable MMU
mov r1, #0
mcr p15, 0, r1, c7, c5, 4 @ ISB
mov pc, r0
/*
diff --git a/arch/arm/mm/proc-v7.S b/arch/arm/mm/proc-v7.S
index 089c0b5..15d6191 100644
--- a/arch/arm/mm/proc-v7.S
+++ b/arch/arm/mm/proc-v7.S
@@ -58,9 +58,16 @@ ENDPROC(cpu_v7_proc_fin)
 *	to what would be the reset vector.
 *
 *	- loc - location to jump to for soft reset
 *
 *	This code must be executed using a flat identity mapping with
 *	caches disabled.
Align the text body.
Regards Santosh
On Fri, Jul 08, 2011 at 02:12:13AM +0100, Santosh Shilimkar wrote:
Minor nit,
On 7/7/2011 8:50 AM, Lorenzo Pieralisi wrote:
From: Will Deacon will.deacon@arm.com
This patch adds simple definitions of cpu_reset for ARMv6 and ARMv7 cores, which disable the MMU via the SCTLR.
Signed-off-by: Will Deacon will.deacon@arm.com
 arch/arm/mm/proc-v6.S |    5 +++++
 arch/arm/mm/proc-v7.S |    7 +++++++
 2 files changed, 12 insertions(+), 0 deletions(-)
diff --git a/arch/arm/mm/proc-v6.S b/arch/arm/mm/proc-v6.S
index 1d2b845..f3b5232 100644
--- a/arch/arm/mm/proc-v6.S
+++ b/arch/arm/mm/proc-v6.S
@@ -56,6 +56,11 @@ ENTRY(cpu_v6_proc_fin)
  */
 	.align	5
 ENTRY(cpu_v6_reset)
mrc p15, 0, r1, c1, c0, 0 @ ctrl register
bic r1, r1, #0x1 @ ...............m
mcr p15, 0, r1, c1, c0, 0 @ disable MMU
mov r1, #0
mcr p15, 0, r1, c7, c5, 4 @ ISB
mov pc, r0
/*
diff --git a/arch/arm/mm/proc-v7.S b/arch/arm/mm/proc-v7.S
index 089c0b5..15d6191 100644
--- a/arch/arm/mm/proc-v7.S
+++ b/arch/arm/mm/proc-v7.S
@@ -58,9 +58,16 @@ ENDPROC(cpu_v7_proc_fin)
 *	to what would be the reset vector.
 *
 *	- loc - location to jump to for soft reset
 *
 *	This code must be executed using a flat identity mapping with
 *	caches disabled.
Align the text body.
Bah, I somehow ended up with spaces instead of a tab. I've already sent a pull request with this patch in, so I'll leave it like it is but make a note to fix it next time I'm in there.
Will
On Thu, Jul 07, 2011 at 04:50:14PM +0100, Lorenzo Pieralisi wrote:
From: Will Deacon will.deacon@arm.com
This patch adds simple definitions of cpu_reset for ARMv6 and ARMv7 cores, which disable the MMU via the SCTLR.
This really needs fixing properly, so that we have this well defined across all supported ARM cores. Requiring ARMv6 and ARMv7 to have this code called with a flat mapping (which may overlap a section boundary) vs ARMv5 and lower code which doesn't is just silly.
With any API, we need consistency. So if ARMv6 and v7 require a flat mapping, we need to ensure that ARMv5 and lower is happy to deal with that code being also called with a flat mapping.
Hi Russell,
On Sat, Jul 09, 2011 at 11:14:45AM +0100, Russell King - ARM Linux wrote:
On Thu, Jul 07, 2011 at 04:50:14PM +0100, Lorenzo Pieralisi wrote:
From: Will Deacon will.deacon@arm.com
This patch adds simple definitions of cpu_reset for ARMv6 and ARMv7 cores, which disable the MMU via the SCTLR.
This really needs fixing properly, so that we have this well defined across all supported ARM cores. Requiring ARMv6 and ARMv7 to have this code called with a flat mapping (which may overlap a section boundary) vs ARMv5 and lower code which doesn't is just silly.
With any API, we need consistency. So if ARMv6 and v7 require a flat mapping, we need to ensure that ARMv5 and lower is happy to deal with that code being also called with a flat mapping.
I've had a look at a bunch of the cpu_*_reset definitions and I can't see any reason why they wouldn't be callable with the flat mapping in place. In fact, there's a scary comment for xscale:
@ CAUTION: MMU turned off from this point. We count on the pipeline
@ already containing those two last instructions to survive.
which I think would disappear if the code was called via the ID map.
At the moment, the only caller [1] of these functions is arch_reset which is called from arm_machine_restart after putting a flat mapping in place. The extra work is actually to call the reset code via that mapping. I've been working on this in my kexec series, which I'll continue with during 3.1.
Will
[1] plat-s3c24xx/cpu.c is an exception, but cpu_reset is only entered if the watchdog fails during hard reboot. The logic could be easily fixed up here so that arm_machine_restart is called instead.
On Sun, Jul 10, 2011 at 12:00:24PM +0100, Will Deacon wrote:
I've had a look at a bunch of the cpu_*_reset definitions and I can't see any reason why they wouldn't be callable with the flat mapping in place. In fact, there's a scary comment for xscale:
However, that flat mapping doesn't save us, because this only covers space below PAGE_OFFSET virtual. We're executing these generally from virtual space at addresses greater than this, which means when the MMU is turned off, the instruction stream disappears.
@ CAUTION: MMU turned off from this point. We count on the pipeline @ already containing those two last instructions to survive.
which I think would disappear if the code was called via the ID map.
Yes, and everything pre-ARMv6 fundamentally relies upon the instructions following the MCR to turn the MMU off already being in the CPUs pipeline. Pre-ARMv6 relies upon this kind of behaviour from the CPU:
	fetch		decode		execute
	mcr
	mov pc		mcr
	nop		mov pc		mcr
	nop		nop		mov pc
	<inst0>		<--- flushed --->
where inst0 is the instruction at the target of the "mov pc" branch. It may not be well documented in the architecture manual, but there has been documentation at this level for CPUs in the past. Maybe back in ARMv3 times, but the trick has proven to work all the way up to ARMv5, and these are of course CPU-specific files rather than architecture-specific ones.
It's curious that, with ARM's move to a more relaxed model, turning the MMU off has visibly changed from a fairly weak operation to an effectively strong instruction barrier. These now visibly behave as follows:
	fetch		decode		execute
	mcr
	mov pc		mcr
	<inst0>		mov pc		mcr
	<------ flushed ------>	mov pc*

* - attempt to reload this instruction fails because MMU is now off.
I'm not saying that the pipeline is flushed (it may be) - but this represents the _observed_ behaviour.
On Sun, Jul 10, 2011 at 12:52:06PM +0100, Russell King - ARM Linux wrote:
On Sun, Jul 10, 2011 at 12:00:24PM +0100, Will Deacon wrote:
I've had a look at a bunch of the cpu_*_reset definitions and I can't see any reason why they wouldn't be callable with the flat mapping in place. In fact, there's a scary comment for xscale:
However, that flat mapping doesn't save us, because this only covers space below PAGE_OFFSET virtual. We're executing these generally from virtual space at addresses greater than this, which means when the MMU is turned off, the instruction stream disappears.
Yes, although I have fixed this in my kexec branch. It's still very much a WIP, so feel free to comment when I post the next revision of that patch series.
@ CAUTION: MMU turned off from this point. We count on the pipeline
@ already containing those two last instructions to survive.
which I think would disappear if the code was called via the ID map.
Yes, and everything pre-ARMv6 fundamentally relies upon the instructions following the MCR to turn the MMU off already being in the CPUs pipeline. Pre-ARMv6 relies upon this kind of behaviour from the CPU:
	fetch		decode		execute
	mcr
	mov pc		mcr
	nop		mov pc		mcr
	nop		nop		mov pc
	<inst0>		<--- flushed --->
where inst0 is the instruction at the target of the "mov pc" branch. It may not be well documented in the architecture manual, but there has been documentation at this level for CPUs in the past. Maybe back in ARMv3 times, but the trick has proven to work all the way up to ARMv5, and these are of course CPU-specific files rather than architecture-specific ones.
With more recent (>= ARMv6) CPUs, this really is implementation-specific behaviour and, as such, is not documented by the architecture. Instead you get:
`In addition, if the physical address of the code that enables or disables the MMU differs from its MVA, instruction prefetching can cause complications. Therefore, ARM strongly recommends that any code that enables or disables the MMU has identical virtual and physical addresses.'
This has the advantage of working across all CPU implementations, which is what we need for having architectural proc-vN.S files.
It's curious that, with ARM's move to a more relaxed model, turning the MMU off has visibly changed from a fairly weak operation to an effectively strong instruction barrier. These now visibly behave as follows:
	fetch		decode		execute
	mcr
	mov pc		mcr
	<inst0>		mov pc		mcr
	<------ flushed ------>	mov pc*

* - attempt to reload this instruction fails because MMU is now off.
I'm not saying that the pipeline is flushed (it may be) - but this represents the _observed_ behaviour.
Sure. The new definitions for v6/v7 reset have an instruction barrier following the SCTLR write anyway since the architecture doesn't make any guarantees about immediacy of MMU disabling. The advantage is that it will work for all v6/v7 implementations. The disadvantage is that we're effectively coding for the worst-case scenario.
Will
From: Colin Cross ccross@android.com
During some CPU power modes entered during idle, hotplug and suspend, peripherals located in the CPU power domain, such as the GIC and VFP, may be powered down. Add a notifier chain that allows drivers for those peripherals to be notified before and after they may be reset.
Signed-off-by: Colin Cross ccross@android.com
Tested-by: Kevin Hilman khilman@ti.com
---
 arch/arm/Kconfig              |    7 ++
 arch/arm/include/asm/cpu_pm.h |   54 ++++++++++++
 arch/arm/kernel/Makefile      |    1 +
 arch/arm/kernel/cpu_pm.c      |  181 +++++++++++++++++++++++++++++++++++++++++
 4 files changed, 243 insertions(+), 0 deletions(-)
 create mode 100644 arch/arm/include/asm/cpu_pm.h
 create mode 100644 arch/arm/kernel/cpu_pm.c
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 9adc278..356f266 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -183,6 +183,13 @@ config FIQ
 config ARCH_MTD_XIP
 	bool
+config ARCH_USES_CPU_PM
+	bool
+
+config CPU_PM
+	def_bool y
+	depends on ARCH_USES_CPU_PM && (PM || CPU_IDLE)
+
 config VECTORS_BASE
 	hex
 	default 0xffff0000 if MMU || CPU_HIGH_VECTOR
diff --git a/arch/arm/include/asm/cpu_pm.h b/arch/arm/include/asm/cpu_pm.h
new file mode 100644
index 0000000..b4bb715
--- /dev/null
+++ b/arch/arm/include/asm/cpu_pm.h
@@ -0,0 +1,54 @@
+/*
+ * Copyright (C) 2011 Google, Inc.
+ *
+ * Author:
+ *	Colin Cross ccross@android.com
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#ifndef _ASMARM_CPU_PM_H
+#define _ASMARM_CPU_PM_H
+
+#include <linux/kernel.h>
+#include <linux/notifier.h>
+
+/* Event codes passed as unsigned long val to notifier calls */
+enum cpu_pm_event {
+	/* A single cpu is entering a low power state */
+	CPU_PM_ENTER,
+
+	/* A single cpu failed to enter a low power state */
+	CPU_PM_ENTER_FAILED,
+
+	/* A single cpu is exiting a low power state */
+	CPU_PM_EXIT,
+
+	/* A cpu power domain is entering a low power state */
+	CPU_COMPLEX_PM_ENTER,
+
+	/* A cpu power domain failed to enter a low power state */
+	CPU_COMPLEX_PM_ENTER_FAILED,
+
+	/* A cpu power domain is exiting a low power state */
+	CPU_COMPLEX_PM_EXIT,
+};
+
+int cpu_pm_register_notifier(struct notifier_block *nb);
+int cpu_pm_unregister_notifier(struct notifier_block *nb);
+
+int cpu_pm_enter(void);
+int cpu_pm_exit(void);
+
+int cpu_complex_pm_enter(void);
+int cpu_complex_pm_exit(void);
+
+#endif
diff --git a/arch/arm/kernel/Makefile b/arch/arm/kernel/Makefile
index a5b31af..8b42d58 100644
--- a/arch/arm/kernel/Makefile
+++ b/arch/arm/kernel/Makefile
@@ -60,6 +60,7 @@ obj-$(CONFIG_CPU_PJ4)		+= pj4-cp0.o
 obj-$(CONFIG_IWMMXT)		+= iwmmxt.o
 obj-$(CONFIG_CPU_HAS_PMU)	+= pmu.o
 obj-$(CONFIG_HW_PERF_EVENTS)	+= perf_event.o
+obj-$(CONFIG_CPU_PM)		+= cpu_pm.o
 AFLAGS_iwmmxt.o			:= -Wa,-mcpu=iwmmxt
 
 ifneq ($(CONFIG_ARCH_EBSA110),y)
diff --git a/arch/arm/kernel/cpu_pm.c b/arch/arm/kernel/cpu_pm.c
new file mode 100644
index 0000000..48a5b53
--- /dev/null
+++ b/arch/arm/kernel/cpu_pm.c
@@ -0,0 +1,181 @@
+/*
+ * Copyright (C) 2011 Google, Inc.
+ *
+ * Author:
+ *	Colin Cross ccross@android.com
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/notifier.h>
+#include <linux/spinlock.h>
+
+#include <asm/cpu_pm.h>
+
+/*
+ * When a CPU goes to a low power state that turns off power to the CPU's
+ * power domain, the contents of some blocks (floating point coprocessors,
+ * interrupt controllers, caches, timers) in the same power domain can
+ * be lost.  The cpu_pm notifiers provide a method for platform idle, suspend,
+ * and hotplug implementations to notify the drivers for these blocks that
+ * they may be reset.
+ *
+ * All cpu_pm notifications must be called with interrupts disabled.
+ *
+ * The notifications are split into two classes, CPU notifications and CPU
+ * complex notifications.
+ *
+ * CPU notifications apply to a single CPU, and must be called on the affected
+ * CPU.  They are used to save per-cpu context for affected blocks.
+ *
+ * CPU complex notifications apply to all CPUs in a single power domain.  They
+ * are used to save any global context for affected blocks, and must be called
+ * after all the CPUs in the power domain have been notified of the low power
+ * state.
+ *
+ */
+
+static DEFINE_RWLOCK(cpu_pm_notifier_lock);
+static RAW_NOTIFIER_HEAD(cpu_pm_notifier_chain);
+
+int cpu_pm_register_notifier(struct notifier_block *nb)
+{
+	unsigned long flags;
+	int ret;
+
+	write_lock_irqsave(&cpu_pm_notifier_lock, flags);
+	ret = raw_notifier_chain_register(&cpu_pm_notifier_chain, nb);
+	write_unlock_irqrestore(&cpu_pm_notifier_lock, flags);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(cpu_pm_register_notifier);
+
+int cpu_pm_unregister_notifier(struct notifier_block *nb)
+{
+	unsigned long flags;
+	int ret;
+
+	write_lock_irqsave(&cpu_pm_notifier_lock, flags);
+	ret = raw_notifier_chain_unregister(&cpu_pm_notifier_chain, nb);
+	write_unlock_irqrestore(&cpu_pm_notifier_lock, flags);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(cpu_pm_unregister_notifier);
+
+static int cpu_pm_notify(enum cpu_pm_event event, int nr_to_call, int *nr_calls)
+{
+	int ret;
+
+	ret = __raw_notifier_call_chain(&cpu_pm_notifier_chain, event, NULL,
+		nr_to_call, nr_calls);
+
+	return notifier_to_errno(ret);
+}
+
+/**
+ * cpu_pm_enter
+ *
+ * Notifies listeners that a single cpu is entering a low power state that may
+ * cause some blocks in the same power domain as the cpu to reset.
+ *
+ * Must be called on the affected cpu with interrupts disabled.  Platform is
+ * responsible for ensuring that cpu_pm_enter is not called twice on the same
+ * cpu before cpu_pm_exit is called.
+ */
+int cpu_pm_enter(void)
+{
+	int nr_calls;
+	int ret = 0;
+
+	read_lock(&cpu_pm_notifier_lock);
+	ret = cpu_pm_notify(CPU_PM_ENTER, -1, &nr_calls);
+	if (ret)
+		cpu_pm_notify(CPU_PM_ENTER_FAILED, nr_calls - 1, NULL);
+	read_unlock(&cpu_pm_notifier_lock);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(cpu_pm_enter);
+
+/**
+ * cpu_pm_exit
+ *
+ * Notifies listeners that a single cpu is exiting a low power state that may
+ * have caused some blocks in the same power domain as the cpu to reset.
+ *
+ * Must be called on the affected cpu with interrupts disabled.
+ */
+int cpu_pm_exit(void)
+{
+	int ret;
+
+	read_lock(&cpu_pm_notifier_lock);
+	ret = cpu_pm_notify(CPU_PM_EXIT, -1, NULL);
+	read_unlock(&cpu_pm_notifier_lock);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(cpu_pm_exit);
+
+/**
+ * cpu_complex_pm_enter
+ *
+ * Notifies listeners that all cpus in a power domain are entering a low power
+ * state that may cause some blocks in the same power domain to reset.
+ *
+ * Must be called after cpu_pm_enter has been called on all cpus in the power
+ * domain, and before cpu_pm_exit has been called on any cpu in the power
+ * domain.
+ *
+ * Must be called with interrupts disabled.
+ */
+int cpu_complex_pm_enter(void)
+{
+	int nr_calls;
+	int ret = 0;
+
+	read_lock(&cpu_pm_notifier_lock);
+	ret = cpu_pm_notify(CPU_COMPLEX_PM_ENTER, -1, &nr_calls);
+	if (ret)
+		cpu_pm_notify(CPU_COMPLEX_PM_ENTER_FAILED, nr_calls - 1, NULL);
+	read_unlock(&cpu_pm_notifier_lock);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(cpu_complex_pm_enter);
+
+/**
+ * cpu_complex_pm_exit
+ *
+ * Notifies listeners that all cpus in a power domain are exiting a low power
+ * state that may have caused some blocks in the same power domain to reset.
+ *
+ * Must be called after cpu_pm_enter has been called on all cpus in the power
+ * domain, and before cpu_pm_exit has been called on any cpu in the power
+ * domain.
+ *
+ * Must be called with interrupts disabled.
+ */
+int cpu_complex_pm_exit(void)
+{
+	int ret;
+
+	read_lock(&cpu_pm_notifier_lock);
+	ret = cpu_pm_notify(CPU_COMPLEX_PM_EXIT, -1, NULL);
+	read_unlock(&cpu_pm_notifier_lock);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(cpu_complex_pm_exit);
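For context, a platform low-power entry path would drive these hooks roughly as follows (a sketch: the cpu_pm_* calls are the API added by this patch, while the platform_power_down() function and the last-CPU bookkeeping are placeholders):

/* Per-CPU state (GIC CPU interface, VFP) is saved on every CPU;
 * domain-wide state (GIC distributor) only by the last CPU down. */
cpu_pm_enter();
if (last_cpu_in_domain)
	cpu_complex_pm_enter();

platform_power_down();		/* wfi / power controller entry */

if (last_cpu_in_domain)
	cpu_complex_pm_exit();
cpu_pm_exit();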
On 7/7/2011 8:50 AM, Lorenzo Pieralisi wrote:
From: Colin Cross ccross@android.com
During some CPU power modes entered during idle, hotplug and suspend, peripherals located in the CPU power domain, such as the GIC and VFP, may be powered down. Add a notifier chain that allows drivers for those peripherals to be notified before and after they may be reset.
Signed-off-by: Colin Cross ccross@android.com
Tested-by: Kevin Hilman khilman@ti.com
 arch/arm/Kconfig              |    7 ++
 arch/arm/include/asm/cpu_pm.h |   54 ++++++++++++
 arch/arm/kernel/Makefile      |    1 +
 arch/arm/kernel/cpu_pm.c      |  181 +++++++++++++++++++++++++++++++++++++++++
 4 files changed, 243 insertions(+), 0 deletions(-)
 create mode 100644 arch/arm/include/asm/cpu_pm.h
 create mode 100644 arch/arm/kernel/cpu_pm.c

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 9adc278..356f266 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -183,6 +183,13 @@ config FIQ
 config ARCH_MTD_XIP
 	bool
 
+config ARCH_USES_CPU_PM
+	bool
+
+config CPU_PM
+	def_bool y
+	depends on ARCH_USES_CPU_PM && (PM || CPU_IDLE)
+
 config VECTORS_BASE
 	hex
 	default 0xffff0000 if MMU || CPU_HIGH_VECTOR
diff --git a/arch/arm/include/asm/cpu_pm.h b/arch/arm/include/asm/cpu_pm.h
new file mode 100644
index 0000000..b4bb715
--- /dev/null
+++ b/arch/arm/include/asm/cpu_pm.h
@@ -0,0 +1,54 @@
+/*
+ * Copyright (C) 2011 Google, Inc.
+ *
+ * Author:
+ *	Colin Cross ccross@android.com
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#ifndef _ASMARM_CPU_PM_H
+#define _ASMARM_CPU_PM_H
+
+#include <linux/kernel.h>
+#include <linux/notifier.h>
+
+/* Event codes passed as unsigned long val to notifier calls */
+enum cpu_pm_event {
+	/* A single cpu is entering a low power state */
+	CPU_PM_ENTER,
+
+	/* A single cpu failed to enter a low power state */
+	CPU_PM_ENTER_FAILED,
+
+	/* A single cpu is exiting a low power state */
+	CPU_PM_EXIT,
+
+	/* A cpu power domain is entering a low power state */
+	CPU_COMPLEX_PM_ENTER,
+
+	/* A cpu power domain failed to enter a low power state */
+	CPU_COMPLEX_PM_ENTER_FAILED,
+
+	/* A cpu power domain is exiting a low power state */
+	CPU_COMPLEX_PM_EXIT,
+};
+
+int cpu_pm_register_notifier(struct notifier_block *nb);
+int cpu_pm_unregister_notifier(struct notifier_block *nb);
+
+int cpu_pm_enter(void);
+int cpu_pm_exit(void);
+
+int cpu_complex_pm_enter(void);
+int cpu_complex_pm_exit(void);
Can "cpu_complex_pm*" renamed to "cpu_cluster_pm*" as discussed earlier on the list.
Regards Santosh
On Thu, Jul 07, 2011 at 04:50:15PM +0100, Lorenzo Pieralisi wrote:
During some CPU power modes entered during idle, hotplug and suspend, peripherals located in the CPU power domain, such as the GIC and VFP, may be powered down. Add a notifier chain that allows drivers for those peripherals to be notified before and after they may be reset.
I defer comment on this until I see what the CPU_COMPLEX_* stuff is used for.
On Sat, Jul 9, 2011 at 3:15 AM, Russell King - ARM Linux linux@arm.linux.org.uk wrote:
On Thu, Jul 07, 2011 at 04:50:15PM +0100, Lorenzo Pieralisi wrote:
During some CPU power modes entered during idle, hotplug and suspend, peripherals located in the CPU power domain, such as the GIC and VFP, may be powered down. Add a notifier chain that allows drivers for those peripherals to be notified before and after they may be reset.
I defer comment on this until I see what the CPU_COMPLEX_* stuff is used for.
Per previous discussions on this patch, CPU_COMPLEX_* will be renamed CPU_CLUSTER_*, and will refer to all the gunk around the CPU that may be shared by another CPU. In this series, it is only used to save and restore the GIC distributor registers, but it could be used for L2 as well, if any platforms can agree on what needs to happen to the L2, and when.
From: Colin Cross ccross@android.com
When the cpu is powered down in a low power mode, the gic cpu interface may be reset, and when the cpu complex is powered down, the gic distributor may also be reset.
This patch uses CPU_PM_ENTER and CPU_PM_EXIT notifiers to save and restore the gic cpu interface registers, and the CPU_COMPLEX_PM_ENTER and CPU_COMPLEX_PM_EXIT notifiers to save and restore the gic distributor registers.
Signed-off-by: Colin Cross ccross@android.com
---
 arch/arm/common/gic.c |  212 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 212 insertions(+), 0 deletions(-)
diff --git a/arch/arm/common/gic.c b/arch/arm/common/gic.c
index 4ddd0a6..8d62e07 100644
--- a/arch/arm/common/gic.c
+++ b/arch/arm/common/gic.c
@@ -29,6 +29,7 @@
 #include <linux/cpumask.h>
 #include <linux/io.h>
 
+#include <asm/cpu_pm.h>
 #include <asm/irq.h>
 #include <asm/mach/irq.h>
 #include <asm/hardware/gic.h>
@@ -42,6 +43,17 @@ struct gic_chip_data {
 	unsigned int irq_offset;
 	void __iomem *dist_base;
 	void __iomem *cpu_base;
+#ifdef CONFIG_CPU_PM
+	u32 saved_spi_enable[DIV_ROUND_UP(1020, 32)];
+	u32 saved_spi_conf[DIV_ROUND_UP(1020, 16)];
+	u32 saved_spi_pri[DIV_ROUND_UP(1020, 4)];
+	u32 saved_spi_target[DIV_ROUND_UP(1020, 4)];
+	u32 __percpu *saved_ppi_enable;
+	u32 __percpu *saved_ppi_conf;
+	u32 __percpu *saved_ppi_pri;
+#endif
+
+	unsigned int gic_irqs;
 };
 
 /*
@@ -283,6 +295,8 @@ static void __init gic_dist_init(struct gic_chip_data *gic,
 	if (gic_irqs > 1020)
 		gic_irqs = 1020;
 
+	gic->gic_irqs = gic_irqs;
+
 	/*
 	 * Set all global interrupts to be level triggered, active low.
 	 */
@@ -350,6 +364,203 @@ static void __cpuinit gic_cpu_init(struct gic_chip_data *gic)
 	writel_relaxed(1, base + GIC_CPU_CTRL);
 }
 
+#ifdef CONFIG_CPU_PM
+/*
+ * Saves the GIC distributor registers during suspend or idle.  Must be called
+ * with interrupts disabled but before powering down the GIC.  After calling
+ * this function, no interrupts will be delivered by the GIC, and another
+ * platform-specific wakeup source must be enabled.
+ */
+static void gic_dist_save(unsigned int gic_nr)
+{
+	unsigned int gic_irqs;
+	void __iomem *dist_base;
+	int i;
+
+	if (gic_nr >= MAX_GIC_NR)
+		BUG();
+
+	gic_irqs = gic_data[gic_nr].gic_irqs;
+	dist_base = gic_data[gic_nr].dist_base;
+
+	if (!dist_base)
+		return;
+
+	for (i = 0; i < DIV_ROUND_UP(gic_irqs, 16); i++)
+		gic_data[gic_nr].saved_spi_conf[i] =
+			readl_relaxed(dist_base + GIC_DIST_CONFIG + i * 4);
+
+	for (i = 0; i < DIV_ROUND_UP(gic_irqs, 4); i++)
+		gic_data[gic_nr].saved_spi_pri[i] =
+			readl_relaxed(dist_base + GIC_DIST_PRI + i * 4);
+
+	for (i = 0; i < DIV_ROUND_UP(gic_irqs, 4); i++)
+		gic_data[gic_nr].saved_spi_target[i] =
+			readl_relaxed(dist_base + GIC_DIST_TARGET + i * 4);
+
+	for (i = 0; i < DIV_ROUND_UP(gic_irqs, 32); i++)
+		gic_data[gic_nr].saved_spi_enable[i] =
+			readl_relaxed(dist_base + GIC_DIST_ENABLE_SET + i * 4);
+
+	writel_relaxed(0, dist_base + GIC_DIST_CTRL);
+}
+
+/*
+ * Restores the GIC distributor registers during resume or when coming out of
+ * idle.  Must be called before enabling interrupts.  If a level interrupt
+ * that occurred while the GIC was suspended is still present, it will be
+ * handled normally, but any edge interrupts that occurred will not be seen by
+ * the GIC and need to be handled by the platform-specific wakeup source.
+ */
+static void gic_dist_restore(unsigned int gic_nr)
+{
+	unsigned int gic_irqs;
+	unsigned int i;
+	void __iomem *dist_base;
+
+	if (gic_nr >= MAX_GIC_NR)
+		BUG();
+
+	gic_irqs = gic_data[gic_nr].gic_irqs;
+	dist_base = gic_data[gic_nr].dist_base;
+
+	if (!dist_base)
+		return;
+
+	writel_relaxed(0, dist_base + GIC_DIST_CTRL);
+
+	for (i = 0; i < DIV_ROUND_UP(gic_irqs, 16); i++)
+		writel_relaxed(gic_data[gic_nr].saved_spi_conf[i],
+			dist_base + GIC_DIST_CONFIG + i * 4);
+
+	for (i = 0; i < DIV_ROUND_UP(gic_irqs, 4); i++)
+		writel_relaxed(gic_data[gic_nr].saved_spi_pri[i],
+			dist_base + GIC_DIST_PRI + i * 4);
+
+	for (i = 0; i < DIV_ROUND_UP(gic_irqs, 4); i++)
+		writel_relaxed(gic_data[gic_nr].saved_spi_target[i],
+			dist_base + GIC_DIST_TARGET + i * 4);
+
+	for (i = 0; i < DIV_ROUND_UP(gic_irqs, 32); i++)
+		writel_relaxed(gic_data[gic_nr].saved_spi_enable[i],
+			dist_base + GIC_DIST_ENABLE_SET + i * 4);
+
+	writel_relaxed(1, dist_base + GIC_DIST_CTRL);
+}
+
+static void gic_cpu_save(unsigned int gic_nr)
+{
+	int i;
+	u32 *ptr;
+	void __iomem *dist_base;
+	void __iomem *cpu_base;
+
+	if (gic_nr >= MAX_GIC_NR)
+		BUG();
+
+	dist_base = gic_data[gic_nr].dist_base;
+	cpu_base = gic_data[gic_nr].cpu_base;
+
+	if (!dist_base || !cpu_base)
+		return;
+
+	ptr = __this_cpu_ptr(gic_data[gic_nr].saved_ppi_enable);
+	for (i = 0; i < DIV_ROUND_UP(32, 32); i++)
+		ptr[i] = readl_relaxed(dist_base + GIC_DIST_ENABLE_SET + i * 4);
+
+	ptr = __this_cpu_ptr(gic_data[gic_nr].saved_ppi_conf);
+	for (i = 0; i < DIV_ROUND_UP(32, 16); i++)
+		ptr[i] = readl_relaxed(dist_base + GIC_DIST_CONFIG + i * 4);
+
+	ptr = __this_cpu_ptr(gic_data[gic_nr].saved_ppi_pri);
+	for (i = 0; i < DIV_ROUND_UP(32, 4); i++)
+		ptr[i] = readl_relaxed(dist_base + GIC_DIST_PRI + i * 4);
+}
+
+static void gic_cpu_restore(unsigned int gic_nr)
+{
+	int i;
+	u32 *ptr;
+	void __iomem *dist_base;
+	void __iomem *cpu_base;
+
+	if (gic_nr >= MAX_GIC_NR)
+		BUG();
+
+	dist_base = gic_data[gic_nr].dist_base;
+	cpu_base = gic_data[gic_nr].cpu_base;
+
+	if (!dist_base || !cpu_base)
+		return;
+
+	ptr = __this_cpu_ptr(gic_data[gic_nr].saved_ppi_enable);
+	for (i = 0; i < DIV_ROUND_UP(32, 32); i++)
+		writel_relaxed(ptr[i], dist_base + GIC_DIST_ENABLE_SET + i * 4);
+
+	ptr = __this_cpu_ptr(gic_data[gic_nr].saved_ppi_conf);
+	for (i = 0; i < DIV_ROUND_UP(32, 16); i++)
+		writel_relaxed(ptr[i], dist_base + GIC_DIST_CONFIG + i * 4);
+
+	ptr = __this_cpu_ptr(gic_data[gic_nr].saved_ppi_pri);
+	for (i = 0; i < DIV_ROUND_UP(32, 4); i++)
+		writel_relaxed(ptr[i], dist_base + GIC_DIST_PRI + i * 4);
+
+	writel_relaxed(0xf0, cpu_base + GIC_CPU_PRIMASK);
+	writel_relaxed(1, cpu_base + GIC_CPU_CTRL);
+}
+
+static int gic_notifier(struct notifier_block *self, unsigned long cmd, void *v)
+{
+	int i;
+
+	for (i = 0; i < MAX_GIC_NR; i++) {
+		switch (cmd) {
+		case CPU_PM_ENTER:
+			gic_cpu_save(i);
+			break;
+		case CPU_PM_ENTER_FAILED:
+		case CPU_PM_EXIT:
+			gic_cpu_restore(i);
+			break;
+		case CPU_COMPLEX_PM_ENTER:
+			gic_dist_save(i);
+			break;
+		case CPU_COMPLEX_PM_ENTER_FAILED:
+		case CPU_COMPLEX_PM_EXIT:
+			gic_dist_restore(i);
+			break;
+		}
+	}
+
+	return NOTIFY_OK;
+}
+
+static struct notifier_block gic_notifier_block = {
+	.notifier_call = gic_notifier,
+};
+
+static void __init gic_cpu_pm_init(struct gic_chip_data *gic)
+{
+	gic->saved_ppi_enable = __alloc_percpu(DIV_ROUND_UP(32, 32) * 4,
+		sizeof(u32));
+	BUG_ON(!gic->saved_ppi_enable);
+
+	gic->saved_ppi_conf = __alloc_percpu(DIV_ROUND_UP(32, 16) * 4,
+		sizeof(u32));
+	BUG_ON(!gic->saved_ppi_conf);
+
+	gic->saved_ppi_pri = __alloc_percpu(DIV_ROUND_UP(32, 4) * 4,
+		sizeof(u32));
+	BUG_ON(!gic->saved_ppi_pri);
+
+	cpu_pm_register_notifier(&gic_notifier_block);
+}
+#else
+static void __init gic_cpu_pm_init(struct gic_chip_data *gic)
+{
+}
+#endif
+
 void __init gic_init(unsigned int gic_nr, unsigned int irq_start,
 	void __iomem *dist_base, void __iomem *cpu_base)
 {
@@ -367,6 +578,7 @@ void __init gic_init(unsigned int gic_nr, unsigned int irq_start,
 
 	gic_dist_init(gic, irq_start);
 	gic_cpu_init(gic);
+	gic_cpu_pm_init(gic);
 }
 
 void __cpuinit gic_secondary_init(unsigned int gic_nr)
On 7/7/2011 8:50 AM, Lorenzo Pieralisi wrote:
From: Colin Cross ccross@android.com
When the cpu is powered down in a low power mode, the gic cpu interface may be reset, and when the cpu complex is powered down, the gic distributor may also be reset.
This patch uses CPU_PM_ENTER and CPU_PM_EXIT notifiers to save and restore the gic cpu interface registers, and the CPU_COMPLEX_PM_ENTER and CPU_COMPLEX_PM_EXIT notifiers to save and restore the gic distributor registers.
Signed-off-by: Colin Cross ccross@android.com
 arch/arm/common/gic.c |  212 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 212 insertions(+), 0 deletions(-)
diff --git a/arch/arm/common/gic.c b/arch/arm/common/gic.c
index 4ddd0a6..8d62e07 100644
--- a/arch/arm/common/gic.c
+++ b/arch/arm/common/gic.c
[...]
+static int gic_notifier(struct notifier_block *self, unsigned long cmd, void *v)
+{
+	int i;
+
+	for (i = 0; i < MAX_GIC_NR; i++) {
+		switch (cmd) {
+		case CPU_PM_ENTER:
+			gic_cpu_save(i);
+			break;
+		case CPU_PM_ENTER_FAILED:
+		case CPU_PM_EXIT:
+			gic_cpu_restore(i);
+			break;
+		case CPU_COMPLEX_PM_ENTER:
+			gic_dist_save(i);
+			break;
+		case CPU_COMPLEX_PM_ENTER_FAILED:
+		case CPU_COMPLEX_PM_EXIT:
+			gic_dist_restore(i);
+			break;
+		}
+	}
+
+	return NOTIFY_OK;
+}
Just to put forth OMAP requirements for GIC and see how much we can leverage these for OMAP.
OMAP supports GP (general purpose) and HS (secure) devices and implements TrustZone on them. On secure devices the GIC save and restore is done completely by secure ROM code: there are APIs for the save, and the restore happens automatically on CPU reset, based on the last CPU/cluster state.
On GP devices too, only a few GIC registers need to be saved in a pre-defined memory/register layout, and the restore is again done by boot-ROM code.
OMAP needs to enable/disable the distributor and CPU interfaces based on CPU power states, and that part is something we could use. It would be good if there were a provision to override the GIC save/restore functions using function pointers, so that OMAP PM code can still use the notifiers.
Any more thoughts on how we can handle this? We would like to use common ARM code as much as possible.
Regards Santosh
On Thu, Jul 7, 2011 at 6:35 PM, Santosh Shilimkar santosh.shilimkar@ti.com wrote:
On 7/7/2011 8:50 AM, Lorenzo Pieralisi wrote:
From: Colin Cross ccross@android.com
When the cpu is powered down in a low power mode, the gic cpu interface may be reset, and when the cpu complex is powered down, the gic distributor may also be reset.
This patch uses CPU_PM_ENTER and CPU_PM_EXIT notifiers to save and restore the gic cpu interface registers, and the CPU_COMPLEX_PM_ENTER and CPU_COMPLEX_PM_EXIT notifiers to save and restore the gic distributor registers.
Signed-off-by: Colin Cross ccross@android.com
 arch/arm/common/gic.c |  212 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 212 insertions(+), 0 deletions(-)
diff --git a/arch/arm/common/gic.c b/arch/arm/common/gic.c
index 4ddd0a6..8d62e07 100644
--- a/arch/arm/common/gic.c
+++ b/arch/arm/common/gic.c
[...]
+static int gic_notifier(struct notifier_block *self, unsigned long cmd, void *v)
+{
+	int i;
+
+	for (i = 0; i < MAX_GIC_NR; i++) {
+		switch (cmd) {
+		case CPU_PM_ENTER:
+			gic_cpu_save(i);
+			break;
+		case CPU_PM_ENTER_FAILED:
+		case CPU_PM_EXIT:
+			gic_cpu_restore(i);
+			break;
+		case CPU_COMPLEX_PM_ENTER:
+			gic_dist_save(i);
+			break;
+		case CPU_COMPLEX_PM_ENTER_FAILED:
+		case CPU_COMPLEX_PM_EXIT:
+			gic_dist_restore(i);
+			break;
+		}
+	}
+
+	return NOTIFY_OK;
+}
Just to put forth OMAP requirements for GIC and see how much we can leverage these for OMAP.
OMAP supports GP (general purpose) and HS (secure) devices and implements TrustZone on them. On secure devices the GIC save and restore is done completely by secure ROM code: there are APIs for the save, and the restore happens automatically on CPU reset, based on the last CPU/cluster state.
On GP devices too, only a few GIC registers need to be saved in a pre-defined memory/register layout, and the restore is again done by boot-ROM code.
OMAP needs to enable/disable the distributor and CPU interfaces based on CPU power states, and that part is something we could use. It would be good if there were a provision to override the GIC save/restore functions using function pointers, so that OMAP PM code can still use the notifiers.
Any more thoughts on how we can handle this? We would like to use common ARM code as much as possible.
Is it strictly necessary to use the custom OMAP save and restore? Anything that was modified by the kernel is obviously writable, and could be saved and restored using the common code. Anything that can only be modified by TrustZone already has to be restored by the custom OMAP code. There aren't many registers in the GIC, so it shouldn't be much of a performance difference.
On 7/7/2011 6:41 PM, Colin Cross wrote:
On Thu, Jul 7, 2011 at 6:35 PM, Santosh Shilimkar santosh.shilimkar@ti.com wrote:
On 7/7/2011 8:50 AM, Lorenzo Pieralisi wrote:
From: Colin Cross ccross@android.com
When the cpu is powered down in a low power mode, the gic cpu interface may be reset, and when the cpu complex is powered down, the gic distributor may also be reset.
This patch uses CPU_PM_ENTER and CPU_PM_EXIT notifiers to save and restore the gic cpu interface registers, and the CPU_COMPLEX_PM_ENTER and CPU_COMPLEX_PM_EXIT notifiers to save and restore the gic distributor registers.
Signed-off-by: Colin Cross ccross@android.com
 arch/arm/common/gic.c |  212 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 212 insertions(+), 0 deletions(-)
diff --git a/arch/arm/common/gic.c b/arch/arm/common/gic.c
index 4ddd0a6..8d62e07 100644
--- a/arch/arm/common/gic.c
+++ b/arch/arm/common/gic.c
[...]
+static int gic_notifier(struct notifier_block *self, unsigned long cmd, void *v)
+{
+	int i;
+
+	for (i = 0; i < MAX_GIC_NR; i++) {
+		switch (cmd) {
+		case CPU_PM_ENTER:
+			gic_cpu_save(i);
+			break;
+		case CPU_PM_ENTER_FAILED:
+		case CPU_PM_EXIT:
+			gic_cpu_restore(i);
+			break;
+		case CPU_COMPLEX_PM_ENTER:
+			gic_dist_save(i);
+			break;
+		case CPU_COMPLEX_PM_ENTER_FAILED:
+		case CPU_COMPLEX_PM_EXIT:
+			gic_dist_restore(i);
+			break;
+		}
+	}
+
+	return NOTIFY_OK;
+}
Just to put forth OMAP requirements for GIC and see how much we can leverage these for OMAP.
OMAP supports GP (general purpose) and HS (secure) devices and implements TrustZone on them. On secure devices the GIC save and restore is done completely by secure ROM code: there are APIs for the save, and the restore happens automatically on CPU reset, based on the last CPU/cluster state.
On GP devices too, only a few GIC registers need to be saved in a pre-defined memory/register layout, and the restore is again done by boot-ROM code.
OMAP needs to enable/disable the distributor and CPU interfaces based on CPU power states, and that part is something we could use. It would be good if there were a provision to override the GIC save/restore functions using function pointers, so that OMAP PM code can still use the notifiers.
Any more thoughts on how we can handle this? We would like to use common ARM code as much as possible.
Is it strictly necessary to use the custom OMAP save and restore?
Yes. On secure devices there is no choice.
Anything that was modified by the kernel is obviously writable, and could be saved and restored using the common code. Anything that can only be modified by TrustZone already has to be restored by the custom OMAP code. There aren't many registers in the GIC, so it shouldn't be much of a performance difference.
In that case we will end up doing things twice unnecessarily, and that's not useful at all. You need to save all those extra cycles to get lower C-state latency. On the other side, you can't skip the secure API save; otherwise the boot-ROM code will end up re-initializing the GIC and all secure interrupt state will be lost.
From the above code, today we need just the distributor/CPU interface disable/enable functions.
Regards Santosh
Santosh Shilimkar wrote:
On 7/7/2011 6:41 PM, Colin Cross wrote:
Hi all,
Samsung is now preparing a PM update for EXYNOS4210, and GIC save/restore is needed for that. It actually works fine on EXYNOS4210 with this patch.
But we have some concerns about the following.
+/*
+ * Saves the GIC distributor registers during suspend or idle.  Must be called
+ * with interrupts disabled but before powering down the GIC.  After calling
+ * this function, no interrupts will be delivered by the GIC, and another
+ * platform-specific wakeup source must be enabled.
+ */
The PMU needs to detect, via the GIC, an interrupt that can act as a wakeup source before calling WFI. It seems there is no problem, but I think there can be a hole during a very short window...
So is it necessary to disable the GIC after saving the context?
Thanks.
Best regards,
Kgene.
--
Kukjin Kim kgene.kim@samsung.com, Senior Engineer,
SW Solution Development Team, Samsung Electronics Co., Ltd.
On Thu, Jul 07, 2011 at 04:50:16PM +0100, Lorenzo Pieralisi wrote:
From: Colin Cross ccross@android.com
When the cpu is powered down in a low power mode, the gic cpu interface may be reset, and when the cpu complex is powered down, the gic distributor may also be reset.
This patch uses CPU_PM_ENTER and CPU_PM_EXIT notifiers to save and restore the gic cpu interface registers, and the CPU_COMPLEX_PM_ENTER and CPU_COMPLEX_PM_EXIT notifiers to save and restore the gic distributor registers.
Signed-off-by: Colin Cross ccross@android.com
Lost attributions and original author. This is based in part on code from Gary King, according to patch 6646/1 in the patch system.
Moreover, now that we have genirq dealing with the suspend/restore issues, how much of this is actually required? And should this be GIC specific, or should there be a way of asking genirq to take care of some of this for us?
We need to _reduce_ the amount of code needed to support this stuff, and if core code almost-but-not-quite does what we need then we need to talk to the maintainers of that code to see whether it can be changed.
Because adding 212 lines to save and restore the state of every interrupt controller that we may have just isn't on. We need this properly abstracted and dealt with in a generic way.
 arch/arm/common/gic.c |  212 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 212 insertions(+), 0 deletions(-)

diff --git a/arch/arm/common/gic.c b/arch/arm/common/gic.c
index 4ddd0a6..8d62e07 100644
--- a/arch/arm/common/gic.c
+++ b/arch/arm/common/gic.c
@@ -29,6 +29,7 @@
 #include <linux/cpumask.h>
 #include <linux/io.h>
 
+#include <asm/cpu_pm.h>
 #include <asm/irq.h>
 #include <asm/mach/irq.h>
 #include <asm/hardware/gic.h>
@@ -42,6 +43,17 @@ struct gic_chip_data {
 	unsigned int irq_offset;
 	void __iomem *dist_base;
 	void __iomem *cpu_base;
+#ifdef CONFIG_CPU_PM
+	u32 saved_spi_enable[DIV_ROUND_UP(1020, 32)];
+	u32 saved_spi_conf[DIV_ROUND_UP(1020, 16)];
+	u32 saved_spi_pri[DIV_ROUND_UP(1020, 4)];
+	u32 saved_spi_target[DIV_ROUND_UP(1020, 4)];
+	u32 __percpu *saved_ppi_enable;
+	u32 __percpu *saved_ppi_conf;
+	u32 __percpu *saved_ppi_pri;
+#endif
+
+	unsigned int gic_irqs;
 };
 
 /*
@@ -283,6 +295,8 @@ static void __init gic_dist_init(struct gic_chip_data *gic,
 	if (gic_irqs > 1020)
 		gic_irqs = 1020;
 
+	gic->gic_irqs = gic_irqs;
+
 	/*
 	 * Set all global interrupts to be level triggered, active low.
 	 */
@@ -350,6 +364,203 @@ static void __cpuinit gic_cpu_init(struct gic_chip_data *gic)
 	writel_relaxed(1, base + GIC_CPU_CTRL);
 }
 
+#ifdef CONFIG_CPU_PM
+/*
+ * Saves the GIC distributor registers during suspend or idle.  Must be called
+ * with interrupts disabled but before powering down the GIC.  After calling
+ * this function, no interrupts will be delivered by the GIC, and another
+ * platform-specific wakeup source must be enabled.
+ */
+static void gic_dist_save(unsigned int gic_nr)
+{
+	unsigned int gic_irqs;
+	void __iomem *dist_base;
+	int i;
+
+	if (gic_nr >= MAX_GIC_NR)
+		BUG();
+
+	gic_irqs = gic_data[gic_nr].gic_irqs;
+	dist_base = gic_data[gic_nr].dist_base;
+
+	if (!dist_base)
+		return;
+
+	for (i = 0; i < DIV_ROUND_UP(gic_irqs, 16); i++)
+		gic_data[gic_nr].saved_spi_conf[i] =
+			readl_relaxed(dist_base + GIC_DIST_CONFIG + i * 4);
+
+	for (i = 0; i < DIV_ROUND_UP(gic_irqs, 4); i++)
+		gic_data[gic_nr].saved_spi_pri[i] =
+			readl_relaxed(dist_base + GIC_DIST_PRI + i * 4);
+
+	for (i = 0; i < DIV_ROUND_UP(gic_irqs, 4); i++)
+		gic_data[gic_nr].saved_spi_target[i] =
+			readl_relaxed(dist_base + GIC_DIST_TARGET + i * 4);
+
+	for (i = 0; i < DIV_ROUND_UP(gic_irqs, 32); i++)
+		gic_data[gic_nr].saved_spi_enable[i] =
+			readl_relaxed(dist_base + GIC_DIST_ENABLE_SET + i * 4);
+
+	writel_relaxed(0, dist_base + GIC_DIST_CTRL);
+}
+
+/*
+ * Restores the GIC distributor registers during resume or when coming out of
+ * idle.  Must be called before enabling interrupts.  If a level interrupt
+ * that occurred while the GIC was suspended is still present, it will be
+ * handled normally, but any edge interrupts that occurred will not be seen by
+ * the GIC and need to be handled by the platform-specific wakeup source.
+ */
+static void gic_dist_restore(unsigned int gic_nr)
+{
+	unsigned int gic_irqs;
+	unsigned int i;
+	void __iomem *dist_base;
+
+	if (gic_nr >= MAX_GIC_NR)
+		BUG();
+
+	gic_irqs = gic_data[gic_nr].gic_irqs;
+	dist_base = gic_data[gic_nr].dist_base;
+
+	if (!dist_base)
+		return;
+
+	writel_relaxed(0, dist_base + GIC_DIST_CTRL);
+
+	for (i = 0; i < DIV_ROUND_UP(gic_irqs, 16); i++)
+		writel_relaxed(gic_data[gic_nr].saved_spi_conf[i],
+			dist_base + GIC_DIST_CONFIG + i * 4);
+
+	for (i = 0; i < DIV_ROUND_UP(gic_irqs, 4); i++)
+		writel_relaxed(gic_data[gic_nr].saved_spi_pri[i],
+			dist_base + GIC_DIST_PRI + i * 4);
+
+	for (i = 0; i < DIV_ROUND_UP(gic_irqs, 4); i++)
+		writel_relaxed(gic_data[gic_nr].saved_spi_target[i],
+			dist_base + GIC_DIST_TARGET + i * 4);
+
+	for (i = 0; i < DIV_ROUND_UP(gic_irqs, 32); i++)
+		writel_relaxed(gic_data[gic_nr].saved_spi_enable[i],
+			dist_base + GIC_DIST_ENABLE_SET + i * 4);
+
+	writel_relaxed(1, dist_base + GIC_DIST_CTRL);
+}
+
+static void gic_cpu_save(unsigned int gic_nr)
+{
+	int i;
+	u32 *ptr;
+	void __iomem *dist_base;
+	void __iomem *cpu_base;
+
+	if (gic_nr >= MAX_GIC_NR)
+		BUG();
+
+	dist_base = gic_data[gic_nr].dist_base;
+	cpu_base = gic_data[gic_nr].cpu_base;
+
+	if (!dist_base || !cpu_base)
+		return;
+
+	ptr = __this_cpu_ptr(gic_data[gic_nr].saved_ppi_enable);
+	for (i = 0; i < DIV_ROUND_UP(32, 32); i++)
+		ptr[i] = readl_relaxed(dist_base + GIC_DIST_ENABLE_SET + i * 4);
+
+	ptr = __this_cpu_ptr(gic_data[gic_nr].saved_ppi_conf);
+	for (i = 0; i < DIV_ROUND_UP(32, 16); i++)
+		ptr[i] = readl_relaxed(dist_base + GIC_DIST_CONFIG + i * 4);
+
+	ptr = __this_cpu_ptr(gic_data[gic_nr].saved_ppi_pri);
+	for (i = 0; i < DIV_ROUND_UP(32, 4); i++)
+		ptr[i] = readl_relaxed(dist_base + GIC_DIST_PRI + i * 4);
+}
+
+static void gic_cpu_restore(unsigned int gic_nr)
+{
+	int i;
+	u32 *ptr;
+	void __iomem *dist_base;
+	void __iomem *cpu_base;
+
+	if (gic_nr >= MAX_GIC_NR)
+		BUG();
+
+	dist_base = gic_data[gic_nr].dist_base;
+	cpu_base = gic_data[gic_nr].cpu_base;
+
+	if (!dist_base || !cpu_base)
+		return;
+
+	ptr = __this_cpu_ptr(gic_data[gic_nr].saved_ppi_enable);
+	for (i = 0; i < DIV_ROUND_UP(32, 32); i++)
+		writel_relaxed(ptr[i], dist_base + GIC_DIST_ENABLE_SET + i * 4);
+
+	ptr = __this_cpu_ptr(gic_data[gic_nr].saved_ppi_conf);
+	for (i = 0; i < DIV_ROUND_UP(32, 16); i++)
+		writel_relaxed(ptr[i], dist_base + GIC_DIST_CONFIG + i * 4);
+
+	ptr = __this_cpu_ptr(gic_data[gic_nr].saved_ppi_pri);
+	for (i = 0; i < DIV_ROUND_UP(32, 4); i++)
+		writel_relaxed(ptr[i], dist_base + GIC_DIST_PRI + i * 4);
+
+	writel_relaxed(0xf0, cpu_base + GIC_CPU_PRIMASK);
+	writel_relaxed(1, cpu_base + GIC_CPU_CTRL);
+}
+
+static int gic_notifier(struct notifier_block *self, unsigned long cmd, void *v)
+{
+	int i;
+
+	for (i = 0; i < MAX_GIC_NR; i++) {
+		switch (cmd) {
+		case CPU_PM_ENTER:
+			gic_cpu_save(i);
+			break;
+		case CPU_PM_ENTER_FAILED:
+		case CPU_PM_EXIT:
+			gic_cpu_restore(i);
+			break;
+		case CPU_COMPLEX_PM_ENTER:
+			gic_dist_save(i);
+			break;
+		case CPU_COMPLEX_PM_ENTER_FAILED:
+		case CPU_COMPLEX_PM_EXIT:
+			gic_dist_restore(i);
+			break;
+		}
+	}
+
+	return NOTIFY_OK;
+}
+
+static struct notifier_block gic_notifier_block = {
+	.notifier_call = gic_notifier,
+};
+
+static void __init gic_cpu_pm_init(struct gic_chip_data *gic)
+{
+	gic->saved_ppi_enable = __alloc_percpu(DIV_ROUND_UP(32, 32) * 4,
+		sizeof(u32));
+	BUG_ON(!gic->saved_ppi_enable);
+
+	gic->saved_ppi_conf = __alloc_percpu(DIV_ROUND_UP(32, 16) * 4,
+		sizeof(u32));
+	BUG_ON(!gic->saved_ppi_conf);
+
+	gic->saved_ppi_pri = __alloc_percpu(DIV_ROUND_UP(32, 4) * 4,
+		sizeof(u32));
+	BUG_ON(!gic->saved_ppi_pri);
+
+	cpu_pm_register_notifier(&gic_notifier_block);
+}
+#else
+static void __init gic_cpu_pm_init(struct gic_chip_data *gic)
+{
+}
+#endif
+
 void __init gic_init(unsigned int gic_nr, unsigned int irq_start,
 	void __iomem *dist_base, void __iomem *cpu_base)
 {
@@ -367,6 +578,7 @@ void __init gic_init(unsigned int gic_nr, unsigned int irq_start,
 
 	gic_dist_init(gic, irq_start);
 	gic_cpu_init(gic);
+	gic_cpu_pm_init(gic);
 }
 
 void __cpuinit gic_secondary_init(unsigned int gic_nr)
--
1.7.4.4
On Sat, Jul 9, 2011 at 3:21 AM, Russell King - ARM Linux linux@arm.linux.org.uk wrote:
On Thu, Jul 07, 2011 at 04:50:16PM +0100, Lorenzo Pieralisi wrote:
From: Colin Cross ccross@android.com
When the cpu is powered down in a low power mode, the gic cpu interface may be reset, and when the cpu complex is powered down, the gic distributor may also be reset.
This patch uses CPU_PM_ENTER and CPU_PM_EXIT notifiers to save and restore the gic cpu interface registers, and the CPU_COMPLEX_PM_ENTER and CPU_COMPLEX_PM_EXIT notifiers to save and restore the gic distributor registers.
Signed-off-by: Colin Cross ccross@android.com
Lost attributations and original author. This is based in part on code from Gary King, according to patch 6646/1 in the patch system.
You're right, I'll make sure the attribution goes back in.
Moreover, how now that we have genirq dealing with the suspend/restore issues, how much of this is actually required. And should this be GIC specific or should there be a way of asking genirq to take care of some of this for us?
We need to _reduce_ the amount of code needed to support this stuff, and if core code almost-but-not-quite does what we need then we need to talk to the maintainers of that code to see whether it can be changed.
Because adding 212 lines to save and restore the state of every interrupt controller that we may have just isn't on. We need this properly abstracted and dealt with in a generic way.
This is necessary for cpuidle states that lose the GIC registers, not just suspend, because the GIC is in the cpu's power domain. We could avoid saving and restoring all the GIC registers in suspend and idle by reusing the initialization functions, and then having the core irq code call the unmask, set_type, and set_affinity functions on each irq to reconfigure it, but that will be very inefficient - it will convert each register write in the restore functions to a read-modify-write per interrupt in that register. Santosh is already complaining that this common GIC restore code will be slower than the automatic DMA to restore the GIC registers that OMAP4 supports.
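To make the cost difference concrete, compare the two restore strategies for the GIC priority registers (a sketch: the block loop is the one from the patch above, while the per-IRQ loop is a simplified stand-in for generic per-interrupt reconfiguration, and prio[] is a hypothetical per-IRQ priority array):

/* Block restore, as in this patch: one 32-bit write covers the
 * priority fields of four interrupts. */
for (i = 0; i < DIV_ROUND_UP(gic_irqs, 4); i++)
	writel_relaxed(saved_spi_pri[i], dist_base + GIC_DIST_PRI + i * 4);

/* Per-IRQ reconfiguration: a read-modify-write for every interrupt,
 * i.e. roughly 8x the bus accesses for the same end state. */
for (irq = 0; irq < gic_irqs; irq++) {
	u32 val = readl_relaxed(dist_base + GIC_DIST_PRI + (irq / 4) * 4);

	val &= ~(0xff << ((irq % 4) * 8));
	val |= prio[irq] << ((irq % 4) * 8);
	writel_relaxed(val, dist_base + GIC_DIST_PRI + (irq / 4) * 4);
}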
On Sat, Jul 09, 2011 at 03:10:56PM -0700, Colin Cross wrote:
This is necessary for cpuidle states that lose the GIC registers, not just suspend, because the GIC is in the cpu's power domain. We could avoid saving and restoring all the GIC registers in suspend and idle by reusing the initialization functions, and then having the core irq code call the unmask, set_type, and set_affinity functions on each irq to reconfigure it, but that will be very inefficient - it will convert each register write in the restore functions to a read-modify-write per interrupt in that register. Santosh is already complaining that this common GIC restore code will be slower than the automatic DMA to restore the GIC registers that OMAP4 supports.
Well, we need to come up with something sensible - a way of doing this which doesn't require every interrupt controller driver (of which we as an architecture have many) to have lots of support added.
If the current way is inefficient and is noticeably so, then let's talk to Thomas about finding a way around that - maybe having the generic code make one suspend/resume callback per irq gc chip rather than doing it per-IRQ. We can then reuse the same paths for suspend/resume as for idle state saving.
On Sat, Jul 9, 2011 at 3:33 PM, Russell King - ARM Linux linux@arm.linux.org.uk wrote:
On Sat, Jul 09, 2011 at 03:10:56PM -0700, Colin Cross wrote:
This is necessary for cpuidle states that lose the GIC registers, not just suspend, because the GIC is in the cpu's power domain. We could avoid saving and restoring all the GIC registers in suspend and idle by reusing the initialization functions, and then having the core irq code call the unmask, set_type, and set_affinity functions on each irq to reconfigure it, but that will be very inefficient - it will convert each register write in the restore functions to a read-modify-write per interrupt in that register. Santosh is already complaining that this common GIC restore code will be slower than the automatic DMA to restore the GIC registers that OMAP4 supports.
Well, we need to come up with something sensible - a way of doing this which doesn't require every interrupt controller driver (of which we as an architecture have many) to have lots of support added.
If the current way is inefficient and is noticeably so, then let's talk to Thomas about finding a way around that - maybe having the generic code make one suspend/resume callback per irq gc chip rather than doing it per-IRQ. We can then reuse the same paths for suspend/resume as for idle state saving.
Are you referring to moving the gic driver to be a gc chip? Otherwise, I don't understand your suggestion - how is a callback per chip any different from what this patch implements? It just gets its notification through a cpu_pm notifier, which works in idle and suspend, instead of a syscore op like the gc driver does.
This patch does save and restore some registers that are never modified after init, so they don't need to be saved.
On Sat, Jul 09, 2011 at 04:01:19PM -0700, Colin Cross wrote:
On Sat, Jul 9, 2011 at 3:33 PM, Russell King - ARM Linux linux@arm.linux.org.uk wrote:
On Sat, Jul 09, 2011 at 03:10:56PM -0700, Colin Cross wrote:
This is necessary for cpuidle states that lose the GIC registers, not just suspend, because the GIC is in the cpu's power domain. We could avoid saving and restoring all the GIC registers in suspend and idle by reusing the initialization functions, and then having the core irq code call the unmask, set_type, and set_affinity functions on each irq to reconfigure it, but that will be very inefficient - it will convert each register write in the restore functions to a read-modify-write per interrupt in that register. Santosh is already complaining that this common GIC restore code will be slower than the automatic DMA to restore the GIC registers that OMAP4 supports.
Well, we need to come up with something sensible - a way of doing this which doesn't require every interrupt controller driver (of which we as an architecture have many) to have lots of support added.
If the current way is inefficient and is noticeably so, then let's talk to Thomas about finding a way around that - maybe having the generic code make one suspend/resume callback per irq gc chip rather than doing it per-IRQ. We can then reuse the same paths for suspend/resume as for idle state saving.
Are you referring to moving the gic driver to be a gc chip? Otherwise, I don't understand your suggestion - how is a callback per chip any different from what this patch implements? It just gets its notification through a cpu_pm notifier, which works in idle and suspend, instead of a syscore op like the gc driver does.
This patch does save and restore some registers that are never modified after init, so they don't need to be saved.
The point is that we should aim to get to the point where, if an interrupt controller supports PM, then it supports _all_ PM out of the box and doesn't require additional code for cpu idle PM vs system suspend PM.
In other words, all we should need to do is provide genirq with a couple of functions for 'save state' and 'restore state'.
On Sat, Jul 9, 2011 at 4:05 PM, Russell King - ARM Linux linux@arm.linux.org.uk wrote:
On Sat, Jul 09, 2011 at 04:01:19PM -0700, Colin Cross wrote:
On Sat, Jul 9, 2011 at 3:33 PM, Russell King - ARM Linux linux@arm.linux.org.uk wrote:
On Sat, Jul 09, 2011 at 03:10:56PM -0700, Colin Cross wrote:
This is necessary for cpuidle states that lose the GIC registers, not just suspend, because the GIC is in the cpu's power domain. We could avoid saving and restoring all the GIC registers in suspend and idle by reusing the initialization functions, and then having the core irq code call the unmask, set_type, and set_affinity functions on each irq to reconfigure it, but that will be very inefficient - it will convert each register write in the restore functions to a read-modify-write per interrupt in that register. Santosh is already complaining that this common GIC restore code will be slower than the automatic DMA to restore the GIC registers that OMAP4 supports.
Well, we need to come up with something sensible - a way of doing this which doesn't require every interrupt controller driver (of which we as an architecture have many) to have lots of support added.
If the current way is inefficient and is noticeably so, then let's talk to Thomas about finding a way around that - maybe having the generic code make one suspend/resume callback per irq gc chip rather than doing it per-IRQ. We can then reuse the same paths for suspend/resume as for idle state saving.
Are you referring to moving the gic driver to be a gc chip? Otherwise, I don't understand your suggestion - how is a callback per chip any different from what this patch implements? It just gets its notification through a cpu_pm notifier, which works in idle and suspend, instead of a syscore op like the gc driver does.
This patch does save and restore some registers that are never modified after init, so they don't need to be saved.
The point is that we should aim to get to the point where, if an interrupt controller supports PM, then it supports _all_ PM out of the box and doesn't require additional code for cpu idle PM vs system suspend PM.
I agree 100%, and everything added in this patch is used for both idle and suspend on Tegra, through a single entry point - cpu_pm notifiers.
In other words, all we should need to do is provide genirq with a couple of functions for 'save state' and 'restore state'.
It's not so simple.
genirq doesn't know anything about idle. The PM states in idle are very SoC specific, so the SoC idle code would need to tell genirq that the irq chip is going idle - using something like cpu_pm notifiers. genirq would then just call the 'save state' and 'restore state' functions, so what's the point?
The gic is very tightly bound to both a cpu and a cpu cluster. There are parts (the gic cpu interface) that must be saved and restored when a single cpu powers down, and can only be accessed from that cpu. Then there are parts (the gic distributor interface) that can only be saved and restored when all cpus in a cluster power down, as well as the cluster itself. And then to make things more complicated, there are per-cpu banked registers in the gic distributor, so no single cpu can save and restore the entire gic distributor. A single pair of save restore functions is not sufficient for the gic, and putting the complexity of save and restore for the gic into genirq, when all it would be doing is passing through to the gic driver, seems unnecessary.
Since the SoC cpu idle and suspend code generally ends up in the same final function, it would call the same cpu_pm notifier for both idle and suspend, and there is no duplication of code.
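[Editorial note: a minimal sketch of the banked-register constraint Colin describes, reusing the per-cpu buffer allocated in gic_cpu_pm_init() above; the 'dist_base' field name is an assumption about the gic_chip_data layout of that era, not something this thread guarantees.]

#include <linux/percpu.h>
#include <linux/io.h>
#include <asm/hardware/gic.h>

static void gic_save_banked_ppi(struct gic_chip_data *gic)
{
	u32 *ptr = __this_cpu_ptr(gic->saved_ppi_enable);

	/* IRQs 0-31 are banked in the distributor: this read returns
	 * *this* cpu's view, so no other cpu can save it on our behalf */
	ptr[0] = readl_relaxed(gic->dist_base + GIC_DIST_ENABLE_SET);
}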
On 7/9/2011 4:05 PM, Russell King - ARM Linux wrote:
On Sat, Jul 09, 2011 at 04:01:19PM -0700, Colin Cross wrote:
On Sat, Jul 9, 2011 at 3:33 PM, Russell King - ARM Linux linux@arm.linux.org.uk wrote:
On Sat, Jul 09, 2011 at 03:10:56PM -0700, Colin Cross wrote:
This is necessary for cpuidle states that lose the GIC registers, not just suspend, because the GIC is in the cpu's power domain. We could avoid saving and restoring all the GIC registers in suspend and idle by reusing the initialization functions, and then having the core irq code call the unmask, set_type, and set_affinity functions on each irq to reconfigure it, but that will be very inefficient - it will convert each register write in the restore functions to a read-modify-write per interrupt in that register. Santosh is already complaining that this common GIC restore code will be slower than the automatic DMA to restore the GIC registers that OMAP4 supports.
Well, we need to come up with something sensible - a way of doing this which doesn't require every interrupt controller driver (of which we as an architecture have many) to have lots of support added.
If the current way is inefficient and is noticeably so, then let's talk to Thomas about finding a way around that - maybe having the generic code make one suspend/resume callback per irq gc chip rather than doing it per-IRQ. We can then reuse the same paths for suspend/resume as for idle state saving.
Are you referring to moving the gic driver to be a gc chip? Otherwise, I don't understand your suggestion - how is a callback per chip any different from what this patch implements? It just gets its notification through a cpu_pm notifier, which works in idle and suspend, instead of a syscore op like the gc driver does.
This patch does save and restore some registers that are never modified after init, so they don't need to be saved.
The point is that we should aim to get to the point where, if an interrupt controller supports PM, then it supports _all_ PM out of the box and doesn't require additional code for cpu idle PM vs system suspend PM.
In other words, all we should need to do is provide genirq with a couple of functions for 'save state' and 'restore state'.
Agreed. But how will generic irq code know when the interrupt controller loses its context without notifiers? This still depends on the SoC power domain partitions, and hence on platform code. At least for the GIC case, it seems to depend on the CPU cluster low power state.
Regards Santosh
Lorenzo, Colin,
On 7/7/2011 9:20 PM, Lorenzo Pieralisi wrote:
From: Colin Cross ccross@android.com
When the cpu is powered down in a low power mode, the gic cpu interface may be reset, and when the cpu complex is powered down, the gic distributor may also be reset.
This patch uses CPU_PM_ENTER and CPU_PM_EXIT notifiers to save and restore the gic cpu interface registers, and the CPU_COMPLEX_PM_ENTER and CPU_COMPLEX_PM_EXIT notifiers to save and restore the gic distributor registers.
Signed-off-by: Colin Cross ccross@android.com
 arch/arm/common/gic.c |  212 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 212 insertions(+), 0 deletions(-)

diff --git a/arch/arm/common/gic.c b/arch/arm/common/gic.c
index 4ddd0a6..8d62e07 100644
--- a/arch/arm/common/gic.c
+++ b/arch/arm/common/gic.c
[...]
I missed one more comment in the last review.
static int gic_notifier(struct notifier_block *self, unsigned long cmd, void *v)
{
	int i;

	for (i = 0; i < MAX_GIC_NR; i++) {
		switch (cmd) {
		case CPU_PM_ENTER:
			gic_cpu_save(i);
On OMAP, GIC cpu interface context is lost only when CPU cluster is powered down.
			break;
		case CPU_PM_ENTER_FAILED:
		case CPU_PM_EXIT:
			gic_cpu_restore(i);
			break;
		case CPU_COMPLEX_PM_ENTER:
			gic_dist_save(i);
			break;
		case CPU_COMPLEX_PM_ENTER_FAILED:
		case CPU_COMPLEX_PM_EXIT:
			gic_dist_restore(i);
			break;
		}
	}

	return NOTIFY_OK;
}
The entire GIC is kept in the CPU cluster power domain, and hence the GIC CPU interface context won't be lost when a CPU alone enters the deepest power state.
If it is different on other SoCs, then the common notifiers won't match all SoC designs.
Looks like exporting these functions directly, or adding them to genirq and then invoking them from platform code based on the power sequence needs, might be better.
Regards Santosh
On Thu, Jul 21, 2011 at 09:32:12AM +0100, Santosh Shilimkar wrote:
Lorenzo, Colin,
On 7/7/2011 9:20 PM, Lorenzo Pieralisi wrote:
From: Colin Cross ccross@android.com
When the cpu is powered down in a low power mode, the gic cpu interface may be reset, and when the cpu complex is powered down, the gic distributor may also be reset.
This patch uses CPU_PM_ENTER and CPU_PM_EXIT notifiers to save and restore the gic cpu interface registers, and the CPU_COMPLEX_PM_ENTER and CPU_COMPLEX_PM_EXIT notifiers to save and restore the gic distributor registers.
Signed-off-by: Colin Cross ccross@android.com
 arch/arm/common/gic.c |  212 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 212 insertions(+), 0 deletions(-)

diff --git a/arch/arm/common/gic.c b/arch/arm/common/gic.c
index 4ddd0a6..8d62e07 100644
--- a/arch/arm/common/gic.c
+++ b/arch/arm/common/gic.c
[...]
I missed one more comment in the last review.
static int gic_notifier(struct notifier_block *self, unsigned long cmd, void *v)
{
	int i;

	for (i = 0; i < MAX_GIC_NR; i++) {
		switch (cmd) {
		case CPU_PM_ENTER:
			gic_cpu_save(i);
On OMAP, GIC cpu interface context is lost only when CPU cluster is powered down.
Yes, it's true, but that's the only chance we have to save the GIC CPU IF state if the GIC context is lost, right ? It is a private memory map per processor; I agree, it might be useless if just one CPU is shutdown, but at that point in time you do not know the state of other CPUs. If the cluster moves to a state where GIC context is lost at least you had the GIC CPU IF state saved. If we do not save it, well, there is no way to do that anymore since the last CPU cannot access other CPUs GIC CPU IF registers (or better, banked GIC distributor registers). If you force hotplug on CPUs other than 0 (that's the way it is done on OMAP4 in cpuidle, right ?) to hit deep low-power states you reinit the GIC CPU IF state as per cold boot, so yes, it is useless there.
			break;
		case CPU_PM_ENTER_FAILED:
		case CPU_PM_EXIT:
			gic_cpu_restore(i);
			break;
		case CPU_COMPLEX_PM_ENTER:
			gic_dist_save(i);
			break;
		case CPU_COMPLEX_PM_ENTER_FAILED:
		case CPU_COMPLEX_PM_EXIT:
			gic_dist_restore(i);
			break;
		}
	}

	return NOTIFY_OK;
}
The entire GIC is kept in the CPU cluster power domain, and hence the GIC CPU interface context won't be lost when a CPU alone enters the deepest power state.
Already commented above, it is the same on the dev board I am using.
If it is different on other SoCs, then the common notifiers won't match all SoC designs.
Ditto.
Looks like exporting these functions directly, or adding them to genirq and then invoking them from platform code based on the power sequence needs, might be better.
I will have a look into that for the next version.
Thank you very much, Lorenzo
On 7/21/2011 3:57 PM, Lorenzo Pieralisi wrote:
On Thu, Jul 21, 2011 at 09:32:12AM +0100, Santosh Shilimkar wrote:
Lorenzo, Colin,
On 7/7/2011 9:20 PM, Lorenzo Pieralisi wrote:
From: Colin Cross ccross@android.com
When the cpu is powered down in a low power mode, the gic cpu interface may be reset, and when the cpu complex is powered down, the gic distributor may also be reset.
This patch uses CPU_PM_ENTER and CPU_PM_EXIT notifiers to save and restore the gic cpu interface registers, and the CPU_COMPLEX_PM_ENTER and CPU_COMPLEX_PM_EXIT notifiers to save and restore the gic distributor registers.
Signed-off-by: Colin Cross ccross@android.com
 arch/arm/common/gic.c |  212 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 212 insertions(+), 0 deletions(-)

diff --git a/arch/arm/common/gic.c b/arch/arm/common/gic.c
index 4ddd0a6..8d62e07 100644
--- a/arch/arm/common/gic.c
+++ b/arch/arm/common/gic.c
[...]
I missed one more comment in the last review.
static int gic_notifier(struct notifier_block *self, unsigned long cmd, void *v)
{
	int i;

	for (i = 0; i < MAX_GIC_NR; i++) {
		switch (cmd) {
		case CPU_PM_ENTER:
			gic_cpu_save(i);
On OMAP, GIC cpu interface context is lost only when CPU cluster is powered down.
Yes, it's true, but that's the only chance we have to save the GIC CPU IF state if the GIC context is lost, right ? It is a private memory map per processor; I agree, it might be useless if just one CPU is shutdown, but at that point in time you do not know the state of other CPUs. If the cluster moves to a state where GIC context is lost at least you had the GIC CPU IF state saved. If we do not save it, well, there is no way to do that anymore since the last CPU cannot access other CPUs GIC CPU IF registers (or better, banked GIC distributor registers). If you force hotplug on CPUs other than 0 (that's the way it is done on OMAP4 in cpuidle, right ?) to hit deep low-power states you reinit the GIC CPU IF state as per cold boot, so yes, it is useless there.
Actually, on OMAP there is no need to save any CPU interface registers.
For my OMAP4 PM rebasing, for the time being I will go with exported GIC functions so that I don't have too many redundancies with the GIC save/restore code.
Regards Santosh
On Thu, Jul 21, 2011 at 3:46 AM, Santosh Shilimkar santosh.shilimkar@ti.com wrote:
On 7/21/2011 3:57 PM, Lorenzo Pieralisi wrote:
On Thu, Jul 21, 2011 at 09:32:12AM +0100, Santosh Shilimkar wrote:
Lorenzo, Colin,
On 7/7/2011 9:20 PM, Lorenzo Pieralisi wrote:
From: Colin Cross ccross@android.com
When the cpu is powered down in a low power mode, the gic cpu interface may be reset, and when the cpu complex is powered down, the gic distributor may also be reset.
This patch uses CPU_PM_ENTER and CPU_PM_EXIT notifiers to save and restore the gic cpu interface registers, and the CPU_COMPLEX_PM_ENTER and CPU_COMPLEX_PM_EXIT notifiers to save and restore the gic distributor registers.
Signed-off-by: Colin Cross ccross@android.com
 arch/arm/common/gic.c |  212 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 212 insertions(+), 0 deletions(-)

diff --git a/arch/arm/common/gic.c b/arch/arm/common/gic.c
index 4ddd0a6..8d62e07 100644
--- a/arch/arm/common/gic.c
+++ b/arch/arm/common/gic.c
[...]
I missed one more comment in the last review.
static int gic_notifier(struct notifier_block *self, unsigned long cmd, void *v)
{
	int i;

	for (i = 0; i < MAX_GIC_NR; i++) {
		switch (cmd) {
		case CPU_PM_ENTER:
			gic_cpu_save(i);
On OMAP, GIC cpu interface context is lost only when CPU cluster is powered down.
Yes, it's true, but that's the only chance we have to save the GIC CPU IF state if the GIC context is lost, right ? It is a private memory map per processor; I agree, it might be useless if just one CPU is shutdown, but at that point in time you do not know the state of other CPUs. If the cluster moves to a state where GIC context is lost at least you had the GIC CPU IF state saved. If we do not save it, well, there is no way to do that anymore since the last CPU cannot access other CPUs GIC CPU IF registers (or better, banked GIC distributor registers). If you force hotplug on CPUs other than 0 (that's the way it is done on OMAP4 in cpuidle, right ?) to hit deep low-power states you reinit the GIC CPU IF state as per cold boot, so yes, it is useless there.
Actually, on OMAP there is no need to save any CPU interface registers.
For my OMAP4 PM rebasing, for the time being I will go with exported GIC functions so that I don't have too many redundancies with the GIC save/restore code.
I think you should try to balance cpu idle latency with reuse of common code. In this case, you are avoiding restoring 7 registers by reimplementing the bare minimum that is necessary for OMAP4, which is unlikely to make a measurable impact on wakeup latency. Can you try starting with reusing all the common code, and add some timestamps during wakeup to measure where the longest delays are, to determine where you should diverge from the common code and use omap-optimized code?
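[Editorial note: a hedged sketch of the kind of instrumentation Colin suggests; the wrapper name and the pr_info() placement are illustrative, while sched_clock() and gic_dist_restore() are the real kernel and patchset functions.]

#include <linux/sched.h>
#include <linux/kernel.h>

static void timed_gic_dist_restore(unsigned int gic_nr)
{
	u64 t0 = sched_clock();	/* nanosecond timestamp */

	gic_dist_restore(gic_nr);
	pr_info("gic_dist_restore(%u): %llu ns\n",
		gic_nr, sched_clock() - t0);
}

Comparing a few such timestamps across the wakeup path would show whether the extra registers are worth optimizing away at all.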
On 7/22/2011 12:36 AM, Colin Cross wrote:
On Thu, Jul 21, 2011 at 3:46 AM, Santosh Shilimkar santosh.shilimkar@ti.com wrote:
On 7/21/2011 3:57 PM, Lorenzo Pieralisi wrote:
On Thu, Jul 21, 2011 at 09:32:12AM +0100, Santosh Shilimkar wrote:
Lorenzo, Colin,
On 7/7/2011 9:20 PM, Lorenzo Pieralisi wrote:
From: Colin Cross ccross@android.com
When the cpu is powered down in a low power mode, the gic cpu interface may be reset, and when the cpu complex is powered down, the gic distributor may also be reset.
This patch uses CPU_PM_ENTER and CPU_PM_EXIT notifiers to save and restore the gic cpu interface registers, and the CPU_COMPLEX_PM_ENTER and CPU_COMPLEX_PM_EXIT notifiers to save and restore the gic distributor registers.
Signed-off-by: Colin Cross ccross@android.com
 arch/arm/common/gic.c |  212 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 212 insertions(+), 0 deletions(-)

diff --git a/arch/arm/common/gic.c b/arch/arm/common/gic.c
index 4ddd0a6..8d62e07 100644
--- a/arch/arm/common/gic.c
+++ b/arch/arm/common/gic.c
[...]
I missed one more comment in the last review.
static int gic_notifier(struct notifier_block *self, unsigned long cmd, void *v)
{
	int i;

	for (i = 0; i < MAX_GIC_NR; i++) {
		switch (cmd) {
		case CPU_PM_ENTER:
			gic_cpu_save(i);
On OMAP, GIC cpu interface context is lost only when CPU cluster is powered down.
Yes, it's true, but that's the only chance we have to save the GIC CPU IF state if the GIC context is lost, right ? It is a private memory map per processor; I agree, it might be useless if just one CPU is shutdown, but at that point in time you do not know the state of other CPUs. If the cluster moves to a state where GIC context is lost at least you had the GIC CPU IF state saved. If we do not save it, well, there is no way to do that anymore since the last CPU cannot access other CPUs GIC CPU IF registers (or better, banked GIC distributor registers). If you force hotplug on CPUs other than 0 (that's the way it is done on OMAP4 in cpuidle, right ?) to hit deep low-power states you reinit the GIC CPU IF state as per cold boot, so yes, it is useless there.
Actually, on OMAP there is no need to save any CPU interface registers.
For my OMAP4 PM rebasing, for the time being I will go with exported GIC functions so that I don't have too many redundancies with the GIC save/restore code.
I think you should try to balance cpu idle latency with reuse of common code. In this case, you are avoiding restoring 7 registers by reimplementing the bare minimum that is necessary for OMAP4, which is unlikely to make a measurable impact on wakeup latency. Can you try starting with reusing all the common code, and add some timestamps during wakeup to measure where the longest delays are, to determine where you should diverge from the common code and use omap-optimized code?
I am going to use all the common code, but having them as exported functions gives more flexibility to call them in the right places. As discussed earlier, I plan to use the common GIC code wherever it's needed on OMAP.
My main point was we are saving and restoring GIC CPU interface registers for a case where they are actually not lost.
Regards Santosh
On Thu, Jul 21, 2011 at 10:10 PM, Santosh Shilimkar santosh.shilimkar@ti.com wrote:
On 7/22/2011 12:36 AM, Colin Cross wrote:
On Thu, Jul 21, 2011 at 3:46 AM, Santosh Shilimkar santosh.shilimkar@ti.com wrote:
On 7/21/2011 3:57 PM, Lorenzo Pieralisi wrote:
On Thu, Jul 21, 2011 at 09:32:12AM +0100, Santosh Shilimkar wrote:
Lorenzo, Colin,
On 7/7/2011 9:20 PM, Lorenzo Pieralisi wrote:
From: Colin Cross ccross@android.com
When the cpu is powered down in a low power mode, the gic cpu interface may be reset, and when the cpu complex is powered down, the gic distributor may also be reset.
This patch uses CPU_PM_ENTER and CPU_PM_EXIT notifiers to save and restore the gic cpu interface registers, and the CPU_COMPLEX_PM_ENTER and CPU_COMPLEX_PM_EXIT notifiers to save and restore the gic distributor registers.
Signed-off-by: Colin Cross ccross@android.com
 arch/arm/common/gic.c |  212 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 212 insertions(+), 0 deletions(-)

diff --git a/arch/arm/common/gic.c b/arch/arm/common/gic.c
index 4ddd0a6..8d62e07 100644
--- a/arch/arm/common/gic.c
+++ b/arch/arm/common/gic.c
[...]
I missed one more comment in the last review.
static int gic_notifier(struct notifier_block *self, unsigned long cmd, void *v)
{
	int i;

	for (i = 0; i < MAX_GIC_NR; i++) {
		switch (cmd) {
		case CPU_PM_ENTER:
			gic_cpu_save(i);
On OMAP, GIC cpu interface context is lost only when CPU cluster is powered down.
Yes, it's true, but that's the only chance we have to save the GIC CPU IF state if the GIC context is lost, right ? It is a private memory map per processor; I agree, it might be useless if just one CPU is shutdown, but at that point in time you do not know the state of other CPUs. If the cluster moves to a state where GIC context is lost at least you had the GIC CPU IF state saved. If we do not save it, well, there is no way to do that anymore since the last CPU cannot access other CPUs GIC CPU IF registers (or better, banked GIC distributor registers). If you force hotplug on CPUs other than 0 (that's the way it is done on OMAP4 in cpuidle, right ?) to hit deep low-power states you reinit the GIC CPU IF state as per cold boot, so yes, it is useless there.
Actually, on OMAP there is no need to save any CPU interface registers.
For my OMAP4 PM rebasing, for the time being I will go with exported GIC functions so that I don't have too many redundancies with the GIC save/restore code.
I think you should try to balance cpu idle latency with reuse of common code. In this case, you are avoiding restoring 7 registers by reimplementing the bare minimum that is necessary for OMAP4, which is unlikely to make a measurable impact on wakeup latency. Can you try starting with reusing all the common code, and add some timestamps during wakeup to measure where the longest delays are, to determine where you should diverge from the common code and use omap-optimized code?
I am going to use all the common code, but having them as exported functions gives more flexibility to call them in the right places. As discussed earlier, I plan to use the common GIC code wherever it's needed on OMAP.
My main point was we are saving and restoring GIC CPU interface registers for a case where they are actually not lost.
Yes, but you're still avoiding 7 registers, which is unlikely to be worth the complexity of calling these functions differently from every other platform.
Colin,
On Friday 22 July 2011 10:51 AM, Colin Cross wrote:
On Thu, Jul 21, 2011 at 10:10 PM, Santosh Shilimkar
[....]
For my OMAP4 PM rebasing, for the time being I will go with exported GIC functions so that I don't have too many redundancies with the GIC save/restore code.
I think you should try to balance cpu idle latency with reuse of common code. In this case, you are avoiding restoring 7 registers by reimplementing the bare minimum that is necessary for OMAP4, which is unlikely to make a measurable impact on wakeup latency. Can you try starting with reusing all the common code, and add some timestamps during wakeup to measure where the longest delays are, to determine where you should diverge from the common code and use omap-optimized code?
I am going to use all the common code, but having them as exported functions gives more flexibility to call them in the right places. As discussed earlier, I plan to use the common GIC code wherever it's needed on OMAP.
My main point was we are saving and restoring GIC CPU interface registers for a case where they are actually not lost.
Yes, but you're still avoiding 7 registers, which is unlikely to be worth the complexity of calling these functions differently from every other platform.
I managed to use pm notifiers for GIC and VFP on OMAP4 and got that working. As discussed here, I decided to take the hit on the latency in favour of re-use of the GIC code, considering it helps other platforms.
I did update the notifier patches for a couple of things.
- VFP code now makes use of 'vfp_current_hw_state' instead of 'last_VFP_context'
- I have renamed CPU_COMPLEX to the more appropriate CPU_CLUSTER
- I have dropped the GIC dist. disable as part of the GIC dist save code. I saw lockups with CPUIDLE and tracked them down to an issue where, if for some reason the cluster doesn't hit the targeted low power state, the CPU gets locked up since the GIC dist. remains disabled. The GIC restore is done only if the cluster did hit the deeper state and the GIC lost its context. As such this change should not impact anything.
What's your plan for these notifier patches, considering there is a request to make them generic and not just ARM specific?
Sorry for asking this, but now I have a dependency on this series :) If you want, I can participate here to get this moving. Let me know.
Regards Santosh
From: Colin Cross ccross@android.com
When the cpu is powered down in a low power mode, the vfp registers may be reset.
This patch uses CPU_PM_ENTER and CPU_PM_EXIT notifiers to save and restore the cpu's vfp registers.
Signed-off-by: Colin Cross ccross@android.com
---
 arch/arm/vfp/vfpmodule.c |   40 ++++++++++++++++++++++++++++++++++++++++
 1 files changed, 40 insertions(+), 0 deletions(-)

diff --git a/arch/arm/vfp/vfpmodule.c b/arch/arm/vfp/vfpmodule.c
index f25e7ec..6f08dbe 100644
--- a/arch/arm/vfp/vfpmodule.c
+++ b/arch/arm/vfp/vfpmodule.c
@@ -21,6 +21,7 @@
 #include <asm/cputype.h>
 #include <asm/thread_notify.h>
 #include <asm/vfp.h>
+#include <asm/cpu_pm.h>

 #include "vfpinstr.h"
 #include "vfp.h"
@@ -169,6 +170,44 @@ static struct notifier_block vfp_notifier_block = {
 	.notifier_call = vfp_notifier,
 };

+#ifdef CONFIG_CPU_PM
+static int vfp_cpu_pm_notifier(struct notifier_block *self, unsigned long cmd,
+	void *v)
+{
+	u32 fpexc = fmrx(FPEXC);
+	unsigned int cpu = smp_processor_id();
+
+	switch (cmd) {
+	case CPU_PM_ENTER:
+		if (last_VFP_context[cpu]) {
+			fmxr(FPEXC, fpexc | FPEXC_EN);
+			vfp_save_state(last_VFP_context[cpu], fpexc);
+			/* force a reload when coming back from idle */
+			last_VFP_context[cpu] = NULL;
+			fmxr(FPEXC, fpexc & ~FPEXC_EN);
+		}
+		break;
+	case CPU_PM_ENTER_FAILED:
+	case CPU_PM_EXIT:
+		/* make sure VFP is disabled when leaving idle */
+		fmxr(FPEXC, fpexc & ~FPEXC_EN);
+		break;
+	}
+	return NOTIFY_OK;
+}
+
+static struct notifier_block vfp_cpu_pm_notifier_block = {
+	.notifier_call = vfp_cpu_pm_notifier,
+};
+
+static void vfp_cpu_pm_init(void)
+{
+	cpu_pm_register_notifier(&vfp_cpu_pm_notifier_block);
+}
+#else
+static inline void vfp_cpu_pm_init(void) { }
+#endif
+
 /*
  * Raise a SIGFPE for the current process.
  * sicode describes the signal being raised.
@@ -563,6 +602,7 @@ static int __init vfp_init(void)
 	vfp_vector = vfp_support_entry;

 	thread_register_notifier(&vfp_notifier_block);
+	vfp_cpu_pm_init();
 	vfp_pm_init();

 	/*
On Thu, Jul 07, 2011 at 04:50:17PM +0100, Lorenzo Pieralisi wrote:
From: Colin Cross ccross@android.com
When the cpu is powered down in a low power mode, the vfp registers may be reset.
This patch uses CPU_PM_ENTER and CPU_PM_EXIT notifiers to save and restore the cpu's vfp registers.
Signed-off-by: Colin Cross ccross@android.com
 arch/arm/vfp/vfpmodule.c |   40 ++++++++++++++++++++++++++++++++++++++++
 1 files changed, 40 insertions(+), 0 deletions(-)

diff --git a/arch/arm/vfp/vfpmodule.c b/arch/arm/vfp/vfpmodule.c
index f25e7ec..6f08dbe 100644
--- a/arch/arm/vfp/vfpmodule.c
+++ b/arch/arm/vfp/vfpmodule.c
@@ -21,6 +21,7 @@
 #include <asm/cputype.h>
 #include <asm/thread_notify.h>
 #include <asm/vfp.h>
+#include <asm/cpu_pm.h>

 #include "vfpinstr.h"
 #include "vfp.h"
@@ -169,6 +170,44 @@ static struct notifier_block vfp_notifier_block = {
 	.notifier_call = vfp_notifier,
 };

+#ifdef CONFIG_CPU_PM
+static int vfp_cpu_pm_notifier(struct notifier_block *self, unsigned long cmd,
+	void *v)
+{
+	u32 fpexc = fmrx(FPEXC);
+	unsigned int cpu = smp_processor_id();
+
+	switch (cmd) {
+	case CPU_PM_ENTER:
+		if (last_VFP_context[cpu]) {
+			fmxr(FPEXC, fpexc | FPEXC_EN);
+			vfp_save_state(last_VFP_context[cpu], fpexc);
+			/* force a reload when coming back from idle */
+			last_VFP_context[cpu] = NULL;
+			fmxr(FPEXC, fpexc & ~FPEXC_EN);
+		}
+		break;
This doesn't look right. On SMP setups, we always save the state of an enabled VFP on thread switches. That means the saved context in every thread is always up to date for all threads, except _possibly_ for the currently executing thread on the CPU.
On UP setups, we only save the state when we need to, so we need to do something like the above.
However, we're growing more and more functions in the VFP code dealing with saving and restoring state, and it's starting to become really silly and confusing about which function is called for what and why, and why it's different from another function doing something similar.
We need to sort this out so we have a _sane_ approach to this, rather than inventing more and more creative ways to save VFP state and restore it later.
On Sat, Jul 09, 2011 at 11:44:08AM +0100, Russell King - ARM Linux wrote:
We need to sort this out so we have a _sane_ approach to this, rather than inventing more and more creative ways to save VFP state and restore it later.
And here, let's prove that the current code is just so bloody complex that it needs redoing. In the following code, 'last_VFP_context' is renamed to 'vfp_current_hw_state' for clarity.
void vfp_sync_hwstate(struct thread_info *thread)
{
	unsigned int cpu = get_cpu();

	/*
	 * If the thread we're interested in is the current owner of the
	 * hardware VFP state, then we need to save its state.
	 */
	if (vfp_current_hw_state[cpu] == &thread->vfpstate) {
		u32 fpexc = fmrx(FPEXC);

		/*
		 * Save the last VFP state on this CPU.
		 */
		fmxr(FPEXC, fpexc | FPEXC_EN);
		vfp_save_state(&thread->vfpstate, fpexc | FPEXC_EN);
		fmxr(FPEXC, fpexc);
	}
Here, 'thread' is the thread we're interested in ensuring that we have up to date context in thread->vfpstate. On entry to this function, we can be running on any CPU in the system, and 'thread' could have been running on any other CPU in the system.
What this code is saying is: if the current CPU's hardware VFP state was owned by this thread, then update the current VFP state. So far it looks sane.
Now, let's consider what happens with thread migration. First, let's define three threads.
Thread 1, we'll call 'interesting_thread' which is a thread which is running on CPU0, using VFP (so vfp_current_hw_state[0] = &interesting_thread->vfpstate) and gets migrated off to CPU1, where it continues execution of VFP instructions.
Thread 2, we'll call 'new_cpu0_thread' which is the thread which takes over on CPU0. This has also been using VFP, and last used VFP on CPU0, but doesn't use it again.
The following code will be executed twice:
	cpu = thread->cpu;

	/*
	 * On SMP, if VFP is enabled, save the old state in
	 * case the thread migrates to a different CPU. The
	 * restoring is done lazily.
	 */
	if ((fpexc & FPEXC_EN) && vfp_current_hw_state[cpu]) {
		vfp_save_state(vfp_current_hw_state[cpu], fpexc);
		vfp_current_hw_state[cpu]->hard.cpu = cpu;
	}

	/*
	 * Thread migration, just force the reloading of the
	 * state on the new CPU in case the VFP registers
	 * contain stale data.
	 */
	if (thread->vfpstate.hard.cpu != cpu)
		vfp_current_hw_state[cpu] = NULL;
The first execution will be on CPU0 to switch away from 'interesting_thread'. interesting_thread->cpu will be 0.
So, vfp_current_hw_state[0] points at interesting_thread->vfpstate. The hardware state will be saved, along with the CPU number (0) that it was executing on.
'thread' will be 'new_cpu0_thread' with new_cpu0_thread->cpu = 0. Also, because it was executing on CPU0, new_cpu0_thread->vfpstate.hard.cpu = 0, and so the thread migration check is not triggered.
This means that vfp_current_hw_state[0] remains pointing at interesting_thread.
The second execution will be on CPU1 to switch _to_ 'interesting_thread'. So, 'thread' will be 'interesting_thread' and interesting_thread->cpu now will be 1. The previous thread executing on CPU1 is not relevant to this so we shall ignore that.
We get to the thread migration check. Here, we discover that interesting_thread->vfpstate.hard.cpu = 0, yet interesting_thread->cpu is now 1, indicating thread migration. We set vfp_current_hw_state[1] to NULL.
So, at this point vfp_current_hw_state[] contains the following:
	[0] = interesting_thread
	[1] = NULL
Our interesting thread now executes a VFP instruction, takes a fault which loads the state into the VFP hardware. Now, through the assembly we now have:
	[0] = interesting_thread
	[1] = interesting_thread
CPU1 stops due to ptrace (and so saves its VFP state using the thread switch code above), and CPU0 calls vfp_sync_hwstate().
	if (vfp_current_hw_state[cpu] == &thread->vfpstate) {
		vfp_save_state(&thread->vfpstate, fpexc | FPEXC_EN);
BANG, we corrupt interesting_thread's VFP state by overwriting the more up-to-date state saved by CPU1 with the old VFP state from CPU0.
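[Editorial note: a compact recap of the interleaving above, for readability; hw_state abbreviates vfp_current_hw_state and T is interesting_thread.]

	/*
	 * CPU0: switch away from T      -> save T's state; hw_state[0] = &T->vfpstate
	 * CPU1: switch to T (migrated)  -> T->vfpstate.hard.cpu (0) != 1,
	 *                                  so hw_state[1] = NULL
	 * CPU1: T faults on a VFP insn  -> state loaded; hw_state[1] = &T->vfpstate
	 * CPU1: ptrace stop             -> saves T's up-to-date state
	 * CPU0: vfp_sync_hwstate(T)     -> hw_state[0] still == &T->vfpstate,
	 *                                  so the stale CPU0 copy overwrites it
	 */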
I think this is not the only problem with this code, and it's in desperate need of being cleaned up. Until such time, adding more state saving code is just going to be extremely hairy.
Finally, as far as saving state for _idle_ goes (in other words, while the CPU's idle loop in cpu_idle() is running), take a moment to consider the following: the idle thread being a kernel thread does not use VFP. It has no useful VFP state. So:
1. On SMP, because we've switched away from any userland thread, we have already saved its state when we switched away.
If VFP hardware state is lost across an idle, the only thing that needs doing is that fact noted by setting vfp_current_hw_state[cpu] for the CPUs which lost VFP state to NULL. No state saving is required.
2. On UP, the VFP hardware may contain the current threads state, which, if state is lost, would need to be saved _IF_ vfp_current_hw_state[cpu] is non-NULL. Again, if state is lost, then vfp_current_hw_state[cpu] needs to be NULL'd.
These conditions won't change as a result of cleaning up the hairyness of the existing code.
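[Editorial note: a minimal sketch of the idle-time handling those two conditions imply, assuming the renamed 'vfp_current_hw_state'; this illustrates Russell's points and is not code from the patch. It would run on the idle path with preemption already disabled.]

static void vfp_note_idle_state_loss(void)
{
	unsigned int cpu = smp_processor_id();

#ifndef CONFIG_SMP
	/* UP only: the hardware may hold the current thread's live state */
	if (vfp_current_hw_state[cpu]) {
		u32 fpexc = fmrx(FPEXC);

		fmxr(FPEXC, fpexc | FPEXC_EN);
		vfp_save_state(vfp_current_hw_state[cpu], fpexc);
		fmxr(FPEXC, fpexc & ~FPEXC_EN);
	}
#endif
	/* SMP or UP: note the loss so the next user reloads from memory */
	vfp_current_hw_state[cpu] = NULL;
}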
In order to define a common idle interface for the kernel to enter low power modes, this patch provides include files and code that manages OS calls for low power entry and exit.
In the ARM world, processor HW is categorized as CPU and Cluster.
Corresponding states defined by this common IF are:
C-state [CPU state]:
0 - RUN MODE
1 - STANDBY
2 - DORMANT (not supported by this patch)
3 - SHUTDOWN
R-state [CLUSTER state]
0 - RUN
1 - STANDBY (not supported by this patch)
2 - L2 RAM retention
3 - SHUTDOWN
idle modes are entered through
cpu_enter_idle(cstate, rstate, flags) [sr_entry.S]
which could replace the current processor.idle entry in proc info, since it just executes wfi for shallow C-states.
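[Editorial note: for illustration, a hypothetical call from a platform cpuidle driver; the wrapper is invented here, while the state encodings follow the tables above and SR_SAVE_ALL comes from sr_platform_api.h below.]

static int platform_enter_deep_idle(void)
{
	/* C-state 3 (cpu shutdown), R-state 3 (cluster off),
	 * saving L2 and SCU state across the power cycle */
	return cpu_enter_idle(3, 3, SR_SAVE_ALL);
}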
Cluster low-power states are reached if and only if all the CPUs in the cluster are in low-power mode.
Only one cluster is supported at present, and the kernel infrastructure should be improved to allow multiple clusters to be defined and enumerated.
Current page table dir and stack pointers are saved using a per-cpu variable; this scheme breaks as soon as clusters are added to the kernel.
The code keeps a cpumask of alive CPUs and manages the state transitions accordingly.
Most of the variables needed when the CPU is powered down (MMU off) are allocated through a platform hook:
platform_context_pointer(unsigned int size)
that returns memory flat-mapped by this patchset as strongly ordered to avoid toying with L2 cleaning when a single CPU enters low power.
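[Editorial note: a hedged sketch of one way a platform might back this hook, handing out chunks of a page-aligned reserved pool that the framework then maps 1:1 as strongly ordered; the static pool and its cursor are illustrative assumptions, not part of the patchset.]

#include <linux/kernel.h>
#include <linux/cache.h>
#include <asm/page.h>

static char sr_context_pool[4 * PAGE_SIZE] __aligned(PAGE_SIZE);
static unsigned int sr_context_offset;

void *platform_context_pointer(unsigned int size)
{
	void *p;

	/* keep allocations cache-line aligned within the reserved pool */
	size = ALIGN(size, L1_CACHE_BYTES);
	if (sr_context_offset + size > sizeof(sr_context_pool))
		return NULL;
	p = sr_context_pool + sr_context_offset;
	sr_context_offset += size;
	return p;
}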
Fully tested on dual-core A9 cluster.
Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com
---
 arch/arm/include/asm/sr_platform_api.h |   28 ++++
 arch/arm/kernel/sr_api.c               |  197 +++++++++++++++++++++++++++++
 arch/arm/kernel/sr_entry.S             |  213 ++++++++++++++++++++++++++++++++
 3 files changed, 438 insertions(+), 0 deletions(-)
 create mode 100644 arch/arm/include/asm/sr_platform_api.h
 create mode 100644 arch/arm/kernel/sr_api.c
 create mode 100644 arch/arm/kernel/sr_entry.S
diff --git a/arch/arm/include/asm/sr_platform_api.h b/arch/arm/include/asm/sr_platform_api.h
new file mode 100644
index 0000000..32367be
--- /dev/null
+++ b/arch/arm/include/asm/sr_platform_api.h
@@ -0,0 +1,28 @@
+/*
+ * Copyright (C) 2008-2011 ARM Limited
+ *
+ * Author(s): Jon Callan, Lorenzo Pieralisi
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ */
+
+#ifndef ASMARM_SR_PLATFORM_API_H
+#define ASMARM_SR_PLATFORM_API_H
+
+#define SR_SAVE_L2	(1 << 31)
+#define SR_SAVE_SCU	(1 << 30)
+#define SR_SAVE_ALL	(SR_SAVE_L2 | SR_SAVE_SCU)
+
+struct lp_state {
+	u16 cpu;
+	u16 cluster;
+};
+
+extern void (*sr_sleep)(void);
+extern void (*arch_reset_handler(void))(void);
+extern int cpu_enter_idle(unsigned cstate, unsigned rstate, unsigned flags);
+extern void *platform_context_pointer(unsigned int);
+#endif
diff --git a/arch/arm/kernel/sr_api.c b/arch/arm/kernel/sr_api.c
new file mode 100644
index 0000000..4e48f60
--- /dev/null
+++ b/arch/arm/kernel/sr_api.c
@@ -0,0 +1,197 @@
+/*
+ * Copyright (C) 2008-2011 ARM Limited
+ *
+ * Author(s): Jon Callan, Lorenzo Pieralisi
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/slab.h>
+#include <linux/errno.h>
+#include <linux/module.h>
+#include <linux/pm.h>
+#include <linux/sched.h>
+#include <linux/cache.h>
+#include <linux/cpu.h>
+
+#include <asm/cacheflush.h>
+#include <asm/tlbflush.h>
+#include <asm/system.h>
+#include <asm/cpu_pm.h>
+#include <asm/lb_lock.h>
+#include <asm/sr_platform_api.h>
+
+#include "sr_helpers.h"
+#include "sr.h"
+
+
+struct ____cacheline_aligned sr_main_table main_table = {
+	.num_clusters = SR_NR_CLUSTERS,
+	.cpu_idle_mask = { { CPU_BITS_NONE }, },
+};
+
+static int late_init(void);
+
+int sr_runtime_init(void)
+{
+	int ret;
+
+	context_memory_uncached =
+		platform_context_pointer(CONTEXT_SPACE_UNCACHED);
+
+	if (!context_memory_uncached)
+		return -ENOMEM;
+
+	ret = linux_sr_setup_translation_tables();
+
+	if (ret < 0)
+		return ret;
+
+	ret = sr_context_init();
+
+	return ret;
+}
+
+/* return the warm-boot entry point virtual address */
+void (*arch_reset_handler(void))(void)
+{
+	return (void (*)(void)) arch->reset;
+}
+
+static int late_init(void)
+{
+	int rc;
+	struct sr_cluster *cluster;
+	int cluster_index, cpu_index = sr_platform_get_cpu_index();
+
+	cluster_index = sr_platform_get_cluster_index();
+	cluster = main_table.cluster_table + cluster_index;
+	main_table.os_mmu_context[cluster_index][cpu_index] =
+		current->active_mm->pgd;
+	cpu_switch_mm(main_table.fw_mmu_context, current->active_mm);
+	rc = sr_platform_init();
+	cpu_switch_mm(main_table.os_mmu_context[cluster_index][cpu_index],
+		current->active_mm);
+	return rc;
+}
+
+void (*sr_sleep)(void) = default_sleep;
+
+void enter_idle(unsigned cstate, unsigned rstate, unsigned flags)
+{
+	struct sr_cpu *cpu;
+	struct sr_cluster *cluster;
+	cpumask_t *cpuidle_mask;
+	int cpu_index, cluster_index;
+
+	cluster_index = sr_platform_get_cluster_index();
+	cpu_index = sr_platform_get_cpu_index();
+	cpuidle_mask = &main_table.cpu_idle_mask[cluster_index];
+	/*
+	 * WARNING: cluster support will break if multiple clusters are
+	 * instantiated within the kernel. The current version works
+	 * with just one cluster and cpu_index is the hardware processor
+	 * id in cluster index 0.
+	 */
+	main_table.os_mmu_context[cluster_index][cpu_index] =
+		current->active_mm->pgd;
+	cpu_switch_mm(main_table.fw_mmu_context, current->active_mm);
+	local_flush_tlb_all();
+
+	cluster = main_table.cluster_table + cluster_index;
+	cpu = cluster->cpu_table + cpu_index;
+
+	get_spinlock(cpu_index, cluster->lock);
+
+	__cpu_set(cpu_index, cpuidle_mask);
+
+	if (cpumask_weight(cpuidle_mask) == num_online_cpus())
+		cluster->power_state = rstate;
+
+	cluster->cluster_down = (cluster->power_state >= 2);
+
+	cpu->power_state = cstate;
+
+	cpu_pm_enter();
+
+	if (cluster->cluster_down)
+		cpu_complex_pm_enter();
+
+	sr_platform_enter_cstate(cpu_index, cpu, cluster);
+
+	sr_save_context(cluster, cpu, flags);
+
+	release_spinlock(cpu_index, cluster->lock);
+
+	/* Point of no return */
+	(*sr_sleep)();
+
+	/*
+	 * In case we wanted sr_sleep to return
+	 * here is code to turn MMU off and go
+	 * the whole hog on the resume path
+	 */
+
+	cpu_reset((virt_to_phys((void *) arch->reset)));
+}
+
+void exit_idle(struct sr_main_table *mt)
+{
+	struct sr_cpu *cpu;
+	struct sr_cluster *cluster;
+	int cpu_index, cluster_index;
+
+	cpu_index = sr_platform_get_cpu_index();
+
+	cluster_index = sr_platform_get_cluster_index();
+
+	cluster = mt->cluster_table + cluster_index;
+	cpu = cluster->cpu_table + cpu_index;
+
+	PA(get_spinlock)(cpu_index, cluster->lock);
+
+	PA(sr_restore_context)(cluster, cpu);
+
+	sr_platform_leave_cstate(cpu_index, cpu, cluster);
+
+	if (cluster->cluster_down) {
+		cpu_complex_pm_exit();
+		cluster->cluster_down = 0;
+	}
+
+	cpu_pm_exit();
+
+	cpu_clear(cpu_index, main_table.cpu_idle_mask[cluster_index]);
+
+	cpu->power_state = 0;
+	cluster->power_state = 0;
+
+	release_spinlock(cpu_index, cluster->lock);
+	cpu_switch_mm(main_table.os_mmu_context[cluster_index][cpu_index],
+		current->active_mm);
+	local_flush_tlb_all();
+}
+
+
+int sr_init(void)
+{
+	if (lookup_arch()) {
+		printk(KERN_EMERG "SR INIT: Undetected architecture id\n");
+		BUG();
+	}
+
+	if (sr_runtime_init()) {
+		printk(KERN_EMERG "SR INIT: runtime init error\n");
+		BUG();
+	}
+
+	if (late_init()) {
+		printk(KERN_EMERG "SR INIT: late init error\n");
+		BUG();
+	}
+
+	return 0;
+}
+arch_initcall(sr_init);
diff --git a/arch/arm/kernel/sr_entry.S b/arch/arm/kernel/sr_entry.S
new file mode 100644
index 0000000..4fa9bef
--- /dev/null
+++ b/arch/arm/kernel/sr_entry.S
@@ -0,0 +1,213 @@
+/*
+ * Copyright (c) 2008-2011 ARM Ltd
+ *
+ * Author(s): Jon Callan, Lorenzo Pieralisi
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ */
+
+#include <linux/linkage.h>
+#include <generated/asm-offsets.h>
+#include <asm/thread_info.h>
+#include <asm/memory.h>
+#include <asm/ptrace.h>
+#include <asm/glue-proc.h>
+#include <asm/assembler.h>
+#include <asm-generic/errno-base.h>
+#include <mach/entry-macro.S>
+
+	.text
+
+ENTRY(default_sleep)
+	b out		@ BTAC allocates branch and enters loop mode
+idle:			@ power down is entered with GIC CPU IF still on which
+	dsb		@ might get wfi instruction to complete before the
+	wfi		@ CPU is shut down -- infinite loop
+out:
+	b idle
+ENDPROC(default_sleep)
+
+
+ENTRY(sr_suspend)
+	b cpu_do_suspend
+ENDPROC(sr_suspend)
+
+ENTRY(sr_resume)
+	add lr, lr, #(PAGE_OFFSET - PLAT_PHYS_OFFSET)
+	stmfd sp!, {r4 - r11, lr}
+	ldr lr, =mmu_on
+	b cpu_do_resume
+mmu_on:
+	ldmfd sp!, {r4 - r11, pc}
+ENDPROC(sr_resume)
+
+/*
+ * This code is in the .data section to retrieve stack pointers stored in
+ * platform_cpu_stacks and platform_cpu_nc_stacks with a pc relative load.
+ * It cannot live in .text since that section can be treated as read-only
+ * and would break the code, which requires stack pointers to be saved on
+ * idle entry.
+ */
+	.data
+	.align
+	.global idle_save_context
+	.global idle_restore_context
+	.global idle_mt
+	.global platform_cpu_stacks
+	.global platform_cpu_nc_stacks
+
+/*
+ * idle entry point
+ * Must be called with IRQ disabled
+ * Idle states are differentiated between CPU and Cluster states
+ *
+ * r0 = cstate defines the CPU power state
+ * r1 = rstate defines the Cluster power state
+ * r2 = flags define what has to be saved
+ *
+ * C-STATE mapping
+ * 0 - run
+ * 1 - wfi (aka standby)
+ * 2 - dormant (not supported)
+ * 3 - shutdown
+ *
+ * R-STATE mapping
+ * 0 - run
+ * 1 - not supported
+ * 2 - L2 retention
+ * 3 - Off mode (every platform defines it, e.g. GIC power domain)
+ *
+ * Cluster low-power states might be hit if and only if all the CPUs making up
+ * the clusters are in some deep C-STATE
+ *
+ */
+
+ENTRY(cpu_enter_idle)
+	cmp r0, #2		@ this function can replace the idle function
+	wfilt			@ in the processor struct. If targeted power
+	movlt r0, #0		@ states are shallow ones it just executes wfi
+	movlt pc, lr		@ and returns
+	cmp r0, #3
+	cmpls r1, #3
+	mvnhi r0, #EINVAL
+	movhi pc, lr
+	stmfd sp!, {r4 - r12, lr}
+	stmfd sp, {r0, r1}
+#ifdef CONFIG_SMP
+	adr r0, platform_cpu_stacks
+	ALT_SMP(mrc p15, 0, r1, c0, c0, 5)
+	ALT_UP(mov r1, #0)
+	and r1, r1, #15
+	str sp, [r0, r1, lsl #2]	@ stack phys addr - save it for resume
+#else
+	str sp, platform_cpu_stacks
+#endif
+	sub sp, sp, #8
+	ldmfd sp!, {r0, r1}
+	bl enter_idle
+	mov r0, #0
+	ldmfd sp!, {r4 - r12, pc}
+ENDPROC(cpu_enter_idle)
+
+/*
+ * This hook, though not strictly necessary, provides an entry point where, if
+ * needed, stack pointers can be switched in case it is needed to improve L2
+ * retention management (uncached stack).
+ */
+ENTRY(sr_save_context)
+	adr r12, idle_save_context
+	ldr r12, [r12]
+	bx r12
+ENDPROC(sr_save_context)
+
+ENTRY(sr_reset_entry_point)
+	@ This is the entry point from the platform warm start code
+	@ It runs with MMU off straight from reset
+	setmode PSR_I_BIT | PSR_F_BIT | SVC_MODE, r0	@ set SVC, irqs off
+#ifdef CONFIG_SMP
+	adr r0, platform_cpu_nc_stacks
+	ALT_SMP(mrc p15, 0, r1, c0, c0, 5)
+	ALT_UP(mov r1, #0)
+	and r1, r1, #15
+	ldr r0, [r0, r1, lsl #2]	@ stack phys addr
+#else
+	ldr r0, platform_cpu_nc_stacks	@ stack phys addr
+#endif
+	mov sp, r0
+	adr r0, idle_mt		@ get phys address of main table and pass it on
+	ldr r0, [r0]
+	ldr lr, =return_from_idle
+	adr r1, resume
+	ldr r1, [r1]
+	bx r1
+return_from_idle:
+	@ return to enter_idle caller, with success
+	mov r0, #0
+	ldmfd sp!, {r4 - r12, pc}	@ return from idle - registers saved in
+ENDPROC(sr_reset_entry_point)		@ cpu_enter_idle() are still there
+
+
+ENTRY(sr_restore_context)
+	add lr, lr, #(PAGE_OFFSET - PLAT_PHYS_OFFSET)
+	stmfd sp!, {r4, lr}
+	adr r12, idle_restore_context
+	ldr r12, [r12]
+	ldr lr, =switch_stack
+	bx r12
+switch_stack:
+	@ CPU context restored, time to switch to Linux stack and pop out
+#ifdef CONFIG_SMP
+	adr r0, platform_cpu_stacks
+	ALT_SMP(mrc p15, 0, r1, c0, c0, 5)
+	ALT_UP(mov r1, #0)
+	and r1, r1, #15
+	ldr r0, [r0, r1, lsl #2]	@ top stack addr
+#else
+	ldr r0, platform_cpu_stacks	@ top stack addr
+#endif
+	mov r3, r0
+#ifdef CONFIG_SMP
+	adr r0, platform_cpu_nc_stacks
+	ALT_SMP(mrc p15, 0, r1, c0, c0, 5)
+	ALT_UP(mov r1, #0)
+	and r1, r1, #15
+	ldr r0, [r0, r1, lsl #2]	@ non-cacheable stack phys addr
+#else
+	ldr r0, platform_cpu_nc_stacks	@ non-cacheable stack phys addr
+#endif
+	sub r2, r0, sp
+	sub r0, r3, r2
+	mov r1, sp
+	mov r4, r0
+	bl memcpy	@ copy stack used in resume to current stack
+	mov sp, r4
+	bl cpu_init	@ init banked registers
+	ldmfd sp!, {r4, pc}
+ENDPROC(sr_restore_context)
+
+idle_save_context:
+	.long 0
+idle_restore_context:
+	.long 0
+
+idle_mt:
+	.long main_table - PAGE_OFFSET + PLAT_PHYS_OFFSET
+
+resume:
+	.long exit_idle - PAGE_OFFSET + PLAT_PHYS_OFFSET
+
+platform_cpu_stacks:
+	.rept CONFIG_NR_CPUS
+	.long 0				@ preserve stack phys ptr here
+	.endr
+
+platform_cpu_nc_stacks:
+	.rept CONFIG_NR_CPUS
+	.long 0				@ preserve uncached
+					@ stack phys ptr here
+	.endr
+
+	.end
On 7/7/2011 8:50 AM, Lorenzo Pieralisi wrote:
In order to define a common idle interface for the kernel to enter low power modes, this patch provides include files and code that manages OS calls for low power entry and exit.
[....]
diff --git a/arch/arm/kernel/sr_entry.S b/arch/arm/kernel/sr_entry.S
new file mode 100644
index 0000000..4fa9bef
--- /dev/null
+++ b/arch/arm/kernel/sr_entry.S
@@ -0,0 +1,213 @@
/*
 * Copyright (c) 2008-2011 ARM Ltd
 *
 * Author(s): Jon Callan, Lorenzo Pieralisi
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License version 2 as
 * published by the Free Software Foundation.
 *
 */

#include <linux/linkage.h>
#include <generated/asm-offsets.h>
#include <asm/thread_info.h>
#include <asm/memory.h>
#include <asm/ptrace.h>
#include <asm/glue-proc.h>
#include <asm/assembler.h>
#include <asm-generic/errno-base.h>
#include <mach/entry-macro.S>

	.text

ENTRY(default_sleep)
	b out		@ BTAC allocates branch and enters loop mode
idle:			@ power down is entered with GIC CPU IF still on which
	dsb		@ might get wfi instruction to complete before the
	wfi		@ CPU is shut down -- infinite loop
out:
	b idle
ENDPROC(default_sleep)
Q: What happens if for some reason the CPU didn't hit the targeted state in IDLE? Does the CPU keep looping here forever?
On OMAP4, we need to issue an additional interconnect barrier before WFI. How can we make provision for the same?
Regards Santosh
Hi Santosh,
Thanks for looking at this series.
On Fri, Jul 08, 2011 at 02:45:43AM +0100, Santosh Shilimkar wrote:
On 7/7/2011 8:50 AM, Lorenzo Pieralisi wrote:
In order to define a common idle interface for the kernel to enter low power modes, this patch provides include files and code that manages OS calls for low power entry and exit.
[....]
diff --git a/arch/arm/kernel/sr_entry.S b/arch/arm/kernel/sr_entry.S
new file mode 100644
index 0000000..4fa9bef
--- /dev/null
+++ b/arch/arm/kernel/sr_entry.S
@@ -0,0 +1,213 @@
/*
 * Copyright (c) 2008-2011 ARM Ltd
 *
 * Author(s): Jon Callan, Lorenzo Pieralisi
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License version 2 as
 * published by the Free Software Foundation.
 *
 */

#include <linux/linkage.h>
#include <generated/asm-offsets.h>
#include <asm/thread_info.h>
#include <asm/memory.h>
#include <asm/ptrace.h>
#include <asm/glue-proc.h>
#include <asm/assembler.h>
#include <asm-generic/errno-base.h>
#include <mach/entry-macro.S>

	.text

ENTRY(default_sleep)
	b out		@ BTAC allocates branch and enters loop mode
idle:			@ power down is entered with GIC CPU IF still on which
	dsb		@ might get wfi instruction to complete before the
	wfi		@ CPU is shut down -- infinite loop
out:
	b idle
ENDPROC(default_sleep)
Q: What happens if for some reason the CPU didn't hit the targeted state in IDLE? Does the CPU keep looping here forever?
On OMAP4, we need to issue an additional interconnect barrier before WFI. How can we make provision for the same?
That's why I added a function pointer, (*sr_sleep), as a way to override the default loop behaviour, which is there to prevent the cpu from exiting wfi after a point of no return (GIC CPU IF is still on). It is just a tentative solution, so please feel free to comment on this. If you pop out from (*sr_sleep), the current code jumps through cpu_reset and emulates a reset, which may not be optimal. You might also want to execute from SRAM. I have to cater for that.
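[Editorial note: a hedged sketch of the kind of platform override this hook allows, addressing the OMAP4 barrier question above; omap_interconnect_barrier() is a hypothetical helper standing in for whatever the platform actually needs before WFI.]

static void omap4_sr_sleep(void)
{
	/* hypothetical: drain the interconnect before entering WFI */
	omap_interconnect_barrier();

	/* mirror default_sleep: never return once past the point
	 * of no return, even if a wakeup completes the wfi */
	for (;;)
		asm volatile("dsb\n\twfi" : : : "memory");
}

/* installed from platform init code: sr_sleep = omap4_sr_sleep; */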
Again, comments more than welcome.
Lorenzo
Hi Lorenzo,
only a few comments at this stage.
The sr_entry.S code is both exclusively .arm (using conditionals and long-distance adr, i.e. not Thumb2-clean), and it uses post-armv5 instructions (like wfi). Same for the other *.S code in the patch series. It's non-generic assembly within arch/arch/kernel/, wouldn't one better place this into arch/arm/mm/...-v[67].S ?
Then, sr_suspend/sr_resume; these functions are "C-exported" and are directly calling cpu_do_suspend/do_resume to pass a supplied buffer; I've done that for one iteration of the hibernation patch, yes, but that was a bit sneaky and Russell stated then the interface is cpu_suspend/cpu_resume not the proc funcs directly. Unless _those_ have been changed they're also unsafe to call from C funcs (clobber all regs). Couldn't you simply use cpu_suspend/resume directly ?
How much memory do all the pagedirs require that are being kept around ? Why does each core need a separate one, what would happen to just use a single "identity table" for all ? I understand you can't use swapper_pg_dir for idle, so a separate one has to be allocated, yet the question remains why per-cpu required ?
I'm currently transitioning between jobs; I will re-subscribe to arm-kernel under a different email address soon, as this one is likely to stop working in August. Sorry for the inconvenience and high-latency responses till then :(
FrankH.
On Thu, 7 Jul 2011, Lorenzo Pieralisi wrote:
In order to define a common idle interface for the kernel to enter low power modes, this patch provides include files and code that manages OS calls for low power entry and exit.
In the ARM world, processor HW is categorized as CPU and Cluster.
Corresponding states defined by this common IF are:
C-state [CPU state]:
0 - RUN MODE
1 - STANDBY
2 - DORMANT (not supported by this patch)
3 - SHUTDOWN
R-state [CLUSTER state]
0 - RUN
1 - STANDBY (not supported by this patch)
2 - L2 RAM retention
3 - SHUTDOWN
idle modes are entered through
cpu_enter_idle(cstate, rstate, flags) [sr_entry.S]
which could replace the current processor.idle entry in proc info, since it just executes wfi for shallow C-states.
Cluster low-power states are reached if and only if all the CPUs in the cluster are in low-power mode.
Only one cluster is supported at present, and the kernel infrastructure should be improved to allow multiple clusters to be defined and enumerated.
Current page table dir and stack pointers are saved using a per-cpu variable; this scheme breaks as soon as clusters are added to the kernel.
The code keeps a cpumask of alive CPUs and manages the state transitions accordingly.
Most of the variables needed when the CPU is powered down (MMU off) are allocated through a platform hook:
platform_context_pointer(unsigned int size)
that returns memory flat-mapped by this patchset as strongly ordered, to avoid toying with L2 cleaning when a single CPU enters low power.
Fully tested on dual-core A9 cluster.
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
 arch/arm/include/asm/sr_platform_api.h |   28 ++++
 arch/arm/kernel/sr_api.c               |  197 +++++++++++++++++++++++++++++
 arch/arm/kernel/sr_entry.S             |  213 ++++++++++++++++++++++++++++++++
 3 files changed, 438 insertions(+), 0 deletions(-)
 create mode 100644 arch/arm/include/asm/sr_platform_api.h
 create mode 100644 arch/arm/kernel/sr_api.c
 create mode 100644 arch/arm/kernel/sr_entry.S
diff --git a/arch/arm/include/asm/sr_platform_api.h b/arch/arm/include/asm/sr_platform_api.h
new file mode 100644
index 0000000..32367be
--- /dev/null
+++ b/arch/arm/include/asm/sr_platform_api.h
@@ -0,0 +1,28 @@
+/*
+ * Copyright (C) 2008-2011 ARM Limited
+ * Author(s): Jon Callan, Lorenzo Pieralisi
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef ASMARM_SR_PLATFORM_API_H
+#define ASMARM_SR_PLATFORM_API_H
+
+#define SR_SAVE_L2	(1 << 31)
+#define SR_SAVE_SCU	(1 << 30)
+#define SR_SAVE_ALL	(SR_SAVE_L2 | SR_SAVE_SCU)
+
+struct lp_state {
+	u16 cpu;
+	u16 cluster;
+};
+
+extern void (*sr_sleep)(void);
+extern void (*arch_reset_handler(void))(void);
+extern int cpu_enter_idle(unsigned cstate, unsigned rstate, unsigned flags);
+extern void *platform_context_pointer(unsigned int);
+#endif
diff --git a/arch/arm/kernel/sr_api.c b/arch/arm/kernel/sr_api.c
new file mode 100644
index 0000000..4e48f60
--- /dev/null
+++ b/arch/arm/kernel/sr_api.c
@@ -0,0 +1,197 @@
+/*
+ * Copyright (C) 2008-2011 ARM Limited
+ * Author(s): Jon Callan, Lorenzo Pieralisi
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/slab.h>
+#include <linux/errno.h>
+#include <linux/module.h>
+#include <linux/pm.h>
+#include <linux/sched.h>
+#include <linux/cache.h>
+#include <linux/cpu.h>
+
+#include <asm/cacheflush.h>
+#include <asm/tlbflush.h>
+#include <asm/system.h>
+#include <asm/cpu_pm.h>
+#include <asm/lb_lock.h>
+#include <asm/sr_platform_api.h>
+
+#include "sr_helpers.h"
+#include "sr.h"
+struct ____cacheline_aligned sr_main_table main_table = {
+	.num_clusters = SR_NR_CLUSTERS,
+	.cpu_idle_mask = { { CPU_BITS_NONE }, },
+};
+
+static int late_init(void);
+
+int sr_runtime_init(void)
+{
+	int ret;
+
+	context_memory_uncached =
+		platform_context_pointer(CONTEXT_SPACE_UNCACHED);
+	if (!context_memory_uncached)
+		return -ENOMEM;
+
+	ret = linux_sr_setup_translation_tables();
+	if (ret < 0)
+		return ret;
+
+	ret = sr_context_init();
+
+	return ret;
+}
+
+/* return the warm-boot entry point virtual address */
+void (*arch_reset_handler(void))(void)
+{
+	return (void (*)(void)) arch->reset;
+}
+
+static int late_init(void)
+{
+	int rc;
+	struct sr_cluster *cluster;
+	int cluster_index, cpu_index = sr_platform_get_cpu_index();
+
+	cluster_index = sr_platform_get_cluster_index();
+	cluster = main_table.cluster_table + cluster_index;
+	main_table.os_mmu_context[cluster_index][cpu_index] =
+					current->active_mm->pgd;
+	cpu_switch_mm(main_table.fw_mmu_context, current->active_mm);
+	rc = sr_platform_init();
+	cpu_switch_mm(main_table.os_mmu_context[cluster_index][cpu_index],
+			current->active_mm);
+	return rc;
+}
+
+void (*sr_sleep)(void) = default_sleep;
+
+void enter_idle(unsigned cstate, unsigned rstate, unsigned flags)
+{
+	struct sr_cpu *cpu;
+	struct sr_cluster *cluster;
+	cpumask_t *cpuidle_mask;
+	int cpu_index, cluster_index;
+
+	cluster_index = sr_platform_get_cluster_index();
+	cpu_index = sr_platform_get_cpu_index();
+	cpuidle_mask = &main_table.cpu_idle_mask[cluster_index];
+	/*
+	 * WARNING: cluster support will break if multiple clusters are
+	 * instantiated within the kernel. The current version works
+	 * with just one cluster and cpu_index is the hardware processor
+	 * id in cluster index 0.
+	 */
+	main_table.os_mmu_context[cluster_index][cpu_index] =
+					current->active_mm->pgd;
+	cpu_switch_mm(main_table.fw_mmu_context, current->active_mm);
+	local_flush_tlb_all();
+	cluster = main_table.cluster_table + cluster_index;
+	cpu = cluster->cpu_table + cpu_index;
+	get_spinlock(cpu_index, cluster->lock);
+	__cpu_set(cpu_index, cpuidle_mask);
+	if (cpumask_weight(cpuidle_mask) == num_online_cpus())
+		cluster->power_state = rstate;
+	cluster->cluster_down = (cluster->power_state >= 2);
+	cpu->power_state = cstate;
+	cpu_pm_enter();
+	if (cluster->cluster_down)
+		cpu_complex_pm_enter();
+	sr_platform_enter_cstate(cpu_index, cpu, cluster);
+	sr_save_context(cluster, cpu, flags);
+	release_spinlock(cpu_index, cluster->lock);
+	/* Point of no return */
+	(*sr_sleep)();
+	/*
+	 * In case we wanted sr_sleep to return
+	 * here is code to turn MMU off and go
+	 * the whole hog on the resume path
+	 */
+	cpu_reset((virt_to_phys((void *) arch->reset)));
+}
+
+void exit_idle(struct sr_main_table *mt)
+{
+	struct sr_cpu *cpu;
+	struct sr_cluster *cluster;
+	int cpu_index, cluster_index;
+
+	cpu_index = sr_platform_get_cpu_index();
+	cluster_index = sr_platform_get_cluster_index();
+	cluster = mt->cluster_table + cluster_index;
+	cpu = cluster->cpu_table + cpu_index;
+
+	PA(get_spinlock)(cpu_index, cluster->lock);
+	PA(sr_restore_context)(cluster, cpu);
+	sr_platform_leave_cstate(cpu_index, cpu, cluster);
+	if (cluster->cluster_down) {
+		cpu_complex_pm_exit();
+		cluster->cluster_down = 0;
+	}
+	cpu_pm_exit();
+	cpu_clear(cpu_index, main_table.cpu_idle_mask[cluster_index]);
+	cpu->power_state = 0;
+	cluster->power_state = 0;
+	release_spinlock(cpu_index, cluster->lock);
+	cpu_switch_mm(main_table.os_mmu_context[cluster_index][cpu_index],
+			current->active_mm);
+	local_flush_tlb_all();
+}
+
+int sr_init(void)
+{
+	if (lookup_arch()) {
+		printk(KERN_EMERG "SR INIT: Undetected architecture id\n");
+		BUG();
+	}
+	if (sr_runtime_init()) {
+		printk(KERN_EMERG "SR INIT: runtime init error\n");
+		BUG();
+	}
+	if (late_init()) {
+		printk(KERN_EMERG "SR INIT: late init error\n");
+		BUG();
+	}
+	return 0;
+}
+arch_initcall(sr_init);
diff --git a/arch/arm/kernel/sr_entry.S b/arch/arm/kernel/sr_entry.S
new file mode 100644
index 0000000..4fa9bef
--- /dev/null
+++ b/arch/arm/kernel/sr_entry.S
@@ -0,0 +1,213 @@
+/*
+ * Copyright (c) 2008-2011 ARM Ltd
+ * Author(s): Jon Callan, Lorenzo Pieralisi
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/linkage.h>
+#include <generated/asm-offsets.h>
+#include <asm/thread_info.h>
+#include <asm/memory.h>
+#include <asm/ptrace.h>
+#include <asm/glue-proc.h>
+#include <asm/assembler.h>
+#include <asm-generic/errno-base.h>
+#include <mach/entry-macro.S>
+
+	.text
+
+ENTRY(default_sleep)
+	b out			@ BTAC allocates branch and enters loop mode
+idle:				@ power down is entered with GIC CPU IF still on which
+	dsb			@ might get wfi instruction to complete before the
+	wfi			@ CPU is shut down -- infinite loop
+out:
+	b idle
+ENDPROC(default_sleep)
+
+ENTRY(sr_suspend)
+	b cpu_do_suspend
+ENDPROC(sr_suspend)
+
+ENTRY(sr_resume)
+	add lr, lr, #(PAGE_OFFSET - PLAT_PHYS_OFFSET)
+	stmfd sp!, {r4 - r11, lr}
+	ldr lr, =mmu_on
+	b cpu_do_resume
+mmu_on:
+	ldmfd sp!, {r4 - r11, pc}
+ENDPROC(sr_resume)
+
+/*
+ * This code is in the .data section to retrieve stack pointers stored in
+ * platform_cpu_stacks and platform_cpu_nc_stacks with a pc relative load.
+ * It cannot live in .text since that section can be treated as read-only
+ * and would break the code, which requires stack pointers to be saved on
+ * idle entry.
+ */
+
+	.data
+	.align
+	.global idle_save_context
+	.global idle_restore_context
+	.global idle_mt
+	.global platform_cpu_stacks
+	.global platform_cpu_nc_stacks
+
+/*
+ * idle entry point
+ * Must be called with IRQ disabled
+ * Idle states are differentiated between CPU and Cluster states
+ *
+ * r0 = cstate defines the CPU power state
+ * r1 = rstate defines the Cluster power state
+ * r2 = flags define what has to be saved
+ *
+ * C-STATE mapping
+ * 0 - run
+ * 1 - wfi (aka standby)
+ * 2 - dormant (not supported)
+ * 3 - shutdown
+ *
+ * R-STATE mapping
+ * 0 - run
+ * 1 - not supported
+ * 2 - L2 retention
+ * 3 - Off mode (every platform defines it, e.g. GIC power domain)
+ *
+ * Cluster low-power states might be hit if and only if all the CPUs making up
+ * the clusters are in some deep C-STATE
+ */
+ENTRY(cpu_enter_idle)
+	cmp r0, #2			@ this function can replace the idle function
+	wfilt				@ in the processor struct. If targeted power
+	movlt r0, #0			@ states are shallow ones it just executes wfi
+	movlt pc, lr			@ and returns
+	cmp r0, #3
+	cmpls r1, #3
+	mvnhi r0, #EINVAL
+	movhi pc, lr
+	stmfd sp!, {r4 - r12, lr}
+	stmfd sp, {r0, r1}
+#ifdef CONFIG_SMP
+	adr r0, platform_cpu_stacks
+	ALT_SMP(mrc p15, 0, r1, c0, c0, 5)
+	ALT_UP(mov r1, #0)
+	and r1, r1, #15
+	str sp, [r0, r1, lsl #2]	@ stack phys addr - save it for resume
+#else
+	str sp, platform_cpu_stacks
+#endif
+	sub sp, sp, #8
+	ldmfd sp!, {r0, r1}
+	bl enter_idle
+	mov r0, #0
+	ldmfd sp!, {r4 - r12, pc}
+ENDPROC(cpu_enter_idle)
+
+/*
+ * This hook, though not strictly necessary, provides an entry point where, if
+ * needed, stack pointers can be switched in case it is needed to improve L2
+ * retention management (uncached stack).
+ */
+ENTRY(sr_save_context)
+	adr r12, idle_save_context
+	ldr r12, [r12]
+	bx r12
+ENDPROC(sr_save_context)
+
+ENTRY(sr_reset_entry_point)
+	@ This is the entry point from the platform warm start code
+	@ It runs with MMU off straight from reset
+	setmode PSR_I_BIT | PSR_F_BIT | SVC_MODE, r0	@ set SVC, irqs off
+#ifdef CONFIG_SMP
+	adr r0, platform_cpu_nc_stacks
+	ALT_SMP(mrc p15, 0, r1, c0, c0, 5)
+	ALT_UP(mov r1, #0)
+	and r1, r1, #15
+	ldr r0, [r0, r1, lsl #2]	@ stack phys addr
+#else
+	ldr r0, platform_cpu_nc_stacks	@ stack phys addr
+#endif
+	mov sp, r0
+	adr r0, idle_mt			@ get phys address of main table and pass it on
+	ldr r0, [r0]
+	ldr lr, =return_from_idle
+	adr r1, resume
+	ldr r1, [r1]
+	bx r1
+return_from_idle:
+	@ return to enter_idle caller, with success
+	mov r0, #0
+	ldmfd sp!, {r4 - r12, pc}	@ return from idle - registers saved in
+ENDPROC(sr_reset_entry_point)		@ cpu_enter_idle() are still there
+
+ENTRY(sr_restore_context)
+	add lr, lr, #(PAGE_OFFSET - PLAT_PHYS_OFFSET)
+	stmfd sp!, {r4, lr}
+	adr r12, idle_restore_context
+	ldr r12, [r12]
+	ldr lr, =switch_stack
+	bx r12
+switch_stack:
+	@ CPU context restored, time to switch to Linux stack and pop out
+#ifdef CONFIG_SMP
+	adr r0, platform_cpu_stacks
+	ALT_SMP(mrc p15, 0, r1, c0, c0, 5)
+	ALT_UP(mov r1, #0)
+	and r1, r1, #15
+	ldr r0, [r0, r1, lsl #2]	@ top stack addr
+#else
+	ldr r0, platform_cpu_stacks	@ top stack addr
+#endif
+	mov r3, r0
+#ifdef CONFIG_SMP
+	adr r0, platform_cpu_nc_stacks
+	ALT_SMP(mrc p15, 0, r1, c0, c0, 5)
+	ALT_UP(mov r1, #0)
+	and r1, r1, #15
+	ldr r0, [r0, r1, lsl #2]	@ non-cacheable stack phys addr
+#else
+	ldr r0, platform_cpu_nc_stacks	@ non-cacheable stack phys addr
+#endif
+	sub r2, r0, sp
+	sub r0, r3, r2
+	mov r1, sp
+	mov r4, r0
+	bl memcpy			@ copy stack used in resume to current stack
+	mov sp, r4
+	bl cpu_init			@ init banked registers
+	ldmfd sp!, {r4, pc}
+ENDPROC(sr_restore_context)
+
+idle_save_context:
+	.long 0
+idle_restore_context:
+	.long 0
+idle_mt:
+	.long main_table - PAGE_OFFSET + PLAT_PHYS_OFFSET
+resume:
+	.long exit_idle - PAGE_OFFSET + PLAT_PHYS_OFFSET
+platform_cpu_stacks:
+	.rept CONFIG_NR_CPUS
+	.long 0				@ preserve stack phys ptr here
+	.endr
+platform_cpu_nc_stacks:
+	.rept CONFIG_NR_CPUS
+	.long 0				@ preserve uncached
+					@ stack phys ptr here
+	.endr
+
+	.end
--
1.7.4.4
On Fri, Jul 08, 2011 at 05:12:22PM +0100, Frank Hofmann wrote:
Hi Lorenzo,
only a few comments at this stage.
The sr_entry.S code is both exclusively .arm (using conditionals and long-distance adr, i.e. not Thumb2-clean), and it uses post-armv5 instructions (like wfi). The same goes for the other *.S code in the patch series. It's non-generic assembly within arch/arm/kernel/; wouldn't it be better to place this into arch/arm/mm/...-v[67].S ?
Yes, it is certainly something I should improve to make it more generic.
Then, sr_suspend/sr_resume: these functions are "C-exported" and directly call cpu_do_suspend/do_resume to pass a supplied buffer. I've done that for one iteration of the hibernation patch, yes, but that was a bit sneaky, and Russell stated at the time that the interface is cpu_suspend/cpu_resume, not the proc funcs directly. Unless _those_ have been changed, they're also unsafe to call from C funcs (they clobber all regs). Couldn't you simply use cpu_suspend/resume directly?
Well, the short answer is no. On SMP we do need to save CPU registers, but if just one single CPU is shut down, L2 is still on. cpu_suspend saves regs on the stack, which has to be cleaned from L2 before shutting a CPU down, which makes things more complicated than they should be. That's why I use cpu_do_{suspend,resume}, so that I can choose the memory used to save the registers (uncached), but that's an abuse of the API. I agree this is sneaky, but I wanted to avoid duplicating code that just saves registers. Maybe moving to an uncached stack might solve this problem if we want to reuse cpu_suspend from CPU idle, which is still an open point.
How much memory do all the pagedirs that are being kept around require? Why does each core need a separate one; what would happen if a single "identity table" were used for all? I understand you can't use swapper_pg_dir for idle, so a separate one has to be allocated, yet the question remains why a per-cpu one is required.
I just allocate a 1:1 mapping once, cloned from init_mm and reused for all CPUs, with an additional mapping. The array of pointers is there to save the pgdir on idle entry, one per CPU.
I'm currently transitioning between jobs; I will re-subscribe to arm-kernel under a different email address soon, as this one is likely to stop working in August. Sorry for the inconvenience and high-latency responses till then :(
FrankH.
May I thank you very much for the review in the interim,
Lorenzo
On Mon, 11 Jul 2011, Lorenzo Pieralisi wrote:
On Fri, Jul 08, 2011 at 05:12:22PM +0100, Frank Hofmann wrote:
Hi Lorenzo,
only a few comments at this stage.
The sr_entry.S code is both exclusively .arm (using conditionals and long-distance adr, i.e. not Thumb2-clean), and it uses post-armv5 instructions (like wfi). The same goes for the other *.S code in the patch series. It's non-generic assembly within arch/arm/kernel/; wouldn't it be better to place this into arch/arm/mm/...-v[67].S ?
Yes, it is certainly something I should improve to make it more generic.
Then, sr_suspend/sr_resume: these functions are "C-exported" and directly call cpu_do_suspend/do_resume to pass a supplied buffer. I've done that for one iteration of the hibernation patch, yes, but that was a bit sneaky, and Russell stated at the time that the interface is cpu_suspend/cpu_resume, not the proc funcs directly. Unless _those_ have been changed, they're also unsafe to call from C funcs (they clobber all regs). Couldn't you simply use cpu_suspend/resume directly?
Well, the short answer is no. On SMP we do need to save CPU registers, but if just one single CPU is shut down, L2 is still on. cpu_suspend saves regs on the stack, which has to be cleaned from L2 before shutting a CPU down, which makes things more complicated than they should be. That's why I use cpu_do_{suspend,resume}, so that I can choose the memory used to save the registers (uncached), but that's an abuse of the API.
There's the option to use switch_stack(), as Will Deacon has provided via his kexec series. Agreed, it's not mainline yet, but it's good for ideas. Will's diff is here:
http://www.spinics.net/lists/arm-kernel/msg127951.html
and that one restores the original sp after the "stack switched" function returns.
If you use that to switch to a different (uncached) stack before doing cpu_suspend (and the 'idle' finisher through that), wouldn't that solve the problem? When the suspend returns, the above would restore your cached stack pointer.
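To sketch the idea (assuming a switch_stack(fn, arg, new_sp) helper with roughly the shape of the one in Will's series - the real signature there may differ - and with the names below purely illustrative):

	/*
	 * Sketch only: switch_stack() is assumed from Will's kexec series;
	 * nc_stack_top would be the top of a per-cpu uncached stack obtained
	 * from platform code; platform_idle_finisher is hypothetical.
	 */
	static void suspend_on_uncached_stack(unsigned long arg)
	{
		/* regs and CPU state are saved on the (uncached) stack we
		 * are now running on, so nothing here needs cleaning from
		 * L2; cpu_suspend's calling convention is shown
		 * schematically, as it was being reworked at the time */
		cpu_suspend(arg, platform_idle_finisher);
	}

	static void enter_lowpower(unsigned long arg, void *nc_stack_top)
	{
		/* run the suspend on the uncached stack; the original sp
		 * is restored when the switched function returns */
		switch_stack(suspend_on_uncached_stack, arg, nc_stack_top);
	}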
I agree this is sneaky, but I wanted to avoid duplicating code that just saves registers. Maybe moving to an uncached stack might solve this problem if we want to reuse cpu_suspend from cpu idle, which is still an open point.
(me speedtyping above ...) As you say ;-)
How much memory do all the pagedirs that are being kept around require? Why does each core need a separate one; what would happen if a single "identity table" were used for all? I understand you can't use swapper_pg_dir for idle, so a separate one has to be allocated, yet the question remains why a per-cpu one is required.
I just allocate a 1:1 mapping once, cloned from init_mm and reused for all CPUs, with an additional mapping.
I have started to wonder whether this facility (a 1:1-mapped pagedir for kernel text/data, or maybe even "non-user text/data") could/should be made available on a global scale; after all, kexec, reset, hibernate and some forms of idle/suspend all require "some sort of that" to go through MMU reinitialization. I'm unfortunately not deep enough into the VM subsystem to say how exactly this would best have to look.
I'm merely mentioning this because it looks like, while everyone creates 1:1 mappings, there are ever so slight differences in how the 1:1 mapping(s) are created; we've seen:
* (re)using current->active_mm->pgd
* (re)using swapper_pg_dir (different lengths for the 1:1 section)
* using a separately-allocated/initialized pgdir
Such proliferation usually means there's a justified need for that kind of thing. Just the gritty details haven't been sorted out so that everyone with that need could do _the same_ thing instead of reinventing various square wheels.
The main reason why I've used swapper_pg_dir in the hibernation patch is that it's the only static one in the system (the hibernation hooks are in irqs-off codepaths, and pgd_alloc isn't a good idea then), and it happens to have "clear" lower sections (in the user area), so one is not actually substituting anything when creating 1:1 mappings (and getting rid of them restores the "pristine" state). But this assumption only holds as long as swapper_pg_dir's use isn't changed. So a little creepy feeling remains.
A generic "reset_mmu_pg_dir", wouldn't that be a good thing to have ?
The array of pointers is there to save pgdir on idle entry, one per-cpu.
If you're going through cpu_{do_}suspend/resume, the TTBRs are saved/restored anyway; what do you need to keep the virtual addresses around for?
I'm currently transitioning between jobs; I will re-subscribe to arm-kernel under a different email address soon, as this one is likely to stop working in August. Sorry for the inconvenience and high-latency responses till then :(
FrankH.
May I thank you very much for the review in the interim,
You're welcome. FrankH.
Lorenzo
On Mon, Jul 11, 2011 at 03:31:30PM +0100, Frank Hofmann wrote:
On Mon, 11 Jul 2011, Lorenzo Pieralisi wrote:
On Fri, Jul 08, 2011 at 05:12:22PM +0100, Frank Hofmann wrote:
Hi Lorenzo,
only a few comments at this stage.
[...]
How much memory do all the pagedirs that are being kept around require? Why does each core need a separate one; what would happen if a single "identity table" were used for all? I understand you can't use swapper_pg_dir for idle, so a separate one has to be allocated, yet the question remains why a per-cpu one is required.
I just allocate a 1:1 mapping once, cloned from init_mm and reused for all CPUs, with an additional mapping.
I have started to wonder whether this facility (a 1:1-mapped pagedir for kernel text/data, or maybe even "non-user text/data") could/should be made available on a global scale; after all, kexec, reset, hibernate and some forms of idle/suspend all require "some sort of that" to go through MMU reinitialization. I'm unfortunately not deep enough into the VM subsystem to say how exactly this would best have to look.
I'm merely mentioning this because it looks like, while everyone creates 1:1 mappings, there are ever so slight differences in how the 1:1 mapping(s) are created; we've seen:
- (re)using current->active_mm->pgd
- (re)using swapper_pg_dir (different lengths for the 1:1 section)
- using a separately-allocated/initialized pgdir
Such proliferation usually means there's a justified need for that kind of thing. Just the gritty details haven't been sorted out so that everyone with that need could do _the same_ thing instead of reinventing various square wheels.
The main reason why I've used swapper_pg_dir in the hibernation patch is that it's the only static one in the system (the hibernation hooks are in irqs-off codepaths, and pgd_alloc isn't a good idea then), and it happens to have "clear" lower sections (in the user area), so one is not actually substituting anything when creating 1:1 mappings (and getting rid of them restores the "pristine" state). But this assumption only holds as long as swapper_pg_dir's use isn't changed. So a little creepy feeling remains.
A generic "reset_mmu_pg_dir" - wouldn't that be a good thing to have?
It is a fair point, and certainly worth discussing. I will give it some thought and get back to you.
The array of pointers is there to save pgdir on idle entry, one per-cpu.
If you're going through cpu_{do_}suspend/resume, the TTBRs are saved/restored anyway, what do you need to keep the virtual addresses around for ?
Because I switch mm before calling suspend, which is called with a cloned pgdir. I am not sure I can avoid that.
I'm currently transitioning between jobs; I will re-subscribe to arm-kernel under a different email address soon, as this one is likely to stop working in August. Sorry for the inconvenience and high-latency responses till then :(
FrankH.
May I thank you very much for the review in the interim,
You're welcome. FrankH.
Thank you, Lorenzo
On Mon, 11 Jul 2011, Lorenzo Pieralisi wrote:
[ ... ]
The array of pointers is there to save pgdir on idle entry, one per-cpu.
If you're going through cpu_{do_}suspend/resume, the TTBRs are saved/restored anyway, what do you need to keep the virtual addresses around for ?
Because I switch mm before calling suspend, which is called with a cloned pgdir. I am not sure I can avoid that.
On resume, you'll be restoring the same thread that was previously running, right? If so, all you do there is copy current->active_mm->pgd to some other place?
Also, if you'd be using cpu_suspend(), would there still be a need for cpu_switch_mm() before ? It'd rather be a case of possibly calling that before the MMU-off sequence / cpu_resume() ?
Or is it that you use the new pgdir to make a memory region uncacheable ?
FrankH.
On Mon, Jul 11, 2011 at 05:57:29PM +0100, Frank Hofmann wrote:
On Mon, 11 Jul 2011, Lorenzo Pieralisi wrote:
[ ... ]
The array of pointers is there to save pgdir on idle entry, one per-cpu.
If you're going through cpu_{do_}suspend/resume, the TTBRs are saved/restored anyway, what do you need to keep the virtual addresses around for ?
Because I switch mm before calling suspend, which is called with a cloned pgdir. I am not sure I can avoid that.
On resume, you'll be restoring the same thread as was previously running, right ? If so, all you do there is copying current->active_mm->pgd to some other place ?
You are right; I will remove that code, unless I can use the saved value on resume, since I cannot rely on current being retrievable at that point.
Also, if you'd be using cpu_suspend(), would there still be a need for cpu_switch_mm() before ? It'd rather be a case of possibly calling that before the MMU-off sequence / cpu_resume() ?
Or is it that you use the new pgdir to make a memory region uncacheable ?
Yes, that's the reason why I switch mm (e.g. uncacheable memory used to save registers through cpu_do_suspend and mapped in the cloned pgdir at boot, plus some control variables used when the MMU is off).
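In code terms, this is the dance the enter_idle()/exit_idle() functions of the patch perform, condensed (fw_pgd stands in for main_table.fw_mmu_context; the helper itself is illustrative):

	/* Condensed from the enter_idle()/exit_idle() flow in this patch. */
	static void idle_with_cloned_pgdir(pgd_t *fw_pgd)
	{
		pgd_t *saved_pgd = current->active_mm->pgd;	/* per-cpu in the patch */

		cpu_switch_mm(fw_pgd, current->active_mm);	/* cloned tables also map
								 * the uncached context area */
		local_flush_tlb_all();

		/* ... cpu_do_suspend state lands in uncached memory,
		 * then the CPU powers down and resumes ... */

		cpu_switch_mm(saved_pgd, current->active_mm);	/* back to the OS tables */
		local_flush_tlb_all();
	}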
Thanks, Lorenzo
On Mon, Jul 11, 2011 at 03:00:47PM +0100, Lorenzo Pieralisi wrote:
Well, the short answer is no. On SMP we do need to save CPU registers, but if just one single CPU is shut down, L2 is still on. cpu_suspend saves regs on the stack, which has to be cleaned from L2 before shutting a CPU down, which makes things more complicated than they should be.
Hang on. Please explain something to me here. You've mentioned a few times that cpu_suspend() can't be used because of the L2 cache. Why is this the case?
OMAP appears to have code in its sleep path - which has been converted to cpu_suspend() support - to deal with the L2 issues.
However, lets recap. What we do in cpu_suspend() in order is:
- Save the ABI registers onto the stack, and some private state
- Save the CPU specific state onto the stack
- Flush the L1 cache
- Call the platform specific suspend finisher
On resume, with the MMU and caches off:
- Platform defined entry point is called, which _may_ be cpu_resume directly.
- Platform initial code is executed to do whatever that requires
- cpu_resume will be called
- cpu_resume loads the previously saved private state
- The CPU specific state is restored
- Page table is modified to permit 1:1 mapping for MMU enable
- MMU is enabled with caches disabled
- Page table modification is undone
- Caches are enabled in the main control register
- CPU exception mode stacks are reinitialized
- CPU specific init function is called
- ABI registers are popped off and 'cpu_suspend' function returns
So, as far as L2 goes, in the suspend finisher:
- If L2 state is lost, the finisher needs to clean dirty data from L2 to ensure that it is preserved in RAM. Note: There is no need to disable or even invalidate the L2 cache as we should not be writing any data in the finisher function which we later need after resume.
- If L2 state is not lost, the finisher needs to clean the saved state as a minimum, to ensure that this is visible when the main control register C bit is clear. The easiest way to do that is to find the top of stack via current_thread_info() - we have a macro for that - and then add THREAD_SIZE to find the top of stack. 'sp' will be the current bottom of stack.
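As a sketch of that range computation (outer_clean_range() is the existing L2 maintenance hook; the helper itself is illustrative):

	/* Clean only the saved state from L2: from the current stack
	 * bottom (sp) up to the top of the kernel stack. */
	#include <linux/thread_info.h>
	#include <asm/outercache.h>

	static void clean_saved_state_from_l2(void)
	{
		register unsigned long sp asm("sp");
		unsigned long top =
			(unsigned long)current_thread_info() + THREAD_SIZE;

		outer_clean_range(__pa(sp), __pa(top));
	}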
In the resume initial code:
- If L2 state was lost, the L2 configuration needs to be restored. This generally needs to happen before cpu_resume is called:
  - there are CPUs which need L2 setup before the MMU is enabled.
  - OMAP3 currently does this in its assembly, which is convenient to allow it to make the SMI calls to the secure world. The same will be true of any CPU running in non-secure mode.
- If L2 state was not lost, and the platform chooses not to clean and invalidate the ABI registers from the stack, and the platform restores the L2 configuration before calling cpu_resume, then the ABI registers will be read out of L2 on return if that's where they are - at that point everything will be set up correctly.
This will give the greatest performance, which is important for CPU idle use of these code paths.
Now, considering SMP, there's an issue here: do we know at the point where one CPU goes down whether L2 state will be lost?
If the answer is that state will not be lost, we can do the minimum. If all L2 state is lost, we need to do as above. If we don't know the answer, then we have to assume that L2 state will be lost.
But wait - L2 cache (or more accurately, the outer cache) is common between CPUs in an SMP system. So, if we're _not_ the last CPU to go down, then we assume that L2 state will not be lost. It is the last CPU's responsibility to deal with L2 state when it goes into a PM mode that results in L2 state being lost.
Lastly, should generic code deal with flushing L2 and setting it back up on resume? A couple of points there:
1. Will generic code know whether L2 state will be lost, or should it assume that L2 state is always lost and do a full L2 flush. That seems wasteful if we have another CPU still running (which would also have to flush L2.)
2. L2 configuration registers are not accessible to CPUs operating in non-secure mode like OMAPs. Generic code on these CPUs has _no_ way to restore and re-enable the L2 cache. It needs to make implementation specific SMI calls to achieve that.
So, I believe the answer to that is no. However, I think we can still do a change to improve the situation:
1. Pass in r2 and r3 to the suspend finisher the bottom and top of the stack which needs to be cleaned from L2. This covers the saved state but not the ABI registers.
2. Mandate that L2 configuration is to be restored by platforms in their pre-cpu_resume code so L2 is available when the C bit is set.
On Mon, Jul 11, 2011 at 11:40 AM, Russell King - ARM Linux linux@arm.linux.org.uk wrote:
On Mon, Jul 11, 2011 at 03:00:47PM +0100, Lorenzo Pieralisi wrote:
Well, the short answer is no. On SMP we do need to save CPU registers, but if just one single CPU is shut down, L2 is still on. cpu_suspend saves regs on the stack, which has to be cleaned from L2 before shutting a CPU down, which makes things more complicated than they should be.
Hang on. Please explain something to me here. You've mentioned a few times that cpu_suspend() can't be used because of the L2 cache. Why is this the case?
OMAP appears to have code in its sleep path - which has been converted to cpu_suspend() support - to deal with the L2 issues.
OMAP is very different, because it doesn't use cpu_suspend. It saves its state to SAR RAM, which is mapped uncached, which avoids L2 problems.
However, lets recap. What we do in cpu_suspend() in order is:
- Save the ABI registers onto the stack, and some private state
- Save the CPU specific state onto the stack
- Flush the L1 cache
- Call the platform specific suspend finisher
On resume, with the MMU and caches off:
- Platform defined entry point is called, which _may_ be cpu_resume
directly.
- Platform initial code is executed to do whatever that requires
- cpu_resume will be called
- cpu_resume loads the previously saved private state
- The CPU specific state is restored
- Page table is modified to permit 1:1 mapping for MMU enable
- MMU is enabled with caches disabled
- Page table modification is undone
- Caches are enabled in the main control register
- CPU exception mode stacks are reinitialized
- CPU specific init function is called
- ABI registers are popped off and 'cpu_suspend' function returns
So, as far as L2 goes, in the suspend finisher:
- If L2 state is lost, the finisher needs to clean dirty data from L2
to ensure that it is preserved in RAM. Note: There is no need to disable or even invalidate the L2 cache as we should not be writing any data in the finisher function which we later need after resume.
- If L2 state is not lost, the finisher needs to clean the saved state
as a minimum, to ensure that this is visible when the main control register C bit is clear. The easiest way to do that is to find the top of stack via current_thread_info() - we have a macro for that - and then add THREAD_SIZE to find the top of stack. 'sp' will be the current bottom of stack.
The sleep_save_sp location also needs to be cleaned.
In the resume initial code:
- If L2 state was lost, the L2 configuration needs to be restored.
This generally needs to happen before cpu_resume is called:
  - there are CPUs which need L2 setup before the MMU is enabled.
  - OMAP3 currently does this in its assembly, which is convenient to allow it to make the SMI calls to the secure world. The same will be true of any CPU running in non-secure mode.
- If L2 state was not lost, and the platform chooses not to clean and
invalidate the ABI registers from the stack, and the platform restores the L2 configuration before calling cpu_resume, then the ABI registers will be read out of L2 on return if that's where they are - at that point everything will be setup correctly.
This will give the greatest performance, which is important for CPU idle use of these code paths.
Now, considering SMP, there's an issue here: do we know at the point where one CPU goes down whether L2 state will be lost?
CPU idle will need a voting mechanism between the two CPUs, and the second CPU to go down needs to determine the minimum state supported by both CPUs. Any time a CPU goes down, we should know for sure that either the L2 will not be lost, or the L2 might be lost if the other CPU goes idle.
In practice, I don't think many SoCs will support low power modes that lose the L2 during idle, only during suspend. Tegra and OMAP4 both don't support the modes where the L2 is powered down in idle.
If the answer is that state will not be lost, we can do the minimum. If all L2 state is lost, we need to do as above. If we don't know the answer, then we have to assume that L2 state will be lost.
But wait - L2 cache (or more accurately, the outer cache) is common between CPUs in an SMP system. So, if we're _not_ the last CPU to go down, then we assume that L2 state will not be lost. It is the last CPU's responsibility to deal with L2 state when it goes into a PM mode that results in L2 state being lost.
Lastly, should generic code deal with flushing L2 and setting it back up on resume? A couple of points there:
- Will generic code know whether L2 state will be lost, or should it
assume that L2 state is always lost and do a full L2 flush. That seems wasteful if we have another CPU still running (which would also have to flush L2.)
- L2 configuration registers are not accessible to CPUs operating in
non-secure mode like OMAPs. Generic code on these CPUs has _no_ way to restore and re-enable the L2 cache. It needs to make implementation specific SMI calls to achieve that.
So, I believe the answer to that is no. However, I think we can still do a change to improve the situation:
- Pass in r2 and r3 to the suspend finisher the bottom and top of the
stack which needs to be cleaned from L2. This covers the saved state but not the ABI registers.
- Mandate that L2 configuration is to be restored by platforms in their
pre-cpu_resume code so L2 is available when the C bit is set.
On Mon, Jul 11, 2011 at 11:51:00AM -0700, Colin Cross wrote:
On Mon, Jul 11, 2011 at 11:40 AM, Russell King - ARM Linux linux@arm.linux.org.uk wrote:
On Mon, Jul 11, 2011 at 03:00:47PM +0100, Lorenzo Pieralisi wrote:
Well, the short answer is no. On SMP we do need to save CPU registers, but if just one single CPU is shut down, L2 is still on. cpu_suspend saves regs on the stack, which has to be cleaned from L2 before shutting a CPU down, which makes things more complicated than they should be.
Hang on. Please explain something to me here. You've mentioned a few times that cpu_suspend() can't be used because of the L2 cache. Why is this the case?
OMAP appears to have code in its sleep path - which has been converted to cpu_suspend() support - to deal with the L2 issues.
OMAP is very different, because it doesn't use cpu_suspend. It saves its state to SAR RAM, which is mapped uncached, which avoids L2 problems.
I'm afraid your information is out of date. See:
http://ftp.arm.linux.org.uk/git/?p=linux-2.6-arm.git;a=commitdiff;h=076f...
 arch/arm/mach-omap2/pm34xx.c    |   47 +++----------
 arch/arm/mach-omap2/sleep34xx.S |  143 +--------------------------------------
 2 files changed, 13 insertions(+), 177 deletions(-)
for the trivial conversion, and:
http://ftp.arm.linux.org.uk/git/?p=linux-2.6-arm.git;a=commit;h=46e130d2...
 arch/arm/mach-omap2/pm.h        |   20 ++-
 arch/arm/mach-omap2/pm34xx.c    |   20 ++--
 arch/arm/mach-omap2/sleep34xx.S |  303 ++++++++++++++++++++++----------------
 arch/arm/plat-omap/sram.c       |   15 +--
 4 files changed, 206 insertions(+), 152 deletions(-)
for the cleanup of the SRAM code, most of which was found not to need being in SRAM. That's all tested as working (see the tested-by's) on a range of OMAP3 platforms.
The whole series is at:
http://ftp.arm.linux.org.uk/git/?p=linux-2.6-arm.git;a=shortlog;h=refs/h...
There is only one currently merged SoC which hasn't been converted: shmobile. That's in progress, and there may already be patches for it. There's no blockers on that as far as I know other than availability of time.
The sleep_save_sp location also needs to be cleaned.
Right.
In the resume initial code:
- If L2 state was lost, the L2 configuration needs to be restored.
This generally needs to happen before cpu_resume is called:
  - there are CPUs which need L2 setup before the MMU is enabled.
  - OMAP3 currently does this in its assembly, which is convenient to allow it to make the SMI calls to the secure world. The same will be true of any CPU running in non-secure mode.
- If L2 state was not lost, and the platform chooses not to clean and
invalidate the ABI registers from the stack, and the platform restores the L2 configuration before calling cpu_resume, then the ABI registers will be read out of L2 on return if that's where they are - at that point everything will be setup correctly.
This will give the greatest performance, which is important for CPU idle use of these code paths.
Now, considering SMP, there's an issue here: do we know at the point where one CPU goes down whether L2 state will be lost?
CPU idle will need a voting mechanism between the two CPUs, and the second CPU to go down needs to determine the minimum state supported by both CPUs. Any time a CPU goes down, we should know for sure that either the L2 will not be lost, or the L2 might be lost if the other CPU goes idle.
No it doesn't. As I've said: L2 is shared between the two CPUs.
If L2 state is preserved, then the only thing that needs cleaning from L2 is the state which must be accessible to bring the system back up, which is the CPU specific state, the cpu_suspend state and the sleep_save_sp store.
If L2 state is lost, the _first_ CPU to go down in a two-CPU system must do as above. If nothing else happens, L2 state will be preserved. So there's absolutely no point whatsoever in the first CPU cleaning the entire L2 state - the second CPU will still be scribbling into L2 at that point.
However, when the second CPU goes down, that's when the L2 state would need to be preserved, if the L2 is going to lose power.
So, let's recap. In a two CPU system, when precisely one CPU goes into sleep mode, only the minimum L2 state needs cleaning. Nothing else. Only when the second CPU goes down does it matter whether L2 state needs to be preserved by cleaning or not because that is when L2 data loss may occur.
In practice, I don't think many SoCs will support low power modes that lose the L2 during idle, only during suspend. Tegra and OMAP4 both don't support the modes where the L2 is powered down in idle.
That's good, that means less to worry about. For idle, all we need to care about is the CPU specific state, the cpu_suspend state and the sleep_save_sp store. Great, that means things can be even simpler - cpu_suspend() just needs to know whether we're calling it because of idle, or whether we're calling it because of system suspend.
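A sketch of that split (the enum and helper below are hypothetical; outer_clean_range()/outer_flush_all() are the existing L2 maintenance hooks):

	/* Hypothetical helper showing how little L2 work the idle case needs. */
	enum sleep_kind { SLEEP_IDLE, SLEEP_SYSTEM_SUSPEND };

	static void l2_maintain_for_sleep(enum sleep_kind kind,
					  unsigned long state, size_t len)
	{
		if (kind == SLEEP_IDLE)
			/* L2 retained: only the saved state must reach RAM */
			outer_clean_range(__pa(state), __pa(state + len));
		else
			/* L2 contents may be lost: push everything dirty to RAM */
			outer_flush_all();
	}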
On Mon, Jul 11, 2011 at 12:19 PM, Russell King - ARM Linux linux@arm.linux.org.uk wrote:
On Mon, Jul 11, 2011 at 11:51:00AM -0700, Colin Cross wrote:
On Mon, Jul 11, 2011 at 11:40 AM, Russell King - ARM Linux linux@arm.linux.org.uk wrote:
On Mon, Jul 11, 2011 at 03:00:47PM +0100, Lorenzo Pieralisi wrote:
Well, the short answer is no. On SMP we do need to save CPU registers, but if just one single CPU is shut down, L2 is still on. cpu_suspend saves regs on the stack, which has to be cleaned from L2 before shutting a CPU down, which makes things more complicated than they should be.
Hang on. Please explain something to me here. You've mentioned a few times that cpu_suspend() can't be used because of the L2 cache. Why is this the case?
OMAP appears to have code in its sleep path - which has been converted to cpu_suspend() support - to deal with the L2 issues.
OMAP is very different, because it doesn't use cpu_suspend. It saves its state to SAR RAM, which is mapped uncached, which avoids L2 problems.
I'm afraid your information is out of date. See:
Oops, you're right - I'm working with a TI branch that has OMAP4 cpuidle support, which has not been converted.
http://ftp.arm.linux.org.uk/git/?p=linux-2.6-arm.git;a=commitdiff;h=076f...
 arch/arm/mach-omap2/pm34xx.c    |   47 +++----------
 arch/arm/mach-omap2/sleep34xx.S |  143 +--------------------------------------
 2 files changed, 13 insertions(+), 177 deletions(-)
for the trivial conversion, and:
http://ftp.arm.linux.org.uk/git/?p=linux-2.6-arm.git;a=commit;h=46e130d2...
 arch/arm/mach-omap2/pm.h        |   20 ++-
 arch/arm/mach-omap2/pm34xx.c    |   20 ++--
 arch/arm/mach-omap2/sleep34xx.S |  303 ++++++++++++++++++++++----------------
 arch/arm/plat-omap/sram.c       |   15 +--
 4 files changed, 206 insertions(+), 152 deletions(-)
for the cleanup of the SRAM code, most of which was found not to need being in SRAM. That's all tested as working (see the tested-by's) on a range of OMAP3 platforms.
The whole series is at:
http://ftp.arm.linux.org.uk/git/?p=linux-2.6-arm.git;a=shortlog;h=refs/h...
There is only one currently merged SoC which hasn't been converted: shmobile. That's in progress, and there may already be patches for it. There's no blockers on that as far as I know other than availability of time.
The sleep_save_sp location also needs to be cleaned.
Right.
In the resume initial code:
- If L2 state was lost, the L2 configuration needs to be restored.
This generally needs to happen before cpu_resume is called:
  - there are CPUs which need L2 setup before the MMU is enabled.
  - OMAP3 currently does this in its assembly, which is convenient to allow it to make the SMI calls to the secure world. The same will be true of any CPU running in non-secure mode.
- If L2 state was not lost, and the platform chooses not to clean and
invalidate the ABI registers from the stack, and the platform restores the L2 configuration before calling cpu_resume, then the ABI registers will be read out of L2 on return if that's where they are - at that point everything will be setup correctly.
This will give the greatest performance, which is important for CPU idle use of these code paths.
Now, considering SMP, there's an issue here: do we know at the point where one CPU goes down whether L2 state will be lost?
CPU idle will need a voting mechanism between the two CPUs, and the second CPU to go down needs to determine the minimum state supported by both CPUs. Any time a CPU goes down, we should know for sure that either the L2 will not be lost, or the L2 might be lost if the other CPU goes idle.
No it doesn't. As I've said: L2 is shared between the two CPUs.
If L2 state is preserved, then the only thing that needs cleaning from L2 is the state which must be accessible to bring the system back up, which is the CPU specific state, the cpu_suspend state and the sleep_save_sp store.
If L2 state is lost, the _first_ CPU to go down in a two-CPU system must do as above. If nothing else happens, L2 state will be preserved. So there's absolutely no point whatsoever in the first CPU cleaning the entire L2 state - the second CPU will still be scribbling into L2 at that point.
However, when the second CPU goes down, that's when the L2 state would need to be preserved, if the L2 is going to lose power.
So, let's recap. In a two CPU system, when precisely one CPU goes into sleep mode, only the minimum L2 state needs cleaning. Nothing else. Only when the second CPU goes down does it matter whether L2 state needs to be preserved by cleaning or not because that is when L2 data loss may occur.
Sorry, I was confusing cleaning state from L2 with cleaning the whole L2. The first CPU always needs to clean its saved state from L2; the second CPU might need to clean the whole L2.
In practice, I don't think many SoCs will support low power modes that lose the L2 during idle, only during suspend. Tegra and OMAP4 both don't support the modes where the L2 is powered down in idle.
That's good, that means less to worry about. For idle, all we need to care about is the CPU specific state, the cpu_suspend state and the sleep_save_sp store. Great, that means things can be even simpler - cpu_suspend() just needs to know whether we're calling it because of idle, or whether we're calling it because of system suspend.
On 7/11/2011 12:19 PM, Russell King - ARM Linux wrote:
On Mon, Jul 11, 2011 at 11:51:00AM -0700, Colin Cross wrote:
On Mon, Jul 11, 2011 at 11:40 AM, Russell King - ARM Linux linux@arm.linux.org.uk wrote:
On Mon, Jul 11, 2011 at 03:00:47PM +0100, Lorenzo Pieralisi wrote:
Well, the short answer is no. On SMP we do need to save CPU registers, but if just one single CPU is shut down, L2 is still on. cpu_suspend saves regs on the stack, which has to be cleaned from L2 before shutting a CPU down, which makes things more complicated than they should be.
Hang on. Please explain something to me here. You've mentioned a few times that cpu_suspend() can't be used because of the L2 cache. Why is this the case?
OMAP appears to have code in its sleep path - which has been converted to cpu_suspend() support - to deal with the L2 issues.
OMAP is very different, because it doesn't use cpu_suspend. It saves its state to SAR RAM, which is mapped uncached, which avoids L2 problems.
I'm afraid your information is out of date. See:
I think the confusion is between OMAP3 and OMAP4. Colin was talking about OMAP4, which isn't merged in mainline yet, whereas you were referring to the OMAP3 clean-ups that happened recently.
Regards Santosh
(Just to add few more points on top of what Colin already commented)
On 7/11/2011 11:40 AM, Russell King - ARM Linux wrote:
On Mon, Jul 11, 2011 at 03:00:47PM +0100, Lorenzo Pieralisi wrote:
Well, the short answer is no. On SMP we do need to save CPU registers, but if just one single CPU is shut down, L2 is still on. cpu_suspend saves regs on the stack, which has to be cleaned from L2 before shutting a CPU down, which makes things more complicated than they should be.
Hang on. Please explain something to me here. You've mentioned a few times that cpu_suspend() can't be used because of the L2 cache. Why is this the case?
OMAP appears to have code in its sleep path - which has been converted to cpu_suspend() support - to deal with the L2 issues.
This part is not done yet, Russell. In fact, the cpu_resume() function needs an update to work with an L2-enabled configuration.
However, lets recap. What we do in cpu_suspend() in order is:
- Save the ABI registers onto the stack, and some private state
- Save the CPU specific state onto the stack
We need to disable the C bit here to avoid any speculative artifacts during further operations before WFI.
- Flush the L1 cache
- Call the platform specific suspend finisher
Also, the finisher function should issue the WFI, and, just in case the CPU for some reason doesn't hit the targeted low power state, the finisher function takes care of things like enabling the C bit, SMP bit, etc.
On resume, with the MMU and caches off:
- Platform defined entry point is called, which _may_ be cpu_resume directly.
- Platform initial code is executed to do whatever that requires
- cpu_resume will be called
- cpu_resume loads the previously saved private state
- The CPU specific state is restored
- Page table is modified to permit 1:1 mapping for MMU enable
The 1:1 idmap used here should be mapped as non-cached to avoid L2-related issues. I faced a similar issue in the OMAP sleep code earlier, and later, thanks to Colin, we got it fixed by making use of a non-cached idmap.
- MMU is enabled with caches disabled
- Page table modification is undone
- Caches are enabled in the main control register
- CPU exception mode stacks are reinitialized
- CPU specific init function is called
- ABI registers are popped off and 'cpu_suspend' function returns
So, as far as L2 goes, in the suspend finisher:
If L2 state is lost, the finisher needs to clean dirty data from L2 to ensure that it is preserved in RAM. Note: There is no need to disable or even invalidate the L2 cache as we should not be writing any data in the finisher function which we later need after resume.
If L2 state is not lost, the finisher needs to clean the saved state as a minimum, to ensure that this is visible when the main control register C bit is clear. The easiest way to do that is to find the top of stack via current_thread_info() - we have a macro for that - and then add THREAD_SIZE to find the top of stack. 'sp' will be the current bottom of stack.
In the resume initial code:
- If L2 state was lost, the L2 configuration needs to be restored. This generally needs to happen before cpu_resume is called:
- there are CPUs which need L2 setup before the MMU is enabled.
- OMAP3 currently does this in its assembly, which is convenient to allow it to make the SMI calls to the secure world. The same will be true of any CPU running in non-secure mode.
This is indeed a good approach. It does help to handle platform-specific requirements like TrustZone, secure restore/overwrite, etc.
If L2 state was not lost, and the platform chooses not to clean and invalidate the ABI registers from the stack, and the platform restores the L2 configuration before calling cpu_resume, then the ABI registers will be read out of L2 on return if that's where they are - at that point everything will be set up correctly.
This will give the greatest performance, which is important for CPU idle use of these code paths.
Now, considering SMP, there's an issue here: do we know at the point where one CPU goes down whether L2 state will be lost?
If the answer is that state will not be lost, we can do the minimum. If all L2 state is lost, we need to do as above. If we don't know the answer, then we have to assume that L2 state will be lost.
But wait - L2 cache (or more accurately, the outer cache) is common between CPUs in an SMP system. So, if we're _not_ the last CPU to go down, then we assume that L2 state will not be lost. It is the last CPU's responsibility to deal with L2 state when it goes into a PM mode that results in L2 state being lost.
This is exactly the point. Unless and until the CPU cluster is going down, L2 cache contents won't be lost. At least that's how OMAP is implemented.
Lastly, should generic code deal with flushing L2 and setting it back up on resume? A couple of points there:
Will generic code know whether L2 state will be lost, or should it assume that L2 state is always lost and do a full L2 flush. That seems wasteful if we have another CPU still running (which would also have to flush L2.)
L2 configuration registers are not accessible to CPUs operating in non-secure mode like OMAPs. Generic code on these CPUs has _no_ way to restore and re-enable the L2 cache. It needs to make implementation specific SMI calls to achieve that.
So, I believe the answer to that is no. However, I think we can still do a change to improve the situation:
Pass in r2 and r3 to the suspend finisher the bottom and top of the stack which needs to be cleaned from L2. This covers the saved state but not the ABI registers.
Mandate that L2 configuration is to be restored by platforms in their pre-cpu_resume code so L2 is available when the C bit is set.
This seems a workable approach to me.
Thanks, Russell, for describing this so clearly.
Regards Santosh
On Mon, Jul 11, 2011 at 01:05:20PM -0700, Santosh Shilimkar wrote:
(Just to add few more points on top of what Colin already commented)
On 7/11/2011 11:40 AM, Russell King - ARM Linux wrote:
On Mon, Jul 11, 2011 at 03:00:47PM +0100, Lorenzo Pieralisi wrote:
Well, the short answer is no. On SMP we do need to save CPU registers, but if just one single CPU is shut down, L2 is still on. cpu_suspend saves regs on the stack, which has to be cleaned from L2 before shutting a CPU down, which makes things more complicated than they should be.
Hang on. Please explain something to me here. You've mentioned a few times that cpu_suspend() can't be used because of the L2 cache. Why is this the case?
OMAP appears to have code in its sleep path - which has been converted to cpu_suspend() support - to deal with the L2 issues.
This part is not done yet, Russell. In fact, the cpu_resume() function needs an update to work with an L2-enabled configuration.
The code does deal with L2 cache enable in the resume path...
However, lets recap. What we do in cpu_suspend() in order is:
- Save the ABI registers onto the stack, and some private state
- Save the CPU specific state onto the stack
We need to disable the C bit here to avoid any speculative artifacts during further operations before WFI.
Which you are doing.
- Flush the L1 cache
- Call the platform specific suspend finisher
Also, the finisher function should issue the WFI, and, just in case the CPU for some reason doesn't hit the targeted low power state, the finisher function takes care of things like enabling the C bit, SMP bit, etc.
You're restoring the C bit in the non-off paths already, which follow a failed WFI. You're not touching the SMP bit there, though - was it ever reset? I don't think so.
On resume, with the MMU and caches off:
- Platform defined entry point is called, which _may_ be cpu_resume directly.
- Platform initial code is executed to do whatever that requires
- cpu_resume will be called
- cpu_resume loads the previously saved private state
- The CPU specific state is restored
- Page table is modified to permit 1:1 mapping for MMU enable
The 1:1 idmap used here should be mapped as non-cached to avoid L2-related issues. I faced a similar issue in the OMAP sleep code earlier, and later, thanks to Colin, we got it fixed by making use of a non-cached idmap.
It is specified that if the main control register C bit is clear, accesses will be uncached.
Whether you get cache hits or not is probably implementation dependent, and provided that the state information is cleaned from the L2 cache, it doesn't matter if we hit the L2 cached copy or the RAM copy. It's the same data.
On 7/11/2011 1:14 PM, Russell King - ARM Linux wrote:
On Mon, Jul 11, 2011 at 01:05:20PM -0700, Santosh Shilimkar wrote:
(Just to add few more points on top of what Colin already commented)
On 7/11/2011 11:40 AM, Russell King - ARM Linux wrote:
On Mon, Jul 11, 2011 at 03:00:47PM +0100, Lorenzo Pieralisi wrote:
Well, the short answer is no. On SMP we do need to save CPU registers, but if just one single CPU is shut down, L2 is still on. cpu_suspend saves regs on the stack, which has to be cleaned from L2 before shutting a CPU down, which makes things more complicated than they should be.
Hang on. Please explain something to me here. You've mentioned a few times that cpu_suspend() can't be used because of the L2 cache. Why is this the case?
OMAP appears to have code in its sleep path - which has been converted to cpu_suspend() support - to deal with the L2 issues.
This part is not done yet, Russell. In fact, the cpu_resume() function needs an update to work with an L2-enabled configuration.
The code does deal with L2 cache enable in the resume path...
However, lets recap. What we do in cpu_suspend() in order is:
- Save the ABI registers onto the stack, and some private state
- Save the CPU specific state onto the stack
We need to disable the C bit here to avoid any speculative artifacts during further operations before WFI.
Which you are doing.
- Flush the L1 cache
- Call the platform specific suspend finisher
Also, the finisher function should issue the WFI, and, just in case the CPU for some reason doesn't hit the targeted low power state, the finisher function takes care of things like enabling the C bit, SMP bit, etc.
You're restoring the C bit in the non-off paths already, which follow a failed WFI. You're not touching the SMP bit there, though - was it ever reset? I don't think so.
It's probably not in the code you have seen, but it's being used in the code base. One tricky issue there is that SMP bit access is protected on OMAP4430 GP silicon, whereas it is opened up on OMAP4460. We handle that using an NSACR register read, and that's what I pointed out to Lorenzo.
On resume, with the MMU and caches off:
- Platform defined entry point is called, which _may_ be cpu_resume directly.
- Platform initial code is executed to do whatever that requires
- cpu_resume will be called
- cpu_resume loads the previously saved private state
- The CPU specific state is restored
- Page table is modified to permit 1:1 mapping for MMU enable
The 1:1 idmap used here should be mapped as non-cached to avoid L2-related issues. I faced a similar issue in the OMAP sleep code earlier and later, thanks to Colin, we got it fixed by making use of a non-cached idmap.
It is specified that if the main control register C bit is clear, accesses will be uncached.
Does it apply to page table walks as well? Because that's the case which fails.
Whether you get cache hits or not is probably implementation dependent, and provided that the state information is cleaned from the L2 cache, it doesn't matter if we hit the L2 cached copy or the RAM copy. It's the same data.
I am not sure, because the restored TTBR0 still tells the processor that the entry is cached, and then the CPU, instead of reading the entry from main memory (written before MMU OFF), reads a stale entry from L2, which is the problem.
Regards Santosh
Thank you very much Russell for this recap.
On Mon, Jul 11, 2011 at 07:40:10PM +0100, Russell King - ARM Linux wrote:
On Mon, Jul 11, 2011 at 03:00:47PM +0100, Lorenzo Pieralisi wrote:
Well, the short answer is no. On SMP we do need to save CPU registers, but if just one single CPU is shut down, L2 is still on. cpu_suspend saves regs on the stack, which has to be cleaned from L2 before shutting a CPU down, and that makes things more complicated than they should be.
Hang on. Please explain something to me here. You've mentioned a few times that cpu_suspend() can't be used because of the L2 cache. Why is this the case?
OMAP appears to have code in its sleep path - which has been converted to cpu_suspend() support - to deal with the L2 issues.
OMAP4, it is SMP configs I am talking about.
However, let's recap. What we do in cpu_suspend(), in order, is:
- Save the ABI registers onto the stack, and some private state
- Save the CPU specific state onto the stack
As Santosh said, L1 should be cleaned with the C bit cleared. We are still in coherency, and if the L1 D$ keeps allocating we might run into issues here when, for instance, a single CPU is going down. It is the stack, as usual. The finisher should be written in assembly (I think that's the case) and should not use the stack (e.g. thread_info). If I am not mistaken, thread_info might be written by other CPUs and be pulled into the D$ as a dirty line. We must avoid having dirty lines in the L1 D$ when we pull the power.
On top of that, if the C bit is cleared, I think we need to clean the L2 cache in assembly in the finisher, and avoid using spinlocks, because they do not work when the C bit is cleared. This means that this code will become racy by definition, or I am missing something.
- Flush the L1 cache
- Call the platform specific suspend finisher
On resume, with the MMU and caches off:
- Platform defined entry point is called, which _may_ be cpu_resume directly.
- Platform initial code is executed to do whatever that requires
- cpu_resume will be called
- cpu_resume loads the previously saved private state
- The CPU specific state is restored
- Page table is modified to permit 1:1 mapping for MMU enable
- MMU is enabled with caches disabled
- Page table modification is undone
- Caches are enabled in the main control register
- CPU exception mode stacks are reinitialized
- CPU specific init function is called
- ABI registers are popped off and 'cpu_suspend' function returns
So, as far as L2 goes, in the suspend finisher:
- If L2 state is lost, the finisher needs to clean dirty data from L2 to ensure that it is preserved in RAM. Note: There is no need to disable or even invalidate the L2 cache as we should not be writing any data in the finisher function which we later need after resume.
I agree that "not writing" should be sufficient but want to raise a point anyway. If L2 is shutdown or put in L2 RAM retention on idle (I read the reply from Colin about idle support for L2 shutdown but I think we have to cater for it anyway) it has to be disabled before issuing wfi (we have to be in control of L2 to make sure it is not fetching lines behind our back). It is about avoiding transactions on the AXI bus when the power is yanked from L2. Also L2 prefetch bits should be cleared, I am checking with HW guys how and when this might create issues.
- If L2 state is not lost, the finisher needs to clean the saved state as a minimum, to ensure that this is visible when the main control register C bit is clear. The easiest way to do that is to find the thread_info via current_thread_info() - we have a macro for that - and then add THREAD_SIZE to find the top of stack. 'sp' will be the current bottom of stack.
Spot-on.
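A minimal sketch of the clean being described, assuming the saved state sits between 'sp' and the top of the current stack (the cache calls are the existing ARM maintenance APIs; the wrapper function itself is hypothetical):

	static void clean_saved_state_from_l2(unsigned long sp)
	{
		/* top of stack = thread_info base + THREAD_SIZE */
		unsigned long top = (unsigned long)current_thread_info() + THREAD_SIZE;

		/* clean the dirty stack lines out of L1 ... */
		__cpuc_flush_dcache_area((void *)sp, top - sp);
		/* ... and out of L2, so they are visible with the C bit clear */
		outer_clean_range(__pa(sp), __pa(top));
	}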
In the resume initial code:
If L2 state was lost, the L2 configuration needs to be restored. This generally needs to happen before cpu_resume is called:
- there are CPUs which need L2 setup before the MMU is enabled.
- OMAP3 currently does this in its assembly, which is convenient to allow it to make the SMI calls to the secure world. The same will be true of any CPU running in non-secure mode.
If L2 state was not lost, and the platform chooses not to clean and invalidate the ABI registers from the stack, and the platform restores the L2 configuration before calling cpu_resume, then the ABI registers will be read out of L2 on return if that's where they are - at that point everything will be set up correctly.
This will give the greatest performance, which is important for CPU idle use of these code paths.
Now, considering SMP, there's an issue here: do we know at the point where one CPU goes down whether L2 state will be lost?
That is what I am tracking in the patch, meaning that we have to know when the executing CPU is the last one running. If we mandate that CPU idle tracks that for all platforms, I think we are all set.
If the answer is that state will not be lost, we can do the minimum. If all L2 state is lost, we need to do as above. If we don't know the answer, then we have to assume that L2 state will be lost.
I know I am a pain in the neck, but please consider L2 RAM retention in this picture where logic is lost but RAM is retained, so it should not be cleaned.
But wait - the L2 cache (or more accurately, the outer cache) is common between CPUs in an SMP system. So, if we're _not_ the last CPU to go down, then we assume that L2 state will not be lost. It is the last CPU's responsibility to deal with L2 state when it goes into a PM mode that results in L2 state being lost.
That's correct.
Lastly, should generic code deal with flushing L2 and setting it back up on resume? A couple of points there:
- Will generic code know whether L2 state will be lost, or should it assume that L2 state is always lost and do a full L2 flush? That seems wasteful if we have another CPU still running (which would also have to flush L2).
On SMP, for single-CPU shutdown, as you stated, only the minimum should be cleaned from L2. The same goes for system shutdown (all CPUs down) with L2 RAM retention.
- L2 configuration registers are not accessible to CPUs operating in non-secure mode, as on the OMAPs. Generic code on these CPUs has _no_ way to restore and re-enable the L2 cache. It needs to make implementation-specific SMI calls to achieve that.
So, I believe the answer to that is no. However, I think we can still do a change to improve the situation:
Pass to the suspend finisher, in r2 and r3, the bottom and top of the stack region which needs to be cleaned from L2. This covers the saved state but not the ABI registers.
Mandate that L2 configuration is to be restored by platforms in their pre-cpu_resume code so L2 is available when the C bit is set.
On top of that, if we can also define and mandate a warm-boot protocol (eg CPU0 is always the one coming up from complete system shutdown) and the other(s) are put into wfi or a platform specific procedure that would be grand.
Lorenzo
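A hypothetical sketch of the r2/r3 finisher interface proposed above, with the region bounds passed down as C arguments (the function name and exact signature are illustrative only):

	static int platform_suspend_finisher(unsigned long bottom, unsigned long top)
	{
		/* push the saved state out to RAM so it survives with caches off */
		outer_clean_range(__pa(bottom), __pa(top));
		cpu_do_idle();	/* wfi; falls through if the power-down fails */
		return 1;	/* report the failed power-down to the caller */
	}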
On Tue, Jul 12, 2011 at 11:12:57AM +0100, Lorenzo Pieralisi wrote:
Thank you very much Russell for this recap.
On Mon, Jul 11, 2011 at 07:40:10PM +0100, Russell King - ARM Linux wrote:
On Mon, Jul 11, 2011 at 03:00:47PM +0100, Lorenzo Pieralisi wrote:
Well, the short answer is no. On SMP we do need to save CPU registers, but if just one single CPU is shut down, L2 is still on. cpu_suspend saves regs on the stack, which has to be cleaned from L2 before shutting a CPU down, and that makes things more complicated than they should be.
Hang on. Please explain something to me here. You've mentioned a few times that cpu_suspend() can't be used because of the L2 cache. Why is this the case?
OMAP appears to have code in its sleep path - which has been converted to cpu_suspend() support - to deal with the L2 issues.
OMAP4, it is SMP configs I am talking about.
Seriously, that's not something which really concerns me at present, because suspend/resume is not supported there in any form in mainline.
All my comments are based on the mainline kernel. That's what I work with. Everything elsewhere is not my concern.
So, let me say again. OMAP suspend/resume support _in_ _mainline_ will be converted at the next merge window. As that's only OMAP3 _in_ _mainline_ which has the need for saving state etc, that's the only OMAP code I have access to, and therefore that's the only thing which I've been able to fix.
OMAP4 suspend/resume support doesn't exist in mainline and therefore doesn't exist for me.
On Thu, Jul 07, 2011 at 04:50:18PM +0100, Lorenzo Pieralisi wrote:
+static int late_init(void) +{
- int rc;
- struct sr_cluster *cluster;
- int cluster_index, cpu_index = sr_platform_get_cpu_index();
Stop this madness, and use the standard linux APIs like smp_processor_id here. It might actually help you find bugs, like the fact that you're in a preemptible context here, and so could be rescheduled onto any other CPU in the system _after_ you've read the MPIDR register.
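A sketch of the preemption-safe pattern being suggested, using the standard API (the surrounding code is illustrative):

	int cpu = get_cpu();	/* smp_processor_id() with preemption disabled */

	/* ... index kernel structures with 'cpu' ... */

	put_cpu();		/* re-enable preemption */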
- cluster_index = sr_platform_get_cluster_index();
- cluster = main_table.cluster_table + cluster_index;
- main_table.os_mmu_context[cluster_index][cpu_index] =
current->active_mm->pgd;
- cpu_switch_mm(main_table.fw_mmu_context, current->active_mm);
- rc = sr_platform_init();
- cpu_switch_mm(main_table.os_mmu_context[cluster_index][cpu_index],
current->active_mm);
CPU numbers are unique in the system, why do you need a 'cluster_index' to save this? In fact why do you even need to save it in a structure at all?
Plus, "cluster" is not used, please get rid of it.
Plus, here you just switch the page tables without a MMU flush. Further down in this file you call cpu_switch_mm() but also flush the TLB. Why that difference?
If this is the state of just the first bit of this I've looked at, plus the comments from Frank on your use of internal functions like cpu_do_suspend and cpu_do_resume, I don't want to look at it any further. Can you please clean it up as best you can first.
On Sat, Jul 09, 2011 at 09:38:15AM +0100, Russell King - ARM Linux wrote:
On Thu, Jul 07, 2011 at 04:50:18PM +0100, Lorenzo Pieralisi wrote:
+static int late_init(void) +{
- int rc;
- struct sr_cluster *cluster;
- int cluster_index, cpu_index = sr_platform_get_cpu_index();
Stop this madness, and use the standard linux APIs like smp_processor_id here. It might actually help you find bugs, like the fact that you're in a preemptible context here, and so could be rescheduled onto any other CPU in the system _after_ you've read the MPIDR register.
- cluster_index = sr_platform_get_cluster_index();
- cluster = main_table.cluster_table + cluster_index;
- main_table.os_mmu_context[cluster_index][cpu_index] =
current->active_mm->pgd;
- cpu_switch_mm(main_table.fw_mmu_context, current->active_mm);
- rc = sr_platform_init();
- cpu_switch_mm(main_table.os_mmu_context[cluster_index][cpu_index],
current->active_mm);
CPU numbers are unique in the system, why do you need a 'cluster_index' to save this? In fact why do you even need to save it in a structure at all?
Plus, "cluster" is not used, please get rid of it.
Oh, that's another thing. This thread is about introducing idle support not cluster support. Cluster support is surely something different (esp. as it seems from the above code that you're trying to support several clusters of MPCore CPUs, each with physical 0-N CPU numbers.)
Cluster support should be an entirely separate patch series.
If that is not what this cluster stuff in this patch is about, then it's just written badly and reinforces the need for it to be rewritten - in that case there's no need for a 2D array.
In any case, smp_processor_id() will be (and must be) unique in any given running kernel across all CPUs, even if you have clusters of N CPUs all physically numbered 0-N.
On Sat, Jul 09, 2011 at 09:45:08AM +0100, Russell King - ARM Linux wrote:
On Sat, Jul 09, 2011 at 09:38:15AM +0100, Russell King - ARM Linux wrote:
On Thu, Jul 07, 2011 at 04:50:18PM +0100, Lorenzo Pieralisi wrote:
+static int late_init(void) +{
- int rc;
- struct sr_cluster *cluster;
- int cluster_index, cpu_index = sr_platform_get_cpu_index();
Stop this madness, and use the standard linux APIs like smp_processor_id here. It might actually help you find bugs, like the fact that you're in a preemptible context here, and so could be rescheduled onto any other CPU in the system _after_ you've read the MPIDR register.
You are right; I wanted to make all the code uniform and rushed in a change which is bound to create issues.
- cluster_index = sr_platform_get_cluster_index();
- cluster = main_table.cluster_table + cluster_index;
- main_table.os_mmu_context[cluster_index][cpu_index] =
current->active_mm->pgd;
- cpu_switch_mm(main_table.fw_mmu_context, current->active_mm);
- rc = sr_platform_init();
- cpu_switch_mm(main_table.os_mmu_context[cluster_index][cpu_index],
current->active_mm);
CPU numbers are unique in the system, why do you need a 'cluster_index' to save this? In fact why do you even need to save it in a structure at all?
Plus, "cluster" is not used, please get rid of it.
Oh, that's another thing. This thread is about introducing idle support not cluster support. Cluster support is surely something different (esp. as it seems from the above code that you're trying to support several clusters of MPCore CPUs, each with physical 0-N CPU numbers.)
Cluster support should be an entirely separate patch series.
Yes it is cluster support, within idle though, that has to be said; I will introduce it in a different patch as suggested and this patch will have to rely on it.
[...]
In any case, smp_processor_id() will be (and must be) unique in any given running kernel across all CPUs, even if you have clusters of N CPUs all physically numbered 0-N.
Indeed.
This patch provides the code infrastructure needed to maintain a generic per-cpu architecture implementation of idle code.
sr_platform.c:
- code manages patchset initialization and memory management

sr_context.c:
- code initializes run-time context save/restore generic support

sr_power.c:
- provides the generic infrastructure to enter/exit low power modes and communicate with the Power Control Unit (PCU)

v7 support builds on the basic infrastructure, providing the per-cpu arch implementation through standard function pointer signatures.

Preprocessor defines include the size of the data needed to save/restore L2 state. This define value should be moved to the respective subsystem (PL310) once the patchset interface to that subsystem is settled.
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
---
 arch/arm/kernel/sr.h          | 162 +++++++++++++++++++++++++++++++++++++++++
 arch/arm/kernel/sr_context.c  |  23 ++++++
 arch/arm/kernel/sr_helpers.h  |  56 ++++++++++++++
 arch/arm/kernel/sr_platform.c |  48 ++++++++++++
 arch/arm/kernel/sr_power.c    |  26 +++++++
 5 files changed, 315 insertions(+), 0 deletions(-)
 create mode 100644 arch/arm/kernel/sr.h
 create mode 100644 arch/arm/kernel/sr_context.c
 create mode 100644 arch/arm/kernel/sr_helpers.h
 create mode 100644 arch/arm/kernel/sr_platform.c
 create mode 100644 arch/arm/kernel/sr_power.c

diff --git a/arch/arm/kernel/sr.h b/arch/arm/kernel/sr.h
new file mode 100644
index 0000000..6b24e53
--- /dev/null
+++ b/arch/arm/kernel/sr.h
@@ -0,0 +1,162 @@
+#define SR_NR_CLUSTERS 1
+
+#define STACK_SIZE 512
+
+#define CPU_A5 0x4100c050
+#define CPU_A8 0x4100c080
+#define CPU_A9 0x410fc090
+#define L2_DATA_SIZE 16
+#define CONTEXT_SPACE_UNCACHED (2 * PAGE_SIZE)
+#define PA(f) ((typeof(f) *)virt_to_phys((void *)f))
+
+#ifndef __ASSEMBLY__
+
+#include <linux/types.h>
+#include <linux/threads.h>
+#include <linux/cpumask.h>
+#include <asm/page.h>
+
+/*
+ * Structures we hide from the OS API
+ */
+
+struct sr_cpu_context {
+	u32 flags;
+	u32 saved_items;
+	u32 *mmu_data;
+};
+
+struct sr_cluster_context {
+	u32 saved_items;
+	u32 *l2_data;
+};
+
+struct sr_main_table {
+	pgd_t *os_mmu_context[SR_NR_CLUSTERS][CONFIG_NR_CPUS];
+	cpumask_t cpu_idle_mask[SR_NR_CLUSTERS];
+	pgd_t *fw_mmu_context;
+	u32 num_clusters;
+	struct sr_cluster *cluster_table;
+};
+
+
+/*
+ * A cluster is a container for CPUs, typically either a single CPU or a
+ * coherent cluster.
+ * We assume the CPUs in the cluster can be switched off independently.
+ */
+struct sr_cluster {
+	u32 cpu_type;		/* A9mpcore, A5mpcore, etc */
+	u32 num_cpus;
+	struct sr_cluster_context *context;
+	struct sr_cpu *cpu_table;
+	u32 power_state;
+	u32 cluster_down;
+	void __iomem *scu_address;
+	void *lock;
+};
+
+struct sr_cpu {
+	struct sr_cpu_context *context;
+	u32 power_state;
+};
+
+/*
+ * arch infrastructure
+ */
+struct sr_arch {
+	unsigned int cpu_val;
+	unsigned int cpu_mask;
+
+	int (*init)(void);
+
+	int (*save_context)(struct sr_cluster *, struct sr_cpu *,
+			unsigned);
+	int (*restore_context)(struct sr_cluster *, struct sr_cpu *);
+	int (*enter_cstate)(unsigned cpu_index,
+			struct sr_cpu *cpu,
+			struct sr_cluster *cluster);
+	int (*leave_cstate)(unsigned, struct sr_cpu *,
+			struct sr_cluster *);
+	void (*reset)(void);
+
+};
+
+extern struct sr_arch *arch;
+extern int lookup_arch(void);
+
+/*
+ * Global variables
+ */
+extern struct sr_main_table main_table;
+extern unsigned long idle_save_context;
+extern unsigned long idle_restore_context;
+extern unsigned long idle_mt;
+extern void *context_memory_uncached;
+
+/*
+ * Context save/restore
+ */
+typedef u32 (sr_save_context_t)
+	(struct sr_cluster *,
+	struct sr_cpu*, u32);
+typedef u32 (sr_restore_context_t)
+	(struct sr_cluster *,
+	struct sr_cpu*);
+
+extern sr_save_context_t sr_save_context;
+extern sr_restore_context_t sr_restore_context;
+
+
+extern struct sr_arch *get_arch(void);
+
+
+/*
+ * 1:1 mappings
+ */
+
+extern int linux_sr_setup_translation_tables(void);
+
+/*
+ * dumb memory allocator
+ */
+
+extern void *get_memory(unsigned int size);
+
+/*
+ * Entering/Leaving C-states function entries
+ */
+
+extern int sr_platform_enter_cstate(unsigned cpu_index, struct sr_cpu *cpu,
+		struct sr_cluster *cluster);
+extern int sr_platform_leave_cstate(unsigned cpu_index, struct sr_cpu *cpu,
+		struct sr_cluster *cluster);
+
+/* save/restore main table */
+extern struct sr_main_table main_table;
+
+/*
+ * Init functions
+ */
+
+extern int sr_platform_runtime_init(void);
+extern int sr_platform_init(void);
+extern int sr_context_init(void);
+
+
+/*
+ * v7 specific
+ */
+
+extern char *cpu_v7_suspend_size;
+extern void scu_cpu_mode(void __iomem *base, int state);
+
+/*
+ * These arrays keep suitable stack pointers for CPUs.
+ *
+ * The memory must be 8-byte aligned.
+ */
+
+extern unsigned long platform_cpu_stacks[CONFIG_NR_CPUS];
+extern unsigned long platform_cpu_nc_stacks[CONFIG_NR_CPUS];
+#endif
diff --git a/arch/arm/kernel/sr_context.c b/arch/arm/kernel/sr_context.c
new file mode 100644
index 0000000..25eaa43
--- /dev/null
+++ b/arch/arm/kernel/sr_context.c
@@ -0,0 +1,23 @@
+/*
+ * Copyright (C) 2008-2011 ARM Limited
+ * Author(s): Jon Callan, Lorenzo Pieralisi
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ */
+
+#include <linux/cache.h>
+#include <asm/cacheflush.h>
+#include "sr.h"
+
+int sr_context_init(void)
+{
+	idle_save_context = (unsigned long) arch->save_context;
+	idle_restore_context = __pa(arch->restore_context);
+	__cpuc_flush_dcache_area(&idle_restore_context, sizeof(unsigned long));
+	outer_clean_range(__pa(&idle_restore_context),
+			__pa(&idle_restore_context + 1));
+	return 0;
+}
diff --git a/arch/arm/kernel/sr_helpers.h b/arch/arm/kernel/sr_helpers.h
new file mode 100644
index 0000000..1ae3a9a
--- /dev/null
+++ b/arch/arm/kernel/sr_helpers.h
@@ -0,0 +1,56 @@
+/*
+ * Copyright (C) 2008-2011 ARM Limited
+ * Author(s): Jon Callan, Lorenzo Pieralisi
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ */
+
+static inline int sr_platform_get_cpu_index(void)
+{
+	unsigned int cpu;
+	__asm__ __volatile__(
+		"mrc	p15, 0, %0, c0, c0, 5\n\t"
+		: "=r" (cpu));
+	return cpu & 0xf;
+}
+
+/*
+ * Placeholder for further extensions
+ */
+static inline int sr_platform_get_cluster_index(void)
+{
+	return 0;
+}
+
+static inline void __iomem *sr_platform_cbar(void)
+{
+	void __iomem *base;
+	__asm__ __volatile__(
+		"mrc	p15, 4, %0, c15, c0, 0\n\t"
+		: "=r" (base));
+	return base;
+}
+
+#ifdef CONFIG_SMP
+static inline void exit_coherency(void)
+{
+	unsigned int v;
+	asm volatile (
+		"mrc	p15, 0, %0, c1, c0, 1\n"
+		"bic	%0, %0, %1\n"
+		"mcr	p15, 0, %0, c1, c0, 1\n"
+		: "=&r" (v)
+		: "Ir" (0x40)
+		: );
+}
+#else
+static inline void exit_coherency(void) { }
+#endif
+
+extern void default_sleep(void);
+extern void sr_suspend(void *);
+extern void sr_resume(void *, int);
+extern void disable_clean_inv_dcache_v7_all(void);
diff --git a/arch/arm/kernel/sr_platform.c b/arch/arm/kernel/sr_platform.c
new file mode 100644
index 0000000..530aa1b
--- /dev/null
+++ b/arch/arm/kernel/sr_platform.c
@@ -0,0 +1,48 @@
+/*
+ * Copyright (C) 2008-2011 ARM Limited
+ *
+ * Author(s): Jon Callan, Lorenzo Pieralisi
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ */
+
+#include <linux/errno.h>
+#include <linux/kernel.h>
+#include <linux/string.h>
+#include <asm/memory.h>
+#include <asm/page.h>
+#include <asm/sr_platform_api.h>
+#include "sr.h"
+
+void *context_memory_uncached;
+
+/*
+ * Simple memory allocator function.
+ * Returns start address of allocated region
+ * Memory is zero-initialized.
+ */
+
+static unsigned int watermark;
+
+void *get_memory(unsigned int size)
+{
+	unsigned ret;
+	void *vmem = NULL;
+
+	ret = watermark;
+	watermark += size;
+	BUG_ON(watermark >= CONTEXT_SPACE_UNCACHED);
+	vmem = (context_memory_uncached + ret);
+	watermark = ALIGN(watermark, sizeof(long long));
+
+	return vmem;
+}
+
+int sr_platform_init(void)
+{
+	memset(context_memory_uncached, 0, CONTEXT_SPACE_UNCACHED);
+	return arch->init();
+}
diff --git a/arch/arm/kernel/sr_power.c b/arch/arm/kernel/sr_power.c
new file mode 100644
index 0000000..2585559
--- /dev/null
+++ b/arch/arm/kernel/sr_power.c
@@ -0,0 +1,26 @@
+/*
+ * Copyright (C) 2008-2011 ARM Limited
+ *
+ * Author(s): Jon Callan, Lorenzo Pieralisi
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ */
+
+#include "sr.h"
+
+int sr_platform_enter_cstate(unsigned cpu_index,
+			struct sr_cpu *cpu,
+			struct sr_cluster *cluster)
+{
+	return arch->enter_cstate(cpu_index, cpu, cluster);
+}
+
+int sr_platform_leave_cstate(unsigned cpu_index,
+			struct sr_cpu *cpu,
+			struct sr_cluster *cluster)
+{
+	return arch->leave_cstate(cpu_index, cpu, cluster);
+}
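For illustration, a hypothetical caller of the dumb allocator above, assuming platform code has already pointed context_memory_uncached at the reserved uncached pages:

	/* carve out zeroed, uncached space for the L2 register save area (sketch) */
	u32 *l2_data = get_memory(L2_DATA_SIZE * sizeof(u32));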
On 7/7/2011 8:50 AM, Lorenzo Pieralisi wrote:
This patch provides the code infrastructure needed to maintain a generic per-cpu architecture implementation of idle code.
sr_platform.c :
- code manages patchset initialization and memory management
sr_context.c:
- code initializes run-time context save/restore generic support
sr_power.c:
- provides the generic infrastructure to enter/exit low power modes and communicate with the Power Control Unit (PCU)

v7 support builds on the basic infrastructure, providing the per-cpu arch implementation through standard function pointer signatures.

Preprocessor defines include the size of the data needed to save/restore L2 state. This define value should be moved to the respective subsystem (PL310) once the patchset interface to that subsystem is settled.
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
[...]
diff --git a/arch/arm/kernel/sr_helpers.h b/arch/arm/kernel/sr_helpers.h new file mode 100644 index 0000000..1ae3a9a --- /dev/null +++ b/arch/arm/kernel/sr_helpers.h @@ -0,0 +1,56 @@ +/*
- Copyright (C) 2008-2011 ARM Limited
- Author(s): Jon Callan, Lorenzo Pieralisi
- This program is free software; you can redistribute it and/or modify
- it under the terms of the GNU General Public License version 2 as
- published by the Free Software Foundation.
- */
+static inline int sr_platform_get_cpu_index(void) +{
- unsigned int cpu;
- __asm__ __volatile__(
"mrc p15, 0, %0, c0, c0, 5\n\t"
: "=r" (cpu));
- return cpu& 0xf;
+}
+/*
- Placeholder for further extensions
- */
+static inline int sr_platform_get_cluster_index(void) +{
- return 0;
+}
+static inline void __iomem *sr_platform_cbar(void) +{
- void __iomem *base;
- __asm__ __volatile__(
"mrc p15, 4, %0, c15, c0, 0\n\t"
: "=r" (base));
- return base;
+}
+#ifdef CONFIG_SMP +static inline void exit_coherency(void) +{
- unsigned int v;
- asm volatile (
"mrc p15, 0, %0, c1, c0, 1\n"
"bic %0, %0, %1\n"
"mcr p15, 0, %0, c1, c0, 1\n"
You should have an isb here.
: "=&r" (v)
: "Ir" (0x40)
: );
+}
To avoid aborts on platforms which don't provide access to the SMP bit, NSACR bit 18 should be read. Something like:

	mrc	p15, 0, r0, c1, c1, 2
	tst	r0, #(1 << 18)
	mrcne	p15, 0, r0, c1, c0, 1
	bicne	r0, r0, #(1 << 6)
	mcrne	p15, 0, r0, c1, c0, 1
Regards Santosh
On Fri, Jul 08, 2011 at 02:58:19AM +0100, Santosh Shilimkar wrote:
On 7/7/2011 8:50 AM, Lorenzo Pieralisi wrote:
This patch provides the code infrastructure needed to maintain a generic per-cpu architecture implementation of idle code.
sr_platform.c :
- code manages patchset initialization and memory management
sr_context.c:
- code initializes run-time context save/restore generic support
sr_power.c:
- provides the generic infrastructure to enter exit low power modes and communicate with Power Control Unit (PCU)
v7 support hinges on the basic infrastructure to provide per-cpu arch implementation basically through standard function pointers signatures.
Preprocessor defines include size of data needed to save/restore L2 state. This define value should be moved to the respective subsystem (PL310) once the patchset IF to that subsystem is settled.
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
[...]
diff --git a/arch/arm/kernel/sr_helpers.h b/arch/arm/kernel/sr_helpers.h new file mode 100644 index 0000000..1ae3a9a --- /dev/null +++ b/arch/arm/kernel/sr_helpers.h @@ -0,0 +1,56 @@ +/*
- Copyright (C) 2008-2011 ARM Limited
- Author(s): Jon Callan, Lorenzo Pieralisi
- This program is free software; you can redistribute it and/or modify
- it under the terms of the GNU General Public License version 2 as
- published by the Free Software Foundation.
- */
+static inline int sr_platform_get_cpu_index(void) +{
- unsigned int cpu;
- __asm__ __volatile__(
"mrc p15, 0, %0, c0, c0, 5\n\t"
: "=r" (cpu));
- return cpu& 0xf;
+}
+/*
- Placeholder for further extensions
- */
+static inline int sr_platform_get_cluster_index(void) +{
- return 0;
+}
+static inline void __iomem *sr_platform_cbar(void) +{
- void __iomem *base;
- __asm__ __volatile__(
"mrc p15, 4, %0, c15, c0, 0\n\t"
: "=r" (base));
- return base;
+}
+#ifdef CONFIG_SMP +static inline void exit_coherency(void) +{
- unsigned int v;
- asm volatile (
"mrc p15, 0, %0, c1, c0, 1\n"
"bic %0, %0, %1\n"
"mcr p15, 0, %0, c1, c0, 1\n"
You should have an isb here.
Yes, I think it is safer.
: "=&r" (v)
: "Ir" (0x40)
: );
+}
To avoid aborts on platforms which don't provide access to the SMP bit, NSACR bit 18 should be read. Something like:

	mrc	p15, 0, r0, c1, c1, 2
	tst	r0, #(1 << 18)
	mrcne	p15, 0, r0, c1, c0, 1
	bicne	r0, r0, #(1 << 6)
	mcrne	p15, 0, r0, c1, c0, 1
I will merge that code in for v2.
Thanks, Lorenzo
The idea of splitting a large patch up into smaller patches is to do it in a logical way so that:
1. Each patch is self-contained, adding a single new - and where possible complete - feature or bug fix.
2. Ease of review.

Carving your big patch up by file does not achieve either of these, because all the patches interact with each other. There are people within ARM Ltd who have been dealing with patch sets for some time whom you could ask for advice - or who you could ask to review your patches before you send them out on the mailing list.
On Thu, Jul 07, 2011 at 04:50:19PM +0100, Lorenzo Pieralisi wrote:
diff --git a/arch/arm/kernel/sr.h b/arch/arm/kernel/sr.h new file mode 100644 index 0000000..6b24e53 --- /dev/null +++ b/arch/arm/kernel/sr.h @@ -0,0 +1,162 @@ +#define SR_NR_CLUSTERS 1
+#define STACK_SIZE 512
+#define CPU_A5 0x4100c050
Not used in this patch - should it be in some other patch?
+#define CPU_A8 0x4100c080
Not used in this patch - should it be in some other patch?
+#define CPU_A9 0x410fc090
Not used in this patch - and if this is generic code then it should not be specific to any CPU. Please get rid of all these definitions, and if they're used you need to rework these patches to have proper separation of generic code from CPU specific code.
+#define L2_DATA_SIZE 16
Not used in this patch.
+#define CONTEXT_SPACE_UNCACHED (2 * PAGE_SIZE) +#define PA(f) ((typeof(f) *)virt_to_phys((void *)f))
This is just messed up. Physical addresses aren't pointers, they're numeric. PA() is not used in this patch either, so it appears it can be deleted.
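A sketch of the numeric alternative (the variable name is hypothetical):

	/* keep the physical address numeric, converting only at the point of use */
	unsigned long reset_phys = virt_to_phys((void *)arch->reset);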
+extern int lookup_arch(void);
This doesn't exist in this patch, should it be in another patch or deleted?
+/*
- Global variables
- */
+extern struct sr_main_table main_table; +extern unsigned long idle_save_context; +extern unsigned long idle_restore_context; +extern unsigned long idle_mt; +extern void *context_memory_uncached;
Why does this need to be a global?
+/*
- Context save/restore
- */
+typedef u32 (sr_save_context_t)
- (struct sr_cluster *,
- struct sr_cpu*, u32);
Fits on one line so should be on one line.
+typedef u32 (sr_restore_context_t)
- (struct sr_cluster *,
- struct sr_cpu*);
Fits on one line so should be on one line.
+extern sr_save_context_t sr_save_context;
This doesn't exist in this patch, should it be in another patch?
+extern sr_restore_context_t sr_restore_context;
Ditto.
+extern struct sr_arch *get_arch(void);
Ditto.
+/*
- 1:1 mappings
- */
+extern int linux_sr_setup_translation_tables(void);
Ditto.
+/*
- dumb memory allocator
- */
+extern void *get_memory(unsigned int size);
+/*
- Entering/Leaving C-states function entries
- */
+extern int sr_platform_enter_cstate(unsigned cpu_index, struct sr_cpu *cpu,
struct sr_cluster *cluster);
+extern int sr_platform_leave_cstate(unsigned cpu_index, struct sr_cpu *cpu,
struct sr_cluster *cluster);
See comment at the bottom - would inline functions here be better or maybe even place them at the callsite to make the code easier to understand if they're only used at one location.
+/* save/restore main table */ +extern struct sr_main_table main_table;
Why do we have two 'main_table' declarations in this same header file?
+/*
- Init functions
- */
+extern int sr_platform_runtime_init(void);
Not defined in this patch.
+extern int sr_platform_init(void); +extern int sr_context_init(void);
+/*
- v7 specific
- */
+extern char *cpu_v7_suspend_size;
No - stop going underneath the covers of existing APIs to get what you want. Use cpu_suspend() and cpu_resume() directly. Note that they've changed to be more flexible and those patches have been on this mailing list, and will be going in for 3.1.
If they still don't do what you need, I'm going to be *pissed* because you've obviously known that they don't yet you haven't taken the time to get involved on this mailing list with the discussions over it.
+extern void scu_cpu_mode(void __iomem *base, int state);
Not defined in this patch - should it be in another patch or deleted?
+/*
- These arrays keep suitable stack pointers for CPUs.
- The memory must be 8-byte aligned.
- */
+extern unsigned long platform_cpu_stacks[CONFIG_NR_CPUS];
Ditto.
+extern unsigned long platform_cpu_nc_stacks[CONFIG_NR_CPUS];
Ditto.
And should these be per-cpu variables? In any case, CONFIG_NR_CPUS doesn't look right here, NR_CPUS is probably what you want.
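A sketch of the per-cpu alternative (variable names hypothetical):

	#include <linux/percpu.h>

	static DEFINE_PER_CPU(unsigned long, sr_cpu_stack);
	static DEFINE_PER_CPU(unsigned long, sr_cpu_nc_stack);

	/* accessed as, e.g., per_cpu(sr_cpu_stack, cpu) */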
+#endif diff --git a/arch/arm/kernel/sr_context.c b/arch/arm/kernel/sr_context.c new file mode 100644 index 0000000..25eaa43 --- /dev/null +++ b/arch/arm/kernel/sr_context.c @@ -0,0 +1,23 @@ +/*
- Copyright (C) 2008-2011 ARM Limited
- Author(s): Jon Callan, Lorenzo Pieralisi
- This program is free software; you can redistribute it and/or modify
- it under the terms of the GNU General Public License version 2 as
- published by the Free Software Foundation.
- */
+#include <linux/cache.h> +#include <asm/cacheflush.h> +#include "sr.h"
+int sr_context_init(void) +{
- idle_save_context = (unsigned long) arch->save_context;
This looks wrong. idle_save_context probably has the wrong type.
- idle_restore_context = __pa(arch->restore_context);
- __cpuc_flush_dcache_area(&idle_restore_context, sizeof(unsigned long));
- outer_clean_range(__pa(&idle_restore_context),
__pa(&idle_restore_context + 1));
This kind of thing needs rethinking - calling these internal functions directly just isn't on.
- return 0;
+}
And why have a single .c file for just one function? With all the copyright headers on each file this just results in extra LOC bloat, which given the situation we find ourselves in with mainline is *definitely* a bad idea.
diff --git a/arch/arm/kernel/sr_helpers.h b/arch/arm/kernel/sr_helpers.h new file mode 100644 index 0000000..1ae3a9a --- /dev/null +++ b/arch/arm/kernel/sr_helpers.h @@ -0,0 +1,56 @@ +/*
- Copyright (C) 2008-2011 ARM Limited
- Author(s): Jon Callan, Lorenzo Pieralisi
- This program is free software; you can redistribute it and/or modify
- it under the terms of the GNU General Public License version 2 as
- published by the Free Software Foundation.
- */
+static inline int sr_platform_get_cpu_index(void) +{
- unsigned int cpu;
- __asm__ __volatile__(
"mrc p15, 0, %0, c0, c0, 5\n\t"
: "=r" (cpu));
- return cpu & 0xf;
+}
Use smp_processor_id() for indexes into kernel based structures, which has to be a unique CPU number for any CPU in the system. You only need the physical CPU ID when talking to the hardware.
+/*
- Placeholder for further extensions
- */
+static inline int sr_platform_get_cluster_index(void) +{
- return 0;
+}
+static inline void __iomem *sr_platform_cbar(void) +{
- void __iomem *base;
- __asm__ __volatile__(
"mrc p15, 4, %0, c15, c0, 0\n\t"
: "=r" (base));
- return base;
+}
This I imagine is another CPU-specific register. On ARM926 it's a register which controls the cache mode (writeback vs writethrough).
+#ifdef CONFIG_SMP +static inline void exit_coherency(void) +{
- unsigned int v;
- asm volatile (
"mrc p15, 0, %0, c1, c0, 1\n"
"bic %0, %0, %1\n"
"mcr p15, 0, %0, c1, c0, 1\n"
: "=&r" (v)
: "Ir" (0x40)
: );
+}
Firstly, this is specific to ARM CPUs. Secondly, the bit you require is very CPU specific even then. Some it's 0x40, others its 0x20. So this just does not deserve to be in generic code.
+#else +static inline void exit_coherency(void) { } +#endif
+extern void default_sleep(void); +extern void sr_suspend(void *); +extern void sr_resume(void *, int); +extern void disable_clean_inv_dcache_v7_all(void); diff --git a/arch/arm/kernel/sr_platform.c b/arch/arm/kernel/sr_platform.c new file mode 100644 index 0000000..530aa1b --- /dev/null +++ b/arch/arm/kernel/sr_platform.c @@ -0,0 +1,48 @@ +/*
- Copyright (C) 2008-2011 ARM Limited
- Author(s): Jon Callan, Lorenzo Pieralisi
- This program is free software; you can redistribute it and/or modify
- it under the terms of the GNU General Public License version 2 as
- published by the Free Software Foundation.
- */
+#include <linux/errno.h> +#include <linux/kernel.h> +#include <linux/string.h> +#include <asm/memory.h> +#include <asm/page.h> +#include <asm/sr_platform_api.h> +#include "sr.h"
+void *context_memory_uncached;
+/*
- Simple memory allocator function.
- Returns start address of allocated region
- Memory is zero-initialized.
- */
+static unsigned int watermark;
+void *get_memory(unsigned int size) +{
- unsigned ret;
- void *vmem = NULL;
- ret = watermark;
- watermark += size;
- BUG_ON(watermark >= CONTEXT_SPACE_UNCACHED);
- vmem = (context_memory_uncached + ret);
- watermark = ALIGN(watermark, sizeof(long long));
- return vmem;
+}
+int sr_platform_init(void) +{
- memset(context_memory_uncached, 0, CONTEXT_SPACE_UNCACHED);
- return arch->init();
+}
sr_platform_init() looks like it's a pointless additional function just here to obfuscate the code. This code could very well be at the sr_platform_init() callsite, which would help make the code more understandable.
And why the initialization of your simple memory allocator is part of the platform code I've no idea.
diff --git a/arch/arm/kernel/sr_power.c b/arch/arm/kernel/sr_power.c new file mode 100644 index 0000000..2585559 --- /dev/null +++ b/arch/arm/kernel/sr_power.c @@ -0,0 +1,26 @@ +/*
- Copyright (C) 2008-2011 ARM Limited
- Author(s): Jon Callan, Lorenzo Pieralisi
- This program is free software; you can redistribute it and/or modify
- it under the terms of the GNU General Public License version 2 as
- published by the Free Software Foundation.
- */
+#include "sr.h"
+int sr_platform_enter_cstate(unsigned cpu_index,
struct sr_cpu *cpu,
struct sr_cluster *cluster)
+{
- return arch->enter_cstate(cpu_index, cpu, cluster);
+}
+int sr_platform_leave_cstate(unsigned cpu_index,
struct sr_cpu *cpu,
struct sr_cluster *cluster)
+{
- return arch->leave_cstate(cpu_index, cpu, cluster);
+}
Is this really worth being a new separate .c file when all it does is call other functions via pointers? What about an inline function doing this in a header file?
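For instance, a sketch of the header-file version of one of these wrappers, lifted from the code above:

	static inline int sr_platform_enter_cstate(unsigned cpu_index,
				struct sr_cpu *cpu,
				struct sr_cluster *cluster)
	{
		return arch->enter_cstate(cpu_index, cpu, cluster);
	}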
On Sat, Jul 09, 2011 at 11:01:23AM +0100, Russell King - ARM Linux wrote:
The idea of splitting a large patch up into smaller patches is to do it in a logical way so that:
- Each patch is self-contained, adding a single new - and where possible complete - feature or bug fix.
- Ease of review.
Carving your big patch up by file does not achieve either of these, because all the patches interact with each other. There are people within ARM Ltd who have been dealing with patch sets for some time whom you could ask for advice - or who you could ask to review your patches before you send them out on the mailing list.
Thanks for looking at it anyway. My apologies Russell, point taken, and it is all my fault. Consider all the comments on the patch splitting below as taken into account from now onwards.
On Thu, Jul 07, 2011 at 04:50:19PM +0100, Lorenzo Pieralisi wrote:
diff --git a/arch/arm/kernel/sr.h b/arch/arm/kernel/sr.h new file mode 100644 index 0000000..6b24e53 --- /dev/null +++ b/arch/arm/kernel/sr.h @@ -0,0 +1,162 @@ +#define SR_NR_CLUSTERS 1
+#define STACK_SIZE 512
+#define CPU_A5 0x4100c050
Not used in this patch - should it be in some other patch?
+#define CPU_A8 0x4100c080
Not used in this patch - should it be in some other patch?
+#define CPU_A9 0x410fc090
Not used in this patch - and if this is generic code then it should not be specific to any CPU. Please get rid of all these definitions, and if they're used you need to rework these patches to have proper separation of generic code from CPU specific code.
+#define L2_DATA_SIZE 16
Not used in this patch.
+#define CONTEXT_SPACE_UNCACHED (2 * PAGE_SIZE) +#define PA(f) ((typeof(f) *)virt_to_phys((void *)f))
This is just messed up. Physical addresses aren't pointers, they're numeric. PA() is not used in this patch either, so it appears it can be deleted.
Just a bad name for a macro. It is used to call a function through its physical address pointer. I will rework it.
+extern int lookup_arch(void);
This doesn't exist in this patch, should it be in another patch or deleted?
+/*
- Global variables
- */
+extern struct sr_main_table main_table; +extern unsigned long idle_save_context; +extern unsigned long idle_restore_context; +extern unsigned long idle_mt; +extern void *context_memory_uncached;
Why does this need to be a global?
+/*
- Context save/restore
- */
+typedef u32 (sr_save_context_t)
- (struct sr_cluster *,
- struct sr_cpu*, u32);
Fits on one line so should be on one line.
Ok.
+typedef u32 (sr_restore_context_t)
- (struct sr_cluster *,
- struct sr_cpu*);
Fits on one line so should be on one line.
Ok.
+extern sr_save_context_t sr_save_context;
This doesn't exist in this patch, should it be in another patch?
+extern sr_restore_context_t sr_restore_context;
Ditto.
+extern struct sr_arch *get_arch(void);
Ditto.
+/*
- 1:1 mappings
- */
+extern int linux_sr_setup_translation_tables(void);
Ditto.
+/*
- dumb memory allocator
- */
+extern void *get_memory(unsigned int size);
+/*
- Entering/Leaving C-states function entries
- */
+extern int sr_platform_enter_cstate(unsigned cpu_index, struct sr_cpu *cpu,
struct sr_cluster *cluster);
+extern int sr_platform_leave_cstate(unsigned cpu_index, struct sr_cpu *cpu,
struct sr_cluster *cluster);
See comment at the bottom - would inline functions here be better or maybe even place them at the callsite to make the code easier to understand if they're only used at one location.
Ok.
+/* save/restore main table */ +extern struct sr_main_table main_table;
Why do we have two 'main_table' declarations in this same header file?
Sorry, my mistake.
+/*
- Init functions
- */
+extern int sr_platform_runtime_init(void);
Not defined in this patch.
+extern int sr_platform_init(void); +extern int sr_context_init(void);
+/*
- v7 specific
- */
+extern char *cpu_v7_suspend_size;
No - stop going underneath the covers of existing APIs to get what you want. Use cpu_suspend() and cpu_resume() directly. Note that they've changed to be more flexible and those patches have been on this mailing list, and will be going in for 3.1.
If they still don't do what you need, I'm going to be *pissed* because you've obviously known that they don't yet you haven't taken the time to get involved on this mailing list with the discussions over it.
No Russell, by using cpu_do_suspend and cpu_do_resume I wanted to avoid reusing cpu_suspend/cpu_resume in a way that might clash with the behaviour cpuidle requires. We talked about this and I decided to avoid using it for cpuidle. The reason is two-fold:

- cpu_suspend/cpu_resume use the stack to save CPU registers, and I would like to use non-cacheable memory for that instead. By using the *_do_* functions I can pass a pointer to the memory I want, but if you consider it an abuse I have to change it.
- cpu_suspend flushes the cache but it does not clear the C bit in the SCTLR, which should be done when powering down a single CPU.
I did follow the discussions and I tried to reuse the bits of code I can reuse for carrying out what I need, but I did not want to ask for changes in cpu_suspend that might not be the way to go forward and might disrupt other bits of code relying on it.
+extern void scu_cpu_mode(void __iomem *base, int state);
Not defined in this patch - should it be in another patch or deleted?
+/*
- These arrays keep suitable stack pointers for CPUs.
- The memory must be 8-byte aligned.
- */
+extern unsigned long platform_cpu_stacks[CONFIG_NR_CPUS];
Ditto.
+extern unsigned long platform_cpu_nc_stacks[CONFIG_NR_CPUS];
Ditto.
And should these be per-cpu variables? In any case, CONFIG_NR_CPUS doesn't look right here, NR_CPUS is probably what you want.
It is a per-cpu stack pointer as used in sleep.S, where CONFIG_NR_CPUS is used as well.
But I have to split this patch up in a better way to avoid taking your time to track my mistakes.
+#endif diff --git a/arch/arm/kernel/sr_context.c b/arch/arm/kernel/sr_context.c new file mode 100644 index 0000000..25eaa43 --- /dev/null +++ b/arch/arm/kernel/sr_context.c @@ -0,0 +1,23 @@ +/*
- Copyright (C) 2008-2011 ARM Limited
- Author(s): Jon Callan, Lorenzo Pieralisi
- This program is free software; you can redistribute it and/or modify
- it under the terms of the GNU General Public License version 2 as
- published by the Free Software Foundation.
- */
+#include <linux/cache.h> +#include <asm/cacheflush.h> +#include "sr.h"
+int sr_context_init(void) +{
- idle_save_context = (unsigned long) arch->save_context;
This looks wrong. idle_save_context probably has the wrong type.
- idle_restore_context = __pa(arch->restore_context);
- __cpuc_flush_dcache_area(&idle_restore_context, sizeof(unsigned long));
- outer_clean_range(__pa(&idle_restore_context),
__pa(&idle_restore_context + 1));
This kind of thing needs rethinking - calling these internal functions directly just isn't on.
Yes, you are right.
- return 0;
+}
And why have a single .c file for just one function? With all the copyright headers on each file this just results in extra LOC bloat, which given the situation we find ourselves in with mainline is *definitely* a bad idea.
I will rework that.
diff --git a/arch/arm/kernel/sr_helpers.h b/arch/arm/kernel/sr_helpers.h new file mode 100644 index 0000000..1ae3a9a --- /dev/null +++ b/arch/arm/kernel/sr_helpers.h @@ -0,0 +1,56 @@ +/*
- Copyright (C) 2008-2011 ARM Limited
- Author(s): Jon Callan, Lorenzo Pieralisi
- This program is free software; you can redistribute it and/or modify
- it under the terms of the GNU General Public License version 2 as
- published by the Free Software Foundation.
- */
+static inline int sr_platform_get_cpu_index(void) +{
- unsigned int cpu;
- __asm__ __volatile__(
"mrc p15, 0, %0, c0, c0, 5\n\t"
: "=r" (cpu));
- return cpu & 0xf;
+}
Use smp_processor_id() for indexes into kernel based structures, which has to be a unique CPU number for any CPU in the system. You only need the physical CPU ID when talking to the hardware.
Ok, I will use it where I can; there are some code paths (wake-up) where I cannot, and there I am afraid I have to resort to the MPIDR read.
+/*
- Placeholder for further extensions
- */
+static inline int sr_platform_get_cluster_index(void) +{
- return 0;
+}
+static inline void __iomem *sr_platform_cbar(void) +{
- void __iomem *base;
- __asm__ __volatile__(
"mrc p15, 4, %0, c15, c0, 0\n\t"
: "=r" (base));
- return base;
+}
This I imagine is another CPU-specific register. On ARM926 it's a register which controls the cache mode (writeback vs writethrough).
Yes, it is there to get the PERIPHBASE and the SCU physical address. I just use it when I detect an A9 through the processor id.
+#ifdef CONFIG_SMP +static inline void exit_coherency(void) +{
- unsigned int v;
- asm volatile (
"mrc p15, 0, %0, c1, c0, 1\n"
"bic %0, %0, %1\n"
"mcr p15, 0, %0, c1, c0, 1\n"
: "=&r" (v)
: "Ir" (0x40)
: );
+}
Firstly, this is specific to ARM CPUs. Secondly, the bit you require is very CPU specific even then. Some it's 0x40, others its 0x20. So this just does not deserve to be in generic code.
Right.
+#else +static inline void exit_coherency(void) { } +#endif
+extern void default_sleep(void); +extern void sr_suspend(void *); +extern void sr_resume(void *, int); +extern void disable_clean_inv_dcache_v7_all(void); diff --git a/arch/arm/kernel/sr_platform.c b/arch/arm/kernel/sr_platform.c new file mode 100644 index 0000000..530aa1b --- /dev/null +++ b/arch/arm/kernel/sr_platform.c @@ -0,0 +1,48 @@ +/*
- Copyright (C) 2008-2011 ARM Limited
- Author(s): Jon Callan, Lorenzo Pieralisi
- This program is free software; you can redistribute it and/or modify
- it under the terms of the GNU General Public License version 2 as
- published by the Free Software Foundation.
- */
+#include <linux/errno.h> +#include <linux/kernel.h> +#include <linux/string.h> +#include <asm/memory.h> +#include <asm/page.h> +#include <asm/sr_platform_api.h> +#include "sr.h"
+void *context_memory_uncached;
+/*
- Simple memory allocator function.
- Returns start address of allocated region
- Memory is zero-initialized.
- */
+static unsigned int watermark;
+void *get_memory(unsigned int size) +{
- unsigned ret;
- void *vmem = NULL;
- ret = watermark;
- watermark += size;
- BUG_ON(watermark >= CONTEXT_SPACE_UNCACHED);
- vmem = (context_memory_uncached + ret);
- watermark = ALIGN(watermark, sizeof(long long));
- return vmem;
+}
+int sr_platform_init(void) +{
- memset(context_memory_uncached, 0, CONTEXT_SPACE_UNCACHED);
- return arch->init();
+}
sr_platform_init() looks like it's a pointless additional function just here to obfuscate the code. This code could very well be at the sr_platform_init() callsite, which would help make the code more understandable.
And why the initialization of your simple memory allocator is part of the platform code I've no idea.
Yes, I have to improve the layout as mentioned above.
diff --git a/arch/arm/kernel/sr_power.c b/arch/arm/kernel/sr_power.c new file mode 100644 index 0000000..2585559 --- /dev/null +++ b/arch/arm/kernel/sr_power.c @@ -0,0 +1,26 @@ +/*
- Copyright (C) 2008-2011 ARM Limited
- Author(s): Jon Callan, Lorenzo Pieralisi
- This program is free software; you can redistribute it and/or modify
- it under the terms of the GNU General Public License version 2 as
- published by the Free Software Foundation.
- */
+#include "sr.h"
+int sr_platform_enter_cstate(unsigned cpu_index,
struct sr_cpu *cpu,
struct sr_cluster *cluster)
+{
- return arch->enter_cstate(cpu_index, cpu, cluster);
+}
+int sr_platform_leave_cstate(unsigned cpu_index,
struct sr_cpu *cpu,
struct sr_cluster *cluster)
+{
- return arch->leave_cstate(cpu_index, cpu, cluster);
+}
Is this really worth being a new separate .c file when all it does is call other functions via pointers? What about an inline function doing this in a header file?
You are right throughout Russell, I will rework it. But please keep in mind the cpu_suspend/cpu_resume bits and let me know how you think we can best use them within cpuidle.
Thank you very much indeed, Lorenzo
On 7 July 2011 21:20, Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> wrote:
This patch provides the code infrastructure needed to maintain a generic per-cpu architecture implementation of idle code.
sr_platform.c:
- code manages patchset initialization and memory management

sr_context.c:
- code initializes run-time context save/restore generic support

sr_power.c:
- provides the generic infrastructure to enter/exit low power modes and communicate with the Power Control Unit (PCU)

v7 support builds on the basic infrastructure, providing the per-cpu arch implementation through standard function pointer signatures.

Preprocessor defines include the size of the data needed to save/restore L2 state. This define value should be moved to the respective subsystem (PL310) once the patchset interface to that subsystem is settled.
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
---
 arch/arm/kernel/sr.h          | 162 +++++++++++++++++++++++++++++++++++++++++
 arch/arm/kernel/sr_context.c  |  23 ++++++
 arch/arm/kernel/sr_helpers.h  |  56 ++++++++++++++
 arch/arm/kernel/sr_platform.c |  48 ++++++++++++
 arch/arm/kernel/sr_power.c    |  26 +++++++
 5 files changed, 315 insertions(+), 0 deletions(-)
 create mode 100644 arch/arm/kernel/sr.h
 create mode 100644 arch/arm/kernel/sr_context.c
 create mode 100644 arch/arm/kernel/sr_helpers.h
 create mode 100644 arch/arm/kernel/sr_platform.c
 create mode 100644 arch/arm/kernel/sr_power.c
diff --git a/arch/arm/kernel/sr.h b/arch/arm/kernel/sr.h new file mode 100644 index 0000000..6b24e53 --- /dev/null +++ b/arch/arm/kernel/sr.h @@ -0,0 +1,162 @@ +#define SR_NR_CLUSTERS 1
+#define STACK_SIZE 512
+#define CPU_A5 0x4100c050 +#define CPU_A8 0x4100c080 +#define CPU_A9 0x410fc090 +#define L2_DATA_SIZE 16 +#define CONTEXT_SPACE_UNCACHED (2 * PAGE_SIZE) +#define PA(f) ((typeof(f) *)virt_to_phys((void *)f))
+#ifndef __ASSEMBLY__
+#include <linux/types.h> +#include <linux/threads.h> +#include <linux/cpumask.h> +#include <asm/page.h>
+/*
- Structures we hide from the OS API
- */
+struct sr_cpu_context {
- u32 flags;
- u32 saved_items;
- u32 *mmu_data;
+};
+struct sr_cluster_context {
- u32 saved_items;
- u32 *l2_data;
+};
+struct sr_main_table {
- pgd_t *os_mmu_context[SR_NR_CLUSTERS][CONFIG_NR_CPUS];
- cpumask_t cpu_idle_mask[SR_NR_CLUSTERS];
- pgd_t *fw_mmu_context;
- u32 num_clusters;
- struct sr_cluster *cluster_table;
+};
+/*
- A cluster is a container for CPUs, typically either a single CPU or a
- coherent cluster.
- We assume the CPUs in the cluster can be switched off independently.
- */
+struct sr_cluster {
- u32 cpu_type; /* A9mpcore, A5mpcore, etc */
- u32 num_cpus;
- struct sr_cluster_context *context;
- struct sr_cpu *cpu_table;
- u32 power_state;
- u32 cluster_down;
- void __iomem *scu_address;
- void *lock;
+};
+struct sr_cpu {
- struct sr_cpu_context *context;
- u32 power_state;
+};
+/*
- arch infrastructure
- */
+struct sr_arch {
- unsigned int cpu_val;
- unsigned int cpu_mask;
- int (*init)(void);
- int (*save_context)(struct sr_cluster *, struct sr_cpu *,
- unsigned);
- int (*restore_context)(struct sr_cluster *, struct sr_cpu *);
- int (*enter_cstate)(unsigned cpu_index,
- struct sr_cpu *cpu,
- struct sr_cluster *cluster);
- int (*leave_cstate)(unsigned, struct sr_cpu *,
- struct sr_cluster *);
- void (*reset)(void);
+};
+extern struct sr_arch *arch; +extern int lookup_arch(void);
+/*
- Global variables
- */
+extern struct sr_main_table main_table; +extern unsigned long idle_save_context; +extern unsigned long idle_restore_context; +extern unsigned long idle_mt; +extern void *context_memory_uncached;
+/*
- Context save/restore
- */
+typedef u32 (sr_save_context_t)
- (struct sr_cluster *,
- struct sr_cpu*, u32);
+typedef u32 (sr_restore_context_t)
- (struct sr_cluster *,
- struct sr_cpu*);
+extern sr_save_context_t sr_save_context; +extern sr_restore_context_t sr_restore_context;
+extern struct sr_arch *get_arch(void);
+/*
- 1:1 mappings
- */
+extern int linux_sr_setup_translation_tables(void);
+/*
- dumb memory allocator
- */
+extern void *get_memory(unsigned int size);
+/*
- Entering/Leaving C-states function entries
- */
+extern int sr_platform_enter_cstate(unsigned cpu_index, struct sr_cpu *cpu,
- struct sr_cluster *cluster);
+extern int sr_platform_leave_cstate(unsigned cpu_index, struct sr_cpu *cpu,
- struct sr_cluster *cluster);
+/* save/restore main table */ +extern struct sr_main_table main_table;
+/*
- Init functions
- */
+extern int sr_platform_runtime_init(void); +extern int sr_platform_init(void); +extern int sr_context_init(void);
+/*
- v7 specific
- */
+extern char *cpu_v7_suspend_size; +extern void scu_cpu_mode(void __iomem *base, int state);
+/*
- These arrays keep suitable stack pointers for CPUs.
- The memory must be 8-byte aligned.
- */
+extern unsigned long platform_cpu_stacks[CONFIG_NR_CPUS]; +extern unsigned long platform_cpu_nc_stacks[CONFIG_NR_CPUS]; +#endif diff --git a/arch/arm/kernel/sr_context.c b/arch/arm/kernel/sr_context.c new file mode 100644 index 0000000..25eaa43 --- /dev/null +++ b/arch/arm/kernel/sr_context.c @@ -0,0 +1,23 @@ +/*
- Copyright (C) 2008-2011 ARM Limited
- Author(s): Jon Callan, Lorenzo Pieralisi
- This program is free software; you can redistribute it and/or modify
- it under the terms of the GNU General Public License version 2 as
- published by the Free Software Foundation.
- */
+#include <linux/cache.h> +#include <asm/cacheflush.h> +#include "sr.h"
+int sr_context_init(void) +{
- idle_save_context = (unsigned long) arch->save_context;
- idle_restore_context = __pa(arch->restore_context);
- __cpuc_flush_dcache_area(&idle_restore_context, sizeof(unsigned long));
- outer_clean_range(__pa(&idle_restore_context),
- __pa(&idle_restore_context + 1));
- return 0;
+} diff --git a/arch/arm/kernel/sr_helpers.h b/arch/arm/kernel/sr_helpers.h new file mode 100644 index 0000000..1ae3a9a --- /dev/null +++ b/arch/arm/kernel/sr_helpers.h @@ -0,0 +1,56 @@ +/*
+ * Copyright (C) 2008-2011 ARM Limited
+ * Author(s): Jon Callan, Lorenzo Pieralisi
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+static inline int sr_platform_get_cpu_index(void)
+{
+	unsigned int cpu;
+	__asm__ __volatile__(
+			"mrc	p15, 0, %0, c0, c0, 5\n\t"
+			: "=r" (cpu));
+	return cpu & 0xf;
+}
+
+/*
+ * Placeholder for further extensions
+ */
+static inline int sr_platform_get_cluster_index(void)
+{
+	return 0;
+}
+
+static inline void __iomem *sr_platform_cbar(void)
+{
+	void __iomem *base;
+	__asm__ __volatile__(
+			"mrc	p15, 4, %0, c15, c0, 0\n\t"
+			: "=r" (base));
+	return base;
+}
+
+#ifdef CONFIG_SMP
+static inline void exit_coherency(void)
+{
+	unsigned int v;
+	asm volatile (
+		"mrc	p15, 0, %0, c1, c0, 1\n"
+		"bic	%0, %0, %1\n"
+		"mcr	p15, 0, %0, c1, c0, 1\n"
+		 : "=&r" (v)
+		 : "Ir" (0x40)
+		 : );
The above line gives a compilation error with my toolchain. Changing it to : "cc"); fixes the error.
+}
+#else
+static inline void exit_coherency(void) { }
+#endif
+
+extern void default_sleep(void);
+extern void sr_suspend(void *);
+extern void sr_resume(void *, int);
+extern void disable_clean_inv_dcache_v7_all(void);
diff --git a/arch/arm/kernel/sr_platform.c b/arch/arm/kernel/sr_platform.c
new file mode 100644
index 0000000..530aa1b
--- /dev/null
+++ b/arch/arm/kernel/sr_platform.c
@@ -0,0 +1,48 @@
+/*
+ * Copyright (C) 2008-2011 ARM Limited
+ * Author(s): Jon Callan, Lorenzo Pieralisi
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/errno.h>
+#include <linux/kernel.h>
+#include <linux/string.h>
+#include <asm/memory.h>
+#include <asm/page.h>
+#include <asm/sr_platform_api.h>
+#include "sr.h"
+
+void *context_memory_uncached;
+
+/*
+ * Simple memory allocator function.
+ * Returns start address of allocated region
+ * Memory is zero-initialized.
+ */
+static unsigned int watermark;
+
+void *get_memory(unsigned int size)
+{
+	unsigned ret;
+	void *vmem = NULL;
+
+	ret = watermark;
+	watermark += size;
+	BUG_ON(watermark >= CONTEXT_SPACE_UNCACHED);
+	vmem = (context_memory_uncached + ret);
+	watermark = ALIGN(watermark, sizeof(long long));
+
+	return vmem;
+}
+
+int sr_platform_init(void)
+{
+	memset(context_memory_uncached, 0, CONTEXT_SPACE_UNCACHED);
+	return arch->init();
+}
diff --git a/arch/arm/kernel/sr_power.c b/arch/arm/kernel/sr_power.c
new file mode 100644
index 0000000..2585559
--- /dev/null
+++ b/arch/arm/kernel/sr_power.c
@@ -0,0 +1,26 @@
+/*
+ * Copyright (C) 2008-2011 ARM Limited
+ * Author(s): Jon Callan, Lorenzo Pieralisi
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include "sr.h"
+
+int sr_platform_enter_cstate(unsigned cpu_index,
+			     struct sr_cpu *cpu,
+			     struct sr_cluster *cluster)
+{
+	return arch->enter_cstate(cpu_index, cpu, cluster);
+}
+
+int sr_platform_leave_cstate(unsigned cpu_index,
+			     struct sr_cpu *cpu,
+			     struct sr_cluster *cluster)
+{
+	return arch->leave_cstate(cpu_index, cpu, cluster);
+}
On Thu, Jul 28, 2011 at 05:22:38PM +0100, Amit Kachhap wrote:
On 7 July 2011 21:20, Lorenzo Pieralisi lorenzo.pieralisi@arm.com wrote:
This patch provides the code infrastructure needed to maintain a generic per-cpu architecture implementation of idle code.
sr_platform.c: manages patchset initialization and memory management.
sr_context.c: initializes the run-time context save/restore generic support.
sr_power.c: provides the generic infrastructure to enter/exit low power modes and communicate with the Power Control Unit (PCU).
v7 support hinges on this basic infrastructure and provides the per-CPU arch implementation through standard function-pointer signatures.
Preprocessor defines include the size of the data needed to save/restore L2 state. This define should be moved to the respective subsystem (PL310) once the patchset interface to that subsystem is settled.
Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com
arch/arm/kernel/sr.h | 162 +++++++++++++++++++++++++++++++++++++++++ arch/arm/kernel/sr_context.c | 23 ++++++ arch/arm/kernel/sr_helpers.h | 56 ++++++++++++++ arch/arm/kernel/sr_platform.c | 48 ++++++++++++ arch/arm/kernel/sr_power.c | 26 +++++++ 5 files changed, 315 insertions(+), 0 deletions(-) create mode 100644 arch/arm/kernel/sr.h create mode 100644 arch/arm/kernel/sr_context.c create mode 100644 arch/arm/kernel/sr_helpers.h create mode 100644 arch/arm/kernel/sr_platform.c create mode 100644 arch/arm/kernel/sr_power.c
[...]
+#ifdef CONFIG_SMP
+static inline void exit_coherency(void)
+{
+	unsigned int v;
+	asm volatile (
+		"mrc	p15, 0, %0, c1, c0, 1\n"
+		"bic	%0, %0, %1\n"
+		"mcr	p15, 0, %0, c1, c0, 1\n"
+		 : "=&r" (v)
+		 : "Ir" (0x40)
+		 : );
The above line gives a compilation error with my toolchain. Changing it to : "cc"); fixes the error.
Already fixed, with thanks, Lorenzo
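For reference, this is how the fixed helper reads with the non-empty clobber list (a reconstruction of the one-character fix discussed above, not a hunk from the updated series):

static inline void exit_coherency(void)
{
	unsigned int v;
	asm volatile (
		"mrc	p15, 0, %0, c1, c0, 1\n"
		"bic	%0, %0, %1\n"		/* clear ACTLR.SMP (bit 6) */
		"mcr	p15, 0, %0, c1, c0, 1\n"
		 : "=&r" (v)
		 : "Ir" (0x40)
		 : "cc");			/* clobber list no longer empty */
}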
This patch provides v7 assembly functions to disable-clean-invalidate D$ and programme the SCU CPU power status register.
The function to disable/clean/invalidate D$ is just a shim that clears the C bit and jumps to the respective function defined in proc info.
The SCU CPU power mode function is there to programme the register from an MMU-off path, where the C environment is not up and running.
Using scu_power_mode(...) is not possible, or at least hairy, since it relies on smp_processor_id() (and hence the kernel stack) being up and running, and it must not access any static data (gcc inserts virtual address constants into the code, making it impossible to call when the MMU is off and virtual translation is not yet up and running).
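To make the constraint concrete, here is an illustrative fragment (not from the patch; scu_base_va is a hypothetical static):

static void __iomem *scu_base_va;	/* hypothetical static variable */

static void set_scu_power_status(int mode)
{
	/*
	 * gcc typically materializes the address of scu_base_va as an
	 * absolute virtual-address literal (ldr rX, =scu_base_va), so
	 * this access is unusable once the MMU is off.
	 */
	__raw_writeb(mode, scu_base_va + 0x08);
}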
Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com --- arch/arm/kernel/sr_v7_helpers.S | 47 +++++++++++++++++++++++++++++++++++++++ 1 files changed, 47 insertions(+), 0 deletions(-) create mode 100644 arch/arm/kernel/sr_v7_helpers.S
diff --git a/arch/arm/kernel/sr_v7_helpers.S b/arch/arm/kernel/sr_v7_helpers.S new file mode 100644 index 0000000..6443918 --- /dev/null +++ b/arch/arm/kernel/sr_v7_helpers.S @@ -0,0 +1,47 @@ +/* + * Copyright (c) 2008-2011 ARM Ltd + * + * Author(s): Jon Callan, Lorenzo Pieralisi + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + */ + +#include <linux/linkage.h> +#include <asm/assembler.h> + + .text + @ this function disables data caching, then cleans and invalidates + @ the whole data cache. Apart from the preamble to clear the C bit it + @ uses kernel flushing function provided for v7 + +ENTRY(disable_clean_inv_dcache_v7_all) + ARM( stmfd sp!, {r4-r5, r7, r9-r11, lr} ) + THUMB( stmfd sp!, {r4-r7, r9-r11, lr} ) + + dsb + mrc p15, 0, r3, c1, c0, 0 + bic r3, #4 @ clear C bit + mcr p15, 0, r3, c1, c0, 0 + dsb + + bl v7_flush_dcache_all + ARM( ldmfd sp!, {r4-r5, r7, r9-r11, lr} ) + THUMB( ldmfd sp!, {r4-r7, r9-r11, lr} ) + mov pc, lr +ENDPROC(disable_clean_inv_dcache_v7_all) + + @ Resetting the SCU CPU power register when the MMU is off + @ must be done in assembly since the C environment is not + @ set-up yet +ENTRY(scu_cpu_mode) + ALT_SMP(mrc p15, 0, r2, c0, c0, 5) + ALT_UP(mov r2, #0) + and r2, r2, #15 + strb r1, [r0, r2] + mov pc, lr +ENDPROC(scu_cpu_mode) + .end +
This patch introduces a simple infrastructure to initialize save/restore at run-time depending on the HW processor id.
It defines an array of structures containing function pointers that can be called to manage low power state.
The arch pointer is initialized at run-time in lookup_arch() on id matching.
Fully tested on a dual-core A9. A8 and A5 support is compile-tested.
Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com --- arch/arm/kernel/sr_arch.c | 74 +++++++++++++++++++++++++++++++++++++++++++++ 1 files changed, 74 insertions(+), 0 deletions(-) create mode 100644 arch/arm/kernel/sr_arch.c
diff --git a/arch/arm/kernel/sr_arch.c b/arch/arm/kernel/sr_arch.c new file mode 100644 index 0000000..c9f7dbe --- /dev/null +++ b/arch/arm/kernel/sr_arch.c @@ -0,0 +1,74 @@ +/* + * Copyright (C) 2008-2011 ARM Limited + * Author(s): Jon Callan, Lorenzo Pieralisi + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + */ + +#include <linux/pm.h> +#include <linux/cache.h> +#include <linux/compiler.h> +#include <linux/errno.h> + +#include <asm/cacheflush.h> +#include <asm/system.h> +#include <asm/cputype.h> +#include <asm/cpu_pm.h> +#include <asm/lb_lock.h> +#include <asm/sr_platform_api.h> +#include "sr_helpers.h" +#include "sr.h" + + +#include "sr_v7.c" + +extern void platform_a8_reset_handler(void); +extern void platform_a9_reset_handler(void); + +struct sr_arch *arch; + +struct sr_arch archs[] = { + { + .cpu_val = CPU_A8, + .cpu_mask = 0xff0ffff0, + .init = sr_platform_a8_init, + .save_context = sr_platform_a8_save_context, + .restore_context = sr_platform_a8_restore_context, + .enter_cstate = sr_platform_a8_enter_cstate, + .leave_cstate = sr_platform_a8_leave_cstate, + .reset = platform_a8_reset_handler}, + { + .cpu_val = CPU_A9, + .cpu_mask = 0xff0ffff0, + .init = sr_platform_a9_init, + .save_context = sr_platform_a9_save_context, + .restore_context = sr_platform_a9_restore_context, + .enter_cstate = sr_platform_a9_enter_cstate, + .leave_cstate = sr_platform_a9_leave_cstate, + .reset = platform_a9_reset_handler}, + { + .cpu_val = CPU_A5, + .cpu_mask = 0xff0ffff0, + .init = sr_platform_a9_init, + .save_context = sr_platform_a9_save_context, + .restore_context = sr_platform_a9_restore_context, + .enter_cstate = sr_platform_a9_enter_cstate, + .leave_cstate = sr_platform_a9_leave_cstate, + .reset = platform_a9_reset_handler}, + { + .cpu_val = 0x0} +}; + +int lookup_arch(void) +{ + struct sr_arch *ap = NULL; + for (ap = archs; ap->cpu_val; ap++) + if ((read_cpuid_id() & ap->cpu_mask) == ap->cpu_val) { + arch = ap; + return 0; + } + return -ENXIO; +}
This patch provides reset entry point for A9, A8, A5 processors.
The reset functions invalidate I$ and D$ depending on the processor needs and jump to the save/restore entry point in sr_entry.S.
The reset address is obtained through the arch_reset_handler() function that returns a function pointer, detected dynamically through cpu id.
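Presumably the lookup reduces to something like the following sketch (based on the sr_arch struct earlier in the series; the actual definition may differ):

typedef void (*reset_fn_t)(void);

static inline reset_fn_t arch_reset_handler(void)
{
	/* arch is set up by lookup_arch() on a read_cpuid_id() match */
	return arch->reset;
}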
Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com --- arch/arm/kernel/reset_v7.S | 109 ++++++++++++++++++++++++++++++++++++++++++++ 1 files changed, 109 insertions(+), 0 deletions(-) create mode 100644 arch/arm/kernel/reset_v7.S
diff --git a/arch/arm/kernel/reset_v7.S b/arch/arm/kernel/reset_v7.S new file mode 100644 index 0000000..287074c --- /dev/null +++ b/arch/arm/kernel/reset_v7.S @@ -0,0 +1,109 @@ +/* + * Copyright (c) 2008-2011 ARM Ltd + * Author(s): Jon Callan, Lorenzo Pieralisi + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + */ + +#include <linux/linkage.h> +#include "sr.h" + +#define SCTLR_I (1<<12) +#define SCTLR_Z (1<<11) + +ENTRY(platform_a8_reset_handler) + b sr_reset_entry_point +ENDPROC(platform_a8_reset_handler) + +ENTRY(invalidate_icache_v7_pou) + mov r0, #0 + mcr p15, 0, r0, c7, c5, 0 @ iciallu + bx lr +ENDPROC(invalidate_icache_v7_pou) + +ENTRY(invalidate_dcache_v7_all) + @ must iterate over the caches in order to synthesise a complete + @ invalidation of data/unified cache + mrc p15, 1, r0, c0, c0, 1 @ read clidr + ands r3, r0, #0x7000000 @ extract loc from clidr + mov r3, r3, lsr #23 @ left align loc bit field + beq finished @ if loc is 0, then no need to + @ clean + mov r10, #0 @ start clean at cache level 0 + @ (in r10) +loop1: + add r2, r10, r10, lsr #1 @ work out 3x current cache + @ level + mov r12, r0, lsr r2 @ extract cache type bits from + @ clidr + and r12, r12, #7 @ mask of bits for current + @ cache only + cmp r12, #2 @ see what cache we have at + @ this level + blt skip @ skip if no cache, or just + @ i-cache + mcr p15, 2, r10, c0, c0, 0 @ select current cache level + @ in cssr + mov r12, #0 + mcr p15, 0, r12, c7, c5, 4 @ prefetchflush to sync new + @ cssr&csidr + mrc p15, 1, r12, c0, c0, 0 @ read the new csidr + and r2, r12, #7 @ extract the length of the + @ cache lines + add r2, r2, #4 @ add 4 (line length offset) + ldr r6, =0x3ff + ands r6, r6, r12, lsr #3 @ find maximum number on the + @ way size + clz r5, r6 @ find bit pos of way size + @ increment + ldr r7, =0x7fff + ands r7, r7, r12, lsr #13 @ extract max number of the + @ index size +loop2: + mov r8, r6 @ create working copy of max + @ way size +loop3: + orr r11, r10, r8, lsl r5 @ factor way and cache number + @ into r11 + orr r11, r11, r7, lsl r2 @ factor index number into r11 + mcr p15, 0, r11, c7, c6, 2 @ invalidate by set/way + subs r8, r8, #1 @ decrement the way + bge loop3 + subs r7, r7, #1 @ decrement the index + bge loop2 +skip: + add r10, r10, #2 @ increment cache number + cmp r3, r10 + bgt loop1 +finished: + mov r10, #0 + + mcr p15, 0, r10, c7, c10, 4 @ drain write buffer + mcr p15, 0, r10, c8, c7, 0 @ invalidate i + d tlbs + mcr p15, 0, r10, c2, c0, 2 @ ttb control register + bx lr +ENDPROC(invalidate_dcache_v7_all) + +ENTRY(platform_a9_reset_handler) + @ Work out whether caches need to be invalidated: A9 - yes, A5 - no + mrc p15, 0, r0, c0, c0, 0 + ldr r1, =CPU_A5 + cmp r0, r1 + beq icache + + bl invalidate_icache_v7_pou + + @ Turn I cache and branch prediction on +icache: + mrc p15, 0, r0, c1, c0, 0 + orr r0, r0, #(SCTLR_I | SCTLR_Z) + mcr p15, 0, r0, c1, c0, 0 + + @ Clear all data cache levels visible to CPU + blne invalidate_dcache_v7_all + + b sr_reset_entry_point +ENDPROC(platform_a9_reset_handler)
This patch provides all the functions required to manage platform initialization, context save/restore and power entry/exit for A9, A8, A5 ARM processors.
It builds on the infrastructure laid out by the patchset and aims at keeping all v7 code in one single place.
The code relies on common suspend/resume code in the kernel and calls into the respective subsystems (SCU and L2 for A9) in order to carry out actions required to enter idle modes.
Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com --- arch/arm/kernel/sr_v7.c | 298 +++++++++++++++++++++++++++++++++++++++++++++++ 1 files changed, 298 insertions(+), 0 deletions(-) create mode 100644 arch/arm/kernel/sr_v7.c
diff --git a/arch/arm/kernel/sr_v7.c b/arch/arm/kernel/sr_v7.c new file mode 100644 index 0000000..32d1073 --- /dev/null +++ b/arch/arm/kernel/sr_v7.c @@ -0,0 +1,298 @@ +/* + * Copyright (C) 2008-2011 ARM Limited + * + * Author(s): Jon Callan, Lorenzo Pieralisi + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + */ + +#include <asm/smp_scu.h> +#include <asm/outercache.h> + +int sr_platform_a8_init(void) +{ + int i; + unsigned int suspend_size = (unsigned int)(&cpu_v7_suspend_size); + + struct sr_cpu *aem_cpu = (struct sr_cpu *) + get_memory(sizeof(struct sr_cpu)); + + struct sr_cluster *aem_cluster = (struct sr_cluster *) + get_memory(sizeof(struct sr_cluster)); + + aem_cluster->cpu_type = read_cpuid_id() & 0xff0ffff0; + + if (aem_cluster->cpu_type != CPU_A8) + return -ENODEV; + + aem_cluster->num_cpus = 1; + aem_cluster->cpu_table = aem_cpu; + aem_cluster->lock = get_memory(sizeof(struct bakery)); + initialize_spinlock(aem_cluster->lock); + + /* (No cluster context for A8) */ + + for (i = 0; i < aem_cluster->num_cpus; ++i) { + aem_cpu[i].context = + get_memory(sizeof(struct sr_cpu_context)); + aem_cpu[i].context->mmu_data = get_memory(suspend_size); + } + + main_table.cluster_table = aem_cluster; + main_table.num_clusters = SR_NR_CLUSTERS; + return 0; +} + +/* + * This function is called at the end of runtime initialization. + * + */ +int sr_platform_a9_init(void) +{ + int i; + struct sr_cpu *aem_cpu; + unsigned int suspend_size = (unsigned int)(&cpu_v7_suspend_size); + + struct sr_cluster *aem_cluster = (struct sr_cluster *) + get_memory(sizeof(struct sr_cluster)); + + aem_cluster->cpu_type = read_cpuid_id() & 0xff0ffff0; + aem_cluster->cluster_down = 0; + + if ((aem_cluster->cpu_type != CPU_A9) && + (aem_cluster->cpu_type != CPU_A5)) + return -ENODEV; +#ifdef CONFIG_SMP + aem_cluster->scu_address = sr_platform_cbar(); +#endif + aem_cluster->num_cpus = num_online_cpus(); + + aem_cluster->cpu_table = (struct sr_cpu *) + get_memory(sizeof(struct sr_cpu) * CONFIG_NR_CPUS); + aem_cluster->lock = get_memory(sizeof(struct bakery)); + + initialize_spinlock(aem_cluster->lock); + + aem_cluster->context = get_memory(sizeof(struct sr_cluster_context)); + + aem_cluster->context->l2_data = get_memory(L2_DATA_SIZE); + + + for (i = 0, aem_cpu = aem_cluster->cpu_table; + i < aem_cluster->num_cpus; ++i) { + + platform_cpu_nc_stacks[i] = (unsigned long) + get_memory(STACK_SIZE) + + STACK_SIZE - 8; + + aem_cpu[i].context = + get_memory(sizeof(struct sr_cpu_context)); + aem_cpu[i].context->mmu_data = get_memory(suspend_size); + } + main_table.cluster_table = aem_cluster; + + __cpuc_flush_dcache_area(&main_table, sizeof(struct sr_main_table)); + outer_clean_range(__pa(&main_table), __pa(&main_table + 1)); + + __cpuc_flush_dcache_area(platform_cpu_nc_stacks, + sizeof(platform_cpu_nc_stacks)/ + sizeof(platform_cpu_nc_stacks[0])); + outer_clean_range(__pa(platform_cpu_nc_stacks), + __pa(platform_cpu_nc_stacks + CONFIG_NR_CPUS)); + + return 0; +} + +int notrace sr_platform_a9_save_context(struct sr_cluster *cluster, + struct sr_cpu *cpu, + unsigned flags) +{ + u32 cluster_saved_items = 0; + struct sr_cpu_context *context; + struct sr_cluster_context *cluster_context; + unsigned int cpu_index = sr_platform_get_cpu_index(); + + context = cpu->context; + cluster_context = cluster->context; + + /* add flags as required by hardware (e.g. 
SR_SAVE_L2 if L2 is on) */ + flags |= context->flags; + + sr_suspend(context->mmu_data); + + /* + * DISABLE DATA CACHES + * + * - Disable D$ look-up (clear C-bit) + * - Clean+invalidate the D$ cache + * - Exit coherency if SMP + * + */ + + disable_clean_inv_dcache_v7_all(); + + exit_coherency(); + +#ifdef CONFIG_SMP + if (cpu->power_state >= 2) + scu_power_mode(SCU_PM_POWEROFF); +#endif + if (cluster->cluster_down) { + if (flags & SR_SAVE_SCU) + cluster_saved_items |= SR_SAVE_SCU; + + if (flags & SR_SAVE_L2) { + outer_save_context(cluster_context->l2_data, + cluster->power_state == 2, + platform_cpu_stacks[cpu_index]); + cluster_saved_items |= SR_SAVE_L2; + } + cluster_context->saved_items = cluster_saved_items; + } + + return 0; +} + + + +/* + * This function restores all the context that was lost + * when a CPU and cluster entered a low power state. It is called shortly after + * reset, with the MMU and data cache off. + * + * This function is called with cluster->lock held + */ +int notrace sr_platform_a9_restore_context(struct sr_cluster *cluster, + struct sr_cpu *cpu) +{ + struct sr_cpu_context *context; + struct sr_cluster_context *cluster_context; + u32 cluster_saved_items = 0; + int cluster_init = cluster->cluster_down; + + /* + * At this point we may not write to any static data, and we may + * only read the data that we explicitly cleaned from the L2 above. + */ + + context = cpu->context; + cluster_context = cluster->context; +#ifdef CONFIG_SMP + if (cpu->power_state >= 2) + PA(scu_cpu_mode)(cluster->scu_address + 0x8, SCU_PM_NORMAL); +#endif + PA(sr_resume)(context->mmu_data, PLAT_PHYS_OFFSET - PAGE_OFFSET); + + /* First set up the SCU & L2, if necessary */ + if (cluster_init) { + cluster_saved_items = cluster_context->saved_items; +#ifdef CONFIG_SMP + if (cluster_saved_items & SR_SAVE_SCU) + scu_reset(); +#endif + if (cluster_saved_items & SR_SAVE_L2) { + outer_restore_context(cluster_context->l2_data, + cluster->power_state == 2); + + } + } + + /* Return to OS */ + return 0; +} + + +/* + * This function saves all a8 the context that will be lost + * when a CPU and cluster enter a low power state. + * + * This function is called with cluster->lock held + */ +int notrace sr_platform_a8_save_context(struct sr_cluster *cluster, + struct sr_cpu *cpu, unsigned flags) +{ + struct sr_cpu_context *context; + + context = cpu->context; + + sr_suspend(context->mmu_data); + + /* + * Disable, then clean+invalidate the L1 (data) & L2 caches. + * + * Note that if L1 or L2 was to be dormant we would only need to + * clean some key data out, + * and clean+invalidate the stack. + */ + disable_clean_inv_dcache_v7_all(); + + return 0; +} + +/* + * This function restores all the a8 context that was lost + * when a CPU and cluster entered a low power state. It is called shortly after + * reset, with the MMU and data cache off. + */ +int notrace sr_platform_a8_restore_context(struct sr_cluster *cluster, + struct sr_cpu *cpu) +{ + struct sr_cpu_context *context; + + /* + * If the L1 or L2 is dormant, there are special precautions: + * At this point we may not write to any static data, and we may + * only read the data that we explicitly cleaned from the caches above. 
+ */ + context = cpu->context; + + PA(sr_resume)(context->mmu_data, PLAT_PHYS_OFFSET - PAGE_OFFSET); + + /* Return to OS */ + return 0; +} + +static struct lp_state lps; + +int sr_platform_a8_enter_cstate(unsigned cpu_index, struct sr_cpu *cpu, + struct sr_cluster *cluster) +{ + lps.cpu = cpu->power_state; + lps.cluster = cluster->power_state; + platform_pm_enter(&lps); + return 0; +} + +int sr_platform_a8_leave_cstate(unsigned cpu_index, struct sr_cpu *cpu, + struct sr_cluster *cluster) +{ + platform_pm_exit(&lps); + return 0; +} + + +int sr_platform_a9_enter_cstate(unsigned cpu_index, + struct sr_cpu *cpu, + struct sr_cluster *cluster) +{ + lps.cpu = cpu->power_state; + lps.cluster = cluster->power_state; + platform_pm_enter(&lps); + + return 0; +} + +/* + * This function tells the PCU this CPU has finished powering up. + * It is entered with cluster->lock held. + */ +int sr_platform_a9_leave_cstate(unsigned cpu_index, + struct sr_cpu *cpu, + struct sr_cluster *cluster) +{ + platform_pm_exit(&lps); + return 0; +}
NOTE: This toy locking algorithm is there waiting for a warm-boot protocol to be defined for ARM SMP platforms. It prevents races that should never show up in real platforms (i.e. multiple CPUs coming out of low-power at once after cluster shutdown, with no HW/SW control, so that they might try to re-enable shared resources such as the SCU concurrently). When a warm-boot protocol is defined this patch can and must simply be ignored; it is provided for completeness.
The current implementation of spinlocks on ARM relies on ldrex/strex, which are functional only if the D$ is searched on look-ups and the CPU is within the coherency domain. When a CPU has to be powered down, it has to turn off D$ look-up in order to clean and invalidate dirty L1 lines, and on an SMP system it must be taken out of the coherency domain it belongs to. This means that after the point of no return on power-down (cache look-up disabled and SMP bit cleared), cacheable ldrex/strex spinlocks become unusable. ldrex/strex on normal non-cacheable memory requires a global monitor, which is not available in all HW configurations.
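For background, the standard locks boil down to an exclusive-access loop along these lines (a simplified sketch, not the actual arch/arm spinlock code):

static inline void spin_lock_sketch(unsigned int *lock)
{
	unsigned int tmp;

	__asm__ __volatile__(
	"1:	ldrex	%0, [%1]\n"		/* exclusive load of the lock word */
	"	teq	%0, #0\n"
	"	strexeq	%0, %2, [%1]\n"		/* try to claim it */
	"	teqeq	%0, #0\n"
	"	bne	1b"
	: "=&r" (tmp)
	: "r" (lock), "r" (1)
	: "cc");
}

Once the C bit is cleared and the CPU is out of the coherency domain, the exclusive monitor backing ldrex/strex is no longer guaranteed, so a loop like this may never make progress.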
In order to ensure atomicity of complex operations like cleaning/invalidating/disabling L2 and resetting the SCU, this patch adds Lamport's bakery algorithm, which does not rely on any low-level atomic operation, to the Linux kernel.
Unless restrictions are put in place in HW, processors can be powered down and powered up at random times (e.g. on an IRQ), which means that the ordering is not really controlled by SW, so lock coordination is required to make the code generic. Memory allocated to the bakery spinlocks must be strongly ordered, and the algorithm relies on strongly-ordered, coherent write ordering among processors to guarantee atomicity.
It is slow and it must be avoided (see NOTE); it is not meant to provide a definitive solution.
Tested on dual-core A9 with processors being powered-down and up according to CPU idle workloads.
Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com --- arch/arm/include/asm/lb_lock.h | 34 ++++++++++++++++ arch/arm/kernel/lb_lock.c | 85 ++++++++++++++++++++++++++++++++++++++++ 2 files changed, 119 insertions(+), 0 deletions(-) create mode 100644 arch/arm/include/asm/lb_lock.h create mode 100644 arch/arm/kernel/lb_lock.c
diff --git a/arch/arm/include/asm/lb_lock.h b/arch/arm/include/asm/lb_lock.h new file mode 100644 index 0000000..2ed722c --- /dev/null +++ b/arch/arm/include/asm/lb_lock.h @@ -0,0 +1,34 @@ +/* + * Copyright (C) 2008-2011 ARM Limited + * + * Author(s): Jon Callan, Lorenzo Pieralisi + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + */ + +#ifndef __ASM_ARM_LB_LOCK_H +#define __ASM_ARM_LB_LOCK_H + +/* + * Lamport's Bakery algorithm for spinlock handling + * + * Note that the algorithm requires the bakery struct + * to be in Strongly-Ordered memory. + */ + +/* + * Bakery structure - declare/allocate one of these for each lock. + * A pointer to this struct is passed to the lock/unlock functions. + */ +struct bakery { + unsigned int number[CONFIG_NR_CPUS]; + char entering[CONFIG_NR_CPUS]; +}; + +extern void initialize_spinlock(struct bakery *bakery); +extern void get_spinlock(unsigned cpuid, struct bakery *bakery); +extern void release_spinlock(unsigned cpuid, struct bakery *bakery); +#endif diff --git a/arch/arm/kernel/lb_lock.c b/arch/arm/kernel/lb_lock.c new file mode 100644 index 0000000..199e79e --- /dev/null +++ b/arch/arm/kernel/lb_lock.c @@ -0,0 +1,85 @@ +/* + * Copyright (C) 2008-2011 ARM Limited + * + * Author(s): Jon Callan, Lorenzo Pieralisi + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * Lamport's Bakery algorithm for spinlock handling + * + * Note that the algorithm requires the bakery struct to be in + * Strongly-Ordered memory. + */ + +#include <asm/system.h> +#include <asm/string.h> +#include <asm/lb_lock.h> +#include <asm/io.h> + +/* + * Initialize a bakery - only required if the bakery is + * on the stack or heap, as static data is zeroed anyway. + */ +void initialize_spinlock(struct bakery *bakery) +{ + memset(bakery, 0, sizeof(struct bakery)); +} + +/* + * Claim a bakery lock. Function does not return until + * lock has been obtained. + */ +void notrace get_spinlock(unsigned cpuid, struct bakery *bakery) +{ + unsigned i, my_full_number, his_full_number, max = 0; + + /* Get a ticket */ + __raw_writeb(1, bakery->entering + cpuid); + /* + * The order of writes is paramount to the algorithm + * dsb() ensures the write happens in order here + */ + dsb(); + for (i = 0; i < CONFIG_NR_CPUS; ++i) { + if (__raw_readl(bakery->number + i) > max) + max = __raw_readl(bakery->number + i); + } + ++max; + /* + * The order of writes is paramount to the algorithm + * dsb() ensures the write happens in order here + */ + __raw_writel(max, bakery->number + cpuid); + dsb(); + __raw_writeb(0, bakery->entering + cpuid); + + /* Wait for our turn */ + my_full_number = (max << 8) + cpuid; + for (i = 0; i < CONFIG_NR_CPUS; ++i) { + while (__raw_readb(bakery->entering + i)) + ; + /* + * Wait since another CPU might be taking our + * same ticket number and have higher priority + */ + do { + his_full_number = __raw_readl(bakery->number + i); + if (his_full_number) + his_full_number = + (his_full_number << 8) + i; + + } while (his_full_number && (his_full_number < my_full_number)); + } + dsb(); +} + +/* + * Release a bakery lock. + */ +void release_spinlock(unsigned cpuid, struct bakery *bakery) +{ + dsb(); + __raw_writel(0, bakery->number + cpuid); +}
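As a usage sketch, the power-down path is expected to serialize the non-atomic teardown roughly as follows (names taken from this patchset; the call site shown is illustrative):

	unsigned int cpu = sr_platform_get_cpu_index();

	get_spinlock(cpu, cluster->lock);	/* lock lives in strongly-ordered memory */
	/* ... clean/disable L2, program the SCU CPU power status ... */
	release_spinlock(cpu, cluster->lock);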
When a CLUSTER is powered down the SCU must be reinitialized on warm-boot. This patch adds a hook to reset the SCU, which implies invalidating TAG RAMs and re-enabling it.
The scu virtual address is saved in a static variable when the SCU is first enabled at boot; this allows common idle code to be generic and avoid relying on platform code to get the address at run-time. On warm-boot the SCU TAG RAM is invalidated and the SCU enabled if it is not already enabled.
The reset can be skipped altogether thanks to save/restore framework flags.
Flushing the D$ is cumbersome since the system is just coming out of reset, which already invalidates the caches in the process where needed (A9); that is why scu_enable() is not reused as-is to reset the SCU.
If the init function is extended, there might not be a need for an SCU-specific hook, since the init function could be reused to reinitialize the SCU at boot, provided it is removed from the init section and kept in memory.
Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com --- arch/arm/include/asm/smp_scu.h | 3 ++- arch/arm/kernel/smp_scu.c | 33 ++++++++++++++++++++++++++++++--- 2 files changed, 32 insertions(+), 4 deletions(-)
diff --git a/arch/arm/include/asm/smp_scu.h b/arch/arm/include/asm/smp_scu.h
index 4eb6d00..cfaa68e 100644
--- a/arch/arm/include/asm/smp_scu.h
+++ b/arch/arm/include/asm/smp_scu.h
@@ -8,7 +8,8 @@
 #ifndef __ASSEMBLER__
 unsigned int scu_get_core_count(void __iomem *);
 void scu_enable(void __iomem *);
-int scu_power_mode(void __iomem *, unsigned int);
+int scu_power_mode(unsigned int);
+void scu_reset(void);
 #endif

 #endif
diff --git a/arch/arm/kernel/smp_scu.c b/arch/arm/kernel/smp_scu.c
index a1e757c..980ced9 100644
--- a/arch/arm/kernel/smp_scu.c
+++ b/arch/arm/kernel/smp_scu.c
@@ -20,6 +20,7 @@
 #define SCU_INVALIDATE		0x0c
 #define SCU_FPGA_REVISION	0x10

+static void __iomem *scu_va;
 /*
  * Get the number of CPU cores from the SCU configuration
  */
@@ -36,6 +37,7 @@ void __init scu_enable(void __iomem *scu_base)
 {
 	u32 scu_ctrl;

+	scu_va = scu_base;
 	scu_ctrl = __raw_readl(scu_base + SCU_CTRL);
 	/* already enabled? */
 	if (scu_ctrl & 1)
@@ -59,7 +61,7 @@ void __init scu_enable(void __iomem *scu_base)
  * has the side effect of disabling coherency, caches must have been
  * flushed.  Interrupts must also have been disabled.
  */
-int scu_power_mode(void __iomem *scu_base, unsigned int mode)
+int scu_power_mode(unsigned int mode)
 {
 	unsigned int val;
 	int cpu = smp_processor_id();
@@ -67,9 +69,34 @@ int scu_power_mode(void __iomem *scu_base, unsigned int mode)
 	if (mode > 3 || mode == 1 || cpu > 3)
 		return -EINVAL;

-	val = __raw_readb(scu_base + SCU_CPU_STATUS + cpu) & ~0x03;
+	val = __raw_readb(scu_va + SCU_CPU_STATUS + cpu) & ~0x03;
 	val |= mode;
-	__raw_writeb(val, scu_base + SCU_CPU_STATUS + cpu);
+	__raw_writeb(val, scu_va + SCU_CPU_STATUS + cpu);

 	return 0;
 }
+
+/*
+ * Reinitialise the SCU after power-down
+ */
+void scu_reset(void)
+{
+	u32 scu_ctrl;
+
+	scu_ctrl = __raw_readl(scu_va + SCU_CTRL);
+	/* already enabled? */
+	if (scu_ctrl & 1)
+		return;
+	/*
+	 * SCU TAGS should be invalidated on boot-up
+	 */
+	__raw_writel(0xffff, scu_va + SCU_INVALIDATE);
+	/*
+	 * Coming out of reset, dcache invalidated
+	 * no need to go through the whole hog again
+	 * just enable the SCU and pop out
+	 */
+	scu_ctrl |= 1;
+	__raw_writel(scu_ctrl, scu_va + SCU_CTRL);
+}
On 7/7/2011 8:50 AM, Lorenzo Pieralisi wrote:
When a CLUSTER is powered down the SCU must be reinitialized on warm-boot. This patch adds a hook to reset the SCU, which implies invalidating TAG RAMs and re-enabling it.
The scu virtual address is saved in a static variable when the SCU is first enabled at boot; this allows common idle code to be generic and avoid relying on platform code to get the address at run-time. On warm-boot the SCU TAG RAM is invalidated and the SCU enabled if it is not already enabled.
The reset can be skipped altogether thanks to save/restore framework flags.
Flushing the D$ is cumbersome since the system is just coming out of reset, which already invalidates the caches in the process where needed (A9); that is why scu_enable() is not reused as-is to reset the SCU.
If the init function is extended, there might not be a need for an SCU-specific hook, since the init function could be reused to reinitialize the SCU at boot, provided it is removed from the init section and kept in memory.
Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com
arch/arm/include/asm/smp_scu.h | 3 ++- arch/arm/kernel/smp_scu.c | 33 ++++++++++++++++++++++++++++++--- 2 files changed, 32 insertions(+), 4 deletions(-)
diff --git a/arch/arm/include/asm/smp_scu.h b/arch/arm/include/asm/smp_scu.h
index 4eb6d00..cfaa68e 100644
--- a/arch/arm/include/asm/smp_scu.h
+++ b/arch/arm/include/asm/smp_scu.h
@@ -8,7 +8,8 @@
 #ifndef __ASSEMBLER__
 unsigned int scu_get_core_count(void __iomem *);
 void scu_enable(void __iomem *);
-int scu_power_mode(void __iomem *, unsigned int);
+int scu_power_mode(unsigned int);
+void scu_reset(void);
 #endif

 #endif
diff --git a/arch/arm/kernel/smp_scu.c b/arch/arm/kernel/smp_scu.c
index a1e757c..980ced9 100644
--- a/arch/arm/kernel/smp_scu.c
+++ b/arch/arm/kernel/smp_scu.c
@@ -20,6 +20,7 @@
 #define SCU_INVALIDATE		0x0c
 #define SCU_FPGA_REVISION	0x10

+static void __iomem *scu_va;
Change log and patch don't seem to match. I remember suggesting this change to Russell when scu_power_mode() was introduced. His preference was to have scu_base passed as part of the API.
 /*
  * Get the number of CPU cores from the SCU configuration
  */
@@ -36,6 +37,7 @@ void __init scu_enable(void __iomem *scu_base)
 {
 	u32 scu_ctrl;

+	scu_va = scu_base;
 	scu_ctrl = __raw_readl(scu_base + SCU_CTRL);
 	/* already enabled? */
 	if (scu_ctrl & 1)
@@ -59,7 +61,7 @@ void __init scu_enable(void __iomem *scu_base)
  * has the side effect of disabling coherency, caches must have been
  * flushed.  Interrupts must also have been disabled.
  */
-int scu_power_mode(void __iomem *scu_base, unsigned int mode)
+int scu_power_mode(unsigned int mode)
 {
 	unsigned int val;
 	int cpu = smp_processor_id();
@@ -67,9 +69,34 @@ int scu_power_mode(void __iomem *scu_base, unsigned int mode)
 	if (mode > 3 || mode == 1 || cpu > 3)
 		return -EINVAL;

-	val = __raw_readb(scu_base + SCU_CPU_STATUS + cpu) & ~0x03;
+	val = __raw_readb(scu_va + SCU_CPU_STATUS + cpu) & ~0x03;
 	val |= mode;
-	__raw_writeb(val, scu_base + SCU_CPU_STATUS + cpu);
+	__raw_writeb(val, scu_va + SCU_CPU_STATUS + cpu);

 	return 0;
 }
+
+/*
+ * Reinitialise the SCU after power-down
+ */
+void scu_reset(void)
+{
+	u32 scu_ctrl;
+
+	scu_ctrl = __raw_readl(scu_va + SCU_CTRL);
+	/* already enabled? */
+	if (scu_ctrl & 1)
+		return;
+	/*
+	 * SCU TAGS should be invalidated on boot-up
+	 */
+	__raw_writel(0xffff, scu_va + SCU_INVALIDATE);
+	/*
+	 * Coming out of reset, dcache invalidated
+	 * no need to go through the whole hog again
+	 * just enable the SCU and pop out
+	 */
+	scu_ctrl |= 1;
+	__raw_writel(scu_ctrl, scu_va + SCU_CTRL);
+}
On Fri, Jul 08, 2011 at 03:14:02AM +0100, Santosh Shilimkar wrote:
On 7/7/2011 8:50 AM, Lorenzo Pieralisi wrote:
When a CLUSTER is powered down the SCU must be reinitialized on warm-boot. This patch adds a hook to reset the SCU, which implies invalidating TAG RAMs and re-enabling it.
The scu virtual address is saved in a static variable when the SCU is first enabled at boot; this allows common idle code to be generic and avoid relying on platform code to get the address at run-time. On warm-boot the SCU TAG RAM is invalidated and the SCU enabled if it is not already enabled.
The reset can be skipped altogether thanks to save/restore framework flags.
Flushing the D$ is cumbersome since the system is just coming out of reset, which already invalidates the caches in the process where needed (A9); that is why scu_enable() is not reused as-is to reset the SCU.
If the init function is extended, there might not be a need for an SCU-specific hook, since the init function could be reused to reinitialize the SCU at boot, provided it is removed from the init section and kept in memory.
Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com
arch/arm/include/asm/smp_scu.h | 3 ++- arch/arm/kernel/smp_scu.c | 33 ++++++++++++++++++++++++++++++--- 2 files changed, 32 insertions(+), 4 deletions(-)
diff --git a/arch/arm/include/asm/smp_scu.h b/arch/arm/include/asm/smp_scu.h
index 4eb6d00..cfaa68e 100644
--- a/arch/arm/include/asm/smp_scu.h
+++ b/arch/arm/include/asm/smp_scu.h
@@ -8,7 +8,8 @@
 #ifndef __ASSEMBLER__
 unsigned int scu_get_core_count(void __iomem *);
 void scu_enable(void __iomem *);
-int scu_power_mode(void __iomem *, unsigned int);
+int scu_power_mode(unsigned int);
+void scu_reset(void);
 #endif

 #endif
diff --git a/arch/arm/kernel/smp_scu.c b/arch/arm/kernel/smp_scu.c
index a1e757c..980ced9 100644
--- a/arch/arm/kernel/smp_scu.c
+++ b/arch/arm/kernel/smp_scu.c
@@ -20,6 +20,7 @@
 #define SCU_INVALIDATE		0x0c
 #define SCU_FPGA_REVISION	0x10

+static void __iomem *scu_va;
Change log and patch don't seem to match. I remember suggesting this change to Russell when scu_power_mode() was introduced. His preference was to have scu_base passed as part of the API.
Yes, but this implies that I have to call into platform code to get the SCU virtual address. The change log seems OK to me; maybe I have not explained the end goal properly. I would like to reuse scu_enable() when the cluster wakes up from low-power instead of adding a reset hook, but invalidating caches (flush_cache_all() in scu_enable()) is already catered for in the reset path, so it would be useless there. I will work that out. I also added an assembly function to programme the SCU power mode (PATCH 07/17 of this series), since it has to be called from an MMU-off path.
Lorenzo
When the system hits deep low power states the L2 cache controller can lose its internal logic values and possibly its TAG/DATA RAM content.
This patch adds save/restore hooks to the L2x0 subsystem to save/restore L2x0 registers and clean/invalidate/disable the cache controller as needed.
The cache controller has to enter power-down disabled, even if its RAMs are retained, to prevent it from sending AXI transactions on the bus when the cluster is shut down, which might leave the system in a limbo state.
Hence the save function cleans L2 (completely or partially) and disables it in one single function, to avoid playing with a cacheable stack while flushing data to L3.
The current code saving context for retention mode is still a hack and must be improved.
Fully tested on dual-core A9 cluster.
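For callers, the intended usage is along these lines (mirroring how sr_v7.c earlier in this series invokes the hooks; the call sites shown are illustrative):

	/* on the power-down path, last CPU in the cluster */
	outer_save_context(cluster_context->l2_data,
			   cluster->power_state == 2,	/* dormant (RAM retained)? */
			   platform_cpu_stacks[cpu_index]);

	/* on the warm-boot path, before re-enabling coherency */
	outer_restore_context(cluster_context->l2_data,
			      cluster->power_state == 2);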
Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com --- arch/arm/include/asm/outercache.h | 22 +++++++++++++ arch/arm/mm/cache-l2x0.c | 63 +++++++++++++++++++++++++++++++++++++ 2 files changed, 85 insertions(+), 0 deletions(-)
diff --git a/arch/arm/include/asm/outercache.h b/arch/arm/include/asm/outercache.h index d838743..0437c21 100644 --- a/arch/arm/include/asm/outercache.h +++ b/arch/arm/include/asm/outercache.h @@ -34,6 +34,8 @@ struct outer_cache_fns { void (*sync)(void); #endif void (*set_debug)(unsigned long); + void (*save_context)(void *, bool, unsigned long); + void (*restore_context)(void *, bool); };
#ifdef CONFIG_OUTER_CACHE @@ -74,6 +76,19 @@ static inline void outer_disable(void) outer_cache.disable(); }
+static inline void outer_save_context(void *data, bool dormant, + phys_addr_t end) +{ + if (outer_cache.save_context) + outer_cache.save_context(data, dormant, end); +} + +static inline void outer_restore_context(void *data, bool dormant) +{ + if (outer_cache.restore_context) + outer_cache.restore_context(data, dormant); +} + #else
static inline void outer_inv_range(phys_addr_t start, phys_addr_t end) @@ -86,6 +101,13 @@ static inline void outer_flush_all(void) { } static inline void outer_inv_all(void) { } static inline void outer_disable(void) { }
+static inline void outer_save_context(void *data, bool dormant, + phys_addr_t end) +{ } + +static inline void outer_restore_context(void *data, bool dormant) +{ } + #endif
#ifdef CONFIG_OUTER_CACHE_SYNC diff --git a/arch/arm/mm/cache-l2x0.c b/arch/arm/mm/cache-l2x0.c index ef59099..331fe9b 100644 --- a/arch/arm/mm/cache-l2x0.c +++ b/arch/arm/mm/cache-l2x0.c @@ -270,6 +270,67 @@ static void l2x0_disable(void) spin_unlock_irqrestore(&l2x0_lock, flags); }
+static void l2x0_save_context(void *data, bool dormant, unsigned long end) +{ + u32 *l2x0_regs = (u32 *) data; + *l2x0_regs = readl_relaxed(l2x0_base + L2X0_AUX_CTRL); + l2x0_regs++; + *l2x0_regs = readl_relaxed(l2x0_base + L2X0_TAG_LATENCY_CTRL); + l2x0_regs++; + *l2x0_regs = readl_relaxed(l2x0_base + L2X0_DATA_LATENCY_CTRL); + + if (!dormant) { + /* clean entire L2 before disabling it*/ + writel_relaxed(l2x0_way_mask, l2x0_base + L2X0_CLEAN_WAY); + cache_wait_way(l2x0_base + L2X0_CLEAN_WAY, l2x0_way_mask); + } else { + /* + * This is an ugly hack, which is there to clean + * the stack from L2 before disabling it + * The only alternative consists in using a non-cacheable stack + * but it is poor in terms of performance since it is only + * needed for cluster shutdown and L2 retention + * On L2 off mode the cache is cleaned anyway + */ + register unsigned long start asm("sp"); + start &= ~(CACHE_LINE_SIZE - 1); + while (start < end) { + cache_wait(l2x0_base + L2X0_CLEAN_LINE_PA, 1); + writel_relaxed(__pa(start), l2x0_base + + L2X0_CLEAN_LINE_PA); + start += CACHE_LINE_SIZE; + } + } + /* + * disable the cache implicitly syncs + */ + writel_relaxed(0, l2x0_base + L2X0_CTRL); +} + +static void l2x0_restore_context(void *data, bool dormant) +{ + u32 *l2x0_regs = (u32 *) data; + + if (!(readl_relaxed(l2x0_base + L2X0_CTRL) & 1)) { + + writel_relaxed(*l2x0_regs, l2x0_base + L2X0_AUX_CTRL); + l2x0_regs++; + writel_relaxed(*l2x0_regs, l2x0_base + L2X0_TAG_LATENCY_CTRL); + l2x0_regs++; + writel_relaxed(*l2x0_regs, l2x0_base + L2X0_DATA_LATENCY_CTRL); + /* + * If L2 is retained do not invalidate + */ + if (!dormant) { + writel_relaxed(l2x0_way_mask, l2x0_base + L2X0_INV_WAY); + cache_wait_way(l2x0_base + L2X0_INV_WAY, l2x0_way_mask); + cache_sync(); + } + + writel_relaxed(1, l2x0_base + L2X0_CTRL); + } +} + void __init l2x0_init(void __iomem *base, __u32 aux_val, __u32 aux_mask) { __u32 aux; @@ -339,6 +400,8 @@ void __init l2x0_init(void __iomem *base, __u32 aux_val, __u32 aux_mask) outer_cache.inv_all = l2x0_inv_all; outer_cache.disable = l2x0_disable; outer_cache.set_debug = l2x0_set_debug; + outer_cache.save_context = l2x0_save_context; + outer_cache.restore_context = l2x0_restore_context;
printk(KERN_INFO "%s cache controller enabled\n", type); printk(KERN_INFO "l2x0: %d ways, CACHE_ID 0x%08x, AUX_CTRL 0x%08x, Cache size: %d B\n",
On Thu, Jul 7, 2011 at 8:50 AM, Lorenzo Pieralisi lorenzo.pieralisi@arm.com wrote:
When the system hits deep low power states the L2 cache controller can lose its internal logic values and possibly its TAG/DATA RAM content.
This patch adds save/restore hooks to the L2x0 subsystem to save/restore L2x0 registers and clean/invalidate/disable the cache controller as needed.
The cache controller has to enter power-down disabled, even if its RAMs are retained, to prevent it from sending AXI transactions on the bus when the cluster is shut down, which might leave the system in a limbo state.
Hence the save function cleans L2 (completely or partially) and disables it in one single function, to avoid playing with a cacheable stack while flushing data to L3.
The current code saving context for retention mode is still a hack and must be improved.
Fully tested on dual-core A9 cluster.
Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com
arch/arm/include/asm/outercache.h | 22 +++++++++++++ arch/arm/mm/cache-l2x0.c | 63 +++++++++++++++++++++++++++++++++++++ 2 files changed, 85 insertions(+), 0 deletions(-)
<snip>
diff --git a/arch/arm/mm/cache-l2x0.c b/arch/arm/mm/cache-l2x0.c index ef59099..331fe9b 100644 --- a/arch/arm/mm/cache-l2x0.c +++ b/arch/arm/mm/cache-l2x0.c @@ -270,6 +270,67 @@ static void l2x0_disable(void) spin_unlock_irqrestore(&l2x0_lock, flags); }
+static void l2x0_save_context(void *data, bool dormant, unsigned long end)
+{
+	u32 *l2x0_regs = (u32 *) data;
+	*l2x0_regs = readl_relaxed(l2x0_base + L2X0_AUX_CTRL);
+	l2x0_regs++;
+	*l2x0_regs = readl_relaxed(l2x0_base + L2X0_TAG_LATENCY_CTRL);
+	l2x0_regs++;
+	*l2x0_regs = readl_relaxed(l2x0_base + L2X0_DATA_LATENCY_CTRL);
+
+	if (!dormant) {
+		/* clean entire L2 before disabling it */
+		writel_relaxed(l2x0_way_mask, l2x0_base + L2X0_CLEAN_WAY);
+		cache_wait_way(l2x0_base + L2X0_CLEAN_WAY, l2x0_way_mask);
+	} else {
+		/*
+		 * This is an ugly hack, which is there to clean
+		 * the stack from L2 before disabling it
+		 * The only alternative consists in using a non-cacheable stack
+		 * but it is poor in terms of performance since it is only
+		 * needed for cluster shutdown and L2 retention
+		 * On L2 off mode the cache is cleaned anyway
+		 */
You could avoid the need to pass in "end", and all the code to track it, if you just flush all of the used stack. Idle is always called from a kernel thread, so it should be guaranteed that the stack is size THREAD_SIZE and THREAD_SIZE aligned, so: end = ALIGN(start, THREAD_SIZE);
+		register unsigned long start asm("sp");
+		start &= ~(CACHE_LINE_SIZE - 1);
Why doesn't this line modify sp? You have declared start to be stored in sp, and modified start, but gcc seems to use a different register initialized from sp. You still probably shouldn't modify start.
+		while (start < end) {
+			cache_wait(l2x0_base + L2X0_CLEAN_LINE_PA, 1);
+			writel_relaxed(__pa(start), l2x0_base +
+					L2X0_CLEAN_LINE_PA);
+			start += CACHE_LINE_SIZE;
+		}
+	}
+	/*
+	 * disable the cache implicitly syncs
+	 */
+	writel_relaxed(0, l2x0_base + L2X0_CTRL);
+}
<snip>
Tested just this patch on Tegra to avoid flushing the whole L2 on idle, so: Tested-by: Colin Cross ccross@android.com
Thanks Colin for looking at this.
On Thu, Jul 07, 2011 at 11:06:13PM +0100, Colin Cross wrote:
On Thu, Jul 7, 2011 at 8:50 AM, Lorenzo Pieralisi lorenzo.pieralisi@arm.com wrote:
When the system hits deep low power states the L2 cache controller can lose its internal logic values and possibly its TAG/DATA RAM content.
This patch adds save/restore hooks to the L2x0 subsystem to save/restore L2x0 registers and clean/invalidate/disable the cache controller as needed.
The cache controller has to enter power-down disabled, even if its RAMs are retained, to prevent it from sending AXI transactions on the bus when the cluster is shut down, which might leave the system in a limbo state.
Hence the save function cleans L2 (completely or partially) and disables it in one single function, to avoid playing with a cacheable stack while flushing data to L3.
The current code saving context for retention mode is still a hack and must be improved.
Fully tested on dual-core A9 cluster.
Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com
arch/arm/include/asm/outercache.h | 22 +++++++++++++ arch/arm/mm/cache-l2x0.c | 63 +++++++++++++++++++++++++++++++++++++ 2 files changed, 85 insertions(+), 0 deletions(-)
<snip>
diff --git a/arch/arm/mm/cache-l2x0.c b/arch/arm/mm/cache-l2x0.c index ef59099..331fe9b 100644 --- a/arch/arm/mm/cache-l2x0.c +++ b/arch/arm/mm/cache-l2x0.c @@ -270,6 +270,67 @@ static void l2x0_disable(void) spin_unlock_irqrestore(&l2x0_lock, flags); }
+static void l2x0_save_context(void *data, bool dormant, unsigned long end)
+{
+	u32 *l2x0_regs = (u32 *) data;
+	*l2x0_regs = readl_relaxed(l2x0_base + L2X0_AUX_CTRL);
+	l2x0_regs++;
+	*l2x0_regs = readl_relaxed(l2x0_base + L2X0_TAG_LATENCY_CTRL);
+	l2x0_regs++;
+	*l2x0_regs = readl_relaxed(l2x0_base + L2X0_DATA_LATENCY_CTRL);
+
+	if (!dormant) {
+		/* clean entire L2 before disabling it */
+		writel_relaxed(l2x0_way_mask, l2x0_base + L2X0_CLEAN_WAY);
+		cache_wait_way(l2x0_base + L2X0_CLEAN_WAY, l2x0_way_mask);
+	} else {
+		/*
+		 * This is an ugly hack, which is there to clean
+		 * the stack from L2 before disabling it
+		 * The only alternative consists in using a non-cacheable stack
+		 * but it is poor in terms of performance since it is only
+		 * needed for cluster shutdown and L2 retention
+		 * On L2 off mode the cache is cleaned anyway
+		 */
You could avoid the need to pass in "end", and all the code to track it, if you just flush all of the used stack. Idle is always called from a kernel thread, so it should be guaranteed that the stack is size THREAD_SIZE and THREAD_SIZE aligned, so: end = ALIGN(start, THREAD_SIZE);
Eheh, the used stack, that's what I am trying to achieve with the end variable; I would avoid cleaning THREAD_SIZE worth of L2 when it is just a matter of a few bytes.
On the other hand, you are right, this code path is really horrible. I would do it in assembly, or follow your suggestion and clean starting from just above thread_info.
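A minimal sketch of the suggested variant, assuming idle always runs on a THREAD_SIZE-aligned kernel thread stack (untested, for discussion only):

	unsigned long start, end;

	asm volatile("mov	%0, sp" : "=r" (start));	/* plain copy, sp untouched */
	start &= ~(CACHE_LINE_SIZE - 1);
	end = ALIGN(start, THREAD_SIZE);	/* top of the current kernel stack */

	while (start < end) {
		cache_wait(l2x0_base + L2X0_CLEAN_LINE_PA, 1);
		writel_relaxed(__pa(start), l2x0_base + L2X0_CLEAN_LINE_PA);
		start += CACHE_LINE_SIZE;
	}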
+		register unsigned long start asm("sp");
+		start &= ~(CACHE_LINE_SIZE - 1);
Why doesn't this line modify sp? You have declared start to be stored in sp, and modified start, but gcc seems to use a different register initialized from sp. You still probably shouldn't modify start.
You are right, gcc allocates a register but on second thoughts this code does not look safe to me. I just wanted to avoid allocating another stack variable when cleaning the stack. I will rework it, see above.
+		while (start < end) {
+			cache_wait(l2x0_base + L2X0_CLEAN_LINE_PA, 1);
+			writel_relaxed(__pa(start), l2x0_base +
+					L2X0_CLEAN_LINE_PA);
+			start += CACHE_LINE_SIZE;
+		}
+	}
+	/*
+	 * disable the cache implicitly syncs
+	 */
+	writel_relaxed(0, l2x0_base + L2X0_CTRL);
+}
<snip>
Tested just this patch on Tegra to avoid flushing the whole L2 on idle, so: Tested-by: Colin Cross ccross@android.com
Colin, on Tegra how do you make sure this call is atomic when calling from cpuidle? I reckon you are sure the calling CPU is the last one up and running, am I right?
Thanks.
Lorenzo
On 7/7/2011 8:50 AM, Lorenzo Pieralisi wrote:
When the system hits deep low power states the L2 cache controller can lose its internal logic values and possibly its TAG/DATA RAM content.
This patch adds save/restore hooks to the L2x0 subsystem to save/restore L2x0 registers and clean/invalidate/disable the cache controller as needed.
The cache controller has to enter power-down disabled, even if its RAMs are retained, to prevent it from sending AXI transactions on the bus when the cluster is shut down, which might leave the system in a limbo state.
Hence the save function cleans L2 (completely or partially) and disables it in one single function, to avoid playing with a cacheable stack while flushing data to L3.
The current code saving context for retention mode is still a hack and must be improved.
Fully tested on dual-core A9 cluster.
Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com
arch/arm/include/asm/outercache.h | 22 +++++++++++++ arch/arm/mm/cache-l2x0.c | 63 +++++++++++++++++++++++++++++++++++++ 2 files changed, 85 insertions(+), 0 deletions(-)
diff --git a/arch/arm/include/asm/outercache.h b/arch/arm/include/asm/outercache.h
index d838743..0437c21 100644
--- a/arch/arm/include/asm/outercache.h
+++ b/arch/arm/include/asm/outercache.h
@@ -34,6 +34,8 @@ struct outer_cache_fns {
 	void (*sync)(void);
 #endif
 	void (*set_debug)(unsigned long);
+	void (*save_context)(void *, bool, unsigned long);
+	void (*restore_context)(void *, bool);
 };

 #ifdef CONFIG_OUTER_CACHE
@@ -74,6 +76,19 @@ static inline void outer_disable(void)
 		outer_cache.disable();
 }

+static inline void outer_save_context(void *data, bool dormant,
+					phys_addr_t end)
+{
+	if (outer_cache.save_context)
+		outer_cache.save_context(data, dormant, end);
+}
+
+static inline void outer_restore_context(void *data, bool dormant)
+{
+	if (outer_cache.restore_context)
+		outer_cache.restore_context(data, dormant);
+}
+
 #else

 static inline void outer_inv_range(phys_addr_t start, phys_addr_t end)
@@ -86,6 +101,13 @@ static inline void outer_flush_all(void) { }
 static inline void outer_inv_all(void) { }
 static inline void outer_disable(void) { }

+static inline void outer_save_context(void *data, bool dormant,
+					phys_addr_t end)
+{ }
+
+static inline void outer_restore_context(void *data, bool dormant)
+{ }
+
 #endif

 #ifdef CONFIG_OUTER_CACHE_SYNC
diff --git a/arch/arm/mm/cache-l2x0.c b/arch/arm/mm/cache-l2x0.c index ef59099..331fe9b 100644 --- a/arch/arm/mm/cache-l2x0.c +++ b/arch/arm/mm/cache-l2x0.c @@ -270,6 +270,67 @@ static void l2x0_disable(void) spin_unlock_irqrestore(&l2x0_lock, flags); }
+static void l2x0_save_context(void *data, bool dormant, unsigned long end)
+{
+	u32 *l2x0_regs = (u32 *) data;
+	*l2x0_regs = readl_relaxed(l2x0_base + L2X0_AUX_CTRL);
+	l2x0_regs++;
+	*l2x0_regs = readl_relaxed(l2x0_base + L2X0_TAG_LATENCY_CTRL);
+	l2x0_regs++;
+	*l2x0_regs = readl_relaxed(l2x0_base + L2X0_DATA_LATENCY_CTRL);
+
+	if (!dormant) {
+		/* clean entire L2 before disabling it */
+		writel_relaxed(l2x0_way_mask, l2x0_base + L2X0_CLEAN_WAY);
+		cache_wait_way(l2x0_base + L2X0_CLEAN_WAY, l2x0_way_mask);
+	} else {
+		/*
+		 * This is an ugly hack, which is there to clean
+		 * the stack from L2 before disabling it
+		 * The only alternative consists in using a non-cacheable stack
+		 * but it is poor in terms of performance since it is only
+		 * needed for cluster shutdown and L2 retention
+		 * On L2 off mode the cache is cleaned anyway
+		 */
+		register unsigned long start asm("sp");
+		start &= ~(CACHE_LINE_SIZE - 1);
+		while (start < end) {
+			cache_wait(l2x0_base + L2X0_CLEAN_LINE_PA, 1);
+			writel_relaxed(__pa(start), l2x0_base +
+					L2X0_CLEAN_LINE_PA);
+			start += CACHE_LINE_SIZE;
+		}
+	}
I think you need a cache_sync() here.
+	/*
+	 * disable the cache implicitly syncs
+	 */
+	writel_relaxed(0, l2x0_base + L2X0_CTRL);
+}
+static void l2x0_restore_context(void *data, bool dormant)
+{
+	u32 *l2x0_regs = (u32 *) data;
+
+	if (!(readl_relaxed(l2x0_base + L2X0_CTRL) & 1)) {
+
+		writel_relaxed(*l2x0_regs, l2x0_base + L2X0_AUX_CTRL);
+		l2x0_regs++;
+		writel_relaxed(*l2x0_regs, l2x0_base + L2X0_TAG_LATENCY_CTRL);
+		l2x0_regs++;
+		writel_relaxed(*l2x0_regs, l2x0_base + L2X0_DATA_LATENCY_CTRL);
+		/*
+		 * If L2 is retained do not invalidate
+		 */
+		if (!dormant) {
+			writel_relaxed(l2x0_way_mask, l2x0_base + L2X0_INV_WAY);
+			cache_wait_way(l2x0_base + L2X0_INV_WAY, l2x0_way_mask);
+			cache_sync();
+		}
+
+		writel_relaxed(1, l2x0_base + L2X0_CTRL);
Sorry for giving comments based on OMAP needs. None of the above registers are accessible from non-secure SW; they need a secure API to set them. This one too, like the GIC, looks not useful in its current form. :(
Regards Santosh
On Fri, Jul 08, 2011 at 03:19:51AM +0100, Santosh Shilimkar wrote:
On 7/7/2011 8:50 AM, Lorenzo Pieralisi wrote:
When the system hits deep low power states the L2 cache controller can lose its internal logic values and possibly its TAG/DATA RAM content.
This patch adds save/restore hooks to the L2x0 subsystem to save/restore L2x0 registers and clean/invalidate/disable the cache controller as needed.
The cache controller has to enter power-down disabled, even if its RAMs are retained, to prevent it from sending AXI transactions on the bus when the cluster is shut down, which might leave the system in a limbo state.
Hence the save function cleans L2 (completely or partially) and disables it in one single function, to avoid playing with a cacheable stack while flushing data to L3.
The current code saving context for retention mode is still a hack and must be improved.
Fully tested on dual-core A9 cluster.
Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com
arch/arm/include/asm/outercache.h | 22 +++++++++++++ arch/arm/mm/cache-l2x0.c | 63 +++++++++++++++++++++++++++++++++++++ 2 files changed, 85 insertions(+), 0 deletions(-)
diff --git a/arch/arm/include/asm/outercache.h b/arch/arm/include/asm/outercache.h
index d838743..0437c21 100644
--- a/arch/arm/include/asm/outercache.h
+++ b/arch/arm/include/asm/outercache.h
@@ -34,6 +34,8 @@ struct outer_cache_fns {
 	void (*sync)(void);
 #endif
 	void (*set_debug)(unsigned long);
+	void (*save_context)(void *, bool, unsigned long);
+	void (*restore_context)(void *, bool);
 };

 #ifdef CONFIG_OUTER_CACHE
@@ -74,6 +76,19 @@ static inline void outer_disable(void)
 		outer_cache.disable();
 }

+static inline void outer_save_context(void *data, bool dormant,
+					phys_addr_t end)
+{
+	if (outer_cache.save_context)
+		outer_cache.save_context(data, dormant, end);
+}
+
+static inline void outer_restore_context(void *data, bool dormant)
+{
+	if (outer_cache.restore_context)
+		outer_cache.restore_context(data, dormant);
+}
+
 #else

 static inline void outer_inv_range(phys_addr_t start, phys_addr_t end)
@@ -86,6 +101,13 @@ static inline void outer_flush_all(void) { }
 static inline void outer_inv_all(void) { }
 static inline void outer_disable(void) { }

+static inline void outer_save_context(void *data, bool dormant,
+					phys_addr_t end)
+{ }
+
+static inline void outer_restore_context(void *data, bool dormant)
+{ }
+
 #endif

 #ifdef CONFIG_OUTER_CACHE_SYNC
diff --git a/arch/arm/mm/cache-l2x0.c b/arch/arm/mm/cache-l2x0.c index ef59099..331fe9b 100644 --- a/arch/arm/mm/cache-l2x0.c +++ b/arch/arm/mm/cache-l2x0.c @@ -270,6 +270,67 @@ static void l2x0_disable(void) spin_unlock_irqrestore(&l2x0_lock, flags); }
+static void l2x0_save_context(void *data, bool dormant, unsigned long end)
+{
+	u32 *l2x0_regs = (u32 *) data;
+	*l2x0_regs = readl_relaxed(l2x0_base + L2X0_AUX_CTRL);
+	l2x0_regs++;
+	*l2x0_regs = readl_relaxed(l2x0_base + L2X0_TAG_LATENCY_CTRL);
+	l2x0_regs++;
+	*l2x0_regs = readl_relaxed(l2x0_base + L2X0_DATA_LATENCY_CTRL);
+
+	if (!dormant) {
+		/* clean entire L2 before disabling it */
+		writel_relaxed(l2x0_way_mask, l2x0_base + L2X0_CLEAN_WAY);
+		cache_wait_way(l2x0_base + L2X0_CLEAN_WAY, l2x0_way_mask);
+	} else {
+		/*
+		 * This is an ugly hack, which is there to clean
+		 * the stack from L2 before disabling it
+		 * The only alternative consists in using a non-cacheable stack
+		 * but it is poor in terms of performance since it is only
+		 * needed for cluster shutdown and L2 retention
+		 * On L2 off mode the cache is cleaned anyway
+		 */
+		register unsigned long start asm("sp");
+		start &= ~(CACHE_LINE_SIZE - 1);
+		while (start < end) {
+			cache_wait(l2x0_base + L2X0_CLEAN_LINE_PA, 1);
+			writel_relaxed(__pa(start), l2x0_base +
+					L2X0_CLEAN_LINE_PA);
+			start += CACHE_LINE_SIZE;
+		}
+	}
I think you need a cache_sync() here.
Disabling L2 is implicitly a cache sync.
+	/*
+	 * disable the cache implicitly syncs
+	 */
+	writel_relaxed(0, l2x0_base + L2X0_CTRL);
+}
+static void l2x0_restore_context(void *data, bool dormant)
+{
+	u32 *l2x0_regs = (u32 *) data;
+
+	if (!(readl_relaxed(l2x0_base + L2X0_CTRL) & 1)) {
+		writel_relaxed(*l2x0_regs, l2x0_base + L2X0_AUX_CTRL);
+		l2x0_regs++;
+		writel_relaxed(*l2x0_regs, l2x0_base + L2X0_TAG_LATENCY_CTRL);
+		l2x0_regs++;
+		writel_relaxed(*l2x0_regs, l2x0_base + L2X0_DATA_LATENCY_CTRL);
+		/*
+		 * If L2 is retained do not invalidate
+		 */
+		if (!dormant) {
+			writel_relaxed(l2x0_way_mask, l2x0_base + L2X0_INV_WAY);
+			cache_wait_way(l2x0_base + L2X0_INV_WAY, l2x0_way_mask);
+			cache_sync();
+		}
+		writel_relaxed(1, l2x0_base + L2X0_CTRL);
+	}
+}
Sorry for giving comments based on OMAP needs. None of the above registers are accessible from non-secure SW; they need a secure API to set them. Like the GIC one, this does not look useful in its current form. :(
I thought about that before posting. I have to either split the function in two, leaving the clean/invalidate code in a separate hook and skipping the register saving in the framework, or make the register saving conditional, which is the best option I think; something along the lines of the sketch below.
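A rough sketch of the conditional option, under the assumption that the framework passes a flag (the save_regs name is hypothetical) on platforms where the L2x0 registers are secure-only:

static void l2x0_save_context(void *data, bool dormant,
			      unsigned long end, bool save_regs)
{
	u32 *l2x0_regs = data;

	if (save_regs) {
		/* only when the registers are writable from non-secure SW */
		*l2x0_regs++ = readl_relaxed(l2x0_base + L2X0_AUX_CTRL);
		*l2x0_regs++ = readl_relaxed(l2x0_base + L2X0_TAG_LATENCY_CTRL);
		*l2x0_regs = readl_relaxed(l2x0_base + L2X0_DATA_LATENCY_CTRL);
	}

	/*
	 * clean (entirely, or stack-only for retention) and disable L2
	 * exactly as in the patch above
	 */
}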
Lorenzo
This patch adds the code required to allocate and populate page tables that are needed by save/restore code to deal with MMU off/on transactions.
MMU is enabled early in the resume path, which allows calling into Linux subsystems with init_mm virtual mappings (cloned at boot).
The current thread's page table pointer and context id are saved on power down from active_mm and restored on warm boot. Currently the translation tables contain 1:1 mappings of the Linux kernel code and data, and a 1:1 UNCACHED mapping of the control code required when the MMU is turned off in the restore code path.
Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com --- arch/arm/kernel/sr_mapping.c | 78 ++++++++++++++++++++++++++++++++++++++++++ 1 files changed, 78 insertions(+), 0 deletions(-) create mode 100644 arch/arm/kernel/sr_mapping.c
diff --git a/arch/arm/kernel/sr_mapping.c b/arch/arm/kernel/sr_mapping.c new file mode 100644 index 0000000..32640dc --- /dev/null +++ b/arch/arm/kernel/sr_mapping.c @@ -0,0 +1,78 @@ +/* + * Copyright (C) 2008-2011 ARM Limited + * Author(s): Jon Callan, Lorenzo Pieralisi + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + */ + +#include <linux/errno.h> +#include <linux/mm_types.h> + +#include <asm/page.h> +#include <asm/pgtable.h> +#include <asm/pgalloc.h> +#include <asm/sections.h> +#include <asm/cputype.h> +#include <asm/cacheflush.h> +#include "sr_helpers.h" +#include "sr.h" + +#define PROT_PTE_DEVICE (L_PTE_PRESENT|L_PTE_YOUNG|L_PTE_DIRTY|L_PTE_XN) + +static pgd_t *pgd; + +static void *linux_sr_map_page(void *addr, unsigned int size, + pgprot_t prot) +{ + pmd_t *pmd; + pte_t *pte; + u64 pfn; + unsigned long end = (unsigned long) (addr) + size; + unsigned long vaddr = (unsigned long) (addr); + + pmd = pmd_offset(pgd + pgd_index(vaddr), vaddr); + pfn = vaddr >> PAGE_SHIFT; + pte = pte_alloc_kernel(pmd, vaddr); + + do { + if (!pte) + return NULL; + set_pte_ext(pte, pfn_pte(pfn, prot), 0); + outer_clean_range(__pa(pte), __pa(pte + 1)); + pfn++; + } while (pte++, vaddr += PAGE_SIZE, vaddr != end); + + return addr; +} + +int linux_sr_setup_translation_tables(void) +{ + pgd = pgd_alloc(&init_mm); + + if (!pgd) + return -ENOMEM; + /* + * These kernel identity mappings are not strictly necessary + * since resume code creates them on the fly. + * They are left for completeness in case the suspend + * code had to turn MMU off for a power down failure and + * the call to (*sr_sleep) returns. + */ + identity_mapping_add(pgd, __pa(_stext), __pa(_etext)); + identity_mapping_add(pgd, __pa(_sdata), __pa(_edata)); + + linux_sr_map_page(context_memory_uncached, + CONTEXT_SPACE_UNCACHED, + __pgprot(PROT_PTE_DEVICE | + L_PTE_MT_UNCACHED | L_PTE_SHARED)); + + /* save pgd of translation tables for cpu_switch_mm */ + main_table.fw_mmu_context = pgd; + + __cpuc_flush_dcache_area(pgd, sizeof(pgd)); + outer_clean_range(__pa(pgd), __pa(pgd + 1)); + return 0; +}
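For context, an illustrative sketch of how these tables might be used around the MMU off/on window; main_table.fw_mmu_context comes from the patch above (via the series' sr.h) and cpu_switch_mm() is the existing ARM helper, while the example_* wrappers and the exact point where they would run are assumptions:

#include <linux/sched.h>
#include <asm/proc-fns.h>

static pgd_t *example_saved_pgd;

static void example_prepare_mmu_off(void)
{
	/* save the interrupted thread's page tables from active_mm ... */
	example_saved_pgd = current->active_mm->pgd;
	/* ... and run on the 1:1 + uncached mappings across the MMU off window */
	cpu_switch_mm(main_table.fw_mmu_context, current->active_mm);
}

static void example_finish_mmu_on(void)
{
	/* warm boot: back to the thread's own page tables (and context id) */
	cpu_switch_mm(example_saved_pgd, current->active_mm);
}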
On 7/7/2011 8:50 AM, Lorenzo Pieralisi wrote:
This patch adds the code required to allocate and populate page tables that are needed by save/restore code to deal with MMU off/on transactions.
MMU is enabled early in the resume path, which allows calling into Linux subsystems with init_mm virtual mappings (cloned at boot).
The current thread's page table pointer and context id are saved on power down from active_mm and restored on warm boot. Currently the translation tables contain 1:1 mappings of the Linux kernel code and data, and a 1:1 UNCACHED mapping of the control code required when the MMU is turned off in the restore code path.
Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com
arch/arm/kernel/sr_mapping.c | 78 ++++++++++++++++++++++++++++++++++++++++++ 1 files changed, 78 insertions(+), 0 deletions(-) create mode 100644 arch/arm/kernel/sr_mapping.c
diff --git a/arch/arm/kernel/sr_mapping.c b/arch/arm/kernel/sr_mapping.c new file mode 100644 index 0000000..32640dc --- /dev/null +++ b/arch/arm/kernel/sr_mapping.c @@ -0,0 +1,78 @@ +/*
- Copyright (C) 2008-2011 ARM Limited
This is more of a question, so don't beat me if I am wrong here. The above file doesn't exist in k.org from 2008, right? I noticed this in your other patches too.
Regards, Santosh
On Fri, Jul 08, 2011 at 03:24:38AM +0100, Santosh Shilimkar wrote:
On 7/7/2011 8:50 AM, Lorenzo Pieralisi wrote:
This patch adds the code required to allocate and populate page tables that are needed by save/restore code to deal with MMU off/on transactions.
MMU is enabled early in the resume path, which allows calling into Linux subsystems with init_mm virtual mappings (cloned at boot).
The current thread's page table pointer and context id are saved on power down from active_mm and restored on warm boot. Currently the translation tables contain 1:1 mappings of the Linux kernel code and data, and a 1:1 UNCACHED mapping of the control code required when the MMU is turned off in the restore code path.
Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com
arch/arm/kernel/sr_mapping.c | 78 ++++++++++++++++++++++++++++++++++++++++++ 1 files changed, 78 insertions(+), 0 deletions(-) create mode 100644 arch/arm/kernel/sr_mapping.c
diff --git a/arch/arm/kernel/sr_mapping.c b/arch/arm/kernel/sr_mapping.c new file mode 100644 index 0000000..32640dc --- /dev/null +++ b/arch/arm/kernel/sr_mapping.c @@ -0,0 +1,78 @@ +/*
- Copyright (C) 2008-2011 ARM Limited
This is more of a question, so don't beat me if I am wrong here. The above file doesn't exist in k.org from 2008, right? I noticed this in your other patches too.
Yes, you are right, I will update the whole set.
This patch integrates the new PM notifier chain into the perf driver and provides code to execute proper callbacks when a CPU is powered down/up.
Since PMU registers are saved/restored on context switch, a simple enable/disable of the PMU is enough to enter/exit low power. v7 requires the counters to be disabled when the CPU comes out of reset, since some register values are UNKNOWN at reset; hence, when a CPU exits low power, the perf driver reset function callback has to be executed.
Tested on a dual-core A9 with cores going through shutdown and reset, and perf running a per-CPU task through the "perf stat" command (perf stats successfully checked against a normal run [power down disabled] of the same task).
Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com --- arch/arm/kernel/perf_event.c | 22 ++++++++++++++++++++++ 1 files changed, 22 insertions(+), 0 deletions(-)
diff --git a/arch/arm/kernel/perf_event.c b/arch/arm/kernel/perf_event.c index d53c0ab..9fd44a3 100644 --- a/arch/arm/kernel/perf_event.c +++ b/arch/arm/kernel/perf_event.c @@ -24,6 +24,7 @@ #include <asm/irq.h> #include <asm/irq_regs.h> #include <asm/pmu.h> +#include <asm/cpu_pm.h> #include <asm/stacktrace.h>
static struct platform_device *pmu_device; @@ -623,6 +624,26 @@ static struct pmu pmu = { #include "perf_event_v6.c" #include "perf_event_v7.c"
+static int pmu_notifier(struct notifier_block *self, unsigned long cmd, void *v) +{ + switch (cmd) { + case CPU_PM_ENTER: + perf_pmu_disable(&pmu); + break; + case CPU_PM_ENTER_FAILED: + case CPU_PM_EXIT: + if (armpmu->reset) + armpmu->reset(NULL); + perf_pmu_enable(&pmu); + break; + } + + return NOTIFY_OK; +} + +static struct notifier_block pmu_notifier_block = { + .notifier_call = pmu_notifier, +}; /* * Ensure the PMU has sane values out of reset. * This requires SMP to be available, so exists as a separate initcall. @@ -682,6 +703,7 @@ init_hw_perf_events(void) }
perf_pmu_register(&pmu, "cpu", PERF_TYPE_RAW); + cpu_pm_register_notifier(&pmu_notifier_block);
return 0; }
This patch adds notifiers to manage low-power entry/exit in a platform independent manner through a series of callbacks. The goal is to enhance CPU specific notifiers with a different notifier chain that executes callbacks defined to put the system into low-power states (C-state). The callback must be executed with IRQ disabled and caches still up and running, which in particular means that spinlocks implemented as ldrex/strex are still usable on ARM.
The callbacks are a means to achieve common idle code, where the platform_pm_enter()/exit() functions trigger the actions required to enter/exit low-power states (PCU, clock tree and power domain programming) for a specific platform.
Within the common idle code for ARM, the callbacks executed upon platform_pm_enter/exit run with a virtual mapping cloned from init_mm which means that the virtual address space is still accessible.
The notifier is passed a (void *) argument, that in the context of common idle code is meant to define cpu and cluster states in order to allow the platform specific callback to handle power down/up actions accordingly.
Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com --- arch/arm/include/asm/cpu_pm.h | 15 +++++++ arch/arm/kernel/cpu_pm.c | 92 +++++++++++++++++++++++++++++++++++++++-- 2 files changed, 103 insertions(+), 4 deletions(-)
diff --git a/arch/arm/include/asm/cpu_pm.h b/arch/arm/include/asm/cpu_pm.h index b4bb715..19b8106 100644 --- a/arch/arm/include/asm/cpu_pm.h +++ b/arch/arm/include/asm/cpu_pm.h @@ -42,8 +42,21 @@ enum cpu_pm_event { CPU_COMPLEX_PM_EXIT, };
+enum platform_pm_event { + /* Time to execute code to shutdown cpu/cluster */ + CPU_PM_SHUTDOWN, + + /* Shutdown cpu/cluster failed */ + CPU_PM_SHUTDOWN_FAILED, + + /* Time to execute code to wakeup cpu/cluster */ + CPU_PM_WAKEUP, +}; + int cpu_pm_register_notifier(struct notifier_block *nb); int cpu_pm_unregister_notifier(struct notifier_block *nb); +int platform_pm_register_notifier(struct notifier_block *nb); +int platform__pm_unregister_notifier(struct notifier_block *nb);
int cpu_pm_enter(void); int cpu_pm_exit(void); @@ -51,4 +64,6 @@ int cpu_pm_exit(void); int cpu_complex_pm_enter(void); int cpu_complex_pm_exit(void);
+int platform_pm_enter(void *); +int platform_pm_exit(void *); #endif diff --git a/arch/arm/kernel/cpu_pm.c b/arch/arm/kernel/cpu_pm.c index 48a5b53..2f1f661 100644 --- a/arch/arm/kernel/cpu_pm.c +++ b/arch/arm/kernel/cpu_pm.c @@ -47,6 +47,8 @@
static DEFINE_RWLOCK(cpu_pm_notifier_lock); static RAW_NOTIFIER_HEAD(cpu_pm_notifier_chain); +static DEFINE_RWLOCK(platform_pm_notifier_lock); +static RAW_NOTIFIER_HEAD(platform_pm_notifier_chain);
int cpu_pm_register_notifier(struct notifier_block *nb) { @@ -74,6 +76,33 @@ int cpu_pm_unregister_notifier(struct notifier_block *nb) } EXPORT_SYMBOL_GPL(cpu_pm_unregister_notifier);
+int platform_pm_register_notifier(struct notifier_block *nb) +{ + unsigned long flags; + int ret; + + write_lock_irqsave(&platform_pm_notifier_lock, flags); + ret = raw_notifier_chain_register(&platform_pm_notifier_chain, nb); + write_unlock_irqrestore(&platform_pm_notifier_lock, flags); + + return ret; +} +EXPORT_SYMBOL_GPL(platform_pm_register_notifier); + +int platform_pm_unregister_notifier(struct notifier_block *nb) +{ + unsigned long flags; + int ret; + + write_lock_irqsave(&platform_pm_notifier_lock, flags); + ret = raw_notifier_chain_unregister(&platform_pm_notifier_chain, nb); + write_unlock_irqrestore(&platform_pm_notifier_lock, flags); + + return ret; +} +EXPORT_SYMBOL_GPL(platform_pm_unregister_notifier); + +/* These two functions are not really worth duplicating, they must be merged */ static int cpu_pm_notify(enum cpu_pm_event event, int nr_to_call, int *nr_calls) { int ret; @@ -84,8 +113,19 @@ static int cpu_pm_notify(enum cpu_pm_event event, int nr_to_call, int *nr_calls) return notifier_to_errno(ret); }
+static int __platform_pm_notify(enum platform_pm_event event, + void *arg, int nr_to_call, int *nr_calls) +{ + int ret; + + ret = __raw_notifier_call_chain(&platform_pm_notifier_chain, event, arg, + nr_to_call, nr_calls); + + return notifier_to_errno(ret); +} + /** - * cpm_pm_enter + * cpu_pm_enter * * Notifies listeners that a single cpu is entering a low power state that may * cause some blocks in the same power domain as the cpu to reset. @@ -110,7 +150,7 @@ int cpu_pm_enter(void) EXPORT_SYMBOL_GPL(cpu_pm_enter);
/** - * cpm_pm_exit + * cpu_pm_exit * * Notifies listeners that a single cpu is exiting a low power state that may * have caused some blocks in the same power domain as the cpu to reset. @@ -130,7 +170,7 @@ int cpu_pm_exit(void) EXPORT_SYMBOL_GPL(cpu_pm_exit);
/** - * cpm_complex_pm_enter + * cpu_complex_pm_enter * * Notifies listeners that all cpus in a power domain are entering a low power * state that may cause some blocks in the same power domain to reset. @@ -157,7 +197,7 @@ int cpu_complex_pm_enter(void) EXPORT_SYMBOL_GPL(cpu_complex_pm_enter);
/** - * cpm_pm_enter + * cpu_complex_pm_exit * * Notifies listeners that a single cpu is entering a low power state that may * cause some blocks in the same power domain as the cpu to reset. @@ -179,3 +219,47 @@ int cpu_complex_pm_exit(void) return ret; } EXPORT_SYMBOL_GPL(cpu_complex_pm_exit); +/* + * platform_pm_enter + * + * Notifies listeners that either cpu or cluster should enter low-power + * Should carry out the actions needed before issuing a processor specific + * instruction (wfi on ARM) + * Must be called with IRQ disabled + * arg is a parameter containing information about targeted platform state + */ +int platform_pm_enter(void *arg) +{ + int nr_calls; + int ret = 0; + + read_lock(&platform_pm_notifier_lock); + ret = __platform_pm_notify(CPU_PM_SHUTDOWN, arg, -1, &nr_calls); + if (ret) + __platform_pm_notify(CPU_PM_SHUTDOWN_FAILED, arg, + nr_calls - 1, NULL); + read_unlock(&platform_pm_notifier_lock); + + return ret; +} +EXPORT_SYMBOL_GPL(platform_pm_enter); + +/* + * platform_pm_exit + * + * Notifies listeners that either cpu or cluster should undo actions executed + * before entering low-power mode + * Must be called with IRQ disabled + * arg is a parameter containing information about targeted platform state + */ +int platform_pm_exit(void *arg) +{ + int ret; + + read_lock(&platform_pm_notifier_lock); + ret = __platform_pm_notify(CPU_PM_WAKEUP, arg, -1, NULL); + read_unlock(&platform_pm_notifier_lock); + + return ret; +} +EXPORT_SYMBOL_GPL(platform_pm_exit);
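To make the intended use of the new chain concrete, here is a minimal registration sketch; the layout of the (void *) state argument is an assumption (the patch leaves its interpretation to the common idle code), and all example_* names are hypothetical:

#include <linux/init.h>
#include <linux/notifier.h>
#include <linux/types.h>
#include <asm/cpu_pm.h>

/* hypothetical layout of the (void *) argument; not defined by the patch */
struct example_idle_states {
	unsigned int cpu_state;		/* targeted CPU power state */
	unsigned int cluster_state;	/* targeted cluster power state */
};

/* program PCU registers / send firmware commands for the platform here */
static void example_power_program(bool cluster, bool down)
{
}

static int example_platform_pm_notifier(struct notifier_block *self,
					unsigned long cmd, void *v)
{
	struct example_idle_states *states = v;

	switch (cmd) {
	case CPU_PM_SHUTDOWN:
		example_power_program(states->cluster_state != 0, true);
		break;
	case CPU_PM_SHUTDOWN_FAILED:
	case CPU_PM_WAKEUP:
		example_power_program(states->cluster_state != 0, false);
		break;
	}
	return NOTIFY_OK;
}

static struct notifier_block example_platform_pm_nb = {
	.notifier_call = example_platform_pm_notifier,
};

static int __init example_platform_pm_init(void)
{
	return platform_pm_register_notifier(&example_platform_pm_nb);
}
arch_initcall(example_platform_pm_init);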
adding Rafael, since he was interested in cpu_pm notifiers.
On Thu, Jul 7, 2011 at 8:50 AM, Lorenzo Pieralisi lorenzo.pieralisi@arm.com wrote:
This patch adds notifiers to manage low-power entry/exit in a platform independent manner through a series of callbacks. The goal is to enhance CPU specific notifiers with a different notifier chain that executes callbacks defined to put the system into low-power states (C-state). The callback must be executed with IRQ disabled and caches still up and running, which in particular means that spinlocks implemented as ldrex/strex are still usable on ARM.
The callbacks are a means to achieve common idle code, where the platform_pm_enter()/exit() functions trigger the actions required to enter/exit low-power states (PCU, clock tree and power domain programming) for a specific platform.
Within the common idle code for ARM, the callbacks executed upon platform_pm_enter/exit run with a virtual mapping cloned from init_mm which means that the virtual address space is still accessible.
The notifier is passed a (void *) argument, that in the context of common idle code is meant to define cpu and cluster states in order to allow the platform specific callback to handle power down/up actions accordingly.
Can you explain how this is different from the cpu_pm notifiers, besides the name? Would they get called at a different point in the idle path? Could they be the same notifier list as the cpu_pm notifiers, but with different enum values?
If this is purely for platform idle ops, maybe something more like the struct platform_suspend_ops would be more appropriate?
Rafael wanted cpu_pm moved to somewhere outside of ARM, but these platform notifiers sound specific to an ARM common idle implementation, in which case you might need to find another place to put them besides cpu_pm.c.
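For reference, the "platform idle ops" alternative suggested here might look roughly like the sketch below, modeled on struct platform_suspend_ops; none of these names exist in the posted patches:

/* hypothetical counterpart to platform_suspend_ops for the idle path */
struct platform_idle_ops {
	int (*enter)(void *state_arg);	/* program PCU/clock tree/power domains */
	void (*exit)(void *state_arg);	/* undo the programming on wakeup */
};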
Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com
arch/arm/include/asm/cpu_pm.h | 15 +++++++ arch/arm/kernel/cpu_pm.c | 92 +++++++++++++++++++++++++++++++++++++++-- 2 files changed, 103 insertions(+), 4 deletions(-)
diff --git a/arch/arm/include/asm/cpu_pm.h b/arch/arm/include/asm/cpu_pm.h index b4bb715..19b8106 100644 --- a/arch/arm/include/asm/cpu_pm.h +++ b/arch/arm/include/asm/cpu_pm.h @@ -42,8 +42,21 @@ enum cpu_pm_event {
<snip>
int cpu_pm_register_notifier(struct notifier_block *nb); int cpu_pm_unregister_notifier(struct notifier_block *nb); +int platform_pm_register_notifier(struct notifier_block *nb); +int platform__pm_unregister_notifier(struct notifier_block *nb);
Extra underscore
<snip>
On Thu, Jul 07, 2011 at 10:20:47PM +0100, Colin Cross wrote:
adding Rafael, since he was interested in cpu_pm notifiers.
On Thu, Jul 7, 2011 at 8:50 AM, Lorenzo Pieralisi lorenzo.pieralisi@arm.com wrote:
This patch adds notifiers to manage low-power entry/exit in a platform independent manner through a series of callbacks. The goal is to enhance CPU specific notifiers with a different notifier chain that executes callbacks defined to put the system into low-power states (C-state). The callback must be executed with IRQ disabled and caches still up and running, which in particular means that spinlocks implemented as ldrex/strex are still usable on ARM.
The callbacks are a means to achieve common idle code, where the platform_pm_enter()/exit() functions trigger the actions required to enter/exit low-power states (PCU, clock tree and power domain programming) for a specific platform.
Within the common idle code for ARM, the callbacks executed upon platform_pm_enter/exit run with a virtual mapping cloned from init_mm which means that the virtual address space is still accessible.
The notifier is passed a (void *) argument, that in the context of common idle code is meant to define cpu and cluster states in order to allow the platform specific callback to handle power down/up actions accordingly.
Can you explain how this is different from the cpu_pm notifiers, besides the name? Would they get called at a different point in the idle path? Could they be the same notifier list as the cpu_pm notifiers, but with different enum values?
That is what I did in my previous version, meaning adding an enum value to avoid duplicating the chain; then I decided to change the code to two independent chains.
For the patchset to be generic I have to have a way to call into platform code to do every action needed to program power control unit registers or send firmware commands to put the CPU/cluster in the required power state. The actions taken are different from cpu_pm, and I thought that not sharing the same notifier chain would be better.
I just pass a (void *) to platform code through the notifier as a way to decode the required cpu and cluster states that have to be hit.
I call platform_pm_enter before cleaning and invalidating/disabling caches (see the sketch below), which might not be proper: some platforms have a fixed time frame to enter low power from the moment the power control is programmed, and this does not go hand in hand with the variable number of dirty cache lines, which can imply different timings for the clean/invalidate.
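A sketch of that ordering (assumed flow, not code from the posted patches):

static int example_enter_lowpower(void *state_arg)
{
	int ret;

	/* platform hook runs while caches are still enabled and coherent */
	ret = platform_pm_enter(state_arg);
	if (ret)
		return ret;

	/*
	 * Cache clean/invalidate/disable happens only after this point;
	 * its duration depends on the number of dirty lines, hence the
	 * concern for platforms with a fixed power-entry window.
	 */

	return 0;
}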
If this is purely for platform idle ops, maybe something more like the struct platform_suspend_ops would be more appropriate?
I will look into that; it was planned. I have to check/test if and how I can adapt it to the patchset and whether that is appropriate.
Rafael wanted cpu_pm moved to somewhere outside of ARM, but these platform notifiers sound specific to an ARM common idle implementation, in which case you might need to find another place to put them besides cpu_pm.c.
I understand. It would be nice if we came up with a solution to tackle both requirements in one single place.
Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com
arch/arm/include/asm/cpu_pm.h | 15 +++++++ arch/arm/kernel/cpu_pm.c | 92 +++++++++++++++++++++++++++++++++++++++-- 2 files changed, 103 insertions(+), 4 deletions(-)
diff --git a/arch/arm/include/asm/cpu_pm.h b/arch/arm/include/asm/cpu_pm.h index b4bb715..19b8106 100644 --- a/arch/arm/include/asm/cpu_pm.h +++ b/arch/arm/include/asm/cpu_pm.h @@ -42,8 +42,21 @@ enum cpu_pm_event {
<snip>
int cpu_pm_register_notifier(struct notifier_block *nb); int cpu_pm_unregister_notifier(struct notifier_block *nb); +int platform_pm_register_notifier(struct notifier_block *nb); +int platform__pm_unregister_notifier(struct notifier_block *nb);
Extra underscore
Gah.., will fix it.
Thanks, Lorenzo
This patch adds the required Kconfig and Makefile entries to enable and compile common idle code for ARM kernel.
Common idle code depends on CPU_PM platform notifiers to trigger save/restore of kernel subsystems like PMU, VFP, GIC.
Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com --- arch/arm/Kconfig | 11 +++++++++++ arch/arm/kernel/Makefile | 4 ++++ 2 files changed, 15 insertions(+), 0 deletions(-)
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index 356f266..5b670bd 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -1992,6 +1992,17 @@ config VFP
Say N if your target does not have VFP hardware.
+config CONTEXT_SR
+	bool "Save/Restore code support for CPU/Cluster Power Management"
+	depends on CPU_V7 && CPU_PM
+	help
+	  Say Y to include Save/Restore code in the kernel. This provides
+	  generic infrastructure to put the core in dormant/shutdown mode
+	  and save/restore the required system state inclusive of L2 cache
+	  logic.
+
+	  Say N if your target does not have Power Management hardware.
+
 config VFPv3
 	bool
 	depends on VFP

diff --git a/arch/arm/kernel/Makefile b/arch/arm/kernel/Makefile index 8b42d58..ac931f1 100644 --- a/arch/arm/kernel/Makefile +++ b/arch/arm/kernel/Makefile @@ -49,6 +49,10 @@ obj-$(CONFIG_CRASH_DUMP) += crash_dump.o obj-$(CONFIG_SWP_EMULATE) += swp_emulate.o CFLAGS_swp_emulate.o := -Wa,-march=armv7-a obj-$(CONFIG_HAVE_HW_BREAKPOINT) += hw_breakpoint.o
+obj-$(CONFIG_CONTEXT_SR)	+= lb_lock.o sr_api.o sr_mapping.o \
+				   sr_entry.o sr_arch.o sr_context.o \
+				   sr_platform.o sr_power.o reset_v7.o \
+				   sr_v7_helpers.o
obj-$(CONFIG_CRUNCH) += crunch.o crunch-bits.o AFLAGS_crunch-bits.o := -Wa,-mcpu=ep9312
On 7/7/2011 8:50 AM, Lorenzo Pieralisi wrote:
This patch adds the required Kconfig and Makefile entries to enable and compile common idle code for ARM kernel.
Common idle code depends on CPU_PM platform notifiers to trigger save/restore of kernel subsystems like PMU, VFP, GIC.
Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com
arch/arm/Kconfig | 11 +++++++++++ arch/arm/kernel/Makefile | 4 ++++ 2 files changed, 15 insertions(+), 0 deletions(-)
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index 356f266..5b670bd 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -1992,6 +1992,17 @@ config VFP
Say N if your target does not have VFP hardware.
+config CONTEXT_SR
SR sounds cryptic. Also, since it is PM related, _PM_ would be good.
- bool "Save/Restore code support for CPU/Cluster Power Management"
- depends on CPU_V7&& CPU_PM
^ space needed.
Regards, Santosh
On Fri, Jul 08, 2011 at 03:29:10AM +0100, Santosh Shilimkar wrote:
On 7/7/2011 8:50 AM, Lorenzo Pieralisi wrote:
This patch adds the required Kconfig and Makefile entries to enable and compile common idle code for ARM kernel.
Common idle code depends on CPU_PM platform notifiers to trigger save/restore of kernel subsystems like PMU, VFP, GIC.
Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com
arch/arm/Kconfig | 11 +++++++++++ arch/arm/kernel/Makefile | 4 ++++ 2 files changed, 15 insertions(+), 0 deletions(-)
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index 356f266..5b670bd 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -1992,6 +1992,17 @@ config VFP
Say N if your target does not have VFP hardware.
+config CONTEXT_SR
SR sounds cryptic. Also, since it is PM related, _PM_ would be good.
Yes, it is cryptic. I did not give it the thought it deserves; I will come up with something nicer.
- bool "Save/Restore code support for CPU/Cluster Power Management"
- depends on CPU_V7&& CPU_PM
^
space needed.
It was there; did it get munged? I will look into that.
Lorenzo
On 7 July 2011 21:20, Lorenzo Pieralisi lorenzo.pieralisi@arm.com wrote:
This patch adds the required Kconfig and Makefile entries to enable and compile common idle code for ARM kernel.
Common idle code depends on CPU_PM platform notifiers to trigger save/restore of kernel subsystems like PMU, VFP, GIC.
Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com
arch/arm/Kconfig | 11 +++++++++++ arch/arm/kernel/Makefile | 4 ++++ 2 files changed, 15 insertions(+), 0 deletions(-)
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index 356f266..5b670bd 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -1992,6 +1992,17 @@ config VFP
Say N if your target does not have VFP hardware.
+config CONTEXT_SR
- bool "Save/Restore code support for CPU/Cluster Power Management"
- depends on CPU_V7 && CPU_PM
- help
- Say Y to include Save/Restore code in the kernel. This provides
- generic infrastructure to put the code in dormant/shutdown mode
- and save/restore the required system state inclusive of L2 cache
- logic.
- Say N if your target does not have Power Management hardware.
Currently this is placed inside the "Floating point emulation" menu; a "CPU Power Management" menu may be a better option. Also, I did not find where the configs CPU_PM and ARCH_USES_CPU_PM are enabled.
config VFPv3 bool depends on VFP diff --git a/arch/arm/kernel/Makefile b/arch/arm/kernel/Makefile index 8b42d58..ac931f1 100644 --- a/arch/arm/kernel/Makefile +++ b/arch/arm/kernel/Makefile @@ -49,6 +49,10 @@ obj-$(CONFIG_CRASH_DUMP) += crash_dump.o obj-$(CONFIG_SWP_EMULATE) += swp_emulate.o CFLAGS_swp_emulate.o := -Wa,-march=armv7-a obj-$(CONFIG_HAVE_HW_BREAKPOINT) += hw_breakpoint.o +obj-$(CONFIG_CONTEXT_SR) += lb_lock.o sr_api.o sr_mapping.o \
+				   sr_entry.o sr_arch.o sr_context.o \
+				   sr_platform.o sr_power.o reset_v7.o \
+				   sr_v7_helpers.o
obj-$(CONFIG_CRUNCH) += crunch.o crunch-bits.o AFLAGS_crunch-bits.o := -Wa,-mcpu=ep9312 -- 1.7.4.4
On Tue, Jul 26, 2011 at 01:14:26PM +0100, Amit Kachhap wrote:
On 7 July 2011 21:20, Lorenzo Pieralisi lorenzo.pieralisi@arm.com wrote:
This patch adds the required Kconfig and Makefile entries to enable and compile common idle code for ARM kernel.
Common idle code depends on CPU_PM platform notifiers to trigger save/restore of kernel subsystems like PMU, VFP, GIC.
Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com
arch/arm/Kconfig | 11 +++++++++++ arch/arm/kernel/Makefile | 4 ++++ 2 files changed, 15 insertions(+), 0 deletions(-)
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index 356f266..5b670bd 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -1992,6 +1992,17 @@ config VFP
Say N if your target does not have VFP hardware.
+config CONTEXT_SR
bool "Save/Restore code support for CPU/Cluster Power Management"
depends on CPU_V7 && CPU_PM
help
Say Y to include Save/Restore code in the kernel. This provides
generic infrastructure to put the core in dormant/shutdown mode
and save/restore the required system state inclusive of L2 cache
logic.
Say N if your target does not have Power Management hardware.
Currently this is placed inside the "Floating point emulation" menu; a "CPU Power Management" menu may be a better option. Also, I did not find where the configs CPU_PM and ARCH_USES_CPU_PM are enabled.
Yes Amit, I fixed that; thanks for looking at this. CPU_PM and ARCH_USES_CPU_PM are defined in patch 2 of this series, from Colin, which adds the infrastructure for CPU PM notifiers.
Thanks, Lorenzo
On Thu, Jul 07, 2011 at 04:50:13PM +0100, Lorenzo Pieralisi wrote:
This patchset is a first attempt at providing a consolidation of idle code for the ARM processor architecture and a request for comment on the provided methodology. It relies and it is based on kernel features such as suspend/resume, pm notifiers and common code for cpu_reset().
Please delay this for the next merge window - there's too much going on at the moment to even start looking at this patch set.
Thanks.
On Thu, Jul 07, 2011 at 06:15:30PM +0100, Russell King - ARM Linux wrote:
On Thu, Jul 07, 2011 at 04:50:13PM +0100, Lorenzo Pieralisi wrote:
This patchset is a first attempt at providing a consolidation of idle code for the ARM processor architecture and a request for comment on the provided methodology. It relies and it is based on kernel features such as suspend/resume, pm notifiers and common code for cpu_reset().
Please delay this for the next merge window - there's too much going on at the moment to even start looking at this patch set.
Thanks.
Ok Russell, I understand. Please have a look whenever you can; I need feedback on many details.
Thank you very much.
Lorenzo