This is my second version of patchset for ftrace support.
Actually v1 was submitted serveral weeks ago, but is still moderated.
(Just ignore them for now.)
There is another implementation from Cavium network, but both works
are independent, and my code has additional system call trace support.
I confirmed that I could compile the patches on v3.12-rc4 by Linaro's
coming 2013.10 gcc (4.8.2), and that the kernel worked on Fast Model
with the following tracers:
function tracer with dynamic ftrace
function graph tracer with dynamic ftrace
syscall tracepoint
irqsoff & preemptirqsoff (which use CALLER_ADDRx)
Also verified with in-kernel tests, FTRACE_SELFTEST, FTRACE_STARTUP_TEST
and EVENT_TRACE_TEST_SYSCALLS.
Patch[3/6] has warnings from checkpatch, but they follow other arch's style.
Please be careful that host's elf.h must have AArch64 definitions,
EM_AARCH64 and R_AARCH64_ABS64, to build the kernel. See [4/6].
Issues
* Can we optimize register usages in asm (by not saving x0, x1 and x2)? [1/6]
* Do we need "fault protection" code in ftrace_modify_code()? [1/6]
It exists in x86 and other architectures, but not in arm.
* We may be able to use aarch64_insn_patch_text_nosync() instead of
ftrace_modify_code().[2/6] But the former function does not use
probe_kernel_write(). Is this safe?
Changes from v1 to v2:
* splitted one patch into some pieces for easier review
(especially function tracer + dynamic ftrace + CALLER_ADDRx)
* put return_address() in a separate file
* renamed __mcount to _mcount (it was my mistake)
* changed stackframe handling to get parent's frame pointer
* removed ARCH_SUPPORTS_FTRACE_OPS
* switched to "hotpatch" interfaces from Huawai
* revised descriptions in comments
AKASHI Takahiro (6):
arm64: Add ftrace support
arm64: ftrace: Add dynamic ftrace support
arm64: ftrace: Add CALLER_ADDRx macros
ftrace: Add arm64 support to recordmcount
arm64: ftrace: Add system call tracepoint
arm64: Add 'notrace' attribute to unwind_frame() for ftrace
arch/arm64/Kconfig | 6 +
arch/arm64/include/asm/ftrace.h | 54 +++++++++
arch/arm64/include/asm/syscall.h | 1 +
arch/arm64/include/asm/thread_info.h | 1 +
arch/arm64/include/asm/unistd.h | 2 +
arch/arm64/kernel/Makefile | 9 +-
arch/arm64/kernel/arm64ksyms.c | 4 +
arch/arm64/kernel/entry-ftrace.S | 211 ++++++++++++++++++++++++++++++++++
arch/arm64/kernel/entry.S | 1 +
arch/arm64/kernel/ftrace.c | 186 ++++++++++++++++++++++++++++++
arch/arm64/kernel/ptrace.c | 10 ++
arch/arm64/kernel/return_address.c | 55 +++++++++
arch/arm64/kernel/stacktrace.c | 2 +-
scripts/recordmcount.c | 4 +
scripts/recordmcount.pl | 5 +
15 files changed, 549 insertions(+), 2 deletions(-)
create mode 100644 arch/arm64/include/asm/ftrace.h
create mode 100644 arch/arm64/kernel/entry-ftrace.S
create mode 100644 arch/arm64/kernel/ftrace.c
create mode 100644 arch/arm64/kernel/return_address.c
--
1.7.9.5
Dear all,
There's a question in the arch/arm64/kernel/entry.S as following,
/*
* EL1 mode handlers.
*/
el1_sync:
kernel_entry 1
mrs x1, esr_el1 // read the syndrome register
lsr x24, x1, #ESR_EL1_EC_SHIFT // exception class
cmp x24, #ESR_EL1_EC_DABT_EL1 // data abort in EL1
b.eq el1_da
cmp x24, #ESR_EL1_EC_SYS64 // configurable trap
b.eq el1_undef
cmp x24, #ESR_EL1_EC_SP_ALIGN // stack alignment exception
b.eq el1_sp_pc
el1_sp_pc:
/*
* Stack or PC alignment exception handling
*/
mrs x0, far_el1
- mov x1, x25 ==> this is an extra operation
mov x2, sp
b do_sp_pc_abort //Jump to C Exception handler
/**The C Exception Handler/
asmlinkage void __exception do_sp_pc_abort(unsigned long addr,
unsigned int esr,
struct pt_regs *regs)
{
...
}
We use x1 register to store the value of ESR, and check the value to identify which exception handler to jump,
And there's a weird part In stack alignment exception handler(el1_sp_pc),
Why do we need to move x25 to x1?
The ESR has been stored into x1, and should be directly pass to do_sp_pc_abort function
"MOV x1, x25" is an extra operation and do_sp_pc_abort would get the wrong value of esr...
I'm not sure whether I'm right or not, hope someone can take a look at it, thx
BRs
andy
Drivers expecting CPU's OPPs from device tree initialize OPP table themselves by
calling of_init_opp_table() and there is nothing driver specific in that. They
all do it in the same redundant way.
It would be better if we can get rid of redundancy by initializing CPU OPPs from
CPU core code for all CPUs (that have a "operating-points" property defined in
their node).
This patchset callsl of_init_opp_table() directly from register_cpu() right
after CPU device is registered. Later patches updates existing cpufreq drivers
which were calling of_init_opp_table() directly.
V3->V4:
- Remove of_init_cpu_opp_table() and use of_init_opp_table() instead after
adding more print messages to it.
- Drop a rather unrelated patch which was getting rid of dependency on thermal
for cpufreq-cpu0. Will be sent separately.
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Amit Daniel Kachhap <amit.daniel(a)samsung.com>
Cc: Kukjin Kim <kgene.kim(a)samsung.com>
Cc: Shawn Guo <shawn.guo(a)linaro.org>
Cc: Sudeep Holla <sudeep.holla(a)arm.com>
Viresh Kumar (8):
opp: of_init_opp_table(): return -ENOSYS when feature isn't
implemented
opp: call of_node_{get|put}() from of_init_opp_table()
opp: Enhance debug messages in of_init_opp_table()
driver/core: cpu: initialize opp table
cpufreq: arm_big_little: don't initialize opp table
cpufreq: imx6q: don't initialize opp table
cpufreq: cpufreq-cpu0: don't initialize opp table
cpufreq: exynos5440: don't initialize opp table
arch/arm/mach-imx/mach-imx6q.c | 36 ++++++++----------------------------
drivers/base/cpu.c | 11 +++++++----
drivers/base/power/opp.c | 11 ++++++++++-
drivers/cpufreq/arm_big_little.c | 12 +++++++-----
drivers/cpufreq/arm_big_little_dt.c | 18 ------------------
drivers/cpufreq/cpufreq-cpu0.c | 6 ------
drivers/cpufreq/exynos5440-cpufreq.c | 6 ------
drivers/cpufreq/imx6q-cpufreq.c | 20 +-------------------
include/linux/pm_opp.h | 2 +-
9 files changed, 34 insertions(+), 88 deletions(-)
--
2.0.0.rc2
Encapsulate the large portion of cpuidle_idle_call inside another
function so when CONFIG_CPU_IDLE=n, the code will be compiled out.
Also that is benefitial for the clarity of the code as it removes
a nested indentation level.
Signed-off-by: Daniel Lezcano <daniel.lezcano(a)linaro.org>
---
kernel/sched/idle.c | 161 +++++++++++++++++++++++++++------------------------
1 file changed, 86 insertions(+), 75 deletions(-)
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index b8cd302..f2f4bc9 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -63,6 +63,90 @@ void __weak arch_cpu_idle(void)
local_irq_enable();
}
+#ifdef CONFIG_CPU_IDLE
+static int __cpuidle_idle_call(void)
+{
+ struct cpuidle_device *dev = __this_cpu_read(cpuidle_devices);
+ struct cpuidle_driver *drv = cpuidle_get_cpu_driver(dev);
+ int next_state, entered_state, ret;
+ bool broadcast;
+
+ /*
+ * Check if the cpuidle framework is ready, otherwise fallback
+ * to the default arch specific idle method
+ */
+ ret = cpuidle_enabled(drv, dev);
+ if (ret)
+ return ret;
+
+ /*
+ * Ask the governor to choose an idle state it thinks
+ * it is convenient to go to. There is *always* a
+ * convenient idle state
+ */
+ next_state = cpuidle_select(drv, dev);
+
+ /*
+ * The idle task must be scheduled, it is pointless to
+ * go to idle, just update no idle residency and get
+ * out of this function
+ */
+ if (current_clr_polling_and_test()) {
+ dev->last_residency = 0;
+ entered_state = next_state;
+ local_irq_enable();
+ } else {
+ broadcast = !!(drv->states[next_state].flags &
+ CPUIDLE_FLAG_TIMER_STOP);
+
+ if (broadcast)
+ /*
+ * Tell the time framework to switch to a
+ * broadcast timer because our local timer
+ * will be shutdown. If a local timer is used
+ * from another cpu as a broadcast timer, this
+ * call may fail if it is not available
+ */
+ ret = clockevents_notify(
+ CLOCK_EVT_NOTIFY_BROADCAST_ENTER,
+ &dev->cpu);
+
+ if (!ret) {
+ trace_cpu_idle_rcuidle(next_state, dev->cpu);
+
+ /*
+ * Enter the idle state previously returned by
+ * the governor decision. This function will
+ * block until an interrupt occurs and will
+ * take care of re-enabling the local
+ * interrupts
+ */
+ entered_state = cpuidle_enter(drv, dev, next_state);
+
+ trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, dev->cpu);
+
+ if (broadcast)
+ clockevents_notify(
+ CLOCK_EVT_NOTIFY_BROADCAST_EXIT,
+ &dev->cpu);
+
+ /*
+ * Give the governor an opportunity to reflect
+ * on the outcome
+ */
+ cpuidle_reflect(dev, entered_state);
+ }
+ }
+
+ return 0;
+}
+#else
+static int inline __cpuidle_idle_call(void)
+{
+ return -ENOSYS;
+}
+#endif
+
/**
* cpuidle_idle_call - the main idle function
*
@@ -70,10 +154,7 @@ void __weak arch_cpu_idle(void)
*/
static void cpuidle_idle_call(void)
{
- struct cpuidle_device *dev = __this_cpu_read(cpuidle_devices);
- struct cpuidle_driver *drv = cpuidle_get_cpu_driver(dev);
- int next_state, entered_state, ret;
- bool broadcast;
+ int ret;
/*
* Check if the idle task must be rescheduled. If it is the
@@ -100,80 +181,10 @@ static void cpuidle_idle_call(void)
rcu_idle_enter();
/*
- * Check if the cpuidle framework is ready, otherwise fallback
- * to the default arch specific idle method
- */
- ret = cpuidle_enabled(drv, dev);
-
- if (!ret) {
- /*
- * Ask the governor to choose an idle state it thinks
- * it is convenient to go to. There is *always* a
- * convenient idle state
- */
- next_state = cpuidle_select(drv, dev);
-
- /*
- * The idle task must be scheduled, it is pointless to
- * go to idle, just update no idle residency and get
- * out of this function
- */
- if (current_clr_polling_and_test()) {
- dev->last_residency = 0;
- entered_state = next_state;
- local_irq_enable();
- } else {
- broadcast = !!(drv->states[next_state].flags &
- CPUIDLE_FLAG_TIMER_STOP);
-
- if (broadcast)
- /*
- * Tell the time framework to switch
- * to a broadcast timer because our
- * local timer will be shutdown. If a
- * local timer is used from another
- * cpu as a broadcast timer, this call
- * may fail if it is not available
- */
- ret = clockevents_notify(
- CLOCK_EVT_NOTIFY_BROADCAST_ENTER,
- &dev->cpu);
-
- if (!ret) {
- trace_cpu_idle_rcuidle(next_state, dev->cpu);
-
- /*
- * Enter the idle state previously
- * returned by the governor
- * decision. This function will block
- * until an interrupt occurs and will
- * take care of re-enabling the local
- * interrupts
- */
- entered_state = cpuidle_enter(drv, dev,
- next_state);
-
- trace_cpu_idle_rcuidle(PWR_EVENT_EXIT,
- dev->cpu);
-
- if (broadcast)
- clockevents_notify(
- CLOCK_EVT_NOTIFY_BROADCAST_EXIT,
- &dev->cpu);
-
- /*
- * Give the governor an opportunity to reflect on the
- * outcome
- */
- cpuidle_reflect(dev, entered_state);
- }
- }
- }
-
- /*
* We can't use the cpuidle framework, let's use the default
* idle routine
*/
+ ret = __cpuidle_idle_call();
if (ret)
arch_cpu_idle();
--
1.7.9.5
From: Mark Brown <broonie(a)linaro.org>
Some of the generic drivers used on ARM class systems use OPP so allow it
to be selected in Kconfig. No code is required for this, it is not clear
to me why there is config for this option.
Signed-off-by: Mark Brown <broonie(a)linaro.org>
---
arch/arm64/Kconfig | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 8ccdd2646ae3..8256d6d09d33 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -2,6 +2,7 @@ config ARM64
def_bool y
select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
select ARCH_USE_CMPXCHG_LOCKREF
+ select ARCH_HAS_OPP
select ARCH_HAS_SG_CHAIN
select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
select ARCH_WANT_OPTIONAL_GPIOLIB
--
2.0.0.rc2
"Power" is a very bad term in the scheduler context. There are so many
meanings that can be attached to it. And with the upcoming "power
aware" scheduler work, confusion is sure to happen.
The definition of "power" is typically the rate at which work is performed,
energy is converted or electric energy is transferred. The notion of
"compute capacity" is rather at odds with "power" to the point many
comments in the code have to make it explicit that "capacity" is the
actual intended meaning.
So let's make it clear what we man by using "capacity" in place of "power"
directly in the code. That will make the introduction of actual "power
consumption" concepts much clearer later on.
This is based on the latest tip tree to apply correctly on top of existing
scheduler changes already queued there.
Changes from v1:
- capa_factor and SCHED_CAPA_* changed to be spelled "capacity" in full
to save peterz some Chupacabra nightmares
- some minor corrections in commit logs
- rebased on latest tip tree
arch/arm/kernel/topology.c | 54 +++----
include/linux/sched.h | 8 +-
kernel/sched/core.c | 87 ++++++-----
kernel/sched/fair.c | 323 ++++++++++++++++++++-------------------
kernel/sched/sched.h | 18 +--
5 files changed, 246 insertions(+), 244 deletions(-)
We already have dummy implementation for most of the regulators APIs for
!CONFIG_REGULATOR case and were missing it for regulator_set_voltage_time().
Found this issue while compiling cpufreq-cpu0 driver without regulators support
in kernel.
drivers/cpufreq/cpufreq-cpu0.c: In function ‘cpu0_cpufreq_probe’:
drivers/cpufreq/cpufreq-cpu0.c:186:3: error: implicit declaration of function ‘regulator_set_voltage_time’ [-Werror=implicit-function-declaration]
Fix this by adding dummy definition for regulator_set_voltage_time().
Signed-off-by: Viresh Kumar <viresh.kumar(a)linaro.org>
---
Liam/Broonie: Please see if this can go through Rafael as 2nd patch is dependent
on it.
include/linux/regulator/consumer.h | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/include/linux/regulator/consumer.h b/include/linux/regulator/consumer.h
index 1a4a8c1..0cfc286 100644
--- a/include/linux/regulator/consumer.h
+++ b/include/linux/regulator/consumer.h
@@ -397,6 +397,12 @@ static inline int regulator_set_voltage(struct regulator *regulator,
return 0;
}
+static inline int regulator_set_voltage_time(struct regulator *regulator,
+ int old_uV, int new_uV)
+{
+ return 0;
+}
+
static inline int regulator_get_voltage(struct regulator *regulator)
{
return -EINVAL;
--
2.0.0.rc2
Implement and enable context tracking for arm64 (which is
a prerequisite for FULL_NOHZ support). This patchset
builds upon earlier work by Kevin Hilman and is based on
Will Deacon's tree.
Changes v5 to v6:
* Don't save far_el1 in x26 in el0_dbg path (not needed)
* TIF_NOHZ processes go through the slow path (so no register
save/restore is needed in ct_user_enter)
Changes v4 to v5:
* Improvement to code restoring far_el1 (suggested by Christopher Covington)
* Improvement to register save/restore in ct_user_enter
Changes v3 to v4:
* Rename parameter of ct_user_exit from save to restore
* Rebased patch to Will Deacon's tree (branch remotes/origin/aarch64
of git://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git)
Changes v2 to v3:
* Save/restore necessary registers in ct_user_enter and ct_user_exit
* Annotate "error paths" out of el0_sync with ct_user_exit
Changes v1 to v2:
* Save far_el1 in x26 temporarily
Larry Bassel (2):
arm64: adjust el0_sync so that a function can be called
arm64: enable context tracking
arch/arm64/Kconfig | 1 +
arch/arm64/include/asm/thread_info.h | 4 +++
arch/arm64/kernel/entry.S | 58 +++++++++++++++++++++++++++++++-----
3 files changed, 56 insertions(+), 7 deletions(-)
--
1.8.3.2