This is the second version of my patchset for ftrace support.
v1 was actually submitted several weeks ago, but it is still held in
moderation. (Just ignore it for now.)
There is another implementation from Cavium Networks, but the two works
are independent, and my code has additional system call trace support.
I confirmed that the patches build on v3.12-rc4 with Linaro's
upcoming 2013.10 gcc (4.8.2), and that the kernel works on Fast Model
with the following tracers:
* function tracer with dynamic ftrace
* function graph tracer with dynamic ftrace
* syscall tracepoint
* irqsoff & preemptirqsoff (which use CALLER_ADDRx)
I also verified them with the in-kernel tests FTRACE_SELFTEST,
FTRACE_STARTUP_TEST and EVENT_TRACE_TEST_SYSCALLS.
Patch [3/6] has warnings from checkpatch, but they follow other architectures' style.
Please note that the host's elf.h must have the AArch64 definitions,
EM_AARCH64 and R_AARCH64_ABS64, to build the kernel. See [4/6].
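(If your host's elf.h predates AArch64, guards like the following are one
possible local workaround; 183 and 257 are the values from the "ELF for the
ARM 64-bit Architecture" specification. This is only a sketch, not what [4/6]
itself does.)

#ifndef EM_AARCH64
#define EM_AARCH64	183
#endif
#ifndef R_AARCH64_ABS64
#define R_AARCH64_ABS64	257
#endif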
Issues
* Can we optimize register usage in the asm code (by not saving x0, x1 and x2)? [1/6]
* Do we need "fault protection" code in ftrace_modify_code()? [1/6]
It exists in x86 and other architectures, but not in arm.
* We may be able to use aarch64_insn_patch_text_nosync() instead of
  ftrace_modify_code(). [2/6] But the former does not use
  probe_kernel_write(). Is this safe?
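For the last two points, a minimal sketch of what a fault-protected
ftrace_modify_code() could look like, modeled on the x86 pattern (the names
and the validate flag here are illustrative, not the exact code from [1/6]):

static int ftrace_modify_code(unsigned long pc, u32 old, u32 new,
			      bool validate)
{
	u32 replaced;

	/* read back the current instruction with a fault-safe accessor
	 * and check that it is the one we expect to replace */
	if (validate) {
		if (probe_kernel_read(&replaced, (void *)pc, 4))
			return -EFAULT;
		if (replaced != old)
			return -EINVAL;
	}

	/* A64 instructions are always 4 bytes and naturally aligned */
	if (probe_kernel_write((void *)pc, &new, 4))
		return -EPERM;
	flush_icache_range(pc, pc + 4);

	return 0;
}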
Changes from v1 to v2:
* split one patch into several pieces for easier review
(especially function tracer + dynamic ftrace + CALLER_ADDRx)
* put return_address() in a separate file
* renamed __mcount to _mcount (it was my mistake)
* changed stackframe handling to get parent's frame pointer
* removed ARCH_SUPPORTS_FTRACE_OPS
* switched to the "hotpatch" interfaces from Huawei
* revised descriptions in comments
AKASHI Takahiro (6):
arm64: Add ftrace support
arm64: ftrace: Add dynamic ftrace support
arm64: ftrace: Add CALLER_ADDRx macros
ftrace: Add arm64 support to recordmcount
arm64: ftrace: Add system call tracepoint
arm64: Add 'notrace' attribute to unwind_frame() for ftrace
arch/arm64/Kconfig | 6 +
arch/arm64/include/asm/ftrace.h | 54 +++++++++
arch/arm64/include/asm/syscall.h | 1 +
arch/arm64/include/asm/thread_info.h | 1 +
arch/arm64/include/asm/unistd.h | 2 +
arch/arm64/kernel/Makefile | 9 +-
arch/arm64/kernel/arm64ksyms.c | 4 +
arch/arm64/kernel/entry-ftrace.S | 211 ++++++++++++++++++++++++++++++++++
arch/arm64/kernel/entry.S | 1 +
arch/arm64/kernel/ftrace.c | 186 ++++++++++++++++++++++++++++++
arch/arm64/kernel/ptrace.c | 10 ++
arch/arm64/kernel/return_address.c | 55 +++++++++
arch/arm64/kernel/stacktrace.c | 2 +-
scripts/recordmcount.c | 4 +
scripts/recordmcount.pl | 5 +
15 files changed, 549 insertions(+), 2 deletions(-)
create mode 100644 arch/arm64/include/asm/ftrace.h
create mode 100644 arch/arm64/kernel/entry-ftrace.S
create mode 100644 arch/arm64/kernel/ftrace.c
create mode 100644 arch/arm64/kernel/return_address.c
--
1.7.9.5
This patchset adds audit support on arm64.
The implementation is just like that in other architectures,
so I think little explanation is needed.
I verified this patch with some commands on both a 64-bit rootfs
and a 32-bit rootfs (but only little-endian):
# auditctl -a exit,always -S openat -F path=/etc/inittab
# auditctl -a exit,always -F dir=/tmp -F perm=rw
# auditctl -a task,always
# autrace /bin/ls
What else?
(Thanks to Clayton for his cross-compiling patch)
I'd like to discuss the following issues:
* AUDIT_ARCH_*
Why do we need to distinguish big-endian and little-endian? [2/4]
* AArch32
We need to add a check for identifying the endian in 32-bit tasks. [3/4]
* syscall numbers in AArch32
Currently all the definitions are added in unistd32.h under
"#ifdef __AARCH32_AUDITSYSCALL" so that asm-generic/audit_*.h can be used. [3/4]
The "#ifdef" is necessary to avoid a conflict with the 64-bit definitions.
Do we need a more sophisticated way? (See the sketch after this list.)
* TIF_AUDITSYSCALL
Most architectures, except x86, do not check TIF_AUDITSYSCALL. Why not? [4/4]
* Userspace audit package
There are some missing syscall definitions in lib/aarch64_table.h.
There is no support for AUDIT_ARCH_ARM (i.e. little-endian arm; armeb is big-endian).
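For the syscall-number issue above, the unistd32.h guard has roughly the
following shape (an excerpt-style illustration with ARM EABI numbers, not the
literal hunk from [3/4]):

#ifdef __AARCH32_AUDITSYSCALL
#define __NR_open	5
#define __NR_chmod	15
/* ... these names are what asm-generic/audit_*.h expects to see, and the
 * numbers differ from the native 64-bit ones, hence the guard */
#endif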
AKASHI Takahiro (4):
audit: Enable arm64 support
arm64: Add audit support
arm64: audit: Add AArch32 support
arm64: audit: Add audit hook in ptrace/syscall_trace
arch/arm64/Kconfig | 3 +
arch/arm64/include/asm/audit32.h | 12 ++
arch/arm64/include/asm/ptrace.h | 5 +
arch/arm64/include/asm/syscall.h | 18 ++
arch/arm64/include/asm/thread_info.h | 1 +
arch/arm64/include/asm/unistd32.h | 387 ++++++++++++++++++++++++++++++++++
arch/arm64/kernel/Makefile | 4 +
arch/arm64/kernel/audit.c | 77 +++++++
arch/arm64/kernel/audit32.c | 46 ++++
arch/arm64/kernel/entry.S | 3 +
arch/arm64/kernel/ptrace.c | 12 ++
include/uapi/linux/audit.h | 2 +
init/Kconfig | 2 +-
13 files changed, 571 insertions(+), 1 deletion(-)
create mode 100644 arch/arm64/include/asm/audit32.h
create mode 100644 arch/arm64/kernel/audit.c
create mode 100644 arch/arm64/kernel/audit32.c
--
1.7.9.5
From: Vijaya Kumar K <Vijaya.Kumar(a)caviumnetworks.com>
On top of the ARM64 KGDB support patches, this adds KGDB support for
FPSIMD. Only debugging of the FPSIMD kernel context is supported.
This patch requires Ard's in-kernel NEON support patches and the
patch below for holding a thread's fpsimd state:
http://permalink.gmane.org/gmane.linux.ports.arm.kernel/277228
So CONFIG_KERNEL_MODE_NEON should be enabled.
With this, FPSIMD registers can be viewed or set from the gdb tool.
Unlike the CPU registers, the FPSIMD registers are not saved on
exception entry. With the known restriction that FPSIMD must not be
touched in interrupt/exception context, this patch reads/writes the
FPSIMD registers directly on a gdb request. The registers are thus
saved and restored on every FPSIMD register read and write from gdb,
which has a negligible impact on gdb response time. Other
architectures such as mips are implemented similarly.
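A rough sketch of the on-demand access described above (the helper name is
illustrative; the real changes live in kgdb.c and fpsimd.c):

/* sketch: read one FPSIMD vector register for the gdb stub */
static void kgdb_get_fpsimd_reg(int regno, void *mem)
{
	struct fpsimd_state *st = &current->thread.fpsimd_state;

	/* flush the live FPSIMD state to memory, then copy it out */
	fpsimd_save_state(st);
	memcpy(mem, &st->vregs[regno], sizeof(st->vregs[0]));
}

The write path is symmetric: modify the in-memory state, then reload it with
fpsimd_load_state().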
v2 changes:
- Added an API to query a thread's fpsimd state by checking the
  TIF_FOREIGN_FPSTATE flag. This is based on the patch below:
  http://permalink.gmane.org/gmane.linux.ports.arm.kernel/277228
- Allow FPSIMD register access only when FPSIMD is in use by the
  current thread
v1 changes:
- Initial patch
Tested on ARM64 simulator
Vijaya Kumar K (1):
ARM64: KGDB: Add FP/SIMD debug support
arch/arm64/include/asm/fpsimd.h | 1 +
arch/arm64/kernel/fpsimd.c | 5 ++
arch/arm64/kernel/kgdb.c | 105 +++++++++++++++++++++++++--------------
3 files changed, 73 insertions(+), 38 deletions(-)
--
1.7.9.5
Two patches related to hibernation resume:
- an enhancement to make the use of an existing resume file more general
- an enhancement to name_to_dev_t to ignore trailing newlines coming from
  userspace.
Both patches are based on the 3.12-rc3 tag. This was tested on a
Pandaboard with partial hibernation support, and compiled for x86.
[PATCH 1/2] init/do_mounts.c: ignore final \n in name_to_dev_t
init/do_mounts.c | 23 ++++++++++++++++++-----
1 file changed, 18 insertions(+), 5 deletions(-)
Change name_to_dev_t to handle a trailing newline in the
input buffer, which allows name_to_dev_t to be used
directly with user buffers without requiring a copy.
Also add a const to the name parameter, reflecting
how name_to_dev_t currently treats the input buffer.
This allows direct use of user buffers
(from resume_store, for example).
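The core of the change is small; a sketch of the idea (the real hunk computes
the length once and uses it in the subsequent comparisons):

/* sketch: length of 'name' with one trailing newline ignored, so that
 * a user buffer like "8:1\n" from echo(1) parses the same as "8:1" */
static size_t name_len(const char *name)
{
	size_t len = strlen(name);

	if (len && name[len - 1] == '\n')
		len--;
	return len;
}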
[PATCH 2/2] PM / Hibernate: use name_to_dev_t to parse resume
kernel/power/hibernate.c | 15 ++++-----------
1 file changed, 4 insertions(+), 11 deletions(-)
Use name_to_dev_t to parse the /sys/power/resume file, making the
syntax more flexible. The previous syntax is still supported, and
other formats such as /dev/devicenode and UUID= now work as well.
By changing /sys/power/resume to accept the same syntax as
the resume=device parameter, we can parse resume=device
in the initrd init script and use the resume device directly
from the kernel command line.
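With that in place, resume_store() can hand the user buffer straight to
name_to_dev_t(); a trimmed sketch of the result (error handling reduced for
brevity):

static ssize_t resume_store(struct kobject *kobj, struct kobj_attribute *attr,
			    const char *buf, size_t n)
{
	/* accepts major:minor, /dev/<node> and UUID=... via name_to_dev_t() */
	dev_t res = name_to_dev_t(buf);

	if (!res)
		return -EINVAL;

	lock_system_sleep();
	swsusp_resume_device = res;
	unlock_system_sleep();
	pr_info("PM: Starting manual resume from disk\n");
	noresume = 0;
	software_resume();
	return n;
}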
Changes in v3:
--------------
* Dropped the documentation patch as it went in through the trivial tree
* Added a patch for name_to_dev_t to support directly parsing a userspace
  buffer
Changes in v2:
--------------
* Added a check for a null return of kstrndup in hibernate.c
Thanks,
Sebastian
This is the 5th version of the previously named "packing small tasks" patchset.
"small" has been dropped from the name because the patchset no longer targets
only small tasks.
This patchset takes advantage of the new per-task load tracking that is
available in the scheduler to pack tasks onto a minimum number of
CPUs/clusters/cores. The packing mechanism takes the power gating topology
of the CPUs into account to minimize the number of power domains that need
to be powered on simultaneously.
Most of the code has been put in the fair.c file, but it can easily be moved
elsewhere. This patchset tries to solve one part of the larger
energy-efficient scheduling problem, and it should be merged with other
proposals that solve other parts, like the power scheduler made by Morten.
The packing is done in 3 steps:
The 1st step creates a topology of the power gating of the CPUs that helps
the scheduler choose which CPUs will handle the current activity. This
topology is described by a new flag, SD_SHARE_POWERDOMAIN, which indicates
whether the groups of CPUs of a scheduling domain share their power state. To
be efficient, a group of CPUs that share their power state should be
used (or not) simultaneously. By default, this flag is set in all sched_domain
levels in order to keep the current behavior of the scheduler unchanged.
The 2nd step evaluates the current activity of the system and creates a list of
CPUs for handling it. The average activity level of CPUs is set to 80% but is
configurable by changing the sched_packing_level knob. The activity level and
the involvement of a CPU in the packing effort are evaluated during the periodic
load balance, similarly to cpu_power. Then, the default load balancing behavior
is used to balance tasks between this reduced list of CPUs.
As the current activity doesn't take a new task into account, an unused CPU
can also be selected at the task's first wakeup, until the activity is updated.
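As an illustration of this 2nd step, the decision has roughly the following
shape (all names here are invented for the example; the real code is spread
over the activity and packing-list patches):

extern unsigned int sysctl_sched_packing_level;	/* the 80% knob above */

/* sketch: another CPU joins the packing list only while the CPUs
 * already in the list run above the packing level of their capacity */
static bool packing_cpus_saturated(unsigned long activity,
				   unsigned long capacity)
{
	return activity * 100 > sysctl_sched_packing_level * capacity;
}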
The 3rd step occurs when the scheduler selects a target CPU for a newly
awakened task. The current wakeup latency of idle CPUs is used to select the
one with the shallowest C-state. In some situations where the task load is
small compared to the latency, the newly awakened task can even stay on the
current CPU. Since the load is the main metric for the scheduler, the wakeup
latency is transposed into an equivalent load so that the current load-balance
mechanism, which is based on load comparison, is kept unchanged. A shared
structure has been created to exchange information between scheduler and
cpuidle (or any other framework that needs to share information). The wakeup
latency is the only field for the moment but it could be extended with
additional useful information like the target load or the expected sleep
duration of a CPU.
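An illustrative shape for that shared structure (field and helper names
approximate the "statistic structure" and cpuidle patches; treat this as a
sketch):

/* per-CPU structure shared between the scheduler and cpuidle */
struct sched_pm {
	atomic_t wake_latency;	/* current exit latency of the CPU, in us */
};
DECLARE_PER_CPU(struct sched_pm, sched_stat);

/* cpuidle updates this when the target idle state of a CPU changes */
static inline void set_cpu_wake_latency(int cpu, int latency)
{
	atomic_set(&per_cpu(sched_stat, cpu).wake_latency, latency);
}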
The patchset is based on v3.12-rc2 and is available in the git tree:
git://git.linaro.org/people/vingu/kernel.git
branch sched-packing-small-tasks-v5
If you want to test the patchset, you must enable CONFIG_PACKING_TASKS first.
Then, you also need to create an arch_sd_local_flags that clears the
SD_SHARE_POWERDOMAIN flag at the appropriate level for your architecture. This
has already been done for the ARM architecture in the patchset.
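As a sketch of what the ARM hook expresses (the real implementation derives
this from the power-gating description in patches 1-2/14; CLUSTER_LEVEL below
is a placeholder for however the architecture identifies the cluster span):

/* CPUs inside a cluster share their power state, but separate clusters
 * do not, so SD_SHARE_POWERDOMAIN is kept only up to the cluster level */
static int arm_sd_local_flags(int level)
{
	return (level <= CLUSTER_LEVEL) ? SD_SHARE_POWERDOMAIN : 0;
}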
The figures below show the latency of cyclictest with and without the patchset
on an ARM platform running v3.11. The test was run 10 times on each kernel.
#cyclictest -t 3 -q -e 1000000 -l 3000 -i 1800 -d 100
                   average (us)   stdev
v3.11                     381.5   79.86
v3.11 + patches          173.83   13.62
Change since V4:
- v4 posting: https://lkml.org/lkml/2013/4/25/396
- Keep only the aggressive packing mode.
- Add a finer grain power domain description mechanism that includes
DT description
- Add a structure to share information with other frameworks
- Use current wakeup latency of an idle CPU when selecting the target idle CPU
- All the task packing mechanism can be disabled with a single config option
Change since V3:
- v3 posting: https://lkml.org/lkml/2013/3/22/183
- Take into account comments on previous version.
- Add an aggressive packing mode and a knob to select between the various modes
Change since V2:
- v2 posting: https://lkml.org/lkml/2012/12/12/164
- Migrate only a task that wakes up
- Change the light tasks threshold to 20%
- Change the loaded CPU threshold to not pull tasks if the current number of
running tasks is null but the load average is already greater than 50%
- Fix the algorithm for selecting the buddy CPU.
Change since V1:
-v1 posting: https://lkml.org/lkml/2012/10/7/19
Patch 2/6
- Change the flag name which was not clear. The new name is
SD_SHARE_POWERDOMAIN.
- Create an architecture dependent function to tune the sched_domain flags
Patch 3/6
- Fix issues in the algorithm that looks for the best buddy CPU
- Use pr_debug instead of pr_info
- Fix for uniprocessor
Patch 4/6
- Remove the use of usage_avg_sum which has not been merged
Patch 5/6
- Change the way the coherency of runnable_avg_sum and runnable_avg_period is
ensured
Patch 6/6
- Use the arch dependent function to set/clear SD_SHARE_POWERDOMAIN for ARM
platform
Vincent Guittot (14):
sched: add a new arch_sd_local_flags for sched_domain init
ARM: sched: clear SD_SHARE_POWERDOMAIN
sched: define pack buddy CPUs
sched: do load balance only with packing cpus
sched: add a packing level knob
sched: create a new field with available capacity
sched: get CPU's activity statistic
sched: move load idx selection in find_idlest_group
sched: update the packing cpu list
sched: init this_load to max in find_idlest_group
sched: add a SCHED_PACKING_TASKS config
sched: create a statistic structure
sched: differantiate idle cpu
cpuidle: set the current wake up latency
arch/arm/include/asm/topology.h | 4 +
arch/arm/kernel/topology.c | 50 ++++-
arch/ia64/include/asm/topology.h | 3 +-
arch/tile/include/asm/topology.h | 3 +-
drivers/cpuidle/cpuidle.c | 11 ++
include/linux/sched.h | 13 +-
include/linux/sched/sysctl.h | 9 +
include/linux/topology.h | 11 +-
init/Kconfig | 11 ++
kernel/sched/core.c | 11 +-
kernel/sched/fair.c | 395 ++++++++++++++++++++++++++++++++++++--
kernel/sched/sched.h | 8 +-
kernel/sysctl.c | 17 ++
13 files changed, 521 insertions(+), 25 deletions(-)
--
1.7.9.5
Hi Rafael,
I know you asked me not to send any more patches before the earlier ones get
into the kernel, but I got to this because Nicolas Pitre needed a few CPUFreq
patches for ARM's big.LITTLE In-Kernel Switcher, and within Linaro we have
hacked around these bugs in a bad way.
Because of that dependency I am forced to send these now. The bugs weren't
introduced recently, so the fixes can be included in 3.13.
There are several problems/bugs in cpufreq-stats, especially with cpufreq
drivers built as modules and in the suspend/resume path. These are described
in detail in the changelogs.
The fixes were tested on my Thinkpad (acpi-cpufreq) in the following way:
[1] offline+online all CPUs except boot cpu in a while loop
[2] then do suspend resume
[3] repeat [1] and [2] several times.
No issues found.
Also tested on my Exynos board:
- Added a cpufreq_unregister/register while loop in exynos-cpufreq.c so that
  the driver is continuously registered/unregistered... Stats kept working fine.
- Compiled cpufreq-stats as a module and inserted/removed it several times
  after removing the above hack (as that doesn't let Linux boot :)).
@Srivatsa: You also have a fairly good idea of cpufreq now, so please give some
time to review this :)
@Nico: Can you remove the hacky code from the IKS tree and test these instead
to see if we still have any issues?
--
viresh
Viresh Kumar (4):
cpufreq: stats: handle cpufreq_unregister_driver() and suspend/resume
properly
cpufreq: stats: remove hotplug notifiers
cpufreq: stats: free table and remove sysfs entry in a single routine
cpufreq: stats: create sysfs entries when cpufreq_stats is a module
drivers/cpufreq/cpufreq.c | 5 ++
drivers/cpufreq/cpufreq_stats.c | 109 ++++++++++++++++++----------------------
include/linux/cpufreq.h | 2 +
3 files changed, 55 insertions(+), 61 deletions(-)
--
1.7.12.rc2.18.g61b472e
This patch adds a PM notifier for handling suspend/resume of cpufreq
governors. This is required for early suspend and late resume of the
governors. There are multiple reasons supporting this patch:
- First, it seems quite logical to stop governors when we know we are going
  into suspend. But the question is when? Are PM notifiers the right place?
  The following reasons support this decision.
- Nishanth Menon (TI) found an interesting problem on his platform, OMAP. His
  board wasn't working well across suspend/resume: the calls removing non-boot
  CPUs ended up invoking the driver's ->target(), which then tries to play with
  regulators. But the regulators and their I2C bus were already suspended, and
  this resulted in a failure. This is why we need a PM notifier here.
- Lan Tianyu (Intel) & Jinhyuk Choi (Broadcom) found another issue where the
  tunables configuration for clusters/sockets with non-boot CPUs was getting
  lost after suspend/resume, as we were notifying governors with
  CPUFREQ_GOV_POLICY_EXIT on removal of the last CPU of a policy and thereby
  deallocating the memory for the tunables.
All the above problems are fixed by having a PM notifier in place that stops
any operation on a governor. There is then no need for special handling of
variables like 'frozen' in the suspend/resume paths.
Reported-by: Lan Tianyu <tianyu.lan(a)intel.com>
Reported-by: Nishanth Menon <nm(a)ti.com>
Reported-by: Jinhyuk Choi <jinchoi(a)broadcom.com>
Signed-off-by: Viresh Kumar <viresh.kumar(a)linaro.org>
---
Hi guys,
Can you please verify that this fixes the issues you reported? I have tested
it across multiple suspend-resume cycles on my Thinkpad. It doesn't crash :)
drivers/cpufreq/cpufreq.c | 63 +++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 63 insertions(+)
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 02d534d..c87ced9 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -26,6 +26,7 @@
#include <linux/module.h>
#include <linux/mutex.h>
#include <linux/slab.h>
+#include <linux/suspend.h>
#include <linux/syscore_ops.h>
#include <linux/tick.h>
#include <trace/events/power.h>
@@ -47,6 +48,9 @@ static LIST_HEAD(cpufreq_policy_list);
static DEFINE_PER_CPU(char[CPUFREQ_NAME_LEN], cpufreq_cpu_governor);
#endif
+/* Flag to suspend/resume CPUFreq governors */
+static bool cpufreq_suspended;
+
static inline bool has_target(void)
{
return cpufreq_driver->target_index || cpufreq_driver->target;
@@ -1462,6 +1466,54 @@ static struct subsys_interface cpufreq_interface = {
.remove_dev = cpufreq_remove_dev,
};
+/*
+ * PM notifier for suspending governors: some platforms can't change frequency
+ * after this point in the suspend cycle because the devices they use for
+ * changing frequency (e.g. i2c, regulators) are suspended shortly after
+ * this point.
+ */
+static int cpufreq_pm_notify(struct notifier_block *nb, unsigned long action,
+ void *data)
+{
+ struct cpufreq_policy *policy;
+ unsigned long flags;
+
+ if (!has_target())
+ return NOTIFY_OK;
+
+ if (action == PM_SUSPEND_PREPARE) {
+ pr_debug("%s: Suspending Governors\n", __func__);
+
+ list_for_each_entry(policy, &cpufreq_policy_list, policy_list)
+ if (__cpufreq_governor(policy, CPUFREQ_GOV_STOP))
+ pr_err("%s: Failed to stop governor for policy: %p\n",
+ __func__, policy);
+
+ write_lock_irqsave(&cpufreq_driver_lock, flags);
+ cpufreq_suspended = true;
+ write_unlock_irqrestore(&cpufreq_driver_lock, flags);
+ } else if (action == PM_POST_SUSPEND) {
+ pr_debug("%s: Resuming Governors\n", __func__);
+
+ write_lock_irqsave(&cpufreq_driver_lock, flags);
+ cpufreq_suspended = false;
+ write_unlock_irqrestore(&cpufreq_driver_lock, flags);
+
+ list_for_each_entry(policy, &cpufreq_policy_list, policy_list)
+ if (__cpufreq_governor(policy, CPUFREQ_GOV_START) ||
+ __cpufreq_governor(policy,
+ CPUFREQ_GOV_LIMITS))
+ pr_err("%s: Failed to start governor for policy: %p\n",
+ __func__, policy);
+ }
+
+ return NOTIFY_OK;
+}
+
+static struct notifier_block cpufreq_pm_notifier = {
+ .notifier_call = cpufreq_pm_notify,
+};
+
/**
* cpufreq_bp_suspend - Prepare the boot CPU for system suspend.
*
@@ -1752,6 +1804,8 @@ EXPORT_SYMBOL_GPL(cpufreq_driver_target);
static int __cpufreq_governor(struct cpufreq_policy *policy,
unsigned int event)
{
+ unsigned long flags;
+ bool is_suspended;
int ret;
/* Only must be defined when default governor is known to have latency
@@ -1764,6 +1818,14 @@ static int __cpufreq_governor(struct cpufreq_policy *policy,
struct cpufreq_governor *gov = NULL;
#endif
+ /* Don't start any governor operations if we are entering suspend */
+ read_lock_irqsave(&cpufreq_driver_lock, flags);
+ is_suspended = cpufreq_suspended;
+ read_unlock_irqrestore(&cpufreq_driver_lock, flags);
+
+ if (is_suspended)
+ return 0;
+
if (policy->governor->max_transition_latency &&
policy->cpuinfo.transition_latency >
policy->governor->max_transition_latency) {
@@ -2222,6 +2284,7 @@ static int __init cpufreq_core_init(void)
cpufreq_global_kobject = kobject_create();
BUG_ON(!cpufreq_global_kobject);
register_syscore_ops(&cpufreq_syscore_ops);
+ register_pm_notifier(&cpufreq_pm_notifier);
return 0;
}
--
1.7.12.rc2.18.g61b472e