Jason,
Could you please review my patch below?
See also arm64 maintainer's comment:
http://lists.infradead.org/pipermail/linux-arm-kernel/2015-January/313712.h…
Thanks,
-Takahiro AKASHI
I tried to verify kgdb in a vanilla kernel on the fast model, but it seems that
single stepping with kgdb has not worked correctly since its first
appearance in v3.15.
On v3.15, a 'stepi' command issued after the kernel stops at a breakpoint
steps forward to the next instruction, but subsequent 'stepi' commands never
go beyond that.
On v3.16, 'stepi' moves forward and stops at the next instruction just
after enable_dbg in el1_dbg, and never goes beyond that. This change in
behavior seems to have been introduced by the following patch in v3.16:
commit 2a2830703a23 ("arm64: debug: avoid accessing mdscr_el1 on fault
paths where possible")
This patch
(1) moves kernel_disable_single_step() from 'c' command handling to the
single step handler.
This makes sure that single stepping takes effect on every 's' command.
Please note that, under the current implementation, the single step bit in
spsr, which is cleared by the first single step, is not set again by
consecutive 's' commands because the single step bit in mdscr
is still set (that is, kernel_active_single_step() in
kgdb_arch_handle_exception() is true).
(2) re-implements kgdb_roundup_cpus() because the current implementation
enables interrupts naively. See below.
(3) removes 'enable_dbg' in el1_dbg.
The single step bit in mdscr is turned on in do_handle_exception()->
kgdb_handle_exception() before returning to the debugged context, and if
debug exceptions are enabled in el1_dbg, we will see unexpected
single-stepping in el1_dbg.
Since v3.18, the following patch does the same:
commit 1059c6bf8534 ("arm64: debug: don't re-enable debug exceptions
on return from el1_dbg")
(4) masks interrupts while single-stepping one instruction.
If an interrupt is taken while a single step is being processed, debug
exceptions are unintentionally enabled by el1_irq's 'enable_dbg' before
returning to the debugged context.
Thus, as in (3), we will see unexpected single-stepping in el1_irq.
Basically (1) and (2) are for v3.15, (3) and (4) for v3.1[67].
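The effect of change (1) can be illustrated with a small userspace model. This is a simplified sketch, not kernel code: struct ss_state and all helper names are hypothetical, with mdscr_ss and spsr_ss standing in for the single step bits in mdscr and spsr. It assumes the ARMv8 software-step behavior where resuming with the mdscr bit set but the spsr bit clear takes the step exception again before any instruction is executed.

```c
#include <stdbool.h>

/* Hypothetical model: mdscr_ss stands in for the single step bit in
 * mdscr, spsr_ss for the one in the saved spsr of the debugged context. */
struct ss_state {
	bool mdscr_ss;
	bool spsr_ss;
};

enum outcome { RUNS_FREELY, STEPPED_ONE, NO_PROGRESS };

/* Simplified stand-ins for kernel_enable_single_step() and
 * kernel_disable_single_step(). */
static void enable_single_step(struct ss_state *s)
{
	s->mdscr_ss = true;
	s->spsr_ss = true;
}

static void disable_single_step(struct ss_state *s)
{
	s->mdscr_ss = false;
}

/* 's' command: re-arm only if stepping is not already active
 * (models the kernel_active_single_step() check). */
static void handle_s(struct ss_state *s)
{
	if (!s->mdscr_ss)
		enable_single_step(s);
}

/* Resume the debugged context and report what happens. */
static enum outcome resume(struct ss_state *s)
{
	if (!s->mdscr_ss)
		return RUNS_FREELY;
	if (s->spsr_ss) {
		s->spsr_ss = false;	/* one instruction, then step exception */
		return STEPPED_ONE;
	}
	return NO_PROGRESS;		/* step exception before any instruction */
}

/* Step exception handler: with change (1), stepping is disabled here. */
static void step_handler(struct ss_state *s, bool patched)
{
	if (patched)
		disable_single_step(s);
}

/* Issue n consecutive 's' commands; count instructions actually stepped. */
static int count_steps(bool patched, int n)
{
	struct ss_state s = { false, false };
	int stepped = 0;

	for (int i = 0; i < n; i++) {
		handle_s(&s);
		if (resume(&s) == STEPPED_ONE)
			stepped++;
		step_handler(&s, patched);
	}
	return stepped;
}
```

In this model, count_steps(false, 3) returns 1 (only the first 's' makes progress, matching the v3.15 symptom), while count_steps(true, 3) returns 3.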
* issue fixed by (2):
Without (2), we would see another problem if a breakpoint is set at
interrupt-sensitive places, like gic_handle_irq():
KGDB: re-enter error: breakpoint removed ffffffc000081258
------------[ cut here ]------------
WARNING: CPU: 0 PID: 650 at kernel/debug/debug_core.c:435
kgdb_handle_exception+0x1dc/0x1f4()
Modules linked in:
CPU: 0 PID: 650 Comm: sh Not tainted 3.17.0-rc2+ #177
Call trace:
[<ffffffc000087fac>] dump_backtrace+0x0/0x130
[<ffffffc0000880ec>] show_stack+0x10/0x1c
[<ffffffc0004d683c>] dump_stack+0x74/0xb8
[<ffffffc0000ab824>] warn_slowpath_common+0x8c/0xb4
[<ffffffc0000ab90c>] warn_slowpath_null+0x14/0x20
[<ffffffc000121bfc>] kgdb_handle_exception+0x1d8/0x1f4
[<ffffffc000092ffc>] kgdb_brk_fn+0x18/0x28
[<ffffffc0000821c8>] brk_handler+0x9c/0xe8
[<ffffffc0000811e8>] do_debug_exception+0x3c/0xac
Exception stack(0xffffffc07e027650 to 0xffffffc07e027770)
...
[<ffffffc000083cac>] el1_dbg+0x14/0x68
[<ffffffc00012178c>] kgdb_cpu_enter+0x464/0x5c0
[<ffffffc000121bb4>] kgdb_handle_exception+0x190/0x1f4
[<ffffffc000092ffc>] kgdb_brk_fn+0x18/0x28
[<ffffffc0000821c8>] brk_handler+0x9c/0xe8
[<ffffffc0000811e8>] do_debug_exception+0x3c/0xac
Exception stack(0xffffffc07e027ac0 to 0xffffffc07e027be0)
...
[<ffffffc000083cac>] el1_dbg+0x14/0x68
[<ffffffc00032e4b4>] __handle_sysrq+0x11c/0x190
[<ffffffc00032e93c>] write_sysrq_trigger+0x4c/0x60
[<ffffffc0001e7d58>] proc_reg_write+0x54/0x84
[<ffffffc000192fa4>] vfs_write+0x98/0x1c8
[<ffffffc0001939b0>] SyS_write+0x40/0xa0
Once an interrupt occurs, a breakpoint at gic_handle_irq() triggers kgdb.
Kgdb then calls kgdb_roundup_cpus() to sync with the other cpus.
The current kgdb_roundup_cpus() temporarily unmasks interrupts in order to
use smp_call_function().
This allows another interrupt to occur and is likely to result in
hitting the breakpoint at gic_handle_irq() again, since debug exceptions are
always enabled in el1_irq.
We can avoid this issue by specifying "nokgdbroundup" on the kernel command line,
but this will also leave the other cpus in an unknown state in terms of kgdb,
and may interfere with kgdb activity.
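The CPU selection performed by the reworked kgdb_roundup_cpus() in (2) can be modelled in plain C. This is a userspace sketch under stated assumptions: a 64-bit integer stands in for struct cpumask, and queue_roundup_work() is a hypothetical placeholder for irq_work_queue_on(). It only illustrates that the calling CPU is skipped and that no interrupts are enabled along the way.

```c
#include <stdint.h>

#define MODEL_NR_CPUS 64

/* Hypothetical stand-in for irq_work_queue_on(): records each target CPU. */
static int queued_cpu[MODEL_NR_CPUS];
static int nr_queued;

static void queue_roundup_work(int cpu)
{
	queued_cpu[nr_queued++] = cpu;
}

/* Model of the reworked kgdb_roundup_cpus(): a 64-bit mask replaces
 * struct cpumask. The calling CPU is excluded, and local interrupts
 * are never unmasked. */
static void roundup_cpus(uint64_t online_mask, int self)
{
	uint64_t mask = online_mask & ~(UINT64_C(1) << self);

	nr_queued = 0;
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (mask & (UINT64_C(1) << cpu))
			queue_roundup_work(cpu);
}
```

For example, roundup_cpus(0xF, 2) queues work on CPUs 0, 1 and 3; when the calling CPU is the only one online, nothing is queued.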
Signed-off-by: AKASHI Takahiro <takahiro.akashi(a)linaro.org>
---
arch/arm64/kernel/kgdb.c | 60 +++++++++++++++++++++++++++++++++++-----------
1 file changed, 46 insertions(+), 14 deletions(-)
diff --git a/arch/arm64/kernel/kgdb.c b/arch/arm64/kernel/kgdb.c
index a0d10c5..81b5910 100644
--- a/arch/arm64/kernel/kgdb.c
+++ b/arch/arm64/kernel/kgdb.c
@@ -19,9 +19,13 @@
* along with this program. If not, see <http://www.gnu.org/licenses/>.
*/
+#include <linux/cpumask.h>
#include <linux/irq.h>
+#include <linux/irq_work.h>
#include <linux/kdebug.h>
#include <linux/kgdb.h>
+#include <linux/percpu.h>
+#include <asm/ptrace.h>
#include <asm/traps.h>
struct dbg_reg_def_t dbg_reg_def[DBG_MAX_REG_NUM] = {
@@ -95,6 +99,9 @@ struct dbg_reg_def_t dbg_reg_def[DBG_MAX_REG_NUM] = {
{ "fpcr", 4, -1 },
};
+static DEFINE_PER_CPU(unsigned int, kgdb_pstate);
+static DEFINE_PER_CPU(struct irq_work, kgdb_irq_work);
+
char *dbg_get_reg(int regno, void *mem, struct pt_regs *regs)
{
if (regno >= DBG_MAX_REG_NUM || regno < 0)
@@ -176,18 +183,14 @@ int kgdb_arch_handle_exception(int exception_vector, int signo,
* over and over again.
*/
kgdb_arch_update_addr(linux_regs, remcom_in_buffer);
- atomic_set(&kgdb_cpu_doing_single_step, -1);
- kgdb_single_step = 0;
-
- /*
- * Received continue command, disable single step
- */
- if (kernel_active_single_step())
- kernel_disable_single_step();
err = 0;
break;
case 's':
+ /* mask interrupts while single stepping */
+ __this_cpu_write(kgdb_pstate, linux_regs->pstate);
+ linux_regs->pstate |= PSR_I_BIT;
+
/*
* Update step address value with address passed
* with step packet.
@@ -198,8 +201,6 @@ int kgdb_arch_handle_exception(int exception_vector, int signo,
*/
kgdb_arch_update_addr(linux_regs, remcom_in_buffer);
atomic_set(&kgdb_cpu_doing_single_step, raw_smp_processor_id());
- kgdb_single_step = 1;
-
/*
* Enable single step handling
*/
@@ -229,6 +230,18 @@ static int kgdb_compiled_brk_fn(struct pt_regs *regs, unsigned int esr)
static int kgdb_step_brk_fn(struct pt_regs *regs, unsigned int esr)
{
+ unsigned int pstate;
+
+ kernel_disable_single_step();
+ atomic_set(&kgdb_cpu_doing_single_step, -1);
+
+ /* restore interrupt mask status */
+ pstate = __this_cpu_read(kgdb_pstate);
+ if (pstate & PSR_I_BIT)
+ regs->pstate |= PSR_I_BIT;
+ else
+ regs->pstate &= ~PSR_I_BIT;
+
kgdb_handle_exception(1, SIGTRAP, 0, regs);
return 0;
}
@@ -249,16 +262,27 @@ static struct step_hook kgdb_step_hook = {
.fn = kgdb_step_brk_fn
};
-static void kgdb_call_nmi_hook(void *ignored)
+static void kgdb_roundup_hook(struct irq_work *work)
{
kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs());
}
void kgdb_roundup_cpus(unsigned long flags)
{
- local_irq_enable();
- smp_call_function(kgdb_call_nmi_hook, NULL, 0);
- local_irq_disable();
+ int cpu;
+ struct cpumask mask;
+ struct irq_work *work;
+
+ mask = *cpu_online_mask;
+ cpumask_clear_cpu(smp_processor_id(), &mask);
+ cpu = cpumask_first(&mask);
+ if (cpu >= nr_cpu_ids)
+ return;
+
+ for_each_cpu(cpu, &mask) {
+ work = per_cpu_ptr(&kgdb_irq_work, cpu);
+ irq_work_queue_on(work, cpu);
+ }
}
static int __kgdb_notify(struct die_args *args, unsigned long cmd)
@@ -299,6 +323,8 @@ static struct notifier_block kgdb_notifier = {
int kgdb_arch_init(void)
{
int ret = register_die_notifier(&kgdb_notifier);
+ int cpu;
+ struct irq_work *work;
if (ret != 0)
return ret;
@@ -306,6 +332,12 @@ int kgdb_arch_init(void)
register_break_hook(&kgdb_brkpt_hook);
register_break_hook(&kgdb_compiled_brkpt_hook);
register_step_hook(&kgdb_step_hook);
+
+ for_each_possible_cpu(cpu) {
+ work = per_cpu_ptr(&kgdb_irq_work, cpu);
+ init_irq_work(work, kgdb_roundup_hook);
+ }
+
return 0;
}
--
1.7.9.5
This patchset modifies the GIC driver to allow it, on supported
platforms, to route IPI interrupts to FIQ. It then uses this
feature to implement arch_trigger_all_cpu_backtrace for arm.
In order to neatly deliver the changes for arm, we also
rearrange some of the existing x86 NMI code to make it architecture
neutral.
The patches have been runtime tested on both a system capable of
supporting FIQ (Freescale i.MX6) and one that cannot (Qualcomm
Snapdragon 600). In addition, older versions of this patchset
have been tested on STiH416 and vexpress-a9. The changes to the x86
logic were tested using qemu.
v21:
* Change the way SGIs are raised to try to increase robustness starting
secondary cores. This is a theoretical fix for a regression reported
by Mark Rutland on vexpress-tc2 but it also allows us to remove
igroup0_shadow entirely since it is no longer needed.
* Fix a couple of variable names and add comments to describe the
hardware behavior better (Mark Rutland).
* Improved MULTI_IRQ_HANDLER support by clearing FIQs using
handle_arch_irq (Marc Zyngier).
* Fix gic_cpu_if_down() to ensure group 1 interrupts are disabled
when the interface is brought down.
For changes in v20 and earlier see:
http://thread.gmane.org/gmane.linux.kernel/1928465
Daniel Thompson (6):
irqchip: gic: Optimize locking in gic_raise_softirq
irqchip: gic: Make gic_raise_softirq FIQ-safe
irqchip: gic: Introduce plumbing for IPI FIQ
printk: Simple implementation for NMI backtracing
x86/nmi: Use common printk functions
ARM: Add support for on-demand backtrace of other CPUs
arch/arm/Kconfig | 1 +
arch/arm/include/asm/hardirq.h | 2 +-
arch/arm/include/asm/irq.h | 5 +
arch/arm/include/asm/smp.h | 3 +
arch/arm/kernel/smp.c | 82 +++++++++++++++
arch/arm/kernel/traps.c | 13 ++-
arch/x86/Kconfig | 1 +
arch/x86/kernel/apic/hw_nmi.c | 104 ++-----------------
drivers/irqchip/irq-gic.c | 220 +++++++++++++++++++++++++++++++++++++---
include/linux/irqchip/arm-gic.h | 6 ++
include/linux/printk.h | 20 ++++
init/Kconfig | 3 +
kernel/printk/Makefile | 1 +
kernel/printk/nmi_backtrace.c | 147 +++++++++++++++++++++++++++
14 files changed, 495 insertions(+), 113 deletions(-)
create mode 100644 kernel/printk/nmi_backtrace.c
--
2.4.3
MT8173 is an ARMv8-based SoC with 2 clusters. All CPUs in a single cluster
share the same power and clock domain. This series adds cpufreq support
for the MT8173 SoC. This v6 of the series is resent with Acks added.
changes in v6:
- Move clock and regulator consumer properties document to the device tree
bindings documents of MT8173 CPU DVFS clock driver
- Add change log to describe what is implemented in the MT8173 cpufreq driver
- Add missing rcu_read_unlock() in the error path
- Move of_init_opp_table() call to make sure all required hardware resources
are already there before it is called
- Add comments to describe why both the platform driver and device registration
code are put in the initcall function
- Use the term "voltage tracking" instead of "voltage trace" according to an
internal SoC document
changes in v5:
- Move resource allocation code from init() into probe() and remove some unused
functions due to this change
- Fix descriptions for device tree binding document
- Address review comments for last version
- Register CPU cooling device
Changes in v4:
- Add bindings for MT8173 cpufreq driver
- Move OPP table back into device tree
- Address comments for last version
Changes in v3:
- Implement MT8173 specific standalone cpufreq driver instead of using
cpufreq-dt driver
- Define OPP table in the driver source code until new OPP binding is ready
Changes in v2:
- Add intermediate frequency support in cpufreq-dt driver
- Use the voltage scaling code of cpufreq-dt for the little cluster instead of the
implementation in the notifier of the mtk-cpufreq driver
- Code refinement for mtk-cpufreq driver
Pi-Cheng Chen (3):
dt-bindings: mediatek: Add MT8173 CPU DVFS clock bindings
cpufreq: mediatek: Add MT8173 cpufreq driver
arm64: dts: mt8173: Add mt8173 cpufreq driver support
.../devicetree/bindings/clock/mt8173-cpu-dvfs.txt | 83 ++++
arch/arm64/boot/dts/mediatek/mt8173-evb.dts | 18 +
arch/arm64/boot/dts/mediatek/mt8173.dtsi | 64 +++
drivers/cpufreq/Kconfig.arm | 7 +
drivers/cpufreq/Makefile | 1 +
drivers/cpufreq/mt8173-cpufreq.c | 524 +++++++++++++++++++++
6 files changed, 697 insertions(+)
create mode 100644 Documentation/devicetree/bindings/clock/mt8173-cpu-dvfs.txt
create mode 100644 drivers/cpufreq/mt8173-cpufreq.c
--
1.9.1
Hi Rafael,
I was looking at the cpufreq core this morning for some work and came across these
minor fixes along the way. Please see if they look good.
--
viresh
Viresh Kumar (7):
cpufreq: remove redundant CPUFREQ_INCOMPATIBLE notifier event
cpufreq: use memcpy() to copy policy
cpufreq: update user_policy.* on success
cpufreq: remove redundant 'governor' field from user_policy
cpufreq: remove redundant 'policy' field from user_policy
cpufreq: rename cpufreq_real_policy as cpufreq_user_policy
cpufreq: drop !cpufreq_driver check from cpufreq_parse_governor()
Documentation/cpu-freq/core.txt | 7 ++-----
drivers/acpi/processor_perflib.c | 2 +-
drivers/cpufreq/cpufreq.c | 31 +++----------------------------
drivers/cpufreq/ppc_cbe_cpufreq_pmi.c | 4 ++--
drivers/video/fbdev/pxafb.c | 1 -
drivers/video/fbdev/sa1100fb.c | 1 -
include/linux/cpufreq.h | 15 ++++++---------
7 files changed, 14 insertions(+), 47 deletions(-)
--
2.4.0
Hi Guys,
This is rebased over the following series that adds debugfs support to the OPP
core: http://marc.info/?i=cover.1441354424.git.viresh.kumar%40linaro.org
This series extends the V2 bindings support further to make it usable by
most platforms.
[1-2] update the bindings a bit to get them working for the multiple
regulators case.
[3-4] cleanups.
[5-7] Multiple regulator support
[8-16] OPP transition support, so that the user drivers can directly ask
to switch device to a particular OPP, instead of them dealing
with the complexity of handling clocks and voltages.
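The ordering that OPP transition support takes off the drivers' hands can be sketched as follows. This is a simplified model of the usual DVFS rule (raise the voltage before increasing the frequency, lower it only after decreasing the frequency), not the code of this series; set_voltage_uv(), set_clk_rate_hz() and opp_transition() are hypothetical names, and a real implementation would also handle errors and voltage tolerances.

```c
/* Hypothetical log of hardware operations, in the order they are issued. */
enum op { SET_VOLT, SET_CLK };
static enum op ops[4];
static int nr_ops;

/* Hypothetical stand-ins for the regulator and clock calls. */
static void set_voltage_uv(unsigned long uv)
{
	(void)uv;
	ops[nr_ops++] = SET_VOLT;
}

static void set_clk_rate_hz(unsigned long hz)
{
	(void)hz;
	ops[nr_ops++] = SET_CLK;
}

/* Switch from old_hz to new_hz at new_uv, so that the clock never runs
 * faster than the current voltage allows during the transition. */
static void opp_transition(unsigned long old_hz, unsigned long new_hz,
			   unsigned long new_uv)
{
	nr_ops = 0;
	if (new_hz > old_hz) {
		set_voltage_uv(new_uv);		/* raise voltage first */
		set_clk_rate_hz(new_hz);
	} else {
		set_clk_rate_hz(new_hz);	/* slow down first */
		set_voltage_uv(new_uv);
	}
}
```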
I have also got the cpufreq-dt driver updated to work with the new bindings,
but held off those changes to keep this series smaller. Those are
another nine patches.
For curious developers/reviewers, all required code (debugfs, this and
cpufreq-dt) is pushed to:
https://git.linaro.org/people/viresh.kumar/linux.git opp/multi-regulator-v1
Please help in getting this reviewed :)
Viresh Kumar (16):
PM / OPP: Add 'supply-names' binding
PM / OPP: Add 'opp-microvolt-triplets' binding
PM / OPP: Improve debug print messages with pr_fmt
PM / OPP: Rename routines specific to old bindings with _v1
PM / OPP: Parse all power-supply related bindings together
PM / OPP: Create separate structure for regulator/supplies
PM / OPP: Add multiple regulators support
PM / OPP: get/put regulators from OPP core
PM / OPP: Disable OPPs that aren't supported by the regulators
PM / OPP: Introduce dev_pm_opp_get_max_volt_latency()
PM / OPP: Introduce dev_pm_opp_get_max_transition_latency()
PM / OPP: Parse clock and voltage tolerance for v1 bindings
PM / OPP: Manage device clk as well
PM / OPP: Add dev_pm_opp_set_regulator() to specify regulator
PM / OPP: Add dev_pm_opp_set_rate()
PM / OPP: don't print error message for deferred probing
Documentation/devicetree/bindings/opp/opp.txt | 40 +-
drivers/base/power/opp/core.c | 637 +++++++++++++++++++++++---
drivers/base/power/opp/cpu.c | 8 +-
drivers/base/power/opp/debugfs.c | 52 ++-
drivers/base/power/opp/opp.h | 44 +-
include/linux/pm_opp.h | 25 +
6 files changed, 722 insertions(+), 84 deletions(-)
--
2.4.0
The current implementation of load tracking invariance scales the load
tracking value with current frequency and uarch performance (only for
utilization) of the CPU.
One main result of the current formula is that the figures are capped by
the current capacity of the CPU. This limitation is the main reason for not
including the uarch invariance (arch_scale_cpu_capacity) in the calculation
of load_avg, because capping the load can generate erroneous system load
statistics, as described in this example [1].
Instead of scaling the complete value of the PELT algorithm, we should only scale
the running time by the current capacity of the CPU. It seems more correct
to scale only the running time because the non-running time of a task
(sleeping or waiting for a runqueue) is the same whatever the current freq
and the compute capacity of the CPU.
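The idea of scaling only the running time can be sketched in a few lines of C. The cap_scale() macro and SCHED_CAPACITY_SHIFT follow the definitions used in kernel/sched/fair.c; pelt_delta() is a hypothetical helper that mirrors the "if (running)" block added by the patch below, with the two capacities passed in as plain parameters instead of being read from arch_scale_freq_capacity() and arch_scale_cpu_capacity().

```c
#include <stdint.h>

#define SCHED_CAPACITY_SHIFT	10

/* Scale v by capacity s, where 1 << SCHED_CAPACITY_SHIFT (1024) is 100%. */
#define cap_scale(v, s)	((v) * (s) >> SCHED_CAPACITY_SHIFT)

/* Hypothetical helper: only running time is scaled; sleeping or
 * waiting time is accounted unchanged. */
static uint64_t pelt_delta(uint64_t delta, int running,
			   unsigned long freq_cap, unsigned long cpu_cap)
{
	if (running) {
		delta = cap_scale(delta, freq_cap);
		delta = cap_scale(delta, cpu_cap);
	}
	return delta;
}
```

For example, 1024 us of running time at half frequency on a half-capacity CPU is accounted as 256 us, while 1024 us spent sleeping is accounted unchanged.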
Then, one main advantage of this change is that the load of a task can
reach its max value whatever the current freq and the uarch of the CPU on which
it runs. It will just take more time at a lower freq than at max freq, or on a
"little" CPU compared to a "big" one. The load and the utilization stay
invariant across the system, so we can still compare them between CPUs, but with
a wider range of values.
With this change, we don't have to test if a CPU is overloaded or not in
order to use one metric (util) or another (load) as all metrics are always
valid.
Below are some examples of the duration needed to reach some typical load values,
according to the capacity of the CPU, with the current implementation
and with this patch.
Util          max capacity   half capacity (mainline)   half capacity (w/ patch)
972 (95%)     138ms          not reachable              276ms
486 (47.5%)   30ms           138ms                      60ms
256 (25%)     13ms           32ms                       26ms
We can see that at half capacity, we need twice the duration of max
capacity with this patch, whereas the duration increases non-linearly
with the current implementation.
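The figures in the table can be approximated with a small floating-point model of PELT. This sketch assumes a task that runs continuously from zero utilization, a 32 ms half-life (y^32 = 0.5) and 1 ms periods instead of the kernel's 1024 us fixed-point arithmetic; half_ms_decay() and ms_to_reach() are hypothetical helpers for illustration only. Because it reports whole milliseconds rounded up, it reproduces the 95% and 47.5% rows, but gives 14 ms and 27 ms for the 25% row instead of 13 ms and 26 ms.

```c
/* Decay z per 0.5 ms step, chosen so that z^64 = 0.5: this is the
 * 32 ms PELT half-life (y^32 = 0.5 with y = z^2). Found by bisection. */
static double half_ms_decay(void)
{
	double lo = 0.9, hi = 1.0;

	for (int i = 0; i < 100; i++) {
		double mid = (lo + hi) / 2, p = mid;

		for (int k = 0; k < 6; k++)	/* p = mid^64 */
			p *= p;
		if (p > 0.5)
			hi = mid;
		else
			lo = mid;
	}
	return (lo + hi) / 2;
}

enum model { FULL_CAP, HALF_CAP_MAINLINE, HALF_CAP_PATCHED };

/* Real milliseconds of continuous running for util (0..1024) to reach
 * target, starting from 0; -1 if it is not reached within 10 s. */
static int ms_to_reach(double target, enum model m)
{
	const double z = half_ms_decay();
	/* Mainline: contributions are scaled, so util tends to 512. */
	const double limit = (m == HALF_CAP_MAINLINE) ? 512.0 : 1024.0;
	/* Patched: running time itself is scaled, so half capacity
	 * advances PELT time by 0.5 ms per real ms instead of 1 ms. */
	const int steps = (m == HALF_CAP_PATCHED) ? 1 : 2;
	double util = 0.0;

	for (int ms = 1; ms <= 10000; ms++) {
		for (int s = 0; s < steps; s++)
			util = util * z + limit * (1.0 - z);
		if (util >= target)
			return ms;
	}
	return -1;
}
```

In this model the patched half-capacity case simply runs PELT time at half speed, which is why its durations are about twice the max-capacity ones, while the mainline case converges to 512 and never reaches 972.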
[1] https://lkml.org/lkml/2014/12/18/128
Signed-off-by: Vincent Guittot <vincent.guittot(a)linaro.org>
---
kernel/sched/fair.c | 28 +++++++++++++---------------
1 file changed, 13 insertions(+), 15 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 824aa9f..f2a18e1 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2560,10 +2560,9 @@ static __always_inline int
__update_load_avg(u64 now, int cpu, struct sched_avg *sa,
unsigned long weight, int running, struct cfs_rq *cfs_rq)
{
- u64 delta, scaled_delta, periods;
+ u64 delta, periods;
u32 contrib;
- unsigned int delta_w, scaled_delta_w, decayed = 0;
- unsigned long scale_freq, scale_cpu;
+ unsigned int delta_w, decayed = 0;
delta = now - sa->last_update_time;
/*
@@ -2584,8 +2583,10 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
return 0;
sa->last_update_time = now;
- scale_freq = arch_scale_freq_capacity(NULL, cpu);
- scale_cpu = arch_scale_cpu_capacity(NULL, cpu);
+ if (running) {
+ delta = cap_scale(delta, arch_scale_freq_capacity(NULL, cpu));
+ delta = cap_scale(delta, arch_scale_cpu_capacity(NULL, cpu));
+ }
/* delta_w is the amount already accumulated against our next period */
delta_w = sa->period_contrib;
@@ -2601,16 +2602,15 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
* period and accrue it.
*/
delta_w = 1024 - delta_w;
- scaled_delta_w = cap_scale(delta_w, scale_freq);
if (weight) {
- sa->load_sum += weight * scaled_delta_w;
+ sa->load_sum += weight * delta_w;
if (cfs_rq) {
cfs_rq->runnable_load_sum +=
- weight * scaled_delta_w;
+ weight * delta_w;
}
}
if (running)
- sa->util_sum += scaled_delta_w * scale_cpu;
+ sa->util_sum += delta_w << SCHED_CAPACITY_SHIFT;
delta -= delta_w;
@@ -2627,25 +2627,23 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
/* Efficiently calculate \sum (1..n_period) 1024*y^i */
contrib = __compute_runnable_contrib(periods);
- contrib = cap_scale(contrib, scale_freq);
if (weight) {
sa->load_sum += weight * contrib;
if (cfs_rq)
cfs_rq->runnable_load_sum += weight * contrib;
}
if (running)
- sa->util_sum += contrib * scale_cpu;
+ sa->util_sum += contrib << SCHED_CAPACITY_SHIFT;
}
/* Remainder of delta accrued against u_0` */
- scaled_delta = cap_scale(delta, scale_freq);
if (weight) {
- sa->load_sum += weight * scaled_delta;
+ sa->load_sum += weight * delta;
if (cfs_rq)
- cfs_rq->runnable_load_sum += weight * scaled_delta;
+ cfs_rq->runnable_load_sum += weight * delta;
}
if (running)
- sa->util_sum += scaled_delta * scale_cpu;
+ sa->util_sum += delta << SCHED_CAPACITY_SHIFT;
sa->period_contrib += delta;
--
1.9.1
Hi Rafael,
Rob only needs to Ack the modified 2/5 patch and then you can safely
apply this series.
The first patch enables us to select only a subset of OPPs from the
bigger table, based on what version of the hardware we are running on.
The second one enables us to select slightly different values for
multiple properties, based on what kind of hardware they are running on.
The third one removes an (unused) binding, which is replaced by the
second patch with a better solution.
The fourth patch is based on what Stephen suggested (and then reviewed)
in the earlier series, and the 5th one updates the existing users of
these bindings for it.
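The selection done by the first patch can be sketched as a per-level bitmask test. This assumes the semantics of the "opp-supported-hw" binding where each 32-bit cell is a bitmap of supported versions for one level of the hardware version hierarchy, and an OPP is kept only if every level matches the running hardware. opp_is_supported() is a hypothetical helper, not the OPP core's API, and how the hardware version bitmaps are obtained is left to the platform.

```c
#include <stdbool.h>
#include <stdint.h>

/* An OPP is usable only if, at every level of the version hierarchy,
 * its bitmap shares at least one bit with the running hardware's. */
static bool opp_is_supported(const uint32_t *opp_hw, const uint32_t *hw,
			     int levels)
{
	for (int i = 0; i < levels; i++)
		if (!(opp_hw[i] & hw[i]))
			return false;
	return true;
}
```

For example, with hardware versions { 0x2, 0x1 }, an OPP marked { 0xFFFFFFFF, 0x1 } is kept, while one marked { 0x1, 0xFFFFFFFF } is dropped because the first level does not match.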
V2->V3:
- dropped turbo/suspend named properties
- Applied all the Acks
Viresh Kumar (5):
PM / OPP: Add "opp-supported-hw" binding
PM / OPP: Add {opp-microvolt|opp-microamp}-<name> binding
PM / OPP: Remove 'operating-points-names' binding
PM / OPP: Rename OPP nodes as opp@<opp-hz>
ARM: dts: exynos4412: Rename OPP nodes as opp@<opp-hz>
Documentation/devicetree/bindings/opp/opp.txt | 132 ++++++++++++++++++--------
arch/arm/boot/dts/exynos4412.dtsi | 28 +++---
2 files changed, 107 insertions(+), 53 deletions(-)
--
2.6.2.198.g614a2ac