Jason/Thomas:
This would be a resend except Steven Rostedt noticed a redundant
memory barrier I had copied from the x86 code. The redundant barrier
is now removed and there are no other changes since the code was posted
a fortnight ago. Any chance of taking the first five of these patches
via the irqchip route? The x86 patch has an ack from Ingo, printk has no
explicit maintainer and I've done plenty of bisectability tests on the
patchset so leaving the last patch for the next dev. cycle should be
no trouble.
This patchset modifies the GIC driver to allow it, on supported
platforms, to route IPI interrupts to FIQ. It then uses this
feature to implement arch_trigger_all_cpu_backtrace for arm.
In order to neatly bring in the changes for the arm we also rearrange
some of the existing x86 NMI code to make it architecture neutral.
The patchset http://thread.gmane.org/gmane.linux.kernel/1897765, which
makes sched_clock() NMI/FIQ-safe, should be treated as a prerequisite
for the sixth and final patch in the series (which enables the feature
on ARM). Although sched_clock() is not called directly by any of the
code that runs from a FIQ handler it is possible for sched_clock() to be
called indirectly when the function tracer is enabled.
The patches have been runtime tested on two systems capable of
supporting FIQ (Freescale i.MX6 and STiH416) and two that do not
(vexpress-a9 and Qualcomm Snapdragon 600), the changes to the x86
logic were tested on qemu and all patches have been compile tested
on x86, arm and arm64.
Note: On platforms not capable of supporting FIQ, the IPI to generate a
backtrace will fall back to using IRQ for propagation instead.
The backtrace logic contains a timeout so that we do not permanently
wedge the requesting CPU if other CPUs are unresponsive.
v19:
* Remove redundant memory barrier inherited from the x86 code (Steven
Rostedt).
v18:
* Move printk_nmi_ functions out of printk.c and into their own
file, nmi_callback.c (Joe Perches/Steven Rostedt).
* Rename printk_nmi_ functions so their name matches their new home
(Joe Perches)
v17:
* Rename bl_migration_lock/unlock to gic_migration_lock/unlock
(Nicolas Pitre).
v16:
* Significant clean up of the printk patches (Thomas Gleixner).
Replacing macros with real functions, CONFIG_ARCH_WANT_NMI_PRINTK
-> CONFIG_PRINTK_NMI, prefixing global functions with printk_nmi,
removing pointless exports, removing cpu_mask from the interfaces,
removal of just-in-time initialization of trace buffers, prevented
call sites having to save state, rolled up variable declarations
into single lines.
* Dropped the sched_clock() patches from *this* patchset and managed
them separately (http://thread.gmane.org/gmane.linux.kernel/1879261 ).
The cross-dependencies between the patches are minimal; the backtrace
code only calls sched_clock() if we are ftracing, and backtracing is
normally only triggered to report information about a broken
system (although users can type SysRq-l for amusement, most use it
to find out why the system is dead).
* Squashed together the final two patches. Essentially these duplicated
the x86 code and slavishly avoided changing it before, in the next
patch, fixing it to work better on ARM. It seems better that the code
just works first time!
v15:
* Added a patch to make sched_clock safe to call from NMI (Stephen
Boyd). Note that sched_clock() is not called by the NMI handlers that
have been added for the arm but it could be called if tools such as
ftrace are deployed.
* Fixed some warnings picked up during bisectability testing.
v14:
* Moved nmi_vprintk() and friends from arch/x86/kernel/apic/hw_nmi.c
to printk.c (Steven Rostedt)
v13:
* Updated the code to print the backtrace to replicate Steven Rostedt's
x86 work to make SysRq-l safe. This is pretty much a total rewrite of
patches 4 and 5.
v12:
* Squash first two patches into a single one and re-describe
(Thomas Gleixner).
* Improve description of "irqchip: gic: Make gic_raise_softirq FIQ-safe"
(Thomas Gleixner).
v11:
* Optimized gic_raise_softirq() by replacing a register read with
a memory read (Jason Cooper).
v10:
* Add a further patch to optimize away some of the locking on systems
where CONFIG_BL_SWITCHER is not set (Marc Zyngier). Compiles OK with
exynos_defconfig (which is the only defconfig to set this option).
* Whitespace fixes in patch 4. That patch previously used spaces for
alignment of new constants but the rest of the file used tabs.
v9:
* Improved documentation and structure of initial patch (now initial
two patches) to make gic_raise_softirq() safe to call from FIQ
(Thomas Gleixner).
* Avoid masking interrupts during gic_raise_softirq(). The use of the
read lock makes this redundant (because we can safely re-enter the
function).
v8:
* Fixed build on arm64 caused by a spurious include file in irq-gic.c.
v7-2 (accidentally released twice with same number):
* Fixed boot regression on vexpress-a9 (reported by Russell King).
* Rebased on v3.18-rc3; removed one patch from set that is already
included in mainline.
* Dropped arm64/fiq.h patch from the set (still useful but not related
to issuing backtraces).
v7:
* Re-arranged code within the patch series to fix a regression
introduced midway through the series and corrected by a later patch
(testing by Olof's autobuilder). Tested offending patch in isolation
using defconfig identified by the autobuilder.
v6:
* Renamed svc_entry's call_trace argument to just trace (example code
from Russell King).
* Fixed mismatched ENDPROC() in __fiq_abt (example code from Russell
King).
* Modified usr_entry to optionally avoid calling into the trace code and
used this in FIQ entry from usr path. Modified corresponding exit code
to avoid calling into trace code and the scheduler (example code from
Russell King).
* Ensured the default FIQ register state is restored when the default
FIQ handler is reinstalled (example code from Russell King).
* Renamed no_fiq_insn to dfl_fiq_insn to reflect the effect of adopting
a default FIQ handler.
* Re-instated fiq_safe_migration_lock and associated logic in
gic_raise_softirq(). gic_raise_softirq() is called by wake_up_klogd()
in the console unlock logic.
v5:
* Rebased on 3.17-rc4.
* Removed a spurious line from the final "glue it together" patch
that broke the build.
v4:
* Replaced push/pop with stmfd/ldmfd respectively (review of Nicolas
Pitre).
* Really fix bad pt_regs pointer generation in __fiq_abt.
* Remove fiq_safe_migration_lock and associated logic in
gic_raise_softirq() (review of Russell King)
* Restructured to introduce the default FIQ handler first, before the
new features (review of Russell King).
v3:
* Removed redundant header guards from arch/arm64/include/asm/fiq.h
(review of Catalin Marinas).
* Moved svc_exit_via_fiq macro to entry-header.S (review of Nicolas
Pitre).
v2:
* Restructured to sit nicely on a similar FYI patchset from Russell
King. It now effectively replaces the work in progress final patch
with something much more complete.
* Implemented (and tested) a Thumb-2 implementation of svc_exit_via_fiq
(review of Nicolas Pitre)
* Dropped the GIC group 0 workaround patch. The issue of FIQ interrupts
being acknowledged by the IRQ handler does still exist but should be
harmless because the IRQ handler will still wind up calling
ipi_cpu_backtrace().
* Removed any dependency on CONFIG_FIQ; all cpu backtrace effectively
becomes a platform feature (although the use of non-maskable
interrupts to implement it is best effort rather than guaranteed).
* Better comments highlighting usage of RAZ/WI registers (and parts of
registers) in the GIC code.
Changes *before* v1:
* This patchset is a hugely cut-down successor to "[PATCH v11 00/19]
arm: KGDB NMI/FIQ support". Thanks to Thomas Gleixner for suggesting
the new structure. For historic details see:
https://lkml.org/lkml/2014/9/2/227
* Fix bug in __fiq_abt (no longer passes a bad struct pt_regs value).
In fixing this we also remove the useless indirection previously
found in the fiq_handler macro.
* Make default fiq handler "always on" by migrating from fiq.c to
traps.c and replace do_unexp_fiq with the new handler (review
of Russell King).
* Add arm64 version of fiq.h (review of Russell King)
* Removed conditional branching and code from irq-gic.c, this is
replaced by much simpler code that relies on the GIC specification's
heavy use of read-as-zero/write-ignored (review of Russell King)
Daniel Thompson (6):
irqchip: gic: Optimize locking in gic_raise_softirq
irqchip: gic: Make gic_raise_softirq FIQ-safe
irqchip: gic: Introduce plumbing for IPI FIQ
printk: Simple implementation for NMI backtracing
x86/nmi: Use common printk functions
ARM: Add support for on-demand backtrace of other CPUs
arch/arm/Kconfig | 1 +
arch/arm/include/asm/hardirq.h | 2 +-
arch/arm/include/asm/irq.h | 5 +
arch/arm/include/asm/smp.h | 3 +
arch/arm/kernel/smp.c | 81 ++++++++++++++++
arch/arm/kernel/traps.c | 8 +-
arch/x86/Kconfig | 1 +
arch/x86/kernel/apic/hw_nmi.c | 101 ++------------------
drivers/irqchip/irq-gic.c | 203 +++++++++++++++++++++++++++++++++++++---
include/linux/irqchip/arm-gic.h | 8 ++
include/linux/printk.h | 20 ++++
init/Kconfig | 3 +
kernel/printk/Makefile | 1 +
kernel/printk/nmi_backtrace.c | 147 +++++++++++++++++++++++++++++
14 files changed, 473 insertions(+), 111 deletions(-)
create mode 100644 kernel/printk/nmi_backtrace.c
--
2.1.0
From: Kevin Hilman <khilman(a)linaro.org>
Using the current exynos_defconfig on the exynos5422-odroid-xu3, only
6 of 8 CPUs come online with MCPM boot. CPU0 is an A7, CPUs 1-4 are
A15s and CPUs 5-7 are the other A7s, but with the current code, CPUs 5
and 7 do not boot:
[...]
Exynos MCPM support installed
CPU1: update cpu_capacity 1535
CPU1: thread -1, cpu 0, socket 0, mpidr 80000000
CPU2: update cpu_capacity 1535
CPU2: thread -1, cpu 1, socket 0, mpidr 80000001
CPU3: update cpu_capacity 1535
CPU3: thread -1, cpu 2, socket 0, mpidr 80000002
CPU4: update cpu_capacity 1535
CPU4: thread -1, cpu 3, socket 0, mpidr 80000003
CPU5: failed to come online
CPU6: update cpu_capacity 448
CPU6: thread -1, cpu 2, socket 1, mpidr 80000102
CPU7: failed to come online
Brought up 6 CPUs
CPU: WARNING: CPU(s) started in wrong/inconsistent modes
(primary CPU mode 0x13)
CPU: This may indicate a broken bootloader or firmware.
Thanks to a tip from Abhilash, this patch gets all 8 CPUs booting
again, but the warning about CPUs started in inconsistent modes
remains. Also, not being terribly familiar with Exynos internals,
it's not at all obvious to me why this register write (done for *all*
secondaries) makes things work for the 2 secondary CPUs that
didn't come online. It's also not obvious whether this is the right
general fix, since it doesn't seem to be needed on other 542x or 5800
platforms.
I suspect the "right" fix is in the bootloader someplace, but not
knowing this hardware well, I'm not sure if the fix is in u-boot
proper, or somewhere in the binary blobs (bl1/bl2/tz) that start
before u-boot. The u-boot I'm using is from the hardkernel u-boot
repo[1], and I'd welcome any suggestions to try. I'm able to rebuild
my own u-boot from there, but only have binaries for bl1/bl2/tz.
[1] branch "odroidxu3-v2012.07" of: https://github.com/hardkernel/u-boot.git
Cc: Mauro Ribeiro <mauro.ribeiro(a)hardkernel.com>
Cc: Abhilash Kesavan <a.kesavan(a)samsung.com>
Cc: Andrew Bresticker <abrestic(a)chromium.org>
Cc: Doug Anderson <dianders(a)chromium.org>
Cc: Nicolas Pitre <nicolas.pitre(a)linaro.org>
Signed-off-by: Kevin Hilman <khilman(a)linaro.org>
---
arch/arm/mach-exynos/mcpm-exynos.c | 2 ++
arch/arm/mach-exynos/regs-pmu.h | 1 +
2 files changed, 3 insertions(+)
diff --git a/arch/arm/mach-exynos/mcpm-exynos.c b/arch/arm/mach-exynos/mcpm-exynos.c
index b0d3c2e876fb..612a770d5284 100644
--- a/arch/arm/mach-exynos/mcpm-exynos.c
+++ b/arch/arm/mach-exynos/mcpm-exynos.c
@@ -88,6 +88,8 @@ static int exynos_power_up(unsigned int cpu, unsigned int cluster)
cluster >= EXYNOS5420_NR_CLUSTERS)
return -EINVAL;
+ pmu_raw_writel(0x1, S5P_PMU_SPARE2);
+
/*
* Since this is called with IRQs enabled, and no arch_spin_lock_irq
* variant exists, we need to disable IRQs manually here.
diff --git a/arch/arm/mach-exynos/regs-pmu.h b/arch/arm/mach-exynos/regs-pmu.h
index b5f4406fc1b5..70d9eb5a4fcc 100644
--- a/arch/arm/mach-exynos/regs-pmu.h
+++ b/arch/arm/mach-exynos/regs-pmu.h
@@ -49,6 +49,7 @@
#define S5P_INFORM5 0x0814
#define S5P_INFORM6 0x0818
#define S5P_INFORM7 0x081C
+#define S5P_PMU_SPARE2 0x0908
#define S5P_PMU_SPARE3 0x090C
#define EXYNOS_IROM_DATA2 0x0988
--
2.1.3
From: Rob Clark <robdclark(a)gmail.com>
For devices which have constraints about maximum number of segments in
an sglist. For example, a device which could only deal with contiguous
buffers would set max_segment_count to 1.
The initial motivation is for devices sharing buffers via dma-buf,
to allow the buffer exporter to know the constraints of other
devices which have attached to the buffer. The dma_mask and fields
in 'struct device_dma_parameters' tell the exporter everything else
that is needed, except whether the importer has constraints about
maximum number of segments.
Signed-off-by: Rob Clark <robdclark(a)gmail.com>
[sumits: Minor updates wrt comments]
Signed-off-by: Sumit Semwal <sumit.semwal(a)linaro.org>
---
v3: include Robin Murphy's fix[1] for handling '0' as a value for
max_segment_count
v2: minor updates wrt comments on the first version
[1]: http://article.gmane.org/gmane.linux.kernel.iommu/8175/
include/linux/device.h | 1 +
include/linux/dma-mapping.h | 19 +++++++++++++++++++
2 files changed, 20 insertions(+)
diff --git a/include/linux/device.h b/include/linux/device.h
index fb506738f7b7..a32f9b67315c 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -647,6 +647,7 @@ struct device_dma_parameters {
* sg limitations.
*/
unsigned int max_segment_size;
+ unsigned int max_segment_count; /* INT_MAX for unlimited */
unsigned long segment_boundary_mask;
};
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index c3007cb4bfa6..d3351a36d5ec 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -154,6 +154,25 @@ static inline unsigned int dma_set_max_seg_size(struct device *dev,
return -EIO;
}
+#define DMA_SEGMENTS_MAX_SEG_COUNT ((unsigned int) INT_MAX)
+
+static inline unsigned int dma_get_max_seg_count(struct device *dev)
+{
+ if (dev->dma_parms && dev->dma_parms->max_segment_count)
+ return dev->dma_parms->max_segment_count;
+ return DMA_SEGMENTS_MAX_SEG_COUNT;
+}
+
+static inline int dma_set_max_seg_count(struct device *dev,
+ unsigned int count)
+{
+ if (dev->dma_parms) {
+ dev->dma_parms->max_segment_count = count;
+ return 0;
+ }
+ return -EIO;
+}
+
static inline unsigned long dma_get_seg_boundary(struct device *dev)
{
return dev->dma_parms ?
--
1.9.1
Hi Rafael,
The aim of this series is to stop managing cpufreq sysfs directories on CPU
hotplugs.
Currently, on removal of a CPU where 'cpu != policy->cpu', we remove its sysfs directories
by removing the soft-link. And on removal of policy->cpu, we migrate the sysfs
directories to the next cpu. But if policy->cpu was the last CPU, we remove the
policy completely and allocate it again as soon as the CPUs come back. This has
shortcomings:
- Code Complexity
- Slower hotplug
- sysfs file permissions are reset after all policy->cpus are offlined
- CPUFreq stats history lost after all policy->cpus are offlined
- Special management of sysfs stuff during suspend/resume
To make things simple we stop playing with sysfs files unless the driver is
getting removed. Also the policy is kept intact to be used later.
The first few patches provide a clean base for the other, *more important*, patches.
Rebased-over: your bleeding edge branch as there were dependencies on my earlier
patches.
Pushed here:
git://git.linaro.org/people/viresh.kumar/linux.git cpufreq/core/sysfs
v1->V2:
- Dropped the idea of using policy-lists for getting policy for any cpu
- Also dropped fallback list and its per-cpu variable
- Stopped cleaning cpufreq_cpu_data and doing list_del(policy) on logical
hotplug.
- Added support for physical hotplug of CPUs (Untested).
@Srivatsa: Can you please have a look at the above change? I have cc'd you only
on this one.
Saravana Kannan (1):
cpufreq: Track cpu managing sysfs kobjects separately
Cc: Srivatsa Bhat <srivatsa(a)mit.edu>
Viresh Kumar (19):
cpufreq: Add doc style comment about cpufreq_cpu_{get|put}()
cpufreq: Merge __cpufreq_add_dev() and cpufreq_add_dev()
cpufreq: Throw warning when we try to get policy for an invalid CPU
cpufreq: Keep a single path for adding managed CPUs
cpufreq: Clear policy->cpus even for the last CPU
cpufreq: Create for_each_{in}active_policy()
cpufreq: Call schedule_work() for the last active policy
cpufreq: Don't clear cpufreq_cpu_data and policy list for inactive
policies
cpufreq: Get rid of cpufreq_cpu_data_fallback
cpufreq: Don't traverse list of all policies for adding policy for a
cpu
cpufreq: Manage governor usage history with 'policy->last_governor'
cpufreq: Mark policy->governor = NULL for inactive policies
cpufreq: Don't allow updating inactive-policies from sysfs
cpufreq: Stop migrating sysfs files on hotplug
cpufreq: Remove cpufreq_update_policy()
cpufreq: Initialize policy->kobj while allocating policy
cpufreq: Call cpufreq_policy_put_kobj() from cpufreq_policy_free()
cpufreq: Restart governor as soon as possible
cpufreq: Add support for physical hotplug of CPUs
drivers/cpufreq/cpufreq.c | 593 ++++++++++++++++++++++++++--------------------
include/linux/cpufreq.h | 5 +-
2 files changed, 340 insertions(+), 258 deletions(-)
--
2.3.0.rc0.44.ga94655d
Hi Ingo/Thomas/Peter,
While queuing a timer, we try to migrate it to a non-idle core if the local core
is idle, but we don't try that if the timer is re-armed from its handler.
There were few unsolved problems due to which it was avoided until now. But
there are cases where solving these problems can be useful. When the timer is
always re-armed from its handler, it never migrates to other cores. And many
times it ends up waking an idle core just to service the timer, which could
have been handled by a non-idle core instead.
Peter suggested [1] a few changes which can make that work, and the first patch
does exactly that. The second one is a minor improvement, that replaces
'running_timer' pointer with 'busy'. That variable was required as part of a
sanity check during CPU hot-unplug operation. I was not sure if we should drop
this extra variable ('running_timer' or 'busy') and the sanity check.
Because we are using another bit from the base pointer to keep track of the
running status of the timer, we get a build error on blackfin, as it doesn't
respect
____cacheline_aligned [2].
kernel/time/timer.c: In function 'init_timers':
kernel/time/timer.c:1731:2: error: call to '__compiletime_assert_1731' declared
with attribute error: BUILD_BUG_ON failed: __alignof__(struct tvec_base)
& TIMER_FLAG_MASK
--
viresh
[1] https://lkml.org/lkml/2015/3/28/32
[2] https://lkml.org/lkml/2015/3/29/178
Cc: Steven Miao <realmz6(a)gmail.com>
Viresh Kumar (2):
timer: Avoid waking up an idle-core by migrate running timer
timer: Replace base-> 'running_timer' with 'busy'
include/linux/timer.h | 3 +-
kernel/time/timer.c | 102 ++++++++++++++++++++++++++++++++++++++------------
2 files changed, 81 insertions(+), 24 deletions(-)
--
2.3.0.rc0.44.ga94655d
Currently WARN_ONCE() and similar macros set __warned *after* calling
the underlying macro. This risks infinite recursion if WARN_ONCE() is
used to implement sanity tests in any code that can be called by printk.
This can be fixed by restructuring the macros to set __warned before
calling further macros.
Signed-off-by: Daniel Thompson <daniel.thompson(a)linaro.org>
---
Notes:
I discovered this problem when I temporarily added sanity tests to the
irqflags macros during some of my development work but I suspect the
scope is a little wider. I admit I was tempted to throw this change
away after I had finished debugging but for two things prompted me to
post it.
1. It did cost me a few minutes head scratching and I'd like to spare
others the pain.
2. I realized the new code is potentially (and very fractionally) more
efficient: the register containing the address of __warned can be reused
and a cache hit is a near certainty for the write.
Don't get too excited about the efficiency gains though; they are
extremely modest. Measured as code size benefit using v4.0-rc4, the
results are:
Kernel GCC version Code size reduction
arm multi_v7_defconfig Linaro 4.8-2014.01 224 bytes
arm64 defconfig Linaro 4.9-2014.09 32 bytes
i386_defconfig Redhat 4.9.2-6 62 bytes
x86_64_defconfig Redhat 4.9.2-6 380 bytes
include/asm-generic/bug.h | 21 ++++++++++++---------
1 file changed, 12 insertions(+), 9 deletions(-)
diff --git a/include/asm-generic/bug.h b/include/asm-generic/bug.h
index 630dd2372238..f8c8a819c563 100644
--- a/include/asm-generic/bug.h
+++ b/include/asm-generic/bug.h
@@ -110,9 +110,10 @@ extern void warn_slowpath_null(const char *file, const int line);
static bool __section(.data.unlikely) __warned; \
int __ret_warn_once = !!(condition); \
\
- if (unlikely(__ret_warn_once)) \
- if (WARN_ON(!__warned)) \
- __warned = true; \
+ if (unlikely(__ret_warn_once) && !__warned) { \
+ __warned = true; \
+ WARN_ON(true); \
+ } \
unlikely(__ret_warn_once); \
})
@@ -120,9 +121,10 @@ extern void warn_slowpath_null(const char *file, const int line);
static bool __section(.data.unlikely) __warned; \
int __ret_warn_once = !!(condition); \
\
- if (unlikely(__ret_warn_once)) \
- if (WARN(!__warned, format)) \
- __warned = true; \
+ if (unlikely(__ret_warn_once) && !__warned) { \
+ __warned = true; \
+ WARN(true, format); \
+ } \
unlikely(__ret_warn_once); \
})
@@ -130,9 +132,10 @@ extern void warn_slowpath_null(const char *file, const int line);
static bool __section(.data.unlikely) __warned; \
int __ret_warn_once = !!(condition); \
\
- if (unlikely(__ret_warn_once)) \
- if (WARN_TAINT(!__warned, taint, format)) \
- __warned = true; \
+ if (unlikely(__ret_warn_once) && !__warned) { \
+ __warned = true; \
+ WARN_TAINT(true, taint, format); \
+ } \
unlikely(__ret_warn_once); \
})
--
2.1.0
This patch set enables kdump (crash dump kernel) support on arm64 on top of
Geoff's kexec patchset.
In this version, there are some arm64-specific usage/constraints:
1) "mem=" boot parameter must be specified on crash dump kernel
2) Kvm will not be enabled on crash dump kernel even if configured
See commit messages and Documentation/kdump/kdump.txt for details.
The only concern I have is whether or not we can use the exact same kernel
as both system kernel and crash dump kernel. The current arm64 kernel is
not relocatable in the strict sense, but I have had no problems using the same
binary for testing kdump.
I tested the code with
- ATF v1.1 + EDK2(UEFI) v3.0-rc0
- kernel v4.0-rc4 + Geoff' kexec v8
on Base fast model, using my yet-to-be-submitted kexec-tools [1].
You may want to start a kernel with the following boot parameter:
crashkernel=64M@2240M
and try
$ kexec -p --load <vmlinux> --append ...
$ echo c > /proc/sysrq-trigger
To examine vmcore (/proc/vmcore), you may use
- gdb v7.7 or later
- crash + a small patch (to recognize v4.0 kernel)
[1] https://git.linaro.org/people/takahiro.akashi/kexec-tools.git
AKASHI Takahiro (5):
arm64: kdump: reserve memory for crash dump kernel
arm64: kdump: implement machine_crash_shutdown()
arm64: kdump: do not go into EL2 before starting a crash dump kernel
arm64: add kdump support
arm64: enable kdump in the arm64 defconfig
Documentation/kdump/kdump.txt | 31 ++++++++++++++-
arch/arm64/Kconfig | 12 ++++++
arch/arm64/configs/defconfig | 1 +
arch/arm64/include/asm/kexec.h | 34 +++++++++++++++-
arch/arm64/kernel/Makefile | 1 +
arch/arm64/kernel/crash_dump.c | 71 +++++++++++++++++++++++++++++++++
arch/arm64/kernel/machine_kexec.c | 55 +++++++++++++++++++++++++-
arch/arm64/kernel/process.c | 7 +++-
arch/arm64/kernel/setup.c | 78 +++++++++++++++++++++++++++++++++++++
arch/arm64/kernel/smp.c | 10 ++++-
10 files changed, 294 insertions(+), 6 deletions(-)
create mode 100644 arch/arm64/kernel/crash_dump.c
--
1.7.9.5
This patchset provides a pseudo-NMI for arm64 kernels by reimplementing
the irqflags macros to modify the GIC PMR (the priority mask register is
accessible as a system register on GICv3 and later) rather than the
PSR. The pseudo-NMI changes are supported by a prototype implementation of
arch_trigger_all_cpu_backtrace that allows the new code to be exercised.
In addition to the arm64 changes I've bundled in a few patches from
other patchsets to make the patchset self-contained. Of particular note is
the serial break emulation patch, which allows ^B^R^K to be used
instead of a serial break to trigger SysRq-L (FVP UART sockets don't
seem to support serial breaks). This makes it easy to run
arch_trigger_all_cpu_backtrace from an IRQ handler (i.e. somewhere with
interrupts masked so we are forced to preempt and take the NMI).
The code works-for-me (tm) but there are currently some pretty serious
limitations.
1. Exercised only on the foundation model with gicv3 support. It has
not been exercised on real silicon or even on the more advanced
ARM models.
2. It has been written without any documentation describing GICv3
architecture (which has not yet been released by ARM). I've been
guessing about the behaviour based on the ARMv8 and GICv2
architecture specs. The code works on the foundation model but
I cannot check that it conforms architecturally.
3. Requires GICv3+ hardware together with firmware support to enable
GICv3 features at EL3. If CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS is
enabled the kernel will not boot on older hardware. It will be hard
to diagnose because we will crash very early in the boot (i.e.
before the call to start_kernel). Auto-detection might be possible
but the performance and code size cost of adding conditional code to
the irqflags macros probably makes it impractical. As such it may
never be possible to remove this limitation (although it might be
possible to find a way to survive long enough to panic and show the
results on the console).
4. No benchmarking (see #1 above). Unlike the PSR, updates to the PMR
do not self-synchronize, which requires me to sprinkle isb
instructions fairly liberally. I've been told the cost of isb varies
from almost-free (A53) to somewhat costly (A57), so we expect this
code to reduce kernel performance. However this needs to be
quantified and I am currently unable to do this. I'd really like to
but don't have any suitable hardware.
5. There is no code in el1_irq to detect NMI and switch from IRQ to NMI
handling. This means all the irq handling machinery is re-entered in
order to handle the NMI. This is not safe and deadlocks are likely.
This is a severe limitation although, in this proof-of-concept
work, NMI can only be triggered by SysRq-L or severe kernel damage.
This means we just about get away with it for simple tests (lockdep
detects that we are doing something wrong and shows a backtrace). This is
definitely the first thing that needs to be tackled to take this
code further.
Note also that alternative approaches to implementing a pseudo-NMI on
arm64 are possible but only through runtime cooperation with other
software components in the system, potentially both those running at EL3
and at secure EL1. I should like to explore these options in future but,
as far as I know, this is the only sane way to provide NMI-like features
whilst being implementable entirely in non-secure EL1 [1].
[1] Except for a single register write to ICC_SRE_EL3 by the EL3
firmware (and already implemented by ARM trusted firmware).
Daniel Thompson (7):
serial: Emulate break using control characters
printk: Simple implementation for NMI backtracing
irqchip: gic-v3: Reset BPR during initialization
arm64: irqflags: Reorder the fiq & async macros
arm64: irqflags: Use ICC sysregs to implement IRQ masking
arm64: irqflags: Automatically identify I bit mis-management
arm64: Add support for on-demand backtrace of other CPUs
arch/arm64/Kconfig | 16 ++++
arch/arm64/include/asm/assembler.h | 72 +++++++++++++++--
arch/arm64/include/asm/hardirq.h | 2 +-
arch/arm64/include/asm/irq.h | 5 ++
arch/arm64/include/asm/irqflags.h | 130 ++++++++++++++++++++++++++++--
arch/arm64/include/asm/ptrace.h | 10 +++
arch/arm64/include/asm/smp.h | 4 +
arch/arm64/include/uapi/asm/ptrace.h | 8 ++
arch/arm64/kernel/entry.S | 70 ++++++++++++++---
arch/arm64/kernel/head.S | 27 +++++++
arch/arm64/kernel/irq.c | 6 ++
arch/arm64/kernel/smp.c | 70 +++++++++++++++++
arch/arm64/mm/cache.S | 4 +-
arch/arm64/mm/proc.S | 19 +++++
drivers/irqchip/irq-gic-v3.c | 70 ++++++++++++++++-
include/linux/irqchip/arm-gic-v3.h | 6 +-
include/linux/irqchip/arm-gic.h | 2 +-
include/linux/printk.h | 20 +++++
include/linux/serial_core.h | 83 +++++++++++++++-----
init/Kconfig | 3 +
kernel/printk/Makefile | 1 +
kernel/printk/nmi_backtrace.c | 148 +++++++++++++++++++++++++++++++++++
lib/Kconfig.debug | 15 ++++
23 files changed, 746 insertions(+), 45 deletions(-)
create mode 100644 kernel/printk/nmi_backtrace.c
--
2.1.0
This patchset consolidates several changes in the capacity and the usage
tracking of the CPU. It provides a frequency invariant metric of the usage of
CPUs and generally improves the accuracy of load/usage tracking in the
scheduler. The frequency invariant metric is the foundation required for the
consolidation of cpufreq and implementation of a fully invariant load tracking.
These are currently WIP and require several changes to the load balancer
(including how it will use and interpret load and capacity metrics) and
extensive validation. The frequency invariance is done with
arch_scale_freq_capacity and this patchset doesn't provide the backends of
the function which are architecture dependent.
As discussed at LPC14, Morten and I have consolidated our changes into a single
patchset to make it easier to review and merge.
During load balance, the scheduler evaluates the number of tasks that a group
of CPUs can handle. The current method assumes that tasks have a fixed load of
SCHED_LOAD_SCALE and CPUs have a default capacity of SCHED_CAPACITY_SCALE.
This assumption generates wrong decisions by creating ghost cores or by
removing real ones when the original capacity of CPUs is different from the
default SCHED_CAPACITY_SCALE. With this patch set, we don't try anymore to
evaluate the number of available cores based on the group_capacity but instead
we evaluate the usage of a group and compare it with its capacity.
This patchset mainly replaces the old capacity_factor method by a new one and
keeps the general policy almost unchanged. These new metrics will also be used
in later patches.
The CPU usage is based on a running time tracking version of the current
implementation of the load average tracking. I also have a version that is
based on the new implementation proposal [1] but I haven't provided the patches
and results as [1] is still under review. I can provide changes on top of [1]
to adapt how CPU usage is computed to the new mechanism.
Change since V9
- add a dedicated patch for removing unused capacity_orig
- update some comments and fix typo
- change the condition for actively migrating task on CPU with higher capacity
Change since V8
- reorder patches
Change since V7
- add freq invariance for usage tracking
- add freq invariance for scale_rt
- update comments and commits' message
- fix init of utilization_avg_contrib
- fix prefer_sibling
Change since V6
- add group usage tracking
- fix some commits' messages
- minor fix like comments and argument order
Change since V5
- remove patches that have been merged since v5 : patches 01, 02, 03, 04, 05, 07
- update commit log and add more details on the purpose of the patches
- fix/remove useless code with the rebase on patchset [2]
- remove capacity_orig in sched_group_capacity as it is not used
- move code in the right patch
- add some helper function to factorize code
Change since V4
- rebase to manage conflicts with changes in selection of busiest group
Change since V3:
- add usage_avg_contrib statistic which sums the running time of tasks on a rq
- use usage_avg_contrib instead of runnable_avg_sum for cpu_utilization
- fix replacement power by capacity
- update some comments
Change since V2:
- rebase on top of capacity renaming
- fix wake_affine statistic update
- rework nohz_kick_needed
- optimize the active migration of a task from CPU with reduced capacity
- rename group_activity by group_utilization and remove unused total_utilization
- repair SD_PREFER_SIBLING and use it for SMT level
- reorder patchset to gather patches with same topics
Change since V1:
- add 3 fixes
- correct some commit messages
- replace capacity computation by activity
- take into account current cpu capacity
[1] https://lkml.org/lkml/2014/10/10/131
[2] https://lkml.org/lkml/2014/7/25/589
Morten Rasmussen (2):
sched: Track group sched_entity usage contributions
sched: Make sched entity usage tracking scale-invariant
Vincent Guittot (9):
sched: add utilization_avg_contrib
sched: remove frequency scaling from cpu_capacity
sched: make scale_rt invariant with frequency
sched: add per rq cpu_capacity_orig
sched: get CPU's usage statistic
sched: replace capacity_factor by usage
sched; remove unused capacity_orig from
sched: add SD_PREFER_SIBLING for SMT level
sched: move cfs task on a CPU with higher capacity
include/linux/sched.h | 21 ++-
kernel/sched/core.c | 15 +--
kernel/sched/debug.c | 12 +-
kernel/sched/fair.c | 366 +++++++++++++++++++++++++++++++-------------------
kernel/sched/sched.h | 15 ++-
5 files changed, 271 insertions(+), 158 deletions(-)
--
1.9.1