This patchset provides a pseudo-NMI for arm64 kernels by reimplementing the irqflags macros to modify the GIC PMR (the priority mask register, which is accessible as a system register on GICv3 and later) rather than the PSR. The patchset includes an implementation of arch_trigger_all_cpu_backtrace() for arm64, allowing the new code to be exercised.
The code works-for-me (tm) and is much more "real" than the last time I shared these patches. However, there remain a few limitations and caveats:
1. Requires GICv3+ hardware to be effective. The alternatives runtime patching system is employed so systems with earlier GIC architectures are still bootable but will not benefit from NMI simulation.
2. Currently hardcoded to use ICC_PMR_EL1. Extra work might be needed on the alternatives system so we can peacefully coexist with ARMv8.1 KVM support (when the kernel will be running at EL2).
3. FVP needs a bit of hacking to be able to run <SysRq-L> from an ISR. That's a shame because <SysRq-L> is a great way to observe an NMI preempting an IRQ handler. Testers are welcome to ping me offline and I can share the hacks (and DTs) I have been using to test with.
4. Testing for non regression on a GICv2 system will require this patch to avoid crashes during <SysRq-L>: http://article.gmane.org/gmane.linux.kernel/2037558
v2:
* Removed the isb instructions. The PMR is self-synchronizing so these are not needed (Dave Martin)
* Use alternative runtime patching to allow the same kernel binary to boot systems with and without GICv3+ (Dave Martin).
* Added code to properly distinguish between NMI and normal IRQ and to call into NMI handling code where needed.
* Replaced the IPI backtrace logic with a newer version (from Russell King).
Daniel Thompson (7):
  irqchip: gic-v3: Reset BPR during initialization
  arm64: Add support for on-demand backtrace of other CPUs
  arm64: alternative: Apply alternatives early in boot process
  arm64: irqflags: Reorder the fiq & async macros
  arm64: irqflags: Use ICC sysregs to implement IRQ masking
  arm64: Implement IPI_CPU_BACKTRACE using pseudo-NMIs
  arm64: irqflags: Automatically identify I bit mis-management
 arch/arm64/Kconfig                   |  15 ++++
 arch/arm64/include/asm/alternative.h |   1 +
 arch/arm64/include/asm/assembler.h   |  56 ++++++++++++-
 arch/arm64/include/asm/hardirq.h     |   2 +-
 arch/arm64/include/asm/irq.h         |   3 +
 arch/arm64/include/asm/irqflags.h    | 154 +++++++++++++++++++++++++++++++++--
 arch/arm64/include/asm/ptrace.h      |  18 ++++
 arch/arm64/include/asm/smp.h         |   2 +
 arch/arm64/kernel/alternative.c      |  15 ++++
 arch/arm64/kernel/entry.S            | 149 +++++++++++++++++++++++++++------
 arch/arm64/kernel/head.S             |  35 ++++++++
 arch/arm64/kernel/setup.c            |  13 +++
 arch/arm64/kernel/smp.c              |  44 ++++++++++
 arch/arm64/mm/proc.S                 |  23 ++++++
 drivers/irqchip/irq-gic-v3.c         | 117 +++++++++++++++++++++++++-
 include/linux/irqchip/arm-gic-v3.h   |  10 +++
 include/linux/irqchip/arm-gic.h      |   2 +-
 lib/nmi_backtrace.c                  |   8 +-
 18 files changed, 629 insertions(+), 38 deletions(-)
-- 2.4.3
Currently, when running on FVP, CPU 0 boots up with its BPR changed from the reset value. This renders it impossible to (preemptively) prioritize interrupts on CPU 0.
This is harmless on normal systems since Linux does not normally make use of preemptive interrupt prioritization. It does, however, cause problems in systems with additional changes (such as patches for NMI simulation).
Many thanks to Andrew Thoelke for suggesting that the BPR had the potential to harm preemption.
Suggested-by: Andrew Thoelke andrew.thoelke@arm.com
Signed-off-by: Daniel Thompson daniel.thompson@linaro.org
---
 drivers/irqchip/irq-gic-v3.c       | 13 +++++++++++++
 include/linux/irqchip/arm-gic-v3.h |  2 ++
 2 files changed, 15 insertions(+)
diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index 7deed6ef54c2..b47bd971038e 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -127,6 +127,11 @@ static void __maybe_unused gic_write_pmr(u64 val)
 	asm volatile("msr_s " __stringify(ICC_PMR_EL1) ", %0" : : "r" (val));
 }
+static void __maybe_unused gic_write_bpr1(u64 val)
+{
+	asm volatile("msr_s " __stringify(ICC_BPR1_EL1) ", %0" : : "r" (val));
+}
+
 static void __maybe_unused gic_write_ctlr(u64 val)
 {
 	asm volatile("msr_s " __stringify(ICC_CTLR_EL1) ", %0" : : "r" (val));
@@ -501,6 +506,14 @@ static void gic_cpu_sys_reg_init(void)
 	/* Set priority mask register */
 	gic_write_pmr(DEFAULT_PMR_VALUE);
+	/*
+	 * Some firmwares hand over to the kernel with the BPR changed from
+	 * its reset value (and with a value large enough to prevent
+	 * any pre-emptive interrupts from working at all). Writing a zero
+	 * to BPR restores its reset value.
+	 */
+	gic_write_bpr1(0);
+
 	if (static_key_true(&supports_deactivate)) {
 		/* EOI drops priority only (mode 1) */
 		gic_write_ctlr(ICC_CTLR_EL1_EOImode_drop);
diff --git a/include/linux/irqchip/arm-gic-v3.h b/include/linux/irqchip/arm-gic-v3.h
index 9eeeb9589acf..60cc91749e7d 100644
--- a/include/linux/irqchip/arm-gic-v3.h
+++ b/include/linux/irqchip/arm-gic-v3.h
@@ -292,6 +292,8 @@
 #define ICH_VMCR_PMR_SHIFT		24
 #define ICH_VMCR_PMR_MASK		(0xffUL << ICH_VMCR_PMR_SHIFT)
+#define ICC_BPR0_EL1			sys_reg(3, 0, 12, 8, 3)
+#define ICC_BPR1_EL1			sys_reg(3, 0, 12, 12, 3)
 #define ICC_EOIR1_EL1			sys_reg(3, 0, 12, 12, 1)
 #define ICC_DIR_EL1			sys_reg(3, 0, 12, 11, 1)
 #define ICC_IAR1_EL1			sys_reg(3, 0, 12, 12, 0)
Currently arm64 has no implementation of arch_trigger_all_cpu_backtrace. The patch provides one, using library code recently added by Russell King for the majority of the implementation. Currently this is realized using regular irqs but could, in the future, be implemented using NMI-like mechanisms.
Note: There is a small (and nasty) change to the generic code to ensure good stack traces. The generic code currently assumes that show_regs() will include a stack trace, but arch/arm64 does not do this, so we must add extra code here. Ideas on a better approach would be very welcome (is there any appetite to change arm64 show_regs() or should we just tease out the dump code into a callback?).
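If there were any appetite for the callback approach mentioned above, it might look something like the sketch below. This is purely illustrative: arch_dump_backtrace() and its __weak default are hypothetical names, not an existing kernel interface, and the patch itself sticks with the CONFIG_ARM64 check shown in the diff.

#include <linux/kernel.h>
#include <linux/printk.h>
#include <linux/sched.h>
#include <asm/ptrace.h>

/* lib/nmi_backtrace.c (sketch): default dump, overridable per architecture */
void __weak arch_dump_backtrace(struct pt_regs *regs)
{
	if (regs)
		show_regs(regs);	/* assumes show_regs() prints a backtrace */
	else
		dump_stack();
}

/* arch/arm64 (sketch): show_regs() does not dump the stack, so add it */
void arch_dump_backtrace(struct pt_regs *regs)
{
	if (regs) {
		show_regs(regs);
		show_stack(NULL, NULL);
	} else {
		dump_stack();
	}
}

nmi_cpu_backtrace() would then call arch_dump_backtrace(regs) instead of open-coding an architecture check.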
Signed-off-by: Daniel Thompson daniel.thompson@linaro.org
Cc: Russell King rmk+kernel@arm.linux.org.uk
---
 arch/arm64/include/asm/hardirq.h |  2 +-
 arch/arm64/include/asm/irq.h     |  3 +++
 arch/arm64/kernel/smp.c          | 26 ++++++++++++++++++++++++++
 lib/nmi_backtrace.c              |  8 ++++++--
 4 files changed, 36 insertions(+), 3 deletions(-)
diff --git a/arch/arm64/include/asm/hardirq.h b/arch/arm64/include/asm/hardirq.h index 2bb7009bdac7..0af4cdb4b5a9 100644 --- a/arch/arm64/include/asm/hardirq.h +++ b/arch/arm64/include/asm/hardirq.h @@ -20,7 +20,7 @@ #include <linux/threads.h> #include <asm/irq.h>
-#define NR_IPI 5 +#define NR_IPI 6
typedef struct { unsigned int __softirq_pending; diff --git a/arch/arm64/include/asm/irq.h b/arch/arm64/include/asm/irq.h index bbb251b14746..6b2724b2a803 100644 --- a/arch/arm64/include/asm/irq.h +++ b/arch/arm64/include/asm/irq.h @@ -21,4 +21,7 @@ static inline void acpi_irq_init(void) } #define acpi_irq_init acpi_irq_init
+extern void arch_trigger_all_cpu_backtrace(bool); +#define arch_trigger_all_cpu_backtrace(x) arch_trigger_all_cpu_backtrace(x) + #endif diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c index dbdaacddd9a5..0f37a33499e2 100644 --- a/arch/arm64/kernel/smp.c +++ b/arch/arm64/kernel/smp.c @@ -37,6 +37,7 @@ #include <linux/completion.h> #include <linux/of.h> #include <linux/irq_work.h> +#include <linux/nmi.h>
#include <asm/alternative.h> #include <asm/atomic.h> @@ -70,6 +71,7 @@ enum ipi_msg_type { IPI_CPU_STOP, IPI_TIMER, IPI_IRQ_WORK, + IPI_CPU_BACKTRACE, };
/* @@ -623,6 +625,7 @@ static const char *ipi_types[NR_IPI] __tracepoint_string = { S(IPI_CPU_STOP, "CPU stop interrupts"), S(IPI_TIMER, "Timer broadcast interrupts"), S(IPI_IRQ_WORK, "IRQ work interrupts"), + S(IPI_CPU_BACKTRACE, "backtrace interrupts"), };
static void smp_cross_call(const struct cpumask *target, unsigned int ipinr) @@ -743,6 +746,12 @@ void handle_IPI(int ipinr, struct pt_regs *regs) break; #endif
+ case IPI_CPU_BACKTRACE: + irq_enter(); + nmi_cpu_backtrace(regs); + irq_exit(); + break; + default: pr_crit("CPU%u: Unknown IPI message 0x%x\n", cpu, ipinr); break; @@ -794,3 +803,20 @@ int setup_profiling_timer(unsigned int multiplier) { return -EINVAL; } + +static void raise_nmi(cpumask_t *mask) +{ + /* + * Generate the backtrace directly if we are running in a + * calling context that is not preemptible by the backtrace IPI. + */ + if (cpumask_test_cpu(smp_processor_id(), mask) && irqs_disabled()) + nmi_cpu_backtrace(NULL); + + smp_cross_call(mask, IPI_CPU_BACKTRACE); +} + +void arch_trigger_all_cpu_backtrace(bool include_self) +{ + nmi_trigger_all_cpu_backtrace(include_self, raise_nmi); +} diff --git a/lib/nmi_backtrace.c b/lib/nmi_backtrace.c index be0466a80d0b..29f7c4585f7f 100644 --- a/lib/nmi_backtrace.c +++ b/lib/nmi_backtrace.c @@ -149,10 +149,14 @@ bool nmi_cpu_backtrace(struct pt_regs *regs) /* Replace printk to write into the NMI seq */ this_cpu_write(printk_func, nmi_vprintk); pr_warn("NMI backtrace for cpu %d\n", cpu); - if (regs) + if (regs) { show_regs(regs); - else +#ifdef CONFIG_ARM64 + show_stack(NULL, NULL); +#endif + } else { dump_stack(); + } this_cpu_write(printk_func, printk_func_save);
cpumask_clear_cpu(cpu, to_cpumask(backtrace_mask));
Currently alternatives are applied very late in the boot process (and a long time after we enable scheduling). Some alternative sequences, such as those that alter the way CPU context is stored, must be applied much earlier in the boot sequence.
Introduce apply_alternatives_early() to allow some alternatives to be applied immediately after we detect the CPU features of the boot CPU.
Currently apply_alternatives_all() is not optimized and will re-patch code that has already been updated. This is harmless but could be removed by adding extra flags to the alternatives store.
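For illustration only, the "extra flags" idea could be as small as a per-entry applied marker, sketched below. The applied field and the loop shape are hypothetical; the real __apply_alternatives() also performs the instruction fix-ups and cache maintenance that are elided here.

struct alt_instr {
	/* ... existing offset fields ... */
	u16 cpufeature;		/* cpufeature bit set for replacement */
	u8  orig_len;		/* size of original instruction(s) */
	u8  alt_len;		/* size of new instruction(s), <= orig_len */
	u8  applied;		/* sketch: non-zero once this entry has been patched */
};

static void __apply_alternatives(void *alt_region)
{
	struct alt_instr *alt;
	struct alt_region *region = alt_region;

	for (alt = region->begin; alt < region->end; alt++) {
		if (alt->applied || !cpus_have_cap(alt->cpufeature))
			continue;

		/* ... patch the original instruction(s) as today ... */

		alt->applied = 1;
	}
}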
Signed-off-by: Daniel Thompson daniel.thompson@linaro.org
---
 arch/arm64/include/asm/alternative.h |  1 +
 arch/arm64/kernel/alternative.c      | 15 +++++++++++++++
 arch/arm64/kernel/setup.c            |  7 +++++++
 3 files changed, 23 insertions(+)
diff --git a/arch/arm64/include/asm/alternative.h b/arch/arm64/include/asm/alternative.h
index d56ec0715157..f9dad1b7c651 100644
--- a/arch/arm64/include/asm/alternative.h
+++ b/arch/arm64/include/asm/alternative.h
@@ -17,6 +17,7 @@ struct alt_instr {
 	u8  alt_len;		/* size of new instruction(s), <= orig_len */
 };
+void __init apply_alternatives_early(void);
 void __init apply_alternatives_all(void);
 void apply_alternatives(void *start, size_t length);
 void free_alternatives_memory(void);
diff --git a/arch/arm64/kernel/alternative.c b/arch/arm64/kernel/alternative.c
index ab9db0e9818c..59989a4bed7c 100644
--- a/arch/arm64/kernel/alternative.c
+++ b/arch/arm64/kernel/alternative.c
@@ -117,6 +117,21 @@ static void __apply_alternatives(void *alt_region)
 }
 /*
+ * This is called very early in the boot process (directly after we run
+ * a feature detect on the boot CPU). No need to worry about other CPUs
+ * here.
+ */
+void apply_alternatives_early(void)
+{
+	struct alt_region region = {
+		.begin	= __alt_instructions,
+		.end	= __alt_instructions_end,
+	};
+
+	__apply_alternatives(&region);
+}
+
+/*
  * We might be patching the stop_machine state machine, so implement a
  * really simple polling protocol here.
  */
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index 6bab21f84a9f..0cddc5ff8089 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -211,6 +211,13 @@ static void __init setup_processor(void)
 	cpuinfo_store_boot_cpu();
 	/*
+	 * We now know enough about the boot CPU to apply the
+	 * alternatives that cannot wait until interrupt handling
+	 * and/or scheduling is enabled.
+	 */
+	apply_alternatives_early();
+
+	/*
 	 * Check for sane CTR_EL0.CWG value.
 	 */
 	cwg = cache_type_cwg();
On Mon, Sep 14, 2015 at 02:26:17PM +0100, Daniel Thompson wrote:
Currently alternatives are applied very late in the boot process (and a long time after we enable scheduling). Some alternative sequences, such as those that alter the way CPU context is stored, must be applied much earlier in the boot sequence.
Introduce apply_alternatives_early() to allow some alternatives to be applied immediately after we detect the CPU features of the boot CPU.
Currently apply_alternatives_all() is not optimized and will re-patch code that has already been updated. This is harmless but could be removed by adding extra flags to the alternatives store.
Signed-off-by: Daniel Thompson daniel.thompson@linaro.org
 arch/arm64/include/asm/alternative.h |  1 +
 arch/arm64/kernel/alternative.c      | 15 +++++++++++++++
 arch/arm64/kernel/setup.c            |  7 +++++++
 3 files changed, 23 insertions(+)
diff --git a/arch/arm64/include/asm/alternative.h b/arch/arm64/include/asm/alternative.h
index d56ec0715157..f9dad1b7c651 100644
--- a/arch/arm64/include/asm/alternative.h
+++ b/arch/arm64/include/asm/alternative.h
@@ -17,6 +17,7 @@ struct alt_instr {
 	u8  alt_len;		/* size of new instruction(s), <= orig_len */
 };
+void __init apply_alternatives_early(void);
 void __init apply_alternatives_all(void);
 void apply_alternatives(void *start, size_t length);
 void free_alternatives_memory(void);
diff --git a/arch/arm64/kernel/alternative.c b/arch/arm64/kernel/alternative.c
index ab9db0e9818c..59989a4bed7c 100644
--- a/arch/arm64/kernel/alternative.c
+++ b/arch/arm64/kernel/alternative.c
@@ -117,6 +117,21 @@ static void __apply_alternatives(void *alt_region)
 }
 /*
+ * This is called very early in the boot process (directly after we run
+ * a feature detect on the boot CPU). No need to worry about other CPUs
+ * here.
+ */
+void apply_alternatives_early(void)
+{
+	struct alt_region region = {
+		.begin	= __alt_instructions,
+		.end	= __alt_instructions_end,
+	};
+
+	__apply_alternatives(&region);
+}
How do you choose which alternatives are applied early and which are applied later? AFAICT, this just applies everything before we've established the capabilities of the CPUs in the system, which could cause problems for big/little SoCs.
Also, why do we need this for the NMI?
Will
On 16/09/15 14:05, Will Deacon wrote:
On Mon, Sep 14, 2015 at 02:26:17PM +0100, Daniel Thompson wrote:
Currently alternatives are applied very late in the boot process (and a long time after we enable scheduling). Some alternative sequences, such as those that alter the way CPU context is stored, must be applied much earlier in the boot sequence.
Introduce apply_alternatives_early() to allow some alternatives to be applied immediately after we detect the CPU features of the boot CPU.
Currently apply_alternatives_all() is not optimized and will re-patch code that has already been updated. This is harmless but could be removed by adding extra flags to the alternatives store.
Signed-off-by: Daniel Thompson daniel.thompson@linaro.org
[snip]
 /*
+ * This is called very early in the boot process (directly after we run
+ * a feature detect on the boot CPU). No need to worry about other CPUs
+ * here.
+ */
+void apply_alternatives_early(void)
+{
+	struct alt_region region = {
+		.begin	= __alt_instructions,
+		.end	= __alt_instructions_end,
+	};
+
+	__apply_alternatives(&region);
+}
How do you choose which alternatives are applied early and which are applied later? AFAICT, this just applies everything before we've established the capabilities of the CPUs in the system, which could cause problems for big/little SoCs.
They are applied twice. This relies for correctness on the fact that cpufeatures can be set but not unset.
In other words the boot CPU does a feature detect and, as a result, a subset of the required alternatives will be applied. However, after this the other CPUs will boot and the remaining alternatives will be applied as before.
The current implementation is inefficient (because it will redundantly patch the same code twice) but I don't think it is broken.
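For reference, the property being relied on looks roughly like this (a from-memory sketch of the arm64 cpufeature helpers of this era, not a verbatim quote of the source):

#include <linux/bitmap.h>
#include <linux/bitops.h>

/* one bit per capability; CPUs can only ever add bits, never clear them */
extern DECLARE_BITMAP(cpu_hwcaps, ARM64_NCAPS);

static inline void cpus_set_cap(unsigned int num)
{
	if (num < ARM64_NCAPS)
		__set_bit(num, cpu_hwcaps);	/* set-only: no clearing path exists */
}

static inline bool cpus_have_cap(unsigned int num)
{
	return num < ARM64_NCAPS && test_bit(num, cpu_hwcaps);
}

Because capability bits only accumulate, anything patched early against the boot CPU's capabilities never needs to be un-patched when the secondaries come up; at worst the same sites get written again with the same replacement.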
Also, why do we need this for the NMI?
I was/am concerned that a context saved before the alternatives are applied might be restored afterwards. If that happens, the bit that indicates what value to put into the PMR would be read during the restore without having been saved first. Applying early ensures that the context save/restore code is updated before it is ever used.
Daniel.
On Wed, Sep 16, 2015 at 04:51:12PM +0100, Daniel Thompson wrote:
On 16/09/15 14:05, Will Deacon wrote:
On Mon, Sep 14, 2015 at 02:26:17PM +0100, Daniel Thompson wrote:
Currently alternatives are applied very late in the boot process (and a long time after we enable scheduling). Some alternative sequences, such as those that alter the way CPU context is stored, must be applied much earlier in the boot sequence.
Introduce apply_alternatives_early() to allow some alternatives to be applied immediately after we detect the CPU features of the boot CPU.
Currently apply_alternatives_all() is not optimized and will re-patch code that has already been updated. This is harmless but could be removed by adding extra flags to the alternatives store.
Signed-off-by: Daniel Thompson daniel.thompson@linaro.org
[snip]
 /*
+ * This is called very early in the boot process (directly after we run
+ * a feature detect on the boot CPU). No need to worry about other CPUs
+ * here.
+ */
+void apply_alternatives_early(void)
+{
+	struct alt_region region = {
+		.begin	= __alt_instructions,
+		.end	= __alt_instructions_end,
+	};
+
+	__apply_alternatives(&region);
+}
How do you choose which alternatives are applied early and which are applied later? AFAICT, this just applies everything before we've established the capabilities of the CPUs in the system, which could cause problems for big/little SoCs.
They are applied twice. This relies for correctness on the fact that cpufeatures can be set but not unset.
In other words the boot CPU does a feature detect and, as a result, a subset of the required alternatives will be applied. However, after this the other CPUs will boot and the remaining alternatives will be applied as before.
The current implementation is inefficient (because it will redundantly patch the same code twice) but I don't think it is broken.
What about a big/little system where we boot on the big cores and only they support LSE atomics?
Also, why do we need this for the NMI?
I was/am concerned that a context saved before the alternatives are applied might be restored afterwards. If that happens, the bit that indicates what value to put into the PMR would be read during the restore without having been saved first. Applying early ensures that the context save/restore code is updated before it is ever used.
Damn, and stop_machine makes use of local_irq_restore immediately after the patching has completed, so it's a non-starter. Still, special-casing this feature via an explicit apply_alternatives call would be better than moving everything earlier, I think.
We also need to think about how an incoming NMI interacts with concurrent patching of later features. I suspect we want to set the I bit, like you do for WFI, unless you can guarantee that no patched sequences run in NMI context.
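A sketch of that suggestion, assuming a hypothetical wrapper around the existing __apply_alternatives() (the helper name and its placement are illustrative only):

/*
 * Keep the PSR I bit set while patching so that a pseudo-NMI cannot
 * land in the middle of a half-patched alternative sequence.
 */
static void apply_alternatives_masked(void *alt_region)
{
	unsigned long daif;

	asm volatile("mrs %0, daif" : "=r" (daif));		/* save DAIF */
	asm volatile("msr daifset, #2" : : : "memory");		/* mask IRQ and pseudo-NMI */

	__apply_alternatives(alt_region);

	asm volatile("msr daif, %0" : : "r" (daif) : "memory");	/* restore */
}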
Will
On 16/09/15 17:24, Will Deacon wrote:
On Wed, Sep 16, 2015 at 04:51:12PM +0100, Daniel Thompson wrote:
On 16/09/15 14:05, Will Deacon wrote:
On Mon, Sep 14, 2015 at 02:26:17PM +0100, Daniel Thompson wrote:
Currently alternatives are applied very late in the boot process (and a long time after we enable scheduling). Some alternative sequences, such as those that alter the way CPU context is stored, must be applied much earlier in the boot sequence.
Introduce apply_alternatives_early() to allow some alternatives to be applied immediately after we detect the CPU features of the boot CPU.
Currently apply_alternatives_all() is not optimized and will re-patch code that has already been updated. This is harmless but could be removed by adding extra flags to the alternatives store.
Signed-off-by: Daniel Thompson daniel.thompson@linaro.org
[snip]
 /*
+ * This is called very early in the boot process (directly after we run
+ * a feature detect on the boot CPU). No need to worry about other CPUs
+ * here.
+ */
+void apply_alternatives_early(void)
+{
+	struct alt_region region = {
+		.begin	= __alt_instructions,
+		.end	= __alt_instructions_end,
+	};
+
+	__apply_alternatives(&region);
+}
How do you choose which alternatives are applied early and which are applied later? AFAICT, this just applies everything before we've established the capabilities of the CPUs in the system, which could cause problems for big/little SoCs.
They are applied twice. This relies for correctness on the fact that cpufeatures can be set but not unset.
In other words the boot CPU does a feature detect and, as a result, a subset of the required alternatives will be applied. However, after this the other CPUs will boot and the remaining alternatives will be applied as before.
The current implementation is inefficient (because it will redundantly patch the same code twice) but I don't think it is broken.
What about a big/little system where we boot on the big cores and only they support LSE atomics?
Hmmnn... I don't think this patch will impact that.
Once something in the boot sequence calls cpus_set_cap() then if there is a corresponding alternative then it is *going* to be applied isn't it? The patch only means that some of the alternatives will be applied early. Once the boot is complete the patched .text should be the same with and without the patch.
Have I overlooked some code in the current kernel that prevents a system with mis-matched LSE support from applying the alternatives?
Also, why do we need this for the NMI?
I was/am concerned that a context saved before the alternatives are applied might be restored afterwards. If that happens, the bit that indicates what value to put into the PMR would be read during the restore without having been saved first. Applying early ensures that the context save/restore code is updated before it is ever used.
Damn, and stop_machine makes use of local_irq_restore immediately after the patching has completed, so it's a non-starter. Still, special-casing this feature via an explicit apply_alternatives call would be better than moving everything earlier, I think.
Can you expand on your concerns here? Assuming I didn't miss anything about how the current machinery works, it really is only a matter of whether applying some alternatives early could harm the boot sequence. After we have booted the results should be the same.
We also need to think about how an incoming NMI interacts with concurrent patching of later features. I suspect we want to set the I bit, like you do for WFI, unless you can guarantee that no patched sequences run in NMI context.
Good point. I'll fix this in the next respin.
Daniel.
On Thu, Sep 17, 2015 at 02:25:56PM +0100, Daniel Thompson wrote:
On 16/09/15 17:24, Will Deacon wrote:
On Wed, Sep 16, 2015 at 04:51:12PM +0100, Daniel Thompson wrote:
On 16/09/15 14:05, Will Deacon wrote:
On Mon, Sep 14, 2015 at 02:26:17PM +0100, Daniel Thompson wrote:
 /*
+ * This is called very early in the boot process (directly after we run
+ * a feature detect on the boot CPU). No need to worry about other CPUs
+ * here.
+ */
+void apply_alternatives_early(void)
+{
+	struct alt_region region = {
+		.begin	= __alt_instructions,
+		.end	= __alt_instructions_end,
+	};
+
+	__apply_alternatives(&region);
+}
How do you choose which alternatives are applied early and which are applied later? AFAICT, this just applies everything before we've established the capabilities of the CPUs in the system, which could cause problems for big/little SoCs.
They are applied twice. This relies for correctness on the fact that cpufeatures can be set but not unset.
In other words the boot CPU does a feature detect and, as a result, a subset of the required alternatives will be applied. However, after this the other CPUs will boot and the remaining alternatives will be applied as before.
The current implementation is inefficient (because it will redundantly patch the same code twice) but I don't think it is broken.
What about a big/little system where we boot on the big cores and only they support LSE atomics?
Hmmnn... I don't think this patch will impact that.
Once something in the boot sequence calls cpus_set_cap() then if there is a corresponding alternative then it is *going* to be applied isn't it? The patch only means that some of the alternatives will be applied early. Once the boot is complete the patched .text should be the same with and without the patch.
Have I overlooked some code in the current kernel that prevents a system with mis-matched LSE support from applying the alternatives?
Sorry, I'm thinking slightly ahead of myself, but the series from Suzuki creates a shadow "safe" view of the ID registers in the system, corresponding to the intersection of CPU features:
http://lists.infradead.org/pipermail/linux-arm-kernel/2015-September/370386....
In this case, it is necessary to inspect all of the possible CPUs before we can apply the patching, but as I say above, I'm prepared to make an exception for NMI because I don't think we can assume a safe value anyway for a system with mismatched GIC CPU interfaces. I just don't want to drag all of the alternatives patching earlier as well.
We also need to think about how an incoming NMI interacts with concurrent patching of later features. I suspect we want to set the I bit, like you do for WFI, unless you can guarantee that no patched sequences run in NMI context.
Good point. I'll fix this in the next respin.
Great, thanks. It probably also means that the NMI code needs __kprobes/__notrace annotations for similar reasons.
Will
On 17/09/15 15:01, Will Deacon wrote:
On Thu, Sep 17, 2015 at 02:25:56PM +0100, Daniel Thompson wrote:
On 16/09/15 17:24, Will Deacon wrote:
On Wed, Sep 16, 2015 at 04:51:12PM +0100, Daniel Thompson wrote:
On 16/09/15 14:05, Will Deacon wrote:
On Mon, Sep 14, 2015 at 02:26:17PM +0100, Daniel Thompson wrote:
 /*
+ * This is called very early in the boot process (directly after we run
+ * a feature detect on the boot CPU). No need to worry about other CPUs
+ * here.
+ */
+void apply_alternatives_early(void)
+{
+	struct alt_region region = {
+		.begin	= __alt_instructions,
+		.end	= __alt_instructions_end,
+	};
+
+	__apply_alternatives(&region);
+}
How do you choose which alternatives are applied early and which are applied later? AFAICT, this just applies everything before we've established the capabilities of the CPUs in the system, which could cause problems for big/little SoCs.
They are applied twice. This relies for correctness on the fact that cpufeatures can be set but not unset.
In other words the boot CPU does a feature detect and, as a result, a subset of the required alternatives will be applied. However, after this the other CPUs will boot and the remaining alternatives will be applied as before.
The current implementation is inefficient (because it will redundantly patch the same code twice) but I don't think it is broken.
What about a big/little system where we boot on the big cores and only they support LSE atomics?
Hmmnn... I don't think this patch will impact that.
Once something in the boot sequence calls cpus_set_cap() then if there is a corresponding alternative then it is *going* to be applied isn't it? The patch only means that some of the alternatives will be applied early. Once the boot is complete the patched .text should be the same with and without the patch.
Have I overlooked some code in the current kernel that prevents a system with mis-matched LSE support from applying the alternatives?
Sorry, I'm thinking slightly ahead of myself, but the series from Suzuki creates a shadow "safe" view of the ID registers in the system, corresponding to the intersection of CPU features:
http://lists.infradead.org/pipermail/linux-arm-kernel/2015-September/370386....
In this case, it is necessary to inspect all of the possible CPUs before we can apply the patching, but as I say above, I'm prepared to make an exception for NMI because I don't think we can assume a safe value anyway for a system with mismatched GIC CPU interfaces. I just don't want to drag all of the alternatives patching earlier as well.
Thanks. I'll take a close look at this patch set and work out how to cooperate with it.
However I would like, if I can, to persuade you that we are making an exception for ARM64_HAS_SYSREG_GIC_CPUIF rather than specifically for things that are NMI related. AFAIK all ARMv8 cores have a GIC_CPUIF and the system either has a GICv3+ or it doesn't, so it shouldn't matter which core you check the feature on; it is in the nature of the feature we are detecting that it is safe to patch early.
To some extent this is quibbling about semantics but:
1. Treating this as a general case will put us in a good position if we ever have to deal with an erratum that cannot wait until the system has nearly finished booting.
2. It makes the resulting code very simple because we can just have a bitmask indicating which cpufeatures should be applied early and which late (see the sketch below). That in turn means we don't have to differentiate NMI alternatives from other alternatives (thus avoiding a bunch of new alternative macros).
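A sketch of what that bitmask might look like (the mask name and the extra parameter are hypothetical; apply_alternatives_all() would simply pass a mask with every bit set):

#include <linux/bitops.h>

/* features that are safe to patch from the boot CPU alone (sketch) */
#define EARLY_APPLY_FEATURES	BIT(ARM64_HAS_SYSREG_GIC_CPUIF)

static void __apply_alternatives(void *alt_region, unsigned long feature_mask)
{
	struct alt_instr *alt;
	struct alt_region *region = alt_region;

	for (alt = region->begin; alt < region->end; alt++) {
		if (!(feature_mask & BIT(alt->cpufeature)))
			continue;
		if (!cpus_have_cap(alt->cpufeature))
			continue;

		/* ... patch as before ... */
	}
}

void __init apply_alternatives_early(void)
{
	struct alt_region region = {
		.begin	= __alt_instructions,
		.end	= __alt_instructions_end,
	};

	__apply_alternatives(&region, EARLY_APPLY_FEATURES);
}

Late patching would then pass ~0UL (or a mask of the remaining features), so nothing has to distinguish NMI alternatives from any other kind.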
I'm not seeking any kind of binding agreement from you before you see the patch but if you *know* right now that you would nack something that follows the above thinking then please let me know so I don't waste time writing it ;-). If you're on the fence I'll happily write the patch and you can see what I think then.
We also need to think about how an incoming NMI interacts with concurrent patching of later features. I suspect we want to set the I bit, like you do for WFI, unless you can guarantee that no patched sequences run in NMI context.
Good point. I'll fix this in the next respin.
Great, thanks. It probably also means that the NMI code needs __kprobes/__notrace annotations for similar reasons.
Oops. That I really should have thought about already (but I didn't).
Daniel.
On Thu, Sep 17, 2015 at 04:28:11PM +0100, Daniel Thompson wrote:
On 17/09/15 15:01, Will Deacon wrote:
Sorry, I'm thinking slightly ahead of myself, but the series from Suzuki creates a shadow "safe" view of the ID registers in the system, corresponding to the intersection of CPU features:
http://lists.infradead.org/pipermail/linux-arm-kernel/2015-September/370386....
In this case, it is necessary to inspect all of the possible CPUs before we can apply the patching, but as I say above, I'm prepared to make an exception for NMI because I don't think we can assume a safe value anyway for a system with mismatched GIC CPU interfaces. I just don't want to drag all of the alternatives patching earlier as well.
Thanks. I'll take a close look at this patch set and work out how to cooperate with it.
Brill, thanks.
However I would like, if I can, to persuade you that we are making an exception for ARM64_HAS_SYSREG_GIC_CPUIF rather than specifically for things that are NMI related.
Sure, I conflated the two above.
AFAIK all ARMv8 cores have a GIC_CPUIF and the system either has a GICv3+ or it doesn't so it shouldn't matter what core you check the feature on; it is in the nature of the feature we are detecting that it is safe to patch early.
I'm not at all convinced that it's not possible to build something with mismatched CPU interfaces, but that's not something we can support in Linux without significant rework of the GIC code, so we can ignore that possibility for now.
To some extent this is quibbling about semantics but:
1. Treating this as a general case will put us in a good position if we ever have to deal with an erratum that cannot wait until the system has nearly finished booting.
2. It makes the resulting code very simple because we can just have a bitmask indicating which cpufeatures should be applied early and which late. That in turn means we don't have to differentiate NMI alternatives from other alternatives (thus avoiding a bunch of new alternative macros).
I'm not seeking any kind of binding agreement from you before you see the patch but if you *know* right now that you would nack something that follows the above thinking then please let me know so I don't waste time writing it ;-). If you're on the fence I'll happily write the patch and you can see what I think then.
I don't object to the early patching if it's done on an opt-in basis for features that (a) really need it and (b) are guaranteed to work across the whole system for anything that Linux supports.
Deal? I think it gives you the rope you need :)
Will
Separate out the local fiq & async macros from the various arch inlines. This makes it easier for us (in a later patch) to provide an alternative implementation of these inlines.
Signed-off-by: Daniel Thompson daniel.thompson@linaro.org
---
 arch/arm64/include/asm/irqflags.h | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/arch/arm64/include/asm/irqflags.h b/arch/arm64/include/asm/irqflags.h
index 11cc941bd107..df7477af6389 100644
--- a/arch/arm64/include/asm/irqflags.h
+++ b/arch/arm64/include/asm/irqflags.h
@@ -53,12 +53,6 @@ static inline void arch_local_irq_disable(void)
 		: "memory");
 }
-#define local_fiq_enable()	asm("msr daifclr, #1" : : : "memory")
-#define local_fiq_disable()	asm("msr daifset, #1" : : : "memory")
-
-#define local_async_enable()	asm("msr daifclr, #4" : : : "memory")
-#define local_async_disable()	asm("msr daifset, #4" : : : "memory")
-
 /*
  * Save the current interrupt enable state.
  */
@@ -90,6 +84,12 @@ static inline int arch_irqs_disabled_flags(unsigned long flags)
 	return flags & PSR_I_BIT;
 }
+#define local_fiq_enable()	asm("msr daifclr, #1" : : : "memory")
+#define local_fiq_disable()	asm("msr daifset, #1" : : : "memory")
+
+#define local_async_enable()	asm("msr daifclr, #4" : : : "memory")
+#define local_async_disable()	asm("msr daifset, #4" : : : "memory")
+
 /*
  * save and restore debug state
  */
Currently irqflags is implemented using the PSR's I bit. It is possible to implement irqflags by using the co-processor interface to the GIC. Using the co-processor interface makes it feasible to simulate NMIs using GIC interrupt prioritization.
This patch changes the irqflags macros to modify, save and restore ICC_PMR_EL1. This has a substantial knock-on effect for the rest of the kernel. There are four reasons for this:
1. The state of the ICC_PMR_EL1_G_BIT becomes part of the CPU context and must be saved and restored during traps. To simplify the additional context management the ICC_PMR_EL1_G_BIT is converted into a fake (reserved) bit within the PSR (PSR_G_BIT); see the sketch after this list. Naturally this approach will need to be changed if future ARM architecture extensions make use of this bit.
2. The hardware automatically masks interrupts using the I bit (at boot, during traps, etc.). When the I bit is set by hardware we must add code to switch from I bit masking to PMR masking.
3. Some instructions, notably wfi, require that the PMR not be used for interrupt masking. Before calling these instructions we must switch from PMR masking to I bit masking.
4. We use the alternatives system to allow a single kernel to boot and switch between the different masking approaches.
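Points 1 and 2 are easier to see in C than in the entry.S assembler, so here is the same bookkeeping as a pair of illustrative pseudo-helpers (the real work is done in the kernel_entry/kernel_exit paths, using the constants this patch adds to ptrace.h):

/* on exception entry: stash the (inverted) PMR mask bit in a PSTATE RES0 bit */
static inline unsigned long fold_pmr_into_pstate(unsigned long pstate, unsigned long pmr)
{
	unsigned long g = (pmr & ICC_PMR_EL1_G_BIT) << PSR_G_PMR_G_SHIFT;

	return pstate | (g ^ PSR_G_BIT);
}

/* on exception return: recover the PMR value to write back (0xf0 unmasked, 0xb0 masked) */
static inline unsigned long unfold_pmr_from_pstate(unsigned long pstate)
{
	unsigned long g = (pstate & PSR_G_BIT) >> PSR_G_PMR_G_SHIFT;

	return g ^ ICC_PMR_EL1_UNMASKED;
}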
Signed-off-by: Daniel Thompson daniel.thompson@linaro.org
---
 arch/arm64/Kconfig                 |  15 ++++
 arch/arm64/include/asm/assembler.h |  33 ++++++++++--
 arch/arm64/include/asm/irqflags.h  | 107 +++++++++++++++++++++++++++++++++++++
 arch/arm64/include/asm/ptrace.h    |  18 +++++++
 arch/arm64/kernel/entry.S          |  75 ++++++++++++++++++++++----
 arch/arm64/kernel/head.S           |  35 ++++++++++++
 arch/arm64/kernel/setup.c          |   6 +++
 arch/arm64/mm/proc.S               |  23 ++++++++
 drivers/irqchip/irq-gic-v3.c       |  35 +++++++++++-
 include/linux/irqchip/arm-gic-v3.h |   8 +++
 include/linux/irqchip/arm-gic.h    |   2 +-
 11 files changed, 341 insertions(+), 16 deletions(-)
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 7d95663c0160..72fd87fa17a9 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -490,6 +490,21 @@ config FORCE_MAX_ZONEORDER default "14" if (ARM64_64K_PAGES && TRANSPARENT_HUGEPAGE) default "11"
+config USE_ICC_SYSREGS_FOR_IRQFLAGS + bool "Use ICC system registers for IRQ masking" + select CONFIG_ARM_GIC_V3 + help + Using the ICC system registers for IRQ masking makes it possible + to simulate NMI on ARM64 systems. This allows several interesting + features (especially debug features) to be used on these systems. + + Say Y here to implement IRQ masking using ICC system + registers. This will result in an unbootable kernel if these + registers are not implemented or made inaccessible by the + EL3 firmare or EL2 hypervisor (if present). + + If unsure, say N + menuconfig ARMV8_DEPRECATED bool "Emulate deprecated/obsolete ARMv8 instructions" depends on COMPAT diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h index b51f2cc22ca9..ab7c3ffd6104 100644 --- a/arch/arm64/include/asm/assembler.h +++ b/arch/arm64/include/asm/assembler.h @@ -23,6 +23,9 @@ #ifndef __ASM_ASSEMBLER_H #define __ASM_ASSEMBLER_H
+#include <linux/irqchip/arm-gic-v3.h> +#include <asm/alternative.h> +#include <asm/cpufeature.h> #include <asm/ptrace.h> #include <asm/thread_info.h>
@@ -41,12 +44,30 @@ /* * Enable and disable interrupts. */ - .macro disable_irq + .macro disable_irq, tmp +#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS + mov \tmp, #ICC_PMR_EL1_MASKED +alternative_if_not ARM64_HAS_SYSREG_GIC_CPUIF msr daifset, #2 +alternative_else + msr_s ICC_PMR_EL1, \tmp +alternative_endif +#else + msr daifset, #2 +#endif .endm
- .macro enable_irq + .macro enable_irq, tmp +#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS + mov \tmp, #ICC_PMR_EL1_UNMASKED +alternative_if_not ARM64_HAS_SYSREG_GIC_CPUIF + msr daifclr, #2 +alternative_else + msr_s ICC_PMR_EL1, \tmp +alternative_endif +#else msr daifclr, #2 +#endif .endm
/* @@ -78,13 +99,19 @@ 9990: .endm
+ /* * Enable both debug exceptions and interrupts. This is likely to be * faster than two daifclr operations, since writes to this register * are self-synchronising. */ - .macro enable_dbg_and_irq + .macro enable_dbg_and_irq, tmp +#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS + enable_dbg + enable_irq \tmp +#else msr daifclr, #(8 | 2) +#endif .endm
/* diff --git a/arch/arm64/include/asm/irqflags.h b/arch/arm64/include/asm/irqflags.h index df7477af6389..cf8a5184fce7 100644 --- a/arch/arm64/include/asm/irqflags.h +++ b/arch/arm64/include/asm/irqflags.h @@ -18,8 +18,14 @@
#ifdef __KERNEL__
+#include <linux/irqchip/arm-gic-v3.h> + +#include <asm/alternative.h> +#include <asm/cpufeature.h> #include <asm/ptrace.h>
+#ifndef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS + /* * CPU interrupt mask handling. */ @@ -84,6 +90,107 @@ static inline int arch_irqs_disabled_flags(unsigned long flags) return flags & PSR_I_BIT; }
+static inline void maybe_switch_to_sysreg_gic_cpuif(void) {} + +#else /* CONFIG_IRQFLAGS_GIC_MASKING */ + +/* + * CPU interrupt mask handling. + */ +static inline unsigned long arch_local_irq_save(void) +{ + unsigned long flags, masked = ICC_PMR_EL1_MASKED; + + asm volatile(ALTERNATIVE( + "mrs %0, daif // arch_local_irq_save\n" + "msr daifset, #2", + /* --- */ + "mrs_s %0, " __stringify(ICC_PMR_EL1) "\n" + "msr_s " __stringify(ICC_PMR_EL1) ",%1", + ARM64_HAS_SYSREG_GIC_CPUIF) + : "=&r" (flags) + : "r" (masked) + : "memory"); + + return flags; +} + +static inline void arch_local_irq_enable(void) +{ + unsigned long unmasked = ICC_PMR_EL1_UNMASKED; + + asm volatile(ALTERNATIVE( + "msr daifclr, #2 // arch_local_irq_enable", + "msr_s " __stringify(ICC_PMR_EL1) ",%0", + ARM64_HAS_SYSREG_GIC_CPUIF) + : + : "r" (unmasked) + : "memory"); +} + +static inline void arch_local_irq_disable(void) +{ + unsigned long masked = ICC_PMR_EL1_MASKED; + + asm volatile(ALTERNATIVE( + "msr daifset, #2 // arch_local_irq_disable", + "msr_s " __stringify(ICC_PMR_EL1) ",%0", + ARM64_HAS_SYSREG_GIC_CPUIF) + : + : "r" (masked) + : "memory"); +} + +/* + * Save the current interrupt enable state. + */ +static inline unsigned long arch_local_save_flags(void) +{ + unsigned long flags; + + asm volatile(ALTERNATIVE( + "mrs %0, daif // arch_local_save_flags", + "mrs_s %0, " __stringify(ICC_PMR_EL1), + ARM64_HAS_SYSREG_GIC_CPUIF) + : "=r" (flags) + : + : "memory"); + + return flags; +} + +/* + * restore saved IRQ state + */ +static inline void arch_local_irq_restore(unsigned long flags) +{ + asm volatile(ALTERNATIVE( + "msr daif, %0 // arch_local_irq_restore", + "msr_s " __stringify(ICC_PMR_EL1) ",%0", + ARM64_HAS_SYSREG_GIC_CPUIF) + : + : "r" (flags) + : "memory"); +} + +static inline int arch_irqs_disabled_flags(unsigned long flags) +{ + asm volatile(ALTERNATIVE( + "and %0, %0, #" __stringify(PSR_I_BIT) "\n" + "nop", + /* --- */ + "and %0, %0, # " __stringify(ICC_PMR_EL1_G_BIT) "\n" + "eor %0, %0, # " __stringify(ICC_PMR_EL1_G_BIT), + ARM64_HAS_SYSREG_GIC_CPUIF) + : "+r" (flags)); + + return flags; +} + +void maybe_switch_to_sysreg_gic_cpuif(void); + +#endif /* CONFIG_IRQFLAGS_GIC_MASKING */ + #define local_fiq_enable() asm("msr daifclr, #1" : : : "memory") #define local_fiq_disable() asm("msr daifset, #1" : : : "memory")
diff --git a/arch/arm64/include/asm/ptrace.h b/arch/arm64/include/asm/ptrace.h index 536274ed292e..2cd29f2b957d 100644 --- a/arch/arm64/include/asm/ptrace.h +++ b/arch/arm64/include/asm/ptrace.h @@ -25,6 +25,24 @@ #define CurrentEL_EL1 (1 << 2) #define CurrentEL_EL2 (2 << 2)
+/* PMR values used to mask/unmask interrupts */ +#define ICC_PMR_EL1_G_SHIFT 6 +#define ICC_PMR_EL1_G_BIT (1 << ICC_PMR_EL1_G_SHIFT) +#define ICC_PMR_EL1_UNMASKED 0xf0 +#define ICC_PMR_EL1_MASKED (ICC_PMR_EL1_UNMASKED ^ ICC_PMR_EL1_G_BIT) + +/* + * This is the GIC interrupt mask bit. It is not actually part of the + * PSR and so does not appear in the user API, we are simply using some + * reserved bits in the PSR to store some state from the interrupt + * controller. The context save/restore functions will extract the + * ICC_PMR_EL1_G_BIT and save it as the PSR_G_BIT. + */ +#define PSR_G_BIT 0x00400000 +#define PSR_G_SHIFT 22 +#define PSR_G_PMR_G_SHIFT (PSR_G_SHIFT - ICC_PMR_EL1_G_SHIFT) +#define PSR_I_PMR_G_SHIFT (7 - ICC_PMR_EL1_G_SHIFT) + /* AArch32-specific ptrace requests */ #define COMPAT_PTRACE_GETREGS 12 #define COMPAT_PTRACE_SETREGS 13 diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S index 4306c937b1ff..ccbe867c7734 100644 --- a/arch/arm64/kernel/entry.S +++ b/arch/arm64/kernel/entry.S @@ -20,6 +20,7 @@
#include <linux/init.h> #include <linux/linkage.h> +#include <linux/irqchip/arm-gic-v3.h>
#include <asm/alternative.h> #include <asm/assembler.h> @@ -96,6 +97,26 @@ .endif mrs x22, elr_el1 mrs x23, spsr_el1 +#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS + /* + * Save the context held in the PMR register and copy the current + * I bit state to the PMR. Re-enable of the I bit happens in later + * code that knows what type of trap we are handling. + */ +alternative_if_not ARM64_HAS_SYSREG_GIC_CPUIF + b 1f +alternative_else + mrs_s x20, ICC_PMR_EL1 // Get PMR +alternative_endif + and x20, x20, #ICC_PMR_EL1_G_BIT // Extract mask bit + lsl x20, x20, #PSR_G_PMR_G_SHIFT // Shift to a PSTATE RES0 bit + eor x20, x20, #PSR_G_BIT // Invert bit + orr x23, x20, x23 // Store PMR within PSTATE + mov x20, #ICC_PMR_EL1_MASKED + msr_s ICC_PMR_EL1, x20 // Mask normal interrupts at PMR +1: +#endif + stp lr, x21, [sp, #S_LR] stp x22, x23, [sp, #S_PC]
@@ -141,6 +162,22 @@ alternative_else alternative_endif #endif .endif +#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS + /* + * Restore the context to the PMR (and ensure the reserved bit is + * restored to zero before being copied back to the PSR). + */ +alternative_if_not ARM64_HAS_SYSREG_GIC_CPUIF + b 1f +alternative_else + and x20, x22, #PSR_G_BIT // Get stolen PSTATE bit +alternative_endif + and x22, x22, #~PSR_G_BIT // Clear stolen bit + lsr x20, x20, #PSR_G_PMR_G_SHIFT // Shift back to PMR mask + eor x20, x20, #ICC_PMR_EL1_UNMASKED // x20 gets 0xf0 or 0xb0 + msr_s ICC_PMR_EL1, x20 // Write to PMR +1: +#endif msr elr_el1, x21 // set up the return data msr spsr_el1, x22 ldp x0, x1, [sp, #16 * 0] @@ -306,14 +343,30 @@ el1_da: mrs x0, far_el1 enable_dbg // re-enable interrupts if they were enabled in the aborted context +#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS +alternative_if_not ARM64_HAS_SYSREG_GIC_CPUIF tbnz x23, #7, 1f // PSR_I_BIT - enable_irq + nop + nop + msr daifclr, #2 +1: +alternative_else + tbnz x23, #PSR_G_SHIFT, 1f // PSR_G_BIT + mov x2, #ICC_PMR_EL1_UNMASKED + msr_s ICC_PMR_EL1, x2 + msr daifclr, #2 1: +alternative_endif +#else + tbnz x23, #7, 1f // PSR_I_BIT + enable_irq x2 +1: +#endif mov x2, sp // struct pt_regs bl do_mem_abort
// disable interrupts before pulling preserved data off the stack - disable_irq + disable_irq x21 kernel_exit 1 el1_sp_pc: /* @@ -466,7 +519,7 @@ el0_da: */ mrs x26, far_el1 // enable interrupts before calling the main handler - enable_dbg_and_irq + enable_dbg_and_irq x0 ct_user_exit bic x0, x26, #(0xff << 56) mov x1, x25 @@ -479,7 +532,7 @@ el0_ia: */ mrs x26, far_el1 // enable interrupts before calling the main handler - enable_dbg_and_irq + enable_dbg_and_irq x0 ct_user_exit mov x0, x26 orr x1, x25, #1 << 24 // use reserved ISS bit for instruction aborts @@ -512,7 +565,7 @@ el0_sp_pc: */ mrs x26, far_el1 // enable interrupts before calling the main handler - enable_dbg_and_irq + enable_dbg_and_irq x0 ct_user_exit mov x0, x26 mov x1, x25 @@ -524,7 +577,7 @@ el0_undef: * Undefined instruction */ // enable interrupts before calling the main handler - enable_dbg_and_irq + enable_dbg_and_irq x0 ct_user_exit mov x0, sp bl do_undefinstr @@ -605,7 +658,7 @@ ENDPROC(cpu_switch_to) * and this includes saving x0 back into the kernel stack. */ ret_fast_syscall: - disable_irq // disable interrupts + disable_irq x21 // disable interrupts str x0, [sp, #S_X0] // returned x0 ldr x1, [tsk, #TI_FLAGS] // re-check for syscall tracing and x2, x1, #_TIF_SYSCALL_WORK @@ -615,7 +668,7 @@ ret_fast_syscall: enable_step_tsk x1, x2 kernel_exit 0 ret_fast_syscall_trace: - enable_irq // enable interrupts + enable_irq x0 // enable interrupts b __sys_trace_return_skipped // we already saved x0
/* @@ -628,7 +681,7 @@ work_pending: mov x0, sp // 'regs' tst x2, #PSR_MODE_MASK // user mode regs? b.ne no_work_pending // returning to kernel - enable_irq // enable interrupts for do_notify_resume() + enable_irq x21 // enable interrupts for do_notify_resume() bl do_notify_resume b ret_to_user work_resched: @@ -638,7 +691,7 @@ work_resched: * "slow" syscall return path. */ ret_to_user: - disable_irq // disable interrupts + disable_irq x21 // disable interrupts ldr x1, [tsk, #TI_FLAGS] and x2, x1, #_TIF_WORK_MASK cbnz x2, work_pending @@ -669,7 +722,7 @@ el0_svc: mov sc_nr, #__NR_syscalls el0_svc_naked: // compat entry point stp x0, scno, [sp, #S_ORIG_X0] // save the original x0 and syscall number - enable_dbg_and_irq + enable_dbg_and_irq x16 ct_user_exit 1
ldr x16, [tsk, #TI_FLAGS] // check for syscall hooks diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S index a055be6125cf..5c27fd2a15c4 100644 --- a/arch/arm64/kernel/head.S +++ b/arch/arm64/kernel/head.S @@ -427,6 +427,38 @@ __create_page_tables: ENDPROC(__create_page_tables) .ltorg
+#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS +/* + * void maybe_switch_to_sysreg_gic_cpuif(void) + * + * Enable interrupt controller system register access if this feature + * has been detected by the alternatives system. + * + * Before we jump into generic code we must enable interrupt controller system + * register access because this is required by the irqflags macros. We must + * also mask interrupts at the PMR and unmask them within the PSR. That leaves + * us set up and ready for the kernel to make its first call to + * arch_local_irq_enable(). + + * + */ +ENTRY(maybe_switch_to_sysreg_gic_cpuif) +alternative_if_not ARM64_HAS_SYSREG_GIC_CPUIF + b 1f +alternative_else + mrs_s x0, ICC_SRE_EL1 +alternative_endif + orr x0, x0, #1 + msr_s ICC_SRE_EL1, x0 // Set ICC_SRE_EL1.SRE==1 + isb // Make sure SRE is now set + mov x0, ICC_PMR_EL1_MASKED + msr_s ICC_PMR_EL1, x0 // Prepare for unmask of I bit + msr daifclr, #2 // Clear the I bit +1: + ret +ENDPROC(maybe_switch_to_sysreg_gic_cpuif) +#endif + /* * The following fragment of code is executed with the MMU enabled. */ @@ -613,6 +645,9 @@ ENDPROC(secondary_startup) ENTRY(__secondary_switched) ldr x0, [x21] // get secondary_data.stack mov sp, x0 +#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS + bl maybe_switch_to_sysreg_gic_cpuif +#endif mov x29, #0 b secondary_start_kernel ENDPROC(__secondary_switched) diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c index 0cddc5ff8089..f1685db4255a 100644 --- a/arch/arm64/kernel/setup.c +++ b/arch/arm64/kernel/setup.c @@ -218,6 +218,12 @@ static void __init setup_processor(void) apply_alternatives_early();
/* + * Conditionally switch to GIC PMR for interrupt masking + * (this will be a nop if we are using normal interrupt masking) + */ + maybe_switch_to_sysreg_gic_cpuif(); + + /* * Check for sane CTR_EL0.CWG value. */ cwg = cache_type_cwg(); diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S index e4ee7bd8830a..b53c5de75479 100644 --- a/arch/arm64/mm/proc.S +++ b/arch/arm64/mm/proc.S @@ -20,6 +20,7 @@
#include <linux/init.h> #include <linux/linkage.h> +#include <linux/irqchip/arm-gic-v3.h> #include <asm/assembler.h> #include <asm/asm-offsets.h> #include <asm/hwcap.h> @@ -45,11 +46,33 @@ * cpu_do_idle() * * Idle the processor (wait for interrupt). + * + * If CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS is set we must do additional + * work to ensure that interrupts are not masked at the PMR (because the + * core will not wake up if we block the wake up signal in the interrupt + * controller). */ ENTRY(cpu_do_idle) +#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS +alternative_if_not ARM64_HAS_SYSREG_GIC_CPUIF +#endif + dsb sy // WFI may enter a low-power mode + wfi + ret +#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS +alternative_else + mrs x0, daif // save I bit + msr daifset, #2 // set I bit + mrs_s x1, ICC_PMR_EL1 // save PMR +alternative_endif + mov x2, #ICC_PMR_EL1_UNMASKED + msr_s ICC_PMR_EL1, x2 // unmask at PMR dsb sy // WFI may enter a low-power mode wfi + msr_s ICC_PMR_EL1, x1 // restore PMR + msr daif, x0 // restore I bit ret +#endif ENDPROC(cpu_do_idle)
#ifdef CONFIG_CPU_PM diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c index b47bd971038e..48cc3dfe1a0a 100644 --- a/drivers/irqchip/irq-gic-v3.c +++ b/drivers/irqchip/irq-gic-v3.c @@ -35,6 +35,12 @@
#include "irq-gic-common.h"
+/* + * Copied from arm-gic.h (which we cannot include here because it conflicts + * with arm-gic-v3.h) + */ +#define GIC_DIST_PRI 0x400 + struct redist_region { void __iomem *redist_base; phys_addr_t phys_base; @@ -117,8 +123,33 @@ static void gic_redist_wait_for_rwp(void) static u64 __maybe_unused gic_read_iar(void) { u64 irqstat; + u64 __maybe_unused daif; + u64 __maybe_unused pmr; + u64 __maybe_unused default_pmr_value = DEFAULT_PMR_VALUE;
+#ifndef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS asm volatile("mrs_s %0, " __stringify(ICC_IAR1_EL1) : "=r" (irqstat)); +#else + /* + * The PMR may be configured to mask interrupts when this code is + * called, thus in order to acknowledge interrupts we must set the + * PMR to its default value before reading from the IAR. + * + * To do this without taking an interrupt we also ensure the I bit + * is set whilst we are interfering with the value of the PMR. + */ + asm volatile( + "mrs %1, daif\n" /* save I bit */ + "msr daifset, #2\n" /* set I bit */ + "mrs_s %2, " __stringify(ICC_PMR_EL1) "\n" /* save PMR */ + "msr_s " __stringify(ICC_PMR_EL1) ",%3\n" /* set PMR */ + "mrs_s %0, " __stringify(ICC_IAR1_EL1) "\n" /* ack int */ + "msr_s " __stringify(ICC_PMR_EL1) ",%2\n" /* restore PMR */ + "isb\n" + "msr daif, %1" /* restore I */ + : "=r" (irqstat), "=&r" (daif), "=&r" (pmr) + : "r" (default_pmr_value)); +#endif return irqstat; }
@@ -149,7 +180,7 @@ static void __maybe_unused gic_write_sgi1r(u64 val) asm volatile("msr_s " __stringify(ICC_SGI1R_EL1) ", %0" : : "r" (val)); }
-static void gic_enable_sre(void) +static void __maybe_unused gic_enable_sre(void) { u64 val;
@@ -500,11 +531,13 @@ static int gic_populate_rdist(void)
static void gic_cpu_sys_reg_init(void) { +#ifndef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS /* Enable system registers */ gic_enable_sre();
/* Set priority mask register */ gic_write_pmr(DEFAULT_PMR_VALUE); +#endif
/* * Some firmwares hand over to the kernel with the BPR changed from diff --git a/include/linux/irqchip/arm-gic-v3.h b/include/linux/irqchip/arm-gic-v3.h index 60cc91749e7d..5672ca0d253a 100644 --- a/include/linux/irqchip/arm-gic-v3.h +++ b/include/linux/irqchip/arm-gic-v3.h @@ -18,6 +18,7 @@ #ifndef __LINUX_IRQCHIP_ARM_GIC_V3_H #define __LINUX_IRQCHIP_ARM_GIC_V3_H
+#include <asm/barrier.h> #include <asm/sysreg.h>
/* @@ -371,6 +372,13 @@ #include <asm/msi.h>
/* + * This header is included from a lot of critical places (including + * from asm/irqflags.h). We must forward declare a few bits and pieces + * needed later in this file to avoid header loops. + */ +struct device_node; + +/* * We need a value to serve as a irq-type for LPIs. Choose one that will * hopefully pique the interest of the reviewer. */ diff --git a/include/linux/irqchip/arm-gic.h b/include/linux/irqchip/arm-gic.h index b8901dfd9e95..64a38b3248cb 100644 --- a/include/linux/irqchip/arm-gic.h +++ b/include/linux/irqchip/arm-gic.h @@ -53,7 +53,7 @@ #define GICD_INT_EN_CLR_X32 0xffffffff #define GICD_INT_EN_SET_SGI 0x0000ffff #define GICD_INT_EN_CLR_PPI 0xffff0000 -#define GICD_INT_DEF_PRI 0xa0 +#define GICD_INT_DEF_PRI 0xc0 #define GICD_INT_DEF_PRI_X4 ((GICD_INT_DEF_PRI << 24) |\ (GICD_INT_DEF_PRI << 16) |\ (GICD_INT_DEF_PRI << 8) |\
Recently arm64 gained the capability to (optionally) mask interrupts using the GIC PMR rather than the CPU PSR. That allows us to introduce an NMI-like means to handle backtrace requests.
This provides a useful debug aid by allowing the kernel to robustly show a backtrace for every processor in the system when, for example, we hang trying to acquire a spin lock.
Signed-off-by: Daniel Thompson daniel.thompson@linaro.org
---
 arch/arm64/include/asm/assembler.h | 23 +++++++++++
 arch/arm64/include/asm/smp.h       |  2 +
 arch/arm64/kernel/entry.S          | 78 ++++++++++++++++++++++++++++++--------
 arch/arm64/kernel/smp.c            | 20 +++++++++-
 drivers/irqchip/irq-gic-v3.c       | 69 +++++++++++++++++++++++++++++++++
 5 files changed, 176 insertions(+), 16 deletions(-)
diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h index ab7c3ffd6104..da6b8d9913de 100644 --- a/arch/arm64/include/asm/assembler.h +++ b/arch/arm64/include/asm/assembler.h @@ -42,6 +42,29 @@ .endm
/* + * Enable and disable pseudo NMI. + */ + .macro disable_nmi +#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS +alternative_if_not ARM64_HAS_SYSREG_GIC_CPUIF + nop +alternative_else + msr daifset, #2 +alternative_endif +#endif + .endm + + .macro enable_nmi +#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS +alternative_if_not ARM64_HAS_SYSREG_GIC_CPUIF + nop +alternative_else + msr daifclr, #2 +alternative_endif +#endif + .endm + +/* * Enable and disable interrupts. */ .macro disable_irq, tmp diff --git a/arch/arm64/include/asm/smp.h b/arch/arm64/include/asm/smp.h index d9c3d6a6100a..fc310b6486b1 100644 --- a/arch/arm64/include/asm/smp.h +++ b/arch/arm64/include/asm/smp.h @@ -20,6 +20,8 @@ #include <linux/cpumask.h> #include <linux/thread_info.h>
+#define SMP_IPI_NMI_MASK (1 << 5) + #define raw_smp_processor_id() (current_thread_info()->cpu)
struct seq_file; diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S index ccbe867c7734..2f4d69f62138 100644 --- a/arch/arm64/kernel/entry.S +++ b/arch/arm64/kernel/entry.S @@ -205,6 +205,40 @@ alternative_endif and \rd, \rd, #~(THREAD_SIZE - 1) // top of stack .endm
+ .macro trace_hardirqs_off, pstate +#ifdef CONFIG_TRACE_IRQFLAGS +#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS +alternative_if_not ARM64_HAS_SYSREG_GIC_CPUIF + bl trace_hardirqs_off + nop +alternative_else + tbnz \pstate, #PSR_G_SHIFT, 1f // PSR_G_BIT + bl trace_hardirqs_off +1: +alternative_endif +#else + bl trace_hardirqs_off +#endif +#endif + .endm + + .macro trace_hardirqs_on, pstate +#ifdef CONFIG_TRACE_IRQFLAGS +#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS +alternative_if_not ARM64_HAS_SYSREG_GIC_CPUIF + bl trace_hardirqs_on + nop +alternative_else + tbnz \pstate, #PSR_G_SHIFT, 1f // PSR_G_BIT + bl trace_hardirqs_on +1: +alternative_endif +#else + bl trace_hardirqs_on +#endif +#endif + .endm + /* * These are the registers used in the syscall handler, and allow us to * have in theory up to 7 arguments to a function - x0 to x6. @@ -341,20 +375,19 @@ el1_da: * Data abort handling */ mrs x0, far_el1 + enable_nmi enable_dbg // re-enable interrupts if they were enabled in the aborted context #ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS alternative_if_not ARM64_HAS_SYSREG_GIC_CPUIF tbnz x23, #7, 1f // PSR_I_BIT nop - nop msr daifclr, #2 1: alternative_else tbnz x23, #PSR_G_SHIFT, 1f // PSR_G_BIT mov x2, #ICC_PMR_EL1_UNMASKED msr_s ICC_PMR_EL1, x2 - msr daifclr, #2 1: alternative_endif #else @@ -367,6 +400,7 @@ alternative_endif
// disable interrupts before pulling preserved data off the stack disable_irq x21 + disable_nmi kernel_exit 1 el1_sp_pc: /* @@ -407,10 +441,14 @@ ENDPROC(el1_sync) el1_irq: kernel_entry 1 enable_dbg -#ifdef CONFIG_TRACE_IRQFLAGS - bl trace_hardirqs_off -#endif + trace_hardirqs_off x23
+ /* + * On systems with CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS we do not + * yet know whether this IRQ is a pseudo-NMI or a normal + * interrupt, so we must rely on the irq_handler to enable the + * NMI once the interrupt type has been determined. + */ irq_handler
#ifdef CONFIG_PREEMPT @@ -422,9 +460,9 @@ el1_irq: bl el1_preempt 1: #endif -#ifdef CONFIG_TRACE_IRQFLAGS - bl trace_hardirqs_on -#endif + + disable_nmi + trace_hardirqs_on x23 kernel_exit 1 ENDPROC(el1_irq)
@@ -519,6 +557,7 @@ el0_da: */ mrs x26, far_el1 // enable interrupts before calling the main handler + enable_nmi enable_dbg_and_irq x0 ct_user_exit bic x0, x26, #(0xff << 56) @@ -532,6 +571,7 @@ el0_ia: */ mrs x26, far_el1 // enable interrupts before calling the main handler + enable_nmi enable_dbg_and_irq x0 ct_user_exit mov x0, x26 @@ -565,6 +605,7 @@ el0_sp_pc: */ mrs x26, far_el1 // enable interrupts before calling the main handler + enable_nmi enable_dbg_and_irq x0 ct_user_exit mov x0, x26 @@ -577,6 +618,7 @@ el0_undef: * Undefined instruction */ // enable interrupts before calling the main handler + enable_nmi enable_dbg_and_irq x0 ct_user_exit mov x0, sp @@ -609,16 +651,18 @@ el0_irq: kernel_entry 0 el0_irq_naked: enable_dbg -#ifdef CONFIG_TRACE_IRQFLAGS - bl trace_hardirqs_off -#endif - + trace_hardirqs_off x23 ct_user_exit + + /* + * On systems with CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS we do not + * yet know whether this IRQ is a pseudo-NMI or a normal + * interrupt, so we must rely on the irq_handler to enable the + * NMI once the interrupt type has been determined. + */ irq_handler
-#ifdef CONFIG_TRACE_IRQFLAGS - bl trace_hardirqs_on -#endif + trace_hardirqs_on x23 b ret_to_user ENDPROC(el0_irq)
@@ -666,6 +710,7 @@ ret_fast_syscall: and x2, x1, #_TIF_WORK_MASK cbnz x2, work_pending enable_step_tsk x1, x2 + disable_nmi kernel_exit 0 ret_fast_syscall_trace: enable_irq x0 // enable interrupts @@ -681,6 +726,7 @@ work_pending: mov x0, sp // 'regs' tst x2, #PSR_MODE_MASK // user mode regs? b.ne no_work_pending // returning to kernel + enable_nmi enable_irq x21 // enable interrupts for do_notify_resume() bl do_notify_resume b ret_to_user @@ -697,6 +743,7 @@ ret_to_user: cbnz x2, work_pending enable_step_tsk x1, x2 no_work_pending: + disable_nmi kernel_exit 0 ENDPROC(ret_to_user)
@@ -722,6 +769,7 @@ el0_svc: mov sc_nr, #__NR_syscalls el0_svc_naked: // compat entry point stp x0, scno, [sp, #S_ORIG_X0] // save the original x0 and syscall number + enable_nmi enable_dbg_and_irq x16 ct_user_exit 1
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c index 0f37a33499e2..d5539291ac55 100644 --- a/arch/arm64/kernel/smp.c +++ b/arch/arm64/kernel/smp.c @@ -804,13 +804,31 @@ int setup_profiling_timer(unsigned int multiplier) return -EINVAL; }
+/* + * IPI_CPU_BACKTRACE is implemented either as a normal IRQ or, + * if the hardware supports it, as a pseudo-NMI. + * + * The mechanism used to implement pseudo-NMI means that in both cases + * testing if the backtrace IPI is disabled requires us to check the + * PSR I bit. However, in the latter case we cannot use irqs_disabled() + * to check the I bit because, when the pseudo-NMI is active, that + * function examines the GIC PMR instead. + */ +static unsigned long nmi_disabled(void) +{ + unsigned long flags; + + asm volatile("mrs %0, daif" : "=r"(flags) :: "memory"); + return flags & PSR_I_BIT; +} + static void raise_nmi(cpumask_t *mask) { /* * Generate the backtrace directly if we are running in a * calling context that is not preemptible by the backtrace IPI. */ - if (cpumask_test_cpu(smp_processor_id(), mask) && irqs_disabled()) + if (cpumask_test_cpu(smp_processor_id(), mask) && nmi_disabled()) nmi_cpu_backtrace(NULL);
smp_cross_call(mask, IPI_CPU_BACKTRACE); diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c index 48cc3dfe1a0a..a389a387c5a6 100644 --- a/drivers/irqchip/irq-gic-v3.c +++ b/drivers/irqchip/irq-gic-v3.c @@ -19,6 +19,7 @@ #include <linux/cpu_pm.h> #include <linux/delay.h> #include <linux/interrupt.h> +#include <linux/nmi.h> #include <linux/of.h> #include <linux/of_address.h> #include <linux/of_irq.h> @@ -409,10 +410,60 @@ static u64 gic_mpidr_to_affinity(u64 mpidr) return aff; }
+#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS +static bool gic_handle_nmi(struct pt_regs *regs) +{ + u64 irqnr; + struct pt_regs *old_regs; + + asm volatile("mrs_s %0, " __stringify(ICC_IAR1_EL1) : "=r"(irqnr)); + + /* + * If no IRQ is acknowledged at this point then we have entered the + * handler due to a normal interrupt (rather than a pseudo-NMI). + * If so, unmask the I-bit and return to normal handling. + */ + if (irqnr == ICC_IAR1_EL1_SPURIOUS) { + asm volatile("msr daifclr, #2" : : : "memory"); + return false; + } + + old_regs = set_irq_regs(regs); + nmi_enter(); + + do { + if (SMP_IPI_NMI_MASK & (1 << irqnr)) { + gic_write_eoir(irqnr); + if (static_key_true(&supports_deactivate)) + gic_write_dir(irqnr); + nmi_cpu_backtrace(regs); + } else if (unlikely(irqnr != ICC_IAR1_EL1_SPURIOUS)) { + gic_write_eoir(irqnr); + if (static_key_true(&supports_deactivate)) + gic_write_dir(irqnr); + WARN_ONCE(true, "Unexpected NMI received!\n"); + } + + asm volatile("mrs_s %0, " __stringify(ICC_IAR1_EL1) + : "=r"(irqnr)); + } while (irqnr != ICC_IAR1_EL1_SPURIOUS); + + nmi_exit(); + set_irq_regs(old_regs); + + return true; +} +#else +static bool gic_handle_nmi(struct pt_regs *regs) { return false; } +#endif + static asmlinkage void __exception_irq_entry gic_handle_irq(struct pt_regs *regs) { u64 irqnr;
+ if (gic_handle_nmi(regs)) + return; + do { irqnr = gic_read_iar();
@@ -567,6 +618,7 @@ static int gic_dist_supports_lpis(void) static void gic_cpu_init(void) { void __iomem *rbase; + unsigned long nmimask, hwirq;
/* Register ourselves with the rest of the world */ if (gic_populate_rdist()) @@ -584,6 +636,23 @@ static void gic_cpu_init(void)
/* initialise system registers */ gic_cpu_sys_reg_init(); + + /* Boost the priority of any IPI in the mask */ + nmimask = SMP_IPI_NMI_MASK; + for_each_set_bit(hwirq, &nmimask, 16) { + unsigned int pri_reg = (hwirq / 4) * 4; + u32 pri_mask = BIT(6 + ((hwirq % 4) * 8)); + u32 pri_val = readl_relaxed(rbase + GIC_DIST_PRI + pri_reg); + u32 actual; + + pri_mask |= BIT(7 + ((hwirq % 4) * 8)); + pri_val &= ~pri_mask; /* priority boost */ + writel_relaxed(pri_val, rbase + GIC_DIST_PRI + pri_reg); + + actual = readl_relaxed(rbase + GIC_DIST_PRI + pri_reg); + } + gic_dist_wait_for_rwp(); + gic_redist_wait_for_rwp(); }
#ifdef CONFIG_SMP
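For reference, here is the priority-boost arithmetic from gic_cpu_init() above worked through for the one bit set in SMP_IPI_NMI_MASK (bit 5, i.e. SGI 5); the numbers follow directly from the code and from the 0xc0 default priority set earlier in the series:

	hwirq     = 5
	pri_reg   = (5 / 4) * 4 = 4       /* the IPRIORITYR word covering SGIs 4-7   */
	byte lane = (5 % 4) * 8 = 8       /* SGI 5 owns bits 15:8 of that word       */
	pri_mask  = BIT(14) | BIT(15)     /* the top two bits of that priority byte  */

Clearing those two bits takes SGI 5's priority field from the 0xc0 default down to 0x00, leaving it deliverable even while normal interrupts are being held off via ICC_PMR_EL1.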
This is self-test code to identify circumstances where the I bit is set by hardware but no software exists to copy its state to the PMR.
I don't really expect this patch to be retained much after the RFC stage. However I have included it in this RFC series to document the testing I have done and to allow further testing under different workloads.
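As an illustration of the failure mode the check looks for, something like the hypothetical snippet below (the function name is invented and it is not part of the patch) should trip the new WARN_ONCE() on a system running with CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS=y and the GIC sysreg alternative applied:

/* Hypothetical smoke test: set PSR.I by hand without reflecting it in the PMR */
static void __init pseudo_nmi_i_bit_smoke_test(void)
{
	asm volatile("msr daifset, #2" : : : "memory"); /* set the I bit directly   */
	local_irq_disable();    /* check_for_i_bit() should warn about the stray I bit */
	asm volatile("msr daifclr, #2" : : : "memory"); /* clear it again           */
	local_irq_enable();
}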
Signed-off-by: Daniel Thompson daniel.thompson@linaro.org --- arch/arm64/include/asm/irqflags.h | 35 +++++++++++++++++++++++++++++++++++ 1 file changed, 35 insertions(+)
diff --git a/arch/arm64/include/asm/irqflags.h b/arch/arm64/include/asm/irqflags.h index cf8a5184fce7..b2998b7946b6 100644 --- a/arch/arm64/include/asm/irqflags.h +++ b/arch/arm64/include/asm/irqflags.h @@ -19,8 +19,10 @@ #ifdef __KERNEL__
#include <linux/irqchip/arm-gic-v3.h> +#include <linux/preempt.h>
#include <asm/alternative.h> +#include <asm/bug.h> #include <asm/cpufeature.h> #include <asm/ptrace.h>
@@ -94,6 +96,33 @@ static inline void maybe_switch_to_sysreg_gic_cpuif(void) {}
#else /* CONFIG_IRQFLAGS_GIC_MASKING */
+static inline void check_for_i_bit(void) +{ +#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS + unsigned long flags; + + /* check whether the I-bit is spuriously enabled */ + if (!in_nmi()) { + asm volatile(ALTERNATIVE( + "mov %0, #0", + "mrs %0, daif", + ARM64_HAS_SYSREG_GIC_CPUIF) + : "=r" (flags)); + + WARN_ONCE(flags & PSR_I_BIT, "I bit is set: %08lx\n", flags); + } + + /* check that the PMR has a legal value */ + asm volatile(ALTERNATIVE( + "mov %0, #" __stringify(ICC_PMR_EL1_MASKED), + "mrs_s %0, " __stringify(ICC_PMR_EL1), + ARM64_HAS_SYSREG_GIC_CPUIF) + : "=r" (flags)); + WARN_ONCE((flags & ICC_PMR_EL1_MASKED) != ICC_PMR_EL1_MASKED, + "ICC_PMR_EL1 has a bad value: %08lx\n", flags); +#endif +} + /* * CPU interrupt mask handling. */ @@ -101,6 +130,7 @@ static inline unsigned long arch_local_irq_save(void) { unsigned long flags, masked = ICC_PMR_EL1_MASKED;
+ check_for_i_bit(); asm volatile(ALTERNATIVE( "mrs %0, daif // arch_local_irq_save\n" "msr daifset, #2", @@ -119,6 +149,7 @@ static inline void arch_local_irq_enable(void) { unsigned long unmasked = ICC_PMR_EL1_UNMASKED;
+ check_for_i_bit(); asm volatile(ALTERNATIVE( "msr daifclr, #2 // arch_local_irq_enable", "msr_s " __stringify(ICC_PMR_EL1) ",%0", @@ -132,6 +163,7 @@ static inline void arch_local_irq_disable(void) { unsigned long masked = ICC_PMR_EL1_MASKED;
+ check_for_i_bit(); asm volatile(ALTERNATIVE( "msr daifset, #2 // arch_local_irq_disable", "msr_s " __stringify(ICC_PMR_EL1) ",%0", @@ -148,6 +180,7 @@ static inline unsigned long arch_local_save_flags(void) { unsigned long flags;
+ check_for_i_bit(); asm volatile(ALTERNATIVE( "mrs %0, daif // arch_local_save_flags", "mrs_s %0, " __stringify(ICC_PMR_EL1), @@ -164,6 +197,7 @@ static inline unsigned long arch_local_save_flags(void) */ static inline void arch_local_irq_restore(unsigned long flags) { + check_for_i_bit(); asm volatile(ALTERNATIVE( "msr daif, %0 // arch_local_irq_restore", "msr_s " __stringify(ICC_PMR_EL1) ",%0", @@ -175,6 +209,7 @@ static inline void arch_local_irq_restore(unsigned long flags)
static inline int arch_irqs_disabled_flags(unsigned long flags) { + check_for_i_bit(); asm volatile(ALTERNATIVE( "and %0, %0, #" __stringify(PSR_I_BIT) "\n" "nop",
(Apologies for top posting)
I think there is a need to connect a few dots on this next week during Connect. Some other conversations have discussed alternative implementations elsewhere. I will assist.
On 18/09/15 06:11, Jon Masters wrote:
On Sep 14, 2015, at 06:26, Daniel Thompson daniel.thompson@linaro.org wrote: This patchset provides a pseudo-NMI for arm64 kernels by reimplementing the irqflags macros to modify the GIC PMR (the priority mask register is accessible as a system register on GICv3 and later) rather than the PSR. The patchset includes an implementation of arch_trigger_all_cpu_backtrace() for arm64 allowing the new code to be exercised.
I think there is a need to connect a few dots on this next week during Connect. Some other conversations have discussed alternative implementations elsewhere. I will assist.
Fine by me.
I'd be very happy to talk about alternative approaches. In the past I've had long conversations about trapping to ARM TF as a means to simulate NMI. I haven't written any code to move in this direction, but I still think of it as being in the future-areas-of-interest pile.
That said, whenever I search for (what I think are) sensible keywords for this subject I generally only find my own work! I may be selecting a rather blinkered set of keywords when I search but nevertheless it does mean I will probably have to rely on you to make introductions!
Daniel.
________________________________________ From: linux-arm-kernel [linux-arm-kernel-bounces@lists.infradead.org] on behalf of Daniel Thompson [daniel.thompson@linaro.org] Sent: 18 September 2015 19:23 To: Jon Masters Cc: linaro-kernel@lists.linaro.org; patches@linaro.org; Marc Zyngier; Catalin Marinas; linux-kernel@vger.kernel.org; Andrew Thoelke; Dave Martin; linux-arm-kernel@lists.infradead.org Subject: Re: [RFC PATCH v2 0/7] Pseudo-NMI for arm64 using ICC_PMR_EL1 (GICv3)
Hi Daniel:
I have checked that trapping to ARM TF could work well as an NMI for aarch64, and maybe we could discuss it. :)
Ding