This patchset provides a pseudo-NMI for arm64 kernels by reimplementing the irqflags macros to modify the GIC PMR (the priority mask register is accessible as a system register on GICv3 and later) rather than the PSR. The pseudo-NMI changes are supported by a prototype implementation of arch_trigger_all_cpu_backtrace that allows the new code to be exercised.
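To give a flavour of the approach before digging into the patches: masking becomes a write of a priority threshold to ICC_PMR_EL1 instead of a PSTATE.I update, so a single higher-priority interrupt can still be taken while ordinary IRQs are masked. The fragment below is purely an illustrative sketch, not the code in this series; the helper names and priority values are invented, and s3_0_c4_c6_0 is the generic assembler encoding of ICC_PMR_EL1 (recent binutils accept this form directly):

	/* Illustrative sketch only -- not the code in this series.
	 * On the GIC, lower numeric priority values are *higher*
	 * priority, so writing a low threshold masks normal IRQs
	 * while still admitting the pseudo-NMI.
	 */
	#define GIC_PRI_MASKED		0x70	/* placeholder: block IRQs, admit pseudo-NMI */
	#define GIC_PRI_UNMASKED	0xf0	/* placeholder: accept everything */

	static inline unsigned long pmr_irq_save(void)
	{
		unsigned long old;

		asm volatile(
			"mrs	%0, s3_0_c4_c6_0\n"	/* read ICC_PMR_EL1 */
			"msr	s3_0_c4_c6_0, %1\n"	/* raise the priority mask */
			"isb\n"				/* PMR writes don't self-synchronize */
			: "=&r" (old)
			: "r" ((unsigned long)GIC_PRI_MASKED)
			: "memory");
		return old;
	}

	static inline void pmr_irq_restore(unsigned long old)
	{
		asm volatile(
			"msr	s3_0_c4_c6_0, %0\n"
			"isb\n"
			: : "r" (old) : "memory");
	}

Note the isb instructions: as discussed under limitation 4 below, PMR updates do not self-synchronize.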
In addition to the arm64 changes I've bundled in a few patches from other patchsets to make the patchset self-contained. Of particular note is the serial break emulation patch, which allows ^B^R^K to be used instead of a serial break to trigger SysRq-L (FVP UART sockets don't seem to support serial breaks). This makes it easy to run arch_trigger_all_cpu_backtrace from an IRQ handler (i.e. somewhere with interrupts masked so we are forced to preempt and take the NMI).
The code works-for-me (tm) but there are currently some pretty serious limitations.
1. Exercised only on the Foundation model with GICv3 support. It has not been exercised on real silicon or even on the more advanced ARM models.

2. It has been written without any documentation describing the GICv3 architecture (which has not yet been released by ARM). I've been guessing about the behaviour based on the ARMv8 and GICv2 architecture specs. The code works on the Foundation model but I cannot check that it conforms architecturally.
3. Requires GICv3+ hardware together with firmware support to enable GICv3 features at EL3. If CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS is enabled the kernel will not boot on older hardware. The failure will be hard to diagnose because we will crash very early in the boot (i.e. before the call to start_kernel). Auto-detection might be possible but the performance and code size cost of adding conditional code to the irqflags macros probably makes it impractical. As such it may never be possible to remove this limitation (although it might be possible to find a way to survive long enough to panic and show the results on the console).
4. No benchmarking (see #1 above). Unlike the PSR, updates to the PMR do not self-synchronize, which requires me to sprinkle isb instructions fairly liberally. I've been told the cost of isb varies from almost-free (A53) to somewhat costly (A57), thus we expect this code to reduce kernel performance. However this needs to be quantified and I am currently unable to do this. I'd really like to but don't have any suitable hardware.
5. There is no code in el1_irq to detect NMI and switch from IRQ to NMI handling. This means all the irq handling machinery is re-entered in order to handle the NMI. This is not safe and deadlocks are likely. This is a severe limitation although, in this proof-of-concept work, NMI can only be triggered by SysRq-L or severe kernel damage. This means we just about get away with it for simple tests (lockdep detects that we are doing something wrong and shows a backtrace). This is definitely the first thing that needs to be tackled to take this code further.
Note also that alternative approaches to implementing a pseudo-NMI on arm64 are possible but only through runtime cooperation with other software components in the system, potentially both those running at EL3 and at secure EL1. I should like to explore these options in future but, as far as I know, this is the only sane way to provide NMI-like features whilst being implementable entirely in non-secure EL1 [1].
[1] Except for a single register write to ICC_SRE_EL3 by the EL3 firmware (and already implemented by ARM Trusted Firmware).
Daniel Thompson (7):
  serial: Emulate break using control characters
  printk: Simple implementation for NMI backtracing
  irqchip: gic-v3: Reset BPR during initialization
  arm64: irqflags: Reorder the fiq & async macros
  arm64: irqflags: Use ICC sysregs to implement IRQ masking
  arm64: irqflags: Automatically identify I bit mis-management
  arm64: Add support for on-demand backtrace of other CPUs
 arch/arm64/Kconfig                   |  16 ++++
 arch/arm64/include/asm/assembler.h   |  72 +++++++++++++++--
 arch/arm64/include/asm/hardirq.h     |   2 +-
 arch/arm64/include/asm/irq.h         |   5 ++
 arch/arm64/include/asm/irqflags.h    | 130 ++++++++++++++++++++++++++++--
 arch/arm64/include/asm/ptrace.h      |  10 +++
 arch/arm64/include/asm/smp.h         |   4 +
 arch/arm64/include/uapi/asm/ptrace.h |   8 ++
 arch/arm64/kernel/entry.S            |  70 ++++++++++++++---
 arch/arm64/kernel/head.S             |  27 +++++++
 arch/arm64/kernel/irq.c              |   6 ++
 arch/arm64/kernel/smp.c              |  70 +++++++++++++++++
 arch/arm64/mm/cache.S                |   4 +-
 arch/arm64/mm/proc.S                 |  19 +++++
 drivers/irqchip/irq-gic-v3.c         |  70 ++++++++++++++++-
 include/linux/irqchip/arm-gic-v3.h   |   6 +-
 include/linux/irqchip/arm-gic.h      |   2 +-
 include/linux/printk.h               |  20 +++++
 include/linux/serial_core.h          |  83 +++++++++++++++-----
 init/Kconfig                         |   3 +
 kernel/printk/Makefile               |   1 +
 kernel/printk/nmi_backtrace.c        | 148 +++++++++++++++++++++++++++++++++++
 lib/Kconfig.debug                    |  15 ++++
 23 files changed, 746 insertions(+), 45 deletions(-)
 create mode 100644 kernel/printk/nmi_backtrace.c
--
2.1.0
Currently the magic SysRq functions can be accessed by sending a break on the serial port. Unfortunately some networked serial proxies make it difficult to send a break, meaning SysRq functions cannot be used. This patch provides a workaround by allowing the (fairly unlikely) sequence of ^B^R^K characters to emulate a real break.
This approach is very nearly as robust as normal sysrq/break handling because all trigger recognition happens during interrupt handling. The only major difference is that to emulate a break we must enter the ISR four times (instead of twice) and manage an extra byte of state.
No means is provided to escape the trigger sequence (and pass ^B^R^K to the underlying process); however, the sequence has proved reasonably collision resistant in practice. The most significant consequence is that ^B and ^B^R are delayed until a new character is observed.
The most significant collision I am aware of is with emacs-like backward-char bindings (^B) because the character movement will become lumpy (two characters every two key presses rather than one character per key press). Arrow keys or ^B^B^F provide workarounds.
Special note for tmux users: tmux defaults to using ^B as its escape character but does not have a default binding for ^B^R. Likewise tmux has no visual indicator showing the beginning of a break sequence, meaning the delayed delivery of ^B is not observable. Thus serial break emulation does not interfere with the use of tmux's default key bindings.
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
---
 include/linux/serial_core.h | 83 +++++++++++++++++++++++++++++++++++----------
 lib/Kconfig.debug           | 15 ++++++++
 2 files changed, 80 insertions(+), 18 deletions(-)
diff --git a/include/linux/serial_core.h b/include/linux/serial_core.h
index cf9c2ce9479d..56f8e3daf42c 100644
--- a/include/linux/serial_core.h
+++ b/include/linux/serial_core.h
@@ -160,6 +160,9 @@ struct uart_port {
 	struct console		*cons;		/* struct console, if any */
 #if defined(CONFIG_SERIAL_CORE_CONSOLE) || defined(SUPPORT_SYSRQ)
 	unsigned long		sysrq;		/* sysrq timeout */
+#ifdef CONFIG_MAGIC_SYSRQ_BREAK_EMULATION
+	char			sysrq_emul;	/* emulation state */
+#endif
 #endif
 
 	/* flags must be updated while holding port mutex */
@@ -420,24 +423,6 @@ extern void uart_handle_cts_change(struct uart_port *uport,
 extern void uart_insert_char(struct uart_port *port, unsigned int status,
 		 unsigned int overrun, unsigned int ch, unsigned int flag);
 
-#ifdef SUPPORT_SYSRQ
-static inline int
-uart_handle_sysrq_char(struct uart_port *port, unsigned int ch)
-{
-	if (port->sysrq) {
-		if (ch && time_before(jiffies, port->sysrq)) {
-			handle_sysrq(ch);
-			port->sysrq = 0;
-			return 1;
-		}
-		port->sysrq = 0;
-	}
-	return 0;
-}
-#else
-#define uart_handle_sysrq_char(port,ch) ({ (void)port; 0; })
-#endif
-
 /*
  * We do the SysRQ and SAK checking like this...
  */
@@ -462,6 +447,68 @@ static inline int uart_handle_break(struct uart_port *port)
 	return 0;
 }
 
+#if defined(SUPPORT_SYSRQ) && defined(CONFIG_MAGIC_SYSRQ_BREAK_EMULATION)
+/*
+ * Emulate a break if we are the serial console and receive ^B, ^R, ^K.
+ */
+static inline int
+uart_handle_sysrq_break_emulation(struct uart_port *port, unsigned int ch)
+{
+	const unsigned int ctrlb = 'B' & 31;
+	const unsigned int ctrlr = 'R' & 31;
+	const unsigned int ctrlk = 'K' & 31;
+
+	if (uart_console(port)) {
+		if ((port->sysrq_emul == 0 && ch == ctrlb) ||
+		    (port->sysrq_emul == ctrlb && ch == ctrlr)) {
+			/* for either of the first two trigger characters
+			 * update the state variable and move on.
+			 */
+			port->sysrq_emul = ch;
+			return 1;
+		} else if (port->sysrq_emul == ctrlr && ch == ctrlk &&
+			   uart_handle_break(port)) {
+			/* the break has already been emulated whilst
+			 * evaluating the condition, tidy up and move on
+			 */
+			port->sysrq_emul = 0;
+			return 1;
+		}
+	}
+
+	if (port->sysrq_emul) {
+		/* received a partial (false) trigger, tidy up and move on */
+		uart_insert_char(port, 0, 0, ctrlb, TTY_NORMAL);
+		if (port->sysrq_emul == ctrlr)
+			uart_insert_char(port, 0, 0, ctrlr, TTY_NORMAL);
+		port->sysrq_emul = 0;
+	}
+
+	return 0;
+}
+#else
+#define uart_handle_sysrq_break_emulation(port, ch) ({ (void)port; 0; })
+#endif
+
+#ifdef SUPPORT_SYSRQ
+static inline int
+uart_handle_sysrq_char(struct uart_port *port, unsigned int ch)
+{
+	if (port->sysrq) {
+		if (ch && time_before(jiffies, port->sysrq)) {
+			handle_sysrq(ch);
+			port->sysrq = 0;
+			return 1;
+		}
+		port->sysrq = 0;
+	}
+
+	return uart_handle_sysrq_break_emulation(port, ch);
+}
+#else
+#define uart_handle_sysrq_char(port, ch) ({ (void)port; 0; })
+#endif
+
 /*
  * UART_ENABLE_MS - determine if port should enable modem status irqs
  */
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index c2d51af327bc..3f54e85c27d2 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -372,6 +372,21 @@ config MAGIC_SYSRQ_DEFAULT_ENABLE
 	  This may be set to 1 or 0 to enable or disable them all, or
 	  to a bitmask as described in Documentation/sysrq.txt.
 
+config MAGIC_SYSRQ_BREAK_EMULATION
+	bool "Enable magic SysRq serial break emulation"
+	depends on MAGIC_SYSRQ && SERIAL_CORE_CONSOLE
+	default n
+	help
+	  If you say Y here, then you can use the character sequence
+	  ^B^R^K to simulate a BREAK on the serial console. This is
+	  useful if for some reason you cannot send a BREAK to your
+	  console's serial port. For example, if you have a serial
+	  device server that cannot send a BREAK. Enabling this
+	  feature can delay the delivery of characters to the TTY
+	  because the ^B and a subsequent ^R will be delayed until
+	  we know what the next character is.
+
+	  If unsure, say N.
+
 config DEBUG_KERNEL
 	bool "Kernel debugging"
 	help
[ + linux-serial ]
Hi Daniel,
I apologize for not reviewing this back in Sept when you first RFC'd this patch.
On 03/18/2015 10:20 AM, Daniel Thompson wrote:
Currently the magic SysRq functions can be accessed by sending a break on the serial port. Unfortunately some networked serial proxies make it difficult to send a break, meaning SysRq functions cannot be used. This patch provides a workaround by allowing the (fairly unlikely) sequence of ^B^R^K characters to emulate a real break.
I really feel that support for this belongs in the terminal server; terminal servers designed to be console servers already provide this functionality.
That said, my review comments are below in case others feel differently.
This approach is very nearly as robust as normal sysrq/break handling because all trigger recognition happens during interrupt handling. The only major difference is that to emulate a break we must enter the ISR four times (instead of twice) and manage an extra byte of state.
It may not be immediately obvious to others that this means that portions of the sequence are not processed as input until the sequence is not matched.
So, ^B is not available for reading at the tty until the next char is received. If no next char is sent, ^B is _never_ received. This can cause all kinds of weird process behavior, like no context help.
This will also interfere with ^B in any position of a process's key bindings (as opposed to only the first). For example, the default emacs key-binding for list-buffers is ^X ^B.
No means is provided to escape the trigger sequence (and pass ^B^R^K to the underlying process); however, the sequence has proved reasonably collision resistant in practice. The most significant consequence is that ^B and ^B^R are delayed until a new character is observed.
The most significant collision I am aware of is with emacs-like backward-char bindings (^B) because the character movement will become lumpy (two characters every two key presses rather than one character per key press). Arrow keys or ^B^B^F provide workarounds.
Special note for tmux users: tmux defaults to using ^B as its escape character but does not have a default binding for ^B^R. Likewise tmux has no visual indicator showing the beginning of a break sequence, meaning the delayed delivery of ^B is not observable. Thus serial break emulation does not interfere with the use of tmux's default key bindings.
Cataloging the user-space collisions here is really pointless; it's best to make it clear that widespread userspace key binding interference is likely.
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
 include/linux/serial_core.h | 83 +++++++++++++++++++++++++++++++++++----------
 lib/Kconfig.debug           | 15 ++++++++
 2 files changed, 80 insertions(+), 18 deletions(-)
diff --git a/include/linux/serial_core.h b/include/linux/serial_core.h
index cf9c2ce9479d..56f8e3daf42c 100644
--- a/include/linux/serial_core.h
+++ b/include/linux/serial_core.h
@@ -160,6 +160,9 @@ struct uart_port {
 	struct console		*cons;		/* struct console, if any */
 #if defined(CONFIG_SERIAL_CORE_CONSOLE) || defined(SUPPORT_SYSRQ)
 	unsigned long		sysrq;		/* sysrq timeout */
+#ifdef CONFIG_MAGIC_SYSRQ_BREAK_EMULATION
+	char			sysrq_emul;	/* emulation state */
+#endif
 #endif
 
 	/* flags must be updated while holding port mutex */
@@ -420,24 +423,6 @@ extern void uart_handle_cts_change(struct uart_port *uport,
 extern void uart_insert_char(struct uart_port *port, unsigned int status,
 		 unsigned int overrun, unsigned int ch, unsigned int flag);
 
-#ifdef SUPPORT_SYSRQ
-static inline int
-uart_handle_sysrq_char(struct uart_port *port, unsigned int ch)
-{
-	if (port->sysrq) {
-		if (ch && time_before(jiffies, port->sysrq)) {
-			handle_sysrq(ch);
-			port->sysrq = 0;
-			return 1;
-		}
-		port->sysrq = 0;
-	}
-	return 0;
-}
-#else
-#define uart_handle_sysrq_char(port,ch) ({ (void)port; 0; })
-#endif
-
 /*
  * We do the SysRQ and SAK checking like this...
  */
@@ -462,6 +447,68 @@ static inline int uart_handle_break(struct uart_port *port)
 	return 0;
 }
 
+#if defined(SUPPORT_SYSRQ) && defined(CONFIG_MAGIC_SYSRQ_BREAK_EMULATION)
+/*
+ * Emulate a break if we are the serial console and receive ^B, ^R, ^K.
+ */
+static inline int
+uart_handle_sysrq_break_emulation(struct uart_port *port, unsigned int ch)
+{
+	const unsigned int ctrlb = 'B' & 31;
+	const unsigned int ctrlr = 'R' & 31;
+	const unsigned int ctrlk = 'K' & 31;
+
+	if (uart_console(port)) {
+		if ((port->sysrq_emul == 0 && ch == ctrlb) ||
+		    (port->sysrq_emul == ctrlb && ch == ctrlr)) {
+			/* for either of the first two trigger characters
+			 * update the state variable and move on.
+			 */
+			port->sysrq_emul = ch;
+			return 1;
+		} else if (port->sysrq_emul == ctrlr && ch == ctrlk &&
+			   uart_handle_break(port)) {
+			/* the break has already been emulated whilst
+			 * evaluating the condition, tidy up and move on
+			 */
+			port->sysrq_emul = 0;
+			return 1;
+		}
+	}
+
+	if (port->sysrq_emul) {
+		/* received a partial (false) trigger, tidy up and move on */
+		uart_insert_char(port, 0, 0, ctrlb, TTY_NORMAL);
+		if (port->sysrq_emul == ctrlr)
+			uart_insert_char(port, 0, 0, ctrlr, TTY_NORMAL);
+		port->sysrq_emul = 0;
+	}
I think this function would be shorter and clearer if the sequence was in a byte array and the state variable was an index.
That would also allow the sequence to be configurable.
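For illustration, a table-driven version along those lines might look roughly like this (an untested sketch of the suggestion, not part of the patch under review):

	/* Untested sketch of the byte-array/index suggestion. */
	static const char sysrq_break_seq[] = { 'B' & 31, 'R' & 31, 'K' & 31 };

	static inline int
	uart_handle_sysrq_break_emulation(struct uart_port *port, unsigned int ch)
	{
		unsigned int i = port->sysrq_emul;	/* chars matched so far */

		if (uart_console(port) && ch == sysrq_break_seq[i]) {
			if (i + 1 < ARRAY_SIZE(sysrq_break_seq)) {
				port->sysrq_emul = i + 1;	/* swallow the char */
				return 1;
			}
			if (uart_handle_break(port)) {
				port->sysrq_emul = 0;		/* full match */
				return 1;
			}
		}

		/* Mismatch (or unhandled break): replay anything swallowed. */
		for (i = 0; i < port->sysrq_emul; i++)
			uart_insert_char(port, 0, 0, sysrq_break_seq[i], TTY_NORMAL);
		port->sysrq_emul = 0;

		return 0;
	}

Making the table non-const (or filled from a parameter) would then give a configurable sequence almost for free.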
+
+	return 0;
+}
+#else
+#define uart_handle_sysrq_break_emulation(port, ch) ({ (void)port; 0; })
+#endif
+
+#ifdef SUPPORT_SYSRQ
+static inline int
+uart_handle_sysrq_char(struct uart_port *port, unsigned int ch)
+{
+	if (port->sysrq) {
+		if (ch && time_before(jiffies, port->sysrq)) {
+			handle_sysrq(ch);
+			port->sysrq = 0;
+			return 1;
+		}
+		port->sysrq = 0;
+	}
+
+	return uart_handle_sysrq_break_emulation(port, ch);
+}
+#else
+#define uart_handle_sysrq_char(port, ch) ({ (void)port; 0; })
+#endif
Why did your diff re-insert this whole function for a 1-line change?
+
 /*
  * UART_ENABLE_MS - determine if port should enable modem status irqs
  */
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index c2d51af327bc..3f54e85c27d2 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -372,6 +372,21 @@ config MAGIC_SYSRQ_DEFAULT_ENABLE
 	  This may be set to 1 or 0 to enable or disable them all, or
 	  to a bitmask as described in Documentation/sysrq.txt.
 
+config MAGIC_SYSRQ_BREAK_EMULATION
+	bool "Enable magic SysRq serial break emulation"
+	depends on MAGIC_SYSRQ && SERIAL_CORE_CONSOLE
I think this option should depend on DEBUG_KERNEL.
+	default n
+	help
+	  If you say Y here, then you can use the character sequence
+	  ^B^R^K to simulate a BREAK on the serial console. This is
+	  useful if for some reason you cannot send a BREAK to your
+	  console's serial port. For example, if you have a serial
+	  device server that cannot send a BREAK. Enabling this
+	  feature can delay the delivery of characters to the TTY
+	  because the ^B and a subsequent ^R will be delayed until
+	  we know what the next character is.
This help text should be more pessimistic and suggest this option _only_ if an inadequate terminal server is actually encountered and other known methods of triggering sysrq remotely have failed.
And perhaps contain a warning.
Regards,
Peter Hurley
+
+	  If unsure, say N.
+
 config DEBUG_KERNEL
 	bool "Kernel debugging"
 	help
On 19/03/15 22:05, Peter Hurley wrote:
[ + linux-serial ]
Hi Daniel,
I apologize for not reviewing this back in Sept when you first RFC'd this patch.
No worries.
The original RFC really was a "would anyone else find this useful?" question and that question seemed to be answered by the absence of responses!
On 03/18/2015 10:20 AM, Daniel Thompson wrote:
Currently the magic SysRq functions can be accessed by sending a break on the serial port. Unfortunately some networked serial proxies make it difficult to send a break, meaning SysRq functions cannot be used. This patch provides a workaround by allowing the (fairly unlikely) sequence of ^B^R^K characters to emulate a real break.
I really feel that support for this belongs in the terminal server; terminal servers designed to be console servers already provide this functionality.
I agree that the best place to fix this is in the terminal server.
However I wrote (and still personally use) this patch to cover those occasions where that is not possible, typically because the terminal server is implemented in closed-source firmware that I can't modify. In one notable case I have a networked JTAG debugger that has a built-in serial proxy. This is *such* a great way to simplify wiring up a lab that I am prepared to tolerate the limitations and/or bugs in its serial proxy.

More recently, with the arm64 Foundation model, I've been using a closed-source simulator where serial break doesn't work (or at least where sending a break via the telnet protocol did not work for me). Therefore I have rolled the patch out again because without it my code cannot be tested.
That said, my review comments are below in case others feel differently.
Thanks for the review and, for the record, I don't think I disagree with anything you say below. However unless others really do feel differently (or are persuaded by the above that it is a worthwhile idea) then I'll probably just keep the patch to myself.
[...]
On Wed, Mar 18, 2015 at 02:20:22PM +0000, Daniel Thompson wrote:
Currently the magic SysRq functions can be accessed by sending a break on the serial port. Unfortunately some networked serial proxies make it difficult to send a break, meaning SysRq functions cannot be used. This patch provides a workaround by allowing the (fairly unlikely) sequence of ^B^R^K characters to emulate a real break.
This is neat, but as it stands it feels like a bit of a hack. It would be preferable to make the magic string configurable, since almost any choice is going to upset somebody.
Since this also breaks the console (i.e., changes the behaviour), it should probably not be on by default: a command-line option or /proc/sys/kernel tweak should be required in order to turn it on. Otherwise, this is likely to get activated unconditionally in production kernels.
A particular concern is that something other than a human user could be connected to the UART.
I also feel it doesn't really belong in this series. NMI doesn't require this in order to be useful, this patch doesn't require NMI, and anyway it's not specific to arm.
I suggest posting this separately, CCing linux-serial.
Cheers
--Dave
[...]
On 20/03/15 14:28, Dave Martin wrote:
On Wed, Mar 18, 2015 at 02:20:22PM +0000, Daniel Thompson wrote:
Currently the magic SysRq functions can be accessed by sending a break on the serial port. Unfortunately some networked serial proxies make it difficult to send a break, meaning SysRq functions cannot be used. This patch provides a workaround by allowing the (fairly unlikely) sequence of ^B^R^K characters to emulate a real break.
This is neat, but as it stands it feels like a bit of a hack. It would be preferable to make the magic string configurable, since almost any choice is going to upset somebody.
Since this also breaks the console (i.e., changes the behaviour), it should probably not be on by default: a command-line option or /proc/sys/kernel tweak should be required in order to turn it on. Otherwise, this is likely to get activated unconditionally in production kernels.
It hadn't really occurred to me that it would ever be a good idea to activate this for production kernels. Aren't these code paths rather hot when the serial ports are running at super high speeds?
That said if the magic string were configurable then it could simply default to the empty string and that would result in the serial break emulation being disabled.
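A minimal sketch of that (illustrative only: the parameter name is invented, and the matching code would have to move out of the header and into serial_core.c for this to work):

	/* Sketch: run-time configurable trigger, disabled by default. */
	static char sysrq_break_seq[8];		/* empty string == feature off */
	module_param_string(sysrq_seq, sysrq_break_seq,
			    sizeof(sysrq_break_seq), 0644);
	MODULE_PARM_DESC(sysrq_seq,
			 "Character sequence that emulates a serial break");

	static bool sysrq_break_emulation_enabled(void)
	{
		return sysrq_break_seq[0] != '\0';
	}

Since serial_core is usually built in, that would also make the sequence settable from the kernel command line, which would address the "not on by default" concern.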
A particular concern is that something other than a human user could be connected to the UART.
I also feel it doesn't really belong in this series. NMI doesn't require this in order to be useful, this patch doesn't require NMI, and anyway it's not specific to arm.
To be clear I included the patch in this series only because:
1. I couldn't figure out any way to send a serial break to the ARM Foundation Model, making it impossible for me to provoke SysRq actions from interrupt context,

2. SysRq-L is a really good way to test the code and, when launched from interrupt context, proves that pre-emption by pseudo-NMI works.
I only really included the patch as a convenience for anyone wanting to do any runtime testing.
I suggest posting this separately, CCing linux-serial.
Don't worry, I shared the patch on linux-serial quite some time ago although, as it happens, the patch got a lot more review comments when I included it as a convenience in an unrelated patchset than it did when I RFC'd it separately.
[...]
On Mon, Mar 23, 2015 at 03:28:37PM +0000, Daniel Thompson wrote:
On 20/03/15 14:28, Dave Martin wrote:
On Wed, Mar 18, 2015 at 02:20:22PM +0000, Daniel Thompson wrote:
Currently the magic SysRq functions can be accessed by sending a break on the serial port. Unfortunately some networked serial proxies make it difficult to send a break, meaning SysRq functions cannot be used. This patch provides a workaround by allowing the (fairly unlikely) sequence of ^B^R^K characters to emulate a real break.
This is neat, but as it stands it feels like a bit of a hack. It would be preferable to make the magic string configurable, since almost any choice is going to upset somebody.
Since this also breaks the console (i.e., changes the behaviour), it should probably not be on by default: a command-line option or /proc/sys/kernel tweak should be required in order to turn it on. Otherwise, this is likely to get activated unconditionally in production kernels.
It hadn't really occurred to me that it would ever be a good idea to activate this for production kernels. Aren't these code paths rather
Maybe, but there's no way to force people not to. In general, many useful debugging options are left on in production kernels :)
This is useful (since production kernels are not really bug-free, and debugging tools are therefore useful), but it means that debugging options should be low-overhead and unobtrusive unless something extra is done explicitly to turn them on...
hot when the serial ports are running at super high speeds?
It's probably not that bad, since this processing is only done for the console and not other ports -- I don't know whether a serial port would ever be used for the console and simultaneously for some other high-throughput purpose, since random printk spam is going to break most protocols...
That said if the magic string were configurable then it could simply default to the empty string and that would result in the serial break emulation being disabled.
A particular concern is that something other than a human user could be connected to the UART.
I also feel it doesn't really belong in this series. NMI doesn't require this in order to be useful, this patch doesn't require NMI, and anyway it's not specific to arm.
To be clear I included the patch in this series only because:
1. I couldn't figure out any way to send a serial break to the ARM Foundation Model, making it impossible for me to provoke SysRq actions from interrupt context,
Agreed, there's no direct way to do it (annoyingly).
Arguably that's a deficiency in the model, though that's not much help to you right now.
2. SysRq-L is a really good way to test the code and, when launched from interrupt context, proves that pre-emption by pseudo-NMI works.
I only really included the patch as a convenience for anyone wanting to do any runtime testing.
Sure, that's all fine.
[...]
Cheers
---Dave
To be clear I included the patch in this series only because:
1. I couldn't figure out any way to send a serial break to the ARM Foundation Model, making it impossible for me to provoke SysRq actions from interrupt context,
Agreed, there's no direct way to do it (annoyingly).
Arguably that's a deficiency in the model, though that's not much help to you right now.
If it isn't affecting real hardware and it's just for a flawed model then hack a check for a suitable symbol into the specific patches for the model and its serial driver and claim it was a break - don't dump it into the mainstream.
There are some "conventional" escapes platforms have used (ctrl and symbols, for example ctrl-^) which are less likely to cause conflicts than sequences. Nevertheless it's not something you want anywhere mainstream if you can avoid it, because those symbols could be sent over a remote management modem or similar.
They may be better if it does need to end up in the core kernel.
Alan
Currently there is quite a pile of code sitting in arch/x86/kernel/apic/hw_nmi.c to support safe all-cpu backtracing from NMI. The code is inaccessible to backtrace implementations for other architectures, which is a shame because they would probably like to be safe too.
Copy this code into printk, reworking it a little as we do so to make it easier to exploit as library code.
We'll port the x86 NMI backtrace logic to it in a later patch.
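For context, the intended calling pattern is roughly the following sketch (arch_send_backtrace_ipi() is an invented placeholder for whatever arch mechanism raises the NMIs; it is not part of this patch):

	/* Sketch only: how an arch backtrace implementation might use the API. */
	void arch_trigger_all_cpu_backtrace(void)
	{
		if (printk_nmi_backtrace_prepare())
			return;				/* another backtrace is in flight */

		arch_send_backtrace_ipi();		/* invented name: raise NMI on other cpus */

		/* ... wait for the NMI handlers to run ... */

		printk_nmi_backtrace_complete();	/* replay the buffered output */
	}

	/* And in each cpu's NMI handler: */
	static void do_nmi_backtrace(struct pt_regs *regs)
	{
		printk_nmi_backtrace_this_cpu_begin();	/* divert printk into a per-cpu seq_buf */
		show_regs(regs);
		printk_nmi_backtrace_this_cpu_end();	/* restore normal printk */
	}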
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
---
 include/linux/printk.h        |  20 ++++++
 init/Kconfig                  |   3 +
 kernel/printk/Makefile        |   1 +
 kernel/printk/nmi_backtrace.c | 148 ++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 172 insertions(+)
 create mode 100644 kernel/printk/nmi_backtrace.c
diff --git a/include/linux/printk.h b/include/linux/printk.h
index baa3f97d8ce8..44bb85ad1f62 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -228,6 +228,26 @@ static inline void show_regs_print_info(const char *log_lvl)
 }
 #endif
 
+#ifdef CONFIG_PRINTK_NMI_BACKTRACE
+/*
+ * printk_nmi_backtrace_prepare/complete are called to prepare the
+ * system for some or all cores to issue trace from NMI.
+ * printk_nmi_backtrace_complete will print buffered output and cannot
+ * (safely) be called from NMI.
+ */
+extern int printk_nmi_backtrace_prepare(void);
+extern void printk_nmi_backtrace_complete(void);
+
+/*
+ * printk_nmi_backtrace_this_cpu_begin/end are used to divert/restore printk
+ * on this cpu. The result is the output of printk() (by this CPU) will be
+ * stored in temporary buffers for later printing by
+ * printk_nmi_backtrace_complete.
+ */
+extern void printk_nmi_backtrace_this_cpu_begin(void);
+extern void printk_nmi_backtrace_this_cpu_end(void);
+#endif
+
 extern asmlinkage void dump_stack(void) __cold;
 
 #ifndef pr_fmt
diff --git a/init/Kconfig b/init/Kconfig
index f5dbc6d4261b..0107e9b4d2cf 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1421,6 +1421,9 @@ config PRINTK
 	  very difficult to diagnose system problems, saying N here is
 	  strongly discouraged.
 
+config PRINTK_NMI_BACKTRACE
+	bool
+
 config BUG
 	bool "BUG() support" if EXPERT
 	default y
diff --git a/kernel/printk/Makefile b/kernel/printk/Makefile
index 85405bdcf2b3..1849b001384a 100644
--- a/kernel/printk/Makefile
+++ b/kernel/printk/Makefile
@@ -1,2 +1,3 @@
 obj-y	= printk.o
+obj-$(CONFIG_PRINTK_NMI_BACKTRACE)	+= nmi_backtrace.o
 obj-$(CONFIG_A11Y_BRAILLE_CONSOLE)	+= braille.o
diff --git a/kernel/printk/nmi_backtrace.c b/kernel/printk/nmi_backtrace.c
new file mode 100644
index 000000000000..e9a06929c4f3
--- /dev/null
+++ b/kernel/printk/nmi_backtrace.c
@@ -0,0 +1,148 @@
+#include <linux/kernel.h>
+#include <linux/seq_buf.h>
+
+#define NMI_BUF_SIZE		4096
+
+struct nmi_seq_buf {
+	unsigned char		buffer[NMI_BUF_SIZE];
+	struct seq_buf		seq;
+};
+
+/* Safe printing in NMI context */
+static DEFINE_PER_CPU(struct nmi_seq_buf, nmi_print_seq);
+
+static DEFINE_PER_CPU(printk_func_t, nmi_print_saved_print_func);
+
+/* "in progress" flag of NMI printing */
+static unsigned long nmi_print_flag;
+
+static int __init printk_nmi_backtrace_init(void)
+{
+	struct nmi_seq_buf *s;
+	int cpu;
+
+	for_each_possible_cpu(cpu) {
+		s = &per_cpu(nmi_print_seq, cpu);
+		seq_buf_init(&s->seq, s->buffer, NMI_BUF_SIZE);
+	}
+
+	return 0;
+}
+pure_initcall(printk_nmi_backtrace_init);
+
+/*
+ * It is not safe to call printk() directly from NMI handlers.
+ * It may be fine if the NMI detected a lock up and we have no choice
+ * but to do so, but doing a NMI on all other CPUs to get a back trace
+ * can be done with a sysrq-l. We don't want that to lock up, which
+ * can happen if the NMI interrupts a printk in progress.
+ *
+ * Instead, we redirect the vprintk() to this nmi_vprintk() that writes
+ * the content into a per cpu seq_buf buffer. Then when the NMIs are
+ * all done, we can safely dump the contents of the seq_buf to a printk()
+ * from a non NMI context.
+ *
+ * This is not a generic printk() implementation and must be used with
+ * great care. In particular there is a static limit on the quantity of
+ * data that may be emitted during NMI, only one client can be active at
+ * one time (arbitrated by the return value of
+ * printk_nmi_backtrace_prepare()) and it is required that something at
+ * task or interrupt context be scheduled to issue the output.
+ */
+static int nmi_vprintk(const char *fmt, va_list args)
+{
+	struct nmi_seq_buf *s = this_cpu_ptr(&nmi_print_seq);
+	unsigned int len = seq_buf_used(&s->seq);
+
+	seq_buf_vprintf(&s->seq, fmt, args);
+	return seq_buf_used(&s->seq) - len;
+}
+
+/*
+ * Reserve the NMI printk mechanism. Return an error if some other component
+ * is already using it.
+ */
+int printk_nmi_backtrace_prepare(void)
+{
+	if (test_and_set_bit(0, &nmi_print_flag)) {
+		/*
+		 * If something is already using the NMI print facility we
+		 * can't allow a second one...
+		 */
+		return -EBUSY;
+	}
+
+	return 0;
+}
+
+static void print_seq_line(struct nmi_seq_buf *s, int start, int end)
+{
+	const char *buf = s->buffer + start;
+
+	printk("%.*s", (end - start) + 1, buf);
+}
+
+void printk_nmi_backtrace_complete(void)
+{
+	struct nmi_seq_buf *s;
+	int len, cpu, i, last_i;
+
+	/*
+	 * Now that all the NMIs have triggered, we can dump out their
+	 * back traces safely to the console.
+	 */
+	for_each_possible_cpu(cpu) {
+		s = &per_cpu(nmi_print_seq, cpu);
+		last_i = 0;
+
+		len = seq_buf_used(&s->seq);
+		if (!len)
+			continue;
+
+		/* Print line by line. */
+		for (i = 0; i < len; i++) {
+			if (s->buffer[i] == '\n') {
+				print_seq_line(s, last_i, i);
+				last_i = i + 1;
+			}
+		}
+		/* Check if there was a partial line. */
+		if (last_i < len) {
+			print_seq_line(s, last_i, len - 1);
+			pr_cont("\n");
+		}
+
+		/* Wipe out the buffer ready for the next time around. */
+		seq_buf_clear(&s->seq);
+	}
+
+	clear_bit(0, &nmi_print_flag);
+	smp_mb__after_atomic();
+}
+
+void printk_nmi_backtrace_this_cpu_begin(void)
+{
+	/*
+	 * Detect double-begins and report them. This code is unsafe (because
+	 * it will print from NMI) but things are pretty badly damaged if the
+	 * NMI re-enters and is somehow granted permission to use NMI printk,
+	 * so how much worse can it get? Also since this code interferes with
+	 * the operation of printk it is unlikely that any consequential
+	 * failures will be able to log anything making this our last
+	 * opportunity to tell anyone that something is wrong.
+	 */
+	if (this_cpu_read(nmi_print_saved_print_func)) {
+		this_cpu_write(printk_func,
+			       this_cpu_read(nmi_print_saved_print_func));
+		BUG();
+	}
+
+	this_cpu_write(nmi_print_saved_print_func, this_cpu_read(printk_func));
+	this_cpu_write(printk_func, nmi_vprintk);
+}
+
+void printk_nmi_backtrace_this_cpu_end(void)
+{
+	this_cpu_write(printk_func, this_cpu_read(nmi_print_saved_print_func));
+	this_cpu_write(nmi_print_saved_print_func, NULL);
+}
On Wed, 18 Mar 2015 14:20:23 +0000 Daniel Thompson daniel.thompson@linaro.org wrote:
[...]
+	clear_bit(0, &nmi_print_flag);
+	smp_mb__after_atomic();
Is this really necessary? What is the mb synchronizing?
[ Added Peter Zijlstra to confirm it's not needed ]
-- Steve
+}
+void printk_nmi_backtrace_this_cpu_begin(void)
+{
+    /*
+     * Detect double-begins and report them. This code is unsafe (because
+     * it will print from NMI) but things are pretty badly damaged if the
+     * NMI re-enters and is somehow granted permission to use NMI printk,
+     * so how much worse can it get? Also since this code interferes with
+     * the operation of printk it is unlikely that any consequential
+     * failures will be able to log anything making this our last
+     * opportunity to tell anyone that something is wrong.
+     */
+    if (this_cpu_read(nmi_print_saved_print_func)) {
+        this_cpu_write(printk_func,
+                       this_cpu_read(nmi_print_saved_print_func));
+        BUG();
+    }
+
+    this_cpu_write(nmi_print_saved_print_func, this_cpu_read(printk_func));
+    this_cpu_write(printk_func, nmi_vprintk);
+}
+
+void printk_nmi_backtrace_this_cpu_end(void)
+{
+    this_cpu_write(printk_func, this_cpu_read(nmi_print_saved_print_func));
+    this_cpu_write(nmi_print_saved_print_func, NULL);
+}
On Thu, Mar 19, 2015 at 01:39:58PM -0400, Steven Rostedt wrote:
+void printk_nmi_backtrace_complete(void)
+{
+    struct nmi_seq_buf *s;
+    int len, cpu, i, last_i;
+
+    /*
+     * Now that all the NMIs have triggered, we can dump out their
+     * back traces safely to the console.
+     */
+    for_each_possible_cpu(cpu) {
+        s = &per_cpu(nmi_print_seq, cpu);
+        last_i = 0;
+
+        len = seq_buf_used(&s->seq);
+        if (!len)
+            continue;
+
+        /* Print line by line. */
+        for (i = 0; i < len; i++) {
+            if (s->buffer[i] == '\n') {
+                print_seq_line(s, last_i, i);
+                last_i = i + 1;
+            }
+        }
+        /* Check if there was a partial line. */
+        if (last_i < len) {
+            print_seq_line(s, last_i, len - 1);
+            pr_cont("\n");
+        }
+
+        /* Wipe out the buffer ready for the next time around. */
+        seq_buf_clear(&s->seq);
+    }
+
+    clear_bit(0, &nmi_print_flag);
+    smp_mb__after_atomic();

Is this really necessary? What is the mb synchronizing?

[ Added Peter Zijlstra to confirm it's not needed ]
It surely looks suspect; and it lacks a comment, which is a clear sign it's buggy.

Now if it tries to order the accesses to the seqbuf against the clearing of the bit, one would have expected a _before_ barrier, not an _after_.
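For what it's worth, if the intent *had* been to order the seq_buf accesses against releasing the flag, the conventional pattern would be a _before_ barrier (a sketch for illustration only, not a proposed change):

```c
	/* ...dump and clear each CPU's seq_buf... */

	/*
	 * Make the seq_buf reads/writes above visible before any CPU
	 * can observe the flag as clear and start reusing the buffers.
	 */
	smp_mb__before_atomic();
	clear_bit(0, &nmi_print_flag);
```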
On 19/03/15 18:30, Peter Zijlstra wrote:
On Thu, Mar 19, 2015 at 01:39:58PM -0400, Steven Rostedt wrote:
+void printk_nmi_backtrace_complete(void)
+{
+    struct nmi_seq_buf *s;
+    int len, cpu, i, last_i;
+
+    /*
+     * Now that all the NMIs have triggered, we can dump out their
+     * back traces safely to the console.
+     */
+    for_each_possible_cpu(cpu) {
+        s = &per_cpu(nmi_print_seq, cpu);
+        last_i = 0;
+
+        len = seq_buf_used(&s->seq);
+        if (!len)
+            continue;
+
+        /* Print line by line. */
+        for (i = 0; i < len; i++) {
+            if (s->buffer[i] == '\n') {
+                print_seq_line(s, last_i, i);
+                last_i = i + 1;
+            }
+        }
+        /* Check if there was a partial line. */
+        if (last_i < len) {
+            print_seq_line(s, last_i, len - 1);
+            pr_cont("\n");
+        }
+
+        /* Wipe out the buffer ready for the next time around. */
+        seq_buf_clear(&s->seq);
+    }
+
+    clear_bit(0, &nmi_print_flag);
+    smp_mb__after_atomic();

Is this really necessary? What is the mb synchronizing?

[ Added Peter Zijlstra to confirm it's not needed ]

It surely looks suspect; and it lacks a comment, which is a clear sign it's buggy.

Now if it tries to order the accesses to the seqbuf against the clearing of the bit, one would have expected a _before_ barrier, not an _after_.
It's nothing to do with the seqbuf since I added the seqbuf code myself but the barrier was already in the code that I copied from.
In the mainline code today it looks like this as part of the x86 code (note the call to put_cpu(); in my patchset it lives in the arch/-specific code rather than the generic code):
: 	/* Check if there was a partial line. */
: 	if (last_i < len) {
: 		print_seq_line(s, last_i, len - 1);
: 		pr_cont("\n");
: 	}
: }
:
: clear_bit(0, &backtrace_flag);
: smp_mb__after_atomic();
: put_cpu();
: }
The barrier was not intended to have anything to do with put_cpu() either, though, since the barrier was added before put_cpu() arrived: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=5...
There's nothing in the commit comment explaining the barrier and I really can't see what it is for.
Daniel.
On Thu, 19 Mar 2015 18:48:10 +0000, Daniel Thompson <daniel.thompson@linaro.org> wrote:
The barrier was not intended to have anything to do with put_cpu() either though since the barrier was added before put_cpu() arrived: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=5...
There's nothing in the commit comment explaining the barrier and I really can't see what it is for.
Looks like it wasn't needed then either.
-- Steve
On 19/03/15 19:01, Steven Rostedt wrote:
On Thu, 19 Mar 2015 18:48:10 +0000, Daniel Thompson <daniel.thompson@linaro.org> wrote:
The barrier was not intended to have anything to do with put_cpu() either though since the barrier was added before put_cpu() arrived: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=5...
There's nothing in the commit comment explaining the barrier and I really can't see what it is for.
Looks like it wasn't needed then either.
Agreed. I'll respin the patchset with the barrier removed.
Daniel.
Currently, when running on FVP, CPU 0 boots up with its BPR changed from the reset value. This renders it impossible to (preemptively) prioritize interrupts on CPU 0.
This is harmless on normal systems but prevents preemption by NMIs on systems with CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS enabled.
Many thanks to Andrew Thoelke for suggesting that the BPR had the potential to harm preemption.
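For background, the BPR splits each 8-bit priority into a group priority (which takes part in preemption decisions) and a subpriority (which does not). A sketch of the arithmetic, based on my reading of the GICv2 spec (I am assuming GICv3 behaves the same way):

```c
/*
 * With binary point value bpr, bits [7:bpr+1] of a priority form the
 * group priority; only the group priority is compared when deciding
 * whether one interrupt may preempt another.
 */
static inline u8 gic_group_priority(u8 prio, u8 bpr)
{
	return prio & (0xff << (bpr + 1));
}

/*
 * The pseudo-NMI scheme in this series distinguishes NMIs from normal
 * IRQs using bit 6 of the priority, so any BPR value >= 6 pushes that
 * bit into the subpriority field and preemption by NMI stops working.
 */
```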
Suggested-by: Andrew Thoelke <andrew.thoelke@arm.com>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
---
 drivers/irqchip/irq-gic-v3.c       | 13 +++++++++++++
 include/linux/irqchip/arm-gic-v3.h |  2 ++
 2 files changed, 15 insertions(+)

diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index fd8850def1b8..32533650494c 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -120,6 +120,11 @@ static void __maybe_unused gic_write_pmr(u64 val)
     asm volatile("msr_s " __stringify(ICC_PMR_EL1) ", %0" : : "r" (val));
 }

+static void __maybe_unused gic_write_bpr1(u64 val)
+{
+    asm volatile("msr_s " __stringify(ICC_BPR1_EL1) ", %0" : : "r" (val));
+}
+
 static void __maybe_unused gic_write_ctlr(u64 val)
 {
     asm volatile("msr_s " __stringify(ICC_CTLR_EL1) ", %0" : : "r" (val));
@@ -383,6 +388,14 @@ static void gic_cpu_sys_reg_init(void)
     /* Set priority mask register */
     gic_write_pmr(DEFAULT_PMR_VALUE);

+    /*
+     * On FVP, CPU 0 arrives in the kernel with its BPR changed from the
+     * reset value (and the value is large enough to prevent pre-emptive
+     * interrupts from working at all). Writing a zero to BPR restores the
+     * reset value.
+     */
+    gic_write_bpr1(0);
+
     /* EOI deactivates interrupt too (mode 0) */
     gic_write_ctlr(ICC_CTLR_EL1_EOImode_drop_dir);

diff --git a/include/linux/irqchip/arm-gic-v3.h b/include/linux/irqchip/arm-gic-v3.h
index 781974afff9f..79d6645897e6 100644
--- a/include/linux/irqchip/arm-gic-v3.h
+++ b/include/linux/irqchip/arm-gic-v3.h
@@ -270,6 +270,8 @@
 #define ICH_VMCR_PMR_SHIFT     24
 #define ICH_VMCR_PMR_MASK      (0xffUL << ICH_VMCR_PMR_SHIFT)

+#define ICC_BPR0_EL1           sys_reg(3, 0, 12, 8, 3)
+#define ICC_BPR1_EL1           sys_reg(3, 0, 12, 12, 3)
 #define ICC_EOIR1_EL1          sys_reg(3, 0, 12, 12, 1)
 #define ICC_IAR1_EL1           sys_reg(3, 0, 12, 12, 0)
 #define ICC_SGI1R_EL1          sys_reg(3, 0, 12, 11, 5)
Separate out the local fiq & async macros from the various arch inlines. This makes it easier for us (in a later patch) to provide an alternative implementation of these inlines.
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
---
 arch/arm64/include/asm/irqflags.h | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/irqflags.h b/arch/arm64/include/asm/irqflags.h
index 11cc941bd107..df7477af6389 100644
--- a/arch/arm64/include/asm/irqflags.h
+++ b/arch/arm64/include/asm/irqflags.h
@@ -53,12 +53,6 @@ static inline void arch_local_irq_disable(void)
         : "memory");
 }

-#define local_fiq_enable()    asm("msr daifclr, #1" : : : "memory")
-#define local_fiq_disable()   asm("msr daifset, #1" : : : "memory")
-
-#define local_async_enable()  asm("msr daifclr, #4" : : : "memory")
-#define local_async_disable() asm("msr daifset, #4" : : : "memory")
-
 /*
  * Save the current interrupt enable state.
  */
@@ -90,6 +84,12 @@ static inline int arch_irqs_disabled_flags(unsigned long flags)
     return flags & PSR_I_BIT;
 }

+#define local_fiq_enable()    asm("msr daifclr, #1" : : : "memory")
+#define local_fiq_disable()   asm("msr daifset, #1" : : : "memory")
+
+#define local_async_enable()  asm("msr daifclr, #4" : : : "memory")
+#define local_async_disable() asm("msr daifset, #4" : : : "memory")
+
 /*
  * save and restore debug state
  */
Currently irqflags is implemented using the PSR's I bit. It is possible to implement irqflags by using the co-processor interface to the GIC. Using the co-processor interface makes it feasible to simulate NMIs using GIC interrupt prioritization.
This patch changes the irqflags macros to modify, save and restore ICC_PMR_EL1. This has a substantial knock-on effect for the rest of the kernel. There are three reasons for this:
1. The state of the ICC_PMR_EL1_G_BIT becomes part of the CPU context and must be saved and restored during traps. To simplify the additional context management the ICC_PMR_EL1_G_BIT is converted into a fake (reserved) bit within the PSR (PSR_G_BIT). Naturally this approach will need to be changed if future ARM architecture extensions make use of this bit.
2. The hardware automatically masks the I bit (at boot, during traps, etc). When the I bit is set by hardware we must add code to switch from I bit masking to PMR masking.
3. Some instructions, notably wfi, require that the PMR not be used for interrupt masking. Before calling these instructions we must switch from PMR masking to I bit masking.
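To make reason 1 above concrete, here is a C sketch of the bit juggling that the kernel_entry code below performs in assembly (macro names are taken from this patch; this is illustration only, not part of the patch):

```c
/*
 * Sketch only: the real implementation is assembly in kernel_entry.
 * The PMR's G bit (bit 6) means interrupts are *unmasked* when set,
 * while PSR_G_BIT (bit 22, a reserved PSR bit) stores the *masked*
 * state, hence the invert after the shift.
 */
static inline unsigned long save_pmr_state_to_pstate(unsigned long pmr,
						     unsigned long pstate)
{
	unsigned long g = pmr & ICC_PMR_EL1_G_BIT;

	return pstate | ((g << PSR_G_PMR_G_SHIFT) ^ PSR_G_BIT);
}
```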
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
---
 arch/arm64/Kconfig                   | 15 ++++++
 arch/arm64/include/asm/assembler.h   | 72 ++++++++++++++++++++++++++---
 arch/arm64/include/asm/irqflags.h    | 89 ++++++++++++++++++++++++++++++++++++
 arch/arm64/include/asm/ptrace.h      | 10 ++++
 arch/arm64/include/uapi/asm/ptrace.h |  8 ++++
 arch/arm64/kernel/entry.S            | 70 ++++++++++++++++++++++++----
 arch/arm64/kernel/head.S             | 27 +++++++++++
 arch/arm64/mm/cache.S                |  4 +-
 arch/arm64/mm/proc.S                 | 19 ++++++++
 drivers/irqchip/irq-gic-v3.c         | 29 +++++++++++-
 include/linux/irqchip/arm-gic-v3.h   |  4 +-
 include/linux/irqchip/arm-gic.h      |  2 +-
 12 files changed, 329 insertions(+), 20 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 0725a6051872..91c0babb6132 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -611,6 +611,21 @@ config SETEND_EMULATION
       If unsure, say Y
 endif

+config USE_ICC_SYSREGS_FOR_IRQFLAGS
+    bool "Use ICC system registers for IRQ masking"
+    select ARM_GIC_V3
+    help
+      Using the ICC system registers for IRQ masking makes it possible
+      to simulate NMI on ARM64 systems. This allows several interesting
+      features (especially debug features) to be used on these systems.
+
+      Say Y here to implement IRQ masking using ICC system
+      registers. This will result in an unbootable kernel if these
+      registers are not implemented or made inaccessible by the
+      EL3 firmware or EL2 hypervisor (if present).
+
+      If unsure, say N
+
 endmenu
menu "Boot options" diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h index 750bac4e637e..0a0a97f3c3c1 100644 --- a/arch/arm64/include/asm/assembler.h +++ b/arch/arm64/include/asm/assembler.h @@ -23,6 +23,7 @@ #ifndef __ASM_ASSEMBLER_H #define __ASM_ASSEMBLER_H
+#include <linux/irqchip/arm-gic-v3.h> #include <asm/ptrace.h> #include <asm/thread_info.h>
@@ -39,26 +40,79 @@ .endm
/* + * Enable and disable pseudo NMI. + */ + .macro disable_nmi +#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS + msr daifset, #2 +#endif + .endm + + .macro enable_nmi +#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS + msr daifclr, #2 +#endif + .endm + +/* + * Save/disable and restore pseudo NMI. + */ + .macro save_and_disable_nmis, olddaif +#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS + mrs \olddaif, daif /* Get flags */ + disable_nmi +#endif + .endm + + .macro restore_nmis, olddaif +#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS + msr daif, \olddaif +#endif + .endm + +/* * Enable and disable interrupts. */ - .macro disable_irq + .macro disable_irq, tmp +#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS + mov \tmp, #ICC_PMR_EL1_MASKED + msr_s ICC_PMR_EL1, \tmp + isb +#else msr daifset, #2 +#endif .endm
- .macro enable_irq + .macro enable_irq, tmp +#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS + enable_nmi + mov \tmp, #ICC_PMR_EL1_UNMASKED + msr_s ICC_PMR_EL1, \tmp + isb +#else msr daifclr, #2 +#endif .endm
/* * Save/disable and restore interrupts. */ - .macro save_and_disable_irqs, olddaif - mrs \olddaif, daif - disable_irq + .macro save_and_disable_irqs, olddaif, tmp +#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS + mrs_s \olddaif, ICC_PMR_EL1 /* Get PMR */ +#else + mrs \olddaif, daif /* Get flags */ +#endif + disable_irq \tmp .endm
.macro restore_irqs, olddaif +#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS + msr_s ICC_PMR_EL1, \olddaif /* Write to PMR */ + isb +#else msr daif, \olddaif +#endif .endm
/* @@ -90,13 +144,19 @@ 9990: .endm
+ /* * Enable both debug exceptions and interrupts. This is likely to be * faster than two daifclr operations, since writes to this register * are self-synchronising. */ - .macro enable_dbg_and_irq + .macro enable_dbg_and_irq, tmp +#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS + enable_dbg + enable_irq \tmp +#else msr daifclr, #(8 | 2) +#endif .endm
 /*

diff --git a/arch/arm64/include/asm/irqflags.h b/arch/arm64/include/asm/irqflags.h
index df7477af6389..7b6866022f82 100644
--- a/arch/arm64/include/asm/irqflags.h
+++ b/arch/arm64/include/asm/irqflags.h
@@ -20,6 +20,8 @@

 #include <asm/ptrace.h>

+#ifndef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
+
 /*
  * CPU interrupt mask handling.
  */
@@ -84,6 +86,93 @@ static inline int arch_irqs_disabled_flags(unsigned long flags)
     return flags & PSR_I_BIT;
 }

+#else /* CONFIG_IRQFLAGS_GIC_MASKING */
+
+#include <linux/irqchip/arm-gic-v3.h>
+
+/*
+ * CPU interrupt mask handling.
+ */
+static inline unsigned long arch_local_irq_save(void)
+{
+    unsigned long flags, masked = ICC_PMR_EL1_MASKED;
+
+    asm volatile(
+        "// arch_local_irq_save\n"
+        "mrs_s  %0, " __stringify(ICC_PMR_EL1) "\n"
+        "msr_s  " __stringify(ICC_PMR_EL1) ",%1\n"
+        "isb\n"
+        : "=&r" (flags)
+        : "r" (masked)
+        : "memory");
+
+    return flags;
+}
+
+static inline void arch_local_irq_enable(void)
+{
+    unsigned long unmasked = ICC_PMR_EL1_UNMASKED;
+
+    asm volatile(
+        "// arch_local_irq_enable\n"
+        "msr_s  " __stringify(ICC_PMR_EL1) ",%0\n"
+        "isb\n"
+        :
+        : "r" (unmasked)
+        : "memory");
+}
+
+static inline void arch_local_irq_disable(void)
+{
+    unsigned long masked = ICC_PMR_EL1_MASKED;
+
+    asm volatile(
+        "// arch_local_irq_disable\n"
+        "msr_s  " __stringify(ICC_PMR_EL1) ",%0\n"
+        "isb\n"
+        :
+        : "r" (masked)
+        : "memory");
+}
+
+/*
+ * Save the current interrupt enable state.
+ */
+static inline unsigned long arch_local_save_flags(void)
+{
+    unsigned long flags;
+
+    asm volatile(
+        "// arch_local_save_flags\n"
+        "mrs_s  %0, " __stringify(ICC_PMR_EL1) "\n"
+        : "=r" (flags)
+        :
+        : "memory");
+
+    return flags;
+}
+
+/*
+ * restore saved IRQ state
+ */
+static inline void arch_local_irq_restore(unsigned long flags)
+{
+    asm volatile(
+        "// arch_local_irq_restore\n"
+        "msr_s  " __stringify(ICC_PMR_EL1) ",%0\n"
+        "isb\n"
+        :
+        : "r" (flags)
+        : "memory");
+}
+
+static inline int arch_irqs_disabled_flags(unsigned long flags)
+{
+    return !(flags & ICC_PMR_EL1_G_BIT);
+}
+
+#endif /* CONFIG_IRQFLAGS_GIC_MASKING */
+
 #define local_fiq_enable()    asm("msr daifclr, #1" : : : "memory")
 #define local_fiq_disable()   asm("msr daifset, #1" : : : "memory")
diff --git a/arch/arm64/include/asm/ptrace.h b/arch/arm64/include/asm/ptrace.h
index a379596e0888..da4c593b596d 100644
--- a/arch/arm64/include/asm/ptrace.h
+++ b/arch/arm64/include/asm/ptrace.h
@@ -25,6 +25,16 @@
 #define CurrentEL_EL1    (1 << 2)
 #define CurrentEL_EL2    (2 << 2)

+/* PMR values used to mask/unmask interrupts */
+#define ICC_PMR_EL1_G_SHIFT   6
+#define ICC_PMR_EL1_G_BIT     (1 << ICC_PMR_EL1_G_SHIFT)
+#define ICC_PMR_EL1_UNMASKED  0xf0
+#define ICC_PMR_EL1_MASKED    (ICC_PMR_EL1_UNMASKED ^ ICC_PMR_EL1_G_BIT)
+
+#define PSR_G_SHIFT           22
+#define PSR_G_PMR_G_SHIFT     (PSR_G_SHIFT - ICC_PMR_EL1_G_SHIFT)
+#define PSR_I_PMR_G_SHIFT     (7 - ICC_PMR_EL1_G_SHIFT)
+
 /* AArch32-specific ptrace requests */
 #define COMPAT_PTRACE_GETREGS    12
 #define COMPAT_PTRACE_SETREGS    13
diff --git a/arch/arm64/include/uapi/asm/ptrace.h b/arch/arm64/include/uapi/asm/ptrace.h
index 6913643bbe54..6c640d6430d8 100644
--- a/arch/arm64/include/uapi/asm/ptrace.h
+++ b/arch/arm64/include/uapi/asm/ptrace.h
@@ -51,6 +51,14 @@
 #define PSR_N_BIT    0x80000000

 /*
+ * This is the GIC interrupt mask bit. It is not actually part of the
+ * PSR, we are simply using some reserved bits in the PSR to store some state
+ * from the interrupt controller. The context save/restore functions will
+ * extract the ICC_PMR_EL1_G_BIT and save it as the PSR_G_BIT.
+ */
+#define PSR_G_BIT    0x00400000
+
+/*
  * Groups of PSR bits
  */
 #define PSR_f        0xff000000    /* Flags */
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index 0aa5747639f8..964573308e68 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -20,6 +20,7 @@

 #include <linux/init.h>
 #include <linux/linkage.h>
+#include <linux/irqchip/arm-gic-v3.h>

 #include <asm/assembler.h>
 #include <asm/asm-offsets.h>
@@ -94,6 +95,26 @@
     .endif
     mrs     x22, elr_el1
     mrs     x23, spsr_el1
+#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
+    /*
+     * Save the context held in the PMR register and copy the current
+     * I bit state to the PMR. Re-enable of the I bit happens in later
+     * code that knows what type of trap we are handling.
+     */
+    mrs_s   x20, ICC_PMR_EL1                // Get PMR
+    and     x20, x20, #ICC_PMR_EL1_G_BIT    // Extract mask bit
+    lsl     x20, x20, #PSR_G_PMR_G_SHIFT    // Shift to a PSTATE RES0 bit
+    eor     x20, x20, #PSR_G_BIT            // Invert bit
+    orr     x23, x20, x23                   // Store PMR within PSTATE
+    orr     x20, x23, #PSR_I_BIT            // Extract I bit
+    .if     PSR_I_PMR_G_SHIFT != 0
+    lsr     x20, x20, #PSR_I_PMR_G_SHIFT    // Shift down to meet mask bit
+    .endif
+    eor     x20, x20, #ICC_PMR_EL1_UNMASKED // If I set: 0xb0 else: 0xf0
+    msr_s   ICC_PMR_EL1, x20                // Write to PMR
+    isb
+#endif
+
     stp     lr, x21, [sp, #S_LR]
     stp     x22, x23, [sp, #S_PC]

@@ -121,6 +142,18 @@
     ldr     x23, [sp, #S_SP]                // load return stack pointer
     msr     sp_el0, x23
     .endif
+#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
+    /*
+     * Restore the context to the PMR (and ensure the reserved bit is
+     * restored to zero before being copied back to the PSR).
+     */
+    and     x20, x22, #PSR_G_BIT            // Get stolen PSTATE bit
+    and     x22, x22, #~PSR_G_BIT           // Clear stolen bit
+    lsr     x20, x20, #PSR_G_PMR_G_SHIFT    // Shift back to PMR mask
+    eor     x20, x20, #ICC_PMR_EL1_UNMASKED // x20 gets 0xf0 or 0xb0
+    msr_s   ICC_PMR_EL1, x20                // Write to PMR
+    isb
+#endif
     msr     elr_el1, x21                    // set up the return data
     msr     spsr_el1, x22
     .if     \ret
@@ -288,16 +321,22 @@ el1_da:
      * Data abort handling
      */
     mrs     x0, far_el1
+    enable_nmi
     enable_dbg
     // re-enable interrupts if they were enabled in the aborted context
+#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
+    tbnz    x23, #PSR_G_SHIFT, 1f           // PSR_G_BIT
+#else
     tbnz    x23, #7, 1f                     // PSR_I_BIT
-    enable_irq
+#endif
+    enable_irq x2
 1:
     mov     x2, sp                          // struct pt_regs
     bl      do_mem_abort

     // disable interrupts before pulling preserved data off the stack
-    disable_irq
+    disable_irq x21
+    disable_nmi
     kernel_exit 1
 el1_sp_pc:
     /*
@@ -337,6 +376,7 @@ ENDPROC(el1_sync)
     .align  6
 el1_irq:
     kernel_entry 1
+    enable_nmi
     enable_dbg
 #ifdef CONFIG_TRACE_IRQFLAGS
     bl      trace_hardirqs_off
@@ -356,6 +396,7 @@ el1_irq:
 #ifdef CONFIG_TRACE_IRQFLAGS
     bl      trace_hardirqs_on
 #endif
+    disable_nmi
     kernel_exit 1
 ENDPROC(el1_irq)

@@ -450,7 +491,8 @@ el0_da:
      */
     mrs     x26, far_el1
     // enable interrupts before calling the main handler
-    enable_dbg_and_irq
+    enable_nmi
+    enable_dbg_and_irq x0
     ct_user_exit
     bic     x0, x26, #(0xff << 56)
     mov     x1, x25
@@ -463,7 +505,8 @@ el0_ia:
      */
     mrs     x26, far_el1
     // enable interrupts before calling the main handler
-    enable_dbg_and_irq
+    enable_nmi
+    enable_dbg_and_irq x0
     ct_user_exit
     mov     x0, x26
     orr     x1, x25, #1 << 24               // use reserved ISS bit for instruction aborts
@@ -496,7 +539,8 @@ el0_sp_pc:
      */
     mrs     x26, far_el1
     // enable interrupts before calling the main handler
-    enable_dbg_and_irq
+    enable_nmi
+    enable_dbg_and_irq x0
     mov     x0, x26
     mov     x1, x25
     mov     x2, sp
@@ -507,7 +551,8 @@ el0_undef:
      * Undefined instruction
      */
     // enable interrupts before calling the main handler
-    enable_dbg_and_irq
+    enable_nmi
+    enable_dbg_and_irq x0
     ct_user_exit
     mov     x0, sp
     bl      do_undefinstr
@@ -538,6 +583,7 @@ ENDPROC(el0_sync)
 el0_irq:
     kernel_entry 0
 el0_irq_naked:
+    enable_nmi
     enable_dbg
 #ifdef CONFIG_TRACE_IRQFLAGS
     bl      trace_hardirqs_off
@@ -587,11 +633,12 @@ ENDPROC(cpu_switch_to)
  * and this includes saving x0 back into the kernel stack.
  */
ret_fast_syscall:
-    disable_irq                             // disable interrupts
+    disable_irq x21                         // disable interrupts
     ldr     x1, [tsk, #TI_FLAGS]
     and     x2, x1, #_TIF_WORK_MASK
     cbnz    x2, fast_work_pending
     enable_step_tsk x1, x2
+    disable_nmi
     kernel_exit 0, ret = 1

 /*
@@ -606,7 +653,8 @@ work_pending:
     mov     x0, sp                          // 'regs'
     tst     x2, #PSR_MODE_MASK              // user mode regs?
     b.ne    no_work_pending                 // returning to kernel
-    enable_irq                              // enable interrupts for do_notify_resume()
+    enable_nmi
+    enable_irq x21                          // enable interrupts for do_notify_resume()
     bl      do_notify_resume
     b       ret_to_user
work_resched:
@@ -616,12 +664,13 @@ work_resched:
  * "slow" syscall return path.
  */
ret_to_user:
-    disable_irq                             // disable interrupts
+    disable_irq x21                         // disable interrupts
     ldr     x1, [tsk, #TI_FLAGS]
     and     x2, x1, #_TIF_WORK_MASK
     cbnz    x2, work_pending
     enable_step_tsk x1, x2
no_work_pending:
+    disable_nmi
     kernel_exit 0, ret = 0
ENDPROC(ret_to_user)

@@ -652,7 +701,8 @@ el0_svc:
     adrp    stbl, sys_call_table            // load syscall table pointer
el0_svc_naked:                              // compat entry point
     stp     x0, scno, [sp, #S_ORIG_X0]      // save the original x0 and syscall number
-    enable_dbg_and_irq
+    enable_nmi
+    enable_dbg_and_irq x16
     ct_user_exit 1

     ldr     x16, [tsk, #TI_FLAGS]           // check for syscall hooks
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 9a4ef52abf7c..6226d24f8107 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -445,6 +445,30 @@ __switch_data:
     .quad   init_thread_union + THREAD_START_SP    // sp
 /*
+ * Conditional macro to enable interrupt controller system register access.
+ *
+ * Before we jump into generic code we must enable interrupt controller system
+ * register access because this is required by the irqflags macros. We must
+ * also mask interrupts at the PMR and unmask them within the PSR. That leaves
+ * us set up and ready for the kernel to make its first call to
+ * arch_local_irq_enable().
+ *
+ * Corrupts: tmp
+ */
+    .macro  enable_icc_sysregs, tmp
+#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
+    mrs_s   \tmp, ICC_SRE_EL1
+    orr     \tmp, \tmp, #ICC_SRE_EL1_SRE
+    msr_s   ICC_SRE_EL1, \tmp       // Set ICC_SRE_EL1.SRE==1
+    isb                             // Make sure SRE is now set
+    mov     \tmp, ICC_PMR_EL1_MASKED
+    msr_s   ICC_PMR_EL1, \tmp       // Prepare for unmask of I bit
+    isb
+    msr     daifclr, #2             // Clear the I bit
+#endif
+    .endm
+
+/*
  * The following fragment of code is executed with the MMU on in MMU mode, and
  * uses absolute addresses; this is not position independent.
  */
@@ -464,6 +488,8 @@ __mmap_switched:
     str     x22, [x4]               // Save processor ID
     str     x21, [x5]               // Save FDT pointer
     str     x24, [x6]               // Save PHYS_OFFSET
+
+    enable_icc_sysregs x29          // May be nop
     mov     x29, #0
     b       start_kernel
ENDPROC(__mmap_switched)
@@ -652,6 +678,7 @@ ENDPROC(secondary_startup)
ENTRY(__secondary_switched)
     ldr     x0, [x21]               // get secondary_data.stack
     mov     sp, x0
+    enable_icc_sysregs x29          // may be nop
     mov     x29, #0
     b       secondary_start_kernel
ENDPROC(__secondary_switched)
diff --git a/arch/arm64/mm/cache.S b/arch/arm64/mm/cache.S
index 2560e1e1562e..f34aab45f948 100644
--- a/arch/arm64/mm/cache.S
+++ b/arch/arm64/mm/cache.S
@@ -46,10 +46,12 @@ loop1:
     and     x1, x1, #7              // mask of the bits for current cache only
     cmp     x1, #2                  // see what cache we have at this level
     b.lt    skip                    // skip if no cache, or just i-cache
-    save_and_disable_irqs x9        // make CSSELR and CCSIDR access atomic
+    save_and_disable_irqs x9, x4    // make CSSELR and CCSIDR access atomic
+    save_and_disable_nmis x4
     msr     csselr_el1, x10         // select current cache level in csselr
     isb                             // isb to sych the new cssr&csidr
     mrs     x1, ccsidr_el1          // read the new ccsidr
+    restore_nmis x4
     restore_irqs x9
     and     x2, x1, #7              // extract the length of the cache lines
     add     x2, x2, #4              // add 4 (line length offset)
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index 28eebfb6af76..f91e66ecef80 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -20,6 +20,7 @@

 #include <linux/init.h>
 #include <linux/linkage.h>
+#include <linux/irqchip/arm-gic-v3.h>
 #include <asm/assembler.h>
 #include <asm/asm-offsets.h>
 #include <asm/hwcap.h>
@@ -95,10 +96,28 @@ ENDPROC(cpu_soft_restart)
 *    cpu_do_idle()
 *
 *    Idle the processor (wait for interrupt).
+ *
+ *    If CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS is set we must do additional
+ *    work to ensure that interrupts are not masked at the PMR (because the
+ *    core will not wake up if we block the wake up signal in the interrupt
+ *    controller).
 */
ENTRY(cpu_do_idle)
+#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
+    mrs     x0, daif                // save I bit
+    msr     daifset, #2             // set I bit
+    mrs_s   x1, ICC_PMR_EL1         // save PMR
+    mov     x2, #ICC_PMR_EL1_UNMASKED
+    msr_s   ICC_PMR_EL1, x2         // unmask at PMR
+    isb
+#endif
     dsb     sy                      // WFI may enter a low-power mode
     wfi
+#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
+    msr_s   ICC_PMR_EL1, x1         // restore PMR
+    isb
+    msr     daif, x0                // restore I bit
+#endif
     ret
ENDPROC(cpu_do_idle)

diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index 32533650494c..3923b2a2150c 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -110,8 +110,33 @@ static void gic_redist_wait_for_rwp(void)
 static u64 __maybe_unused gic_read_iar(void)
 {
     u64 irqstat;
+    u64 __maybe_unused daif;
+    u64 __maybe_unused pmr;
+    u64 __maybe_unused default_pmr_value = DEFAULT_PMR_VALUE;

+#ifndef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
     asm volatile("mrs_s %0, " __stringify(ICC_IAR1_EL1) : "=r" (irqstat));
+#else
+    /*
+     * The PMR may be configured to mask interrupts when this code is
+     * called, thus in order to acknowledge interrupts we must set the
+     * PMR to its default value before reading from the IAR.
+     *
+     * To do this without taking an interrupt we also ensure the I bit
+     * is set whilst we are interfering with the value of the PMR.
+     */
+    asm volatile(
+        "mrs    %1, daif\n"                             /* save I bit */
+        "msr    daifset, #2\n"                          /* set I bit */
+        "mrs_s  %2, " __stringify(ICC_PMR_EL1) "\n"     /* save PMR */
+        "msr_s  " __stringify(ICC_PMR_EL1) ",%3\n"      /* set PMR */
+        "mrs_s  %0, " __stringify(ICC_IAR1_EL1) "\n"    /* ack int */
+        "msr_s  " __stringify(ICC_PMR_EL1) ",%2\n"      /* restore PMR */
+        "isb\n"
+        "msr    daif, %1"                               /* restore I */
+        : "=r" (irqstat), "=&r" (daif), "=&r" (pmr)
+        : "r" (default_pmr_value));
+#endif
     return irqstat;
 }
@@ -142,7 +167,7 @@ static void __maybe_unused gic_write_sgi1r(u64 val)
     asm volatile("msr_s " __stringify(ICC_SGI1R_EL1) ", %0" : : "r" (val));
 }

-static void gic_enable_sre(void)
+static void __maybe_unused gic_enable_sre(void)
 {
     u64 val;
@@ -382,11 +407,13 @@ static int gic_populate_rdist(void)
 static void gic_cpu_sys_reg_init(void)
 {
+#ifndef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
     /* Enable system registers */
     gic_enable_sre();

     /* Set priority mask register */
     gic_write_pmr(DEFAULT_PMR_VALUE);
+#endif

     /*
      * On FVP, CPU 0 arrives in the kernel with its BPR changed from the
diff --git a/include/linux/irqchip/arm-gic-v3.h b/include/linux/irqchip/arm-gic-v3.h
index 79d6645897e6..67b90813b619 100644
--- a/include/linux/irqchip/arm-gic-v3.h
+++ b/include/linux/irqchip/arm-gic-v3.h
@@ -242,7 +242,7 @@
 */
 #define ICC_CTLR_EL1_EOImode_drop_dir    (0U << 1)
 #define ICC_CTLR_EL1_EOImode_drop        (1U << 1)
-#define ICC_SRE_EL1_SRE                  (1U << 0)
+#define ICC_SRE_EL1_SRE                  (1 << 0)

 /*
  * Hypervisor interface registers (SRE only)
@@ -346,6 +346,8 @@
#include <linux/stringify.h>
+struct device_node;
+
 /*
  * We need a value to serve as a irq-type for LPIs. Choose one that will
  * hopefully pique the interest of the reviewer.
diff --git a/include/linux/irqchip/arm-gic.h b/include/linux/irqchip/arm-gic.h
index 71d706d5f169..65340ce7b5af 100644
--- a/include/linux/irqchip/arm-gic.h
+++ b/include/linux/irqchip/arm-gic.h
@@ -49,7 +49,7 @@
 #define GICD_INT_EN_CLR_X32    0xffffffff
 #define GICD_INT_EN_SET_SGI    0x0000ffff
 #define GICD_INT_EN_CLR_PPI    0xffff0000
-#define GICD_INT_DEF_PRI       0xa0
+#define GICD_INT_DEF_PRI       0xc0
 #define GICD_INT_DEF_PRI_X4    ((GICD_INT_DEF_PRI << 24) |\
                                 (GICD_INT_DEF_PRI << 16) |\
                                 (GICD_INT_DEF_PRI << 8) |\
This is self-test code to identify circumstances where the I bit is set by hardware but no software exists to copy its state to the PMR.
I don't really expect this patch to be retained much after the RFC stage. However I have included it in this RFC series to document the testing I have done and to allow further testing under different workloads.
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
---
 arch/arm64/include/asm/irqflags.h | 29 +++++++++++++++++++++++++++++
 arch/arm64/kernel/irq.c           |  6 ++++++
 2 files changed, 35 insertions(+)

diff --git a/arch/arm64/include/asm/irqflags.h b/arch/arm64/include/asm/irqflags.h
index 7b6866022f82..89be5f830857 100644
--- a/arch/arm64/include/asm/irqflags.h
+++ b/arch/arm64/include/asm/irqflags.h
@@ -18,6 +18,7 @@
#ifdef __KERNEL__
+#include <asm/bug.h>
 #include <asm/ptrace.h>

 #ifndef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
@@ -90,6 +91,23 @@ static inline int arch_irqs_disabled_flags(unsigned long flags)
#include <linux/irqchip/arm-gic-v3.h>
+extern bool enable_i_bit_check;
+
+static inline void check_for_i_bit(void)
+{
+#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS_SELF_TEST
+    unsigned long psr;
+
+    if (enable_i_bit_check) {
+        asm volatile("mrs %0, daif" : "=r"(psr));
+        if (psr & PSR_I_BIT) {
+            enable_i_bit_check = false;
+            WARN(true, "I bit is set: %08lx\n", psr);
+        }
+    }
+#endif
+}
+
 /*
  * CPU interrupt mask handling.
  */
@@ -97,6 +115,8 @@ static inline unsigned long arch_local_irq_save(void)
 {
     unsigned long flags, masked = ICC_PMR_EL1_MASKED;

+    check_for_i_bit();
+
     asm volatile(
         "// arch_local_irq_save\n"
         "mrs_s  %0, " __stringify(ICC_PMR_EL1) "\n"
@@ -113,6 +133,8 @@ static inline void arch_local_irq_enable(void)
 {
     unsigned long unmasked = ICC_PMR_EL1_UNMASKED;

+    check_for_i_bit();
+
     asm volatile(
         "// arch_local_irq_enable\n"
         "msr_s  " __stringify(ICC_PMR_EL1) ",%0\n"
@@ -126,6 +148,8 @@ static inline void arch_local_irq_disable(void)
 {
     unsigned long masked = ICC_PMR_EL1_MASKED;

+    check_for_i_bit();
+
     asm volatile(
         "// arch_local_irq_disable\n"
         "msr_s  " __stringify(ICC_PMR_EL1) ",%0\n"
@@ -142,6 +166,8 @@ static inline unsigned long arch_local_save_flags(void)
 {
     unsigned long flags;

+    check_for_i_bit();
+
     asm volatile(
         "// arch_local_save_flags\n"
         "mrs_s  %0, " __stringify(ICC_PMR_EL1) "\n"
@@ -157,6 +183,8 @@ static inline unsigned long arch_local_save_flags(void)
 */
 static inline void arch_local_irq_restore(unsigned long flags)
 {
+    check_for_i_bit();
+
     asm volatile(
         "// arch_local_irq_restore\n"
         "msr_s  " __stringify(ICC_PMR_EL1) ",%0\n"
@@ -168,6 +196,7 @@ static inline void arch_local_irq_restore(unsigned long flags)

 static inline int arch_irqs_disabled_flags(unsigned long flags)
 {
+    check_for_i_bit();
     return !(flags & ICC_PMR_EL1_G_BIT);
 }

diff --git a/arch/arm64/kernel/irq.c b/arch/arm64/kernel/irq.c
index 240b75c0e94f..7d68193af26c 100644
--- a/arch/arm64/kernel/irq.c
+++ b/arch/arm64/kernel/irq.c
@@ -31,6 +31,12 @@
unsigned long irq_err_count;
+#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS_SELF_TEST
+/* enable_i_bit_check is declared in asm/irqflags.h */
+bool enable_i_bit_check = true;
+EXPORT_SYMBOL(enable_i_bit_check);
+#endif
+
 int arch_show_interrupts(struct seq_file *p, int prec)
 {
 #ifdef CONFIG_SMP
Currently arm64 has no implementation of arch_trigger_all_cpu_backtrace. This patch provides one for arm64 systems that are built with CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS (i.e. those that have a pseudo-NMI).
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
---
 arch/arm64/Kconfig               |  1 +
 arch/arm64/include/asm/hardirq.h |  2 +-
 arch/arm64/include/asm/irq.h     |  5 +++
 arch/arm64/include/asm/smp.h     |  4 +++
 arch/arm64/kernel/smp.c          | 70 ++++++++++++++++++++++++++++++++++++
 drivers/irqchip/irq-gic-v3.c     | 28 ++++++++++++++
 6 files changed, 109 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 91c0babb6132..cb8d3dce73c3 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -76,6 +76,7 @@ config ARM64
     select PERF_USE_VMALLOC
     select POWER_RESET
     select POWER_SUPPLY
+    select PRINTK_NMI_BACKTRACE if USE_ICC_SYSREGS_FOR_IRQFLAGS
     select RTC_LIB
     select SPARSE_IRQ
     select SYSCTL_EXCEPTION_TRACE
diff --git a/arch/arm64/include/asm/hardirq.h b/arch/arm64/include/asm/hardirq.h
index 6aae421f4d73..e8a3268a891c 100644
--- a/arch/arm64/include/asm/hardirq.h
+++ b/arch/arm64/include/asm/hardirq.h
@@ -20,7 +20,7 @@
 #include <linux/threads.h>
 #include <asm/irq.h>

-#define NR_IPI    5
+#define NR_IPI    6

 typedef struct {
     unsigned int __softirq_pending;
diff --git a/arch/arm64/include/asm/irq.h b/arch/arm64/include/asm/irq.h
index 94c53674a31d..4f974a2959bb 100644
--- a/arch/arm64/include/asm/irq.h
+++ b/arch/arm64/include/asm/irq.h
@@ -8,4 +8,9 @@ struct pt_regs;
 extern void migrate_irqs(void);
 extern void set_handle_irq(void (*handle_irq)(struct pt_regs *));

+#if defined CONFIG_SMP && defined CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
+extern void arch_trigger_all_cpu_backtrace(bool);
+#define arch_trigger_all_cpu_backtrace(x) arch_trigger_all_cpu_backtrace(x)
+#endif
+
 #endif
diff --git a/arch/arm64/include/asm/smp.h b/arch/arm64/include/asm/smp.h
index 780f82c827b6..9b3b5663b88b 100644
--- a/arch/arm64/include/asm/smp.h
+++ b/arch/arm64/include/asm/smp.h
@@ -24,6 +24,10 @@
 # error "<asm/smp.h> included in non-SMP build"
 #endif

+#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
+#define SMP_IPI_NMI_MASK (1 << 5)
+#endif
+
 #define raw_smp_processor_id() (current_thread_info()->cpu)

 struct seq_file;
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 328b8ce4b007..9369eee1fa9f 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -68,6 +68,7 @@ enum ipi_msg_type {
     IPI_CPU_STOP,
     IPI_TIMER,
     IPI_IRQ_WORK,
+    IPI_CPU_BACKTRACE,
 };

 /*
@@ -485,6 +486,7 @@ static const char *ipi_types[NR_IPI] __tracepoint_string = {
     S(IPI_CPU_STOP, "CPU stop interrupts"),
     S(IPI_TIMER, "Timer broadcast interrupts"),
     S(IPI_IRQ_WORK, "IRQ work interrupts"),
+    S(IPI_CPU_BACKTRACE, "backtrace interrupts"),
 };

 static void smp_cross_call(const struct cpumask *target, unsigned int ipinr)
@@ -559,6 +561,66 @@ static void ipi_cpu_stop(unsigned int cpu)
         cpu_relax();
 }
+#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
+
+/* For reliability, we're prepared to waste bits here. */
+static DECLARE_BITMAP(backtrace_mask, NR_CPUS) __read_mostly;
+
+void arch_trigger_all_cpu_backtrace(bool include_self)
+{
+    int i;
+    int this_cpu = get_cpu();
+
+    if (0 != printk_nmi_backtrace_prepare()) {
+        /*
+         * If there is already an nmi printk sequence in
+         * progress then just give up...
+         */
+        put_cpu();
+        return;
+    }
+
+    cpumask_copy(to_cpumask(backtrace_mask), cpu_online_mask);
+    if (!include_self)
+        cpumask_clear_cpu(this_cpu, to_cpumask(backtrace_mask));
+
+    if (!cpumask_empty(to_cpumask(backtrace_mask))) {
+        pr_info("Sending NMI to %s CPUs:\n",
+            (include_self ? "all" : "other"));
+        smp_cross_call(to_cpumask(backtrace_mask), IPI_CPU_BACKTRACE);
+    }
+
+    /* Wait for up to 3 seconds for all CPUs to do the backtrace */
+    for (i = 0; i < 3 * 1000; i++) {
+        if (cpumask_empty(to_cpumask(backtrace_mask)))
+            break;
+        mdelay(1);
+        touch_softlockup_watchdog();
+    }
+
+    printk_nmi_backtrace_complete();
+    put_cpu();
+}
+
+void ipi_cpu_backtrace(struct pt_regs *regs)
+{
+    int cpu = smp_processor_id();
+
+    BUILD_BUG_ON(SMP_IPI_NMI_MASK != BIT(IPI_CPU_BACKTRACE));
+
+    if (cpumask_test_cpu(cpu, to_cpumask(backtrace_mask))) {
+        printk_nmi_backtrace_this_cpu_begin();
+        pr_warn("NMI backtrace for cpu %d\n", cpu);
+        show_regs(regs);
+        show_stack(NULL, NULL);
+        printk_nmi_backtrace_this_cpu_end();
+
+        cpumask_clear_cpu(cpu, to_cpumask(backtrace_mask));
+    }
+}
+
+#endif /* CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS */
+
 /*
  * Main handler for inter-processor interrupts
  */
@@ -605,6 +667,14 @@ void handle_IPI(int ipinr, struct pt_regs *regs)
         break;
 #endif

+#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
+    case IPI_CPU_BACKTRACE:
+        nmi_enter();
+        ipi_cpu_backtrace(regs);
+        nmi_exit();
+        break;
+#endif
+
     default:
         pr_crit("CPU%u: Unknown IPI message 0x%x\n", cpu, ipinr);
         break;
diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index 3923b2a2150c..bdb134ed4519 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -34,6 +34,16 @@
 #include "irq-gic-common.h"
 #include "irqchip.h"

+#ifndef SMP_IPI_NMI_MASK
+#define SMP_IPI_NMI_MASK 0
+#endif
+
+/*
+ * Copied from arm-gic.h (which we cannot include here because it conflicts
+ * with arm-gic-v3.h)
+ */
+#define GIC_DIST_PRI    0x400
+
 struct redist_region {
     void __iomem    *redist_base;
     phys_addr_t     phys_base;
@@ -438,6 +448,7 @@ static int gic_dist_supports_lpis(void)
 static void gic_cpu_init(void)
 {
     void __iomem *rbase;
+    unsigned long nmimask, hwirq;

     /* Register ourselves with the rest of the world */
     if (gic_populate_rdist())
@@ -455,6 +466,23 @@ static void gic_cpu_init(void)

     /* initialise system registers */
     gic_cpu_sys_reg_init();
+
+    /* Boost the priority of any IPI in the mask */
+    nmimask = SMP_IPI_NMI_MASK;
+    for_each_set_bit(hwirq, &nmimask, 16) {
+        unsigned int pri_reg = (hwirq / 4) * 4;
+        u32 pri_mask = BIT(6 + ((hwirq % 4) * 8));
+        u32 pri_val = readl_relaxed(rbase + GIC_DIST_PRI + pri_reg);
+        u32 actual;
+
+        pri_mask |= BIT(7 + ((hwirq % 4) * 8));
+        pri_val &= ~pri_mask;    /* priority boost */
+        writel_relaxed(pri_val, rbase + GIC_DIST_PRI + pri_reg);
+
+        actual = readl_relaxed(rbase + GIC_DIST_PRI + pri_reg);
+    }
+    gic_dist_wait_for_rwp();
+    gic_redist_wait_for_rwp();
 }
#ifdef CONFIG_SMP
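For reference, the arithmetic in the boost loop above just follows the distributor's priority register layout (one byte per interrupt, four interrupts per 32-bit register). A sketch of the same addressing in isolation, using the names from this patch:

```c
/*
 * Sketch of the GIC_DIST_PRI addressing used in gic_cpu_init() above.
 * SGI/PPI number hwirq has one priority byte, packed four to a 32-bit
 * register.
 */
static inline unsigned int sgi_pri_reg_offset(unsigned int hwirq)
{
	return GIC_DIST_PRI + (hwirq / 4) * 4;
}

static inline u32 sgi_pri_boost_mask(unsigned int hwirq)
{
	/* bits 7:6 of this interrupt's byte lane; cleared to boost it */
	return (BIT(7) | BIT(6)) << ((hwirq % 4) * 8);
}
```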
On Wed, Mar 18, 2015 at 02:20:21PM +0000, Daniel Thompson wrote:
This patchset provides a pseudo-NMI for arm64 kernels by reimplementing the irqflags macros to modify the GIC PMR (the priority mask register is accessible as a system register on GICv3 and later) rather than the PSR. The pseudo-NMI changes are support by a prototype implementation of arch_trigger_all_cpu_backtrace that allows the new code to be exercised.
Hi there,
Interesting series -- I've not gone into all the code in detail yet, but I have some high-level comments and questions first.
In addition to the arm64 changes I've bundled in a few patches from other patchsets to make the patchset self-contained. Of particular note of the serial break emulation patch which allows ^B^R^K to be used instead of a serial break to trigger SysRq-L (FVP UART sockets don't seem to support serial breaks). This makes it easy to run arch_trigger_all_cpu_backtrace from an IRQ handler (i.e. somewhere with interrupts masked so we are forced to preempt and take the NMI).
The code works-for-me (tm) but there are currently some pretty serious limitations.
Exercised only on the foundation model with gicv3 support. It has not been exercised on real silicon or even on the more advanced ARM models.
It has been written without any documentation describing GICv3 architecture (which has not yet been released by ARM). I've been guessing about the behaviour based on the ARMv8 and GICv2 architecture specs. The code works on the foundation model but I cannot check that it conforms architecturally.
Review is the best way to approach that for now. If the code can be demonstrated on the model, that's a good start.
- Requires GICv3+ hardware together with firmware support to enable GICv3 features at EL3. If CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS is enabled the kernel will not boot on older hardware. It will be hard to diagnose because we will crash very early in the boot (i.e. before the call to start_kernel). Auto-detection might be possible but the performance and code size cost of adding conditional code to the irqflags macros probably makes it impractical. As such it may never be possible to remove this limitation (although it might be possible to find a way to survive long enough to panic and show the results on the console).
This can (and should) be done via patching -- otherwise we risk breaking single kernel image for GICv2+v3.
No benchmarking (see #1 above). Unlike the PSR, updates to the PMR do not self synchronize which requires me to sprinkle isb instructions fairly liberally. I've been told the cost of isb varies from almost-free (A53) to somewhat costly (A57) thus we export this code to reduce kernel performance. However this needs to be quantified and I am currently unable to do this. I'd really like to but don't have any suitable hardware.
There is no code in el1_irq to detect NMI and switch from IRQ to NMI handling. This means all the irq handling machinary is re-entered in order to handle the NMI. This not safe and deadlocks are likely. This is a severe limitation although, in this proof-of-concept work, NMI can only be triggered by SysRq-L or severe kernel damage. This means we just about get away with it for simple test (lockdep detects that we are doing wrong and shows a backtrace). This is definitely the first thing that needs to be tackled to take this code further.
Indeed, and this does look a bit weird at present... it took me a while to figure out where NMIs could possibly be coming from in this series.
Note also that alternative approaches to implementing a pseudo-NMI on arm64 are possible but only through runtime cooperation with other software components in the system, potentially both those running at EL3 and at secure EL1. I should like to explore these options in future but,
For the KVM case, vFIQ is an obvious choice, but you're correct that all other scenarios would require cooperation from a separate hypervisor/ firmware etc.
Ideally, we should avoid having multiple ways of implementing the same thing.
as far as I know, this is the only sane way to provide NMI-like features whilst being implementable entirely in non-secure EL1[1]
[1] Except for a single register write to ICC_SRE_EL3 by the EL3 firmware (and already implemented by ARM trusted firmware).
Even that would require more of the memory-mapped GIC CPU interface to be NS-accessible than is likely to be the case on product platforms. Note also that the memory-mapped interface is not mandated for GICv3, so some platforms may simply not have it.
Some other generalities that don't seem to be addressed yet:
* How are NMIs prioritised with respect to other interrupts and exceptions? This needs to be concretely specified. A sensible answer would probably be that the effect is to split the existing single-priority IRQ into two bands: ordinary IRQs and NMIs. Prioritisation against FIQ and other exceptions would be unaffected.
I think this is effectively what you've implemented so far.
* Should it be possible to map SPIs as NMIs? How would they be configured/registered? Should it be possible to register multiple interrupts as NMIs?
* What about interrupt affinity?
Some other points:
* I feel uneasy about using reserved SPSR fields to store information. This is probably OK for now, but it might be cleaner simply to save/restore the PMR directly.
Providing that the affected bit is cleared before writing to the SPSR (as you do already in kernel_exit) looks workable, but I wonder whether the choice of bit should be UAPI -- it may have to change in the future.
* You can probably thin out the ISBs.
I believe that, via the system register interface, the GIC PMR is intended to be self-synchronising.
* The value BPR resets to is implementation-dependent. It should be initialised on each CPU if we are going to rely on its value, on all platforms. This isn't specific to FVP.
* Is ICC_CTLR_EL1.CBPR set correctly?
Cheers
---Dave
On 20/03/15 15:45, Dave Martin wrote:
On Wed, Mar 18, 2015 at 02:20:21PM +0000, Daniel Thompson wrote:
This patchset provides a pseudo-NMI for arm64 kernels by reimplementing the irqflags macros to modify the GIC PMR (the priority mask register is accessible as a system register on GICv3 and later) rather than the PSR. The pseudo-NMI changes are support by a prototype implementation of arch_trigger_all_cpu_backtrace that allows the new code to be exercised.
Hi there,
Interesting series -- I've not gone into all the code in detail yet, but I have some high-level comments and questions first.
In addition to the arm64 changes I've bundled in a few patches from other patchsets to make the patchset self-contained. Of particular note of the serial break emulation patch which allows ^B^R^K to be used instead of a serial break to trigger SysRq-L (FVP UART sockets don't seem to support serial breaks). This makes it easy to run arch_trigger_all_cpu_backtrace from an IRQ handler (i.e. somewhere with interrupts masked so we are forced to preempt and take the NMI).
The code works-for-me (tm) but there are currently some pretty serious limitations.
Exercised only on the foundation model with gicv3 support. It has not been exercised on real silicon or even on the more advanced ARM models.
It has been written without any documentation describing GICv3 architecture (which has not yet been released by ARM). I've been guessing about the behaviour based on the ARMv8 and GICv2 architecture specs. The code works on the foundation model but I cannot check that it conforms architecturally.
Review is the best way to approach that for now. If the code can be demonstrated on the model, that's a good start.
- Requires GICv3+ hardware together with firmware support to enable GICv3 features at EL3. If CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS is enabled the kernel will not boot on older hardware. It will be hard to diagnose because we will crash very early in the boot (i.e. before the call to start_kernel). Auto-detection might be possible but the performance and code size cost of adding conditional code to the irqflags macros probably makes it impractical. As such it may never be possible to remove this limitation (although it might be possible to find a way to survive long enough to panic and show the results on the console).
This can (and should) be done via patching -- otherwise we risk breaking single kernel image for GICv2+v3.
Do you mean real patching (hunting down all those inlines and rewriting them) or simply implementing irqflags with an ops table? If the former, I didn't look at this because I didn't realise we could do that...
No benchmarking (see #1 above). Unlike the PSR, updates to the PMR do not self synchronize which requires me to sprinkle isb instructions fairly liberally. I've been told the cost of isb varies from almost-free (A53) to somewhat costly (A57) thus we export this code to reduce kernel performance. However this needs to be quantified and I am currently unable to do this. I'd really like to but don't have any suitable hardware.
There is no code in el1_irq to detect NMI and switch from IRQ to NMI handling. This means all the irq handling machinary is re-entered in order to handle the NMI. This not safe and deadlocks are likely. This is a severe limitation although, in this proof-of-concept work, NMI can only be triggered by SysRq-L or severe kernel damage. This means we just about get away with it for simple test (lockdep detects that we are doing wrong and shows a backtrace). This is definitely the first thing that needs to be tackled to take this code further.
Indeed, and this does look a bit weird at present... it took me a while to figure out where NMIs could possibly be coming from in this series.
My plan was to check the running priority register early in el1_irq and branch to a handler specific to NMI when the priority indicates we are handling a pseudo-NMI.
Note also that alternative approaches to implementing a pseudo-NMI on arm64 are possible but only through runtime cooperation with other software components in the system, potentially both those running at EL3 and at secure EL1. I should like to explore these options in future but,
For the KVM case, vFIQ is an obvious choice, but you're correct that all other scenarios would require cooperation from a separate hypervisor/ firmware etc.
Ideally, we should avoid having multiple ways of implementing the same thing.
as far as I know, this is the only sane way to provide NMI-like features whilst being implementable entirely in non-secure EL1[1]
[1] Except for a single register write to ICC_SRE_EL3 by the EL3 firmware (and already implemented by ARM trusted firmware).
Even that would require more of the memory-mapped GIC CPU interface to be NS-accessible than is likely to be the case on product platforms. Note also that the memory-mapped interface is not mandated for GICv3, so some platforms may simply not have it.
Perhaps I used clumsy phrasing here.
The main difference I care about is between runtime cooperation and boot-time cooperation. The approach I have taken in the patchset requires boot-time cooperation (to configure the GIC appropriately) but no runtime cooperation.
Some other generalities that don't seem to be addressed yet:
How are NMIs prioritised with respect to other interrupts and exceptions? This needs to be concretely specified. A sensible answer would probably be that the effect is to split the existing single-priority IRQ into two bands: ordinary IRQs and NMIs. Prioritisation against FIQ and other exceptions would be unaffected.
I think this is effectively what you've implemented so far.
Pretty much. Normal interrupts run at the default priority and NMIs run at default priority but with bit 6 cleared.
In addition I would expect most kernel exception handlers to unmask the I-bit as soon as the context has been saved. This allows them to be pre-empted by an NMI.
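Spelled out, the priority space used by this series looks something like this (values collected from the patches above; sketch names, not the real macros):

```c
/*
 * Two bands separated by bit 6 of the priority, plus the two PMR
 * values used for masking. An interrupt is only delivered when its
 * priority is higher (numerically lower) than the current PMR.
 */
#define PRI_IRQ		0xc0	/* GICD_INT_DEF_PRI: ordinary IRQs */
#define PRI_NMI		0x00	/* boosted: bits 7:6 cleared by gic_cpu_init() */

#define PMR_UNMASKED	0xf0	/* ICC_PMR_EL1_UNMASKED: both bands fire */
#define PMR_MASKED	0xb0	/* ICC_PMR_EL1_MASKED: only the NMI band fires */
```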
- Should it be possible to map SPIs as NMIs? How would they be configured/registered? Should it be possible to register multiple interrupts as NMIs?
Yes, although not quite yet.
The work on arm64 is following in the footsteps of similar work for arm.
My initial ideas are here (although as you can see from the review I've got a *long* way to go): http://thread.gmane.org/gmane.linux.kernel/1871163
However the basic theme is:
1. Use existing interrupt API to register NMI handlers (with special flag).
2. Flag makes callback to irqchip driver. In case of GICv3 this would alter the priority of interrupt (for ARMv7+FIQ it would also change interrupt group but this is not needed on ARMv8+GICv3).
3. "Simple" demux. We cannot traverse all the IRQ data structures from NMI due to locks within the structures so we need some simplifying assumptions. My initial simplifying assumptions were:
a) NMI only for first level interrupt controller (i.e. peripherals directly attached to this controller).
b) No shared interrupt lines.
Based on tglx's review I'm working on the basis that b) above is one simplification too many, but I've not yet had the chance to go back and have another go.
As you can see from the reviews I have a bit of work to do.
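For illustration, the registration theme in steps 1 and 2 above might end up looking like this from a driver's point of view (IRQF_NMI is a hypothetical flag invented for this sketch; nothing like it exists yet):

```c
#include <linux/interrupt.h>

#define IRQF_NMI	0x00100000	/* hypothetical flag, sketch only */

static irqreturn_t my_nmi_handler(int irq, void *dev_id)
{
	/* NMI-safe work only: no locks shared with IRQ/task context. */
	return IRQ_HANDLED;
}

static int my_device_setup(int irq, void *dev)
{
	/*
	 * The flag would trigger a callback into the irqchip driver,
	 * which on GICv3 would boost this interrupt's priority into
	 * the NMI band.
	 */
	return request_irq(irq, my_nmi_handler, IRQF_NMI, "my-nmi", dev);
}
```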
- What about interrupt affinity?
It should "just work" as normal if I can get the rest of the interrupt system right. Do you foresee a specific problem?
Some other points:
I feel uneasy about using reserved SPSR fields to store information. This is probably OK for now, but it might be cleaner simply to save/restore the PMR directly.
Providing that the affected bit is cleared before writing to the SPSR (as you do already in kernel_exit) looks workable, but I wonder whether the choice of bit should be UAPI -- it may have to change in the future.
I agree we ought to keep this out of the uapi.
Regarding stealing a bit from the SPSR this was mostly an implementation convenience. It might be interesting (eventually) to benchmark it against a more obvious approach.
You can probably thin out the ISBs.
I believe that, via the system register interface, the GIC PMR is intended to be self-synchronising.
That sounds great. I've just found the relevant line in the ARMv8 manual... I'd overlooked that before.
- The value BPR resets to is implementation-dependent. It should be initialised on each CPU if we are going to rely on its value, on all platforms. This isn't specific to FVP.
Really? As mentioned I only have a GICv2 spec but on that revision the reset value is the minimum supported value (i.e. the same effect as attempting to set it to zero). In other words it is technically implementation-dependent but nevertheless defaults to a setting that avoids any weird "globbing" effect on the interrupt priorities.
On FVP something has reached in and changed the BPR for CPU0 from its proper reset value (all other CPUs have the correct reset value). Of course it could be the firmware rather than the FVP itself that has caused this.
I guess it is good practice for a driver to re-establish the reset value for registers it owns and cares about, but nevertheless I still expect this register to be as-reset when we hand over to the kernel.
- Is ICC_CTLR_EL1.CBPR set correctly?
I've never checked. On GICv2 that would be secure state so, to be honest, I didn't think its value was any of my business.
Cheers
---Dave
Apologies for the slow reply... :/
Anyway,
On Mon, Mar 23, 2015 at 06:47:53PM +0000, Daniel Thompson wrote:
On 20/03/15 15:45, Dave Martin wrote:
On Wed, Mar 18, 2015 at 02:20:21PM +0000, Daniel Thompson wrote:
This patchset provides a pseudo-NMI for arm64 kernels by reimplementing the irqflags macros to modify the GIC PMR (the priority mask register is accessible as a system register on GICv3 and later) rather than the PSR. The pseudo-NMI changes are support by a prototype implementation of arch_trigger_all_cpu_backtrace that allows the new code to be exercised.
Minor nit: the "pseudo NMI" terminology could lead to confusion if something more closely resembling a real NMI comes along.
I'll have to have a think, but nothing comes to mind right now...
[...]
- Requires GICv3+ hardware together with firmware support to enable GICv3 features at EL3. If CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS is enabled the kernel will not boot on older hardware. It will be hard to diagnose because we will crash very early in the boot (i.e. before the call to start_kernel). Auto-detection might be possible but the performance and code size cost of adding conditional code to the irqflags macros probably makes it impractical. As such it may never be possible to remove this limitation (although it might be possible to find a way to survive long enough to panic and show the results on the console).
This can (and should) be done via patching -- otherwise we risk breaking single kernel image for GICv2+v3.
Do you mean real patching (hunting down all those inlines and rewriting them) or simply implementing irqflags with an ops table? If the former, I didn't look at this because I didn't realise we could do that...
A generic patching framework was introduced by Andre Przywara in this patch:
e039ee4 arm64: add alternative runtime patching
I believe you should be able to use this to patch between DAIF and ICC_PMR accesses.
You should be able to find examples of this framework being used by grepping. I've not played with it myself yet.
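For example, arch_local_irq_disable() might end up looking something like this (a rough sketch: commit e039ee4 only provides the assembly side of the framework, so the ALTERNATIVE() C helper and the capability name ARM64_HAS_GIC_CPUIF_SYSREGS used here are assumptions for illustration):

```c
static inline void arch_local_irq_disable(void)
{
	unsigned long masked = ICC_PMR_EL1_MASKED;

	/* Patched at boot: DAIF masking by default, PMR on GICv3. */
	asm volatile(ALTERNATIVE(
		"msr	daifset, #2",
		"msr_s	" __stringify(ICC_PMR_EL1) ", %0",
		ARM64_HAS_GIC_CPUIF_SYSREGS)
		:
		: "r" (masked)
		: "memory");
}
```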
[...]
- There is no code in el1_irq to detect NMI and switch from IRQ to NMI handling. This means all the irq handling machinary is re-entered in order to handle the NMI. This not safe and deadlocks are likely. This is a severe limitation although, in this proof-of-concept work, NMI can only be triggered by SysRq-L or severe kernel damage. This means we just about get away with it for simple test (lockdep detects that we are doing wrong and shows a backtrace). This is definitely the first thing that needs to be tackled to take this code further.
Indeed, and this does look a bit weird at present... it took me a while to figure out where NMIs could possibly be coming from in this series.
My plan was to check the running priority register early in el1_irq and branch to a handler specific to NMI when the priority indicates we are handling a pseudo-NMI.
Sounds reasonable.
Note also that alternative approaches to implementing a pseudo-NMI on arm64 are possible but only through runtime cooperation with other software components in the system, potentially both those running at EL3 and at secure EL1. I should like to explore these options in future but,
For the KVM case, vFIQ is an obvious choice, but you're correct that all other scenarios would require cooperation from a separate hypervisor/ firmware etc.
Ideally, we should avoid having multiple ways of implementing the same thing.
as far as I know, this is the only sane way to provide NMI-like features whilst being implementable entirely in non-secure EL1[1]
[1] Except for a single register write to ICC_SRE_EL3 by the EL3 firmware (and already implemented by ARM trusted firmware).
Even that would require more of the memory-mapped GIC CPU interface to be NS-accessible than is likely to be the case on product platforms. Note also that the memory-mapped interface is not mandated for GICv3, so some platforms may simply not have it.
Perhaps I used clumsy phrasing here.
The main difference I care about is between runtime cooperation and boot-time cooperation. The approach I have taken in the patchset requires boot-time cooperation (to configure the GIC appropriately) but no runtime cooperation.
I think that's reasonable. Any new boot requirements will need to be documented (probably in booting.txt) as part of the final series and alongside the relevant Kconfig option.
Some other generalities that don't seem to be addressed yet:
- How are NMIs prioritised with respect to other interrupts and exceptions? This needs to be concretely specified. A sensible answer would probably be that the effect is to split the existing single-priority IRQ into two bands: ordinary IRQs and NMIs. Prioritisation against FIQ and other exceptions would be unaffected.
I think this is effectively what you've implemented so far.
Pretty much. Normal interrupts run at the default priority; NMIs run at the default priority but with bit 6 cleared, so they sit in a strictly higher priority band.
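In other words, something like the following split, with both values illustrative (anything with bit 6 clear preempts the IRQ band but leaves FIQ and other exceptions unaffected):

    #define GIC_PRIO_IRQ	0xc0			/* ordinary interrupts */
    #define GIC_PRIO_NMI	(GIC_PRIO_IRQ & ~0x40)	/* bit 6 clear: 0x80 */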
In addition I would expect most kernel exception handlers to unmask the I-bit as soon as the context has been saved. This allows them to be pre-empted by an NMI.
Yep, that matches my expectation.
- Should it be possible to map SPIs as NMIs? How would they be configured/registered? Should it be possible to register multiple interrupts as NMIs?
Yes, although not quite yet.
The work on arm64 is following in the footsteps of similar work for arm.
My initial ideas are here (although as you can see from the review I've got a *long* way to go): http://thread.gmane.org/gmane.linux.kernel/1871163
However the basic theme is as follows (see the sketch after this list):
1. Use the existing interrupt API to register NMI handlers (with a special flag).
2. The flag makes a callback to the irqchip driver. In the case of GICv3 this would alter the priority of the interrupt (for ARMv7+FIQ it would also change the interrupt group, but this is not needed on ARMv8+GICv3).
3. "Simple" demux. We cannot traverse all the IRQ data structures from NMI due to locks within the structures, so we need some simplifying assumptions. My initial simplifying assumptions were:
   a) NMI only for the first-level interrupt controller (i.e. peripherals directly attached to that controller).
   b) No shared interrupt lines.
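To make the role of the special flag concrete, registration from a driver's point of view might look like the sketch below. IRQF_NMI and DEBUG_UART_IRQ are hypothetical names; no such flag exists yet:

    #include <linux/interrupt.h>

    static irqreturn_t debug_uart_nmi_handler(int irq, void *dev_id)
    {
        /* Runs in NMI context: no sleeping, no ordinary spinlocks */
        return IRQ_HANDLED;
    }

    static int __init debug_uart_setup(void)
    {
        /*
         * The (hypothetical) IRQF_NMI flag triggers the irqchip
         * callback that moves this line into the NMI priority band.
         */
        return request_irq(DEBUG_UART_IRQ, debug_uart_nmi_handler,
                           IRQF_NMI, "debug-uart", NULL);
    }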
Do other arches have ways of addressing the same problems?
Based on tglx's review I'm working on the basis that b) above is one simplification too many, but I've not yet had the chance to go back and have another go.
I think that it's best to avoid adding arbitrary restrictions that make this look excessively different from working with a regular irqchip, unless there is really some fundamental constraint in play.
As you can see from the reviews I have a bit of work to do in order to get there.
- What about interrupt affinity?
It should "just work" as normal if I can get the rest of the interrupt system right. Do you foresee a specific problem?
So long as NMI-ness is just an extra flag when registering an interrupt, things should probably work. I was wondering about special cases like perf (PPI on sensible SoCs) versus, say, debug UART (SPI).
Some other points:
- I feel uneasy about using reserved SPSR fields to store information. This is probably OK for now, but it might be cleaner simply to save/restore the PMR directly.
Providing that the affected bit is cleared before writing to the SPSR (as you do already in kernel_exit) looks workable, but I wonder whether the choice of bit should be UAPI -- it may have to change in the future.
I agree we ought to keep this out of the uapi.
Regarding stealing a bit from the SPSR this was mostly an implementation convenience. It might be interesting (eventually) to benchmark it against a more obvious approach.
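For comparison, the bit-stealing approach amounts to something like the sketch below. PSR_G_BIT and ICC_PMR_MASKED are illustrative names, and the bit is stripped before anything reaches the real SPSR:

    #define PSR_G_BIT	(1UL << 22)	/* hypothetical reserved SPSR bit */

    /* exception entry: record whether IRQs were masked via the PMR */
    static void stash_pmr(struct pt_regs *regs)
    {
        if (read_sysreg_s(SYS_ICC_PMR_EL1) == ICC_PMR_MASKED)
            regs->pstate |= PSR_G_BIT;
    }

    /* exception exit: restore the PMR, then strip the stolen bit */
    static void restore_pmr(struct pt_regs *regs)
    {
        if (regs->pstate & PSR_G_BIT)
            write_sysreg_s(ICC_PMR_MASKED, SYS_ICC_PMR_EL1);
        regs->pstate &= ~PSR_G_BIT;
    }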
I think your current approach is OK for now, at least while the series is under development.
- You can probably thin out the ISBs.
I believe that, via the system register interface, the GIC PMR is intended to be self-synchronising.
That sounds great. I've just found the relevant line in the ARMv8 manual... I'd overlooked that before.
- The value BPR resets to is implementation-dependent. It should be initialised on each CPU if we are going to rely on its value, on all platforms. This isn't specific to FVP.
Really? As mentioned, I only have a GICv2 spec, but in that revision the reset value is the minimum supported value (i.e. the same effect as attempting to set it to zero). In other words it is technically implementation-dependent but nevertheless defaults to a setting that avoids any weird "globbing" effect on the interrupt priorities.
On FVP something has reached in and changed the BPR for CPU0 from its proper reset value (all other CPUs have the correct reset value). Of course it could be the firmware rather than the FVP itself that has caused this.
Quite possibly. Of course, there is a strong possibility that some real firmware will also do this (and never get fixed).
Forcing BPR to a sane state from Linux makes sense, since we can do it.
I guess it is good practice for a driver to re-establish the reset value of registers it owns and cares about, but nevertheless I still expect this register to be as-reset when we hand over to the kernel.
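Still, the fixup is cheap; a sketch of what the per-cpu init could do (this is essentially what the "Reset BPR during initialization" patch in this series is for):

    /*
     * Writing 0 is clamped up to the minimum the implementation
     * supports, restoring the finest-grained priority grouping.
     */
    static void gic_cpu_reset_bpr(void)
    {
        write_sysreg_s(0, SYS_ICC_BPR1_EL1);
        isb();
    }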
- Is ICC_CTLR_EL1.CBPR set correctly?
I've never checked. On GICv2 that would be owned by the secure state, so to be honest I didn't think its value was any of my business.
If we have a dependency on how this is set up, it needs to be documented alongside the other booting requirements.
Cheers ---Dave
On 01/04/15 16:15, Dave P Martin wrote:
[...]
A generic patching framework was introduced by Andre Przywara in this patch:
e039ee4 arm64: add alternative runtime patching
I believe you should be able to use this to patch between DAIF and ICC_PMR accesses.
You should be able to find examples of this framework being used by grepping. I've not played with it myself yet.
To follow up on this, I have a few patches queued that use the runtime patching code to deal with GICv3 in KVM:
http://thread.gmane.org/gmane.comp.emulators.kvm.arm.devel/616
The first few patches are already queued for v4.1, and the rest should follow shortly after.
Cheers,
M.
On 01/04/15 16:29, Marc Zyngier wrote:
[...]
To follow up on this, I have a few patches queued that use the runtime patching code to deal with GICv3 in KVM:
http://thread.gmane.org/gmane.comp.emulators.kvm.arm.devel/616
The first few patches are already queued for v4.1, and the rest should follow shortly after.
Thanks (both).
That's really helpful: links for that sort of thing are not easily googleable (things like kpatch flood the results).