On 20/Mar/2024 Palmer Dabbelt wrote:
On Tue, 13 Feb 2024 02:26:40 PST (-0800), tglx@linutronix.de wrote:
Nam!
On Wed, Jan 31 2024 at 09:19, Nam Cao wrote:
RISC-V PLIC cannot "end-of-interrupt" (EOI) disabled interrupts, as explained in the description of Interrupt Completion in the PLIC spec:
"The PLIC signals it has completed executing an interrupt handler by writing the interrupt ID it received from the claim to the claim/complete register. The PLIC does not check whether the completion ID is the same as the last claim ID for that target. If the completion ID does not match an interrupt source that *is currently enabled* for the target, the completion is silently ignored."
Commit 69ea463021be ("irqchip/sifive-plic: Fixup EOI failed when masked") ensured that EOI succeeds by enabling the interrupt first, before EOI.
Commit a1706a1c5062 ("irqchip/sifive-plic: Separate the enable and mask operations") removed the interrupt-enabling code added by the previous commit, because it assumed that the interrupt should already be enabled at the point of EOI. However, this is incorrect: there is a window after a hart claims an interrupt and before irq_desc->lock is acquired, and the interrupt can be disabled during this window. Thus, EOI can be invoked while the interrupt is disabled, effectively nullifying this EOI. As a result, the interrupt is never asserted again, and the device that uses this interrupt appears frozen.
Nice detective work!
Make sure that the interrupt is really enabled before EOI.
Fixes: a1706a1c5062 ("irqchip/sifive-plic: Separate the enable and mask operations")
Cc: stable@vger.kernel.org
Signed-off-by: Nam Cao <namcao@linutronix.de>
v2:
- add unlikely() for optimization
- re-word commit message to make it clearer
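For readers unfamiliar with the v2 change: in the kernel, `unlikely(x)` is (in the common configuration) defined as `__builtin_expect(!!(x), 0)`, a GCC/Clang branch hint that does not change the value of the condition. The userspace sketch below reproduces that definition; `toy_eoi()` and its counters are hypothetical stand-ins for the EOI hot path and are not driver code.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-in for the kernel's unlikely() (GCC/Clang builtin). */
#define unlikely(x) __builtin_expect(!!(x), 0)

static int fast_path_count; /* EOIs that took the common path */
static int slow_path_count; /* EOIs that took the rare disabled path */

/* Hypothetical EOI-shaped function: the rare branch is hinted as cold. */
static void toy_eoi(bool irq_disabled)
{
	if (unlikely(irq_disabled))
		slow_path_count++; /* rare: enable, complete, disable */
	else
		fast_path_count++; /* common: a single completion write */
}

/* The hint only guides code layout; the condition's value is unchanged. */
static void check_hint_is_transparent(void)
{
	assert(unlikely(42) == 1);
	assert(unlikely(0) == 0);
}
```

The hint mainly lets the compiler keep the common single-`writel()` path as the straight-line fall-through, which is why the extra check costs very little in the hot path.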
drivers/irqchip/irq-sifive-plic.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/irqchip/irq-sifive-plic.c b/drivers/irqchip/irq-sifive-plic.c
index e1484905b7bd..0a233e9d9607 100644
--- a/drivers/irqchip/irq-sifive-plic.c
+++ b/drivers/irqchip/irq-sifive-plic.c
@@ -148,7 +148,13 @@ static void plic_irq_eoi(struct irq_data *d)
 {
 	struct plic_handler *handler = this_cpu_ptr(&plic_handlers);
-	writel(d->hwirq, handler->hart_base + CONTEXT_CLAIM);
+	if (unlikely(irqd_irq_disabled(d))) {
+		plic_toggle(handler, d->hwirq, 1);
+		writel(d->hwirq, handler->hart_base + CONTEXT_CLAIM);
+		plic_toggle(handler, d->hwirq, 0);
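The effect of the disabled branch can be replayed in a small sequential toy model. This is a sketch under stated assumptions, not the driver: `toy_src`, `toy_eoi_old()` and `toy_eoi_fixed()` are hypothetical names, the `irq_disabled` parameter stands in for `irqd_irq_disabled(d)`, and flipping `enabled` stands in for `plic_toggle()`. The model does not capture real concurrency; it only replays the one interleaving described in the commit message (claim, then disable from elsewhere, then EOI).

```c
#include <assert.h>
#include <stdbool.h>

/* Toy single-source PLIC model (hypothetical names, illustration only). */
struct toy_src {
	bool enabled;   /* enable bit for this hart context */
	bool in_flight; /* claimed but not yet completed */
};

static void toy_claim(struct toy_src *s)
{
	assert(s->enabled);
	s->in_flight = true;
}

/* Spec rule: a completion for a non-enabled source is silently ignored. */
static void toy_complete(struct toy_src *s)
{
	if (s->enabled)
		s->in_flight = false;
}

/* EOI before the fix: a bare completion write. */
static void toy_eoi_old(struct toy_src *s)
{
	toy_complete(s);
}

/*
 * EOI with the fix's shape: if the interrupt was disabled after the
 * claim, briefly enable it so the completion is accepted, then disable
 * it again so the disabled state is preserved.
 */
static void toy_eoi_fixed(struct toy_src *s, bool irq_disabled)
{
	if (irq_disabled) {
		s->enabled = true;  /* models plic_toggle(handler, hwirq, 1) */
		toy_complete(s);
		s->enabled = false; /* models plic_toggle(handler, hwirq, 0) */
	} else {
		toy_complete(s);
	}
}

/* Claim, disable in the window, then EOI. Returns true if left stuck. */
static bool toy_replay_race(bool use_fix)
{
	struct toy_src s = { .enabled = true };

	toy_claim(&s);
	s.enabled = false; /* disabled before irq_desc->lock is taken */
	if (use_fix)
		toy_eoi_fixed(&s, !s.enabled);
	else
		toy_eoi_old(&s);
	return s.in_flight;
}
```

Here `toy_replay_race(false)` leaves the source stuck in flight, while `toy_replay_race(true)` completes it and still leaves the source disabled.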
It's unfortunate to have this condition in the hot path, though it should be cache-hot, easy to predict, and completely in the noise compared to the writel().
Ya, I think it's fine.
I guess we could try and play some tricks. Maybe hide the load latency with a relaxed writel and some explicit fencing, or claim interrupts when
^ you mean complete?
enabling them. Those both seem somewhat race-prone, though, so I'm not even sure if they're sane.
The latter option is what I have in mind as well. We just need to make sure the interrupt is masked, and then we should be safe. There is still the question of whether it's worth the effort.
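A rough sketch of that alternative in the same kind of sequential toy model, purely to illustrate the idea: the hot-path EOI does no check, and the enable path issues a completion while the source is masked, flushing any completion that was dropped while it was disabled. Everything here is hypothetical. It assumes that "masked" means nothing can be delivered (in the driver, masking is done via the priority), and that a completion for a source that is not in flight is harmless. The model is single-threaded, so it says nothing about the races Palmer is worried about.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy single-source PLIC model (hypothetical names, illustration only). */
struct toy_src {
	bool enabled;   /* enable bit for this hart context */
	bool masked;    /* assumption: masked means nothing is delivered */
	bool in_flight; /* claimed but not yet completed */
};

/* Spec rule: a completion for a non-enabled source is silently ignored. */
static void toy_complete(struct toy_src *s)
{
	if (s->enabled)
		s->in_flight = false;
}

/* Hot-path EOI in this alternative: no disabled check at all. */
static void toy_eoi(struct toy_src *s)
{
	toy_complete(s);
}

/*
 * Hypothetical enable path: mask first, set the enable bit, complete to
 * flush any EOI dropped while disabled, then unmask.
 */
static void toy_enable(struct toy_src *s)
{
	s->masked = true;
	s->enabled = true;
	toy_complete(s);
	s->masked = false;
}

/* Claim, disable, dropped EOI, re-enable. Returns true if still stuck. */
static bool toy_replay(void)
{
	struct toy_src s = { .enabled = true };

	s.in_flight = true; /* hart claims the interrupt */
	s.enabled = false;  /* disabled in the window before irq_desc->lock */
	toy_eoi(&s);        /* dropped: the source is not enabled */
	assert(s.in_flight);
	toy_enable(&s);     /* later re-enable flushes the lost completion */
	return s.in_flight;
}
```

In this model the source is no longer stuck after `toy_enable()`, and the EOI hot path needs no branch; whether the real hardware and the irq core ordering make this safe is exactly the open question above.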
I may do that one day when I stop being lazy.
Best regards, Nam