Re: [PATCH 0/7] xen/events: bug fixes and some diagnostic aids

6 Feb 2021


Hi Juergen,
On 06/02/2021 10:49, Juergen Gross wrote:
...
> The first three patches are fixes for XSA-332. They avoid WARN splats
> and a performance issue with interdomain events.
Thanks for helping to figure out the problem. Unfortunately, I still
reliably see the WARN splat with the latest Linux master (1e0d27fce010) +
your first 3 patches.
I am using Xen 4.11 (1c7d984645f9) and dom0 is forced to use the 2L
events ABI.
After some debugging, I think I have an idea of what went wrong. The
problem happens when the event, initially bound to vCPU0, is rebound to
a different vCPU.
From the comment in xen_rebind_evtchn_to_cpu(), we are masking the 
event to prevent it being delivered on an unexpected vCPU. However, I 
believe the following can happen:
vCPU0                           | vCPU1
                                |
                                | Call xen_rebind_evtchn_to_cpu()
receive event X                 |
                                | mask event X
                                | bind to vCPU1
<vCPU descheduled>              | unmask event X
                                |
                                | receive event X
                                |
                                | handle_edge_irq(X)
handle_edge_irq(X)              |  -> handle_irq_event()
                                |   -> set IRQD_IN_PROGRESS
  -> set IRQS_PENDING           |
                                |   -> evtchn_interrupt()
                                |   -> clear IRQD_IN_PROGRESS
                                |  -> IRQS_PENDING is set
                                |  -> handle_irq_event()
                                |   -> evtchn_interrupt()
                                |     -> WARN()
                                |
All the lateeoi handlers expect ONESHOT semantics, and
evtchn_interrupt() doesn't tolerate any deviation.
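
To make the interleaving above easier to follow, here is a toy, single-threaded Python model of it. This is only a sketch and not kernel code: the `IrqDesc` class, its field names and the `racing_cpu` hook are hypothetical stand-ins for the flags named in the diagram, and the second vCPU is simulated by a callback fired while the handler is marked in progress.

```python
from dataclasses import dataclass, field


@dataclass
class IrqDesc:
    """Hypothetical stand-in for the per-IRQ state used in the diagram."""
    in_progress: bool = False  # models IRQD_IN_PROGRESS
    pending: bool = False      # models IRQS_PENDING
    enabled: bool = True       # models the handler's ONESHOT expectation
    log: list = field(default_factory=list)


def evtchn_interrupt(desc):
    # The handler expects to run once per enable; a second run is flagged.
    if not desc.enabled:
        desc.log.append("WARN")
    desc.enabled = False
    desc.log.append("handled")


def second_cpu_arrives(desc):
    # handle_edge_irq() on the other vCPU: the handler is busy,
    # so the event is only recorded as pending.
    if desc.in_progress:
        desc.pending = True


def handle_edge_irq(desc, racing_cpu=None):
    while True:
        desc.pending = False
        desc.in_progress = True
        if racing_cpu is not None:
            racing_cpu(desc)       # the other vCPU sees the same event
            racing_cpu = None
        evtchn_interrupt(desc)
        desc.in_progress = False
        if not desc.pending:       # loop again only if an event raced in
            break


desc = IrqDesc()
handle_edge_irq(desc, racing_cpu=second_cpu_arrives)
print(desc.log)
```

In this model the handler runs a second time before anything re-enabled the event, which is the condition the WARN() in the diagram reports. Without the racing callback the handler runs exactly once.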
I think the problem was introduced by 7f874a0447a9 ("xen/events: fix 
lateeoi irq acknowledgment") because the interrupt was disabled 
previously. Therefore we wouldn't do another iteration in handle_edge_irq().
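
A minimal sketch of why the disabled state matters, assuming the retry loop in handle_edge_irq() only repeats while IRQS_PENDING is set and the interrupt is not disabled. The function name and the `handler_disables_irq` flag are hypothetical, and a single racing event from another vCPU is hard-coded into the model.

```python
def handler_runs(handler_disables_irq: bool) -> int:
    """Count handler invocations for one event plus one racing event."""
    irq_disabled = False
    racing_event = True   # one event arrives from another vCPU mid-handler
    runs = 0
    while True:
        pending = False   # pending flag is cleared before the handler runs
        runs += 1         # the handler is invoked
        if handler_disables_irq:
            irq_disabled = True   # old behaviour as described above
        if racing_event:
            pending = True        # the other vCPU marks the IRQ pending
            racing_event = False
        # Retry only if something is pending and the IRQ is still enabled.
        if not (pending and not irq_disabled):
            break
    return runs


print(handler_runs(True), handler_runs(False))
```

With the interrupt disabled by the handler, the pending event is left for later and the handler runs once. With the interrupt left enabled, the loop iterates again and the handler runs twice.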
Aside from the handlers, I think it may also impact the deferred EOI
mitigation: in theory, a 3rd vCPU could join the party (let's say vCPU A
migrates the event from vCPU B to vCPU C), so info->{eoi_cpu, irq_epoch,
eoi_time} could possibly get mangled?
For a fix, we may want to consider holding evtchn_rwlock for writing.
However, I am not 100% sure this is going to prevent everything.
Does my write-up make sense to you?
Cheers,
-- 
Julien Grall
