On Thu, 2018-11-08 at 11:46 +0000, Gustavo Pimentel wrote:
On 07/11/2018 18:32, Trent Piepho wrote:
On Wed, 2018-11-07 at 12:57 +0000, Gustavo Pimentel wrote:
On 06/11/2018 16:00, Marc Zyngier wrote:
On 06/11/18 14:53, Lorenzo Pieralisi wrote:
On Sat, Oct 27, 2018 at 12:00:57AM +0000, Trent Piepho wrote:
This gives the following race scenario:
1. An MSI is received by, and the status bit for the MSI is set in, the
   DWC PCI-e controller.
2. dw_handle_msi_irq() calls a driver's registered interrupt handler for
   the MSI received.
3. At some point, the interrupt handler must decide, correctly, that there
   is no more work to do and return.
4. The hardware generates a new MSI. As the MSI's status bit is still set,
   this new MSI is ignored.
5. dw_handle_msi_irq() unsets the MSI status bit.
The MSI received at point 4 will never be acted upon. It occurred after the driver had finished checking the hardware status for interrupt conditions to act on. Since the MSI status was masked, it did not generate a new IRQ, neither when it was received nor when the MSI was unmasked.
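To make the window concrete, here is a rough sketch of the ordering in the chained handler. It is paraphrased, not the exact upstream dw_handle_msi_irq(): it is simplified to a single controller with 32 vectors, and the structure around the real helpers (dw_pcie_rd_own_conf(), dw_pcie_wr_own_conf(), generic_handle_irq()) is only meant to show where the status read, the driver dispatch, and the status clear fall relative to each other.

/*
 * Rough sketch of the current (racy) ordering; simplified and
 * paraphrased, not the exact upstream dw_handle_msi_irq().
 */
static irqreturn_t dwc_msi_demux_sketch(struct pcie_port *pp)
{
        unsigned long status;
        u32 val;
        int pos;

        /* 1. Latch which MSI vectors currently have their status bit set. */
        dw_pcie_rd_own_conf(pp, PCIE_MSI_INTR0_STATUS, 4, &val);
        if (!val)
                return IRQ_NONE;
        status = val;

        /* 2./3. Run the registered driver handler for each pending vector. */
        for_each_set_bit(pos, &status, 32)
                generic_handle_irq(irq_find_mapping(pp->irq_domain, pos));

        /*
         * 4. A new MSI arriving here only sets the already-set status bit
         *    and raises no new interrupt...
         * 5. ...and clearing the status (write-1-to-clear) now silently
         *    discards it.
         */
        dw_pcie_wr_own_conf(pp, PCIE_MSI_INTR0_STATUS, 4, val);

        return IRQ_HANDLED;
}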
This status register indicates whether or not there is an MSI interrupt on that controller [0..7] to be handled.
While the status for an MSI is set, no new interrupt will be triggered
if another identical MSI is received, correct?

Yes.
You cannot receive another identical MSI till you acknowledge the current one (This is ensured by the PCI protocol, I guess).
I don't believe this is a requirement of PCI. We designed our hardware to not send another MSI until our hardware's interrupt status register is read, but we didn't have to do that.
In theory, we should clear the interrupt flag only after the interrupt has actually been handled (which can take some time to process in the worst case scenario).
But see above, there is a race if a new MSI arrives while still masked. I can see no possible way to solve this in software that does not involve unmasking the MSI before calling the handler. To leave the interrupt masked while calling the handler requires the hardware to queue an interrupt that arrives while masked. We have no docs, but the designware controller doesn't appear to do this in practice.
See my reply to Marc about the interrupt masking. Like you said, the solution probably goes through using the interrupt mask/unmask registers instead of the interrupt enable/disable registers.
Can you do a quick test, since you can easily reproduce the issue? Can you change the register offset in both functions dw_pci_bottom_mask() and dw_pci_bottom_unmask()?
Basically, exchange the PCIE_MSI_INTR0_ENABLE register for PCIE_MSI_INTR0_MASK.
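Something along these lines is what I have in mind. This is only a rough sketch of the kind of change being suggested: the pp->irq_mask bitmap shown here is purely illustrative, and the exact locking and structure may differ from the current functions.

/*
 * Sketch of switching dw_pci_bottom_mask()/dw_pci_bottom_unmask() from
 * the enable register to the mask register.  The irq_mask bitmap and
 * the exact locking are illustrative, not the current driver code.
 */
static void dw_pci_bottom_mask(struct irq_data *d)
{
        struct pcie_port *pp = irq_data_get_irq_chip_data(d);
        unsigned int ctrl = d->hwirq / MAX_MSI_IRQS_PER_CTRL;
        unsigned int bit = d->hwirq % MAX_MSI_IRQS_PER_CTRL;
        unsigned int res = ctrl * MSI_REG_CTRL_BLOCK_SIZE;
        unsigned long flags;

        raw_spin_lock_irqsave(&pp->lock, flags);

        pp->irq_mask[ctrl] |= BIT(bit);
        /* Was PCIE_MSI_INTR0_ENABLE + res before the change. */
        dw_pcie_wr_own_conf(pp, PCIE_MSI_INTR0_MASK + res, 4,
                            pp->irq_mask[ctrl]);

        raw_spin_unlock_irqrestore(&pp->lock, flags);
}

static void dw_pci_bottom_unmask(struct irq_data *d)
{
        struct pcie_port *pp = irq_data_get_irq_chip_data(d);
        unsigned int ctrl = d->hwirq / MAX_MSI_IRQS_PER_CTRL;
        unsigned int bit = d->hwirq % MAX_MSI_IRQS_PER_CTRL;
        unsigned int res = ctrl * MSI_REG_CTRL_BLOCK_SIZE;
        unsigned long flags;

        raw_spin_lock_irqsave(&pp->lock, flags);

        pp->irq_mask[ctrl] &= ~BIT(bit);
        dw_pcie_wr_own_conf(pp, PCIE_MSI_INTR0_MASK + res, 4,
                            pp->irq_mask[ctrl]);

        raw_spin_unlock_irqrestore(&pp->lock, flags);
}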
Of course the MSI still needs to be enabled to work at all, which happens once when the driver using the MSI registers a handler. Masking can be done via the mask register after that.
It is not so easy for me to test on the newest kernel, as imx7d does not work due to yet more bugs. I have to port a set of patches to each new kernel. A set that does not shrink due to holdups like this.
I understand the new flow would look like this (assume the DWC controller's MSI interrupt output signal is connected to one of the ARM GIC interrupt lines; there could be different or more interrupt controllers above the DWC, of course, but usually aren't):
 1. MSI arrives, status bit is set in the dwc, interrupt raised to the GIC.
 2. dwc handler runs.
 3. dwc handler sees the status bit is set for one or more MSIs.
 4. dwc handler sets the mask for those MSIs.
 5. dwc handler clears the status bit.
 6. dwc handler runs the driver handler for the received MSI.
 7. ** a new MSI arrives, racing with 6 **
 8. status bit becomes set again, but no interrupt is raised due to the mask.
 9. dwc handler unmasks the MSI, which raises the interrupt to the GIC
    because of the new MSI received in 7.
10. The original GIC interrupt is EOI'ed.
11. The interrupt for the dwc is re-raised by the GIC due to 9, and we go
    back to 2.
It is very important that 5 be done before 6. It matters less that 4 come before 5, but reversing the order there would allow re-raising even if the 2nd MSI arrived before the driver handler ran, which is not necessary. A sketch of this ordering follows.
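For concreteness, here is what that ordering could look like for a single pending vector. This is illustrative only, not a proposed patch: in a real implementation the mask/unmask would likely live in the irq_chip callbacks rather than inline in the chained handler, and locking around the mask read-modify-write is omitted.

/*
 * Illustrative ordering for one pending vector, following steps 4-9
 * above.  Not a patch: locking is omitted and the mask/unmask would
 * likely live in the irq_chip callbacks in a real implementation.
 */
static void dwc_msi_handle_one_sketch(struct pcie_port *pp,
                                      unsigned int hwirq)
{
        unsigned int ctrl = hwirq / MAX_MSI_IRQS_PER_CTRL;
        unsigned int res = ctrl * MSI_REG_CTRL_BLOCK_SIZE;
        u32 bit = BIT(hwirq % MAX_MSI_IRQS_PER_CTRL);
        u32 mask;

        /* 4. Mask this MSI so a new arrival cannot retrigger it yet. */
        dw_pcie_rd_own_conf(pp, PCIE_MSI_INTR0_MASK + res, 4, &mask);
        dw_pcie_wr_own_conf(pp, PCIE_MSI_INTR0_MASK + res, 4, mask | bit);

        /* 5. Clear the status bit *before* running the driver handler
         *    (the status register is write-1-to-clear, so only this
         *    vector is affected). */
        dw_pcie_wr_own_conf(pp, PCIE_MSI_INTR0_STATUS + res, 4, bit);

        /* 6. Run the driver's handler.  Steps 7/8 may race with this,
         *    but a new MSI only sets the now-clear status bit again
         *    behind the mask. */
        generic_handle_irq(irq_find_mapping(pp->irq_domain, hwirq));

        /* 9. Unmask.  If the status bit was set again in the meantime,
         *    this re-raises the interrupt towards the GIC. */
        dw_pcie_wr_own_conf(pp, PCIE_MSI_INTR0_MASK + res, 4, mask);
}

The key point the sketch encodes is exactly 5-before-6: the status is cleared before the driver handler runs, so an MSI arriving during the handler leaves a set status bit behind the mask instead of being lost.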
I do not see a race in this design and it appears correct to me. But I also do not think there is any immediate improvement from the extra steps of masking and unmasking the MSI.
The reason is that the GIC interrupt above the dwc is non-reentrant. It remains masked (aka active[1]) during this entire process (1 to 10). This means every MSI is effectively already masked. So masking the active MSI(s) a 2nd time gains nothing besides preventing some extra edges for a masked interrupt going to the ARM GIC.
In theory, if the GIC interrupt handler was reentrant, then on receipt of a new MSI we could re-enter the dwc handler on a different CPU and run the new MSI (a different MSI!) at the same time as the original MSI handler is still running.
The difference here is that by unmasking the interrupt in the GIC before the dwc handler is finished, masking an individual MSI in the dwc is no longer a 2nd redundant masking.
[1] When I say masked in GIC, I mean the interrupt is in the "active" or "active and pending" states. In these states the interrupt will not be raised to the CPU and can be considered masked.