On 1/31/2019 3:39 PM, Logan Gunthorpe wrote:
On 2019-01-31 1:58 p.m., Dave Jiang wrote:
On 1/31/2019 1:48 PM, Logan Gunthorpe wrote:
On 2019-01-31 1:20 p.m., Dave Jiang wrote:
Does this work when the system moves the MSI vector either via software (irqbalance) or BIOS APIC programming (some modes cause round robin behavior)?
I don't know how irqbalance works, and I'm not sure what you are referring to by BIOS APIC programming, however I would expect these things would not be a problem.
The MSI code I'm presenting here doesn't do anything crazy with the interrupts; it allocates and uses them just as any PCI driver would. The only real difference is that instead of a piece of hardware sending the IRQ TLP, it will be sent through the memory window (which, from the OS's perspective, is just coming from an NTB hardware proxy alias).
Logan
Right. I did that as a hack a while back for some silicon errata workaround. When the vector moves, the address for the LAPIC changes. So unless it gets updated, you end up writing to the old location and lose all the new interrupts. irqbalance is a user daemon that rotates the system interrupts around to ensure that not all interrupts are pinned on a single core.
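To make the stale-address concern concrete, here is a minimal Python sketch of the x86 MSI message layout. It assumes the simplified compatibility format (physical destination mode, no interrupt remapping); the helper names and the example APIC IDs and vectors are hypothetical and are not taken from the patch set.

```python
# Simplified model of the x86 MSI message (compatibility format, no
# interrupt remapping). All helper names here are hypothetical.
MSI_ADDR_BASE = 0xFEE00000  # bits 31:20 of an x86 MSI address are 0xFEE


def msi_address(dest_apic_id: int) -> int:
    """Return the MSI address targeting the given destination APIC ID.

    The destination ID occupies bits 19:12 of the message address.
    """
    if not 0 <= dest_apic_id <= 0xFF:
        raise ValueError("destination APIC ID must fit in 8 bits")
    return MSI_ADDR_BASE | (dest_apic_id << 12)


def msi_data(vector: int) -> int:
    """Return the MSI data word: the vector sits in bits 7:0."""
    if not 0 <= vector <= 0xFF:
        raise ValueError("vector must fit in 8 bits")
    return vector


# The address/data pair a peer would have cached at setup time
# (example values only).
cached = (msi_address(0), msi_data(0x41))

# The pair after the interrupt is re-targeted to APIC ID 2 with a new vector.
current = (msi_address(2), msi_data(0x51))

# If the peer keeps writing to the cached pair, its interrupts go to the
# old destination and the new handler never sees them.
stale = cached != current
print(hex(cached[0]), hex(current[0]), stale)
```

The point of the sketch is only that the destination is encoded in the address itself, so anything that caches the pair has to be told when the affinity changes.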
Yes, that would be a problem if something changes the MSI vectors out from under us. Seems like that would be a bit difficult to do even with regular hardware. So far I haven't seen anything that would do that. If you know of where in the kernel this happens I'd be interested in getting a pointer to the flow in the code. If that is the case this MSI stuff will need to get much more complicated...
I believe irqbalance writes to the file /proc/irq/N/smp_affinity. So maybe take a look at the code that starts from there and see if it would have any impact on your stuff.
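As a rough aid for watching what irqbalance does, the following Python sketch decodes the hex mask found in /proc/irq/N/smp_affinity into a list of CPU numbers. It assumes the usual format of comma-separated 32-bit hex groups with the most significant group first; the function names are made up for illustration.

```python
from pathlib import Path
from typing import List


def parse_smp_affinity(mask_text: str) -> List[int]:
    """Decode an smp_affinity mask such as '00000000,0000000f' into CPU numbers.

    Groups are 32-bit hex words, most significant first, so joining them
    yields one large hexadecimal bitmask.
    """
    mask = int("".join(mask_text.strip().split(",")), 16)
    return [cpu for cpu in range(mask.bit_length()) if (mask >> cpu) & 1]


def read_irq_affinity(irq: int) -> List[int]:
    """Read and decode /proc/irq/<irq>/smp_affinity (Linux only)."""
    return parse_smp_affinity(Path(f"/proc/irq/{irq}/smp_affinity").read_text())


print(parse_smp_affinity("0000000f"))
print(parse_smp_affinity("00000001,00000000"))
```

Polling `read_irq_affinity()` for the NTB IRQ while irqbalance is running would show whether the mask is actually being rewritten.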
I think it's enabled by default on several distros. Although MSI-X has nothing to do with the IOAPIC, the mode in which the APIC is programmed can influence how the interrupts are delivered. Certain Intel platforms (I don't know if AMD does anything like that) put the IOAPIC in a configuration that causes the interrupts to be moved in a round-robin fashion. I think it's physical flat mode? I don't quite recall. Normally on the low-end Xeons. It's probably worth doing a test run with the irqbalance daemon running to make sure your traffic stream doesn't all of a sudden stop.
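One way to check such a test run is to compare two snapshots of the IRQ's row in /proc/interrupts and see which CPUs are still receiving interrupts. The Python sketch below assumes the usual row layout (IRQ label, then one counter per online CPU, then the description); the sample rows and the "ntb" device name are invented for illustration.

```python
from typing import List


def irq_counts(line: str, n_cpus: int) -> List[int]:
    """Extract the per-CPU counters from one /proc/interrupts row.

    The first field is the IRQ label (e.g. '42:'), followed by n_cpus counters.
    """
    fields = line.split()
    return [int(value) for value in fields[1:1 + n_cpus]]


def cpus_with_new_interrupts(before: List[int], after: List[int]) -> List[int]:
    """Return the CPUs whose counter increased between two snapshots."""
    return [cpu for cpu, (b, a) in enumerate(zip(before, after)) if a > b]


# Invented sample rows for a 4-CPU system.
snap1 = irq_counts(" 42:  100  0  0  0  PCI-MSI 1048576-edge  ntb", 4)
snap2 = irq_counts(" 42:  100  0  57  0  PCI-MSI 1048576-edge  ntb", 4)

# Interrupts moved from CPU 0 to CPU 2 and are still arriving.
print(cpus_with_new_interrupts(snap1, snap2))
```

An empty result while traffic is flowing would suggest the interrupts stopped arriving after a move.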
I've tested with irqbalance running and haven't found any noticeable difference.
Logan