On 8/5/24 01:56, Thomas Gleixner wrote:
On Sun, Aug 04 2024 at 20:28, Guenter Roeck wrote:
On 8/4/24 11:36, Guenter Roeck wrote:
Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    genirq: Set IRQF_COND_ONESHOT in request_irq()
With this patch in v6.10.3, all my parisc64 qemu tests get stuck with repeated error messages
[ 0.000000] =============================================================================
[ 0.000000] BUG kmem_cache_node (Not tainted): objects 21 > max 16
[ 0.000000] -----------------------------------------------------------------------------
Do you have a full boot log? It's unclear to me at which point of the boot process this happens. Is this before or after the secondary CPUs have been brought up?
This never stops until the emulation aborts.
Do you have a recipe how to reproduce?
Reverting this patch fixes the problem for me.
I noticed a similar problem in the mainline kernel but it is either spurious there or the problem has been fixed.
As a follow-up, the patch below (on top of v6.10.3) "fixes" the problem for me. I guess that suggests some kind of race condition.
@@ -2156,6 +2157,8 @@ int request_threaded_irq(unsigned int irq, irq_handler_t handler,
 	struct irq_desc *desc;
 	int retval;
 
+	udelay(1);
+
 	if (irq == IRQ_NOTCONNECTED)
 		return -ENOTCONN;
That all makes absolutely no sense to me.
Same here, really. I can reproduce the problem with v6.10.3, using my configuration, but whatever debugging I add makes the problem disappear. I had seen the same problem on mainline with v6.11-rc1-272-g17712b7ea075. Log is at https://kerneltests.org/builders/qemu-parisc64-master/builds/168/steps/qemub... However, I can no longer reproduce it there. What makes it even more weird / odd is that I can bisect the problem between v6.10.2 and v6.10.3 and it points to this commit, but reproducing it outside that chain seems to be all but impossible.
Guenter
IRQF_COND_ONESHOT only has an effect on shared interrupts, and only when the interrupt was already requested with IRQF_ONESHOT.
If this is really a race then the following must be true:
no delay
   CPU0                                 CPU1
   request_irq(IRQF_ONESHOT)
                                        request_irq(IRQF_COND_ONESHOT)
delay
   CPU0                                 CPU1
                                        request_irq(IRQF_COND_ONESHOT)
   request_irq(IRQF_ONESHOT)
In this case the request on CPU 0 fails with -EBUSY ...
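For illustration, the two orderings can be modelled in user space. This is only a rough sketch of the flag compatibility check applied when a second handler is added to a shared line, not the kernel code itself: struct irq_line_model, model_request_irq() and second_request_result() are hypothetical names, the flag values are stand-ins for the definitions in <linux/interrupt.h>, and locking, trigger types and everything else are left out.

```c
#include <errno.h>
#include <stdbool.h>

/* Stand-in flag values for this model; the real ones are in <linux/interrupt.h>. */
#define IRQF_SHARED		0x00000080u
#define IRQF_ONESHOT		0x00002000u
#define IRQF_COND_ONESHOT	0x00040000u

/* Hypothetical model of one interrupt line: only the first requester's flags. */
struct irq_line_model {
	bool requested;
	unsigned int flags;
};

/*
 * Simplified model of the compatibility check for a second requester.
 * Assumption: both sides must agree on IRQF_ONESHOT, except that
 * IRQF_COND_ONESHOT lets the newcomer adopt IRQF_ONESHOT when the
 * existing handler already has it.
 */
int model_request_irq(struct irq_line_model *line, unsigned int flags)
{
	if (!line->requested) {
		/* First requester: nothing to compare against. */
		line->requested = true;
		line->flags = flags;
		return 0;
	}

	/* Both parties have to agree on sharing. */
	if (!((line->flags & flags) & IRQF_SHARED))
		return -EBUSY;

	if ((line->flags & IRQF_ONESHOT) && (flags & IRQF_COND_ONESHOT))
		flags |= IRQF_ONESHOT;	/* newcomer follows the existing handler */
	else if ((line->flags ^ flags) & IRQF_ONESHOT)
		return -EBUSY;		/* ONESHOT mismatch */

	return 0;
}

/* Issue two requests in the given order and return the result of the second. */
int second_request_result(unsigned int first, unsigned int second)
{
	struct irq_line_model line = { false, 0 };

	model_request_irq(&line, first);
	return model_request_irq(&line, second);
}
```

In this model, second_request_result(IRQF_SHARED | IRQF_ONESHOT, IRQF_SHARED | IRQF_COND_ONESHOT) returns 0, which is the "no delay" ordering. With the arguments swapped (the "delay" ordering) it returns -EBUSY for the IRQF_ONESHOT requester.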
Confused
tglx