On Wed, Apr 03 2024 at 12:35, John Stultz wrote:
On Wed, Apr 3, 2024 at 12:10 PM Thomas Gleixner tglx@linutronix.de wrote:
On Wed, Apr 03 2024 at 11:16, John Stultz wrote:
On Wed, Apr 3, 2024 at 9:32 AM Thomas Gleixner tglx@linutronix.de wrote:
Thanks for this, Thomas!
Just FYI: testing with 6.1, the test no longer hangs, but I don't see the SKIP behavior. It just fails:
not ok 6 check signal distribution
# Totals: pass:5 fail:1 xfail:0 xpass:0 skip:0 error:0
I've not had time yet to dig into what's going on, but let me know if you need any further details.
That's weird. I ran it on my laptop with 6.1.y ...
What kind of machine is that?
I was running it in a VM.
Interestingly, with 64 CPUs it will sometimes do the skip behavior, but with 4 CPUs it seems to always fail.
Duh, yes. The problem is that any thread might grab the signal as it is process wide.
What was I thinking? Not much obviously.
The distribution mechanism is only targeting the wakeup at signal queuing time and therefore avoids the wakeup of idle tasks. But it does not guarantee that the signal is evenly distributed to the threads on actual signal delivery.
Even with the change to stop the worker threads once they have received a signal, it's not guaranteed that the last worker will actually get one within the timeout, simply because the main thread can win the race to collect the signal every time. I just managed to make the patched test fail in one out of 100 runs.
IOW, we cannot test this reliably at all with the current approach.
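To illustrate the race in isolation, here is a minimal userspace sketch. It is not the selftest code: run_signal_race(), try_collect(), NWORKERS and NSIGNALS are made-up names and values, and it uses SIGUSR1 sent with kill() instead of timer signals. All threads, including the main thread, block the signal and compete for it with sigtimedwait(). The only property that holds on every run is that each signal is collected exactly once. Which thread collects it is not predictable.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <time.h>
#include <unistd.h>

#define NWORKERS 4	/* hypothetical worker count */
#define NSIGNALS 100	/* hypothetical number of signals sent */

static atomic_int counts[NWORKERS + 1];	/* slot 0 is the main thread */
static atomic_int total;
static atomic_int done;
static sigset_t sigs;

/* Wait up to 1ms for a pending SIGUSR1 and account it to @slot. */
static void try_collect(int slot)
{
	const struct timespec ts = { .tv_sec = 0, .tv_nsec = 1000000 };

	assert(slot >= 0 && slot <= NWORKERS);
	if (sigtimedwait(&sigs, NULL, &ts) == SIGUSR1) {
		atomic_fetch_add(&counts[slot], 1);
		atomic_fetch_add(&total, 1);
	}
}

static void *worker(void *arg)
{
	int slot = (int)(long)arg;

	while (!atomic_load(&done))
		try_collect(slot);
	return NULL;
}

/*
 * Send NSIGNALS process-wide signals, one at a time, while the main
 * thread and NWORKERS workers all compete to collect them. Copies the
 * per-thread counts to @out and returns the total number collected,
 * or -1 if setup failed.
 */
int run_signal_race(int out[NWORKERS + 1])
{
	pthread_t th[NWORKERS];
	sigset_t old;
	int i, started;

	for (i = 0; i <= NWORKERS; i++)
		atomic_store(&counts[i], 0);
	atomic_store(&total, 0);
	atomic_store(&done, 0);

	sigemptyset(&sigs);
	sigaddset(&sigs, SIGUSR1);
	/* Block before creating threads so the workers inherit the mask. */
	if (pthread_sigmask(SIG_BLOCK, &sigs, &old))
		return -1;

	for (started = 0; started < NWORKERS; started++) {
		if (pthread_create(&th[started], NULL, worker,
				   (void *)(long)(started + 1)))
			break;
	}

	if (started == NWORKERS) {
		for (i = 0; i < NSIGNALS; i++) {
			/* Process-wide signal: any thread may dequeue it. */
			kill(getpid(), SIGUSR1);
			/* Main competes too, until somebody has collected it. */
			while (atomic_load(&total) <= i)
				try_collect(0);
		}
	}

	atomic_store(&done, 1);
	for (i = 0; i < started; i++)
		pthread_join(th[i], NULL);
	pthread_sigmask(SIG_SETMASK, &old, NULL);

	if (started < NWORKERS)
		return -1;
	for (i = 0; i <= NWORKERS; i++)
		out[i] = atomic_load(&counts[i]);
	return atomic_load(&total);
}
```

Calling run_signal_race() from a main() and printing the counts should show a split that varies between runs and machines. Nothing prevents slot 0 (the main thread) from taking most or all of the signals, which is why asserting a per-thread minimum cannot be made reliable.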
I'll think about it tomorrow again with brain awake.
Thanks,
tglx