Hi,
[ Marc, could you help review this series? Especially the first patch? ]
This series of backports from upstream to stable 5.15 and 5.10 fixes an issue we're seeing on AWS ARM instances: attaching an EBS volume (which is an NVMe device) to the instance after offlining CPUs causes the device to take several minutes to show up, and eventually nvme kworkers and other threads start getting stuck.
This series fixes the issue for 5.15.79 and 5.10.155. I can't reproduce it on 5.4. I also couldn't reproduce it on x86, even with affected kernels.
An easy reproducer is:
 1. Start an ARM instance with 32 CPUs
 2. Once the instance is booted, offline all CPUs but CPU 0, e.g.:
    # for i in $(seq 1 31); do chcpu -d $i; done
 3. Once the CPUs are offline, attach an EBS volume
 4. Watch lsblk and dmesg in the instance
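For convenience, here are the same steps as a small script. This is only a sketch: it assumes a 32-CPU instance (CPUs numbered 0-31) and that the volume attach in step 3 is done out of band (console or CLI):

  #!/bin/sh
  # Reproducer sketch: offline every CPU except CPU 0 on a 32-CPU
  # instance (assumes CPUs are numbered 0-31), then prompt for the
  # remaining manual steps.
  for i in $(seq 1 31); do
          chcpu -d "$i"
  done
  echo "CPUs 1-31 are offline. Attach an EBS volume to the instance now,"
  echo "then watch 'lsblk' and 'dmesg -w' for the new device and hung tasks."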
Eventually, you get this stack trace:
[ 71.842974] pci 0000:00:1f.0: [1d0f:8061] type 00 class 0x010802
[ 71.843966] pci 0000:00:1f.0: reg 0x10: [mem 0x00000000-0x00003fff]
[ 71.845149] pci 0000:00:1f.0: PME# supported from D0 D1 D2 D3hot D3cold
[ 71.846694] pci 0000:00:1f.0: BAR 0: assigned [mem 0x8011c000-0x8011ffff]
[ 71.848458] ACPI: _SB_.PCI0.GSI3: Enabled at IRQ 38
[ 71.850852] nvme nvme1: pci function 0000:00:1f.0
[ 71.851611] nvme 0000:00:1f.0: enabling device (0000 -> 0002)
[ 135.887787] nvme nvme1: I/O 22 QID 0 timeout, completion polled
[ 197.328276] nvme nvme1: I/O 23 QID 0 timeout, completion polled
[ 197.329221] nvme nvme1: 1/0/0 default/read/poll queues
[ 243.408619] INFO: task kworker/u64:2:275 blocked for more than 122 seconds.
[ 243.409674]       Not tainted 5.15.79 #1
[ 243.410270] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 243.411389] task:kworker/u64:2 state:D stack:    0 pid:  275 ppid:     2 flags:0x00000008
[ 243.412602] Workqueue: events_unbound async_run_entry_fn
[ 243.413417] Call trace:
[ 243.413797]  __switch_to+0x15c/0x1a4
[ 243.414335]  __schedule+0x2bc/0x990
[ 243.414849]  schedule+0x68/0xf8
[ 243.415334]  schedule_timeout+0x184/0x340
[ 243.415946]  wait_for_completion+0xc8/0x220
[ 243.416543]  __flush_work.isra.43+0x240/0x2f0
[ 243.417179]  flush_work+0x20/0x2c
[ 243.417666]  nvme_async_probe+0x20/0x3c
[ 243.418228]  async_run_entry_fn+0x3c/0x1e0
[ 243.418858]  process_one_work+0x1bc/0x460
[ 243.419437]  worker_thread+0x164/0x528
[ 243.420030]  kthread+0x118/0x124
[ 243.420517]  ret_from_fork+0x10/0x20
[ 258.768771] nvme nvme1: I/O 20 QID 0 timeout, completion polled
[ 320.209266] nvme nvme1: I/O 21 QID 0 timeout, completion polled
For completeness, I also tested the same test case on x86 with this series applied on 5.15.79 and 5.10.155. It works as expected.
Thanks,
Marc Zyngier (4):
  genirq/msi: Shutdown managed interrupts with unsatisfiable affinities
  genirq: Always limit the affinity to online CPUs
  irqchip/gic-v3: Always trust the managed affinity provided by the core code
  genirq: Take the proposed affinity at face value if force==true
 drivers/irqchip/irq-gic-v3-its.c |  2 +-
 kernel/irq/manage.c              | 31 +++++++++++++++++++++++--------
 kernel/irq/msi.c                 |  7 +++++++
 3 files changed, 31 insertions(+), 9 deletions(-)