On Tue, May 24, 2022 at 02:42:04PM +0800, Guoqing Jiang wrote:
From: Vitaly Kuznetsov <vkuznets@redhat.com>
Backport of commit 2f15d027c05fac406decdb5eceb9ec0902b68f53 upstream.
Async PF 'page ready' event may happen when LAPIC is (temporarily) disabled. In particular, Sebastien reports that when the Linux kernel is directly booted by Cloud Hypervisor, LAPIC is 'software disabled' when the APF mechanism is initialized. On initialization KVM tries to inject a 'wakeup all' event and puts the corresponding token into the slot. It then fails to inject an interrupt (kvm_apic_set_irq() -> __apic_accept_irq() -> !apic_enabled()), so the guest never gets notified and the whole APF mechanism gets stuck. The same issue is likely to happen if the guest temporarily disables LAPIC and a previously unavailable page becomes available.
Do two things to resolve the issue:
- Avoid dequeuing 'page ready' events from APF queue when LAPIC is disabled.
- Trigger an attempt to deliver pending 'page ready' events when LAPIC becomes enabled (SPIV or MSR_IA32_APICBASE).
Reported-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210422092948.568327-1-vkuznets@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[Guoqing: backport to 5.10-stable]
Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
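For readers following the thread, the two-part fix described in the commit message can be illustrated with a small userspace toy model. This is only a sketch and not the kernel patch: struct toy_vcpu and the toy_* functions are hypothetical names invented for illustration, and the counters stand in for the real APF queue and token slot.

```c
#include <stdbool.h>

/* Toy model only; none of these names exist in the kernel. */
struct toy_vcpu {
    bool lapic_enabled;  /* stands in for the LAPIC enable state */
    int  pending;        /* queued 'page ready' events, not yet delivered */
    int  delivered;      /* events the guest was actually notified about */
};

/*
 * Part 1 of the fix: only dequeue 'page ready' events while the LAPIC
 * can accept the notification interrupt. Otherwise leave them queued.
 */
static void toy_check_completion(struct toy_vcpu *v)
{
    while (v->lapic_enabled && v->pending > 0) {
        v->pending--;
        v->delivered++;
    }
}

/* A page became available (or 'wakeup all' was queued at init). */
static void toy_queue_page_ready(struct toy_vcpu *v)
{
    v->pending++;
    toy_check_completion(v);
}

/*
 * Part 2 of the fix: when the LAPIC goes from disabled to enabled,
 * retry delivery of whatever was left in the queue.
 */
static void toy_set_lapic_enabled(struct toy_vcpu *v, bool on)
{
    bool was_enabled = v->lapic_enabled;

    v->lapic_enabled = on;
    if (on && !was_enabled)
        toy_check_completion(v);
}
```

In this model, queuing an event while lapic_enabled is false leaves it pending instead of losing it, and the later call to toy_set_lapic_enabled(&v, true) delivers it. Without the second part the event would stay queued indefinitely, which corresponds to the stuck APF mechanism described above.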
Hi,
We encountered the task hang below with the 5.10.113 stable kernel.
[ 246.845061] INFO: task rpmbuild:2303 blocked for more than 122 seconds.
[ 246.846269] Not tainted 5.10.113-1.1.se2-default #1
[ 246.847103] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 246.848248] task:rpmbuild state:D stack: 0 pid: 2303 ppid: 2302 flags:0x00000000
[ 246.848252] Call Trace:
[ 246.848266] __schedule+0x3f6/0x7c0
[ 246.848289] ? __handle_mm_fault+0x3dd/0x6d0
[ 246.848291] schedule+0x46/0xb0
[ 246.848295] kvm_async_pf_task_wait_schedule+0x4b/0x90
[ 246.848297] ? handle_mm_fault+0xbc/0x280
[ 246.848300] __kvm_handle_async_pf+0x4f/0xb0
[ 246.848303] exc_page_fault+0x204/0x540
[ 246.848305] ? asm_exc_page_fault+0x8/0x30
[ 246.848307] asm_exc_page_fault+0x1e/0x30
[ 246.848310] RIP: 0033:0x7f122fbdfc90
After investigating, we found that this patch resolves the issue. The 5.12 stable kernel has already merged it as commit 36825931c607.
Now queued up, thanks.
greg k-h